Depending on the amount of data you have and the number of variables you want to focus on in your analysis, cluster analysis may be the perfect tool for you. Especially when it comes to improving your user experience, cluster analysis can help you discover the personas of the people you are trying to market to by pulling in more data and creating a less biased picture of your digital property. Simply put, it’s a way to group similar data points together and draw insights from the groups that form.


An Introduction to Cluster Analysis

So how is cluster analysis different from segmenting? Think of segmenting as an umbrella term, with cluster analysis as one of many ways to segment your data. Cluster analysis is a tool that can segment your users based on behavior or tendencies, interests, age range, and more. The algorithm used is based on machine learning principles, but you still have a say in how many groups you get: depending on the method, you can pick the number of clusters you want, or let the algorithm figure out how many distinct groups exist within your data. The image below shows an example with three distinct groups.


Cluster analysis example

Image Source: GeeksforGeeks


In this image, you can clearly see where your three clusters are without incorporating an algorithm. However, data is usually not this simple, leading to the need for cluster analysis.


Choosing Your Cluster Algorithm

When trying to implement cluster analysis, my advice is to pick a specific number of groups if possible, while keeping your business goals in mind. It will usually take much less time to run the analysis than it would if you let the algorithm pick the number for you. Broadly, the algorithm measures the distances between the dots and takes averages of those distances for each group. Cluster analysis tries to minimize the distances between dots within a group while maximizing the distances between dots in different clusters. This example is a closer representation of what your data may look like after a clustering algorithm is applied and visualized.


Clustering algorithm example

Image Source: GeeksforGeeks
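
To make the distance-and-average idea above concrete, here is a minimal sketch of one common clustering algorithm, k-means, in plain Python. The article does not say which algorithm it has in mind, so k-means is an assumption, as are the sample points and the function names (seed_centers, kmeans). It is meant as an illustration of the idea, not a production implementation.

```python
import math

def seed_centers(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from the centers chosen so far (a simple deterministic choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, max_iterations=50):
    """Group points into k clusters; returns (labels, centers)."""
    centers = seed_centers(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the average of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # leave an empty cluster in place
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return labels, centers

# Hypothetical users: (visits per week, average minutes per visit)
points = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
labels, centers = kmeans(points, k=2)
print(labels)   # -> [0, 0, 0, 1, 1, 1]
print(centers)
```

Each pass pulls every center toward the middle of its group, which is what keeps the dots within a group close together. On real data you would more likely reach for a library implementation than roll your own.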


If there is no precedent for a “normal” number of groups, it may be worth taking the extra time to explore the appropriate number through cluster analysis. Keep in mind that more clusters do not mean better data. You don’t want to make the descriptions of your data too granular by having 100 different clusters. That much detail would make your clusters very hard to act on. You want your cluster analysis to produce specific groups that you can use to build better analyses and a better product for a variety of users.
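
If you do want to compare candidate numbers of groups, one option is to score each candidate grouping and keep the one that separates the data best. The sketch below uses the silhouette score, a technique the article does not mention and which is swapped in here purely as an illustration. It assumes you already have cluster labels from whatever algorithm you ran; the points and both labelings are made up.

```python
import math
from collections import defaultdict

def silhouette_score(points, labels):
    """Mean silhouette over all points: close to 1 means tight, well-separated
    clusters; values near 0 or below suggest overlapping or forced clusters."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a one-point cluster
            continue
        # a: average distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: average distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical users: (visits per week, average minutes per visit)
points = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
two_groups = [0, 0, 0, 1, 1, 1]
three_groups = [0, 0, 2, 1, 1, 1]  # splits one natural group in two

for name, labels in [("2 clusters", two_groups), ("3 clusters", three_groups)]:
    print(name, round(silhouette_score(points, labels), 3))
```

Here the two-group labeling scores higher than the three-group one, which matches the point above: adding a cluster that splits a natural group makes the result worse, not better.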


Choosing Your Variables

Before you start running data through your clustering algorithm, you need to choose the variables you would like the clusters to be based on. Ask yourself what kind of people are using your app or buying your product. What variables describe the differences in your users? A few common variables are device type, age range, customer tenure, average time of visit, and acquisition channel. You could also pull in conversion rate to add a user-performance aspect to your clusters.
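
Because clustering works on distances, variables like the ones above usually need to be turned into comparable numbers first. The sketch below shows one common way to do that, assuming a list of user records: numeric variables are standardized so that a variable with a large range (tenure in months) does not drown out one with a small range, and categorical variables are expanded into 0/1 columns. The field names, sample users, and the build_features helper are all hypothetical.

```python
from statistics import mean, pstdev

def build_features(rows, numeric, categorical):
    """Return (column_names, vectors) ready for a distance-based algorithm."""
    # Mean and standard deviation of each numeric variable, for z-scoring.
    stats = {}
    for col in numeric:
        values = [row[col] for row in rows]
        stats[col] = (mean(values), pstdev(values))
    # The distinct values of each categorical variable, in a stable order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}

    names = list(numeric) + [
        f"{col}={level}" for col in categorical for level in levels[col]
    ]
    vectors = []
    for row in rows:
        vec = []
        for col in numeric:
            avg, spread = stats[col]
            vec.append((row[col] - avg) / spread if spread else 0.0)
        for col in categorical:
            # One 0/1 column per distinct value (one-hot encoding).
            vec.extend(1.0 if row[col] == level else 0.0 for level in levels[col])
        vectors.append(vec)
    return names, vectors

users = [
    {"device": "mobile", "channel": "email", "tenure_months": 2, "avg_visit_minutes": 3.5},
    {"device": "desktop", "channel": "search", "tenure_months": 24, "avg_visit_minutes": 12.0},
    {"device": "mobile", "channel": "social", "tenure_months": 6, "avg_visit_minutes": 4.0},
    {"device": "tablet", "channel": "email", "tenure_months": 36, "avg_visit_minutes": 9.5},
]
names, vectors = build_features(
    users,
    numeric=["tenure_months", "avg_visit_minutes"],
    categorical=["device", "channel"],
)
print(names)
print(vectors[0])
```

How much weight the 0/1 columns should carry next to the standardized ones is a judgment call, so treat this as a starting point rather than a fixed recipe.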


It is often helpful to run a principal component analysis to see which variables are most important in describing the variation in your users. I won’t go into principal component analysis in this article, but it is a useful way to narrow down the variables to pull for a cluster analysis. Be careful about using too many variables, as this will often produce convoluted clusters.


Here’s an example that shows how using only two variables in your cluster analysis can lead to insightful clusters from a mess of data.


Choosing cluster variables
Image Source: PLOS ONE


After you have pulled user data with the variables describing them, it’s finally time for you to run your dataset through your chosen cluster analysis algorithm.


Characterizing Your Clusters

Each cluster that the algorithm outputs describes a different type of user. Your next step is to identify what characterizes each cluster in your data. Do the clusters heavily correlate with certain age ranges? What about marketing channels? Do signed-in/registered users show up more in a certain cluster? After seeing how your clusters correlate with different aspects of your data, you will be able to describe the type of user each cluster represents. It is often helpful (and fun!) to name your clusters based on their characterizations.
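
The questions above boil down to summarizing each cluster: how big it is, what its averages look like, and how its members split across categories. Here is a minimal sketch of such a profile, assuming you already have one cluster label per user. The user records, the labels, and the profile_clusters helper are invented for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

def profile_clusters(rows, labels, numeric, categorical):
    """Summarize each cluster: size, numeric averages, and category shares."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    profiles = {}
    for label, members in sorted(grouped.items()):
        profile = {"size": len(members)}
        for col in numeric:
            profile[f"mean_{col}"] = mean([m[col] for m in members])
        for col in categorical:
            counts = Counter(m[col] for m in members)
            # Share of the cluster's members that fall in each category.
            profile[f"share_{col}"] = {
                value: n / len(members) for value, n in counts.items()
            }
        profiles[label] = profile
    return profiles

users = [
    {"age_range": "18-24", "channel": "social", "avg_visit_minutes": 4.0},
    {"age_range": "18-24", "channel": "social", "avg_visit_minutes": 6.0},
    {"age_range": "55+", "channel": "email", "avg_visit_minutes": 12.0},
    {"age_range": "55+", "channel": "email", "avg_visit_minutes": 10.0},
    {"age_range": "35-44", "channel": "email", "avg_visit_minutes": 11.0},
]
labels = [0, 0, 1, 1, 1]  # assumed output of a clustering step

for label, profile in profile_clusters(
    users, labels, numeric=["avg_visit_minutes"], categorical=["age_range", "channel"]
).items():
    print(label, profile)
```

In this toy data, cluster 0 is entirely young users from social with short visits, while cluster 1 skews older, comes from email, and stays longer. Profiles like these are what make it possible to give each cluster a name.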


Applying Your Clusters

Once you have your groups and you’ve characterized them, you can start to look at the tendencies or behavior patterns of those specific groups. These patterns will begin to reveal what these groups like and don’t like, and help you discover how to change your website or digital experience based on these segments. Does the cluster that represents young late-night users have a higher or lower average check size? Does the cluster that represents mothers of families over- or under-index on registration rate?
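
Over- and under-indexing can be put into numbers by comparing each cluster’s rate with the overall rate. The sketch below uses one common definition, where an index of 100 means the cluster matches the overall rate, above 100 means it over-indexes, and below 100 means it under-indexes. The registration flags, labels, and the index_by_cluster helper are hypothetical.

```python
def index_by_cluster(flags, labels):
    """Return {cluster: index}, where index = 100 * cluster rate / overall rate."""
    overall_rate = sum(flags) / len(flags)
    if overall_rate == 0:
        raise ValueError("overall rate is zero, so an index is undefined")

    totals, hits = {}, {}
    for flag, label in zip(flags, labels):
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(flag)

    return {
        label: 100 * (hits[label] / totals[label]) / overall_rate
        for label in sorted(totals)
    }

# One entry per user: did they register, and which cluster are they in?
registered = [True, True, False, False, False, True]
labels = [0, 0, 0, 1, 1, 1]

index = index_by_cluster(registered, labels)
print({label: round(value, 1) for label, value in index.items()})
# -> {0: 133.3, 1: 66.7}
```

In this made-up data, cluster 0 registers at about a third above the overall rate and cluster 1 at about a third below it. With small clusters these numbers can swing a lot, so check the cluster sizes before acting on an index.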


Once you see how the different types of users perform across your product, you will be able to think about how to optimize your strategy for your different user groups. For example, say you have a segment of elderly customers who are typically reached through email. You might make your email advertising a little more user-friendly for that group, or use links that are better suited for desktop users.



Since there are so many ways to segment your data, you’re probably wondering why cluster analysis would be a good option for your own analysis. With cluster analysis, you get to pull in more data and create a more data-driven, less biased picture of your digital property. Digging through all this data and grouping data points by hand would be extremely time-consuming, if not nearly impossible. Running your data through an algorithm instead organizes it spatially, so that similar data points end up in the same group. Once your data is organized into segments, you’ll be able to leverage this new knowledge to improve the user experience of your digital property.


Are you ready to connect the dots of your digital experience? Contact us today, and we can help guide you on how to do cluster analysis to efficiently segment your data.

About the author

Parker Smith

Parker is a data analyst who is energized by creating collaborative solutions for complex problems that clients face. He is constantly in awe of the power of data, how it helps us learn and change, and how it impacts the way our world works.
