TL;DR
In this post I argue that, most of the time, you can deliver faster and better results by using supervised machine learning techniques instead of segmentation methods.
Segmentation aims at finding homogeneous, interpretable, and actionable groups of customers.
If you mostly care about impact, you can trade off interpretability with predictive performance, and this is most easily done with supervised machine learning.
Unsupervised ML may be sexy, but it’s almost always the case that, if available, a supervised alternative is superior.
Introduction
A few days ago I was discussing with a colleague their approach to a common problem we were both facing. While they used some hard-core segmentation techniques, I decided to skip that step and train a classification model. In this post I’ll make the case that you can (almost) always skip the segmentation part and use some supervised machine learning (ML) instead.
What is customer segmentation?
According to one book on the topic, “(c)ustomer segmentation is the process of dividing customers into distinct, meaningful, and homogenous subgroups based on various attributes and characteristics. It is used as a differentiation marketing tool. It enables organizations to understand their customers and build differentiated strategies, tailored to their characteristics.”
This definition encapsulates the key properties of segmentation analysis:
Partition: the customer base is split into groups
Interpretable: each subgroup can be differentiated by one or more behavioral characteristics (e.g. health enthusiasts, tech savvy users, value seekers, etc.)
Actionable: these characteristics allow the marketing team to design structured customer journeys where specific levers are applied at specific moments.
To be sure, the split need not be a partition in the mathematical sense, since a customer may belong to more than one segment, but some of the methods in the toolkit do aim at creating mutually exclusive groups.
Before moving on, compare this to the concept of customer personalization, which lies at the core of the data-driven revolution and is sometimes described with the motto of “selling the right product, to the right person, at the right time”.1
At a minimum, personalization sounds more ambitious and practical: the focus is clearly on impact, and though it doesn’t reject the idea that customers may be similar to each other, it doesn’t require it either. Interpretability is also no longer required. It’s in this sense that personalization is more pragmatic: we only require that the methods provide actions at the individual level. Storytelling, or any other desirable outcome achieved through interpretability, can be tackled elsewhere.2
Methods for customer segmentation can be sexy
The simplest way to find segments is to set some predefined rules that partition the customer base. Typical usage applies thresholds to demographic data, such as “males between 15 and 35 years old”. While some of these groupings may sound more natural than others, in practice most of them are arbitrary.
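Rule-based segments are trivial to implement, which is part of their appeal. A minimal sketch with pandas, where the data, column names, and thresholds are purely illustrative:

```python
import pandas as pd

# Hypothetical customer data; names and values are illustrative only
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "gender": ["M", "F", "M", "M"],
    "age": [22, 41, 34, 58],
})

# Rule-based segment: "males between 15 and 35 years old"
mask = (customers["gender"] == "M") & customers["age"].between(15, 35)
segment = customers.loc[mask]
print(segment["customer_id"].tolist())  # → [1, 3]
```

Note how the thresholds (15 and 35) carry no justification beyond convention, which is exactly the arbitrariness at issue.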
Unsupervised machine learning can help get rid of such arbitrariness and, at the same time, help practitioners find relatively homogeneous and interpretable groups. Clustering algorithms, such as K-means or DBSCAN, are commonly used for this purpose, but dimensionality reduction techniques can also help identify similarities, even if they don’t provide explicit segments themselves (e.g. principal components or t-SNE).
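As a sketch of the clustering route, here is K-means on two made-up behavioral features; the data is synthetic and the feature names are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic behavioral features: monthly spend and visits per month
X = np.vstack([
    rng.normal([20, 2], [5, 1], size=(50, 2)),     # low spend, low frequency
    rng.normal([200, 12], [30, 3], size=(50, 2)),  # high spend, high frequency
])

# Scale features so no single one dominates the distance computation
X_scaled = StandardScaler().fit_transform(X)

# K-means with k=2; in practice k is chosen via the elbow method or silhouette score
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# Inspect per-cluster feature means to label the segments after the fact
for k in range(2):
    print(f"cluster {k}: mean features = {X[kmeans.labels_ == k].mean(axis=0)}")
```

Note that the interpretability work starts only after the fit: you still have to inspect each cluster and invent a story (“high spenders”, “occasional visitors”) yourself.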
Unsupervised ML techniques can be sexy since they help practitioners find order where only chaos is apparent. This is automation and data-drivenness at its best: select some data, pass it to an algorithm, let it run its course, and then go and cash out.
If you’re like me, you might’ve spent hours, even days, running and rerunning your scripts to find interpretable segments, or even better, unrecognized groups that can deliver instant value for the company. And if you’re like me, you may have ended up disappointed, because that promise is rarely delivered in practice.
While I can’t prove that you won’t deliver value with this approach, I do feel that it’s such a high-effort, low-probability scenario that data scientists can do better. The path I’ve been pitching since I wrote Analytical Skills for AI and Data Science is to start by asking the right business questions, and only approach the data once these are well-defined and can thus guide your analysis.3
Supervised ML comes to the rescue
As I said earlier, if you just care about impact, most of the time you can skip the segmentation stage and let your supervised ML model tell you who to target. The typical scenario goes like this:
Your business stakeholder wants to impact some metric (e.g. cross-selling, onboarding new customers, etc.)
She tells you to perform some segmentation analysis so that she can direct the campaigns to some of these segments.
She even tells you how to segment the data.
I’ve encountered similar scenarios many times, and I always approach my stakeholder with a set of questions:4
Question 1: So you want to increase your metric X?
Question 2: You think that attributes a, b, and c help explain why some customers have better X?
Question 3: Do you mind if I use an alternative approach, where I can ensure that (i) your insights about the attributes will be taken into consideration, and (ii) I will deliver better performance with respect to metric X?
Question 4 (if the answer to Q3 is negative): What if I do both and test them?
As you may have also experienced, this approach never fails to deliver results. Even the worst supervised ML models that I’ve seen in the workplace end up being superior to traditional segmentation analysis. I won’t claim that supervised ML is easy (though it gets substantially easier with experience). You still need to come up with hypotheses and spend quite a bit of time on data wrangling and feature engineering, but then you can iterate quickly. I’ve put together workable, if suboptimal, ML solutions in just one or two days of work. The key is to deliver quick results and iterate.
How does this work? Typically you already have some data, let’s call it Y, that will guide your algorithm. Y is often binary (i.e. it can only take two values, such as 0/1), for example denoting whether a campaign was successful with a specific customer. Other times it can be continuous (e.g. monthly spend). The important thing is that something similar has been tried in the past, so you have an outcome to learn from. Your task as a data scientist is to find features X that help predict Y. That’s it.
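A minimal sketch of this workflow, with synthetic data standing in for past campaign outcomes; the features, the data-generating process, and the model choice are all assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features X: tenure in months, monthly spend
X = np.column_stack([rng.uniform(0, 60, n), rng.uniform(10, 300, n)])
# Hypothetical binary outcome Y: past campaign success, made more likely
# for high-tenure, high-spend customers by a made-up logistic process
p = 1 / (1 + np.exp(-(0.05 * X[:, 0] + 0.01 * X[:, 1] - 3)))
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Score customers by predicted success probability; target the top scores
scores = model.predict_proba(X_test)[:, 1]
print(f"test AUC: {roc_auc_score(y_test, scores):.2f}")
```

No segments were needed: the ranking of predicted probabilities tells you directly who to target.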
Notice that this supervised ML approach doesn’t require customer segments. If the algorithm is good (and we have very powerful algorithms in our toolkits), it will take care of similarities under the hood. You may want to understand why it works, and for that you can use your interpretability toolkit to try to open the black box. It won’t be perfect, but it will be good enough to create powerful narratives, tell a story to your stakeholder, and check that things “make sense”.
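One common tool from that interpretability toolkit is permutation importance: shuffle a feature and measure how much the model’s performance drops. A sketch on synthetic data, where the feature names are made up and one feature is pure noise by construction:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 1000
tenure = rng.uniform(0, 60, n)   # hypothetical predictive feature
noise = rng.normal(size=n)       # irrelevant feature by construction
y = (tenure + rng.normal(0, 5, n) > 30).astype(int)
X = np.column_stack([tenure, noise])

model = RandomForestClassifier(random_state=0).fit(X, y)

# How much does model accuracy drop when each feature is shuffled?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["tenure", "noise"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```

The output ranks features by how much the model relies on them, which is usually enough raw material for a stakeholder-facing narrative.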
What happens if this is a new product?
If you don’t have a Y to guide your algorithm, does it make sense to segment your data? A fair answer is that it may make sense. But I would further qualify this answer by saying that the preferred approach in this scenario doesn’t look like the segmentation strategy described before.
Here I’d recommend starting with hypotheses like “I think that feature A will work with customer persona X, because… “. Notice that this again sounds like supervised learning: you have some ideas that guide your analysis and you test them. The idea is to collect information as quickly as possible so that you can make better decisions. Strong hypotheses are great for information collection purposes. If you don’t have strong hypotheses, though, it might be better to go back to the drawing board.
What next?
My suggestion is that the next time you feel the urge to do some segmentation analysis, or the next time your stakeholder asks for it, you first consider an alternative supervised ML approach. In my experience, it’s (almost) never the case that segmentation will provide impact fast enough. I’m not saying that there’s no place for segmentation methods in the data science toolkit. My claim is that top-performing data science teams should focus on impact first and foremost. If some algorithm helps you deliver value fast enough, go ahead and use it.
1. See Chapter 1 of Analytical Skills for AI and Data Science.
2. In Chapters 7 and 13 of Data Science: The Hard Parts I discuss how interpretability helps build robust narratives.
3. In particular, see Chapter 3.
4. Approaching your stakeholder with questions, rather than solutions, can help bridge a communications gap that can often be difficult to overcome.