Data Scientists as Storytellers
Data science is commonly defined by the ability to extract insights from data. Under this definition, data engineers play a crucial role in transforming and delivering usable data, thereby enabling data scientists, who are primarily responsible for generating insights (Figure 0).
A more holistic definition replaces “insights” with actionable insights, making explicit that we want information to make better decisions (Figure 1).
In this simplified world, where each step in the lifecycle is taken care of by a specific role, who should take ownership of the last step? Not so long ago, at the height of the data hype, a new role was created, ushering in the era of data translators.
The data translator is responsible for making sense of the needs of the business side and transforming them into something that can be tackled with the data scientist’s toolkit. Moreover, once the data scientist does their magic, the translator goes back to the business side and, ta da!, provides actionable insights and gets stakeholder buy-in!
Many companies fell for this new buzzword, and were quick to hire data translators (usually provided by a big consulting firm). But the truth of the matter is that most companies can’t afford to hire for this new role, even if candidates are available, because the business case doesn’t add up.
For this reason, in Data Science: The Hard Parts (DSHP) I push forward the idea that data scientists ought to own the end-to-end lifecycle (excluding the data engineering part, if the organization permits). And to become better end-to-end data scientists, they ought to become better storytellers.
A holistic view of storytelling
A common misconception among non-data practitioners is the belief that merely spending enough time with the data will automatically lead to the emergence of insights. The truth of the matter is that data alone conveys no useful information. Put differently, there are so many plausible stories in one dataset that, without imposing more structure, chaos quickly emerges.
Where does this structure come from? Usually from our knowledge of the world, and more specifically, our knowledge of the business and the problem at hand. This is why data scientists ought to become as knowledgeable about the business as their business stakeholders. Actually, the analytical toolkit allows the data scientist to go even further and decompose the business in ways that the business stakeholder cannot even think of.
“Data scientists must know the business as well as their business stakeholders (or better).”
So where does storytelling come into the picture? In DSHP, I propose that storytelling in data comes in two flavors: the better-known form is what I call ex-post storytelling. The lesser-known form is ex-ante storytelling.
Ex-post storytelling
Once the data scientist finishes their analysis or model, it’s time to start wearing the sales-person hat. Why should your stakeholders believe in your results and, most importantly, why should they do anything about them? This is ex-post storytelling at its fullest.
How do you become a better ex-post storyteller? Fortunately there’s plenty of material out there, and in DSHP I walk you through what I think are the most important skills you need to acquire (I also provide references to material I’ve found useful). These include things like:
Data visualization
Interpretability in machine learning
Effective communication strategies
While the sales-person metaphor is useful, be careful not to overuse it. Ex-post storytelling with data imposes some natural restrictions: above all, your pitch must be credible and consistent.
Ex-ante storytelling
Film director and screenwriter Jean-Luc Godard once said, “Sometimes reality is too complex. Stories give it form.” Needless to say, Godard was not talking about data science. Nonetheless, it’s in this precise sense that ex-ante storytelling works.
Data scientists are trying to solve business questions, and to make sense of rather complex phenomena. Why is a product not growing? Why are customers churning? Why did last quarter’s revenues decelerate? Who is trying to defraud us? All of these questions are terribly hard to answer. One approach is to tackle them one hypothesis at a time. These hypotheses are stories, and these stories guide the discovery process for scientists and data scientists alike.
To be sure, ex-ante storytelling is more like ex-ante/interim storytelling, as this is usually an iterative process where you start with one or several hypotheses that are judged in the light of evidence. Most of the time, different types of evidence come from the same dataset. What changes is how you look at the data, and what types of questions you ask of it. Having a rich toolkit is paramount for success in data science.
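To make this a bit more concrete, here is a minimal Python sketch of what "one hypothesis at a time, same dataset" can look like for the churn question. Everything in it is an assumption made for illustration: the customer records, the field names (plan, support_tickets, churned), the two stories, and the helper functions churn_rate and compare are all hypothetical, and a real analysis would need far more data and proper controls.

```python
# Hypothetical customer records: every field and value is made up for illustration.
customers = [
    {"plan": "basic", "support_tickets": 0, "churned": False},
    {"plan": "basic", "support_tickets": 4, "churned": True},
    {"plan": "basic", "support_tickets": 1, "churned": False},
    {"plan": "basic", "support_tickets": 5, "churned": True},
    {"plan": "premium", "support_tickets": 3, "churned": True},
    {"plan": "premium", "support_tickets": 0, "churned": False},
    {"plan": "premium", "support_tickets": 1, "churned": False},
    {"plan": "premium", "support_tickets": 0, "churned": False},
]


def churn_rate(rows):
    """Share of customers in `rows` that churned (0.0 if `rows` is empty)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(r["churned"] for r in rows) / len(rows)


def compare(rows, story, belongs_to_group):
    """Slice the same dataset according to one story and compare churn rates."""
    in_group = [r for r in rows if belongs_to_group(r)]
    out_group = [r for r in rows if not belongs_to_group(r)]
    return {
        "story": story,
        "in_group": churn_rate(in_group),
        "out_group": churn_rate(out_group),
    }


# Each hypothesis is a story plus a rule that says how to look at the data.
hypotheses = {
    "Basic-plan customers churn more": lambda r: r["plan"] == "basic",
    "Customers with 3+ support tickets churn more": lambda r: r["support_tickets"] >= 3,
}

results = [compare(customers, story, rule) for story, rule in hypotheses.items()]

for res in results:
    print(f"{res['story']}: {res['in_group']:.0%} vs {res['out_group']:.0%}")
```

The dataset never changes; only the rule used to slice it does. In this toy example the support-ticket story separates churners much more sharply than the plan story, which would suggest where to dig next, not a final answer.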
Can you learn ex-ante storytelling? Naturally, the answer is positive. What does it take to become better at it? I’d say it’s a combination of several things:
If you already have a scientific mindset, you’re halfway there. You have already mastered the process of coming up with hypotheses that ought to be contrasted against evidence. I don’t say this lightly: it takes time and effort to internalize the scientific method.
If not, you need to sharpen your curiosity. Scientists start by asking questions like “why is this happening?” Fortunately, humans are inherently curious; we just forget about it as we grow up. Don’t feel ashamed of asking questions. On the contrary, embrace your ignorance.
You have to become comfortable with simplifying problems. There will never be one simple answer that explains away all of the variance, so we must necessarily simplify away some complexity. A trick that often works is to focus on first-order effects first. Relatedly, focus on the averages, and tackle the tails of the distribution later on. For instance, begin by putting yourself in the shoes of your “average” customer. This can take you very far.
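As a rough illustration of the "averages first, tails later" trick, here is a small Python sketch using only the standard library. The monthly_spend numbers are invented, and the cutoff for what counts as the tail (three times the median) is an arbitrary assumption of mine, not a rule.

```python
import statistics

# Hypothetical monthly spend per customer (made-up numbers, one extreme value).
monthly_spend = [20, 22, 25, 25, 27, 30, 31, 33, 35, 400]

# Step 1: describe the "average" customer first.
mean_spend = statistics.mean(monthly_spend)
median_spend = statistics.median(monthly_spend)

# Step 2: only afterwards, set the tail aside for a separate look.
# The 3x-median cutoff is an arbitrary choice made for this sketch.
tail_cutoff = 3 * median_spend
tail = [x for x in monthly_spend if x > tail_cutoff]
typical = [x for x in monthly_spend if x <= tail_cutoff]

print(f"mean: {mean_spend}, median: {median_spend}")
print(f"typical customers: {len(typical)}, tail customers: {len(tail)}")
```

Note how a single extreme customer pulls the mean well above the median here. Starting from the typical customer gives you a simple first story, and the tail then becomes its own question to tackle later.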
Summing it all up
Becoming better storytellers should benefit anyone, data scientist or not. But because of the nature of the job, data scientists can benefit even more by further developing this skill. Data scientists are first and foremost scientists. They come up with stories that aim at explaining complex phenomena, contrast these hypotheses against evidence, and then need to communicate their findings in such a way as to drive actions within an organization. Each of these steps requires storytelling skills. The good news is that we can all learn to become better storytellers.
If you want to go into more detail, you can check chapters 3-6 of Analytical Skills for AI and Data Science.