Looking for data science? Or are you really looking for data astrology?
Data science is a rigorous process of question-forming, analysis, and iterative learning. It usually takes a long time and leaves us with as many follow-up questions as it does answers.
Data astrology, on the other hand, is simple. As long as you know what you want to predict, you can just run the data through whatever statistical method you choose, like swirling around so many tea leaves in a teacup, and you’ll get an answer.
All you need is a data set, some kind of software, and someone to push the virtual buttons and keep tweaking things until it finally runs without errors. Data astrology is the heedless application of statistics and analytical methods to data sets to get an answer to a fuzzy question.
Why Data Astrology?
When you’re looking around for answers in your data, it’s easy to get lost in a sea of options. Do we need analytics? Data science? Statistics? Should I just hire a better accountant?
But wait, what is this? An application that will give me all the answers I need? And I don’t even have to know the questions!
It’s almost magical!
Sometimes, it’s very tempting to look outside for an answer that will silence all the competing voices in the room. Data astrology promises that simple answer relatively quickly. But there are pitfalls to data astrology.
Pitfalls of Data Astrology
I asked our data science team and associates about some of the pitfalls they saw in data astrology, and here are their answers:
Pitfall #1: All Answers Are Not Created Equal (Karla Slenkamp)
If you want to get an answer, any answer, data astrology works just fine. There is a formula, and you will get a number, maybe even many numbers, at the end. But those numbers might not answer the questions you actually have.
For example, you might find that there is an uncanny relationship (never mind what kind of relationship) between people who like country music and people who bought bagel slicers on Amazon. Who knew? (Who cares?)
Just as frustrating is when your crystal ball spits out something you already kind of knew. At that point, your data astrology software begins to look like a junior office assistant. For example, did you know that current income is a great predictor of future income? Congratulations, you’re smarter than a machine learning program running a naïve Bayes algorithm on a data dump.
(That last sentence was supposed to be a joke. If you want to know why it’s funny, ask a data scientist.)
Data astrologists choose the model that produces an answer, not the model that makes sense according to what we know about the real world.
Pitfall #2: The Right Answer to the Wrong Question (Tessa Jones)
Some software built to assist data scientists is great for solving simple modeling problems in predictive analytics.
But what if you need to figure out how a system changes over time (time series analysis)? Data astrology deals in simple formulae. Here is your data, here are your predictors, here is your target, voilà. Here’s your answer. A data processing tool doesn’t know which variables relate to time or how to interpret them, especially given the myriad ways data can be formatted.
What if you need to optimize the way you treat your data, to find more meaningful answers? Data astrology comes pre-packaged. It won’t optimize itself. (At least, not yet…)
What if you need more detail? Data astrology looks at the data given and doesn’t demand further detail. “Mercury is here, the moon is here, Andromeda is moving into the House of the Rising Sun, so it looks like revenue is going to stagnate in Q3 unless you drive really hard in the individual consumer market.”
But suppose you want to know what effect seasonality will have on Q3 revenue. You can’t add that to the model without thinking through a whole other set of questions and technical issues.
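To make the seasonality point concrete, here is a minimal Python sketch using made-up quarterly revenue: a steady upward trend plus a fixed dip every Q3. The numbers, the `SEASONAL` offsets, and the helper names are hypothetical. A time-blind straight-line fit projects the trend straight through next year’s Q3; adding a quarter-of-year adjustment (here, simply averaging the trend residuals by quarter) brings the forecast back near the real value. This is one simple technique among many (seasonal dummy variables and formal decomposition are common alternatives), shown only as an illustration, not as the authors’ method.

```python
# Hypothetical quarterly revenue: linear trend plus a fixed seasonal pattern.
# All numbers are invented for illustration.
SEASONAL = [5.0, 0.0, -10.0, 5.0]  # Q1..Q4 offsets; Q3 dips, offsets sum to zero

def true_revenue(t):
    """Revenue in quarter index t (t = 0 is Q1 of year 1)."""
    return 100.0 + 2.0 * t + SEASONAL[t % 4]

def fit_line(ts, ys):
    """Ordinary least-squares intercept and slope of ys on ts."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    return my - b * mt, b

history = list(range(16))  # four years of quarterly observations
revenue = [true_revenue(t) for t in history]
a, b = fit_line(history, revenue)  # time-blind trend: knows nothing about quarters

# Seasonal adjustment: average the trend residuals for each quarter of the year
residuals = [y - (a + b * t) for t, y in zip(history, revenue)]
quarter_effect = [sum(residuals[q::4]) / len(residuals[q::4]) for q in range(4)]

t_next_q3 = 18  # Q3 of year 5
trend_only = a + b * t_next_q3
with_season = trend_only + quarter_effect[t_next_q3 % 4]
print(f"actual {true_revenue(t_next_q3):.1f}, "
      f"trend-only {trend_only:.1f}, trend + season {with_season:.1f}")
```

The trend-only line overshoots Q3 by several units because it has no notion of the dip; the adjusted forecast lands close to the actual figure. Even in this toy case, someone had to decide that quarters matter and how to encode them.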
Data astrology applies formulae, built-in assumptions, and software to problem solving. That’s great for a known problem, but not great for building the most robust, meaningful interpretations of data. It’s also not great for evolving as a business.
Pitfall #3: Correlation Is Not Causation and You Guys Are Supposed to Know This Already (Jamie Kroll)
A data scientist will likely present complex results, with varying levels of certainty. The scientist will validate the assumptions they use. They help a business understand the difference between possible causation, noise, and unseen relationships.
Not so the data astrologist. Data astrology promises fast results using existing data. But it also allows correlation to masquerade as a business growth lever.
“What leads to growth?” is a causal question, and it requires a causal model. Predictive models answer a different question: given what we have observed, what is likely to happen? They don’t tell you what will happen when you pull a lever. And though most of us know that “correlation is not causation,” it’s very tempting to take a model out of context if nobody is there to provide the context.
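A small simulation shows how correlation can masquerade as a growth lever. In this minimal Python sketch, the scenario and every name in it are hypothetical: a hidden confounder (`ad_spend`) drives both site visits and revenue, while revenue never depends on visits at all. A predictive view of the observed data finds a strong correlation and a positive slope; forcing visits up, which mimics actually pulling the lever, leaves revenue unchanged.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sum((x - mx) ** 2 for x in xs)

def simulate(n=20_000, seed=0, forced_visits=None):
    """Return (visits, revenue) for n hypothetical customers.

    A hidden confounder (ad_spend) drives both visits and revenue;
    revenue never looks at visits. If forced_visits is given, visits are
    set by fiat, which mimics an intervention on the supposed lever.
    """
    rng = random.Random(seed)
    visits, revenue = [], []
    for _ in range(n):
        ad_spend = rng.gauss(0.0, 1.0)          # hidden confounder
        if forced_visits is None:
            v = ad_spend + rng.gauss(0.0, 0.5)  # visits follow ad spend
        else:
            v = forced_visits                   # intervention: ignore ad spend
        r = ad_spend + rng.gauss(0.0, 0.5)      # revenue depends on ad spend only
        visits.append(v)
        revenue.append(r)
    return visits, revenue

visits, revenue = simulate()
print(f"correlation(visits, revenue) = {pearson(visits, revenue):.2f}")
print(f"predictive slope             = {slope(visits, revenue):.2f}")

# Same seed for both arms, so the only difference is the forced visit level
_, rev_low = simulate(seed=1, forced_visits=0.0)
_, rev_high = simulate(seed=1, forced_visits=2.0)
effect = mean(rev_high) - mean(rev_low)
print(f"effect of forcing visits up  = {effect:.2f}")
```

With these made-up parameters the expected correlation and slope are both 0.8, so a predictive model would happily report visits as a strong driver of revenue. The intervention shows zero effect, because in this toy world visits were never a lever; only the confounder was.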
How to Avoid the Data Dark Arts
It’s natural to look to authority for a definitive answer to complex questions.
But certainty is a false promise.
There is no business knowledge in the algorithms that machines push data through. That knowledge, and the choice of algorithm, comes from the scientist: a person who applies a method, the scientific method, and iterates continually based on feedback from the business.
The true value of data science is the person looking into the crystal ball. They work hard to read a business, to read the data, and to understand the questions that businesses need answered. And as a scientist, they are trained to bring in new leads, pursue new directions, and think creatively about problems.
Next up: How do I know if I’m doing data astrology or data science? Jamie Kroll will shed some light on this question.
Posted by Elizabeth Kronoff, Karla Slenkamp, Tessa Jones, Jamie Kroll