Data miners gather project specifications, create robust predictive models, and offer business managers reliable, insightful data and problem-solving analysis.
A data miner needs an in-depth understanding of database structures and algorithms, but it’s their “Columbo-like skills” that really set them apart, according to Dean Abbott, co-founder and chief data scientist for Smarter Remarketer Inc., a marketing intelligence firm headquartered in Indianapolis, Ind.
“Average data miners run down a checklist to create a predictive model,” he said in an interview. “The real pros have a ‘Freakonomics’ mindset.”
Freakonomics is a groundbreaking book, written by an economist and a journalist, that uses data to show the hidden causes and correlations underlying a diverse range of systems, from sumo-wrestling competitions to crime rates. When he mentions that text, Abbott is referring to a data miner’s habit of asking questions and insatiable desire to understand a problem before building a model.
Here are some of the questions Abbott typically asks during an interview to identify a data miner with a “Freakonomics” mindset:
Which courses or work experiences have been most useful in helping you build good predictive models?
- What Most People Say: “My classes weren’t that helpful. I would say the most valuable course I took was Python programming.”
- What You Should Say: “After being a data miner for about a year, I went back and revisited the principles I learned in a graduate-level neuroscience class. Originally, I didn’t see the connection. But now I see how the course’s theories, skills and problem-solving approach relate to the model-building process.”
- Why You Should Say It: The best modelers connect the theory of predictive modeling with the practice. As professionals acquire hands-on experience, they learn to apply their education and knowledge to business problems and to reach conclusions by connecting disparate pieces of information. Moreover, the best predictive models result from collaboration, which is why we work in teams. Showcasing your ability to synthesize information from a variety of sources and create collaborative models is a great way to separate yourself during an interview.
You’re meeting with a program manager and stakeholder about an analytics project to identify customers who are likely to churn. What are three things you need from the stakeholder or IT to create a plan and build a predictive model?
- What Most People Say: “I need to know the target vertical market, where the data is stored and when they need the model.”
- What You Should Say: “In addition to the basics, I need to make sure that I understand the real business objective. For instance, do we want to reduce churn within the first 30 days of inactivity, or when customers are inactive for two to three months? Do we want to predict inactivity in all customers, or just those who spend a certain amount of money? And what’s the endgame? Do we want to benchmark against other companies or reduce customer churn by 10 percent?”
- Why You Should Say It: The success of a remediation plan hinges on accurate predictive models and analysis. You need to thoroughly understand the business objectives in order to produce relevant data and help business leaders reach the right conclusions, so they can achieve the desired outcome.
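
To illustrate why the business objective matters before any modeling starts, here is a minimal sketch of how the churn target itself changes with the definition. The customer records, field names and the `churn_labels` helper are hypothetical and exist only for this example; a real project would take the inactivity window and spend cutoff from the stakeholder.

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative assumptions.
CUSTOMERS = [
    {"id": "A", "last_purchase": date(2024, 5, 20), "total_spend": 1200.0},
    {"id": "B", "last_purchase": date(2024, 4, 25), "total_spend": 80.0},
    {"id": "C", "last_purchase": date(2024, 3, 1), "total_spend": 950.0},
    {"id": "D", "last_purchase": date(2024, 1, 15), "total_spend": 40.0},
]


def churn_labels(customers, as_of, inactivity_days, min_spend=0.0):
    """Label each in-scope customer as churned (True) or active (False).

    A customer is in scope only if total_spend >= min_spend, and counts as
    churned if at least `inactivity_days` have passed since the last purchase.
    """
    labels = {}
    for customer in customers:
        if customer["total_spend"] < min_spend:
            continue  # outside the population the business cares about
        days_inactive = (as_of - customer["last_purchase"]).days
        labels[customer["id"]] = days_inactive >= inactivity_days
    return labels


AS_OF = date(2024, 6, 1)

# Objective 1: catch any customer after 30 days of inactivity.
early_all = churn_labels(CUSTOMERS, AS_OF, inactivity_days=30)

# Objective 2: only higher-spending customers, inactive for about two months.
late_high_value = churn_labels(CUSTOMERS, AS_OF, inactivity_days=60, min_spend=500.0)

print(early_all)
print(late_high_value)
```

The same four customers produce two different training targets, which is why the questions about time window and customer value have to be settled first.
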
You’re working on two projects for an organization that investigates fraud. The first project requires fast deployment in a transactional system and needs to produce highly accurate results with a low false alarm rate. The second project is primarily intended to provide insights, including key predictors and interaction terms, but accuracy is less important. Which algorithms would you consider using for these two projects?
- What Most People Say: “I would probably use a neural network for the first project because it’s fast and highly transactional. I’d create some sort of scorecard or decision tree for the second project.”
- What You Should Say: “I like to use an ensemble method when accuracy is paramount, as long as the system deploys quickly enough. I’ve encountered problems with opaque models before, so I’ve developed a unique way to interpret the results. It’s a set of extraction techniques that enable usable information and key predictors to be pulled out of the raw data. It’s not something you’d find in a textbook. I developed it on the job.”
- Why You Should Say It: Once the task is decided and the goals are established, it’s important to choose the right data mining technique. Experienced data miners are familiar with a variety of algorithms and they consider speed, transparency, accuracy and the project’s goals in deciding which one to use. Plus, they learn from experience. They aren’t afraid to try an unconventional process or formula if it works better.
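
As a rough sketch of what a "low false alarm rate" requirement can mean in practice, the code below picks a score threshold for a fraud model under a cap on false alarms. It assumes that the false alarm rate means the false positive rate, FP / (FP + TN); some teams define it differently. The scores, outcomes and function names are made up for illustration.

```python
def false_alarm_rate(scores, actuals, threshold):
    """Share of legitimate cases (actual == 0) flagged as fraud: FP / (FP + TN)."""
    fp = sum(1 for s, a in zip(scores, actuals) if a == 0 and s >= threshold)
    tn = sum(1 for s, a in zip(scores, actuals) if a == 0 and s < threshold)
    return fp / (fp + tn)


def detection_rate(scores, actuals, threshold):
    """Share of fraud cases (actual == 1) that are flagged: TP / (TP + FN)."""
    tp = sum(1 for s, a in zip(scores, actuals) if a == 1 and s >= threshold)
    fn = sum(1 for s, a in zip(scores, actuals) if a == 1 and s < threshold)
    return tp / (tp + fn)


def lowest_threshold_within_cap(scores, actuals, max_far):
    """Return the lowest observed score usable as a threshold without
    exceeding the false alarm cap, or None if no threshold qualifies.

    Raising the threshold can only lower the false alarm rate, so the first
    qualifying threshold also flags the most fraud among those that qualify.
    """
    for threshold in sorted(set(scores)):
        if false_alarm_rate(scores, actuals, threshold) <= max_far:
            return threshold
    return None


# Made-up model scores and outcomes (1 = fraud, 0 = legitimate).
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
ACTUALS = [1, 1, 0, 1, 0, 0, 0, 0]

chosen = lowest_threshold_within_cap(SCORES, ACTUALS, max_far=0.2)
print(chosen, detection_rate(SCORES, ACTUALS, chosen))
```

The same check can be run on any candidate algorithm's scores, so the comparison between an ensemble and a simpler model is made on the constraint the project actually imposes.
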
When you build models with different inputs or different algorithms, how do you assess their accuracy and choose the “best” model?
- What Most People Say: “For classification problems, I usually build a confusion matrix, and for regression problems, I use R-squared.”
- What You Should Say: “I’ve used lift charts, decile tables and randomization testing to assess a model’s accuracy and strength. But most of the time, I run the data through various models and analyze the results to make sure I select the model that most closely aligns with the business objective. In other words, if I’m trying to predict churn of high-dollar customers within the first 30 days of inactivity, I’ll keep testing until I find the model that optimizes that metric.”
- Why You Should Say It: When building prediction models, the primary goal should be to build a model that will solve the business problem. Sometimes, models are statistically significant but they’re not operationally significant. That’s why experienced data miners don’t use traditional statistical tests to assess models—they assess them with data.
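
The decile tables and lift charts mentioned in the answer above can be sketched in a few lines. This is one common way to build such a table, not a method attributed to Abbott: rank records by model score, split them into equal-sized bins, and compare each bin's response rate with the overall rate. The scores and outcomes are invented, and five bins are used instead of ten only to keep the toy data readable.

```python
def decile_table(scores, actuals, n_bins=10):
    """Rank records by score (highest first) and summarize each bin.

    Each row reports the bin's response rate, its lift over the overall
    response rate, and the cumulative share of all positives captured so far.
    """
    if len(scores) != len(actuals) or not scores:
        raise ValueError("scores and actuals must be non-empty and the same length")
    total_pos = sum(actuals)
    if total_pos == 0:
        raise ValueError("need at least one positive outcome to compute lift")

    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = total_pos / n

    rows, cum_pos = [], 0
    for b in range(n_bins):
        group = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue  # more bins than records
        positives = sum(actual for _, actual in group)
        cum_pos += positives
        rate = positives / len(group)
        rows.append({
            "bin": b + 1,
            "count": len(group),
            "positives": positives,
            "rate": rate,
            "lift": rate / overall_rate,
            "cum_capture": cum_pos / total_pos,
        })
    return rows


# Invented example: 20 customers, scores from 1.00 down to 0.05, 6 churners.
SCORES = [1 - i / 20 for i in range(20)]
ACTUALS = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

TABLE = decile_table(SCORES, ACTUALS, n_bins=5)
for row in TABLE:
    print(row)
```

Comparing the top-bin lift or cumulative capture across candidate models, on the population and time window the business cares about, is one way to rank models on an operational metric instead of a purely statistical one.
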