Data scientists are the people who crunch the data that a growing number of companies, government agencies and other organizations collect from a range of sources. They use mathematical models to analyze it, create narratives or visualizations to explain what it means, and then suggest how to use it to make decisions.
Half Analytics, Half ‘Counting Things’
Former bit.ly chief scientist Hilary Mason, now at venture capital firm Accel Partners, described a data scientist as someone who can obtain, scrub, explore, model and interpret data by blending hacking, statistics and machine learning. She told Forbes that the role is half analytics, half “counting things.”
Gregory Piatetsky-Shapiro, president of the consultancy KDnuggets, prefers the term “knowledge discovery” to “data scientist” – hence the KD in his company’s name. However, he agrees that the job is about finding “some understandable knowledge and not just incomprehensible patterns” in data.
What does that mean in terms of skills? There’s little consensus, though most observers agree that professionals in the field should understand how to analyze data, says Jill Dyché, vice president of SAS Best Practices. “The key is driving new business insights through the use of data,” she explains.
Usually this implies an understanding of statistical analysis tools, but sometimes prowess with more prosaic business intelligence software is just as important. Some positions call for a master’s degree in statistics, machine learning, data mining, operations research, applied mathematics, electrical engineering, computer science or computer engineering, with a Ph.D. preferred. Others mention artificial intelligence, machine learning, data modeling, Hadoop, MapReduce, R and programming languages like Java, C++, C# or Python.
Whatever you call the job — and however you define it — the number of opportunities isn’t likely to slow down anytime soon. In five years, there will be almost half a million data scientist positions, and a shortage of up to 190,000 people when it comes to filling them, according to the McKinsey Global Institute. On top of that, the U.S. will need 1.5 million executives and support staff who can analyze data to make effective business decisions.
While heavy, quantitative statistical jobs aren’t new, the types of data and the techniques and tools to harness it have changed, says Barb Wixom, principal research scientist at the MIT Sloan Center for Information Systems Research.
“The hardest part of this data scientist craze is to know what people mean when they use that term,” she says. “[Often] when people say data scientist, they really mean more of a business analyst. That’s a business person who can make sense of data in a way that would inform decision-making. As businesses leverage data more and more, you’re going to need a whole spectrum of people who have data-related skills.”
A Range of Talent
A report from the Business Intelligence Congress aimed at better aligning college curricula with the needs of business points to a spectrum of skills needed for positions in business intelligence/business analysis. “The top ones are management and communication skills, data skills,” says Wixom. “In the continuum of skills, the closer you are to that heavy statistical role, the more important those heavy [quantitative] skills become, and basically you need a Ph.D. in stats or programming.”
“In more of a business analyst role, you’ll need less of the heavy [quantitative] skills, but you’re going to have to be really strong in communication skills, management and have a deep functional knowledge of the business,” she continues. “That is the person who has to understand how to convert insight into action.”
In addition to polling faculty and students, the BI Congress survey included responses from 446 practitioners, 308 of them involved in hiring. It found the number of master’s and other advanced degrees in analytics is growing, and that employers prefer people with degrees specifically in analytics.
“If you have an advanced math degree, you’ll be able to pick up the other analytics skills over time,” Wixom says. “But an analytics degree takes the math and combines it with the other portfolio of skills, as well. It gets you closer to that perfect mix of skills than a degree in math or statistics.”
Of course, “a perfect mix of skills” is hard to define. One practitioner quoted in the report believes, “Analytical ability and natural inquisitiveness are more important than technological skills.” Another suggested that “core technical skills should be well augmented with skills in requirements gathering, defining a solution architecture, and communicating effectively with non-technical folks.”
Newcomers to the field should pay special attention to the words of yet a third practitioner, who said, “Entry-level IT workers appear to have little understanding of the complexity of business processes, strategy and operations — so they tend to oversimplify the problem of reporting and analysis. They are overly focused on the technology challenges of data movement and storage.”
Real Data to Solve Real Problems
Candidates who want to show they’d make good data scientists need to demonstrate to employers that they can hit the ground running, Wixom says. Describe a class project or homework assignment that used analytics to solve a real problem. In the best-case scenario, show that it used real data sets on the scale of terabytes.
“I have grad students who have experience as a business user and if they want to get much more involved in the analytics side, they could play up that they know the problems to be solved, they know the actions that need to happen and the insights to be found,” she says. “That’s where they could start explaining their interest in developing the skills in that continuum [that moves them closer to analytics].”
The same can hold true for more experienced candidates. “Foundational skills are most critical. If a person doesn’t feel like they have analytics skills per se, but can show they have foundational skills, like showing that you understand data management, maybe you have SQL skills, maybe you’re a strong communicator — if you can show you have those, we’ve found that employers are willing to hire and develop them because there’s such a great need right now.”
Indeed, 80 percent of the businesses responding to the BI Congress survey provide additional training to their data recruits. That training ranges from a one-week introduction to the company’s practices to an 18-month program with mentorship.