Data Engineer, Data Analyst, Data Scientist: What’s the Difference?

If you want to make sure you don’t lose your job in the next five years, you probably want to know something about Big Data, or even switch to a data-related career. But what do Big Data jobs entail?

Speaking at last week’s Women of Silicon Roundabout conference in London, Dr. Rebecca Pope, the head of data science and engineering at KPMG, said you don’t need to be an excellent statistician or a high-class mathematician to work in data science or analytics. Nor do you need a lot of prior programming knowledge (although that always helps).

However, you do need an interest in statistics, you do need to be willing to learn how to code, and you do need to know how to do some high-level mathematical operations.

Pope herself didn’t study pure statistics (she’s a neuroscientist). Nor did she study programming. Instead, she learned how to program after graduating, and she attended “endless hackathons.” 

“I started learning R. But my advice would be that if you are launching a career in data science you should specialize in Python… make Python the first language you learn,” said Pope. 

Data scientists are not just statisticians, Pope added: “A statistician is interested in building a model that builds a relationship between a variable and an outcome.” A data scientist wants to do something more: predict. They train models that can predict the future as accurately as possible.

These kinds of jobs come in stages. First, a business use case has to be established and the raw data wrangled; then the algorithms are written and tested on the available datasets. If they’re machine-learning algorithms, they learn from that data to make predictions. Finally, visualizations and APIs have to be created so that the business can engage with the resulting product.
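As a rough illustration of the middle stages (fitting an algorithm and testing it on available data), here is a minimal Python sketch. It is not something Pope presented: the dataset is synthetic, the model is a plain least-squares line, and the helper names `fit_line` and `mean_absolute_error` are made up for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data, invented for illustration: y is roughly 3x + 5 plus noise.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 2.0) for x in xs]

# Hold out 20% of the rows so the model is scored on data it never saw.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
test_error = mean_absolute_error(
    model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]
)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} held-out MAE={test_error:.2f}")
```

Scoring on held-out rows is the point: it gives an estimate of how well the model predicts cases it has not seen, rather than how well it describes the data it was fitted to.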

Different sorts of professionals are engaged at these various stages. Alternatively, “generalist” data scientists are capable of serving in many roles related to information-gathering and analysis.

What Does a Data Engineer Do?

What skills do data engineers need? Basically, it’s a lot of software engineering and dataset preparation.

These engineers are tasked with “the representation and movement of data so that it is consumable and usable,” Pope said. “If you’re a data engineer, you need to take the raw data, clean it, move it into a database, tag it, and generally make sure it’s ready for the next stage of the process…”
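To make that description concrete, here is a small, hedged Python sketch of the clean, tag, and load steps. The records, the `orders` table, and the `clean` and `tag` helpers are all hypothetical, and an in-memory SQLite database stands in for whatever database a real team would use.

```python
import sqlite3

# Hypothetical raw records, invented for illustration.
raw_rows = [
    {"customer": " Alice ", "amount": "120.50", "country": "uk"},
    {"customer": "Bob", "amount": "", "country": "UK"},
    {"customer": "carol", "amount": "80", "country": "de"},
]

def clean(row):
    """Return a normalized (customer, amount, country) tuple, or None if unusable."""
    amount = row["amount"].strip()
    if not amount:
        return None  # drop rows with a missing amount
    return (
        row["customer"].strip().title(),
        float(amount),
        row["country"].strip().upper(),
    )

def tag(amount):
    """Attach a simple label so later stages can filter by order size."""
    return "large" if amount >= 100 else "small"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, amount REAL, country TEXT, size_tag TEXT)"
)

cleaned = [c for c in (clean(r) for r in raw_rows) if c is not None]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(*c, tag(c[1])) for c in cleaned],
)
conn.commit()

loaded = conn.execute(
    "SELECT customer, amount, country, size_tag FROM orders ORDER BY customer"
).fetchall()
print(loaded)
```

Production pipelines do the same thing at far larger scale, which is where tools such as Spark and Hadoop come in.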

Pope said the programming languages and platforms you’ll need for these jobs include Apache Spark, Scala, Docker, Java, Hadoop, Kubernetes, and NiFi.

What Does a Data Analyst Do?

The data analyst’s job is “about interpreting current information to make it useful for the business,” Pope said. There’s not much machine-learning modeling or deployment in the role.

If you want this role, Pope said, it will help if you understand how to use RapidMiner, a predictive analytics platform, and PostgreSQL, an open-source relational database.
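Much of that interpretive work comes down to summarizing data held in a relational database. The Python sketch below shows the general idea under loud assumptions: the `sales` table and its figures are made up, and SQLite is used as a stand-in so the example runs without a PostgreSQL server. The SQL itself is standard and would look much the same in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")

# Made-up figures, for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "2019-01", 1000.0),
        ("North", "2019-02", 1200.0),
        ("South", "2019-01", 800.0),
        ("South", "2019-02", 700.0),
    ],
)

# Summarize current data per region: total and average monthly revenue.
summary = conn.execute(
    """
    SELECT region, SUM(revenue) AS total, AVG(revenue) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, total, average in summary:
    print(f"{region}: total={total:.0f}, monthly average={average:.0f}")
```

Nothing here predicts anything; the query only describes what has already happened, which is the distinction Pope draws between the analyst and the data scientist.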

What Does a Data Scientist Do?

Lastly, there’s the “pure data scientist,” who interfaces heavily with the business and works with engineers. They train machine learning programs on specially prepared datasets in order to provide easy-to-use visualizations that suit the needs of the business. They also create models and advise executives on strategic decisions, Pope said.

Data scientists need to understand statistics, but that alone isn’t enough: most machine-learning algorithms are based on multivariable calculus, as well as linear and non-linear algebra. “This is the level of mathematics you need to know,” she added.

You’ll also need good data-visualization and people skills so that you can present your model and its findings to the business (and encourage them to use it).

Getting a Job

Pope is hiring at KPMG. And she isn’t just looking for PhDs and highly accomplished graduates with a master’s degree. Being a good data scientist is all about being the “Swiss Army knife” who can operate across the spectrum of the above-mentioned roles.

When Pope recruits at KPMG, she says she’s “blind” to candidates’ degrees. What matters most is how well they perform on the technical challenge set by the firm. “I am far more interested in what technology you can build and what you can drive for our client base [than qualifications],” she said.

To this end, she suggested that, rather than paying for an expensive degree, you pursue internships and work experience, and compete on platforms such as Kaggle.

“It’s not about being a deep technical expert in Scala or Python. It’s about working out what you need in order to answer the questions being posed by the business,” Pope concluded.

This article originally appeared in eFinancialCareers.