Data Scientist: Education, Training, Interviewing

Your typical data scientist works with various forms of data to discover insights and knowledge. Then they develop products and services that support optimal decision-making.

The data can be structured (coming from a pre-defined data model and residing in relational databases) or unstructured (having no pre-defined format, such as text files or user-generated content).

A data scientist is responsible for understanding and aggregating these different datasets, and employing statistical and machine learning techniques to create predictive analytics and models. They work with data and application engineers to integrate these models into the product, thereby improving user experience and engagement with the product. They also help identify opportunities to improve organizational efficiency and increase business value.

Data scientists often interact with people from multiple departments, such as business development, sales, product management, project management, UX/UI design, and software engineering.

“Data scientists can continue to grow their professional career as an individual contributor or take a managerial path in data science,” Seongjoon Koo, chief data officer at J.D. Power, said. “Also, it is possible to move onto a product manager role by managing data science products and services.”

Vibha Srinivasan, director of data science at Spiceworks, explained that the career path for data scientists is similar to that of a software developer.

“At the entry level, you have well-defined problems to work on—for example, building recommendation engines to drive product purchases,” she said. “As you grow into senior and lead data scientist roles, you would be expected to look at the business goals and see how data science can be used most effectively to help meet those goals.”

That involves evaluating different approaches and making tradeoffs between accuracy and speed of deployment.

“You would take initiative in evaluating third-party data sources and external APIs for machine learning to see if they would add business value or help you deliver your product quicker,” Srinivasan said. “You’ll also mentor and train junior data scientists within your team.”

Irrespective of the business use cases and career level, the day-to-day work will involve a lot of data cleaning, analysis, feature extraction, modeling, and visualization.

“You will also be spending time reading and staying up-to-speed on industry trends, since this is a fast-growing field,” Srinivasan noted.

Typical Data Scientist Job Posting

Srinivasan said tech pros should look for job descriptions that clearly outline the responsibilities of the position, because they can vary greatly from company to company.

“The job posting should also detail what teams and departments the data scientist will collaborate with, and some examples of the products they’ll focus on at the company,” Srinivasan said.

In companies that are just starting to build a data science team, though, the part about responsibilities could be intentionally vague, since you’ll be expected to help evaluate how data science can help the business.

Education/Training/Certification

Education and formal training in data science, analytics, statistics, computer science, electrical engineering, or a closely related technical discipline are often preferred. Massive Open Online Courses (MOOCs) can help people from other backgrounds gain the necessary training and experience.

Koo said hands-on coding skills and experience in Python, R, and/or other programming languages are required for data scientists. The ability to understand data quickly and interpret the results for the business is also critical. Because the work is collaborative, good communication skills are preferred. Srinivasan agreed that a strong background in mathematics and statistics is essential, along with good programming skills.

Experience with a range of data mining and machine learning techniques, such as classification, clustering, natural language processing, and neural networks, is highly desirable.

“Good SQL skills go a long way in helping you extract and analyze structured data,” Srinivasan said. “Knowledge of basic statistics is required to assess your datasets and make reasonable assumptions.”
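As a rough illustration of the kind of SQL-plus-basic-statistics work described above, the sketch below loads a tiny, made-up table into an in-memory SQLite database and computes a per-page conversion rate. The table name `visits`, its columns, and the rows are all hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical table of landing-page visits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (landing_page TEXT, converted INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("home", 1), ("home", 0), ("home", 0), ("home", 0),
        ("pricing", 1), ("pricing", 1), ("pricing", 0), ("pricing", 0),
    ],
)

# Because converted is 0 or 1, AVG(converted) is the conversion rate per page.
rows = conn.execute(
    """
    SELECT landing_page,
           COUNT(*)       AS visits,
           AVG(converted) AS conversion_rate
    FROM visits
    GROUP BY landing_page
    ORDER BY landing_page
    """
).fetchall()
conn.close()

for page, visits, rate in rows:
    print(f"{page}: {visits} visits, conversion rate {rate:.2f}")
```

With only four visits per page, these rates say very little; checking sample sizes before drawing conclusions is exactly the kind of "reasonable assumption" that basic statistics helps you make.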

These skills can often be acquired through a bachelor’s (or higher) degree in mathematics, statistics, computer science, or a related field, and through experience in the field.

“There are several machine learning bootcamps and online courses available, as well,” Srinivasan said. “Participating in Kaggle data science competitions is also a great way to hone your skills.”

Typical Data Scientist Interview

Typically, interview questions cover the following:

Ideally, questions will be designed to reflect the nature of the work you’ll be performing at the company, and the kinds of data you will be dealing with.

“For example, you may be given a file containing mock data about traffic to different landing pages of your website, and asked to build a model that predicts conversion rates,” Srinivasan said. “More than the solution itself, interviewers are looking to see if you ask clarifying questions about the data, state the assumptions you’re making, and explain your thought process as you work through the problem.”

Candidates will be asked to explain why they selected a particular approach and its pros and cons compared to other techniques. Some interviewers may ask potential hires to explain the math underlying machine learning, such as L1 versus L2 regularization, or concepts such as cross-validation.
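One way to reason about L1 versus L2 regularization in an interview is to look at a single weight in isolation. The sketch below is a simplified one-dimensional illustration, not a full training routine: `z` stands for the weight an unpenalized fit would choose, and `lam` is a hypothetical penalty strength. It also includes a small helper that shows how k-fold cross-validation partitions row indices. All names and numbers are made up for illustration.

```python
def l1_shrink(z: float, lam: float) -> float:
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w), known as soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small weights are set exactly to zero


def l2_shrink(z: float, lam: float) -> float:
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2, a proportional shrinkage."""
    return z / (1.0 + lam)


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Rows are split in order, without shuffling, to keep the sketch short.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


unpenalized = [3.0, 0.4, -0.2, -1.5]  # hypothetical unpenalized weights
lam = 0.5                             # hypothetical penalty strength

l1_weights = [l1_shrink(z, lam) for z in unpenalized]
l2_weights = [l2_shrink(z, lam) for z in unpenalized]
folds = list(k_fold_indices(10, 3))

print("L1:", l1_weights)
print("L2:", [round(w, 3) for w in l2_weights])
for train, val in folds:
    print("validate on", val)
```

In this simplified setting, the L1 penalty drives the two small weights to exactly zero, while the L2 penalty shrinks every weight but leaves all of them nonzero. That difference is why L1 is often described as performing feature selection.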

Since labeled data is often a luxury, you may be asked about how you can build a predictive model in the absence of labeled data (using unsupervised ML techniques, or keyword-based approaches to generate labels).
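A keyword-based approach to generating labels might look something like the sketch below. The categories, keyword sets, and sample messages are hypothetical; in practice the rules would come from domain knowledge, and the resulting noisy labels would usually be spot-checked before being used to train a model.

```python
import re

# Hypothetical keyword rules for two illustrative categories.
KEYWORD_RULES = {
    "billing": {"invoice", "refund", "charge"},
    "technical": {"crash", "error", "bug"},
}


def keyword_label(text, rules=KEYWORD_RULES):
    """Return the label whose keywords match most often, or None to abstain.

    The function abstains when no keyword matches or when two labels tie,
    so ambiguous examples can be left unlabeled rather than mislabeled.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {label: len(tokens & keywords) for label, keywords in rules.items()}
    top = max(scores.values())
    winners = [label for label, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return None
    return winners[0]


messages = [
    "Please refund me for the duplicate invoice",
    "The app shows an error and then a crash",
    "Hello there",
    "Refund me, the error page ate my order",
]
labels = [keyword_label(m) for m in messages]
print(labels)
```

Abstaining on ties and on messages with no matches trades coverage for label quality, which is one of the tradeoffs an interviewer may want to hear you discuss.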

“When it comes to statistics, problems around the Bayes’ theorem and conditional probabilities are interview favorites,” Srinivasan added. “As mentioned already, it’s important to communicate your approach clearly to technical (data scientists) and non-technical (product managers) alike.”
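A typical conditional-probability question can be worked through with Bayes' theorem in a few lines. The numbers below are invented for illustration: suppose 5 percent of visitors convert, and a hypothetical "likely buyer" flag fires for 80 percent of converters and 10 percent of non-converters. The question is how likely a flagged visitor is to convert.

```python
from fractions import Fraction


def posterior(prior, true_positive_rate, false_positive_rate):
    """P(event | flag) via Bayes' theorem.

    evidence = P(flag) = P(flag | event) P(event) + P(flag | no event) P(no event)
    """
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


# All three inputs are made-up values for this example.
p_convert = Fraction(5, 100)           # P(convert)
p_flag_given_convert = Fraction(80, 100)  # P(flag | convert)
p_flag_given_no_convert = Fraction(10, 100)  # P(flag | no convert)

p_convert_given_flag = posterior(
    p_convert, p_flag_given_convert, p_flag_given_no_convert
)
print(p_convert_given_flag, float(p_convert_given_flag))
```

Under these assumptions the answer is 8/27, or roughly 30 percent. Even a fairly accurate flag yields a modest posterior when the base rate is low, which is usually the point such interview questions are probing.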

Koo also noted that hands-on coding exercises, using real data and requiring interpretation of the results, are gaining popularity as a way to test candidates' true capabilities. A deep understanding of algorithms, rather than mere familiarity with certain machine learning libraries and packages, is often preferred.

What to Include on a Résumé/Cover Letter

In addition to highlighting individual skills and experience, candidates should emphasize their proficiency with the tools and libraries that data scientists use, such as natural language processing libraries (including Gensim and spaCy), deep learning libraries (such as TensorFlow, Keras, and PyTorch), Big Data technologies (Hadoop and Spark), and analytics tools such as SQL.

As Srinivasan noted, it’s also important to include any personal projects that you worked on and data science competitions that you participated in. Experienced candidates should elaborate on their current and past analytics and machine learning projects, as well as the business value that their work delivered.

If you evaluated additional data sources or alternate approaches that simplified processes at your previous workplaces, highlight that as well. And remember: every bullet point in the ‘Experience’ section of your résumé should mention the positive impact of your actions (for example, “Increased unit revenue by 25 percent after using data to streamline the production process”), because most of all, potential employers want to see how you can change an organization for the better.