The Key Skills Needed by Big Data Engineers

by Kate Matsudaira, Aug 21, 2014

A mix of data scientist and engineer, Big Data engineers are a new breed in the technology community. Do you have what it takes to be a pioneer? The skills required for Big Data engineering roles aren’t necessarily new, but candidates do need a solid understanding of two particular areas to be successful: math and scientific analysis. If you’ve been successful in engineering roles that drew on those skills, you might be a great fit for a Big Data engineering role even if you don’t have all the skills or experience listed below.

You Can Do Lots of Things With Data

Not all Big Data roles are the same, but there are a few things you can expect to see if you take on a position in this field. Typically the role will include a subset of the following high-level skills:

Data Analysis: Are you a pro with MapReduce, Hadoop or even data mining? In addition to processing data, you may also need to know more specialized techniques like machine learning or even statistical analysis.
Data Warehousing: Are you familiar with large data stores? Do you know how to get data in or take data out? Good.
Data Transformation: Sometimes data needs to be changed or transformed into a different format in order to properly analyze it. Can you make it work? You may know this work as ETL or even just scripting.
Data Collection: You have to crawl before you can walk. Crawling the Web or extracting data from an existing database or API are common chores for Big Data engineers.

Every role is different, though, so some may require more specialized knowledge in one of these areas over the others. However, if you are an expert in one, it’s not usually too challenging to translate those skills to the other areas.

What You Need

Data Analysis

MapReduce, Hadoop, Cloudera, IBM BigInsights, Hortonworks or MapR. Most people have experience with only one MapReduce implementation or Hadoop distribution (many of these tools are only a few years old), but because the underlying algorithms are shared, a new one can be learned with a few weeks of ramp-up time. If you are familiar with one of the tools listed here, or with one of the higher-level tools built on MapReduce (like Hive or Pig), you’ll most likely be able to step into a role using a similar tool.
Data mining or machine learning. This can include technologies like Mahout, or more specialized techniques like Neural Networks. Having these skills can be a huge asset for you over other candidates if the role requires this kind of work, since these skills are more specialized and harder to learn.
Statistical analysis software: R, SPSS, SAS, Weka, MATLAB. Most data scientists have some statistical experience, but not all of them use dedicated statistics software to do their work. If that’s you (if you work in Java, for example), you may be expected to learn these tools, but it should be fairly easy to ramp up from what you’re used to.
Programming skills: Java, Scala, Ruby, C++. Typically, heavier-duty programming skills are required for custom or specialized implementations (those that use machine learning, for example).
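
To make the claim about shared underlying algorithms concrete, here is a minimal sketch of the MapReduce pattern in plain Python, using the classic word-count task. It runs on one machine and does not use Hadoop or any of the distributions named above; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are illustrative assumptions, not part of any framework's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group emitted values by key.

    A real framework does this between map and reduce, across machines.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "Data engineer"]))
```

Tools such as Hive or Pig generate jobs of this shape from higher-level queries, which is one reason experience with one tends to carry over to the others.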

Data Warehousing

Relational databases: MySQL, MS SQL Server, Oracle, DB2. Expertise with one of these tools takes time, so if your experience matches the tools used at the company you’re interviewing with, that’s a great thing. However, if you’re not an expert with their tools, experience with one of these will make it easier to learn the basics of a new one in a matter of weeks.
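NoSQL and other large-scale data stores: HBase, SAP HANA, HDFS, Cassandra, MongoDB, CouchDB, Vertica, Greenplum, Pentaho and Teradata. Only some of these (HBase, Cassandra, MongoDB and CouchDB) are NoSQL databases in the strict sense; the rest are distributed storage, analytic databases or business-intelligence platforms. In this area, it’s best if your experience matches what the company already uses. Knowledge of one won’t necessarily translate well to others.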
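
As a small illustration of getting data into a relational store and back out, the sketch below uses SQLite from Python's standard library as a stand-in for the databases listed above. The events table, its columns and the helper names are hypothetical, chosen only for this example; a production warehouse would differ in schema, scale and SQL dialect.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory database and load (user_id, action, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id INTEGER, action TEXT, amount REAL)"
    )
    # Parameterized inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_by_user(conn):
    """Aggregate the loaded rows: total amount per user, ordered by user id."""
    cursor = conn.execute(
        "SELECT user_id, SUM(amount) FROM events "
        "GROUP BY user_id ORDER BY user_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    store = build_store([(1, "buy", 10.0), (2, "buy", 5.0), (1, "buy", 2.5)])
    print(totals_by_user(store))
```

The same load-then-aggregate shape applies to MySQL, SQL Server, Oracle or DB2, although connection handling and some SQL details vary by product.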

Data Collection

Data APIs (e.g., RESTful interfaces). Most candidates should have some experience working with APIs to collect or ingest data. If not, any candidate with programming or scripting experience can pick this up in less than a week.
SQL expertise and data modeling. This is something all candidates for Big Data engineering roles should have, so you’ll need to brush up your skills if you haven’t done this kind of work for a while.
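
Collecting data from a RESTful interface often comes down to requesting pages of JSON until none are left. The sketch below shows that loop under stated assumptions: the endpoint URL, the page query parameter and the items field are all hypothetical, and a real API will document its own pagination scheme. The fetch function is passed in as a parameter so the loop can be exercised without a network connection.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch one URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_records(base_url, fetch=fetch_json, max_pages=100):
    """Gather records from a paginated endpoint.

    Assumes (hypothetically) that pages are requested with ?page=N and that
    each response is a JSON object with an "items" list, empty when done.
    """
    records = []
    for page in range(1, max_pages + 1):
        payload = fetch(f"{base_url}?page={page}")
        items = payload.get("items", [])
        if not items:
            break  # an empty page signals the end of the data
        records.extend(items)
    return records


if __name__ == "__main__":
    # Stand-in for a real API: two pages of data, then an empty page.
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}

    def fake_fetch(url):
        page = int(url.rsplit("=", 1)[1])
        return {"items": pages.get(page, [])}

    print(collect_records("https://api.example.com/records", fetch=fake_fetch))
```

The max_pages cap is a design choice for the sketch: it keeps a misbehaving endpoint from turning the loop into an endless crawl.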

Data Transformation

ETL Tools: Informatica, DataStage, SSIS, Redpoint. In general, your experience with one of these tools will be applicable to using a different one, if required.
Scripting. Do you know Linux/Unix commands, Python, Ruby or Perl? While each of these languages works differently, your knowledge of one should translate fairly easily to mastery of another.
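
A transformation step can be as small as a script that reads one format, cleans each record and writes another. The following sketch converts CSV text to JSON using only Python's standard library; the column names (name, signup_date, spend) and the cleaning rules are assumptions made up for the example, not taken from any of the ETL tools above.

```python
import csv
import io
import json


def transform(csv_text):
    """Extract rows from CSV text, clean them, and return a JSON string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append(
            {
                # Trim stray whitespace and normalize capitalization.
                "name": row["name"].strip().title(),
                "signup_date": row["signup_date"].strip(),
                # Convert the text field to a number rounded to cents.
                "spend": round(float(row["spend"]), 2),
            }
        )
    return json.dumps(records)


if __name__ == "__main__":
    raw = "name,signup_date,spend\n  ada lovelace ,2014-08-21,12.5\n"
    print(transform(raw))
```

Dedicated ETL tools add scheduling, monitoring and connectors on top of this, but the extract, transform and load steps follow the same outline.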

Big Data engineering is a new field with a lot of new technologies and new positions. Not all roles require expertise in every area, so pay attention to what the company you’re considering really needs. By taking on one of these roles, you’re tackling a brand new field with lots of possibilities, which means you need to be flexible and open to learning on the fly to do your best work.

Image: PHOTOCREO Michal Bednarek/Shutterstock.com
