Tech firm Stitch, which maintains an ETL (Extract, Transform, Load) service for developers, recently conducted a study into the state of data engineering in the United States.
According to Stitch’s research, the number of data engineers in the U.S. grew 122 percent between 2013 and 2015, and some 42 percent of those engineers came from a software-engineering background. SQL, Java, Python, Hadoop, and Linux were the top five data engineering skills.
Stitch’s report also suggested that the growth in data engineers has outpaced that of data scientists, whose job numbers Stitch pegs as expanding 47 percent between 2013 and 2015. There is good reason for the gap. “Data engineers build and maintain the pipelines that keep your data clean and flowing,” read the company’s blog post about the findings. “Insights are great, and you need them. But to deliver insights at scale, you need data infrastructure.”
Because data engineers work on complex architectures, they often boast a variety of skills, from Python and Java programming to data warehousing and machine learning. While data scientists focus on analytics, data engineers spend all day immersed up to their elbows (proverbially speaking) in data consolidation and warehousing.
While recent tools have automated at least some of the processes related to data architecture, data engineers must still display an aptitude for dealing with complex problems at the code level. Those just entering the profession should familiarize themselves with a variety of database types, as well as basic regression and summary statistical techniques. Some knowledge of AForge.NET, Scikit-learn, and other machine-learning technologies is also a plus, given broader industry trends at the moment.
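For newcomers wondering what “basic regression and summary statistical techniques” look like in practice, here is a minimal Python sketch that uses only the standard library. The function names (`summarize`, `fit_line`) and the sample numbers are hypothetical, made up purely for illustration; they do not come from Stitch’s report. In real work, a library such as Scikit-learn would typically handle the regression.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: rows loaded per batch vs. load time in seconds.
rows = [100, 200, 300, 400, 500]
seconds = [1.2, 2.1, 3.2, 3.9, 5.1]

stats = summarize(seconds)
slope, intercept = fit_line(rows, seconds)
print(stats)
print(f"estimated seconds per row: {slope:.4f}, fixed overhead: {intercept:.2f}")
```

The summary gives a quick sense of how a metric is distributed, while the fitted line estimates how one quantity (here, load time) changes with another (batch size).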
Like data scientists, data engineers must keep their soft skills polished. In any given company, many stakeholders depend on data (and the analysis of that data) to carry out their jobs. Many of those stakeholders don’t have in-depth knowledge of the technologies that make data warehousing and analysis possible. As a result, data engineers need to know how to explain core concepts and results in ways that colleagues without technical backgrounds will understand.