How to Become a Data Engineer

As more companies rely on data analysis to drive their strategies, data engineers have become more important than ever. Data engineers are tasked with constructing and maintaining repositories for data, such as customer-information databases; their work allows data scientists and data analysts to effectively do their jobs.

Data engineers must possess key skills such as programming, data modeling and knowledge of algorithms. Once they’ve mastered those core concepts, they can build systems for collecting, managing and converting raw data into usable information for interpretation by analysts. While a bachelor’s degree in STEM is a good start, a data engineer must also understand development tools, the intricacies of SQL query optimization, and “Big Data” technologies such as Scala and Apache Hadoop.

If you manage to pull that off, though, becoming a data engineer can translate into a fulfilling (and lucrative!) career. Plus, you get to help companies of all sizes manage their biggest data-related challenges.

Start with Cloud Certifications

Jon Osborn, currently field CTO of Ascend.io, spent a great deal of time as a data engineer prior to his current role. However, his career journey didn’t begin with data engineering. “I started my software career working on embedded software and then front-end applications,” he says.

As his skills and architecture knowledge grew, he gradually moved toward back-end API development where he could serve more customers. “I love challenges, so the final step was to understand how data ebbs and flows through an organization, embracing data challenges, and learning yet again more skills,” he says.

Osborn says that if he were a data engineer starting out right now, he would get a basic cloud certification (AWS, GCP or Azure), learn SQL and Python, and seek at least a basic certification in the data platforms that most interested him (Databricks, Snowflake or BigQuery, among others).

He believes the hardest data skill to learn, yet the most valuable, is knowing how an underlying SQL query optimizer works. “Troubleshooting problems and understanding how to improve performance is rooted in how the optimizer is choosing to execute a particular request,” he explains. “Understanding these details can inform early architectural decisions that avoid future problems.”
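To make Osborn’s point concrete, here is a minimal sketch of how you might ask a database what its optimizer plans to do. It uses SQLite (through Python’s built-in sqlite3 module) only as a convenient stand-in; the platforms named in this article expose their own plan-inspection tools, and their output looks different. The customers table, the idx_customers_email index and the query_plan helper are hypothetical names invented for this illustration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable steps SQLite's optimizer chose for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one is the text description.
    return [row[3] for row in rows]


# Hypothetical table with made-up rows, held in memory for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", f"city{i % 50}") for i in range(1000)],
)

sql = "SELECT id FROM customers WHERE email = 'user42@example.com'"

# Without an index on email, the optimizer has to scan the whole table.
plan_before = query_plan(conn, sql)

# With an index available, the optimizer can search it instead.
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_after = query_plan(conn, sql)

print("before index:", plan_before)
print("after index: ", plan_after)
```

The exact wording of each plan varies by SQLite version, but the first should describe a scan of the table and the second a search that uses the index. Reading that difference before a table grows large is the kind of detail that can inform the early architectural decisions Osborn describes.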

From Osborn’s perspective, learning the most important technical skills results in direct, valuable business outcomes.

Starting in Civil Engineering Before Moving to Data

Chris Hurst, vice president of value engineering at OnSolve, explains that he had a non-traditional path to data engineering. He started his career as a civil engineer, beginning with undergraduate training at West Point. He then spent five years as an Army Diver/Engineer officer, followed by four years as a civilian in Iraq and Afghanistan.

“There, I led infrastructure planning teams developing and repairing systems for water, power, airfields, roads,” he says. “This experience felt deeply meaningful—but over those years, I wanted to understand better how the infrastructure engineering efforts I was involved with contributed to the broader mission of the U.S. improving stability.”

He points to the Army framework Measures of Effectiveness (MOE), which asks, “If we accomplish the tasks we said we would, will we achieve the outcomes we want to achieve?” In practice, applying that framework was muddled: What do you measure to know if you’re making a difference?

“I realized I needed more training,” he says. “This ultimately encouraged me to apply to Harvard’s joint degree program and after that, to start a data science company.” He completed the Harvard Kennedy School of Government’s MPA/ID program, an economics-centered, multidisciplinary program integrating training in analytical and quantitative methods with an emphasis on policy and outcomes informed by global perspectives.

Choosing from Two Paths to Start a Data Career

Hurst recommends one of two paths for those looking to start a career in data. The first path was the one he followed: deep exposure to a meaningful problem space, leading to meaningful questions (for him: “Is a place more stable? If so, why?”).

“In this phase, I would recommend developing a deeper understanding of theories of change, how practitioners and constituents fill roles within an ecosystem, and why practitioners and systems develop the narratives that drive them,” he says.

Ask hard questions about evidence, gaps in data, and outcomes. “With this framework in hand, next, develop quantitative skills,” he suggests.

The second path would be the opposite: first, focus on data analysis itself, initially on quantitative skillsets. “In this path, one would become an expert at statistical analysis and software engineering, and then apply this expertise to the problems one finds most meaningful,” he says.

A skill more engineers need to learn is how to write software at scale, Osborn adds: “Building systems from scratch that massively scale without friction is a highly desirable talent that is surprisingly hard to find.”

Understanding the Interplay of Data and Narrative

Hurst says that, beyond fundamental skills like statistical and quantitative methods, it’s key to understand the important interplay of narrative and data. Data refers to “evidence supporting or contradicting a theory of change,” and narrative refers to people (especially leaders) creating strong mental frameworks about the world around them and their domain. 

“We make decisions with our animal brains,” he says. “For example, practitioners of defense and development often rely first on stories that drove their success or failures, anecdotes of change.” Understanding how decisions are made at the organizational or policy level, along with the human emotions and processes behind them, is fundamental to success in working with data.

Take More CS Classes, Grow Leadership Skills

For those who want to grow their career in data engineering, Hurst says two pieces of advice come to mind, the first somewhat broad. “In the late 1990s, computer science was required at West Point—but only one semester,” he says. “In retrospect, more CS classes would have been helpful.”

Second: wherever possible, push your boundaries to lead a team. “Whatever direction your career takes, the experience of bringing people together to solve problems will be extremely helpful and ultimately attack big problems,” he says. “My teammates have made all the difference for me.”

Osborn says if you are passionate about data, there is a company out there that wants to hire you. “Your first job won’t be your last, so pick a current data platform and get started,” he says. “Don’t worry too much about which platform is ‘the best’ because that will change and won’t be under your control.”

Osborn has two other pieces of advice. First, learn from the talented people around you: “Pay attention to what they care about, and, most importantly, understand why they do what they do.”

Second, it’s important to be open to solving problems outside of your comfort zone. “You will put many tools in your toolbox as you master skills moving from one project to another,” he says. “Be open to learning about a tool someone hands you that you’ve never seen before. It might be the best opportunity of your career.”