Big Data Engineer: Must-Ask Interview Questions

Prepping to interview a candidate for your Big Data Engineer position, but unsure of what questions to ask to ensure you’re finding someone who can bring that “extra something” to your company? We’ve got what you need. Use these sample questions to help you determine which candidate is more than merely skilled and experienced in the basics: someone who will bring the acumen that can help elevate your company and support its needs above and beyond the usual. Not only will that make the interview process easier, it will also help you uncover the tech professionals who are deep thinkers, high performers, and all-around true standouts.


Interviewing for another position? Check out Dice’s library of interview questions.


Question: How do you use big data to enhance or support an increase in business revenue?

Why you should ask: Having the technical know-how associated with a Big Data Engineer position is a big part of the puzzle, but a deeper understanding of how and why to use it to your company’s advantage shows a business acumen that can help your company advance, and ultimately, that’s what you want. When you ask your candidate this question, you’re gauging not only their skill set, but also their understanding of this area of the tech industry as a whole, and how they can best leverage it to help your company reach new heights.

An answer you’d hear from a standout candidate: First, big data can help us understand where we are as a company in the overall landscape, and also in comparison to our nearest competitors. Both of these standpoints can help Big Data Engineers understand strengths and weaknesses, and amend our approach accordingly. Second, big data analytics can help us get to the micro level of those strengths and weaknesses, so we can see precisely how to customize our approach to our target audiences and get even more detailed in our offerings by understanding more about their needs. Gaining a more intimate view of our data, both internally and as part of the bigger picture, can help us make changes to our plans going forward, which can ultimately help us increase our revenue, in some cases to a significant degree.


Question: What is your approach to data preparation, and how do you turn unstructured data into structured data?

Why you should ask: As they say, there’s more than one way to skin a cat, and there’s more than one way to handle data preparation. When you present this question to a Big Data Engineer candidate, you’re definitely looking for a window into their process, especially if they’ll be handling a lot of data. But you should also listen for their ability to think analytically, assess whether or not they’re security-minded, gain further insight into their understanding of the technical aspects of the process, and judge whether or not they’re able to work efficiently.

An answer you’d hear from a standout candidate: I like to start with company governance as my guide, so I ensure that I’m compliant with company policy and mandates. Then, I work with the raw material and determine what I need to meet the request being asked of me, identify the sources I need to get the data from, and make sure what I need will be available as needed to complete the project. Once I’ve extracted what I need, I work on examining the data to create a solid profile, working on small sets to minimize errors and make it easier to identify data types.

When necessary, I’ll create a graph of key findings to make them easier for the audience to digest, then thoroughly clean and filter the results to ensure that what I have, and what I ultimately present to the appropriate business owner, is fully vetted, so the information is reliable and offers the clearest picture for modeling purposes. As for turning unstructured data into structured data, I’ve had to transform data in previous positions, and I’ve had good experience using Hadoop to organize datasets in a more defined manner so they can be properly analyzed.
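To make the steps in that answer concrete (profile a small sample, clean and filter, and give unstructured text a structure), here is a minimal Python sketch. The log-line format, the field names, and the helpers `parse_line`, `profile`, and `clean` are all hypothetical, chosen purely for illustration; a real job at scale would more likely run on Hadoop or a similar framework rather than in a single script.

```python
import re
from collections import Counter

# Hypothetical unstructured input: free-text log lines (format assumed for illustration).
RAW_LINES = [
    "2023-01-05 user=alice action=purchase amount=19.99",
    "2023-01-05 user=bob action=view",
    "garbage line with no fields",
    "2023-01-06 user=alice action=purchase amount=5.00",
    "2023-01-06 user=alice action=purchase amount=5.00",  # exact duplicate
]

# A line is usable only if it starts with an ISO-style date.
LINE_RE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<rest>.*)$")


def parse_line(line):
    """Turn one raw line into a dict of fields, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = {"date": match.group("date")}
    for pair in match.group("rest").split():
        if "=" in pair:
            key, value = pair.split("=", 1)
            record[key] = value
    return record


def profile(records):
    """Count how often each field appears, to spot sparse or missing fields."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts)


def clean(records, required=("date", "user", "action")):
    """Drop records missing a required field and exact duplicates, keeping order."""
    seen = set()
    kept = []
    for record in records:
        if any(field not in record for field in required):
            continue
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(record)
    return kept


parsed = [r for r in (parse_line(line) for line in RAW_LINES) if r is not None]
field_counts = profile(parsed)   # profile the small sample first
structured = clean(parsed)       # then clean and filter
```

Profiling before cleaning is a deliberate choice here: the field counts show that `amount` is missing from some records, which is the kind of finding that shapes what the cleaning step should require.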


Question: How are Hadoop and big data related?

Why you should ask: You’ve already opened the door with the previous question by asking the Big Data Engineer candidate how they transform unstructured data into structured data, so now’s the time to delve deeper into the more involved technicalities of their job. And while having an understanding of Hadoop in conjunction with data doesn’t necessarily show process, it does show proficiency and a deeper understanding of data and data management—which is important when working with a high volume of content across multiple departments.

An answer you’d hear from a standout candidate: To me, they’re one and the same. You can’t really have one without the other, because Hadoop is such a powerful framework for data analysis. I’ve found it invaluable for dealing with structured, unstructured and semi-structured data, and I don’t believe any company dealing with big data can properly function without this powerful tool in their corner. It’s also an exceptional tool for collecting, storing and processing data. As a Big Data Engineer, I find the best hardware configuration for using Hadoop relies on dual-processor or dual-core machines with four to eight GB of RAM and ECC memory, but we can always vary the configuration and customize the hardware to our needs depending on our projects and workflow.


Question: How do you deploy a big data solution?

Why you should ask: This is all about understanding a Big Data Engineer's process capabilities, full stop. Part of the question relates to their technical acumen, of course, but by and large, your goal in asking this question is to determine how much your candidate understands about big data and managing it in a variety of ways by boiling down everything to a step-by-step process related to one specific goal. Plus, you’ll be able to see exactly how much experience they have in this kind of data management, and the depth of their skill set.

An answer you’d hear from a standout candidate: I start by extracting data from a designated source, such as a CRM, SAP, databases like MySQL, social media feeds, or whatever source is required to support the needs of the business owner or department making the request. I use either batch jobs or real-time streaming to ingest it, then store the extracted data in HDFS or a NoSQL database such as HBase, depending on our needs. Then I process the data through a framework like Spark or MapReduce, and ensure it’s where we need it to be for the project we’re using it for.
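The ingest, store, and process stages described in that answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV text stands in for a CRM export, a dictionary of partitions stands in for HDFS or HBase, and the function names `ingest`, `store`, and `process` are hypothetical. A real deployment would use the actual source connectors and a framework such as Spark or MapReduce.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CRM export; in practice this would come from a real source system.
CRM_EXPORT = """customer,region,amount
alice,east,20
bob,west,15
carol,east,5
"""


def ingest(text):
    """Batch ingest: read CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def store(records, key):
    """Group records by a partition key; stands in for a write to HDFS or HBase."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return dict(partitions)


def process(partitions):
    """Map each record to its amount, then reduce to a total per partition."""
    return {
        name: sum(float(record["amount"]) for record in rows)
        for name, rows in partitions.items()
    }


records = ingest(CRM_EXPORT)
partitions = store(records, key="region")
totals = process(partitions)
print(totals)  # {'east': 25.0, 'west': 15.0}
```

Keeping the three stages as separate functions mirrors the answer: each stage can be swapped out (streaming instead of batch ingestion, a different store, a different processing framework) without touching the others.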


Question: Have you worked with big data in a cloud computing environment?

Why you should ask: As more and more companies move their systems and services to the cloud, IT professionals need to be prepared for the possibility that every aspect of the business will exist in virtual form. They’ll need to know how to work with it and what challenges it may present to a company, particularly when dealing with data. When you ask your Big Data Engineer candidate this question, you’ll be gauging their understanding of cloud computing capabilities and industry trends, and of how both can affect the future of the company’s data.

An answer you’d hear from a standout candidate: Yes, I have experience primarily with AWS and Azure, and I find the cloud is a solid solution from a production, development and testing standpoint. I like that it offers the flexibility to scale up an environment as needed, and it’s a great option for hosting backups in case of emergency. Plus, it allows me as a Big Data Engineer the freedom and flexibility to access the data we need from anywhere, with the appropriate security precautions in place, of course, and using any appropriate platform for access.