Interview Questions for Hadoop Developers

Hadoop is an open-source distributed software framework that enables programmers to run an enormous number of nodes handling terabytes of data. One of its most significant abilities is allowing a system to continue operating even if a significant number of nodes fail. Because Hadoop is still maturing, hiring managers and recruiters are finding few Hadoop specialists out there. Consequently, many of those hired for Hadoop-heavy jobs are people who can demonstrate that they learn it quickly and are familiar with similar Big Data tools. How do you convince a hiring manager you’re one of those people? We asked Eric Sammer, Engineering Manager at software provider Cloudera, to share a few common interview questions and the types of answers he’s looking for.

Explain how Hadoop differs from a traditional database.

  • What Most People Say:  Hadoop is not a database.
  • What You Should Say:  Describe authoritatively how Hadoop processes data in large batches.
  • Why You Should Say It: “Being able to define those more fundamental uses demonstrates more complete knowledge,” says Sammer.
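
To make large-batch processing concrete, here is a minimal sketch of the MapReduce model that Hadoop popularized, written in plain Java with no Hadoop dependency. The class and method names are hypothetical; a real Hadoop job would subclass Hadoop's Mapper and Reducer classes and run distributed across a cluster. What the sketch shows is the shape of the work: every record is scanned in one pass, grouped by key, and aggregated, rather than looked up or updated row by row as in a traditional database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce batch model; it does NOT use the Hadoop API.
// All names here are hypothetical, chosen for illustration.
class BatchWordCount {

    // Map phase: turn each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, in sorted key order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: collapse each key's values into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the whole batch: every record is scanned; nothing is indexed or updated in place.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "The lazy dog", "the end")));
        // prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

The three phases are kept as separate methods so each maps onto one stage of a MapReduce job; on a cluster, the map and reduce calls would run in parallel on different nodes.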

Give an example of a case where using Hadoop makes sense, and a case where using Hadoop is the way to solve a particular problem.

  • What Most People Say: It’s useful for offline batch operations.
  • What You Should Say:  Hadoop is useful in cases where you’re processing large quantities of hourly data and transforming data en masse.
  • Why You Should Say It: “This answer shows you thoroughly understand it,” says Sammer. Simply saying it’s useful for offline batch operations, while not incorrect, is a superficial answer, he says.
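
The hourly-data case can be sketched the same way. Below is a minimal, single-machine illustration that assumes a hypothetical LogEvent record (a timestamp plus bytes sent): one batch of raw events is bucketed by hour and summed. In a Hadoop job, the hour bucket would be the map output key and the summation the reduce step, spread across many nodes; here Java streams stand in for that machinery.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical record type: one log event (timestamp + bytes sent).
record LogEvent(Instant timestamp, long bytes) {}

class HourlyRollup {

    // Transform a whole batch of raw events into per-hour totals in one pass:
    // each event is keyed by its hour bucket, then totals are summed per bucket.
    static Map<Instant, Long> bytesPerHour(List<LogEvent> events) {
        return events.stream()
                .collect(Collectors.groupingBy(
                        e -> e.timestamp().truncatedTo(ChronoUnit.HOURS), // hour bucket key
                        TreeMap::new,                                     // keep hours in order
                        Collectors.summingLong(LogEvent::bytes)));        // total bytes per hour
    }

    public static void main(String[] args) {
        List<LogEvent> batch = List.of(
                new LogEvent(Instant.parse("2024-01-01T10:05:00Z"), 500),
                new LogEvent(Instant.parse("2024-01-01T10:59:59Z"), 250),
                new LogEvent(Instant.parse("2024-01-01T11:00:00Z"), 100));
        System.out.println(bytesPerHour(batch));
        // prints {2024-01-01T10:00:00Z=750, 2024-01-01T11:00:00Z=100}
    }
}
```

The sample events are invented for illustration. The design point is that the job never edits stored rows: it reads an entire hour's worth of raw data and writes out a new, smaller summary.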

What happens when you invoke a read function in the operating system? How does the operating system perform I/O?

  • What Most People Say: “I use the Java I/O system and the data gets returned from the disk.”
  • What You Should Say:  Walk through each stage of what the operating system does, so that you can describe the entire I/O pipeline.
  • Why You Should Say It: “Hadoop is a low-level system for processing data,” Sammer explains. “Candidates who are more familiar with operating system fundamentals can handle Hadoop at a much deeper level. That’s the kind of person we’re looking for. This question is one of the earliest indicators about whether you know how computers really work.”
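
To connect the Java-level answer to what the operating system does underneath, here is a small illustrative sketch; the ReadCallCounter class is hypothetical. It scans a file with different buffer sizes and counts the read() calls. On typical Unix JVMs, each FileInputStream.read(byte[]) call results in one read(2) system call: the process switches into the kernel, which serves the bytes from its page cache (going to the disk only on a cache miss) and copies them into the user buffer. That mapping is a JVM and OS implementation detail, so treat the counts as approximate.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: counts how many read() calls it takes to scan a file with a
// given buffer size. On typical Unix JVMs each call below becomes one read(2)
// system call, i.e. one user-to-kernel transition (an assumption, not a guarantee).
class ReadCallCounter {

    static long countReads(Path file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long calls = 0;
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            int n;
            do {
                n = in.read(buffer); // kernel copies up to bufferSize bytes into user space
                calls++;
            } while (n != -1);       // -1 signals end of file (this final call is counted too)
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("io-demo", ".bin");
        Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros
        for (int size : new int[] {512, 4096, 65536}) {
            System.out.println(size + "-byte buffer: " + countReads(tmp, size) + " read calls");
        }
        Files.delete(tmp);
    }
}
```

With a 1 MiB file, the 512-byte buffer needs at least 2,049 calls (2,048 data reads plus the final call that returns -1), while the 64 KiB buffer needs far fewer. Fewer, larger reads mean fewer transitions into the kernel for the same data.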

If you only had 32 megabytes of memory, how would you sort one terabyte of data?

  • What Most People Say: “I don’t know.” (In fact, most candidates either get it right or don’t.)
  • What You Should Say: “Take a smaller chunk of data and sort it in memory, so you partition it into lots of little data sets. Then merge those sorted lists into one big list before writing the results back to disk.”
  • Why You Should Say It: “Any candidate who does Hadoop or knows it at a deep level will be able to understand the depth of what Hadoop does,” Sammer believes. “It’s a great qualifying question. It demonstrates an understanding of how you manage data at that scale.”
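
The answer above describes an external merge sort. Below is a minimal single-machine sketch of it in Java, under simplifying assumptions: values are longs stored one per line, the memory budget (maxInMemory) is counted in items rather than bytes, and the ExternalSort class and its methods are hypothetical names. Each in-memory chunk is sorted and spilled to disk as a run, and the runs are then merged with a priority queue, so only one value per run is held in memory during the merge. Hadoop MapReduce applies a similar sort, spill, and merge pattern to intermediate map output, though its implementation differs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Minimal external merge sort sketch. maxInMemory stands in for the 32 MB budget
// (counted in items for simplicity); the input iterator stands in for the 1 TB file.
class ExternalSort {

    // Phase 1: fill memory, sort the chunk, spill it to disk as a sorted run; repeat.
    static List<Path> createSortedRuns(Iterator<Long> input, int maxInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<Long> chunk = new ArrayList<>(maxInMemory);
        while (input.hasNext()) {
            chunk.add(input.next());
            if (chunk.size() == maxInMemory || !input.hasNext()) {
                Collections.sort(chunk);
                Path run = Files.createTempFile("run", ".txt");
                try (BufferedWriter w = Files.newBufferedWriter(run)) {
                    for (long v : chunk) {
                        w.write(Long.toString(v));
                        w.newLine();
                    }
                }
                runs.add(run);
                chunk.clear();
            }
        }
        return runs;
    }

    // Heap entry: the smallest unread value of one run, plus that run's reader.
    private record Head(long value, BufferedReader reader) {}

    // Phase 2: k-way merge; only one value per run is held in memory at a time.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> Long.compare(a.value(), b.value()));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path run : runs) {
                BufferedReader r = Files.newBufferedReader(run);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(Long.parseLong(first), r));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll(); // globally smallest remaining value
                out.write(Long.toString(smallest.value()));
                out.newLine();
                String next = smallest.reader().readLine();
                if (next != null) heap.add(new Head(Long.parseLong(next), smallest.reader()));
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    static void sort(Iterator<Long> input, int maxInMemory, Path output) throws IOException {
        mergeRuns(createSortedRuns(input, maxInMemory), output);
    }
}
```

One caveat at real scale: one terabyte split into 32 MB runs gives roughly 32,000 runs, and keeping a buffered reader open for each would by itself exceed the 32 MB budget. A practical version would therefore merge in several passes, combining a bounded number of runs at a time.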

Have you ever participated in open source in any way?

  • What Most People Say: “No,” or, “I’m familiar with open source.”
  • What You Should Say: “Here’s an example of a project I did for a previous employer with open source. I have also contributed code.”
  • Why You Should Say It: “Passion goes a long way,” says Sammer. “It gives us a high-level gauge of interest in what they do for a living. People who do that tend to be a much better fit for us. Generally I, as well as the rest of Cloudera, believe there are a lot of ways to participate. You can contribute code, devote time by answering questions or write documentation. It’s so impressive to see in a candidate.”