Big Data architects design the domain where the data resides, but they don’t operate in a vacuum. Whether they use Hadoop, Storm, NoSQL or MapReduce, these versatile builders must consider the vision and needs of data scientists and analysts when creating their technical blueprint.
“Big data architects need to demonstrate their flexibility when answering technical questions,” says R. Emmett O’Ryan, the Dice Big Data Talent Community guide. “Each platform design has pros and cons, and a top architect will consider those and the needs of stakeholders when designing a domain.”
We asked O’Ryan, who interviews a number of candidates for Big Data jobs, to share a few of the technical questions he asks, and the answers he’s looking for.
How would you architect the system to handle the ingestion of both real-time and periodic data?
- What Most People Say: I’d use Hadoop for both. Oh, you say Hadoop can’t ingest real-time data? Then I’d use an add-on like Impala for that.
- What You Should Say: I’d use Hadoop for periodic data and Storm to ingest real-time data.
- Why You Should Say It: “Hadoop is the wrong answer,” O’Ryan says. “And while Impala can process real-time queries, it can’t handle real-time data ingestion.”
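To make the split concrete, here is a minimal Python sketch of the routing idea behind that answer: real-time records go straight to a streaming path, while periodic records are buffered and handed off in batches. It is only an illustration under stated assumptions. The names `Record`, `IngestionRouter`, `stream_handler` and `batch_handler` are hypothetical, and the handlers stand in for a stream processor such as Storm and a batch store such as Hadoop rather than calling either system's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Record:
    source: str
    payload: dict
    realtime: bool  # hypothetical flag: True for streaming data, False for periodic loads


@dataclass
class IngestionRouter:
    """Routes each record to a streaming path or a batch path (illustrative only)."""
    batch_size: int = 3
    # Stand-ins for a stream processor and a batch loader; both are assumptions.
    stream_handler: Callable[[Record], None] = print
    batch_handler: Callable[[List[Record]], None] = print
    _buffer: List[Record] = field(default_factory=list, init=False)

    def ingest(self, record: Record) -> None:
        if record.realtime:
            # Real-time data is handed off immediately, one record at a time.
            self.stream_handler(record)
        else:
            # Periodic data accumulates until a full batch is ready.
            self._buffer.append(record)
            if len(self._buffer) >= self.batch_size:
                self.flush()

    def flush(self) -> None:
        """Hand any buffered periodic records to the batch path."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.batch_handler(batch)
```

In an interview, the point of a sketch like this is the separation of the two paths, not the specific handlers: either side can be swapped out as stakeholder needs change.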
What tools would you put in place to allow data scientists and analysts to perform visualization?
- What Most People Say: Well, I’d do something with Python or Ruby for visualizing the data.
- What You Should Say: I’d look for an open source solution or a product that offers a large data library in order to create a variety of data visualizations. For instance, I might consider Pentaho, one of the Tableau products, or R, which has more than 5,000 libraries.
- Why You Should Say It: “Technically, you can use Python or Ruby to achieve visualization but they don’t have large libraries,” O’Ryan explains. “Even if you aren’t an R advocate, your answer should demonstrate your willingness to consider a range of alternatives in order to make the system as flexible as possible with the data scientist in mind.”
What’s the best way to protect data at rest?
- What Most People Say: I would encrypt sensitive data or put guards around it. What’s that? Oh, you’re wondering if encrypting data at rest would affect the data flow. Um, I’m not sure.
- What You Should Say: The best way to protect data at rest is to use a NoSQL database like Apache Accumulo, which supports cell-level access control, or a relational database like Oracle or any other database that provides access control.
- Why You Should Say It: An experienced architect will consider the type of data they’re trying to protect when designing the structure, notes O’Ryan. For instance, they’ll consider how the data is used, stored and processed in order to eliminate bottlenecks. Volume, variety and velocity are the three defining dimensions of Big Data that architects must consider throughout the design process.
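As a rough illustration of what database-level access control can look like, the Python sketch below filters stored cells by security labels, loosely modeled on the idea of attaching visibility labels to each cell. It is a simplified assumption, not Accumulo's or Oracle's actual API: `Cell`, `required_labels` and `visible_cells` are hypothetical names, and a cell is treated as readable only when the reader holds every label it requires.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    # Hypothetical: labels a reader must hold to see this cell (empty means public).
    required_labels: FrozenSet[str] = frozenset()


def visible_cells(cells: Iterable[Cell], authorizations: Iterable[str]) -> List[Cell]:
    """Return only the cells whose required labels are all held by the reader."""
    auths = frozenset(authorizations)
    return [cell for cell in cells if cell.required_labels <= auths]
```

Because the check runs per cell at read time, sensitive and non-sensitive values can sit in the same table, which is one way an architect might protect data without adding a separate store to the data flow.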
“The successful candidate will showcase their cross-functional knowledge, communication skills and ability to serve as a consultant when answering technical questions,” O’Ryan says. “Most importantly, experienced architects view flexibility as a way to survive in a world where there are few absolutes and many shades of gray. Dissecting the problem and offering several solutions is a great way to highlight your flexibility during an interview.”