Big Data’s Job Market

By R. Emmett O’Ryan

The market for Big Data is expected to surpass $100 billion, according to The Economist, as more and more data and data sources come online each year. This means that employment opportunities in the area will increase for those with the right skill sets or backgrounds.

So what are these skill sets?

Many of the skills needed to support Big Data environments are based on information management and statistical analysis, including the use of techniques such as:

  • Association Rule Learning
  • Classification
  • Cluster Analysis
  • Crowd Sourcing
  • Data Fusion and Integration
  • Ensemble Learning
  • Genetic Algorithms
  • Machine Learning
  • Natural Language Processing
  • Neural Networks
  • Pattern Recognition
  • Predictive Modelling
  • Regression
  • Sentiment Analysis
  • Signal Processing
  • Supervised and Unsupervised Learning
  • Simulation
  • Time Series Analysis
  • Visualization

From an infrastructure standpoint, those with LAMP skills (Linux, Apache HTTP Server, MySQL and PHP, or sometimes Perl or Python) are highly desirable. So are UNIX/Linux systems administrators, systems programmers, and developers with backgrounds in managing, supporting and developing the technologies being applied. These include distributed computing or HPC systems, massively parallel-processing (MPP) databases, search-based applications, data-mining grids, grid computing, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.

Finally, the role of Data Scientist is becoming ever more important. These are people who create the tools that can be used to interpret the information and analytics in a Big Data environment. As for the skills necessary for this position, consider the ideal individual: He or she would have a passion for computing like that of a hacker, expertise in math or statistics, a background in data mining, and the creativity and insight to see what all the tools can do, how to put them together and how to represent the answers they provide.