If there is anything I have enjoyed about being in software, it is the inexorable change and the emergence of new technologies and opportunities. I am often heard saying, "I love living in the future." Just looking at the past few years, the advent of virtualization and cloud computing has opened up new job types, areas of expertise, and ways to make money or find satisfying employment.
One area in particular has emerged recently, a field of mathematics reborn with a revised focus. It is an area that requires skills in data transformation, advanced mathematics, programming, cloud, and several other disciplines. As an added attraction, proper application of this technology can truly appear like magic. The field of which I speak is the emerging area of machine learning.
Machine learning encompasses so many areas because there are so many things one must do in order to create a successful outcome. Data, the raw goods for a successful machine-learning deployment, is once again at the center of the action. Over the past few years, the meteoric rise of Hadoop (MapReduce), NoSQL databases and ETL companies (extract/transform/load) has resulted from organizations realizing that the massive amounts of data they had been data-warehousing just might be useful for determining buying patterns, reducing field service time, maintaining SLAs and so on. Companies such as Tableau, along with tools from Microsoft (including Power BI), have made headlines as consumers of large data sets use them to finally make sense out of the oceans of data on hand.
The early tools we have now will pale in comparison to the next step of data evolution. Machine learning promises to offer significant advantages to businesses willing to invest the time and effort to create tangible results. Today, that investment can be sizable, but the cost to get there is already declining as new companies and technologies enter the space. With a decreasing number of steps from data to insight, a broader set of organizations will seek qualified computer and data scientists who can make magic out of data.
Machine learning projects go through several stages as a precursor to success. I've already mentioned the pure data aspects required (ETL and data jockeying), which means employment and startup opportunities in those areas will be many and varied. It's the next step, an evolution of mathematics, that offers more hybrid computing opportunities. If you haven't heard the term "data scientist" before, you most likely will soon. Data scientists have the background to take reams of data, run them through a set of algorithms, and emit predictions that appear almost magical. But to understand the magic, one must understand the kinds of things being created by machine learning.
In essence, there are two general types of machine-learning outcomes: one is termed supervised and the other unsupervised. Boiled down to basics, supervised algorithms predict things based on historical data with known outcomes. If you have a dataset of historical records describing how cars were repaired, by make, model, year, and symptom, then you might be able to predict what ails a car with reasonable accuracy. In other words, the known historical data, having been run through a machine-learning algorithm, yields a correlated set of variables that can predict a repair outcome. The approach is called "supervised" because a set of data with known outcomes is used to predict unknown outcomes from a similar set of inputs.
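To make the car-repair idea concrete, here is a minimal sketch of supervised prediction using a k-nearest-neighbour classifier written from scratch. The makes, years, symptoms, and repairs are invented for illustration; a real deployment would use a far larger dataset and a proper learning library.

```python
from collections import Counter

# Hypothetical historical repair records: (make, year, symptom) -> known repair.
history = [
    (("Ford", 2012, "won't start"), "replace battery"),
    (("Ford", 2014, "won't start"), "replace battery"),
    (("Toyota", 2013, "squealing brakes"), "replace brake pads"),
    (("Toyota", 2015, "squealing brakes"), "replace brake pads"),
    (("Honda", 2011, "overheating"), "replace coolant pump"),
]

def distance(a, b):
    # Similarity between two cars: count the fields that differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def predict(car, k=3):
    # Find the k most similar historical records and vote on the repair.
    nearest = sorted(history, key=lambda rec: distance(rec[0], car))[:k]
    votes = Counter(repair for _, repair in nearest)
    return votes.most_common(1)[0][0]

print(predict(("Ford", 2013, "won't start")))  # -> replace battery
```

The "learning" here is trivial (the model is just the stored history), but it captures the supervised pattern: known input/outcome pairs in, a prediction for a new input out.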
Unsupervised machine-learning algorithms take existing data and generate insights previously unknown. An example of unsupervised learning might be to take a set of companies and run them through a clustering algorithm (a type of machine-learning technique) that intuits how those companies are similar.
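The company-clustering idea can be sketched with a tiny from-scratch k-means implementation. The company names and the (revenue, headcount) figures below are invented; no labels are supplied, and the algorithm discovers the grouping on its own.

```python
# Hypothetical companies described by (revenue in $M, employee count).
companies = {
    "AcmeSoft":  (2.0, 50),
    "ByteWorks": (2.5, 60),
    "MegaCorp":  (900.0, 12000),
    "GlobalInc": (850.0, 11000),
}

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    # Component-wise mean of a list of points.
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

def kmeans(points, k, iters=10):
    # Deterministic init for the sketch: first k points become centroids.
    centroids = points[:k]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

names = list(companies)
points = [companies[n] for n in names]
clusters = kmeans(points, k=2)
for group in clusters:
    print(sorted(n for n in names if companies[n] in group))
```

With these numbers, the two small startups fall into one cluster and the two large enterprises into the other, without anyone telling the algorithm which companies "belong" together.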
The last area in machine learning touches on traditional development. Making machine-learning deployments accessible requires bringing all the pieces together into an understandable, scalable, and consumable offering. Today, startups such as GraphLab, as well as cloud providers such as Microsoft (with Azure ML) and Google, bring those tools together such that mere mortals can use them, provided they have DevOps, cloud, data-scaling and deployment expertise, and, of course, software development skills.
The best part is, this field is still as nascent as the realization that the transformation of held data into actionable information is worth billions of dollars. As tools become more enhanced and case studies emerge that describe real success stories, cost savings and efficiencies, the demand will continue to rise. The question is, who will jump on this next-wave opportunity? How soon will we see the next round of startups and funding, as “pick and shovel” technologies emerge that allow the industry to take full advantage of this transformative technology?
JD Marymee is a technology evangelist with over 25 years' experience, including startup launches and executive roles at large companies such as Novell and Microsoft. JD is currently a venture partner with Pelion Venture Partners, an architect evangelist for Microsoft, and co-founder of the technology incubator Technology Innovations Group LLC.