Milk May Speed Up Big Data Management
Everybody wants faster code. But squeezing out even a little more speed is often an exercise in frustration, especially when dealing with older programming languages. In the quest for speed, some organizations simply discard the old in favor of something entirely new. Swift, Apple's new language for building iOS and Mac OS X apps, is designed to be faster than its predecessor, Objective-C (although the size of its speed advantage is the subject of much debate). In the realm of Big Data, a handful of companies have released distributions and variations of Apache Hadoop, hoping to accelerate the processing of massive datasets.
Now there's Milk, a new language developed by MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) for the express purpose of speeding up data crunching through more efficient memory management. For developers who must wrestle with data points scattered across multiple datasets, algorithms written in Milk can reportedly run four times as fast as those written in existing languages.
Hunting through large datasets for a few data points is a woefully inefficient process. "It's as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton, and put it back in the fridge," Vladimir Kiriansky, first author of the paper describing Milk, said in a statement on MIT's news site.
Conventional chips assume locality: every time a core needs a data item, it fetches a whole block of neighboring data from main memory, much of which scattered-access programs never use. A core running a Milk program instead adds the item's address to a locally stored list. Once enough addresses have accumulated, they are grouped so that nearby items are fetched together, and each core requests only the data the software knows it needs. In broad terms, it's like going to the fridge just once for that milk carton.
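To make the "defer, then batch" idea concrete, here is a minimal, single-threaded Python sketch of the general access pattern described above. It is not Milk's syntax or implementation: the function names and the `block_size` parameter are hypothetical, and it skips the step where Milk pools address lists across cores. Python lists also won't show the cache speedup; the sketch only contrasts the two access orders.

```python
from collections import defaultdict


def naive_gather_sum(data, indices):
    """Touch memory in whatever order the indices arrive (scattered accesses)."""
    total = 0
    for i in indices:
        total += data[i]
    return total


def batched_gather_sum(data, indices, block_size=1024):
    """Hypothetical sketch: record needed addresses first, then visit them grouped by block.

    block_size stands in for a cache-line/page-sized region; it is an
    illustrative assumption, not a Milk parameter.
    """
    # Phase 1: defer the accesses; just remember which indices are needed,
    # bucketed by the memory block they fall in.
    pending = defaultdict(list)
    for i in indices:
        pending[i // block_size].append(i)

    # Phase 2: visit each block once, in address order, and service every
    # deferred request that falls inside it before moving on.
    total = 0
    for block in sorted(pending):
        for i in sorted(pending[block]):
            total += data[i]
    return total


if __name__ == "__main__":
    data = list(range(10_000))
    indices = [9_999, 3, 5_000, 4, 9_998, 3]
    print(naive_gather_sum(data, indices), batched_gather_sum(data, indices))
```

Both functions compute the same result; only the order in which memory is touched differs. The batched version handles the two requests near index 3 and the two near index 9,999 together instead of bouncing between distant regions, which is the kind of locality a real system would exploit.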
Milk also earns points for ease of use: developers can insert a few lines of code around any data-processing instruction, and the compiler will determine how to manage memory as efficiently as possible. Milk is still in early development, but will hopefully appear on GitHub and other code repositories soon. In coming years, datasets will only grow bigger, as will the need to mine them quickly for insight. Watch for more languages like Milk to come online to address the issue.