Some businesses want all the benefits of a top-shelf data analysis package, but lack the budget to purchase one from SAS Institute, MathWorks, or another established, proprietary vendor. However, analysts can still rely on open-source software and online-learning resources to bring data-mining capabilities into their organization. In fact, many are turning to R, Octave and Python with exactly this goal in mind.

Why Those Three?

When it comes to machine learning (the creation of algorithms that allow machines to recognize and react to patterns), matrix decomposition algorithms are critical. R, Octave and Python are flexible and easy to use for vectorization and matrix operations; they’re not just data-analysis packages, but also programming languages for creating one’s own functions or packages. For analysts who lack the time to engage in extensive coding, these open-source packages also offer some very handy built-in functions and toolboxes. For example, computing z-scores takes a single call in R (the built-in scale function) and in Octave (the zscore function); for Python, the function can be defined in a very straightforward manner:
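    import numpy as np

    def zscore(X):
        X = np.asarray(X, dtype=float)
        mu = X.mean()      # mean over all elements
        sigma = X.std()    # standard deviation with N in the denominator (NumPy's default)
        return (X - mu) / sigma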
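The same idea extends to whole matrices. The sketch below is only an illustration of the vectorization and matrix-decomposition point, assuming NumPy is installed; the helper name `standardize_columns` and the small data matrix are made up for this example.

```python
import numpy as np

def standardize_columns(X):
    """Column-wise z-scores using vectorized array operations (no explicit loops)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical data: 4 observations (rows) of 2 variables (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 25.0],
                 [3.0, 27.0],
                 [4.0, 40.0]])

Z = standardize_columns(data)

# Singular value decomposition, one common matrix decomposition.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# The three factors multiply back to the standardized matrix (up to rounding error).
reconstructed = U @ np.diag(s) @ Vt
```

After standardizing, each column of `Z` has mean 0 and standard deviation 1, and `reconstructed` matches `Z` to within floating-point error.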
If you want to use Markov chain Monte Carlo (MCMC) Bayesian estimation, R boasts MCMCpack, Octave users can turn to pmtk3, and Python has PyMC. All three options feature large and growing user communities (the R mailing lists, for example) that serve as vital hubs for sharing information and exchanging experiences.
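For readers curious about what MCMC estimation does under the hood, here is a minimal random-walk Metropolis sketch in plain NumPy. It does not use the MCMCpack, pmtk3 or PyMC interfaces; the coin-toss data, the flat prior, the step size and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def log_posterior(p, heads, tosses):
    """Log posterior (up to a constant) of a coin's heads probability p, flat prior."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf  # outside the valid range for a probability
    return heads * np.log(p) + (tosses - heads) * np.log(1.0 - p)

def metropolis(heads, tosses, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the heads probability."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    current = 0.5
    current_lp = log_posterior(current, heads, tosses)
    for i in range(n_samples):
        proposal = current + rng.normal(0.0, step)
        proposal_lp = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior ratio), compared on the log scale.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

# Hypothetical data: 70 heads in 100 tosses.
samples = metropolis(heads=70, tosses=100)
posterior_mean = samples[2000:].mean()  # drop the first 2000 draws as burn-in
```

With a flat prior, the exact posterior for this data is Beta(71, 31), whose mean is 71/102 (about 0.696), so the sampled mean should land close to that value. The dedicated packages named above handle proposal tuning, convergence diagnostics and far richer models.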

Which Software Package to Choose?

Can any one of these packages do more than the other two? The answer is probably no; the three have a lot of functionality in common. That being said, R is popular among statisticians thanks to its emphasis on statistical computing. Octave has a number of industry and academic applications, and engineers and analysts often utilize Python for building software platforms. It would definitely prove easier for someone who has worked with MATLAB to pick up Octave, as Octave is often described as the open-source “clone” of MATLAB. My suggestion is to try all three, and see which offering’s toolbox solves your specific problems.

Based on my own user experience and research, here is a high-level summary of the three: R’s strength is in statistical analysis. Octave is good for developing machine learning algorithms for numeric problems. Python is a general-purpose programming language that is strong in algorithm building for both numeric and text mining.

If you don’t have the time or the need to learn an entire programming language, an online universe of open-source software can provide you with multiple solutions for your specific needs. Take a little time to experiment and find the one that fits best. When searching for open-source solutions, it’s a good idea to search both for broad terms such as machine learning, data mining or artificial intelligence, and for specific implementations such as neural networks.

No matter what your skill level, open-source software may have a solution for you. It ranges from all-in-one solutions to code libraries for sophisticated users who want something more customized. So whether you’re looking to learn simple regression or robotic vision, open source may have an ideal solution for you.
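As a taste of the "simple regression" end of that range, the following sketch fits a straight line by ordinary least squares using only NumPy. The data points are invented for illustration, and this is one of several reasonable ways to do it rather than a recommended workflow.

```python
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find the coefficients minimizing ||A @ coeffs - y||.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

# Fitted values on the original x grid.
predicted = slope * x + intercept
```

For this data the fitted slope and intercept come out close to the 2 and 1 used to make up the points.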