What’s the Best Programming Language for Data Science?

If you want to break into data science, which programming languages should you learn?

That’s a complicated question. New data from Cloud Academy suggests that Python is becoming increasingly ubiquitous in data science, outpacing R, which many data scientists have relied on for their projects.

Specifically, Cloud Academy analyzed job postings for data engineers and found that 66 percent mentioned Python, while only 18 percent cited R. (Hat tip to TechRepublic for surfacing Cloud Academy’s data.)

Earlier this year, a KDnuggets poll of tech pros who use both R and Python showed a slow decline in R usage in favor of Python over the past two years. That echoed a separate survey from Burtch Works, which found that Python use among analytics professionals grew from 53 percent to 69 percent over the same period, while R’s user base shrank by nearly a third.

Python is known for its versatility and scalability, whereas R is much more specialized (and many tech pros complain that it doesn’t scale well). Although Python is clearly enjoying significant gains among data professionals, its rise doesn’t mean that R is headed for the dustbin of tech history. In fact, if you’re interested in data science as a profession, learning both is the best move you could make.

“Combining R and Python is both reasonable and feasible,” Enriko Aryanto, CTO and co-founder of QuanticMind, a Redwood City, Calif.-based data platform for intelligent marketing, told Dice earlier this year. “We run them both in our data science platform internally. But if I were starting my career all over again today, I might consider focusing on Python rather than R. It’s a more general language with broader applications.”

Although R doesn’t have the same audience as Python, it remains firmly entrenched in existing initiatives, particularly academic ones. But for tech pros who already know Python (or are learning it) and want to learn more about data science, the more widely used language can provide an entry point. (Meanwhile, the job market for Python developers remains extremely healthy.)

2 Responses to “What’s the Best Programming Language for Data Science?”

  1. Steven L Scott

    The thing about R is that it was built from the ground up for data analysis. Python was built to be a general-purpose programming language, so the data analysis bits feel bolted on, and in many cases amateurish. There is one standard way to fit a linear model in R, but there are at least six libraries for handling linear models in Python, none of which is as good as the base R solution. Python has much to recommend it when it comes to tool building and general-purpose data munging, but as a statistics package it is clearly inferior to R.