R Risks Python Swallowing It Whole: TIOBE

Is the R programming language in serious trouble?

According to the latest update of the TIOBE Index, the answer seems to be “yes.” After managing to keep a place amongst the Index’s top 20 languages for the past three years, R has finally tumbled out.

R is a very niche language, used primarily in data analytics (and even then, primarily in a university and research context). So its fall “is quite surprising because the field of statistical programming is still booming, especially thanks to the popularity of data mining and artificial intelligence,” read the note accompanying this month’s list. “It seems that there is a consolidation going on in the statistical programming market. Python has become the big winner.”

Why is Python (which ranked fourth on this month’s list) winning big in data science? That’s a very good question, and TIOBE has a theory: “Statistical programming is finding its way from university to industry nowadays and Python is more accepted by the industry.”

That certainly makes sense: Python is widely taught in schools (ensuring a steady flood of Python-ready tech pros into the broader economy) and powers a hefty portion of most companies’ tech stacks, so it seems inevitable that firms large and small would come to rely on it for the all-important work of data analytics.

(In order to generate its monthly rankings, TIOBE leverages data from a variety of aggregators and search engines, including Google, Wikipedia, YouTube, and Amazon. For a language to rank, it must be Turing complete, have its own Wikipedia entry, and earn more than 5,000 hits for +"<language> programming" on Google. This methodology, as you can imagine, has sparked its own share of controversy over the years.)

This isn’t the first time an organization has reported that Python is swallowing R whole. Way back in ye olden days of February 2018, a KDnuggets poll showed a slow decline in R usage in favor of Python among tech pros who utilized both languages; at the same time, a separate survey from Burtch Works revealed that Python use among analytics professionals grew from 53 percent to 69 percent over that same two-year period, while the R user-base shrank by nearly a third.

“R has issues with scalability,” Enriko Aryanto, the CTO and a co-founder of the Redwood City, Calif.-based QuanticMind, a data platform for intelligent marketing, told Dice. “It’s a single-threaded language that runs in RAM, so it’s memory-constrained, while Python has full support for multi-threading and doesn’t have memory issues. When choosing a language, it all comes down to choosing what’s best to solve your problem.” And companies, wrestling with ever-larger datasets, really need something that can scale.

Of course, many researchers still prefer R for data-analytics work, and so the language seems unlikely to completely fade away anytime soon. Nonetheless, Python’s broad base seems to be giving it a sizable advantage when it comes to the language people choose to crunch their datasets.

10 Responses to “R Risks Python Swallowing It Whole: TIOBE”

  1. R is much easier to use for data analysis than python and it has a very large user base, so in my opinion the survey mentioned does not tell the whole story.

  2. Jeremy

    Re: the quote in the article — I wouldn’t say that Python has “full support for multi-threading”. At least in the most commonly used implementation (CPython), the interpreter has a Global Interpreter Lock which means that all execution of Python instructions is effectively serialized; so no matter how good your multithreaded Python code is, your multithreaded Python program will never be able to use more than one core’s worth of CPU power. There are work-arounds (such as spawning multiple processes rather than multiple threads, or writing some of your code in C or another language instead), but that still represents a rather serious language limitation (and one that is unlikely to go away soon, as people have been trying to get rid of the GIL for literally decades now, but have been unable to come up with a solution that doesn’t unacceptably reduce single-threaded performance)
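The work-around described in this comment (multiple processes instead of multiple threads) can be sketched in a few lines. This is a minimal illustration, assuming a standard CPython build with the GIL enabled; the function names and the prime-counting workload are hypothetical stand-ins for any CPU-bound task, and actual timings will vary by machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_with_threads(limits, workers=4):
    # Threads share one interpreter, so on a GIL build only one of them
    # executes Python bytecode at any given moment.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def run_with_processes(limits, workers=4):
    # Each worker process has its own interpreter and its own GIL,
    # so the tasks can run on separate cores.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [100_000] * 4
    for label, runner in (("threads", run_with_threads),
                          ("processes", run_with_processes)):
        start = time.perf_counter()
        results = runner(limits)
        print(f"{label}: {time.perf_counter() - start:.2f}s, results={results}")
```

Both runners return identical results; on a multi-core machine the process-based version would typically finish sooner for this kind of workload, at the cost of process start-up and of pickling arguments and results between processes.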

  3. Ricardo Schifini

    You are coming to a wrong conclusion from a false premise. Python has risen in the TIOBE rankings, that is undeniable. But Python’s rise is not due to Data Science.

    Python is not exclusively used for Data Science.

    It is a language that is being used in many areas (backend, web, games, etc.) that have nothing to do with DS.
    This means that taking the TIOBE level to compare Python to R as a Data Science tool is wrong.

    Is there any other indicator that could discriminate Python used for DS from other uses?
    I think there is.

    StackOverflow tracks questions by tags, such as [r], [python]. But it also has tags for specific uses like [pandas], [numpy] and [scikit-learn].

    StackOverflow tracks the volume of questions for each one of these tags. As of May 2019, tag [r] has 288K+ questions, while [pandas] has 104K+, [numpy] 60K+ and [scikit-learn] 14K+. These are the numbers that should be compared.

    Python, as a Data Science tool, has a lot of growth to do before it reaches the level of R.

    • Michael Huang

      Well, you need to also add Tensorflow, Pytorch, etc. to that list…. There are way more tools than just those for the python data science stack.

      • Ricardo Schifini

        Let’s count those two:
        Tensorflow: 42K
        Pytorch: 3K

        Still not near R.

        Of course, adding up all these individual tags is overestimating the total posts related to Python use for Data Science. There are many questions that contain two or more tag combinations and should be counted once.
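The arithmetic in this exchange can be checked quickly. The sketch below uses the tag counts quoted in the comments (in thousands, as of May 2019); the question IDs in the second half are made up purely to illustrate why summing per-tag counts overestimates the number of distinct questions.

```python
# Question counts in thousands, as quoted in the comments above (May 2019).
r_questions = 288
python_ds_tags = {
    "pandas": 104,
    "numpy": 60,
    "scikit-learn": 14,
    "tensorflow": 42,
    "pytorch": 3,
}

# Naive total: every tag counted separately, overlaps included.
naive_total = sum(python_ds_tags.values())
print(f"Sum of Python data science tags: {naive_total}K vs. [r]: {r_questions}K")

# Hypothetical question IDs (not real data): a question carrying two tags
# appears in two sets, so the per-tag sum exceeds the distinct count.
tagged = {
    "pandas": {1, 2, 3, 4},
    "numpy": {3, 4, 5},
    "scikit-learn": {4, 6},
}
summed = sum(len(ids) for ids in tagged.values())
distinct = len(set().union(*tagged.values()))
print(f"Per-tag sum: {summed}, distinct questions: {distinct}")
```

With the quoted figures, the five Python tags add up to 223K, still below the 288K for [r], and the true number of distinct questions would be lower again once multi-tagged questions are counted once.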

  4. Chuck

    “It’s a single-threaded language that runs in RAM, so it’s memory-constrained, while Python has full support for multi-threading”
    Wow! This guy is a real expert! R has had support for parallelization of many flavors for over a decade. Both languages are interpreted (wtf did he mean by “runs in RAM”?). Please, get a clue.

  5. Matt Sandy

    Multithreaded applications in R can be done, but often aren’t necessary. R also is fantastic because of packages, and some of those packages allow for processing to be done without loading everything into memory (ex: feather). Here is a tutorial I wrote a while back to load information into memory in parallel.

    https://rlang.io/importing-large-ndjson-files-into-r/

    Not only that, but R plays well with Apache Spark/Hadoop, allowing for fast loading of (smaller-footprint) datasets into memory AFTER map/reduce, so the in-memory footprint is pretty negligible. There is a reason you want data in memory.

    Python is fantastic. It is an excellent general purpose language. It is great for deep learning, scripting, game making, etc. R still feels a long way ahead of it for many things including exploratory analysis and general statistics.

    There also has been no real decline in R, https://stackoverflow.blog/2017/10/10/impressive-growth-r/, which you would expect if it was being gobbled up.

  6. Duncan Munslow

    Move along people, this is clickbait trash, written by someone who has no place making this kind of claim. If you don’t code, you shouldn’t be making statements like this.

    Anytime you compare Python vs R using a developer survey, R will be dwarfed by Python because Python is not a data science/statistics-specific language like R is. Python can be used for front end, web and many other applications that have nothing to do with data science, so its user base will naturally be much larger than R’s.

    Anyone who has used both languages for data science (i.e., someone who actually codes, unlike this author) should understand that R is vastly superior in data science tasks, like visualization and data manipulation. There are reasons to put up with Python’s inferiority in these areas because it can be easier to put into production. This is mostly due to the fact that productionalizing code in any organization usually means dealing with IT people who have most likely never even heard of R.