Why It’s Worth Learning the R Language

If you’re a developer, chances are good you’ve heard about the programming language R, used for statistical computing and graphics. Although nowhere near as ubiquitous (or popular) as programming languages such as Java and C++, R has nonetheless maintained a steady rate of adoption, remaining ahead of other specialized languages such as PL/SQL (and sometimes MATLAB) in TIOBE’s monthly rankings.

R is open-source and multi-platform (Windows, Mac and Linux) and comes as a small download (just 62 MB for the Windows installer). R is not only a programming language but also an interactive environment: it ships with a basic GUI, can handle data, and has a REPL that lets you run programs and perform calculations.

R releases, like Ubuntu ones, come with a name. The current release, 3.2.3, is “Wooden Christmas-Tree,” and came out about nine months after 3.1.3 (a.k.a. “Smooth Sidewalk”).

The language’s extensibility through packages is one thing that makes it great. These packages are written in R (though Java, C, C++ and Fortran can also be used), and obtained from CRAN (Comprehensive R Archive Network). CRAN currently lists 7,871 packages, with over 40 for Bayesian methods alone.

An important link on CRAN is Task Views, which lists packages categorized by purpose. This gives a good feel for the range of tasks that R is suited for, and even includes High Performance Computing.

RStudio and Other IDEs

A decent IDE makes all the difference; with R, the default is a little basic. Microsoft has taken a big interest in R, and a plugin for Visual Studio will be forthcoming. There’s an open-source IDE named RStudio, also available with a commercial licence and support. I liked RStudio, which features four windows, tear-away editing (handy with second monitors), and support for SVN and/or Git; it’s pretty full-featured, reminding me of Eclipse but much snappier.

RStudio also provides an open-source server for making use of more powerful computing resources. Check out the free cheatsheets.

Given R’s background, which is originally Unix/Linux, you can also use Vim with Tmux or the cross-platform, open-source RKWard as well as Eclipse. JetBrains doesn’t yet have an R IDE, but it must be only a matter of time.

If you have another IDE installed, you may have a conflict of locations and need to update the R_LIBS_USER environment variable. I had to do so; this StackOverflow question addresses that very concern. If you do an update.packages() to bring packages up to the latest on Windows, and it fails because it can’t write to the Program Files path, you may have to close the IDE down and rerun it as administrator.
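Before changing anything, it can help to see where R is actually looking for packages. The following is a minimal sketch using only base R; the variable names are my own, and the output will differ from machine to machine.

```r
# Inspect where R looks for installed packages.
user_lib  <- Sys.getenv("R_LIBS_USER")  # per-user library; may be an empty string
lib_paths <- .libPaths()                # all library directories, searched in order

print(user_lib)
print(lib_paths)

# file.access(path, mode = 2) returns 0 when the path is writable.
writable <- file.access(lib_paths, mode = 2) == 0
print(data.frame(path = lib_paths, writable = writable))
```

If the directory you expect to update shows up as not writable, that would be consistent with update.packages() failing until the session is run with more privileges.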

R the Programming Language

R may be a little idiosyncratic compared to normal programming languages, but given its relatively narrow use-case, perhaps that’s understandable. Assignments use the <- symbol as in:

f <- c(1,1,2,3,5,8)

That stores the first six Fibonacci numbers in the vector f. The c function (short for “combine”) is a built-in that builds a vector from its arguments. R has no separate scalar type: a single value is simply a vector of length one. In addition to vectors, whose elements all share one type, there are lists, in which every element can be a different type.
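To make the difference between vectors and lists concrete, here is a small sketch you can paste into the console. Apart from f, the variable names are made up for illustration.

```r
f <- c(1, 1, 2, 3, 5, 8)   # the numeric vector from above
x <- 42                    # a "scalar" is just a vector of length one

length(f)      # 6
length(x)      # 1
is.vector(x)   # TRUE
f[6]           # 8 (indexing starts at 1)

# A vector holds a single type, so mixing types coerces every element.
mixed <- c(1, "two", TRUE)
class(mixed)   # "character"

# A list keeps each element's own type.
item <- list(1, "two", TRUE)
sapply(item, class)   # "numeric" "character" "logical"
```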

How Do I Use R?

The easiest way is via the REPL: You can do simple arithmetic, fetch any package, navigate through the filesystem, run R files and so on. Save your R code in a .R file and you can load and run it with source().
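A short console session might look something like the sketch below. The script file is a throwaway created in a temporary directory purely for illustration; in practice you would point source() at your own .R file.

```r
# Simple arithmetic, as typed at the console prompt.
total <- 2 + 3 * 4   # 14
sqrt(16)             # 4

# Navigate the filesystem.
getwd()              # current working directory
list.files()         # files in that directory

# Write a tiny script to a temporary .R file, then load and run it.
script <- tempfile(fileext = ".R")   # hypothetical script location
writeLines("squares <- (1:5)^2", script)
source(script)
squares              # 1 4 9 16 25
```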

To show how easy it is to plot graphs, here is an example from this R visualization guide, which I pasted into the R console. The guide neglects to mention the first step, which is installing the package with install.packages("packagename"). The package the guide uses is RColorBrewer:

install.packages('RColorBrewer')

Here’s the code that powers the example:

library(RColorBrewer)  # load the package so brewer.pal() is available

data(VADeaths)

par(mfrow=c(2,3))

hist(VADeaths,breaks=10, col=brewer.pal(3,"Set3"),main="Set3 3 colors")

hist(VADeaths,breaks=3 ,col=brewer.pal(3,"Set2"),main="Set2 3 colors")

hist(VADeaths,breaks=7, col=brewer.pal(3,"Set1"),main="Set1 3 colors")

hist(VADeaths,breaks=2, col=brewer.pal(8,"Set3"),main="Set3 8 colors")

hist(VADeaths,col=brewer.pal(8,"Greys"),main="Greys 8 colors")

hist(VADeaths,col=brewer.pal(8,"Greens"),main="Greens 8 colors")

VADeaths is one of the built-in data sets that come with R. It holds death rates per 1,000 people in Virginia in 1940. To see the other built-in data sets, type data() at the console. To select a set, use data(name), e.g. data(Seatbelts).
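It’s worth poking at a data set before plotting it. The sketch below uses only base R to show that VADeaths is a small matrix, which means hist() ends up pooling all of its cells into one set of values.

```r
data(VADeaths)       # load the built-in data set

is.matrix(VADeaths)  # TRUE
dim(VADeaths)        # 5 4
rownames(VADeaths)   # age groups
colnames(VADeaths)   # population groups

# Look up a single cell by row and column name.
VADeaths["50-54", "Rural Male"]   # 11.7
```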

The par function sets or queries graphical parameters. Here, mfrow=c(2,3) lays out the plotting area as a grid of two rows and three columns and fills it row by row; use mfcol instead to fill the same grid column by column.
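To see the fill order for yourself, something like the following sketch should do. It draws six placeholder plots under each setting; the pdf(NULL) call is my own addition so the code runs without a display, and you would leave it (and dev.off()) out at the console.

```r
# Assumption: draw to a null PDF device so the sketch runs without a display.
pdf(NULL)

old  <- par(mfrow = c(2, 3))   # 2 rows x 3 columns, filled row by row
grid <- par("mfrow")           # query the setting back: 2 3
for (i in 1:6) plot(1:10, main = paste("Panel", i))   # top row first, left to right

par(mfcol = c(2, 3))           # same grid, filled column by column
for (i in 1:6) plot(1:10, main = paste("Panel", i))   # first column top to bottom, then the next

par(old)                       # restore the previous layout
invisible(dev.off())           # close the device
```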

The hist function plots a histogram. You can see its documentation by typing help(hist), which opens a page in your web browser served from R’s own local help server; help(RColorBrewer) does the same for the color palettes once the package is loaded. It’s pretty decent documentation, as it shows what you need to call, the parameters, and examples you can try.
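As a rough sketch of how you might explore hist from the console: the help calls are wrapped in interactive() so the snippet also runs as a script, and plot = FALSE asks hist to return its computed bins rather than draw them.

```r
data(VADeaths)

if (interactive()) {
  help(hist)      # documentation for hist
  example(hist)   # runs the examples from that help page
}

# plot = FALSE returns the computed histogram instead of drawing it.
h <- hist(VADeaths, breaks = 3, plot = FALSE)
h$breaks        # bin boundaries R settled on (breaks = 3 is only a suggestion)
h$counts        # number of values in each bin
sum(h$counts)   # 20, one per cell of the 5 x 4 matrix
```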

Conclusion

R is useful for a great deal, and I’ve only covered the tip of the iceberg here. The platform is an example of open source at its best, but it has something of a learning curve. If you’re starting out with R, I recommend the Cookbook for R as a nice, easy start.

Image Credit: autsawin uttisin/Shutterstock.com

Comments

One Response to “Why It’s Worth Learning the R Language”

April 05, 2017 at 7:06 pm, Gaurav Jain said:

I couldn’t agree more, R is indeed a great language for data science.
I found this tool for R visualizations; it’s pretty cool.
