If you’re a Python programmer who hasn’t encountered NumPy, you’re potentially missing out. NumPy is an open-source Python library for scientific and numeric computing that lets you work with multi-dimensional arrays far more efficiently than Python alone. It’s probably one of the top five Python packages, and there have been a couple of books written about it.
Here are five reasons why you should know NumPy:
- It’s fast
- It works very well with SciPy and other Libraries
- It lets you do matrix arithmetic
- It has lots of built-in functions
- It has universal functions
For the purposes of this article, I installed NumPy on Ubuntu Linux. You can run Python on Linux, Mac or Windows, but Linux feels the most natural. The command for installing NumPy plus SciPy, IPython and some other related packages on Ubuntu and Debian is below; it’s a 1GB download (or as I call it, a coffee break):
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
It’s Fast
NumPy is written largely in C, and executes very quickly as a result. By comparison, Python is a dynamic language: the CPython interpreter compiles your code to bytecode and then interprets it. While it’s no slouch, compiled C code is generally going to be faster. But just how much faster?
Note that benchmark results depend on the interpreter: at the time of writing, Python 2 programs usually ran 5 to 15 percent faster than their Python 3 equivalents. In the examples shown below, I’ve used Python 2; I also installed IPython, a better shell than the standard Python one. It lets you run and time Python scripts with this command:
%time %run scriptname.py
In this somewhat trite example, I run the sum function on a list generated using range(), which produces 10 million numbers from 0 up to 9,999,999. The total is 49,999,995,000,000, and it takes 4.26 seconds.
count = 10000000
y = range(count)
print sum(y)
The NumPy version using arange(), its own range() equivalent (seen below), takes 1.67 seconds. Note that we’re comparing it against a built-in function, so expect much better gains against hand-written Python loops:
import numpy as np
count = 10000000
x = np.arange(count)
print x.sum()
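If you don't have IPython handy, the same comparison can be sketched with the standard library's timeit module. This is only an illustration, not the benchmark used above: it is written for Python 3, uses a smaller count to keep the run short, and the function names python_sum and numpy_sum are made up for the example. Your timings will differ from the figures quoted in the article.

```python
import timeit

import numpy as np

count = 1000000  # smaller than the article's 10 million, to keep the run short


def python_sum():
    # Built-in sum() over a range of Python integers
    return sum(range(count))


def numpy_sum():
    # Vectorized sum over an int64 array; the explicit dtype avoids
    # overflow on platforms where NumPy's default integer is 32 bits
    return int(np.arange(count, dtype=np.int64).sum())


# Take the best of three single runs for each version
py_seconds = min(timeit.repeat(python_sum, number=1, repeat=3))
np_seconds = min(timeit.repeat(numpy_sum, number=1, repeat=3))

print("Python sum: %.4f s" % py_seconds)
print("NumPy sum:  %.4f s" % np_seconds)
print(python_sum() == numpy_sum())
```

Taking the minimum of several runs is a common way to reduce noise from whatever else the machine is doing.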
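NumPy arrays are held in memory as contiguous blocks whose elements all have the same size and type. That allows fast access and lets whole-array operations run as compiled C loops rather than Python loops, which are much slower. Arrays of different shapes can still be used together through broadcasting, described under universal functions. Python, by contrast, relies extensively on lists: general-purpose containers that are easy to use but can contain objects of different types, so each element has to be handled as a separate Python object.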
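A quick way to see the difference between the two containers is to inspect an array's attributes. The sketch below is written for Python 3 and assumes an explicit int64 dtype so the sizes are the same on every platform; the variable names are only for illustration.

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)
mixed = [1, "two", 3.0]  # a list can hold objects of different types

print(arr.dtype)                  # one type shared by every element
print(arr.itemsize)               # bytes used by each element
print(arr.nbytes)                 # itemsize multiplied by the number of elements
print(arr.flags["C_CONTIGUOUS"])  # True when stored as one contiguous block

mixed_types = [type(item).__name__ for item in mixed]
print(mixed_types)
```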
It’s Used in SciPy and Many Other Libraries
There are many libraries that use NumPy, and a few are usually installed alongside it: SciPy, Matplotlib, pandas, SymPy and nose. NumPy and SciPy in particular are two sides of a coin. Historically, NumPy was formed by unifying two earlier array packages, Numeric and Numarray, so it contains not just the ndarray type and array manipulation functions but the numeric functions, as well.
NumPy contains quite a few linear algebra functions, even though these arguably belong in SciPy. SciPy, for its part, offers more fully featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with Python, it’s best to install both NumPy and SciPy.
Another difference is that NumPy is written mostly in C, whereas much of SciPy is a thin layer of code on top of the scientific routines, written in C and Fortran, that are freely available at Netlib.
It Lets You Do Matrix Arithmetic
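Certainly relevant to linear algebra, NumPy’s ndarray lets you compute the dot product and inner product of two matrices, as well as the matrix product and a matrix raised to a power. The numpy.linalg module can also solve tensor equations and perform three different types of matrix inversion: the ordinary inverse (inv), the pseudo-inverse (pinv) and the tensor inverse (tensorinv).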
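As a rough illustration of those operations, here is a small sketch on two 2x2 matrices. It is written for Python 3; the matrices and variable names are arbitrary choices for the example.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

matrix_product = np.dot(A, B)         # for 2-D arrays, the same as A @ B
inner_product = np.inner(A, B)        # sums over the last axis of both, i.e. A @ B.T
cubed = np.linalg.matrix_power(A, 3)  # A multiplied by itself three times
inverse = np.linalg.inv(A)
pseudo_inverse = np.linalg.pinv(A)    # agrees with inv() when A is invertible

print(matrix_product)
print(inner_product)
print(cubed)
print(np.allclose(np.dot(A, inverse), np.eye(2)))
```

Note that dot() and inner() give different results for 2-D arrays unless the second matrix is symmetric, which is an easy thing to trip over.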
Since version 1.8 (NumPy was at version 1.11 when this was written), several linear algebra routines can operate on multiple matrices stacked into one array.
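To make the stacking idea concrete, the sketch below puts three 2x2 matrices into a single array of shape (3, 2, 2) and passes it to det() and inv() in one call each. It is written for Python 3, and the matrices are arbitrary examples.

```python
import numpy as np

# Three 2x2 matrices stacked along the first axis: shape (3, 2, 2)
stack = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [1.0, 0.0]],
])

determinants = np.linalg.det(stack)  # one determinant per matrix, shape (3,)
inverses = np.linalg.inv(stack)      # one inverse per matrix, shape (3, 2, 2)

# matmul multiplies the matrices pairwise along the stack
products = np.matmul(stack, inverses)

print(determinants)
print(inverses.shape)
print(np.allclose(products, np.eye(2)))
```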
It Has Lots of Built-In Functions
A full list would be too long, but suffice to say there are functions for financial calculations, indexing, linear algebra, math functions, polynomials, random sampling, statistics, binary, logic, sorting, searching and string operations. A list of 217 examples demonstrating most of the NumPy functionality is available on GitHub.
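To give a flavor of a few of those categories, here is a short sketch touching statistics, sorting, searching, logic and polynomials. It is written for Python 3, and the sample data is made up for the example.

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

mean = np.mean(data)                       # statistics
median = np.median(data)
ordered = np.sort(data)                    # sorting
position = np.searchsorted(ordered, 4)     # searching: index where 4 would be inserted
in_range = np.logical_and(data > 1, data < 6)  # logic, element by element
poly_value = np.polyval([1, 0, -1], 3)     # polynomial x**2 - 1 evaluated at x = 3

print(mean, median)
print(ordered)
print(position)
print(in_range.sum())
print(poly_value)
```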
Here’s a variant on the permutation function example:
from numpy.random import permutation
print permutation(6)
When run, it gives one of the possible 720 different permutations of the digits 0-5. So after two runs:
[0 5 3 2 1 4]
[3 1 0 2 4 5]
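The two claims here are easy to check for yourself: every result contains each of the digits 0-5 exactly once, and there are 6! = 720 possible orderings. The sketch below is written for Python 3; the seed is an assumption added only so that repeated runs give the same permutations, and the values you get need not match the output shown above.

```python
import math

import numpy as np

np.random.seed(0)  # only to make repeated runs reproducible

first = np.random.permutation(6)
second = np.random.permutation(6)
print(first)
print(second)

# Each permutation is a reordering of 0..5, so sorting it recovers the digits
is_reordering = sorted(first.tolist()) == list(range(6))
print(is_reordering)

# Number of distinct orderings of six items
total_orderings = math.factorial(6)
print(total_orderings)
```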
It Has Universal Functions
Universal functions, also known as ufuncs, are functions applied element by element to the input array (or arrays), with each result stored in the corresponding position of an output array of the same shape.
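For example, np.sqrt and np.add are both ufuncs. The sketch below, written for Python 3 with arbitrary sample values, applies each one across a whole array and uses the optional out argument to write the result into an existing array.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

roots = np.sqrt(values)          # square root of every element

out = np.empty_like(values)      # preallocated output array of the same shape
np.add(values, roots, out=out)   # element-wise sum written into out

print(roots)
print(out)
print(isinstance(np.sqrt, np.ufunc))
```

Writing into a preallocated array with out can save an allocation when the same operation is repeated many times.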
A useful feature associated with universal functions is array broadcasting, which is how several arrays of different shapes and sizes can all end up used in one function. The shorter shapes are padded on the left with ‘1’s, and any dimension of size 1 is then stretched to match the largest array; the remaining dimensions have to agree.
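Here is a small sketch of broadcasting in action, written for Python 3 with arbitrary sample values. A column of shape (3, 1) is added to a row of shape (3,): the row is treated as shape (1, 3), and both are stretched to (3, 3).

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1): [[0], [1], [2]]
row = np.array([10, 20, 30])         # shape (3,), treated as (1, 3)

table = column + row                 # both are stretched to shape (3, 3)

print(np.broadcast(column, row).shape)
print(table)
```

No data is copied to do the stretching, which is part of why broadcasting is cheap compared with building the full-size arrays by hand.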
Once you’ve mastered all of this, get ready to move on to SciPy! Seriously, though, once you’ve learned about the ndarray type, you know half of NumPy. But knowing all of it is important if you want to become a Python developer.