Using Python for High-Performance Frameworks

Asynchronous programming is a common paradigm these days. Whether or not your software runs on the web, you’ll want it to scale well, be that across multiple cores or across multiple servers, and asynchronous programming is an important part of writing scalable, optimized software. The fundamental rule is that you don’t write code that blocks; at a basic level, that means your code can perform one task while it waits for another, such as I/O, to complete.

Here’s an example. Suppose you perform a write to a database. Without asynchronous programming, you might just write code that does the write, and the following line executes after the write is complete, like so:

    db.write(data)
    print('Finished')

With synchronous programming, the print line doesn’t run until db.write has finished. That means that when the second line runs, you can expect that the data has been written to the database and that any response from the database is available.

But what happens while the write is actually taking place? On a single-core, single-threaded machine, the processor may be tied up waiting on the write. But on today’s multicore computers, which typically allow for two threads per core, many cores may well be sitting idle while the write takes place. Why not take advantage of that time? With a language such as C, you would typically need to spawn a thread yourself to do so. But with the help of modern languages and libraries, you don’t need to manage threads directly, as the scheduling is handled for you. Instead, you can just write code.

Python supports such asynchronous code. In an asynchronous approach, the second line in the above code executes immediately after the write is initiated, and possibly (even likely) before the write is complete.

Of course, going down this route requires some care on your part. For example, with synchronous programming, you can do a database read on one line and, on the following line, assume that either the data has been read or an error has occurred. With asynchronous programming, you can’t make that assumption. Instead, you write a separate function that runs after the database read completes. Meanwhile, the lines following the read may run before the read is complete. In other words, you’re now dealing with two separate sequences of code, and you can’t be sure which one runs first, so you have to write your code without assuming an order.

Recent versions of Python provide native support for such programming techniques. But even then, it can get complex. That’s where libraries and frameworks can help. This is especially true in the case of web development.
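
The ordering described above can be sketched with asyncio. This is a minimal illustration rather than real database code: it assumes Python 3.7 or later, and read_from_db, on_read_done, and the "user:1" key are hypothetical names, with asyncio.sleep standing in for the actual I/O.

```python
import asyncio


async def read_from_db(key):
    # Hypothetical stand-in for a database read; the sleep simulates I/O latency.
    await asyncio.sleep(0.05)
    return {"key": key, "value": 42}


async def main():
    events = []

    def on_read_done(task):
        # The "separate function" that runs once the read has completed.
        events.append(f"callback got {task.result()['value']}")

    # Initiate the read without waiting for it to finish.
    task = asyncio.create_task(read_from_db("user:1"))
    task.add_done_callback(on_read_done)

    # This line runs before the read completes.
    events.append("read initiated")

    # Wait here only because the sketch needs the read to finish before exiting.
    await task
    events.append("main resumed")
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The "read initiated" entry is recorded first, even though the read was started on an earlier line; the callback entry appears only after the simulated read has finished.
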
When a web request comes in, you might have to do one database lookup to authenticate the user and then another to get the user’s data. While that work is taking place, another request could come in. With asynchronous programming, you can handle the next request before the first one is finished.

The key to asynchronous programming and frameworks is understanding that the framework includes an underlying mechanism whereby chunks of execution get queued up. At the heart of the framework is what’s called an event loop. When you provide code that runs in response to an I/O event, that code gets queued up. This isn’t a queue in the strict first-in, first-out sense, though, since the framework runs each chunk only after its I/O operation completes. Meanwhile, other chunks of code can run.

As of version 3.4, Python’s standard library includes an asynchronous framework called asyncio. Some web frameworks also support such asynchronous approaches; others do not. We often call the former non-blocking frameworks and the latter blocking frameworks. If you’re doing web development, Tornado is one of the most popular frameworks that support asynchronous programming.
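
The request-handling scenario above can be sketched with asyncio’s event loop. This is only an illustration under stated assumptions: handle_request and db_lookup are hypothetical names, the delays are made up, and asyncio.sleep stands in for the database lookups. It assumes Python 3.7 or later.

```python
import asyncio


async def db_lookup(label, delay):
    # Hypothetical database call; the sleep simulates waiting on I/O.
    await asyncio.sleep(delay)
    return label


async def handle_request(request_id, delay, log):
    log.append(f"request {request_id}: started")
    await db_lookup("auth", delay)       # authenticate the user
    await db_lookup("user data", delay)  # fetch the user's data
    log.append(f"request {request_id}: finished")
    return request_id


async def main():
    log = []
    # Request 1 is slow and request 2 is fast. The event loop switches to
    # request 2 whenever request 1 is waiting on a lookup.
    results = await asyncio.gather(
        handle_request(1, 0.10, log),
        handle_request(2, 0.01, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())
    for line in log:
        print(line)
```

In this sketch, request 2 starts and finishes while request 1 is still waiting on its lookups, all on a single thread.
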

What About Non-Web?

Not everybody does web development. Suppose you need to do some serious number-crunching and your algorithms can run in parallel. Why not make use of all those cores? While this is an exciting area to be working in, using Python in this context presents some difficulties. If you’re interested in exploring multicore programming, what I recommend is an approach likely to make people unhappy: First, learn the basics of parallel programming using a language such as C++, combined with tools that support multicore programming (such as OpenMP). Once you’re familiar with the ins and outs of multicore programming, you can start work in Python, recognizing some of the difficulties involved and possible ways to overcome them. For a fantastic discussion of those challenges and workarounds, check out Nick Coghlan’s article on multicore Python programming.

When it comes to high-performance computing within the context of Python, a technology called SIMD (Single Instruction, Multiple Data) is having a big impact; it refers to a processor’s ability to perform a computation on an array of numbers with a single assembly-language instruction. SIMD is not new by any means (it was used in the 1970s), and using it is often called vectorization, because you’re operating on vectors of data. If this is a topic you’re interested in, then check out this blog about PyPy, as well as this blog about IPython and NumPy.
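
For the multicore case described above, one common workaround in Python is to spread work across processes rather than threads. The following is a minimal sketch using the standard library’s concurrent.futures module; the function names, the sum-of-squares workload, and the executor_cls parameter are illustrative assumptions, not something taken from the articles mentioned here.

```python
import concurrent.futures


def sum_of_squares(bounds):
    # CPU-bound work on one chunk: sum n*n for n in [start, stop).
    start, stop = bounds
    return sum(n * n for n in range(start, stop))


def split_range(total, parts):
    # Split [0, total) into at most `parts` contiguous (start, stop) chunks.
    if total <= 0:
        return []
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


def parallel_sum_of_squares(total, workers=4,
                            executor_cls=concurrent.futures.ProcessPoolExecutor):
    # Each chunk goes to a worker; with the default executor the workers are
    # separate processes, so chunks can run on different cores.
    chunks = split_range(total, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

A call such as parallel_sum_of_squares(10_000_000) would divide the range into four chunks and add up the partial results. With a process pool, the calling code should sit under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. Whether this beats a plain loop depends on the chunk size and the cost of starting the processes.
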

Conclusion

At present, much high-performance programming is done in C++ because of its closeness to the processor and its ability to spawn threads across cores. But higher-level languages are catching up. Python now has built-in support for asynchronous programming, and there are ongoing efforts to help you write multicore and vectorized programs in various dialects and implementations of Python. Explore a bit, and you might be pleasantly surprised by what Python can offer.