Threads are a useful way to get more out of your CPU. Technically known as a "thread of execution," a thread is the smallest sequence of programmed instructions that a scheduler can manage independently. A single process can contain multiple threads, but they all share the same memory. The processor cores in your PC run threads; each core has registers, small pieces of storage that hold data such as the address of the currently executing instruction. When a core switches to another thread, that's a context switch: the current thread's state is saved while another thread's state is restored. Many applications are single-threaded, running on one core. My PC's CPU, an Intel i7-5930K, has six cores (a bit like a V6 engine in a car, metaphorically speaking), and each core can run two threads at the same time via a feature known as hyper-threading. If I build an application that runs on just one thread, I'm using 1/12th of my CPU's potential (about 8 percent).

Multi-threading means writing your application to run on many cores, with one or more threads per core; done well, it can make a big difference in performance. That being said, it's not the easiest thing to do. For example, I once wrote a C++ program to evaluate poker hands, reading a million randomly generated five-card hands from a text file. Evaluating each hand took about 3 microseconds, so a million hands took just over three seconds running single-threaded. When I made it multi-threaded, it took 4 seconds, because the million context switches added a second! After I slowed hand evaluation down by adding a one-millisecond delay per hand, processing 20,000 hands took 35 seconds single-threaded, but the multi-threaded version was much quicker, taking 3.5 seconds in total; the 12 hardware threads together ran the application about 10 times faster. The lesson is that context-switch overhead matters less as each unit of work takes longer. Multi-threading is also useful for background computations, and for keeping the Graphical User Interface (GUI) responsive while fetching or sending data to slower devices such as network cards and disk drives. With all that being said, thread programming presents some real challenges. Here are five:
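To make the arithmetic concrete, here is a small Java sketch in the same spirit as the poker example. The 20 ms delay per work unit, the 40 work units, and the 8-thread pool are all invented for illustration; the point is that when each unit of work is slow, the thread overhead is dwarfed by the parallel speedup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SpeedupDemo {
    // Simulates work dominated by a fixed per-item delay, like the
    // 1 ms-per-hand poker example: the longer each unit of work runs,
    // the less the thread and context-switch overhead matters.
    static void workUnit() {
        try { Thread.sleep(20); } catch (InterruptedException e) { }
    }

    public static void main(String[] args) throws Exception {
        int units = 40;

        long t0 = System.nanoTime();
        for (int i = 0; i < units; i++) workUnit();
        long serialMs = (System.nanoTime() - t0) / 1_000_000;

        ExecutorService pool = Executors.newFixedThreadPool(8);
        t0 = System.nanoTime();
        for (int i = 0; i < units; i++) pool.submit(SpeedupDemo::workUnit);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        long parallelMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("serial:   " + serialMs + " ms");
        System.out.println("parallel: " + parallelMs + " ms");
        System.out.println("parallel faster: " + (parallelMs < serialMs));
    }
}
```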
Shared Access to Data
If two threads access a shared variable without any kind of guard, writes to that variable can overlap. Say both threads add 1 to the same memory location; each does this by reading the value at that location into a register, incrementing it, then writing it back. Suppose the value starts at 0. Thread 1 (T1) reads the 0, but before it writes the incremented value back, Thread 2 (T2) also reads 0. T1 writes 1 back, and then T2 writes 1 back after it. So despite two increments, the value is 1, when it should be 2. To prevent overlapping writes, the access has to be atomic, i.e., done in one indivisible operation. This can be done in a number of ways: the .NET framework has the Interlocked class with atomic increment and decrement methods, and Java has java.util.concurrent.atomic.AtomicInteger with equivalent methods.
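The lost-update problem and its fix can both be seen in a minimal Java sketch (class and variable names are my own): two threads bump an unguarded int and an AtomicInteger side by side.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    static int plain = 0;                           // unguarded counter
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                   // read-modify-write: not atomic
                atomic.incrementAndGet();  // one indivisible operation
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("plain  = " + plain);        // often less than 200000
        System.out.println("atomic = " + atomic.get()); // always 200000
    }
}
```

The unguarded counter usually comes up short because increments from the two threads overlap, exactly as described above.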
Locks Can Cause Performance Issues
On x86 CPUs, a LOCK instruction prefix can be applied to certain instructions that read-modify-write memory, such as INC and XCHG. While the locked instruction executes, the core takes exclusive ownership of the relevant cache line, which is typically 64 bytes long. If the core knows that memory address is already cached, it reads from the cache instead of main memory, which is much slower; data moves between memory and cache 64 bytes at a time, one cache line per transfer. Besides providing atomic access, the lock prevents overlapping increments (as detailed earlier) and stops the CPU from reordering the affected memory operations. If another processor is accessing a different variable that happens to sit in the same cache line, the result is a situation called false sharing, which hurts performance; that's more of a problem for C/C++ code than for .NET or Java. There are other locking mechanisms at the language level; the following C# code, adapted from Wikipedia, applies a lock to each call of Account.Deposit() and Account.Withdraw():
class Account { // this class acts as a monitor for an account
    long val = 0;
    object thisLock = new object();

    public void Deposit(long x) {
        lock (thisLock) { // only one thread at a time may execute this statement
            val += x;
        }
    }

    public void Withdraw(long x) {
        lock (thisLock) { // only one thread at a time may execute this statement
            val -= x;
        }
    }
}
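Java's synchronized blocks provide the same mutual exclusion. Here is a rough Java equivalent of the account monitor, with a small two-thread driver added for illustration:

```java
public class Account {
    private long val = 0;
    private final Object lock = new Object();

    public void deposit(long x) {
        synchronized (lock) { // only one thread at a time may execute this block
            val += x;
        }
    }

    public void withdraw(long x) {
        synchronized (lock) {
            val -= x;
        }
    }

    public long balance() {
        synchronized (lock) { return val; }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) a.deposit(5); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) a.withdraw(2); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // 10,000 * (5 - 2) = 30000, regardless of how the threads interleave
        System.out.println("balance = " + a.balance());
    }
}
```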
One way to avoid performance problems with locks is lock-free programming. For example, the University of Cambridge’s Systems Research Group provides libraries of concurrent safe lock-free object-based software transactional memory, multi-word compare-and-swap, and a range of search structures (skip lists, binary search trees, red-black trees).
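As a taste of the idea (this is a minimal sketch of my own, not one of the Cambridge libraries), here is a lock-free counter in Java built on compare-and-swap: each update retries until its CAS succeeds, so no thread ever blocks holding a lock.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    public long increment() {
        long current, next;
        do {
            current = value.get();
            next = current + 1;
            // If another thread changed the value since we read it,
            // compareAndSet fails and we retry with the fresh value.
        } while (!value.compareAndSet(current, next));
        return next;
    }

    public long get() { return value.get(); }

    public static void main(String[] args) {
        CasCounter c = new CasCounter();
        for (int i = 0; i < 5; i++) c.increment();
        System.out.println("count = " + c.get());
    }
}
```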
Exceptions in Threads Can Cause Problems
In .NET, an unhandled exception in any thread terminates the whole process; in Java, an uncaught exception terminates only the thread it was thrown in. Either way, exceptions in thread code should be handled by exception handlers within that code. Both .NET and Java can also catch unhandled ("uncaught," in Java lingo) exceptions that escape a thread via a special handler, which at least lets you log the exception. Even so, if that happens, your application is still likely to be in a bad state.
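In Java, the special handler is registered with Thread.setUncaughtExceptionHandler(); a minimal sketch (the exception message and thread setup are invented):

```java
public class UncaughtDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom"); // nothing catches this
        });
        // Log the failure instead of letting the thread die silently.
        worker.setUncaughtExceptionHandler((t, e) ->
            System.out.println("Thread " + t.getName() + " died: " + e.getMessage()));
        worker.start();
        worker.join();
        // The worker thread is gone, but the rest of the application survives.
        System.out.println("main thread still running");
    }
}
```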
Background Threads Need Care When Updating a GUI
Performing any non-trivial processing in a GUI thread is likely to make it unresponsive; best practice is to do the processing in another thread. Background threads (or .NET BackgroundWorkers) can run tasks in the background, but they need some care when updating the GUI. In the pre-Task era of .NET WinForms applications, you marshalled the update onto the GUI thread using Invoke on a control, such as a label on a form:
string Text = "Some value";
form.Label.Invoke((MethodInvoker)delegate {
    form.Label.Text = Text;
});
Xamarin (C# on iOS/Android) supports InvokeOnMainThread(), which runs a lambda expression on the main thread:

InvokeOnMainThread(() => Label.Text = Text);
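Outside any particular GUI framework, the underlying pattern can be sketched in plain Java: one dedicated "UI" thread drains a queue of work items, and background threads post updates to it rather than touching shared state directly. All names here are illustrative, and a StringBuilder stands in for a GUI control.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoopDemo {
    // Mimics InvokeOnMainThread: all updates run on one dedicated thread.
    static final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) throws InterruptedException {
        StringBuilder label = new StringBuilder(); // our pretend GUI control

        Thread uiThread = new Thread(() -> {
            try {
                while (true) queue.take().run(); // drain posted updates
            } catch (InterruptedException stop) {
                // interrupted: the "event loop" shuts down
            }
        });
        uiThread.start();

        // A background thread posts the update instead of mutating
        // the "control" itself.
        new Thread(() -> queue.add(() -> label.append("Some value"))).start();

        Thread.sleep(200);     // give the update time to be processed
        uiThread.interrupt();
        uiThread.join();
        System.out.println(label);
    }
}
```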
A Way to Avoid Many Thread Problems
Many threading problems center on accessing or sharing data between threads. One way to avoid this is to use a messaging system, which provides a robust way of storing and delivering messages between two endpoints; these could be two parts of the same application, or two applications running on different networked machines. After sending, the message is stored by the messaging system until it can be delivered. For instance, the open-source RabbitMQ messaging broker, written in Erlang, lets you send tens of thousands of messages per second. (If you want to see what's available, try this list of ten open-source messaging libraries.) Windows has a component, MSMQ, which likewise provides a messaging service. Creating and sending a message is as simple as the code below:
var srm = new SendReceiveMessage();
if (srm.CreateQueue(@".\Private$\Test1", "Test"))
{
    if (srm.SendMsg("Hello World!"))
    {
        Console.WriteLine("Message sent ok.");
    }
}
To get this to compile, you will have to add a reference to System.Messaging in the Solution references:
using System;
using System.Messaging;

namespace SendMessage
{
    [Serializable]
    public sealed class SimpleMessage
    {
        public TimeSpan LifeInterval { get; set; }
        public DateTime BornPoint { get; set; }
        public string Text { get; set; }
    }

    class SendReceiveMessage
    {
        MessageQueue messageQueue = null;

        public bool CreateQueue(string queueName, string description)
        {
            if (!MessageQueue.Exists(queueName))
            {
                try
                {
                    MessageQueue.Create(queueName);
                }
                catch (Exception)
                {
                    // log or otherwise handle the exception
                    return false;
                }
            }
            messageQueue = new MessageQueue(queueName);
            messageQueue.Label = description;
            return true;
        }

        public bool SendMsg(string messageText)
        {
            var m1 = new SimpleMessage();
            m1.BornPoint = DateTime.Now;
            m1.LifeInterval = TimeSpan.FromDays(7); // a week to deliver
            m1.Text = messageText;
            try
            {
                messageQueue?.Send(m1); // only sends if the queue was created
                return true;
            }
            catch (Exception)
            {
                // log or otherwise handle the exception
                return false;
            }
        }
    }
}
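The same store-and-forward idea works in miniature inside a single process. This Java sketch (not MSMQ) uses a BlockingQueue as the mailbox, so the producer and consumer threads share nothing except the queue, which handles all the synchronization internally; the "STOP" sentinel is my own convention for ending the conversation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessageDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> mailbox = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                mailbox.put("Hello World!");
                mailbox.put("STOP"); // sentinel: no more messages
            } catch (InterruptedException e) { }
        });

        Thread consumer = new Thread(() -> {
            try {
                String msg;
                while (!(msg = mailbox.take()).equals("STOP")) {
                    System.out.println("received: " + msg);
                }
            } catch (InterruptedException e) { }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```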
Conclusion
Consider using Tasks instead of raw threads, if they're available. Tasks typically run on thread pools, which manage all the threads for you. For example, the .NET Task Parallel Library (TPL) uses a thread pool, so you never need to worry about threads at all. Given that a newly created thread costs about 1 MB of RAM for its stack, a thread pool reuses threads as needed without demanding explicit thread creation (an ExecutorService in Java does a similar job of managing a pool of threads).
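A minimal ExecutorService sketch (the pool size and task values are arbitrary): four pooled threads run the submitted callables, and the caller collects the results through futures.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // A fixed pool reuses four threads instead of creating one
        // (and its roughly 1 MB stack) per task.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> tasks = List.of(
            () -> 1 + 1,
            () -> 2 * 21
        );
        int sum = 0;
        for (Future<Integer> f : pool.invokeAll(tasks)) {
            sum += f.get(); // blocks until that task's result is ready
        }
        pool.shutdown();
        System.out.println("sum = " + sum); // 2 + 42
    }
}
```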