Possibly the cleverest thing that Sun ever did was create the JVM that underpins Java. JVM is short for Java Virtual Machine and it comes as quite a surprise to many Java developers to find out that the most commonly run JVM (Oracle’s HotSpot) is actually written in C++.
So what is a JVM and why is it needed? Portability is one reason, but it also allows Java code to be executed in a virtual environment rather than directly on the underlying hardware. Sun didn’t originate this technique: it was used on mainframes in the 1960s, and on Apple IIs and later computers running the UCSD Pascal system.
Writing portable software has always been problematic because different processors have different instruction sets. Processors also vary in word size and endianness, and some architectures are built around registers while others are built around a stack. The C language was invented to be a portable high-level assembler, but moving a large program between different processors and operating systems was still not that much easier.
Once a JVM has been written for a processor, Java programs can run on it without having to worry about these portability considerations. Running in a virtual machine also offers other benefits, such as safety. It’s not difficult to write a C program that crashes by trying to access memory addresses that it shouldn’t. With C it’s even possible to crash the whole computer and force a reboot.
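To illustrate the safety point, here is a minimal sketch (the class and method names are made up for this example). It shows that reading past the end of an array in Java raises an exception the program can catch, rather than touching memory it doesn't own.

```java
// Hypothetical demo class; the names are illustrative only.
final class BoundsDemo {

    // Tries to read one element past the end of an array and reports what happened.
    static String readPastEnd() {
        int[] data = new int[4];
        try {
            int value = data[4];   // valid indexes are 0..3
            return "read " + value;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The JVM checked the index and refused the access.
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd()); // prints "rejected"
    }
}
```

The equivalent read in C is undefined behavior: it might return garbage, crash, or appear to work.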
Java programs cannot do that; they are compiled into bytecode, an intermediate language. When a Java program is loaded by the JVM, the bytecode is verified so that nothing untoward happens. Branch instructions are guaranteed to jump to valid locations, data is properly initialized and references are always type-safe. In addition, access to private or package-private data and methods is tightly controlled.
The compiled bytecode is stored in .class files, which are often bundled into .jar archives. These get distributed and will work in any environment where the JRE (Java Runtime Environment) exists. The JRE includes the JVM plus a collection of class libraries that are themselves distributed as bytecode.
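To make "bytecode" a little more concrete, here is a tiny class together with what the JDK's javap disassembler typically prints for its one method. The class name Adder is hypothetical, and the listing in the comment is abridged; details can differ between compiler versions.

```java
// Hypothetical example class. Compile with `javac Adder.java`,
// then disassemble with `javap -c Adder`.
final class Adder {

    static int add(int a, int b) {
        return a + b;
    }
    // Typical (abridged) javap output for add(int, int):
    //   0: iload_0    // push the first argument onto the operand stack
    //   1: iload_1    // push the second argument
    //   2: iadd       // pop both values, push their sum
    //   3: ireturn    // return the int on top of the stack

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

These four instructions are what the verifier checks at load time, for example that iadd really finds two ints on the stack.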
Having loaded the bytecode into memory, the JVM can now run the program. The older method of doing this is interpreting, which is how gij, the bytecode interpreter that ships with the GNU Compiler for Java, works. It calls code to interpret each bytecode instruction and evaluate expressions. To get better performance, the bytecode can instead be compiled into native machine code. This is called Just-In-Time (JIT) compilation, as it’s only done when the code is required.
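The interpreting approach can be sketched as a loop that fetches one instruction at a time and dispatches on it. The following is a toy stack machine with invented opcodes (PUSH, ADD, MUL, HALT), not real JVM bytecode, and is only meant to show the shape of the technique.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack-machine interpreter. The opcodes are invented for this sketch
// and are NOT real JVM bytecode.
final class ToyInterpreter {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Every instruction pays for the fetch and the switch. A JIT compiler avoids that overhead by translating frequently executed bytecode into native code once and then running the native version.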
There’s a tradeoff between the degree of optimization the JVM can achieve and how long it takes to start executing the code. It’s just not acceptable to take 10 minutes optimizing the code before running it. Some JVMs, like Oracle’s HotSpot, monitor the running program and improve execution speed by optimizing the most frequently executed code; in some cases, it’s said, the result is faster than even hand-coded C or C++.
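One way to watch this happen is to run a small program with a hot method and ask HotSpot to log what it compiles. The sketch below uses a made-up class name (HotLoop); the -XX:+PrintCompilation flag is a HotSpot diagnostic option, and its output varies between JVM versions.

```java
// Hypothetical hot-loop example. On HotSpot, running it as
//   java -XX:+PrintCompilation HotLoop
// should log methods as they are JIT-compiled (exact output varies by version).
final class HotLoop {

    // A small method that gets called often enough to be worth compiling.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long result = 0;
        // Call the method repeatedly so the JVM sees it as "hot".
        for (int round = 0; round < 10_000; round++) {
            result = sumOfSquares(1_000);
        }
        System.out.println(result);
    }
}
```

In the log, sumOfSquares would normally show up after it has been called many times, which is the monitoring described above: the JVM spends compilation effort only where the program spends its time.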
Another benefit of virtual machines is that memory access can be tracked, which is essential for garbage collection. A problem for unmanaged languages like C and C++ is the need to request and then release memory explicitly. There are a hundred and one different ways for those types of programs to leak memory.
With the JVM handling memory management, garbage collection can reclaim an object when it is no longer reachable. HotSpot uses multiple collectors to manage objects according to their lifetime profile: a young generation, an old generation and a permanent generation (which held class metadata and was replaced by Metaspace in Java 8). For more on the intricacies of garbage collection, I recommend this blog post as a good place to start.
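To see which collectors a particular JVM is actually using, the standard java.lang.management API can list them. This is a minimal sketch with a made-up class name (CollectorList); the collector names it prints depend on the JVM and on the garbage collector selected at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class that reports the JVM's active garbage collectors.
final class CollectorList {

    // Returns the names of the collectors the running JVM exposes.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() is the number of collections run so far.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```

On a typical HotSpot JVM this prints at least one collector for the young generation and one for the old generation.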
JVMs are not just for Java
It’s not just Java programs that run on the JVM. Developers of other programming languages have seen the advantages of managed code, and languages like Scala, Groovy and Clojure have all appeared, designed specifically around the JVM. Other languages including Ada, Basic, C, Cobol, Go, Pascal, PHP, Python and Ruby also have JVM-targeted implementations.
Oracle claims that Java is installed on over 3 billion devices, so programs written in JVM languages can run on a wide range of processors and operating systems with little extra effort. Apart from the recent security problems with Java applets running in browsers, the Java environment and JVM provide a very stable platform for software to run on.
Then there is Dalvik
Android smartphones run a version of Java in which the place of the JVM is taken by a register-based virtual machine called Dalvik, which was developed in a clean-room environment by Google’s engineers. In a high-profile court case brought by Oracle against Google over the Java APIs, Oracle’s claims that the Java APIs were its intellectual property were thrown out.
The APIs (Application Programming Interfaces) are defined in the JRE libraries and your software must work with them to use the phone’s features. The Dalvik bytecode is not the same as Java’s, so Android programs cannot run on other JVMs.
In many ways the .NET environment is similar to Java. The CLR (Common Language Runtime) is the equivalent of the JVM. Both understand their own type of bytecode; in .NET it’s called IL (Intermediate Language). .NET code is also compiled just-in-time, runs in a managed environment and is garbage collected.
Without the JVM there’d be no Java, and all of the other languages based on it would be different — at the very least, requiring a little more attention to detail, especially in relation to memory management. So you should care about the JVM.