Here are some random things about the JVM that you might not know (or at least I didn’t):
- boolean has no bytecode-level operations of its own: it is represented as a 4-byte integer and handled by the int instructions (although method parameters can still be declared as booleans)
- Often when a NullPointerException is thrown, the JVM actually has access to the method that was being called, but the NPE is generally thrown in the calling method, rather than the called method.
- Reflection is really inherent to how the JVM is specified and operates; all methods are located dynamically by {class, method name, method parameters}
- The JVM truly has no concept of the Java language; String and Object are for the most part the only classes treated specially
- The JVM is stack-based rather than register-based
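To make the {class, method name, method parameters} lookup a bit more concrete, here is a minimal sketch using the standard reflection API from the Java side. The class name `MethodLookupSketch` and its two helpers are made up for illustration; a real JVM does this resolution internally against the constant pool rather than through `java.lang.reflect`. It also shows that a boolean parameter still gets its own letter (`Z`) in a method descriptor, even though the bytecode treats the value as an int.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class MethodLookupSketch {
    // Builds the JVM-style descriptor string for a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    // Locates a public method dynamically by {class, name, parameter types}.
    static Method find(Class<?> owner, String name, Class<?>... params)
            throws NoSuchMethodException {
        return owner.getMethod(name, params);
    }

    public static void main(String[] args) throws Exception {
        // Look up String.indexOf(String, int) by name and parameter types, then call it.
        Method m = find(String.class, "indexOf", String.class, int.class);
        System.out.println(m.invoke("hello, world", "world", 0)); // 7

        // The same signature written as a class-file descriptor.
        System.out.println(descriptor(int.class, String.class, int.class)); // (Ljava/lang/String;I)I

        // A boolean parameter shows up as Z in the descriptor.
        System.out.println(descriptor(void.class, boolean.class)); // (Z)V
    }
}
```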
Some cool things about C# (.NET as a whole?):
- C# has the notion of Properties, which let clients of your class access members “directly” while at the same time allowing for change to underlying implementation (as they are actually getter/setter methods)
- C# allows for the notion of first-class methods, which it calls delegates. This lets you avoid defining a whole interface to get access to a single generic method.
- C# has first-class support for firing events, which makes use of the delegates (not sure if it has to or not, I’m still learning this stuff)
- C# requires you to use the override keyword on a method if it is overriding a parent method, which makes it clear to readers of that class that this is in fact overriding something.
I learned these little tidbits while learning C# by writing a JVM in that language. I am fascinated by emulators and VMs because they are software that represents something that is well-defined (http://java.sun.com/docs/books/jvms/). So in a sense the (naive) implementation is fairly straightforward, if tedious. (I also learned that you can actually write a non-trivial C# application by effectively writing Java, renaming it with .cs and doing a few basic replacements… or just about.)
This project would not have been at all possible without the GNU Classpath project, which has worked tirelessly to implement all of the standard Java classes as well as reference implementations of the classes needed to tie into an actual Virtual Machine implementation. I have not gotten JNI to work yet, so for the time being I am implementing the native stuff directly in C# (which actually makes sense, as that is my JVM implementation language and is sitting “below” the JVM).
Both Classpath and the official Sun JRE implementation (which is now open source as OpenJDK) provide real-world implementations of that stuff you have not looked at since university (Hashtables, Linked Lists, etc.) in a fairly readable format. And because they are real-world code, they offer glimpses into the optimizations and workarounds that are needed to make these data structures hold up in practice.
There is also a project called IKVM, a very complete .NET-based implementation of the JVM and the class libraries, which allows .NET applications to actually execute Java classes. I think it includes a combination of GNU Classpath and OpenJDK classes, including managed .NET implementations of the native methods. If I continue this project (not terribly likely; I don’t call it “ToyVM” for nothing) I will probably migrate to using that so I can focus on the internals of the JVM itself. When I started I just wanted to get going, but I had issues with the version of Visual Studio .NET that I had, and then could not get GNU Classpath or OpenJDK to build or work in cygwin. I actually used MonoDevelop to do the C# development (so I wrote a JVM in C# using a Linux-based .NET implementation running on an x86 VM on top of Windows XP).
A couple of nights ago, I finally got the “Hello, World!” application to work after about 2 months of on-and-off development; I am not sure what the actual man-hours were.
I used C#’s event handling/delegate setup to gather some runtime statistics for the basic “Hello, World!” application, and apparently it loaded 142 classes before it finally did the output. Obviously most of these were not used, but they are part of the environment that is statically loaded by key classes (like Charsets).
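If you want to see the same effect on a stock JVM, a rough sketch like the one below asks the runtime how many classes it has loaded by the time a trivial program runs. It uses the standard `java.lang.management` API; the class name `LoadedClassCount` is just for illustration, and the number it prints will differ from my 142 since it depends on the JVM and class library in use.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class LoadedClassCount {
    // Returns the number of classes currently loaded in this JVM.
    static int loaded() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Hello, World!");
        // Even a trivial program pulls in a lot of supporting classes.
        System.out.println("Classes loaded so far: " + loaded());
    }
}
```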
Next steps:
- Make it look more like C# (Cxx languages tend to use GetBlah() rather than getBlah(), and C# supports Properties which I would like to make use of)
- Implement Garbage Collection (pluggable perhaps). I am curious about the various methods that are used and which are better in various situations
- Do some additional refactoring into additional namespaces
- Optimize the most heavily used bytecodes if possible
- See what breaks when I run things other than HelloWorld.class 🙂 I am very much a just-in-time developer, so I only got the stuff working that was absolutely required to get Hello World to work. 84 bytecodes have been implemented (out of the 107 that were encountered, but some, like xload, xstore, if and if_icmp, get reused with different initialization parameters)
- As mentioned before, look into integrating with the IKVM libraries so I can worry less about the native aspects of it
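The point above about xload and xstore being reused with different initialization parameters can be sketched roughly as follows. This is not the ToyVM code (which is in C#); it is a made-up Java miniature with hypothetical names (`Frame`, `Load`, `Store`, `Add`) that only handles a few operand-less int opcodes. One `Load` class covers iload_0 through iload_3, with the slot index passed in when the dispatch table is built. It also shows the stack-based style: operands are pushed onto and popped from an operand stack rather than living in registers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class OpcodeReuseSketch {
    // Per-method execution state: an operand stack plus local variable slots.
    static final class Frame {
        final Deque<Integer> stack = new ArrayDeque<>();
        final int[] locals;
        Frame(int maxLocals) { locals = new int[maxLocals]; }
    }

    interface Instruction { void execute(Frame f); }

    // One handler serves iload_0..iload_3; only the slot index differs.
    static final class Load implements Instruction {
        private final int slot;
        Load(int slot) { this.slot = slot; }
        public void execute(Frame f) { f.stack.push(f.locals[slot]); }
    }

    // Likewise, one handler serves istore_0..istore_3.
    static final class Store implements Instruction {
        private final int slot;
        Store(int slot) { this.slot = slot; }
        public void execute(Frame f) { f.locals[slot] = f.stack.pop(); }
    }

    // iadd: pop two ints, push their sum.
    static final class Add implements Instruction {
        public void execute(Frame f) { f.stack.push(f.stack.pop() + f.stack.pop()); }
    }

    // Builds the opcode -> handler table, reusing handler classes per opcode family.
    static Map<Integer, Instruction> table() {
        Map<Integer, Instruction> t = new HashMap<>();
        for (int i = 0; i < 4; i++) {
            t.put(0x1a + i, new Load(i));  // iload_0..iload_3
            t.put(0x3b + i, new Store(i)); // istore_0..istore_3
        }
        t.put(0x60, new Add());            // iadd
        return t;
    }

    // Runs the opcodes with args preloaded into locals; returns the top of the stack.
    static int run(int[] code, int... args) {
        Frame f = new Frame(4);
        System.arraycopy(args, 0, f.locals, 0, args.length);
        Map<Integer, Instruction> t = table();
        for (int op : code) {
            t.get(op).execute(f);
        }
        return f.stack.pop();
    }

    public static void main(String[] args) {
        // iload_0, iload_1, iadd with locals {2, 3}
        System.out.println(run(new int[] {0x1a, 0x1b, 0x60}, 2, 3)); // 5
    }
}
```

A real interpreter also has to deal with opcodes that carry operands, branches and method returns, so treat this only as an illustration of why the count of handler classes can be lower than the count of distinct opcodes.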
I have put the code into a local git repository, which is another tool that I have been wanting to play around with. I am happy to push that out somewhere, with the normal caveat that this is your typical homebrew code that is not as well commented as it should be (but is hopefully well structured enough for it to make sense).
Happy Coding.