Virtual Machine Design and Implementation in C/C++

wackybrit writes: "The concept of the virtual machine is one of the most important concepts in computer science today. Emulators use virtual machines, operating systems use virtual machines (Microsoft's .NET), and programming languages use virtual machines (Perl, Java)". Read on for his review of Virtual Machine Design and Implementation in C/C++, an attempt to examine and explain virtual machines and the concepts which allow them to exist.
Virtual Machine Design and Implementation in C/C++
author: Bill Blunden
pages: 670
publisher: Wordware Publishing
rating: 9/10
reviewer: Peter Cooper
ISBN: 1-55622-903-8
summary: An in-depth look at virtual machines, assemblers, debuggers, and system architecture in general.

Virtual machines are, in effect, a software model of a whole system architecture and processor. They take in bytecode (formed of opcodes, operands, and other data) and execute it, in much the same way a real system executes code. Running these operations in software, however, gives you more security and total control over how the system works.

Virtual machines are popular for a number of reasons. The first is that they give programmers a third option: rather than choosing between the dynamic interpreted route and the static compiled route, you can compile for a virtual machine instead. Another is that virtual machines aid portability: if you compile your code for a virtual machine, you can run that binary on any system to which the virtual machine has been ported.

Few books have been written on virtual machines, and most of those cover only the Java Virtual Machine. Virtual Machine Design and Implementation in C/C++ by Bill Blunden is therefore a landmark book for anyone with an interest in virtual machines, or even in system and processor architecture as a whole.

What's to Like?

Blunden makes sure to cover every topic related to virtual machines in extreme depth. The beauty of this is that you're never left in the dark, while experts can simply skip sections. The book is well divided up, and off-topic rants or notes are clearly marked with dividers. This is an easy book to read, even though it runs to some 650 pages.

To lead the reader through the entire production of a virtual machine, Blunden showcases the development of his own 'HEC' virtual machine (HEC being one of the fictional companies in 'CPU Wars'). He starts slowly, introducing the reader to how CPUs work, how memory works, how paging works, and how almost any other system process you can imagine works. Nothing is left out: multitasking, threads, processes, porting... he covers it all. This is excellent for those new to some of these topics, and makes this an advanced book that's actually quite readable by someone with a modicum of computer science experience.

After laying down the foundations for the design of the virtual machine, the actual development starts in Chapter 3. All of the code in this book is in C or C++, and nearly all of the code the author talks about is printed on the relevant pages in the book. No more flipping between code on your computer and the book; it's all just where it should be!

Further on in the book, a number of extremely advanced concepts are introduced, but even these need not be out of the reach of an intermediate programmer. Blunden presents the most vivid insight into how assemblers and debuggers are created, and the book is worth it for this information alone.

Another important thing about this book is that it looks at creating a register-based virtual machine. Stack-based virtual machines are covered, but the author makes a compelling argument for using registers. This makes a refreshing change from the Java Virtual Machine books that ram stack-based theory down your throat. It's also useful if you're interested in the Perl 6 'Parrot' project, an in-development register-based virtual machine that is bound to become rather important over the next few years.

What's to Consider?

Virtual machines aren't for everyone. If you're a high level programmer working with database apps, this isn't really for you. This book is primarily for system engineers, low level programmers, and hobbyists with an interest in compilation, assembler, and virtual machine theory.

This is not a book for beginners. You need a reasonable knowledge of C to understand the plentiful examples and source code in the book. C++ is also useful, although OOP is clearly explained, so even a straight C programmer could follow it. That said, this is an excellent book for intermediate programmers or computer science students, as a number of advanced topics (garbage collection, memory management, assembler construction, paging, token parsing) are dealt with in a very easy-to-understand way.

The Summary

Released in March 2002, this book is extremely up to date. This is good news, as virtual machines are clearly going to take up a good part of future compiler and operating system technology, and this makes it important to learn about their construction and operation now. These technologies are already in the marketplace: Microsoft's .NET and the JVM, for example. Perl 6's 'Parrot' is also going to become a big player, with languages like Ruby, Python, and Scheme being able to run on it in the future.

Whether you want to learn about system architecture, assembler construction, or just have a reasonably fun programming-related read, this book is great.

Table of Contents
  1. History and Goals
  2. Basic Execution Environment
  3. Virtual Machine Implementation
  4. The HEC Debugger
  5. Assembler Implementation
  6. Virtual Machine Interrupts
  7. HEC Assembly Language
  8. Advanced Topics

You can purchase Virtual Machine Design and Implementation in C/C++ online. Slashdot welcomes readers' book reviews -- to submit yours, read the book review guidelines, then visit the submission page.

Comments:
  • Mainstream operating systems which drop support for virtual machines?
  • Virtual Machine (Score:2, Interesting)

    by JohnHegarty ( 453016 )
    Can anyone give me a substantial difference between a virtual machine and an emulator...

    because I can't see what's different between my MAME and Java virtual machine...
    • Re:Virtual Machine (Score:1, Informative)

      by Anonymous Coward
      A VM passes appropriate instructions directly to the CPU, while an emulator simulates each and every instruction.

      This is why (in Java and VMware) it's a VM, not an emulator.
    • Re:Virtual Machine (Score:3, Informative)

      by Wolfier ( 94144 )
      An emulator is a virtual machine with a pre-existing non-virtual counterpart.
      • Not always. That is, there have existed virtual machines for which there was no pre-existing non-virtual counterpart. However, emulators emulate something that was designed to be non-virtual, regardless of whether anyone bothered to fab one. :)
    • Re:Virtual Machine (Score:4, Interesting)

      by Anonymous Coward on Tuesday June 25, 2002 @10:45AM (#3762496)
      An emulator is a specific type of virtual machine. VMs don't necessarily reproduce a real machine, while emulators do. Also, many emulators reproduce the original machine down to the timing of instructions.

      For example, some of the nuclear power plants here in Canada are using or switching over to an emulator to run the plants because they are running out of spare parts for their 1972 control machines. Without the use of an emulator, they'd each have to rewrite shelves and shelves of assembler code.

      You can imagine that some of the code is timing critical, so the emulator must be exact down to the timing.
    • Re:Virtual Machine (Score:5, Informative)

      by ranulf ( 182665 ) on Tuesday June 25, 2002 @10:56AM (#3762569)
      Yeah, they're basically the same, but the distinction fundamentally comes down to the original intention.

      A virtual machine is designed specifically to be general and run in different environments, whereas an emulator is designed to emulate the environment of some existing hardware or software, to trick software into believing that it genuinely is running on the original device.

      So a virtual machine will have a fairly abstract policy towards doing things (compare Java's AWT - I'd like to open a window, I'd like a button here, I'd like a menu there), while an emulator will get really bogged down emulating details, e.g. memory address $DFF180 changes the background colour.

      Both can be easily emulated by a state machine (hence why they come up in this book), however virtual machines can be made more efficient as they are intentionally abstract. e.g. in the JVM, you know what is code and what isn't, so you can translate blocks of code into native machine code and run that directly instead of interpreting every instruction. If you try that with an emulator, you'll come unstuck when you come across self-modifying code, or things that access memory mapped registers (e.g. on a 68000 the instruction mov d0,4(a0) offers no clue as to whether the write is to hardware or memory).

      Generally, you'll find that most virtual machine designs aim to reduce the instruction set down to a bare minimum. This allows a virtual machine (if it chooses) to effectively re-build the original parse tree and generate native code. However, emulators are generally trying to emulate CISC processors where as much is squashed into an instruction set as possible. Similarly, most virtual machines are heavily stack based, so as not to make any assumptions about register availability.
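      The "translate blocks once, then run them" point above can be illustrated without a real JIT. Because a VM knows exactly which bytes are code, it can pre-decode a bytecode sequence into a table of function pointers once, then dispatch through that table on every subsequent execution. A minimal sketch with a hypothetical two-opcode machine (this is an editorial illustration, not how HotSpot or HEC actually work):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical two-opcode machine: INC (add 1) and DBL (double).
      struct State { int acc = 0; };
      using Handler = void (*)(State&);

      static void inc(State& s) { s.acc += 1; }
      static void dbl(State& s) { s.acc *= 2; }

      // Pre-decode: done once per code block. This is only safe because the
      // VM knows which bytes are code -- an emulator of raw memory cannot
      // assume this (self-modifying code would invalidate the table).
      std::vector<Handler> translate(const std::vector<uint8_t>& code) {
          std::vector<Handler> out;
          for (uint8_t op : code)
              out.push_back(op == 0 ? inc : dbl);
          return out;
      }

      // Execute: no decoding remains, just indirect calls through the table.
      int execute(const std::vector<Handler>& prog) {
          State s;
          for (Handler h : prog) h(s);
          return s.acc;
      }

      int main() {
          // INC, INC, DBL, INC  =>  ((0 + 1 + 1) * 2) + 1 = 5
          std::vector<uint8_t> code = {0, 0, 1, 0};
          auto prog = translate(code);   // decode once...
          assert(execute(prog) == 5);    // ...then run without re-decoding
          return 0;
      }
      ```

      A real JIT goes one step further and emits native machine code instead of function-pointer tables, but the enabling property is the same: the VM can trust its view of what is code.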

      • (...) virtual machines can be made more efficient as they are intentionally abstract. e.g. in the JVM, you know what is code and what isn't, so you can translate blocks of code into native machine code and run that directly instead of interpreting every instruction.

        Isn't that called JIT? Also, if I remember correctly, didn't the first version of Java come without this? (and were therefore unspeakably slow?)
        • Re:Virtual Machine (Score:2, Informative)

          by Java Pimp ( 98454 )
          Sort of...

          A Just-In-Time compiler will compile all the byte code to native code before it is executed and run it on the hardware. There's a performance hit at the start of each execution, but it's ultimately faster than interpreting byte code. Note that the JIT is probably not the best optimizer, so it still won't be as efficient as platform-specific binaries. (among other reasons...)

          An optimized VM will recognize instructions or code sequences within the bytecode that can be directly mapped to native code and execute it directly on the hardware. Not as fast as JIT but faster than interpreting everything.

          Both are still slower than platform specific binaries but that's just the nature of the beast.

        • The JIT has since been folded into (well, at least Sun's VM) the HotSpot VM architecture, which can identify and natively compile bottlenecks ("hot spots") at run time. This is the same idea that the Crusoe chip/software uses, as well as HP's venture into dynamic interpretation/compilation where they showed that running a program in interpreted mode actually turned out *faster* because they had the benefit of dynamically profiling and optimizing it at runtime (e.g. dynamically unrolling loops and frequently followed branch paths).
      • Pipelining the KVM (Score:2, Informative)

        by BobLenon ( 67838 )
        Yeah, so for a class project we took the KVM (the Java VM for embedded devices) and turned it into a pipelined architecture. It was very educational, but the practicality is lacking... you'd need at least a 4-processor machine for it to be useful, as it was a simple 4-stage pipeline. But the speed was so lacking.

        It was a worthwhile experience, though I do wish Java was register-based. ;) ... as it was only a learning experience, no big deal. By the end I could've written my Java in assembly ;)
      • If you try that with an emulator, you'll come unstuck when you come across self-modifying code, or things that access memory mapped registers (e.g. on a 68000 the instruction mov d0,4(a0) offers no clue as to whether the write is to hardware or memory.

        Perhaps you should check the UAE JIT patch (which has been ported to Basilisk and integrated into the Windows series of UAE) before concluding that JIT-based 68k emulation is not practical.

    • A VM is basically nothing but a simulator for a machine that doesn't exist. This is qualitatively different from an emulator. While a simulator mostly pretends to do something, an emulator mostly just does it.

      For instance, if you were to make a Pentium emulator to run on a 486, then many of the instructions could be executed as-is by the hardware. Most register values could be stored in actual registers. And so on.

    • Can anyone give me a substantial difference between a virtual machine, and an emulator...

      Yes, an emulator simulates a piece of hardware that once existed; a Virtual Machine is an idealised machine.

      The difference is significant because many Virtual Machines have features that either cannot be supported in hardware or would be prohibitively expensive. This is the principal design difference between the original Java VM and the Microsoft .NET CLI. The original Java VM was designed so that it could be implemented as silicon gates; the .NET CLI could be implemented in silicon if you tried really hard and were a complete masochist, but that was never a design goal.

      I don't know about the second Java VM, my interest in Java VM kinda died after it was clear Sun wanted no external inputs they did not control absolutely. I would guess that the redesign would be towards a more abstract representation that would allow for better JIT compilation.

      Strictly speaking, .NET does not use a VM; it uses an intermediate compiler representation. However, any Turing-complete representation could be called a VM. The distinction matters because you can compile .NET code down to the Java VM if you choose, but going the other way would not be a great idea...

    • Can anyone give me a substantial difference between a virtual machine, and an emulator

      Others have commented on the theoretical differences, but I feel I should say something as to what distinguishes a VM from an emulator in practice. Virtual machines do not promote piracy because software is designed to run on virtual machines. On the other hand, an emulator is often written with unlawful redistribution of proprietary software in mind, even if it is wink-wink-nudge-nudge.

      because I can't see what's different between my MAME and Java virtual machine

      I find the most important difference between MAME and the JVM to be that there is a much larger library of free software designed to run under the JVM than under MAME.

  • thank you! (Score:2, Interesting)

    by jeffy124 ( 453342 )
    I must say I'm pleased to hear about this book. I actually would like to do something with VMs in my upcoming academic life (read: grad school), but I'm having trouble getting started, nor am I sure if this is what I want to study. Every search engine out there returns everything Java for the phrase "virtual machine," which is not exactly what I'm looking for.
    • How to use Google (Score:1, Informative)

      by Anonymous Coward
      So you want more info on searching on virtual machines on Google, not using Java ?

      Search on : "virtual machine -java"

      It's simple & off topic.

      • yes, I'm aware of how to exclude certain tokens from a google search. Unfortunately, one mention of the word "java" in a document (like the review above, or the book's webpage), and the page is eliminated from the result set, leaving behind very little to work with.
    • Check out Intermediate Machines in the iServer or AS/400 context. It's the same type of deal, and they've been in existence for almost 20 years now.

      Hope this helps.

  • Inside my virtual machine, where then I can run some sort of virtual reality program where I can interface with Eliza.
  • by eyepeepackets ( 33477 ) on Tuesday June 25, 2002 @10:38AM (#3762447)
    Some alternate titles for this tome might be:

    1. Reversi: C64 Speed on a Pentium IV
    2. Double Your Code, Halve Your Speed
    3. Real Men Don't Use Real Computers
    4. VM:Very Macho or Verily laMe
    5. Atari ST Rebirth: a 20 Year Reversal

    etc., etc.

    Ack, I'm turning into a crank! Oy.

    • by Anonymous Coward
      You forgot:

      Maintainable code
      Faster time to market
      Minimum breakage during enhancements
      Ability to easily port to other platforms

      If you owned a software business would these things be important to you? I don't think performance would be the primary concern.
      • Another use (Score:5, Interesting)

        by Erbo ( 384 ) <> on Tuesday June 25, 2002 @11:02AM (#3762599) Homepage Journal
        Another thing virtual machines have historically been used for is to assist in the development of new computers, by creating a perfect model of the new computer's hardware in software, so the microcode authors (and maybe the OS authors, too) can get their code working before the hardware's completely debugged. In this guise they're called "simulators," and there was mention of them in Tracy Kidder's seminal work The Soul of A New Machine. As the book says, "A simulator makes a slow computer, but a fast tool."

        Also, don't forget the UCSD P-System, which used a virtual machine to run code compiled in that environment. I know of at least one commercial product that used the P-System; I believe there were many.

        Virtual machines have been around awhile; they're an interesting field, made newly relevant by the ascendancy of environments such as Java and the MS CLR. I just wish I had a good excuse to drop $50 on this book...:-)


        • Re:Another use (Score:2, Insightful)

          by dgym ( 584252 )
          History isn't just in the past. AMD's next processor, codenamed the Hammer, will be the first x86-64 instruction set CPU. To kick-start projects wishing to make good use of this 64-bit extension to x86, AMD developed and made freely (as in beer) available a virtual machine called SimNow over a year before the chip is due to launch.

          What I found particularly interesting was that this seemingly hopeful project was taken up so well that Simics thought it prudent to add x86-64 support to their existing commercial multi-architecture simulator.

          The good news in all of this is that Linux and a fair few of the GNU tools are x86-64 ready now, well in advance of any x86-64 chips' release.
  • by Anonymous Coward on Tuesday June 25, 2002 @10:40AM (#3762465)
    One of the things that has surprised me about virtual machines ever since Java became a buzzword was that no one had ever thought to eliminate the relative performance penalty by implementing the VM as hardware on a PCI card (or a licensed chipset to put on the mobo). I can understand the portability implications of using VMs, and I'm glad that much work is being developed in this area.

    My question to anyone qualified to comment: Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such? I can imagine real performance benefits happening with such an idea...
    • Zuccotto and a few other companies have done just that...the JVM is an actual hardware chip. There were at least 5 companies doing the same at Java One
    • A hardware virtual machine? Doesn't that defeat the purpose? I mean, wasn't the whole concept developed so that the software could be MACHINE INDEPENDENT?

      It seems to me that the only application of such a piece of hardware is when you want to use code written in a certain language exclusively, like that semi-new Sharp PDA with a Linux-based Java VM.
      • There's no reason that it can't be both machine independent and have an accelerator chip. You can get SSL accelerators, yet I can still run SSL on this machine, which hasn't one.

        It would be useful for certain applications for the computer to have a (or multiple!) hardware Java chips to speed up execution of Java code. Java servlets, for example. Sun is hyping Java on the server side awfully hard. But they can be slow, especially when you have thousands going at a time.

        You could have this *and* a cross-platform VM. They're not mutually exclusive. Schmucks in Windows can still run Limewire with a setup like this on some machines.

        The Zaurus doesn't use Java exclusively. From my playing with them, Java (like elsewhere) is still very much a second-class citizen in Qtopia. PocketLinux, which is now defunct (no doubt because Java sucks - sorry, couldn't help it), was a PDA operating environment that ran on top of Linux and used Java exclusively.
          • Server performance has more to do with I/O overhead than with execution speed. BTW, part of the coolness of Java on the server side is being able to use stuff like load balancing, caching, etc. in a more or less transparent way.

            Hardware acceleration makes sense on embedded platforms where there are not enough resources to properly do on-the-fly compilation and optimization (basically a non-issue on servers).

            As for your comments regarding PocketLinux, I suspect they were simply outcompeted by other PDA OSes that managed to produce a more usable OS & apps. Users don't care about kernels; they do care about whether they can do stuff on a PDA: read mail, edit Word documents, read PDFs, manage agendas, synchronize with popular PC applications, etc.
          • If you wish, then toss out the server example. There are times when better Java performance is desired. Including small devices and on the desktop. It's for those applications, regardless of where, such a co-processor would be nice. There are a fair amount of companies who are investing a lot into creating Java desktop apps, and many of them have suboptimal performance on today's machines. Sure, CPUs will get faster.

            There are also a few people doing math-based research using Java. It would be amazing to be able to use RMI and a shared class pool to do distributed processing. Who needs entire computers to farm out computations to, when you have a 4 Java CPUs on a PCI card?

            No, this kind of technology isn't required. We're getting along, using Java, without it today. But it could be nice for some applications.

            PocketLinux wasn't so much out-competed as it was abandoned. They weren't poised to compete. Like a lot of open source companies, they expected the community to really take interest and start churning out apps for them. A lot of the Linux-nerd world isn't into Java; they're still using C. That's all good, but you can't write (GUI) apps for PocketLinux in C. Their market, initially, wasn't end-users. It was the same group of people who read Slashdot and now own Zauruses and Agendas.
    • Running a few tests with C# proves that in some cases C# is faster than C++. This is, however, the exception and not the rule. If we use "unsafe" C# with "pointers" (not _quite_ the same as a pointer in C), even graphics processing runs at a reasonably good speed. This is a speculative statement, but I theorize that for most applications it would cost more for a specific "VM as hardware" PCI card than it would to upgrade from a 1GHz Athlon to a 1.2GHz Athlon, because in many cases C++ is only 15-20% faster on average. Search the newsgroups and MSDN for some early performance comparisons.

      One final point: I've found that some graphics applications, even with "unsafe code", perform a lot slower than their C++ counterparts. This may be due to a general lack of experience with graphics programming (the technical barrier of entry is lower), and the relative immaturity of the CLR. Remember, the JVM is a lot faster than it was in the late '90s.
      • The problem is that C# is not a fully-VM language, and neither is Java. Both (C# more so than most Java VMs) compile to native code as the program runs and cache the native code. After a while, all the performance-critical paths (i.e. the ones that are called more than once) are running directly on the hardware. The speedup you see with C++ is not so much the performance hit of running on the VM, but the hit due to less mature optimizers in the C# compilers and the overhead of various safety features in C#. These fancy-schmancy new languages are nice, but I think the real reason they're so popular is the great class libraries supporting them. If C++ had nearly as nice and unified a set of class libraries, it would have a significant advantage over Java and C#. C++ is a far more powerful language. It doesn't force the programmer into a particular way of doing things. As I was learning Java, I kept coming across statements to the effect of "feature X could be misused by the programmer, so we chose to leave it out." I'm not five years old, and don't like being treated like one.
    • Actually, it has been attempted. Sun created a Java chip, called picoJava. There is also an ARM chip with a hardware interpreter for JVM bytecodes, Jazelle. There are plenty of other examples of this.

      Nothing that sits on the mobo to supplement a 'real' CPU, though.

      Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such?

      I'm no hardware guy. But I have a wee bit of experience hacking on the Smalltalk virtual machine. I imagine that this is so because VMs are designed as VMs, not as a blueprint for hardware. To support an entire computer, I wouldn't be surprised if you had to add a lot more instructions than most VMs provide.
    • One of the things that has surprised me about virtual machines ever since Java became a buzzword was that no one had ever thought to eliminate the relative performance penalty by implementing the VM as hardware
      Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such?

      Ok here are some potential problems with the idea that I can think of:

      • [Specifically for Java:] The VM is stack-based, and stack-based designs were tried a couple of decades ago and eventually were obsoleted by register-based CPUs. Memory-to-memory just isn't an efficient way to do it, even with caches.
      • Do-it-on-software philosophy. "Native" CPUs are seen as CISC, in "too specific" sense, not necessarily size or complexity of instruction set itself.
      • [Java] General design of bytecode. Java bytecode probably wasn't really designed to be implemented in hardware (I think making it stack-based makes this obvious by itself but...), and as such there may be implementation problems regarding performance.
      • Too specific; the market for these CPUs would be more limited than the market for generic CPUs. I'm not aware of a complete, fully Java-based desktop system, so desktop systems would need to be hybrid ones, which leads to:
      • Complexity (and specificity) of h/w design, to support multiple non-symmetric CPUs. Which, in turn leads to:
      • Complexity of the OS that would make use of the add-on special CPU.

      However, there are niches where h/w implementation might well make sense; tiny mobile devices where performance is not so much the key but simplicity, and where ease of development is another strong selling point (for companies that develop s/w for such products). Being able to omit advanced JVMs is a plus, and performance may still be decent, if not stellar.

      • [Specifically for Java:] VM is stack-based, and stack-based designs were tried couple of decades ago, and eventually were obsoleted by register-based CPUs. Memory-to-memory just isn't efficient way to do it, even with caches.

        That's not true -- stack-based chips were dropped for other reasons. The modern stack-based chips are very fast indeed -- consider the X25, shBOOM, or P21.

        But I think you're confusing "stack based" with "memory to memory". Not all stacks are implemented in memory; an on-chip stack is very fast, and allows the CPU to operate at almost the full ALU clock, since there's no register access delay.

        Your other reasons are, of course, sufficient and correct.

        • The modern stack-based chips are very fast indeed -- consider the X25, shBOOM, or P21.

          Interesting. I probably should read something about those... I'm not a h/w specialist, but it's good to have basic knowledge and try to keep that up-to-date.

          And yes, I thought that stacks generally were (always) implemented using main memory, so I did confuse the issues.

          By the way, where are the chips you mention usually used? For signal-processing? (I'm sure Google can answer that one)

          • P21 was used in a settop box for Internet access, RIP (too many settop boxes, too little demand). It's not exclusively for that, but I'm not aware of any other use.

            The shBOOM is now being marketed as a Java accelerator (it's called the PSC1000, by Patriot); another variant of it is rad-hardened and used in orbit.

            The 25x (sorry, I misspelled it originally), described at, is a new development from the guy who designed both of the above chips, and isn't funded.

            I found a list of similar chips at Interesting stuff.

    • There have been Forth chips, there have been Java chips. I think that there have even been Basic chips. CDC once thought about using APL as the assembler language on one of their fancy machines (the Star), but as I recall they decided against it. (It might have made sense if the machine had been more designed for array processing, but it was instead a pipelined vector computer.)

      There have been lots of virtual machines that turned real. And the converse, of course. Just consider VMWare for one example.

    • It's not something to do for performance.

      Implementing a stack-based machine in hardware is straightforward, and has been done many times. The first one was the English Electric Leo Marconi KDF9, in 1958. Burroughs followed, and thirty years of Burroughs stack machines were sold. Java has a small implementation of the Java engine in hardware. Forth chips have been manufactured.

      But all these machines have used sequential execution - one instruction at a time. Nobody has yet built a stack machine with out-of-order execution. There's been a little research in this area. Sun's picoJava II machine has some parallelism in operand fetches and stores. But nobody has wanted to commit the huge resources needed to design a new type of superscalar processor. The design team for the Pentium Pro, Intel's first superscalar, was over 1000 people. And that architecture (which is in the Pentium II and III) didn't become profitable until the generation after the one in which it was first used.

      In the end, a superscalar stack machine probably could be designed and built with performance comparable to high-end register machines. For superscalar machines, the programmer-visible instruction set doesn't matter that much, which is why the RISC vs. CISC performance debate is over. But so far, there's no economic reason to do this. Sun perhaps hoped that Java would take off to the point that such machines would make commercial sense. But it didn't happen.

  • At the lab, we do simulations of nuclear bomb explosions, particle interactions, etc all the time. The "virtual events" are critical in making sure our equations are accurate and save a lot of resources and money vs actually exploding a bomb. However, keep in mind that the simulation is only as accurate as our knowledge of it. We don't actually gain new information from the simulation (new insight, yes, new information no).

    The same is true of virtual machines. Simulating how a computer might react to certain error codes and so forth is all right in small doses, but the only way to get real data is to go out there and buy some actual hardware.

    Just my $.02.

    • I think you may have missed the point a little. Virtual machines are not necessarily used to simulate hardware. (Although they can be used for that. The N64 dev kit was an SGI Onyx running an emulator, IIRC)

      Plus, digital circuits are a little less complicated and better understood than nuclear explosions and particle interactions.

  • by Anonymous Coward

    The Implementation of the Icon Programming Language

    This book describes the implementation of Icon in detail. Highlights include:

    * Icon's virtual machine
    * the interpreter for the virtual machine
    * generators and goal-directed evaluation
    * data representation
    * string manipulation
    * structures
    * memory management

    Information on the Icon programming language itself can be found at
  • All of the code in this book is in C or C++, and nearly all of the code it talks about is actually printed on the right pages in the book. No more flipping between code on your computer and the book; it's all just where it should be!

    Practically all coding books do this, and I mostly find it a cheap way to poop out thick books and massive volumes... Not a measure of quality in any way.
  • by idfrsr ( 560314 ) on Tuesday June 25, 2002 @10:55AM (#3762567)

    I program in Java mostly right now, and so when people begin the usual 'VM is slow' crank I am curious about what exactly they mean.

    Programs written to run on VMs can be significantly slower due to the extra layer. Yet, if the design of the VM is done well enough (by perhaps reading this tome?) then the VM should be comparable. Certainly C is generally faster than an interpreted language. But there are native compilers out there that provide very comparable results, and the advantage of a language that forces careful programming. Here is the slashdot link []

    If adding layers to programs automatically makes them slower, and so slow that they are useless, we all would code in assembly.

    Good design is important. A badly written C program, of which there are thousands, will be just as slow (read: bad) as a badly written VM program.

    • Actually, C with all its aliasing problems (read: pointers), defeats all sorts of static analysis and other optimizations that could otherwise be done. E.g., you could do copy-on-write at the struct level, or you could rearrange memory for better locality. You just plain can't do that in C, because it enforces a very rigid ABI -- which may be dandy for some programs, like OSes, but makes you otherwise do all that nifty stuff by hand. No one does, ergo the C program is slower. And even then, a technique that optimizes well on one machine is likely to fail miserably on another.

      It's not so much the language as it is the runtime. CPUs don't really like to be micromanaged anymore, except by experts (again, like OSes). With a properly tuned runtime (like a good VM -- not saying Java is one), every program gains its benefit. C pretty much completely lacks a runtime layer, and the mismatch is starting to show.
      • Some compilers have introduced certain #pragmas to compensate for the aliasing problem. The SGI MipsPro 7.3 compiler, for instance. The programmer can simply declare whether a pointer is aliased. Of course, if you lie, the compiler will generate incorrect code.
    • If adding layers to programs automatically makes them slower, and so slow that they are useless, we all would code in assembly.

      Useless depends on the performance needed by the implementation. For example, a sort routine in C may be fine for a database application, but in sorting visible polygons in an arcade game it may be too slow, in which case there may be no choice except to implement that particular routine in assembly and interface it to the C program.

      Java may use a VM and be slower than C but it has taken hold in server-side programming where the network connection is the bottleneck rather than the application. Even if the server becomes heavily loaded it's cheaper to throw more hardware at it than rewrite it in something faster.

      It's all about how far you can get away with moving up the speed vs maintainability curve. It's for this reason we don't see any arcade games coming out written in Java, and why web designers will knock up a web site in PHP rather than write optimised C CGI/ISAPI/etc.

  • by Anonymous Coward
    It seems that everyone assumes that VMs these days (JVM, CLR, Mono, Parrot) must include garbage collection and not use pointer-based ops. Why is that? Knuth's MMIX VM is modelled after a traditional RISC CPU which modern compilers like GCC can target. C, C++, FORTH, Objective C can be targeted toward it out of the box.
    I think that VMs these days are getting bloated with everything including the kitchen sink. This makes them harder to port and test. Performance suffers. Whatever happened to 'keep it simple, stupid'?
    • Good point. I guess that answers the question [] about the subtle difference between an emulator and a VM. VMs tend to include more high-level concepts than emulators.

      Still, as much as (software) emulators emulate existing hardware there have also been several attempts to create "virtual" machines in hardware. (For example: P-code interpreters (low-level Pascal) and Sun's attempts to hard wire a Java VM.)
    • If the GC is considered part of the VM then it can be implemented in low-level code and be fast. If it is not part of the VM then most likely programs will have to implement GC (or at least C malloc/free) in the VM language, which would never be as fast and would be just as bug-prone as doing it now, and would also probably make assumptions about whatever memory allocation is provided by the VM that would prevent it from taking advantage of new system designs. It also allows services provided by the VM to create objects that can be manipulated like any other objects.

      Even the earliest VMs did garbage collection (take a look at Lisp, which for some reason nobody has mentioned here yet).

      However it is true that this argument could be made for any feature added to the VM, but it does seem that using the VM design to get away from numerically-addressed memory is a natural division that most designers go for.

    • Ha! I think your question is answered quite well by this message [].

    • There are many virtual machines that aren't bloated. How large is the JVM? Just the JVM, not the library (jars, class files). I imagine it's pretty large. Just because Java is poorly done doesn't mean all languages with VMs are as bloated. Smalltalk, which is by all measures a language with a huge library, can have a VM as small as 100k, but still get to all the standard libraries. Sure, you could target it to hardware, but if the language is well designed (like Smalltalk, not like Java) it's not as much of an issue.

      Not even the JVM includes anything near the kitchen sink. The libraries do. They're not terribly hard to port when all they do is interpret bytecodes.

      It's sad to see people with these kinds of attitudes. In their minds, all virtual machine-based languages equal Java. Anything that's not compiled directly to native code equals QBasic. That's not the case.
    • Partially true. For example, Forth's VM avoids all that, and is VERY fast -- as a bonus, its VM is extremely easy to produce native code for (native code compilers are entirely compatible with others).

      Others have discussed why GC isn't as bad as you say; I agree with them, although they're a little extreme (it's NOT true that you always need GC).

      I'm working on a VM which can handle both GC'ed and non-GC'ed stuff at the same time, for a substantial speed advantage. Unfortunately, my VM has a language tiedown; I'm not sure how to add the type support I need to most languages.

  • by magi ( 91730 ) on Tuesday June 25, 2002 @11:03AM (#3762601) Homepage Journal
    Are there any VMs currently, for Java, Python or some other language, that can execute each thread one VM instruction at a time?

    It would also be nice to have language-level support for parallel processing, like in Occam.

    For example, in a Python implementation, the following code would execute the two for-statements in the "par"-block in parallel:

    par:
        for a in range (0, 3):
            print "a = %d" % (a)
        for b in range (0, 3):
            print "b = %d" % (b)

    As the two threads would be executed exactly at the same speed, the output would be:

    a = 0
    b = 0
    a = 1
    b = 1
    a = 2
    b = 2
    • by Anonymous Coward
      You're assuming print is atomic.
      The output would more likely be:

      a =b 0
      a= 0
      b= =1
      b =
      = 2

      Occam is an interesting language, but I think it takes too restrictive a view. No global variables, no mutexes, everything uses channels - even shutting down a multithreaded Occam program is a major pain in the ass - a message-passing nightmare.
      • You're assuming print is atomic.

        Well, yes, I think it's a rather safe assumption if the threading is implemented in the VM; formatting and printing a buffer would probably be implemented with an efficient native function, which would be atomic.

        But, yes, if we use native threading, the context switch could occur anywhere, and the output would be a mess, just as you describe. As noted in other messages in this thread (ummm... this discussion thread), using native threading would probably be the wisest choice.

        I don't know Occam really at all, but I don't quite like the normal Java or Posix ways of threading either. The PAR statement in Occam might make threading so much easier.
    • by Anonymous Coward
      While I'm sure someone could write a VM that did that, I don't think anyone would want to use it. On an x86, a context switch costs enough that one instruction per context switch would give you a bit more than 95% overhead lost on context switching. This means your programs would run at 1/20th speed at best.
      • While I'm sure someone could write a VM that did that, I don't think anyone would want to use it. On an x86, a context switch costs enough that one instruction per context switch would give you a bit more than 95% overhead lost on context switching.

        You're right, assuming that the VM uses native threads. I was thinking of having the threading implemented in the VM; I guess it would be kind of trivial and it would have very little (if any) overhead because of context switching, although there might be some other costs.

        But of course, without native threads, we would lose the possibility to use multiple processors easily, which wouldn't be very nice.

        Then why have this low resolution? A friend of mine has a home-brewed VM for an ad-hoc language for embedded programming that handles concurrent execution one instruction at a time. He says it's very important for his embedded application. I don't know his specific reasons, but I'd imagine it has something to do with controlling multiple embedded devices and interfaces.

        Anyhow, low-resolution concurrency just sounds cool. ;-)
        • You're right, assuming that the VM uses native threads. I was thinking of having the threading implemented in the VM

          IBM's Jalapeno [], now known as the Jikes Research Virtual Machine, does its own thread scheduling instead of using native threads. The compiler generates yield points in method prologues and the back-edges of loops where the VM can preempt the thread. I suppose if you really wanted to you could have it generate a yield point for every instruction...

      • Below is a VM with a "reduced instruction set", illustrating what I meant with low-resolution threads. The VM has full threading capability. (The code actually works.)
        #include <magic/mapplic.h>
        #include <magic/mobject.h>
        #include <magic/mpackarray.h>
        #include <magic/mlist.h>

        enum vm_bytecodes {VMBC_BRK=000, VMBC_NOP=001, VMBC_FORK=002, VMBC_JOIN=003};

        class Thread {
          static int smThreadCounter;
          int mThreadId;
          int mInstruction;
          int mChildCount;
          Thread* mpParent;

        public:
          Thread (Thread* pParent=NULL, int position=0) {
            mpParent = pParent;
            mInstruction = position;
            mChildCount = 0;
            mThreadId = smThreadCounter++;
            if (mpParent)
              mpParent->mChildCount++;
            printf ("Thread %d created\n", mThreadId);
          }

          // Executes one instruction; returns true once the thread has exited.
          bool execute (PackArray<int>& code, List<Thread*>& threads) {
            printf ("Thread %d, instruction %05d: %03d\n", mThreadId, mInstruction, code[mInstruction]);
            switch (code[mInstruction]) {
            case VMBC_BRK: // Break; exit the thread.
              if (mpParent)
                mpParent->mChildCount--;
              return true;
            case VMBC_NOP: // Do nothing.
              mInstruction++;
              break;
            case VMBC_FORK: // Create a thread at the position in the next word.
              threads.add (new Thread (this, code[mInstruction+1]));
              mInstruction += 2;
              break;
            case VMBC_JOIN: // Join all child threads.
              if (mChildCount == 0)
                return true; // All children have exited, so we can too.
              break; // Otherwise keep waiting here.
            }
            return false; // Do not exit the thread.
          }
        };

        int Thread::smThreadCounter = 0;

        Main ()
        {
          PackArray<int> instructions (20);
          instructions[0] = VMBC_FORK; // Fork to instruction 5
          instructions[1] = 5;
          instructions[2] = VMBC_FORK; // Fork to instruction 8
          instructions[3] = 8;
          instructions[4] = VMBC_JOIN; // Wait for the threads to exit.
          instructions[5] = VMBC_NOP; // Thread 1; do nothing.
          instructions[6] = VMBC_NOP; // Thread 1; do nothing.
          instructions[7] = VMBC_BRK; // Thread 1; exit thread.
          instructions[8] = VMBC_NOP; // Thread 2; do nothing.
          instructions[9] = VMBC_BRK; // Thread 2; exit thread.

          // Add the main thread.
          List<Thread*> threadpool;
          threadpool.add (new Thread ());

          // Run VM; each live thread executes one instruction per round.
          int threadCount;
          do {
            threadCount = 0;
            for (ListIter<Thread*> i (threadpool); !i.exhausted (); i.next (), threadCount++)
              if (i.get()->execute (instructions, threadpool))
                i.deleteCurrent (); // The thread exited
          } while (threadCount > 0);
        }
        (Implemented with my MagiC++ library.)
    • Are there any VMs currently, for Java, Python or some other language, that can execute each thread one VM instruction at a time?

      if you have a language that optimizes tail-calls, you could have the front-end of the language convert the separate threads of execution into continuation-passing style [], and then execute the code one continuation at a time, simulating threading at the VM level. if i remember correctly, the scheme48 VM [] could do that kind of threading, though at a coarse level.

      in CPS a function decomposes into a sequence of more primitive functions, each returning a continuation, ie. a handler for computation yet to come. for a simplified example, the evaluation of (+ (* 2 3) (* 4 5)) would evaluate (* 2 3) into 6 and return a continuation that evaluates (+ 6 (* 4 5)), which in turn would evaluate (* 4 5) into 20 and return a continuation that evaluates (+ 6 20), and that would finally evaluate to 26.

      but the point here is that one could explicitly halt the evaluation after receiving the first continuation, store it on the queue, and go off and compute something else. after a while you can come back, pop the continuation off the queue, and pick up the computation where you left off.

      the problem with such a setup is that it makes optimization difficult. i'd suggest looking at the CPS literature for more details...
  • by totallygeek ( 263191 ) <> on Tuesday June 25, 2002 @11:19AM (#3762719) Homepage
    I for one am happy to see the influx of ideas that are representative of what mainframes and minicomputers of old were accomplishing. The use of virtual computers or machines along with centralized processing has been a welcome change from the mentality of the PC market for about the last 15 years.

    People had said for a long time that personal computers connected to file servers were a lower-cost, better system. However, now many places are going to web-based or host-based connections because of buggy issues at the desktop and the unmanageability of the personal computer. Couple this with the fact that licensing management is such a bear and you see why us Unix folks are glad to see the turn-around.

    Mainframes had been on their way out before the personal computer, in favor of smaller satellite processing via minicomputers. However, now people are realizing that virtual computers in a big iron case give you a better-managed array of computing power for multiple users or processes. I for one welcome this back, and hope that we will continue to see virtual computing take over the personal computer business market approach. Bring in the network computers!

  • I'd like to nominate some software I wrote for the most random use of a virtual machine.

    I was asked to code a registration routine for a piece of software - after getting the username + serial number from the user I would have typically done some magic to calculate a checksum from the name and see if it matched the given key.

    Instead I wrote a small virtual machine which executed z80 machine code. The protection routine literally started the VM - where all the magic happened. Each opcode was fetched, decoded, and executed. I think it would have been a real pain to decode ;)

    (I guess the clever cracker could have disassembled my windows binary with a z80 disassembler and gotten lucky; but it would have been hard to see what was being executed - unless they could do clever things like disassemble z80 in their head...)

  • It seems to me that more and more languages nowadays are designed for a VM, thus adding a level in between the OS and the application. Of course, this leads to a slowdown because the JIT has to do its thing AND the compiled code that runs has to use system calls to do what it wants to. But has anybody given any thought to making a VM that runs almost on top of the hardware with almost no system calls? Perhaps the only interference the OS would make is scheduling and possibly memory management. This kind of approach (like the exokernel approach in the EROS [] project) would allow a VM to greatly speed up its resultant code because it would have almost direct control of the hardware, and the VM would be optimized for that hardware. I am working on an OS (Middle Earth OS []) that is looking into this kind of design, and would appreciate your input either here or in the project's forums as to whether you think this would work (well) or not.
    • It sounds to me like you are proposing an operating system that *only* runs programs written for the VM. This is a totally viable idea and I would agree it would produce the fastest VM possible.

      Probably the main thing that makes people not consider this idea is that it would be a new OS that does not run any existing programs. Although there are plenty of alternative OSes out there, most people see a VM as a way to get their new interface onto an existing system, so they never consider this way of writing it.

    • But has anybody given any thought to making a VM that runs almost on top of the hardware with almost no system calls?

      It's been done many times since the 70s. I'm not sure when a VM-based language first served as the OS, but it was the case with Smalltalk, as far back as 1972 or 1976. You can still get a Smalltalk-based OS with SqueakNOS []. Squeak traditionally runs on top of a host OS like Linux, Mac OS, Windows and many others, but it has almost all of the features of an OS, including an awesome (but non-traditional) GUI system, complete with remote viewing. The binaries are identical between the OS-version of Squeak and the hosted-on-Linux version.

      The current state of SqueakNOS is that you still have to write a little C for certain things. Luckily, you can write your low-level code in a subset of Smalltalk and have it translated to C. That's how the Squeak virtual machine is written; no manual C coding required. However, there is active work being done on Squeampiler [], which allows Squeak itself to compile and generate native code. Which means the entire system will be 100% Smalltalk.

      As it is now (in SqueakNOS or in Squeak on top of a 'normal' OS), fundamental changes to the language can be made from within the environment. The only thing compiled to C is the virtual machine and other C plugins, like OS-specific functions. Everything else - the bytecode compiler, the parser, an emulator for itself, all the development tools and libraries - is written in Smalltalk.

      I am working on an operating environment for PDAs, Dynapad [], along these lines. I'm doing the development on top of Linux/PPC, Solaris/SPARC, and Windows/x86 and run it on my iPAQ under WinCE/ARM. Eventually, I'd like to run it as the OS, if something like OSKit ever makes its way to the iPAQ platform.
  • Parrot (Score:2, Informative)

    by god ( 5628 )
    For more information on Parrot, which will be the Perl 6 virtual machine, and which is register-based, you may want to check out [].
    • Parrot is what got me interested in Perl. Perl is too inconsistent for me to take very seriously, but Parrot is promising as a .NET work-a-like with no MS tie-ins. :)
    • Although it's being done with Perl in mind, it's not just the Perl 6 VM; it's actually aimed at pretty much any dynamic language. Hence we should also see backends for Ruby, Python, Basic, and pretty much any language you care to implement.

      There's also talk of Parrot bytecode to Java/CLR bytecode converters. Interesting stuff, even if we're gonna have to wait ages to actually get something useful.
  • It seems to me that many of you are viewing VMs as some kind of emulation application rather than a virtual machine. What you may not realize is that many (most?) OS kernels, including Linux, virtualize the hardware to make the software more portable and less able to crash your entire system. What you lose in performance you make up for in stability. Operating systems books are a great reference for studying VMs.

    Operating System Concepts by Abraham Silberschatz, et al.

    Design and Implementation of the 4.4BSD Operating System by McKusick, et al.

    Design of the UNIX Operating System by Bach

    Modern Operating Systems by Tanenbaum

    Operating Systems Design and Implementation by Tanenbaum
  • hmm infocom... (Score:3, Interesting)

    by YakumoFuji ( 117808 ) on Tuesday June 25, 2002 @11:54AM (#3762974) Homepage
    but does it cover Infocom's famous Z-machine VM, which runs on more hardware than any other virtual machine ever... (considering it can run under Java as well... a VM running a VM!)...

    or Magnetic Scrolls' 68k VM, which even ran on the C64 - its mighty 8-bit chip emulating the 16/32-bit 68K!

    aaah long live interactive fiction and virtual machines.
  • Can anyone explain what's similar/different between doing this and doing something like ReBirth?
  • by r ( 13067 ) on Tuesday June 25, 2002 @12:17PM (#3763145)
    one thing to remember is that the microsoft .net infrastructure does not run on a virtual machine!

    .net code (c#, etc.) compiles down to a standard intermediate language, which gets JITted into machine code, and linked to .net support libraries. there is no interpreted code execution going on, and indeed, the IR is not optimized for interpreted execution. hence, there's no virtual machine running, unlike in the case of Java or other bytecode interpreters.

    .net is not a virtual machine any more than gcc is a virtual machine.
  • Was in Unreal. That was, what, five years ago? It was a revelation to me as a commercial games developer. You could script object behaviour in C-like code, and load it dynamically at run time without having to restart the engine or try to do clever tricks with DLLs. The development time that saved was simply breathtaking, and it pretty much defined the future of games engines and games development, which epitomise the RAD concept. Heck, the first thing that we did was crank out our own C-like VM, and we never looked back.

    • The development time that saved was simply breathtaking, and it pretty much defined the future of games engines and games development

      Of course, while UnrealScript is cool, let's not be too quick to give it credit for VMs in games. UnrealScript was heavily influenced by QuakeC from two years earlier (much like everything in Unreal, which should be obvious enough to anyone). And there were a number of games from the VGA days which included scripting languages. A popular example is the lame side-scroller Abuse, but there certainly were others.
  • The reviewer said: Blunden makes sure to cover every topic related to virtual machines in extreme depth. ... Released in March 2002, this book is extremely up to date.

    I couldn't agree less. I flipped through the book in the bookstore, and I wasn't impressed. Blunden is a C/Assembly language programmer with little understanding of the requirements that a modern programming language places on a virtual machine. So his virtual machine is single threaded and runs in a fixed block of address space, with a fixed size code and data section, a growable stack, and a growable explicitly managed heap. This is fine if the target language is C or assembly language, but not so fine if you want garbage collection, threads, closures, first class continuations, or any of those other language features that were considered cutting edge back in the 1970s. How does his system link to external code, like the system calls in libc? Well, there are 11 "interrupts" called int0 through int10, sort of like the DOS system call interface.

    His explanation of why he doesn't support garbage collection is pretty muddled: basically, he's not comfortable with the idea, and doesn't think it's practical.

    Although I think that a register machine probably is better than a stack machine for this kind of system, he gives none of the arguments I was expecting to see to support this design decision. Instead, we get vague handwaving: apparently, he's more comfortable with register machines, because that's what he's used to.

    Doug Moen

  • VisualWorks and VisualAge make the Java and PHP VMs look like the Johnny-come-latelies that they are. (Sorry, Squeak! Wa-ay too slow.)

    Blazing fast object allocation (both), interning & loading (VA by a nose), and both have a full IDE that the others have been trying to achieve since Smalltalk-80 came out.

    But remember, they're IDEs, not production/delivery environments. For that you want internationalizable, database-drivable GUIs, dialog managers, state machines and transition engines.

    All in all: look at VW & VA and weep. (Or better yet, learn.) They've been at it since the days of UCSD Pascal. They've forgotten more than you'll ever know.
