Virtual Machine Design and Implementation in C/C++ 240

Posted by timothy on Tuesday June 25, 2002 @10:30AM from the truly dept.

wackybrit writes: "The concept of the virtual machine is one of the most important concepts in computer science today. Emulators use virtual machines, operating systems use virtual machines (Microsoft's .NET), and programming languages use virtual machines (Perl, Java)". Read on for his review of Virtual Machine Design and Implementation in C/C++, an attempt to examine and explain virtual machines and the concepts which allow them to exist.

Virtual Machine Design and Implementation in C/C++
author	Bill Blunden
pages	670
publisher	Wordware Publishing
rating	9
reviewer	Peter Cooper
ISBN	1-55622-903-8
summary	An in-depth look at virtual machines, assemblers, debuggers, and system architecture in general.

Virtual machines are, in effect, a software model of a whole system architecture and processor. They take in bytecode (formed of opcodes, operands, and other data) and execute it, much in the same way a real system executes code. Running these operations in software, however, gives you more security, and total control over how the system works.

Virtual machines are popular for a number of reasons. The first is that they give programmers a third compiler option. You don't have to either go the dynamic interpreted route or the static compiled route, you can compile for a virtual machine instead. Another is that virtual machines aid portability. If you compile your code for a virtual machine, you can run that binary on any system to which the virtual machine has been ported.

Few books have been written on virtual machines, with only a few Java Virtual Machine titles available. Virtual Machine Design and Implementation by Bill Blunden is therefore a landmark book for anyone with an interest in virtual machines, or even system and processor architecture as a whole.

What's to Like?

Blunden makes sure to cover every topic related to virtual machines in extreme depth. The beauty of this is that you're not left in the dark, but that experts can simply skip sections. The book is well divided up, and off topic rants or notes are clearly marked with dividers. This is an easy book to read, even though it runs to some 650 pages.

To lead the reader through the entire production of a virtual machine, Blunden showcases the development of his own 'HEC' virtual machine (HEC being one of the fictional companies in 'CPU Wars'). Initially he starts slowly, and introduces the reader to how CPUs work, how memory works, how paging works, and how almost any other system process you can imagine works. Nothing is missed out. Multitasking, threads, processes, porting.. he covers it all. This is excellent for those new to some of these topics, and makes this an advanced book that's actually quite readable by someone with a modicum of computer science experience.

After laying down the foundations for the design of the virtual machine, the actual development starts in Chapter 3. All of the code in this book is in C or C++, and nearly all of the code is talks about is actually printed on the right pages in the book. No more flipping between code on your computer and the book, it's all just where it should be!

Further on in the book, a number of extremely advanced concepts are introduced, but even these need not be out of the reach of an intermediate programmer. Blunden presents the most vivid insight into how assemblers and debuggers are created, and the book is worth it for this information alone.

Another important thing about this book is that it looks at creating a register based virtual machine. Stack based virtual machines are covered, but the author makes a compelling argument for using registers. This makes a refreshing change from the Java Virtual Machine books that ram stack based theory down your throat. It's also useful if you're interested in the Perl 6 'Parrot' project, which is also an in-development register based virtual machine, and bound to become rather important over the next few years.

What's to Consider?

Virtual machines aren't for everyone. If you're a high level programmer working with database apps, this isn't really for you. This book is primarily for system engineers, low level programmers, and hobbyists with an interest in compilation, assembler, and virtual machine theory.

This is not a book for beginners. You need to have a reasonable knowledge of C to understand the plentiful examples and source code in the book. C++ is also useful, although OOP is clearly explained, so even a standard C programmer could follow it. That said, this is an excellent book for intermediate programmers or computer science students, as a number of advanced topics (garbage collection, memory management, assembler construction, paging, token parsing) are dealt with in a very easy to understand way.

The Summary

Released in March 2002, this book is extremely up to date. This is good news, as virtual machines are clearly going to take up a good part of future compiler and operating system technology, and this makes it important to learn about their construction and operation now. These technologies are already in the marketplace; Microsoft's .NET, and JVM, for example. Perl 6's 'Parrot' is also going to become a big player, with languages like Ruby, Python, and Scheme being able to run on it in the future.

Whether you want to learn about system architecture, assembler construction, or just have a reasonably fun programming-related read, this book is great.

Table of Contents

History and Goals
Basic Execution Environment
Virtual Machine Implementation
The HEC Debugger
Assembler Implementation
Virtual Machine Interrupts
HEC Assembly Language
Advanced Topics

You can purchase Virtual Machine Design and Implementation in C/C++ from bn.com. Slashdot welcomes readers' book reviews -- to submit yours, read the book review guidelines, then visit the submission page.

Virtual Machine Design and Implementation in C/C++

This discussion has been archived. No new comments can be posted.

Search 240 Comments Log In/Create an Account

Comments Filter:

Re:Virtual Machine (Score:1, Informative)

by Anonymous Coward writes: on Tuesday June 25, 2002 @10:38AM (#3762449)

a vm passes appropriate instructions directly to the CPU, while an emulator simulates each and every instruction.

this is why (in Java, and VMWare) it's a VM, not an emulator.

Re:Virtual Machine (Score:3, Informative)

by Wolfier ( 94144 ) writes: on Tuesday June 25, 2002 @10:41AM (#3762472)

An emulator is a virtual machine with a pre-existing non-virtual counterpart.

How to use Google (Score:1, Informative)

by Anonymous Coward writes: on Tuesday June 25, 2002 @10:45AM (#3762493)

So you want more info on searching on virtual machines on Google, not using Java ?

Search on : "virtual machine -java"

It's simple & off topic.

Cheers,
T.

Re:Virtual Machine (Score:0, Informative)

by boltar ( 263391 ) writes: on Tuesday June 25, 2002 @10:45AM (#3762497)

An emulator simulates an entire machine in that it displays the virtual screen in a window, does the I/O etc. A VM merely simulates a ficticious architecture just enough to be able to get a
program running on the host system.

Re:Why *virtual* machines? (Score:3, Informative)

by JohnnyCannuk ( 19863 ) writes: on Tuesday June 25, 2002 @10:47AM (#3762508)

Zuccotto and a few other companies have done just that...the JVM is an actual hardware chip. There were at least 5 companies doing the same at Java One

Implementation of the Icon Programming Language (Score:2, Informative)

by Anonymous Coward writes: on Tuesday June 25, 2002 @10:49AM (#3762532)

The Implementation of the Icon Programming Language
[cover]

This book describes the implementation of Icon in detail. Highlights include:

* Icon's virtual machine
* the interpreter for the virtual machine
* generators and goal-directed evaluation
* data representation
* string manipulation
* structures
* memory management

http://www.cs.arizona.edu/icon/ibsale.htm

Information on the Icon programming language itself can be found at

http://www.cs.arizona.edu/icon

Re:Virtual Machine (Score:5, Informative)

by ranulf ( 182665 ) writes: on Tuesday June 25, 2002 @10:56AM (#3762569)

Yeah, they're basically the same, but the distinction is fundamentally on the original intention.
A virtual machine is designed specifically to be general and run in different environments, whereas an emulator is designed to emulate the environment of some existing hardware or software to trick software into beleiving that it genuinely is running on the original device.
So, whereas a virtual machine will have a fairly abstract policy towards doing things (compare Java's AWT - I'd like to open a window, I'd like a button here, I'd like a menu there) and an emulator will get really bogged down emualting details, e.g. memory address $DFF180 changes the background colour.
Both can be easily emulated by a state machine (hence why they come up in this book), however virtual machines can be made more efficient as they are intentionally abstract. e.g. in the JVM, you know what is code and what isn't, so you can translate blocks of code into native machine code and run that directly instead of interpreting every instruction. If you try that with an emulator, you'll come unstuck when you come across self-modifying code, or things that access memory mapped registers (e.g. on a 68000 the instruction mov d0,4(a0) offers no clue as to whether the write is to hardware or memory.
Generally, you'll find that most virtual machine designs aim to reduce the instruction set down to a bare minimum. This allows a virtual machine (if it chooses) to effectively re-build the original parse tree and generate native code. However, emulators are generally trying to emulate CISC processors where as much is squashed into an instruction set as possible. Similarly, most virtual machines are heavily stack based, so as not to make any assumptions about register availability.

Re:Low-resolution thread concurrency? (Score:2, Informative)

by Anonymous Coward writes: on Tuesday June 25, 2002 @11:14AM (#3762685)

While I'm sure someone could write a VM that did that, I don't think anyone would want to use it. On an x86, a context switch costs enough that one instruction per context switch would give you a bit more than 95% overhead lost on context switching. This means your programs would run at 1/20th speed at best.

Pipelining the KVM (Score:2, Informative)

by BobLenon ( 67838 ) writes: on Tuesday June 25, 2002 @11:35AM (#3762836) Homepage

Yea, so for a class project we took the kvm and (Java VM for embeded devices), and turned it into a pipelined architecture. It was very educational, but the practicality is lacking ... You at least need a 4 proc machine to be useful, as it was a simple 4-stage. But the speed was soo lacking.

It was worthwile experience, though I do wish java was reg based. ;) .... as it was only a learning experince no big deal. By the end i could've written my java in assembly ;)

Re:Virtual Machine (Score:2, Informative)

by Java Pimp ( 98454 ) writes: on Tuesday June 25, 2002 @11:35AM (#3762838) Homepage

Sort of...

A Just in Time compiler will compile all the byte code to native code before it is executed and run it on the hardware. Performance hit at the start of each execution but ultimately faster than interpreting byte code. Note that the JIT is probably not the best optimizer so it still won't be as efficient as platform specific binaries. (among other reasons...)

An optimized VM will recognize instructions or code sequences within the bytecode that can be directly mapped to native code and execute it directly on the hardware. Not as fast as JIT but faster than interpreting everything.

Both are still slower than platform specific binaries but that's just the nature of the beast.

Parrot (Score:2, Informative)

by god ( 5628 ) writes: on Tuesday June 25, 2002 @11:52AM (#3762960) Homepage

For more information on Parrot, which will be the Perl 6 virtual machine, and which is register-based, you may want to check out http://www.parrotcode.org/ [parrotcode.org].

Overlooking importance of VM in kernel (Score:2, Informative)

by gwayne ( 306174 ) writes: on Tuesday June 25, 2002 @11:53AM (#3762965)

It seems to me that many of you are viewing VM as some kind of emulation application rather than a virtual machine. What you may not realize is that many(most?)OS kernels including Linux virtualize the hardware to make the software more portable and less able to crash your entire system. What you lose in performance you make up for in stability. Operating systems books are a great reference for studying VMs.

Operating System Concepts by Abraham Silberschatz, et al.

Design and Implementation of the 4.4bsd Operating System by McKusic, et al.

Design of the UNIX Operating System by Bach

Modern Operating Systems by Tanenbaum

Operating Systems Design and Implementation by Tanenbaum

Re:Low-resolution thread concurrency? (Score:2, Informative)

by kinga ( 1655 ) writes: on Tuesday June 25, 2002 @12:39PM (#3763314)

You're right, assuming that the VM uses native threads. I was thinking of having the threading implemented in the VM

IBM's Jalapeno [ibm.com], now known as the Jikes Virtual Machine for Research, does its own thread scheduling instead of using native threads. The compiler generates yield points in method prologues and the back-edges of loops where the VM can preempt the thread. I suppose if you really wanted to you could have it generate a yield point for every instruction...

It can probably be done, but not profitably. (Score:3, Informative)

by Animats ( 122034 ) writes: on Tuesday June 25, 2002 @01:08PM (#3763556) Homepage

It's not something to do for performance.
Implementing a stack-based machine in hardware is straightforward, and has been done many times. The first one was the English Electric Leo Marconi KDF9, in 1958. Burroughs followed, and thirty years of Burroughs stack machines were sold. Java has a small implementation of the Java engine in hardware. Forth chips have been manufactured.
But all these machines have used sequential execution - one instruction at a time. Nobody has yet built a stack machine with out-of-order execution. There's been a little research [berkeley.edu] in this area. Sun's picoJava II machine has some paralellism in operand fetches and stores. But nobody has wanted to commit the huge resources needed to design a new type of superscalar processor. The design team for the Pentium Pro, Intel's first superscalar, was over 1000 people. And that architecture (which is in the Pentium II and III) didn't become profitable until the generation after the one in which it was first used.
In the end, a superscalar stack machine probably could be designed and built with performance comparable to high-end register machines. For superscalar machines, the programmer-visible instruction set doesn't matter that much, which is why the RISC vs. CISC performance debate is over. But so far, there's no economic reason to do this. Sun perhaps hoped that Java would take off to the point that such machines would make commercial sense. But it didn't happen.

Re:Low-resolution thread concurrency? (Score:3, Informative)

by r ( 13067 ) writes: on Tuesday June 25, 2002 @01:28PM (#3763658)

Are there any VMs currently, for Java, Python or some other language, that can execute each thread one VM instruction at a time?

if you have a language that optimizes tail-calls, you could have the front-end of the language convert the separate threads of execution into continuation-passing style [readscheme.org], and then execute the code one continuation at a time, simulating threading on a VM level. if i remember correctly, the scheme48 VM [s48.org] could do that kind of threading, though on a coarse level.

in CPS a function decomposes into a sequence of more primitive functions, each returning a continuation, ie. a handler for computation yet to come. for a simplified example, the evaluation of (+ (* 2 3) (* 4 5)) would evaluate (* 2 3) into 6 and return a continuation that evaluates (+ 6 (* 4 5)), which in turn would evaluate (* 4 5) into 20 and return a continuation that evaluates (+ 6 20), and that would finally evaluate to 26.

but the point here is that one could explicitly halt the evaluation after receiving the first continuation, store it on the queue, and go off and compute something else. after a while you can come back, pop the continuation off the queue, and pick up the computation where you left off.

the problem with such a setup is that it makes optimization difficult. i'd suggest looking at the CPS for more details...

Re:Parrot (Score:2, Informative)

by PythonOrRuby ( 546749 ) writes: on Tuesday June 25, 2002 @01:49PM (#3763809)

Parrot is the virtual machine runtime, so technically, it won't be compiling anything(though it will alow for JIT compiling of Parrot bytecode).

Rather, the languages will implement compilers that generate Parrot Assembly language, and then the Parrot assembler will take it from there. This approach really does have a number of advantages. It means that the Parrot community can work on optimizing the heck out of the assembler and runtime, without worrying too much about the concerns of each individual lanuage.

It also means that, for embedded use, Perl/Ruby/Python/Tcl/Scheme/etc. programs can be compiled and loaded onto a machine that only has to have the Parrot runtime installed.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Virtual Machine Design and Implementation in C/C++ 240

What's to Like?

What's to Consider?

The Summary

Virtual Machine Design and Implementation in C/C++ More Login