Virtual Machine Design and Implementation in C/C++
author | Bill Blunden
pages | 670
publisher | Wordware Publishing
rating | 9
reviewer | Peter Cooper
ISBN | 1-55622-903-8
summary | An in-depth look at virtual machines, assemblers, debuggers, and system architecture in general.
Virtual machines are, in effect, a software model of a whole system architecture and processor. They take in bytecode (formed of opcodes, operands, and other data) and execute it, much in the same way a real system executes code. Running these operations in software, however, gives you more security, and total control over how the system works.
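To make that concrete, here's a rough sketch of the kind of fetch-decode-execute loop that sits at the heart of any virtual machine. This is my own toy C++, not code from the book; the opcodes and encoding are invented for the example:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Invented opcodes for illustration: each instruction is one opcode byte,
    // optionally followed by a one-byte operand.
    enum Op : uint8_t { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    void run(const std::vector<uint8_t>& bytecode) {
        std::vector<int64_t> stack;
        size_t pc = 0;                              // program counter
        for (;;) {
            uint8_t op = bytecode[pc++];            // fetch
            switch (op) {                           // decode + execute
            case OP_PUSH:
                stack.push_back(bytecode[pc++]);    // operand follows the opcode
                break;
            case OP_ADD: {
                int64_t b = stack.back(); stack.pop_back();
                int64_t a = stack.back(); stack.pop_back();
                stack.push_back(a + b);
                break;
            }
            case OP_PRINT:
                std::printf("%lld\n", (long long)stack.back());
                break;
            case OP_HALT:
                return;
            }
        }
    }

    int main() {
        // "2 + 3" expressed as bytecode for the toy machine above; prints 5.
        run({OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT});
    }

A real VM adds a proper operand encoding, a call stack, traps, and so on, but the shape of the loop stays the same.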
Virtual machines are popular for a number of reasons. The first is that they give programmers a third compiler option: you don't have to choose between the dynamic interpreted route and the static compiled route, you can compile for a virtual machine instead. Another is that virtual machines aid portability. If you compile your code for a virtual machine, you can run that binary on any system to which the virtual machine has been ported.
Few books have been written on virtual machines, with only a few Java Virtual Machine titles available. Virtual Machine Design and Implementation by Bill Blunden is therefore a landmark book for anyone with an interest in virtual machines, or even system and processor architecture as a whole.
What's to Like?
Blunden makes sure to cover every topic related to virtual machines in extreme depth. The beauty of this is that you're never left in the dark, while experts can simply skip sections. The book is well divided up, and off-topic rants or notes are clearly marked with dividers. This is an easy book to read, even though it runs to some 650 pages.
To lead the reader through the entire production of a virtual machine, Blunden showcases the development of his own 'HEC' virtual machine (HEC being one of the fictional companies in 'CPU Wars'). He starts slowly, introducing the reader to how CPUs work, how memory works, how paging works, and how almost any other system process you can imagine works. Nothing is missed out. Multitasking, threads, processes, porting... he covers it all. This is excellent for those new to some of these topics, and makes this an advanced book that's actually quite readable by someone with a modicum of computer science experience.
After laying down the foundations for the design of the virtual machine, the actual development starts in Chapter 3. All of the code in this book is in C or C++, and nearly all of the code it talks about is actually printed on the right pages in the book. No more flipping between code on your computer and the book; it's all just where it should be!
Further on in the book, a number of extremely advanced concepts are introduced, but even these need not be out of the reach of an intermediate programmer. Blunden presents the most vivid insight into how assemblers and debuggers are created, and the book is worth it for this information alone.
Another important thing about this book is that it looks at creating a register based virtual machine. Stack based virtual machines are covered, but the author makes a compelling argument for using registers. This makes a refreshing change from the Java Virtual Machine books that ram stack based theory down your throat. It's also useful if you're interested in the Perl 6 'Parrot' project, which is also an in-development register based virtual machine, and bound to become rather important over the next few years.
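For a feel of the stack-versus-register difference, here's a toy illustration (again my own C++, not HEC or Parrot code): a register VM encodes "r0 = r1 + r2" as a single three-address instruction, where a stack VM needs a push/push/add/store sequence and four trips through the operand stack:

    #include <cstdint>
    #include <cstdio>

    // Toy three-address instruction format, invented for illustration.
    // A stack machine would express "r0 = r1 + r2" as four instructions
    // (PUSH r1; PUSH r2; ADD; STORE r0); a register machine needs one.
    struct Instr { uint8_t op, dst, src1, src2; };
    enum { OP_ADD, OP_HALT };

    int64_t reg[16];                        // the virtual register file

    void run(const Instr* code) {
        for (size_t pc = 0; ; ++pc) {
            const Instr& i = code[pc];
            switch (i.op) {
            case OP_ADD:  reg[i.dst] = reg[i.src1] + reg[i.src2]; break;
            case OP_HALT: return;
            }
        }
    }

    int main() {
        reg[1] = 2; reg[2] = 3;
        const Instr prog[] = { {OP_ADD, 0, 1, 2}, {OP_HALT, 0, 0, 0} };
        run(prog);
        std::printf("r0 = %lld\n", (long long)reg[0]);   // prints r0 = 5
    }

The trade-off is wider instructions in exchange for fewer of them, and fewer dispatch iterations is usually a win for an interpreter.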
What's to Consider?
Virtual machines aren't for everyone. If you're a high level programmer working with database apps, this isn't really for you. This book is primarily for system engineers, low level programmers, and hobbyists with an interest in compilation, assembler, and virtual machine theory.
This is not a book for beginners. You need to have a reasonable knowledge of C to understand the plentiful examples and source code in the book. C++ is also useful, although OOP is clearly explained, so even a standard C programmer could follow it. That said, this is an excellent book for intermediate programmers or computer science students, as a number of advanced topics (garbage collection, memory management, assembler construction, paging, token parsing) are dealt with in a very easy to understand way.
The Summary
Released in March 2002, this book is extremely up to date. This is good news, as virtual machines are clearly going to take up a good part of future compiler and operating system technology, and this makes it important to learn about their construction and operation now. These technologies are already in the marketplace: Microsoft's .NET and the JVM, for example. Perl 6's 'Parrot' is also going to become a big player, with languages like Ruby, Python, and Scheme being able to run on it in the future.
Whether you want to learn about system architecture, assembler construction, or just have a reasonably fun programming-related read, this book is great.
Table of Contents
- History and Goals
- Basic Execution Environment
- Virtual Machine Implementation
- The HEC Debugger
- Assembler Implementation
- Virtual Machine Interrupts
- HEC Assembly Language
- Advanced Topics
You can purchase Virtual Machine Design and Implementation in C/C++ from bn.com. Slashdot welcomes readers' book reviews -- to submit yours, read the book review guidelines, then visit the submission page.
Wouldn't this lead us to... (Score:1)
Virtual Machine (Score:2, Interesting)
because I can't see what's different between my mame and java virtual machine...
Re:Virtual Machine (Score:1, Informative)
this is why (in Java, and VMWare) it's a VM, not an emulator.
Re:Virtual Machine (Score:3, Informative)
Re:Virtual Machine (Score:2)
Re:Virtual Machine (Score:4, Interesting)
For example, some of the nuclear power plants here in Canada are using or switching over to an emulator to run the plants because they are running out of spare parts for their 1972 control machines. Without the use of an emulator, they'd each have to rewrite shelves and shelves of assembler code.
You can imagine that some of the code is timing critical, so the emulator must be exact down to the timing.
Re:Virtual Machine (Score:5, Informative)
A virtual machine is designed specifically to be general and run in different environments, whereas an emulator is designed to emulate the environment of some existing hardware or software to trick software into believing that it genuinely is running on the original device.
So, whereas a virtual machine will have a fairly abstract policy towards doing things (compare Java's AWT - I'd like to open a window, I'd like a button here, I'd like a menu there), an emulator will get really bogged down emulating details, e.g. memory address $DFF180 changes the background colour.
Both can be easily emulated by a state machine (hence why they come up in this book), however virtual machines can be made more efficient as they are intentionally abstract. e.g. in the JVM, you know what is code and what isn't, so you can translate blocks of code into native machine code and run that directly instead of interpreting every instruction. If you try that with an emulator, you'll come unstuck when you come across self-modifying code, or things that access memory mapped registers (e.g. on a 68000 the instruction mov d0,4(a0) offers no clue as to whether the write is to hardware or memory).
Generally, you'll find that most virtual machine designs aim to reduce the instruction set down to a bare minimum. This allows a virtual machine (if it chooses) to effectively re-build the original parse tree and generate native code. However, emulators are generally trying to emulate CISC processors where as much is squashed into an instruction set as possible. Similarly, most virtual machines are heavily stack based, so as not to make any assumptions about register availability.
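A concrete way to see the memory-mapped register problem the parent describes: every single store an emulator performs has to be checked against the hardware map, because the instruction itself gives no clue. A VM with an abstract I/O interface never pays this per-write cost. The addresses and code below are invented for illustration (loosely in the spirit of the Amiga colour register mentioned above), not taken from any real emulator:

    #include <cstdint>
    #include <vector>

    // Invented hardware map for illustration only.
    constexpr uint32_t HW_BASE   = 0xDFF000;     // start of memory-mapped registers
    constexpr uint32_t HW_END    = 0xE00000;
    constexpr uint32_t REG_COLOR = 0xDFF180;     // the background-colour register

    struct Machine {
        std::vector<uint8_t> ram = std::vector<uint8_t>(1 << 20);
        uint16_t bg_colour = 0;

        // The emulated CPU cannot tell from "mov d0,4(a0)" whether the target
        // is plain RAM or a hardware register, so every write goes through a
        // check like this.
        void write16(uint32_t addr, uint16_t value) {
            if (addr >= HW_BASE && addr < HW_END) {      // hardware side effect
                if (addr == REG_COLOR) bg_colour = value;
                return;                                  // (other registers omitted)
            }
            if (addr + 1 < ram.size()) {                 // ordinary RAM, big-endian
                ram[addr]     = uint8_t(value >> 8);
                ram[addr + 1] = uint8_t(value & 0xFF);
            }
        }
    };

    int main() {
        Machine m;
        m.write16(REG_COLOR, 0x0F0F);   // hits the colour register, not RAM
        m.write16(0x001000, 0x1234);    // plain memory write
        return m.bg_colour == 0x0F0F ? 0 : 1;
    }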
Re:Virtual Machine (Score:2)
Isn't that called JIT? Also, if I remember correctly, didn't the first version of Java come without this? (and were therefore unspeakably slow?)
Re:Virtual Machine (Score:2, Informative)
A Just in Time compiler will compile all the byte code to native code before it is executed and run it on the hardware. Performance hit at the start of each execution but ultimately faster than interpreting byte code. Note that the JIT is probably not the best optimizer so it still won't be as efficient as platform specific binaries. (among other reasons...)
An optimized VM will recognize instructions or code sequences within the bytecode that can be directly mapped to native code and execute it directly on the hardware. Not as fast as JIT but faster than interpreting everything.
Both are still slower than platform specific binaries but that's just the nature of the beast.
Re:Virtual Machine (Score:2)
Pipelining the KVM (Score:2, Informative)
It was a worthwhile experience, though I do wish Java was register based.
Re:Virtual Machine (Score:2)
Perhaps you should check the uae-jit patch (which has been ported to Basilisk and integrated into the Windows builds of UAE) before concluding that JIT-based 68k emulation is not practical.
Re:Virtual Machine (Score:3)
For instance, if you were to make a Pentium emulator to run on a 486, then many of the instructions could be executed as-is by the hardware. Most register values could be stored in actual registers. And so on.
Re:Virtual Machine (Score:2)
Yes, an emulator simulates a piece of hardware that once existed, a Virtual Machine is an idealised machine.
The difference is significant because many Virtual Machines have features that either cannot be supported in hardware or would be prohibitively expensive. This is the principle design difference between the original Java VM and the Microsoft .NET CLI. The original Java VM was designed so that it could be implemented as silicon gates; the .NET CLI could be implemented in silicon if you tried really hard and were a complete masochist, but that was never a design goal.
I don't know about the second Java VM, my interest in Java VM kinda died after it was clear Sun wanted no external inputs they did not control absolutely. I would guess that the redesign would be towards a more abstract representation that would allow for better JIT compilation.
Strictly speaking, .NET does not use a VM, it uses an intermediate compiler representation; however, any Turing-complete representation could be called a VM. The distinction matters because you can compile .NET code down to the Java VM if you choose, but going the other way would not be a great idea...
Legality (Score:2)
Can anyone give me a substantial difference between a virtual machine, and an emulator
Others have commented on the theoretical differences, but I feel I should say something as to what distinguishes a VM from an emulator in practice. Virtual machines do not promote piracy because software is designed to run on virtual machines. On the other hand, an emulator is often written with unlawful redistribution of proprietary software in mind, even if it is wink-wink-nudge-nudge.
because I can't see what's different between my mame and java virtual machine
I find the most important difference between MAME and JVM that there is a much larger library of free software designed to run under JVM than under MAME.
thank you! (Score:2, Interesting)
How to use Google (Score:1, Informative)
Search on : "virtual machine -java"
It's simple & off topic.
Cheers,
T.
Re:How to use Google (Score:1)
Re:thank you! (Score:2)
Hope this helps.
I want to run a virtual Machine (Score:2, Funny)
Alternate titles (Score:3, Funny)
1. Reversi: C64 Speed on a Pentium IV
2. Double Your Code, Halve Your Speed
3. Real Men Don't Use Real Computers
4. VM: Very Macho or Verily laMe
5. Atari ST Rebirth: a 20 Year Reversal
etc., etc.
Ack, I'm turning into a crank! Oy.
Re:Alternate titles (Score:1, Insightful)
Maintainable code
Faster time to market
Minimum breakage during enhancements
Ability to easily port to other platforms
If you owned a software business would these things be important to you? I don't think performance would be the primary concern.
Another use (Score:5, Interesting)
Also, don't forget the UCSD P-System, which used a virtual machine to run code compiled in that environment. I know of at least one commercial product that used the P-System; I believe there were many.
Virtual machines have been around awhile; they're an interesting field, made newly relevant by the ascendancy of environments such as Java and the MS CLR. I just wish I had a good excuse to drop $50 on this book...:-)
Eric
Re:Another use (Score:2, Insightful)
What I found particularly interesting was that this seemingly hopeful project was taken up so well that Simics [virtutech.com] thought it prudent to add x86-64 support to their existing commercial multi-architecture simulator.
The good news in all of this is that Linux and a fair few of the GNU tools are x86-64 ready now, well in advance of any x86-64 chips' release.
Re:p-System apps (Score:2)
I never did get the p-Code interpreter card for my TI-99/4A, though; an environment like that might have run p-System programs a little faster. (There's an early example of "VMs implemented in hardware" for you...way ahead of the JavaChip.)
Eric
Re:Alternate titles (Score:1)
Why *virtual* machines? (Score:3, Interesting)
My question to anyone qualified to comment: Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such? I can imagine real performance benefits happening with such an idea...
Re:Why *virtual* machines? (Score:3, Informative)
Re:Why *virtual* machines? (Score:3, Insightful)
Re:Why *virtual* machines? (Score:2)
Does Sun own the exclusive rights to create native Java CPUs? I know other companies have paid Sun to license picoJava designs, but what if someone else made a Java-compatible CPU but just did not call it "Java(tm)"?
Re:Why *virtual* machines? (Score:1)
It seems to me that the only application of such a piece of hardware is when you want to use code written in a certain language exclusively, like that semi-new Sharp PDA with a Linux-based Java VM.
Re:Why *virtual* machines? (Score:2)
It would be useful for certain applications for the computer to have one (or multiple!) hardware Java chips to speed up execution of Java code. Java servlets, for example. Sun is hyping Java on the server side awfully hard. But they can be slow, especially when you have thousands going at a time.
You could have this *and* a cross-platform VM. They're not mutually exclusive. Schmucks in Windows can still run Limewire with a setup like this on some machines.
The Zaurus doesn't use Java exclusively. From my playing with them, Java (like elsewhere) is still very much a second-class citizen in Qtopia. PocketLinux, which is now defunct (no doubt because Java sucks - sorry, couldn't help it), was a PDA operating environment that ran on top of Linux and used Java exclusively.
Re:Why *virtual* machines? (Score:2)
Hardware acceleration makes sense on embedded platforms where there are not enough resources to properly do on-the-fly compilation and optimization (basically a non-issue on servers).
As for your comments regarding PocketLinux, I suspect they were simply outcompeted by other PDA OSes that managed to produce a more usable OS & apps. Users don't care about kernels; they do care about whether they can do stuff on a PDA: read mail, edit Word documents, read PDFs, manage agendas, synchronize with popular PC applications, etc.
Re:Why *virtual* machines? (Score:2)
There are also a few people doing math-based research using Java. It would be amazing to be able to use RMI and a shared class pool to do distributed processing. Who needs entire computers to farm out computations to, when you have 4 Java CPUs on a PCI card?
No, this kind of technology isn't required. We're getting along, using Java, without it today. But it could be nice for some applications.
PocketLinux wasn't so much out-competed as it was abandoned. They weren't poised to compete. Like a lot of open source companies, they expected the community to really take interest and start churning out apps for them. A lot of the Linux-nerd-world isn't into Java, they're still using C. That's all good, but you can't write (GUI) apps for PocketLinux in C. Their market, initially, wasn't end-users. It was the same group of people who read Slashdot and now own Zauruses and Agendas.
Re:Why *virtual* machines? (Score:2)
One final point: I've found that some graphics applications, even with "unsafe code", perform a lot slower than their C++ counterparts. This may be due to a general lack of experience with graphics programming (the technical barrier of entry is lower), and the relative immaturity of the CLR. Remember, the JVM is a lot faster than it was in the late 90's.
Re:Why *virtual* machines? (Score:2)
Re:Why *virtual* machines? (Score:2)
Nothing that sits on the mobo to supplement a 'real' CPU tho.
Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such?
I'm no hardware guy. But I have a wee bit of experience hacking on the Smalltalk virtual machine. I imagine that this is so because VMs are designed as VMs, not as a blueprint for hardware. To support an entire computer, I wouldn't be surprised if you had to add a lot more instructions than most VMs provide.
Re:Why *virtual* machines? (Score:2)
Re:Why *virtual* machines? (Score:2)
Re:Why *virtual* machines? (Score:2)
...
Is there a reason why these virtual machines aren't taken as a blueprint for real hardware and implemented as such?
Ok here are some potential problems with the idea that I can think of:
However, there are niches where h/w implementation might well make sense: tiny mobile devices where the key is not so much performance as simplicity, and where ease of development is another strong selling point (for companies that develop s/w for such products). Being able to omit advanced JVMs is a plus, and performance may still be decent, if not stellar.
Re:Why *virtual* machines? (Score:2)
That's not true -- stack-based chips were dropped for other reasons. The modern stack-based chips are very fast indeed -- consider the X25, shBOOM, or P21.
But I think you're confusing "stack based" with "memory to memory". Not all stacks are implemented in memory; an on-chip stack is very fast, and allows the CPU to operate at almost the full ALU clock, since there's no register access delay.
Your other reasons are, of course, sufficient and correct.
-Billy
Re:Why *virtual* machines? (Score:2)
Interesting. I probably should read something about those... I'm not a h/w specialist, but it's good to have basic knowledge and try to keep that up-to-date.
And yes, I thought that caches generally were (always) implemented using main memory, so I did confuse the issues.
By the way, where are the chips you mention usually used? For signal-processing? (I'm sure Google can answer that one)
Re:Why *virtual* machines? (Score:2)
The shBOOM is now being marketed as a Java accelerator (it's called the PSC1000, by Patriot); another variant of it is rad-hardened and used in orbit.
The 25x (sorry, I misspelled it originally), described at http://www.colorforth.com/25x.html, is a new development from the guy who designed both the above chips, and isn't funded.
I found a list of similar chips at http://www.ultratechnology.com/chips.htm. Interesting stuff.
-Billy
Re:Why *virtual* machines? (Score:2)
Great question! There's a little bit of a false assumption in there; I believe you're assuming that only superscalar chips can be fast. Not true; superscalar has a LOT of advantages, but a lot of disadvantages as well. The worst disadvantage is complexity; the superscalar chips eat a LOT more energy.
The fastest stack chips don't execute multiple instructions at the same time. (They do have multiple stacks, but that's a different issue.)
The trick is that although they can't execute 5 instructions at the same time, they can and do execute 5 instructions in the time it takes other processors to execute one, for two reasons: first, their cycle length is that much shorter; second, their instruction encodings are that much shorter.
The cycle length is shorter because the top two registers of the stack are gated directly to all the chip's functional units. As soon as the stack stabilizes from the last instruction, the functional units begin computing correct results for the next instruction. By the time the next instruction's decoded, all it has to do is MUX the correct result onto the stack, and possibly pop the stack.
The encoding is shorter because there are no operands: instructions always apply to the top of the stack (well, except the LITERAL instruction). This means that (for the fastest chip) 4 instructions fit into one machine word -- so one can fairly consistently get 4 instructions per RAM cycle.
The chip speed is so much faster than the 18-bit 4ns cache chip being used as RAM for this that the multiprocessor version of this chip, the X25, uses one of its processors to drive each of its logic pins, thereby making its logical interface essentially software-driven. 250MHz isn't impressive (that's the effective speed of the cache RAM, if my calculations are correct), but having enough spare MHz to implement the access protocol in software IS impressive. And it does all this at 500 mW @ 1.8 V (that assumes all 25 computers running at full bore, which isn't needed; at minimum draw, with 1 computer running throttled, a 100mAh battery lasts a year). I'm quoting from the specs; I haven't bothered to do the conversions needed to compare wattage directly.
But I don't want to get caught in the trap of claiming stack chips are faster than superscalar chips. Not true -- but you have to use a LOT of power and a LOT of transistors to beat a stack chip.
-Billy
Re:Why *virtual* machines? (Score:2)
There have been lots of virtual machines that turned real. And the converse, of course. Just consider VMWare for one example.
It can probably be done, but not profitably. (Score:3, Informative)
Implementing a stack-based machine in hardware is straightforward, and has been done many times. The first one was the English Electric Leo Marconi KDF9, in 1958. Burroughs followed, and thirty years of Burroughs stack machines were sold. There is a small hardware implementation of the Java engine. Forth chips have been manufactured.
But all these machines have used sequential execution - one instruction at a time. Nobody has yet built a stack machine with out-of-order execution. There's been a little research [berkeley.edu] in this area. Sun's picoJava II machine has some parallelism in operand fetches and stores. But nobody has wanted to commit the huge resources needed to design a new type of superscalar processor. The design team for the Pentium Pro, Intel's first out-of-order superscalar, was over 1000 people. And that architecture (which is in the Pentium II and III) didn't become profitable until the generation after the one in which it was first used.
In the end, a superscalar stack machine probably could be designed and built with performance comparable to high-end register machines. For superscalar machines, the programmer-visible instruction set doesn't matter that much, which is why the RISC vs. CISC performance debate is over. But so far, there's no economic reason to do this. Sun perhaps hoped that Java would take off to the point that such machines would make commercial sense. But it didn't happen.
The downside of virtual machines (Score:1)
The same is true of virtual machines. Simulating how a computer might react to certain error codes and so forth is all right in small doses, but the only way to get real data is to go out there and buy some actual hardware.
Just my $.02.
Re:The downside of virtual machines (Score:2)
Plus, digital circuits are a little less complicated and better understood than nuclear explosions and particle interactions.
Implementation of the Icon Programming Language (Score:2, Informative)
The Implementation of the Icon Programming Language
This book describes the implementation of Icon in detail. Highlights include:
* Icon's virtual machine
* the interpreter for the virtual machine
* generators and goal-directed evaluation
* data representation
* string manipulation
* structures
* memory management
http://www.cs.arizona.edu/icon/ibsale.htm
Information on the Icon programming language itself can be found at
http://www.cs.arizona.edu/icon
umm (Score:2)
Practically all coding books do this, and I mostly find it a cheap way to poop out thick books and massive volumes... Not a measure of quality in any way.
Take a Further look at this! (Score:4, Interesting)
I program in Java mostly right now, and so when people begin the usual 'vm is slow' crank I am curious about what they exactly mean.
Programs written to run on VMs can be significantly slower due to the extra layer. Yet, if the design of the VM is done well enough (by perhaps reading this tome?) then the VM should be comparable. Certainly C is generally faster than an interpreted language. But there are native compilers out there that provide very comparable results, and the advantage of a language that forces careful programming. Here is the slashdot link [slashdot.org]
If adding layers to programs automatically makes them slower, and so slow that they are useless, we all would code in assembly.
Good design is important. A badly written C program, of which there are thousands, will be just as slow (read: bad) as a badly written VM program.
Re:Take a Further look at this! (Score:2)
It's not so much the language as it is the runtime. CPU's don't really like to be micromanaged anymore, except by experts (again, like OS's). With a properly tuned runtime (like a good VM -- not saying Java is one), every program gains its benefit. C pretty much lacks a runtime layer completely, and the mismatch is starting to show.
Re:Take a Further look at this! (Score:2)
Re:Take a Further look at this! (Score:2)
Useless depends on the performance needed by the implementation. For example a sort routine in C may be fine for a database application, but in sorting visible polygons in an arcade game it may be too slow in which case there may be no choice except to implement that particular routine in assembly and interface it to the C program.
Java may use a VM and be slower than C but it has taken hold in server-side programming where the network connection is the bottleneck rather than the application. Even if the server becomes heavily loaded it's cheaper to throw more hardware at it than rewrite it in something faster.
It's all about how far you can get away with moving up the speed vs maintainability curve. It's for this reason we don't see any arcade games coming out written in Java, and why web designers will knock up a web site in PHP rather than write optimised C CGI/ISAPI/etc.
Phillip.
Re:Take a Further look at this! (Score:2)
What it comes down to is how I want to use my time. For my use, Smalltalk and Lisp are fast enough. They are compiled languages, compiled to bytecode which is then JITed. I can get a lot more done with their benefits and only lose a little speed, speed that isn't missed for the kind of stuff I work on.
Re:Take a Further look at this! (Score:2)
the person that doesn't think so will be running their office apps on their 100 GHz machine and saying, "look, fast enough"; the person that does think so will be running a complete simulation of a galaxy at 1,000,000 yrs/s.
"Fast enough" is subjective opinion, a feeling, an intuition, biased by expectation. All that matters is relative speed and relative efficiency and the stability of the code in the end, and of course the ease of development. C and C++ increase flexibility without reducing ability. This means you are still able to seg fault, but it also means you are free to do anything else too. And I like Java, but the VM will always be compiled, remember that! (well, always... until it's put on a chip and made into a ubiquitous coprocessor)
Why are current VMs preoccupied with GC? (Score:1, Interesting)
I think that VMs these days are getting bloated with everything including the kitchen sink. This makes them harder to port and test. Performance suffers. What ever happened to keep it simple stupid?
Re:Why are current VMs preoccupied with GC? (Score:1)
Still, as much as (software) emulators emulate existing hardware there have also been several attempts to create "virtual" machines in hardware. (For example: P-code interpreters (low-level Pascal) and Sun's attempts to hard wire a Java VM.)
Re:Why are current VMs preoccupied with GC? (Score:2)
Even the earliest VMs did garbage collection (take a look at Lisp which for some reason nobody has mentioned here yet).
However it is true that this argument could be made for any feature added to the VM, but it does seem that using the VM design to get away from numerically-addressed memory is a natural division that most designers go for.
Because they *must* be. (Score:2)
You missed the point of the two doofuses (Score:2)
If a VM doesn't support garbage collection, then programs written for it will be buggier and less safe than programs written for a VM with garbage collection.
One of the biggest reasons that existing software is so unreliable and unsafe is because of its dependence on C, and the lack of both type safety and garbage collection in C. This allows buffer overflows and memory access violations. You're correct that adding garbage collection (and true type safety) doesn't buy security in and of itself, but it buys a heck of a lot of safety.
Still missing the point (Score:2)
Re:Why are current VMs preoccupied with GC? (Score:2)
Not even the JVM includes anything near the kitchen sink. The libraries do. They're not terribly hard to port when all they do is interpret bytecodes.
It's sad to see people with these kinds of attitudes. In their minds, all virtual machine-based languages equal Java. Anything that's not compiled directly to native code equals QBasic. That's not the case.
Re:Why are current VMs preoccupied with GC? (Score:2)
Sorry, I don't follow you. I don't have the same attitude; having practical experience in the "real world" designing and developing, I've found that some VM-based languages can be slow, even if the code is designed well. Usually, performance depends on the code itself.
Sure, having an opinion based on ignorance is valid. However, it's still rooted in ignorance. One of the purposes of discussion is to share information, and that's what I do.
Re:Why are current VMs preoccupied with GC? (Score:2)
Others have discussed why GC isn't as bad as you say; I agree with them, although they're a little extreme (it's NOT true that you always need GC).
I'm working on a VM which can handle both GC'ed and non-GC'ed stuff at the same time, for a substantial speed advantage. Unfortunately, my VM has a language tiedown; I'm not sure how to add the type support I need to most languages.
-Billy
Low-resolution thread concurrency? (Score:4, Interesting)
It would also be nice to have language-level support for parallel processing, like in Occam.
For example, in a Python implementation, the following code would execute the two for-statements in the "par"-block in parallel:
As the two threads would be executed exactly at the same speed, the output would be:
Re:Low-resolution thread concurrency? (Score:2, Interesting)
The output would more likely be:
a =b 0
a= 0
b= =1
a1
b =
= 2
2
Occam is an interesting language, but I think it has a too restrictive view. No global variables, no mutexes, everything uses channels - even shutting down a multithreaded Occam program is a major pain in the ass - message passing nightmare.
Re:Low-resolution thread concurrency? (Score:2)
Well, yes, I think it's a rather safe assumption if the threading is implemented in the VM; formatting and printing a buffer would probably be implemented with an efficient native function, which would be atomic.
But, yes, if we use native threading, the context switch could occur anywhere, and the output would be a mess, just as you describe. As noted in other messages in this thread (ummm...this discussion thread), using native threading would probably be the wisest choice.
I don't know Occam really at all, but I don't quite like the normal Java or Posix ways of threading either. The PAR statement in Occam might make threading so much easier.
Re:Low-resolution thread concurrency? (Score:2, Informative)
Re:Low-resolution thread concurrency? (Score:2)
You're right, assuming that the VM uses native threads. I was thinking of having the threading implemented in the VM; I guess it would be kind of trivial and it would have very little (if any) overhead because of context switching, although there might be some other costs.
But of course, without native threads, we would lose the possibility to use multiple processors easily, which wouldn't be very nice.
Then why have this low resolution? A friend of mine has a home-brewed VM for an ad-hoc language for embedded programming that handles concurrent execution one instruction at a time. He says it's very important for his embedded application. I don't know his specific reasons, but I'd imagine it has something to do with controlling multiple embedded devices and interfaces.
Anyhow, low-resolution concurrency just sounds cool.
Re:Low-resolution thread concurrency? (Score:2, Informative)
You're right, assuming that the VM uses native threads. I was thinking of having the threading implemented in the VM
IBM's Jalapeno [ibm.com], now known as the Jikes Virtual Machine for Research, does its own thread scheduling instead of using native threads. The compiler generates yield points in method prologues and the back-edges of loops where the VM can preempt the thread. I suppose if you really wanted to you could have it generate a yield point for every instruction...
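The yield-point idea is easy to picture. Here's a hand-written C++ analogy of what such a compiler inserts automatically (only an illustration of the technique, not actual Jikes output): the generated code polls a flag in the method prologue and at each loop back-edge, so the scheduler can only preempt a thread at known-safe points:

    #include <atomic>

    std::atomic<bool> yield_requested{false};

    // Called by the VM's timer tick when it wants to reschedule.
    void request_yield() { yield_requested = true; }

    // Stand-in for switching to another green thread.
    void vm_yield() { yield_requested = false; }

    long sum(const long* a, int n) {
        if (yield_requested) vm_yield();         // yield point in the prologue
        long s = 0;
        for (int i = 0; i < n; ++i) {
            s += a[i];
            if (yield_requested) vm_yield();     // yield point at the loop back-edge
        }
        return s;
    }

    int main() {
        long data[] = {1, 2, 3, 4};
        return sum(data, 4) == 10 ? 0 : 1;
    }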
Re:Low-resolution thread concurrency? (Score:2)
Re:Low-resolution thread concurrency? (Score:3, Informative)
if you have a language that optimizes tail-calls, you could have the front-end of the language convert the separate threads of execution into continuation-passing style [readscheme.org], and then execute the code one continuation at a time, simulating threading on a VM level. if i remember correctly, the scheme48 VM [s48.org] could do that kind of threading, though on a coarse level.
in CPS a function decomposes into a sequence of more primitive functions, each returning a continuation, ie. a handler for computation yet to come. for a simplified example, the evaluation of (+ (* 2 3) (* 4 5)) would evaluate (* 2 3) into 6 and return a continuation that evaluates (+ 6 (* 4 5)), which in turn would evaluate (* 4 5) into 20 and return a continuation that evaluates (+ 6 20), and that would finally evaluate to 26.
but the point here is that one could explicitly halt the evaluation after receiving the first continuation, store it on the queue, and go off and compute something else. after a while you can come back, pop the continuation off the queue, and pick up the computation where you left off.
the problem with such a setup is that it makes optimization difficult. i'd suggest looking into CPS for more details...
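here's roughly what that worked example looks like with the continuations written out explicitly (a C++ sketch of the idea, using std::function for the continuations; nothing scheme48-specific, and the function names are made up):

    #include <cstdio>
    #include <functional>

    using Cont = std::function<void(int)>;

    // Each primitive takes its arguments plus "the rest of the computation".
    void mul(int a, int b, Cont k) { k(a * b); }
    void add(int a, int b, Cont k) { k(a + b); }

    int main() {
        // (+ (* 2 3) (* 4 5)) in continuation-passing style: each step hands
        // its result to a continuation instead of returning it. A VM could
        // pause between any two of these steps and run another thread.
        mul(2, 3, [](int x) {
            mul(4, 5, [x](int y) {
                add(x, y, [](int r) { std::printf("%d\n", r); });   // prints 26
            });
        });
    }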
Mainframe mentality creeping back in (Score:3, Insightful)
People had said for a long time that personal computers connected to file servers were a lower-cost, better system. However, now many places are going to web-based or host-based connections because of buggy issues at the desktop and the unmanageability of the personal computer. Couple this with the fact that licensing management is such a bear and you see why us Unix folks are glad to see the turn-around.
Mainframes had been on their way out before the personal computer, in favor of smaller satellite processing via minicomputers. However, now people are realizing that virtual computers in a big iron case give you a better managed array of computing power for multiple users or processes. I for one welcome this back, and hope that we will continue to see virtual computing take over the personal computer business market approach. Bring in the network computers!
Best non-standard use for a vm? (Score:1)
I'd like to nominate some software I wrote for the most random use of a virtual machine.
I was asked to code a registration routine for a piece of software - after getting the username + serial number from the user I would have typically done some magic to calculate a checksum from the name and see if it matched the given key.
Instead I wrote a small virtual machine which executed z80 machine code. The protection routine literally started the VM - where all the magic happened. Each opcode was fetched, decoded, and executed. I think it would have been a real pain to decode ;)
(I guess the clever cracker could have disassembled my windows binary with a z80 disassembler and gotten lucky; but it would have been hard to see what was being executed - unless they could do clever things like disassemble z80 in their head...)
Re:Best non-standard use for a vm? (Score:2, Funny)
- Phil Greenspun
Re:Best non-standard use for a vm? (Score:2)
Any protection scheme that relies on a single test point is child's play to circumvent. If you have a code sequence like so:
it can be trivially defeated by simply adding one jump:
now, insert your Z80 interpreter where the above code reads "<do some magic on the credentials>" and see how hard it is to defeat. Even liberally sprinkling your program with calls to the magic won't help; it just increases the number of times that the cracker has to insert jumps into the machine code.
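In C-ish terms, the naive single-test-point scheme looks something like the toy example below (my own invention, not the original poster's code). Patch the one comparison, or jump over it, and the whole check is gone; burying the "magic" inside a bytecode interpreter means there is no single obvious branch to patch:

    #include <cstdio>
    #include <cstring>

    // Naive single-test-point check, invented for illustration.
    bool key_is_valid(const char* name, const char* key) {
        unsigned sum = 0;
        for (const char* p = name; *p; ++p)
            sum = sum * 31 + (unsigned char)*p;          // "do some magic"
        char expected[16];
        std::snprintf(expected, sizeof expected, "%08X", sum);
        return std::strcmp(expected, key) == 0;          // the one branch a cracker
    }                                                    // patches to "always true"

    int main(int argc, char** argv) {
        if (argc < 3 || !key_is_valid(argv[1], argv[2])) {
            std::puts("not registered");
            return 1;
        }
        std::puts("registered");
        return 0;
    }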
Re:Best non-standard use for a vm? (Score:2)
So, your code is *not* just a jump to a registration routine that happens to run in a Z80 VM. That would indeed be trivial to hack.
Instead, the code runs in the Z80 VM at a high level. Presumably, you run 99% of the cycles in native code, but the high level control is Z80 VM. So, let's say the top-level loop has 1000 or so instructions of Z80 code, some of which are the registration routine.
If the attacker looks for the code that displays a dialog saying "you are not registered" they will simply find the code for branching in the VM, which is executing many times for other purposes.
If I were the attacker, my approach would probably then be to find the address of the code that would be executed by the VM if it had *not* branched, and to replace the "you are not registered" code with a jump back to that location.
Alternatively, I might observe that the same code was being executed again and again, and surmise that it had multiple purposes. I might splice in a test for the Nth execution of the code and have it not branch. That would slow down the VM, but since you aren't running much code in the VM it won't make a big difference.
Bottom line? Yes, it probably would trip up some people. If your program is only $30, they will get discouraged. If it's $300 they will probably figure it out.
Is there anything I'm missing?
VMs in the OS (Score:1)
Re:VMs in the OS (Score:2)
Probably the main thing that makes people not consider this idea is that it would be a new OS that does not run any existing programs. Although there are plenty of alternative OS's out there, most people see VM as a way to get their new interface onto an existing system so they never consider this way of writing it.
Re:VMs in the OS (Score:2)
It's been done many times since the 70s. Not sure the first time a VM-based language was the OS, but it was the case with Smalltalk, as far back as 1972 or 1976. You can still get a Smalltalk-based OS with SqueakNOS [swiki.net]. Squeak traditionally runs on top of a host OS like Linux, Mac OS, Windows and many others, but it has almost all of the features of an OS, including an awesome (but non-traditional) GUI system, complete with remote viewing. The binaries are identical between the OS-version of Squeak and the hosted-on-Linux version.
The current state of SqueakNOS is that you still have to write a little C for certain things. Luckily, you can write your low-level code in a subset of Smalltalk and have it translated to C. That's how the Squeak virtual machine is written; no manual C coding required. However, there is active work being done on Squeampiler [gatech.edu], which allows Squeak itself to compile and generate native code. Which means the entire system will be 100% Smalltalk.
As it is now (in SqueakNOS or Squeak on top of a 'normal' OS), fundamental changes to the language can be made within the environment. The only thing compiled to C is the virtual machine and other C plugins, like OS-specific functions. Everything else (the bytecode compiler, the parser, an emulator for itself, all the development tools and libraries) is written in Smalltalk.
I am working on an operating environment for PDAs, Dynapad [swiki.net], along these lines. I'm doing the development on top of Linux/PPC, Solaris/SPARC, and Windows/x86 and run it on my iPAQ under WinCE/ARM. Eventually, I'd like to run it as the OS, if something like OSKit ever makes its way to the iPAQ platform.
Parrot (Score:2, Informative)
Re:Parrot (Score:2)
Parrot's not just a Perl VM (Score:2)
There's also talk of Parrot bytecode to Java/CLR bytecode convertors. Interesting stuff, even if we're gonna have to wait ages to actually get something useful.
Overlooking importance of VM in kernel (Score:2, Informative)
Operating System Concepts by Abraham Silberschatz, et al.
Design and Implementation of the 4.4BSD Operating System by McKusick, et al.
Design of the UNIX Operating System by Bach
Modern Operating Systems by Tanenbaum
Operating Systems Design and Implementation by Tanenbaum
hmm infocom... (Score:3, Interesting)
or Magnetic Scrolls' 68k VM, which even ran on the C64 with its mighty 8-bit chip, emulating the 16/32-bit 68K!
aaah long live interactive fiction and virtual machines.
Virtual Analog Circuits and Stuff (Score:2)
.net is not a virtual machine (Score:3, Insightful)
.net code (c#, etc.) compiles down to a standard intermediate language, which gets JITted into machine code, and linked to
.net is not a virtual machine any more than gcc is a virtual machine.
Re:.net is not a virtual machine (Score:2)
garbage collection doesn't require a virtual machine, and neither do delegates, since they're merely function closures. and btw, by your JITting example, any compiler would be a 'virtual machine'.
nice attempt at a troll, though.
One of the nicest VM's I've seen (Score:2)
Was in Unreal. That was, what, five years ago? It was a revelation to me as a commercial games developer. You could script object behaviour in C-like code, and load it dynamically at run time without having to restart the engine or try and do clever tricks with DLLs. The development time that saved was simply breathtaking, and it pretty much defined the future of games engines and games development, which epitomise the RAD concept. Heck, the first thing that we did was crank out our own C-like VM, and we never looked back.
Re:One of the nicest VM's I've seen (Score:2)
Of course, while UnrealScript is cool, let's not be too quick to give it credit for VMs in games. UnrealScript was heavily influenced by QuakeC from two years earlier (much like everything in Unreal, which should be obvious enough to anyone). And there were a number of games from the VGA days which included scripting languages. A popular example is the lame side-scroller Abuse, but there certainly were others.
don't waste your time (Score:2)
I couldn't agree less. I flipped through the book in the bookstore, and I wasn't impressed. Blunden is a C/Assembly language programmer with little understanding of the requirements that a modern programming language places on a virtual machine. So his virtual machine is single threaded and runs in a fixed block of address space, with a fixed size code and data section, a growable stack, and a growable explicitly managed heap. This is fine if the target language is C or assembly language, but not so fine if you want garbage collection, threads, closures, first class continuations, or any of those other language features that were considered cutting edge back in the 1970s. How does his system link to external code, like the system calls in libc? Well, there are 11 "interrupts" called int0 through int10, sort of like the DOS system call interface.
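Reading that description back, the layout being criticised is roughly the following (my own paraphrase in C++; the names, sizes, and growth directions are my assumptions, not the HEC sources): one flat block of address space carved into fixed code and data sections, plus a stack and an explicitly managed heap. That's a perfectly good model for C-style languages, and a poor starting point for anything that needs GC, threads, or closures:

    #include <cstdint>
    #include <vector>

    // My paraphrase of the reviewer's description of the book's address-space
    // layout; names, sizes, and growth directions are assumptions, not HEC code.
    struct AddressSpace {
        std::vector<uint8_t> mem;    // one fixed block of address space
        size_t code_end;             // [0, code_end)           fixed code section
        size_t data_end;             // [code_end, data_end)    fixed data section
        size_t heap_top;             // [data_end, heap_top)    explicit heap, grows up
        size_t stack_ptr;            // [stack_ptr, mem.size()) stack, grows down

        AddressSpace(size_t total, size_t code, size_t data)
            : mem(total), code_end(code), data_end(code + data),
              heap_top(code + data), stack_ptr(total) {}

        // Explicitly managed heap: the program asks for bytes and is expected
        // to track them itself; nothing is ever collected automatically.
        size_t heap_alloc(size_t n) {
            size_t addr = heap_top;
            heap_top += n;           // a real VM would fault on heap/stack collision
            return addr;
        }
    };

    int main() {
        AddressSpace as(1 << 20, 4096, 4096);
        size_t obj = as.heap_alloc(64);   // caller must remember to "free" it
        return obj != 0 ? 0 : 1;
    }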
His explanation of why he doesn't support garbage collection is pretty muddled: basically, he's not comfortable with the idea, and doesn't think it's practical.
Although I think that a register machine probably is better than a stack machine for this kind of system, he gives none of the arguments I was expecting to see to support this design decision. Instead, we get vague handwaving: apparently, he's more comfortable with register machines, because that's what he's used to.
Doug Moen
Try Smalltalk (VisualWorks & VisualAge) VMs (Score:2)
Blazing fast object allocation (both), interning & loading (VA by a nose), and both have a full IDE that the others have been trying to achieve since Smalltalk-80 came out.
But remember, they're IDEs, not production/delivery. For that you want internationalizable, database-drivable GUIs, dialog managers, state machine and transition engines.
All in all, look at VW & VA and weep. (Or better yet, learn.) They've been at it since the days of UCSD Pascal. They've forgotten more than you'll ever know.
Re:A replacement for C (Score:2)
Re:Operating Systems (Score:2, Interesting)
Re:Operating Systems (Score:2)
I guess you could say it blurs the line/is highly integrated with the underlying OS, but wasn't there just a protracted legal battle based on that?
Re:Operating Systems (Score:2, Interesting)
Emulators use virtual machines, operating systems use virtual machines (Microsoft's .NET), and programming languages use virtual machines (Perl, Java)".
Microsoft's .NET is an example of a virtual machine used by a particular operating system - there are no claims that .NET is an operating system by itself. Similarly, the Perl and Java programming languages have been implemented on virtual machines - the JVM, and the stack-based (soon to be register-based) Perl virtual machine.
Re:OT: Perl Virtual Machine? (Score:2)
Parrot (Score:2)
Whee!
So a C/C++ library that interfaces well to Parrot would be accessible by LOTS of different scripting languages.
(I haven't been following the developer lists, so this is just based on what I overheard as a casual outsider with a bit of interest.)
Re:Parrot (Score:2, Informative)
Rather, the languages will implement compilers that generate Parrot assembly language, and then the Parrot assembler will take it from there. This approach really does have a number of advantages. It means that the Parrot community can work on optimizing the heck out of the assembler and runtime, without worrying too much about the concerns of each individual language.
It also means that, for embedded use, Perl/Ruby/Python/Tcl/Scheme/etc. programs can be compiled and loaded onto a machine that only has to have the Parrot runtime installed.