How to Kill x86 and Thread-Level Parallelism
kid writes: "There's an interesting article discussing how one might go about 'killing' x86. The article details a number of different technological solutions, from a clean 64-bit replacement (Alpha?), to a radically different VLIW approach (Itanium), to an evolutionary solution (Opteron). As is often the case in situations like these, market forces dictate which technologies become entrenched and whether or not they stay that way (VHS vs. Beta, anyone?). Another article by the same author covers hardware multi-threading and exploiting thread-level parallelism, like Intel's Hyper-Threading or IBM's POWER4 with its dual cores on a die. These kinds of implementations can really pay off if the software supports them. In the case of servers, most applications tend to be multi-user, and so are parallel in nature."
endian-little post first! (Score:4, Funny)
Post! First
A From Little A system endian!
Rules! x86
Re:endian-little post first! (Score:2)
It's the same issue as decompiling, really; binary translation is decompile-recompile.
Re:endian-little post first! (Score:2)
Re:endian-little post first! (Score:2, Informative)
If only it were that simple. Ever heard of a "computed goto"?
Re:endian-little post first! (Score:2)
If only it were that simple. Ever heard of a "computed goto"?
Computed gotos can be a problem, but they can be overcome, or eliminated entirely in a new architecture.
A fully general solution requires JIT translation. Lay the code out in blocks, with block metadata to indicate the state of translation. Pre-translate by starting at the entry point and following program flow through both sides of branches. As you note, a computed jump cannot be predicted for all cases; however, once a computed jump actually executes, its target is known, so the destination block can be looked up, or translated right then if it has never been seen.
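A minimal sketch of that block-at-a-time scheme in C (every name here is invented for illustration; a real translator also has to deal with self-modifying code, condition flags, exceptions and much more): translated blocks are cached by guest address, and the dispatch loop handles a computed jump the same way as any other target, by lookup-or-translate at run time.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* One translated basic block, keyed by its address in the guest binary. */
    typedef struct Block {
        uint32_t guest_pc;          /* address of the block in the original code */
        uint32_t (*run)(void);      /* translated host code; returns next guest PC */
        struct Block *next;         /* hash-chain link */
    } Block;

    #define NBUCKETS 4096
    static Block *cache[NBUCKETS];

    static Block *lookup(uint32_t pc) {
        for (Block *b = cache[pc % NBUCKETS]; b; b = b->next)
            if (b->guest_pc == pc) return b;
        return NULL;
    }

    static void insert(Block *b) {
        Block **bucket = &cache[b->guest_pc % NBUCKETS];
        b->next = *bucket;
        *bucket = b;
    }

    /* Stub translator: a real one would decode guest instructions at pc and
     * emit equivalent host code.  Here it just fails loudly. */
    static Block *translate(uint32_t pc) {
        fprintf(stderr, "no translator in this sketch (pc=0x%x)\n", pc);
        exit(1);
    }

    /* Dispatch loop: even a computed jump is no special case, because whatever
     * target it produces at run time is just another lookup-or-translate. */
    static void run_guest(uint32_t pc) {
        for (;;) {
            Block *b = lookup(pc);
            if (!b) { b = translate(pc); insert(b); }
            pc = b->run();          /* execute translated code, get next guest PC */
        }
    }

    int main(void) {
        run_guest(0x1000);          /* hypothetical guest entry point */
        return 0;
    }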
Re:endian-little post first! (Score:2)
Re:endian-little post first! (Score:2)
Please explain why my post is stupid... the original post that we are all responding to suggested doing static binary translation. I explained why this is extremely difficult. His response showed that he had not considered the case of variable input; if you neglect this case, you can statically compile all programs down to a few print statements and a return value, so it's really just as meaningless.
Re:endian-little post first! (Score:2)
The dynamic linker is nothing more than a program that does the "same thing" every time it runs: what it does is equivalent to reading the requested executable, writing its contents to memory as data, and then jumping to it. (The reality is both more and less complicated because of mmapping.) System-level programs and libraries, plus just-in-time environments like Java, do stuff like that all the time.
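The "write code to memory as data, then jump to it" step fits in about twenty lines of C. This is only a sketch: it assumes an x86 or x86-64 machine with POSIX mmap/mprotect, and the six bytes are hand-assembled machine code for "mov eax, 42; ret".

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    /* Hand-assembled x86/x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char payload[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    int main(void) {
        /* Get a page we can write first and execute afterwards. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        memcpy(buf, payload, sizeof payload);             /* code written as data */
        if (mprotect(buf, 4096, PROT_READ | PROT_EXEC)) { /* now mark it executable */
            perror("mprotect"); return 1;
        }

        int (*fn)(void) = (int (*)(void))buf;             /* ...and jump to it */
        printf("the injected code returned %d\n", fn());  /* prints 42 */
        return 0;
    }

The dynamic linker and every JIT do a far more elaborate version of exactly this.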
Re:endian-little post first! (Score:1)
The x86 is terrible. It sprang from a chip designed for calculators and still carries all of the baggage from that origin. This is why the Athlon64 is so distressing. Sure, it's a more convenient transition, and I'm all for dethroning Intel, but it maintains all of that legacy baggage.
Re:endian-little post first! (Score:2)
How to kill x86 (Score:2, Funny)
Don't forget (Score:4, Interesting)
Re:Don't forget (Score:1)
But why do they need that? Why use carbon copying to make three copies of a form when you can just run off three copies on a laser printer?
Re:Don't forget (Score:1)
Re:Don't forget (Score:1)
Re:Don't forget (Score:2)
Basically, if you use DRAM in space, the tiny capacitors inside end up getting disrupted by the ambient radiation, causing bits to get flipped.
Re:Don't forget (Score:1)
Anyway, lots of Intel Pentium and later class computers have flown on the shuttle. I don't think they have too much trouble with the radiation, despite being off-the-shelf models. The shuttle is still well protected from radiation at its typical altitude.
Re:Don't forget (Score:2)
However, if desktop memory keeps getting bigger, ECC RAM will become necessary. It appears to have been constant at 256/512 MB for a while now, so the increase has slowed, if not stopped.
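What ECC memory does can be sketched with a toy Hamming(7,4) code in C. Real ECC DIMMs use a wider SECDED code over 64-bit words and do this in hardware on every access, but the principle is the same: locate a single flipped bit and flip it back. The code below is purely illustrative.

    #include <stdio.h>

    /* Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7,
     * parity bits at positions 1, 2 and 4). */
    static void encode(const int d[4], int c[8]) {
        c[3] = d[0]; c[5] = d[1]; c[6] = d[2]; c[7] = d[3];
        c[1] = c[3] ^ c[5] ^ c[7];
        c[2] = c[3] ^ c[6] ^ c[7];
        c[4] = c[5] ^ c[6] ^ c[7];
    }

    /* Recompute the parities; a non-zero syndrome is the index of the bad bit. */
    static int syndrome(const int c[8]) {
        int s1 = c[1] ^ c[3] ^ c[5] ^ c[7];
        int s2 = c[2] ^ c[3] ^ c[6] ^ c[7];
        int s3 = c[4] ^ c[5] ^ c[6] ^ c[7];
        return s1 + 2 * s2 + 4 * s3;
    }

    int main(void) {
        int d[4] = { 1, 0, 1, 1 }, c[8] = { 0 };
        encode(d, c);
        c[6] ^= 1;                                  /* a cosmic ray flips one bit */
        int bad = syndrome(c);
        printf("flipped bit detected at position %d\n", bad);   /* prints 6 */
        if (bad) c[bad] ^= 1;                       /* ...and ECC puts it back */
        return 0;
    }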
old stuff (Score:1)
And while old hardware still works, especially as long as you have software that's ported to it, old software does not. For that matter, since old hardware is so cheap, people who would keep using 16-bit processors should just buy 32-bit ones instead.
Let's kill x86! (Score:2, Insightful)
Might as well compound the folly of tossing out a perfectly good instruction set with the folly of tossing out perfectly good source code.
Update, don't reinvent. The desire to reinvent is a junior-engineer character flaw. It takes several experiences of spending long hours tracking down bugs in the new implementation, rather than simply updating some older code that worked fine, to cure it.
Re:Let's kill x86! (Score:2, Funny)
1. The mythical man-month: Plan to build one to throw away. You will anyhow.
2. Hack something together. Extend it. It will work fine. (This approach works excellently in Common Lisp and proves deadly for Perl programs.)
It is true that Intel's base instruction set has survived the last 18 years largely unchanged, and even longer if you count the pre-80386 era. It is also true that it is proven and works. But if you have ever tried to write an assembler or disassembler for it, you know what a mess the variable-length encoding is.
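To make that concrete, here is a small taste of the decoding problem in C. Before an x86 decoder even reaches the opcode it has to skip a variable number of one-byte legacy prefixes, and after the opcode there may still be ModRM, SIB, displacement and immediate fields of varying size. This sketch (prefix byte values per the IA-32 manuals, example byte sequence invented) stops at the prefixes.

    #include <stdio.h>

    /* Returns 1 if b is one of the one-byte legacy prefixes an x86 decoder
     * must skip before it finds the opcode. */
    static int is_legacy_prefix(unsigned char b) {
        switch (b) {
        case 0xF0: case 0xF2: case 0xF3:              /* LOCK, REPNE, REP        */
        case 0x2E: case 0x36: case 0x3E: case 0x26:   /* CS, SS, DS, ES override */
        case 0x64: case 0x65:                         /* FS, GS override         */
        case 0x66: case 0x67:                         /* operand/address size    */
            return 1;
        default:
            return 0;
        }
    }

    int main(void) {
        /* An invented byte sequence that starts with two legacy prefixes. */
        unsigned char insn[] = { 0x66, 0xF3, 0x0F, 0x58, 0xC1 };
        int i = 0;
        while (is_legacy_prefix(insn[i])) i++;
        printf("%d prefix byte(s) before the opcode\n", i);
        return 0;
    }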
Re:Let's kill x86! (Score:5, Informative)
Re:Let's kill x86! (Score:2, Insightful)
Re:Let's kill x86! (Score:2)
Re:Let's kill x86! (Score:2)
Nope. It doesn't. It used to in the early 90s, but now we have transistors to spare. The ISA doesn't matter anymore; it's at most a second-order effect on die size, power and performance. I design x86 processors for a living - it's a fact.
There are a million tricks architects can play to get around poor ISAs. What are the fastest SPECint machines on the planet? Hmm...x86 machines!
The only rea
Re:Let's kill x86! (Score:1)
Re:Let's kill x86! (Score:2)
There's no reason that real mode can't be phased out. The first take on it will need a strap to determine whether the CPU starts in real or protected mode. This is mostly to avoid a great deal of chaos for the BIOS. There's no reason the CPU can't start in flat 32-bit mode with all segments set to 0-0xffffffff.
Of course, LinuxBIOS spends as little time as possible in real mode before going to flat 32-bit mode, but other BIOSes will need more significant changes.
It should be possible to phase in a new mode where the processor comes up directly in flat 32-bit protected mode and real mode eventually disappears altogether.
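"Flat 32-bit mode with all segments set to 0-0xffffffff" is nothing more exotic than a GDT whose code and data descriptors have base 0 and a 4 GiB limit. The C sketch below (descriptor layout per the IA-32 manuals; the constants are the usual boot-loader values) just unpacks the two classic flat entries to show that they really do cover the whole address space.

    #include <stdint.h>
    #include <stdio.h>

    /* Unpack base, limit and granularity from an 8-byte segment descriptor. */
    static void describe(const char *name, uint64_t d) {
        uint32_t base  = (uint32_t)(((d >> 16) & 0xFFFFFF) | ((d >> 32) & 0xFF000000));
        uint32_t limit = (uint32_t)((d & 0xFFFF) | ((d >> 32) & 0xF0000));
        int g = (int)((d >> 55) & 1);                 /* granularity: 1 = 4 KiB units */
        uint64_t top = g ? (((uint64_t)limit + 1) << 12) - 1 : limit;
        printf("%s: base=0x%08x top=0x%08llx\n", name, base, (unsigned long long)top);
    }

    int main(void) {
        describe("flat code", 0x00CF9A000000FFFFULL);  /* base 0, 4 GiB, execute/read */
        describe("flat data", 0x00CF92000000FFFFULL);  /* base 0, 4 GiB, read/write   */
        return 0;
    }

Both print base=0x00000000 and top=0xffffffff, which is all a BIOS replacement needs to load before handing the machine over in protected mode.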
Re:Let's kill x86! (Score:2)
That is not always true. There was a company called Wright Aircraft Engines, and yes, it was started by the Wright brothers. In the late 40s and early 50s they were one of the top engine makers. They did not want to waste time with those new-fangled jet engines. Their piston engines were the standard in airliners, and they thought it would go on forever... It didn't.
Sometimes starting over is a good thing.
The x86 is also fa
h/w vs s/w (Score:2, Insightful)
Re:h/w vs s/w (Score:3, Interesting)
Re:h/w vs s/w (Score:1)
This is what SUN calls (Score:4, Informative)
See their media kit available at [sun.com]
http://www.sun.com/aboutsun/media/presskits/thr
However, I believe the whole idea is nothing new. AFAIK, there are only two ways of increasing the performance of a processor (operations per second): either increase the IPC (instructions per cycle) by increasing parallelism, or decrease the cycle time by increasing the clock rate (GHz).
Each method has its limits and follows the law of diminishing returns. For example, increasing the clock rate implies increasing the number of stages in the pipeline, and after, say, 10,000 stages the penalties imposed by flushing the pipeline might cancel out the extra GHz. Similarly, if you manage to place 100,000 cores on a chip, scheduling amongst those cores and providing real-time access to memory for all of them will become the bottleneck. Hence, I take statements like "how to kill the x86" with a pinch of salt.
Finally, it will be the fabrication (physical) technology that decides which one of these dies. For example, if tomorrow someone comes up with a process that enables 100 GHz chips (think extensions of SOI, etc.), decreasing the cycle time will win. Similarly, if someone comes out with femtometre (10^-15 m) fabrication technology, then parallelism will win.
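A back-of-the-envelope version of the pipeline argument, in C. All the numbers are invented (10 ns of logic split into N stages, 0.05 ns of latch overhead per stage, 5% of instructions being mispredicted branches that flush the whole pipe); it models no real core, but the diminishing and then negative returns show up clearly.

    #include <stdio.h>

    int main(void) {
        const double logic_ns = 10.0;      /* total logic depth to split into stages */
        const double latch_ns = 0.05;      /* per-stage latch overhead               */
        const double mispredict = 0.05;    /* fraction of instructions that flush    */

        for (int stages = 5; stages <= 5120; stages *= 2) {
            double cycle = logic_ns / stages + latch_ns;  /* shorter cycle...        */
            double cpi   = 1.0 + mispredict * stages;     /* ...but costlier flushes */
            printf("%5d stages: %6.2f GHz, time per instruction %7.2f ns\n",
                   stages, 1.0 / cycle, cycle * cpi);
        }
        return 0;
    }

Past a few dozen stages the clock keeps climbing while the time per instruction gets worse, which is the point about pipeline-flush penalties eating the extra GHz.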
Re:This is what SUN calls (Score:1)
But who knows, maybe we will have superstring transistors in the future.
Re:This is what SUN calls (Score:2)
It had no cache -- no cache logic! It did have hardware support for a god-awful number of threads per CPU, though. Each time one of them stalled, it would switch threads and keep going. After about 60 or so cycles (this was a few years back), the memory read would be back, so if you had 64 threads per CPU, you would never see a memory-latency-related stall.
As all things extreme (think CM
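A quick check on the latency-hiding arithmetic above, with invented numbers in C: if a thread does W cycles of useful work between memory references and each reference takes L cycles, the other threads can cover the wait as long as (T - 1) * W >= L. With L around 60 and essentially one reference per cycle, some 61 threads suffice, which is why a 64-thread CPU of that style never sees the stall.

    #include <stdio.h>

    int main(void) {
        const int L = 60;                        /* memory latency in cycles (assumed) */
        for (int W = 1; W <= 4; W++) {           /* useful cycles between references   */
            int threads = (L + W - 1) / W + 1;   /* smallest T with (T-1)*W >= L       */
            printf("W=%d cycle(s) of work per miss -> need %d threads\n", W, threads);
        }
        return 0;
    }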
it doesn't matter anymore (Score:3, Insightful)
And VLIW in particular is quite unconvincing: processors should rely less on compilers, not impose a bigger burden on software writers.
Smaller decoder benefits (Score:1)
Today, we can put enough logic between the instruction stream and the processor
Wouldn't less decoder logic allow for a smaller decoder, which requires less die space and emits less heat?
processors should rely less on compilers
To the other extreme, do you propose a processor that can run Perl directly? What compromise would you find best?
not quite ... (Score:2)
x86 however has a ridiculously small number of registers. This means that you have to go to memory A LOT. It's easy to make register operations fast, extremely hard to make memory fast. The performance gap between memory and processors is constantly increasing.
That's why x86-64 has 16 general-purpose registers, Alpha has 32, and Itanium 128.
Bottom line: more registers means fewer trips to memory.
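A rough way to see the register shortage for yourself, assuming gcc on an x86 box: the loop below keeps a dozen accumulators live at once. Compiled as 32-bit ("gcc -O2 -m32 -S", 8 general-purpose registers) the generated assembly spills several of them to the stack; compiled as x86-64 ("gcc -O2 -S", 16 registers) far fewer, if any, spills appear. The exact output depends on the compiler version, so treat this as a probe rather than a benchmark.

    #include <stdio.h>

    /* Twelve values stay live across every loop iteration, plus the pointer
     * and the counter: more than 8 registers' worth of state. */
    static unsigned mix(const unsigned *p, int n) {
        unsigned a = 1, b = 2, c = 3, d = 4, e = 5, f = 6,
                 g = 7, h = 8, i = 9, j = 10, k = 11, l = 12;
        for (int x = 0; x < n; x++) {
            a += p[x]; b ^= a; c += b; d ^= c; e += d; f ^= e;
            g += f; h ^= g; i += h; j ^= i; k += j; l ^= k;
        }
        return a + b + c + d + e + f + g + h + i + j + k + l;
    }

    int main(void) {
        unsigned buf[256];
        for (int x = 0; x < 256; x++) buf[x] = x;
        printf("%u\n", mix(buf, 256));
        return 0;
    }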
Cost-efficiency > * (Score:1, Interesting)
Sure, I could spend $20 on eBay and get a Sparc Lunchbox, but there's not enough processing power in there for me. I could also go out and buy a year-old IBM mainframe, but I doubt any auction site will have them anywhere near my price range. I want something that's decent but also cheap. I
Re:Cost-efficiency > * (Score:2)
the most bang for the buck, but there are more dimensions to the purchasing decision than mips, mflops, and $$. There are watts and hours and then, god forbid, intangibles. ARM and PPC have the best shot at displacing ia32 and its best successor, amd64, because they accommodate very real market segments. We keep waiting for commodity PPC hardware, but it never emerges because the OSS community isn't big enough to drive sales to economical volume; but some magical event co
Power 4? (Score:1)
Re:Power 4? (Score:2, Informative)
so each 'cpu' will look like 4 logical cpus
Re:Power 4? (Score:2, Insightful)
Step 1: Hyperthreading
Step 2: Multicore
Step 3: Crush competition (i.e. Profit)
Multicore would increase the Windows Tax (Score:1)
I wouldn't expect to see multicore in home PCs within the next five years, even if multicore becomes so cheap that Intel could start putting it in its Celeron chips. The limitation is that Microsoft charges for Windows licenses per core: a license for Windows XP Professional, which can handle two cores, costs much more than a license for Windows XP Home Edition, which can handle only one. Wouldn't multicore require selling the machine with a more expensive version of Windows?
I say "next five years"
Architecture for software reliability (Score:5, Informative)
The neat hardware implementation of this would be to make all MOV instructions take nearly the same time, regardless of the amount of data moved. A MOV should result in a remapping of the source and destination memory in the cache system. Even if this were just implemented for aligned moves, it would be a big help. When your application's 8K buffer needs to be copied to the file system, that copy should be done by updating cache control info, not by really doing it.
With this, windowing becomes far simpler. Each window is maintained locally. Shared window management is reduced to screen space allocation, which is done by commanding the window MMU.
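The "update the mapping instead of moving the bytes" idea already exists in software as copy-on-write mappings, and a C sketch shows the flavor of it (POSIX only; sizes and file name invented): the 8 KiB buffer is "copied" by creating a second private mapping of the same file, and the kernel copies a page only when one side actually writes to it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        char tmpl[] = "/tmp/cowdemoXXXXXX";
        int fd = mkstemp(tmpl);
        if (fd < 0) { perror("mkstemp"); return 1; }
        unlink(tmpl);                              /* file lives only as long as fd */

        char page[8192];
        memset(page, 'A', sizeof page);
        if (write(fd, page, sizeof page) != (ssize_t)sizeof page) {
            perror("write"); return 1;
        }

        /* Two "copies" of the buffer: really just two sets of page mappings. */
        char *a = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        char *b = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (a == MAP_FAILED || b == MAP_FAILED) { perror("mmap"); return 1; }

        a[0] = 'Z';                    /* only now does one page really get copied */
        printf("a[0]=%c b[0]=%c\n", a[0], b[0]);        /* prints a[0]=Z b[0]=A */
        return 0;
    }

Doing the same thing for every MOV, as suggested above, would need support in the cache and MMU hardware, but the semantics are the ones shown here.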
Re:Architecture for software reliability (Score:5, Interesting)
Windows are drawn on OpenGL surfaces and their layering is handled entirely by the GPU in Quartz Extreme, and plain old Quartz does basically the same thing in software buffers. In either case, an app never has to do any redrawing when one of its windows is revealed; it's all handled by Quartz.
And supposedly, whenever it eventually comes out, Longhorn will do more or less the same thing.
Channelized I/O is probably a good idea, but it's either going to cost you some bandwidth (route all I/O through an expanded version of current MMUs) or be expensive (a separate MMU for I/O). I'm not saying it might not be worth it in the long run, but it will take a bite out of price/performance in the short term for questionable immediate stability gains (one would hope that most people writing kernel-space drivers have the sense to KISS).
High speed copy sounds really interesting, but I'm not sure how practical it is to add to current systems.
Re:Architecture for software reliability (Score:2)
It shouldn't hurt bandwidth. The problem with MMUs is latency, and adding a few hundred nanoseconds to I/O latency isn't going to hurt. I/O accesses have far more coherency than regular memory accesses, so you don't need that much caching within the I/O MMU.
The original Apollo Domain machines had an MMU between t
Re:Architecture for software reliability (Score:3, Interesting)
Re:Architecture for software reliability (Score:2)
The point here is that we're tied to some architectural decisions from an era when transistors were more expensive, and those decisions are worth a new look.
Multiple chips (Score:3, Interesting)
Emulators can be implemented such that old chips can still run code from the new standard (and vice versa), just more slowly. For development, training, simple apps, and testing, that is usually fast enough.
A box could come with both an x86 and an Alpha clone, for example. Eventually the x86 chip is no longer worth including, and the few old apps lying around just use emulation mode.
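At its core, the emulation fallback is just a fetch-decode-execute loop. The C sketch below runs a made-up four-instruction machine; a real x86-on-Alpha (or Alpha-on-x86) emulator is vastly more involved, but it is still, at bottom, this loop.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_LOADI, OP_ADD, OP_PRINT, OP_HALT };      /* toy instruction set */

    int main(void) {
        /* program: r0 = 40; r1 = 2; r0 += r1; print r0; halt */
        uint8_t mem[] = { OP_LOADI, 0, 40,  OP_LOADI, 1, 2,
                          OP_ADD, 0, 1,  OP_PRINT, 0,  OP_HALT };
        int32_t reg[4] = { 0 };
        size_t pc = 0;

        for (;;) {
            switch (mem[pc]) {                          /* fetch + decode */
            case OP_LOADI: reg[mem[pc + 1]] = mem[pc + 2]; pc += 3; break;
            case OP_ADD:   reg[mem[pc + 1]] += reg[mem[pc + 2]]; pc += 3; break;
            case OP_PRINT: printf("%d\n", reg[mem[pc + 1]]); pc += 2; break;  /* prints 42 */
            case OP_HALT:  return 0;                    /* stop */
            }
        }
    }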
Re:Multiple chips (Score:2, Informative)
Re:heavily scripted page (Score:2)
So, yes. Please stop using Netscape 4.
Re:heavily scripted page (Score:2)
wait, real men browse????????