Are 64-bit Binaries Slower than 32-bit Binaries?
JigSaw writes "The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Tony Bourke decided to run a few tests on his SPARC to see whether 64-bit binaries really ran slower than 32-bit binaries, and what the actual performance disparity would be."
Couldn't time fix this? (Score:4, Insightful)
Most "tech gurus" I've talked to at my university about the benefits of 64-bit processing say that it comes in part from the increased number of registers (allowing you to use more at the same time, shortening the number of cycles needed). Could time allow us to write more efficient kernels, etc., for 64-bit processors?
So either the code isn't good enough, or perhaps there's another physical limitation (longer pipelines, etc) on the chip itself? Correct me if I'm wrong.
*Why* do I have that feeling... (Score:-1, Insightful)
Seriously though, there's *so* many other factors involved:
How much cache is ideal for hello.c?
How many branches does it need? Is the prediction worth a shit?
Does hello.c run faster at 2 GHz?
THINK before you post please so my hair doesn't hurt so much... thx.
Re:Moving more data (Score:4, Insightful)
Re: OSNews (Score:5, Insightful)
More bits doesn't automatically mean more speed (Score:5, Insightful)
An architecture with 32-bits of address space can directly address 2^32 or approximately 4 billion bytes of memory. There are many applications where that just isn't enough. More importantly, an architecture whose registers are 32-bits wide is far less efficient when it comes to dealing with values that require more than 32 bits to express. Many floating point values use 64 bits and being able to directly manipulate these in a single register is a lot more efficient than doing voodoo to combine two 32-bit registers.
So, if you have a problem where you're dealing with astronomical quantities of very large (or precise) values, then a 64-bit implementation is going to make a very big difference. If you're running a text editor and surfing the web then having a wider address bus and wider registers isn't going to do squat for you. Now that doesn't mean that there may not be other, somewhat unrelated, architectural improvements found in a 64-bit architecture that a 32-bit system is lacking. Those can make a big difference as well, but then you're talking about the overall efficiency of the design, which is a far less specific issue than whether 64 bits is better/worse than 32.
Lee
Retarded article. (Score:1, Insightful)
Medium answer: If you're not a programmer, yes. Expect about the same speed, but maybe slightly less.
Long answer: Direct comparisons like this are in no way valid, because the code is identical. It's the same algorithm running at the same clock speed, and your compiler can't program for you. Think about this: a logical operation only takes up so much space. The question:
"is this bit set to one? if yes, do this.. if no, do that"
Which is why Intel is more concerned with clockspeed than number of bits.
Re:Something is wrong. (Score:5, Insightful)
It's you.
OpenSSL in the 32-bit environment as the guy configured it was doing 64-bit arithmetic. Just because the guy had 32-bit pointers doesn't mean that his computer wasn't pushing around 64-bit quantities at once. It's called a "long long".
In fact, as he had OpenSSL configured, he was using some crafty assembly code for his 32-bit OpenSSL builds that even used 64-bit registers. His 64-bit builds were using plain old compiled C.
But he didn't even know that.
Big whoop.
Mark
Re:How mature are the compilers? (Score:5, Insightful)
If you want FAST code you should use the compiler from your hardware vendor. The downside is that they might cost money, and almost definitely implement things in a slightly weird way. Weird when compared to the official standard, weird when compared to the de facto standard that is GCC.
I thought this was common knowledge, at least among people who would be trying to benchmark compilers...
This is unfair comparison (Score:3, Insightful)
Re: OSNews (Score:4, Insightful)
How can you be certain that this isn't simply comparing the efficiency of the compilers, and not the resulting binaries?
What 'system of belief' is he following? (Score:5, Insightful)
Running benchmarks of 32 vs. 64 bit binaries in a 64 bit Sparc/Solaris environment has shown little or no difference for us, on many occasions. If the author had used Sun's compiler instead of the substantially less-than-optimal gcc, I expect that his 20% average difference would have disappeared.
Re:Moving more data (Score:3, Insightful)
So, memory bandwidth remains an issue, and I concede the point.
Still, bus widths tend to optimize around typical transfer patterns, and pointers tend to grow to be "always big enough" -- the cases where we tailor pointers to fit smaller constraints are quite specialized. It's more convenient to have one pointer size -- does anyone remember the four memory models the Microsoft C compiler used to support (probably still does)? Tiny (16-bit data and code pointers), small (16-bit data, 32-bit code, IIRC), large, and huge? 'Course that isn't a perfect comparison because of the brain-dead segmented x86 memory architecture, but you get the idea. It was (is) a pain.
But, bus widths and memory capacities will grow to the point where the 64 bit code of tomorrow will be as fast as the 32 bit code of today, and the need to optimize further will occur only in esoteric bits of code.
Besides, with 64 bits, you can do fun things, like allocate different objects in different virtual memory spaces and use the memory management system to catch wild-pointer bugs (because no two different objects need be adjacent in the logical memory space).
On the whole the advantages outweigh the disadvantages, and the performance penalties will be moot quite shortly.
There's always a trade-off (Score:5, Insightful)
It's not surprising that 64-bit processors are clocked somewhat slower than 32-bit ones. The fastest 64-bit AMD is rated at 2.0 GHz while the fastest 32-bit AMD is at 2.2 GHz.
If you use a shovel you can move it very fast to dig a hole. If you use a backhoe you're going to move much slower but remove more dirt at a time.
Using modern technology to build a 386 chip would result in one of the highest clock speeds ever but it would be practically useless. Using 386 era technology to build a 64 bit chip would be possible but it'd be massive and horribly slow.
I'm still debating whether or not to go with 64-bit for my next system. I'd rather not spend $700 on a new system so I can have a better graphics card and then have to spend several hundred more shortly after to replace the CPU and MB again. But then again, 64-bit prices are still quite high and I'd probably be able to be productive on 32-bit for several more years before 32-bit goes away.
Ben
Re: OSNews (Score:5, Insightful)
That is the sort of "obvious" conventional wisdom that the article is questioning. In fact, 64-bit architecture means a lot more than pointer size, and merely counting bits is no way to estimate performance.
Re:Anyone ever used WinXP-64bit edition? (Score:5, Insightful)
Re:If 32bit is faster than 64... (Score:5, Insightful)
Of course, my conceptions back then might be getting a bit dated now. But not too terribly much. 32 bits will probably be the optimum for general use for quite some time. There aren't too many applications that need a 64-bit address space. Not too many applications need 64-bit integers. We'll need 64 bits sometime, but I don't see the need for it in *general* purpose computing for the remainder of the decade. (Longhorn might actually need a 64-bit address space, but that's another story...).
Remembering back to the 80286 days, people were always running up against the 16-bit barrier. It was a pain in the butt. But unless you're running an enterprise database, or performing complex cryptanalysis, you're probably not running up against the 32-bit barrier.
But of course, given that you're viewed as a dusty relic if you're not using a box with 512 MB of video memory and 5.1 audio to calculate your spreadsheets, the market might push us into 64-bit whether we need it or not.
Old news (Score:2, Insightful)
The fact is as true as it was then: some applications are going to run faster just because 32-bit compilers are more 'mature'. Once the newer method becomes mainstream, you will see either the same speed, or a gain in speed.
Needless to say, the guy in the other post who made an analogy with an abacus had it right: something small is obviously going to execute faster. We aren't switching to 64-bit processors so we can run
10 print "64-bit is k3wl"
20 goto 10
The more complex applications of the future generation, as well as the ability to move large amounts of data from memory to CPU, are what is driving the move.
What else is new. This is about scaling (Score:3, Insightful)
Both in terms of direct CPU performance and for the software that runs on it.
This has happened a bunch of times throughout history. Remember the introduction of MMUs, for instance? It definitely slows down the software running on the machine, but without an MMU we all know it was virtually impossible to do stable multitasking.
1/2 GB of memory is basically the standard these days with XP.
A lot of people are buying home computers with 1 GB or more.
Dell in Japan (where I live) has a special offer these days on a Latitude D600 with 1 GB of RAM. That is, they expect to sell this thing in quantity.
I think a fair amount of PC users will hit the 4 GB limit within a few years. Personally, I already swear about having just 1 GB in my desktop at times, when I have a handful of images from my slide scanner open in Photoshop plus the obvious browser/mail programs and maybe an office program or two open.
Introducing 64-bit does not make today's hardware any faster than its counterparts, but it will make it possible to continue making machines better, faster, and capable of handling increasingly complex tasks.
Ah well (Score:2, Insightful)
This would appear to miss the point... (Score:3, Insightful)
Re:There's always a trade-off (Score:2, Insightful)
Re: OSNews (Score:2, Insightful)
Let's not do anything like that! (Score:3, Insightful)
Uniform and simple is good...
Useless tests (Score:2, Insightful)
No, it's not a test of whether 32 or 64 bit is faster. It's a test of an obsolete architecture whose fastest younger siblings are still outperformed by chips from IBM, Intel, and AMD.
The results tell you nothing about whether you should seriously consider 64 bit, nor where you should actually be using a 64 bit setup.
Maybe someone can post the performance results for Doom running on a new AMD 64 bit box with a top-end ATI or NVidia card. It'd be about as relevant as the performance of a SPARC5 is to making a purchase decision.
Re:Moving more data (Score:1, Insightful)
Re:gcc? (Score:3, Insightful)
Actually, I wouldn't say that gcc produces particularly bad code on any computer; it's sorta average, but not bad. Certainly the 3.3.x series is a lot better than the 2.x series. It's pretty good at number crunching [randombit.net] and it is more standards-compliant than most.
Re: 64 bits: no magic (Score:1, Insightful)
A CPU with a lot of slow transistors is worse than a CPU with a few quick ones, so short paths are better.
The page translation in long mode is very slow: on the Opteron it's a 4-level walk for 64-bit versus a 2-level walk for 32-bit (512*512*512*512 entries mapping 4 KiB pages vs 1024*1024), so legacy mode is quicker than long mode.
And the cache penalty is a little high:
with 1 MiB of L2 cache, an array of 10,000,000 longs is a bit slower than an array of 10,000,000 ints.
And for building LEGO-like machines, the cheap Athlon XP is better than the expensive Opteron.
open4free
Re: OSNews (Score:3, Insightful)
A benchmark is useless without interpretation. The people at OSNews have failed to give us any technical background information on the SparcV chip (penalties running in 64-bit as well as benefits), a proper breakdown of the type of math done by the example programs, as well as analyses of bottlenecks in the benchmarks (MySQL, for instance, is possibly I/O limited).
They've given us raw numbers, with no thought behind them. This is what makes a bad article.
sizeof(int) (Score:3, Insightful)
I don't know about Sun, but in some other environments in which a 32 bit and a 64 bit model exist, the compiler will always treat an int as 32 bits, so as not to cause structures to change size. Hell, even on the Alpha, which was NEVER a 32 bit platform, gcc would normally have:
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
Now, consider the following code:
for (int i = 0; i < 100; ++i)
{
frobnicate(i);
}
IF the compiler treats an int as 4 bytes, and IF the compiler has also been informed that the CPU is a 64 bit CPU, then the compiler may be doing dumb stuff like trying to force the size of "i" to be 4 bytes, by masking it or other foolish things.
So, the question I would have is: did the author run a test to ensure that the compiler was really making ints and unsigneds 64 bits or not?
Re:Benchmarks (Score:5, Insightful)
And they often show disparity in their results due to being interrupted. That would be a badly carried-out benchmark under less-than-ideal conditions. This is human error. Of course there are slight variations between subsequent runs, but these should be able to be explained and compensated for. It is most certainly not the benchmark lying, though. If it took that long, then it took that long; now find out why!
But in benchmarking scientific rigor is always lost
Failing to retain a scientific approach is a human failing. It does not always happen and is not the benchmark telling lies, but due to poor procedure.
But the benchmark choice is frequently meaningless or misleading.
[poor] "Choice", "Meaningless" and "misleading" [results] each require an incompetent person. Don't blame the benchmark. Even if they wrote the benchmark, they might not understand the results.
Benchmarks do not elucidate any fact.
Yes they do. Very very specific facts which can later be used to make considerations for future decisions. It could be a specific application, algorithm, overall CPU ALU, FPU or single CPU instruction, it could be bus type, etc. Specific facts leading to educated decisions.
You will always see LAME encoding in CPU tests. The P4 will always win against an Athlon.
If this is the case, then LAME as it stands is simply faster on a P4 than on an Athlon. That's a coarse benchmark, though. Some would call it "real world", and it is. It is specific to LAME, but not specific at a lower level, where it could be found why this might be the case and how to improve LAME on P4s and Athlons separately (with an end result that might have the Athlon out-perform the P4, due to new insight gained from benchmarking specific areas).
The reviewer will not explain why this is the case and that LAME encoding is simply clock cycle dependent.
So the reviewer's fault becomes the benchmark's fault?
Benchmarkers need to be able to explain all the dependent variables, to tell why the results happen.
Thus my original statements?
In graphics cards Q3 benchmarks above a certain magnitude are meaningless.
Bad choice of benchmark is the fault of the benchmark?
Benchmarks need to be interpreted by someone competent enough to do so. Just because someone carried out a poor benchmark procedure, or could not understand the results, does not mean the benchmark lied.
The reviewer with meaningless variables creates an inauthentic conditioned desire in the consumer that leads to bad and lax software and hardware engineering.
Incompetent reviewer, ignorant consumer, deceitful engineering.
Morrowind and other games have horrible problems with their graphics engine that can not be saved by faster GPUs and dx9.
So they are CPU bound? Memory? Sounds like maybe they don't know how to profile their code too well. When profiling, it helps to know how to benchmark and make meaning out of the results.
You cannot improve that which you do not understand, through anything other than luck. Benchmarks provide specific facts which, when correctly interpreted, can bring about improvements. People who can't interpret them, say they are meaningless.
Re: OSNews (Score:3, Insightful)
Re:retarded. (Score:5, Insightful)
bits (Score:2, Insightful)
and merely counting bits is no way to estimate performance.
If you only have room for 16 KB of data in your L1 cache, and all your size_t's, pointers, and (in most cases) longs take twice as much memory, then at worst it's as if you have only 8 KB of cache compared to the 32-bit version! At best it makes no difference, but at worst it's as if your system has only half the cache and half the memory bandwidth. Seems to me that by counting bits you can estimate your performance will be between 100% and 50% of the 32-bit version, all other things being equal.
A notable exception is when you actually need a 64-bit value and are forced to emulate it on a 32-bit machine.