Catch up on stories from the past week (and beyond) at the Slashdot story archive

Are 64-bit Binaries Slower than 32-bit Binaries? 444

Posted by michael on Saturday January 24, 2004 @12:05AM from the glass-half-empty dept.

JigSaw writes "The modern dogma is that 32-bit applications are faster, and that 64-bit imposes a performance penalty. Tony Bourke decided to run a few of tests on his SPARC to see if indeed 64-bit binaries ran slower than 32-bit binaries, and what the actual performance disparity would ultimately be."

This discussion has been archived. No new comments can be posted.

Are 64-bit Binaries Slower than 32-bit Binaries?

Load All Comments

Search 444 Comments Log In/Create an Account

Comments Filter:

Re: OSNews (Score:5, Funny)

by duffbeer703 ( 177751 ) * writes: on Saturday January 24, 2004 @12:06AM (#8072821)

In case anyone hasn't realized it yet, this article proves that OSNews is the most retarded website on the planet.

The typical story is titled like "A comprehensive review of the Atari ST". The contents are typically something like... "I found an old Atari ST, but my cdrom wouldn't fit in the 5.25" disk drive and mozilla wouldn't compile. So the Atari sucks"

I benchmarked a skilled Chinese abacus user against a C-programmer implementing an accounting system. The chinese dude figured out that 1+1=2 before the C-programmer loaded his editor, so the abacus is faster.

Share
twitter facebook
- Re: OSNews (Score:5, Insightful)
  
  by rainwalker ( 174354 ) writes: on Saturday January 24, 2004 @12:27AM (#8072929)
  
  Your "analysis" may be valid, but it's really not applicable. The title of the story is, "Are 64-bit Binaries Really Slower than 32-bit Binaries?" The author takes a 64-bit machine, compiles a few programs, and tests the resulting binaries to see which is faster. I'd say that the review is aptly titled and an interesting point to think on. Certainly he didn't compile every open source program known to mankind, as it sounds like he missed some pet app of yours. OpenSSL might be kind of arbitrary, but gzip and MySQL seem like reasonable apps to test. Like the last page says (you *did* RTFA, right?), if you don't like his review, go write your own and get it published.
  
  Parent Share
  twitter facebook
  - Re: OSNews (Score:4, Insightful)
    
    by chunkwhite86 ( 593696 ) writes: on Saturday January 24, 2004 @12:55AM (#8073070)
    
    Your "analysis" may be valid, but it's really not applicable. The title of the story is, "Are 64-bit Binaries Really Slower than 32-bit Binaries?" The author takes a 64-bit machine, compiles a few programs, and tests the resulting binaries to see which is faster.
    
    How can you be certain that this isn't simply comparing the efficiency of the compilers - and not the resulting binaries???
    
    Parent Share
    twitter facebook
    - Re: OSNews (Score:3, Informative)
      
      by be-fan ( 61476 ) writes:
      
      Because he used the same compiler, in 32-bit and 64-bit mode???
      - Re: OSNews (Score:3, Insightful)
        
        by cruelworld ( 21187 ) writes:
        
        And what if the compiler sucks/has no optimizations for 64-bit binaries?
        
        Re: OSNews (Score:3, Interesting)
        
        by be-fan ( 61476 ) writes:
        
        If the compiler sucks, then it would suck equally for 32-bit and 64-bit binaries! They use the same code generator!
      - Re: OSNews (Score:5, Informative)
        
        by be-fan ( 61476 ) writes: on Saturday January 24, 2004 @01:18AM (#8073156)
        
        GCC uses the same code generator for both Sparc32 and Sparc64.
        
        Parent Share
        twitter facebook
        
        Re: OSNews (Score:5, Informative)
        
        by be-fan ( 61476 ) writes: on Saturday January 24, 2004 @02:00AM (#8073332)
        
        On SPARC, there are no 64-bit-only optimizations. The only reason to use 64-bit math is either if you need 64-bit integers, or use 64-bit pointers. Since none of the benchmarks can use either (the MySQL benchmark could, but the machine only had 256MB of RAM).
        
        Parent Share
        twitter facebook
  - - Re: OSNews (Score:5, Insightful)
      
      by Guuge ( 719028 ) writes: on Saturday January 24, 2004 @01:34AM (#8073209)
      
      News flash: 64-bit apps are, usually, slightly slower than 32-bit ones. Duh. Any developer who's been around 64-bit environments for more than a few weeks knows this. It's not like there's some subtle magic going on here; bigger pointers means more data to schlep around.
      
      That is the sort of "obvious" conventional wisdom that the article is questioning. In fact, 64-bit architecture means a lot more than pointer size, and merely counting bits is no way to estimate performance.
      
      Parent Share
      twitter facebook
    - OSNews = UnNews? (Score:3, Flamebait)
      
      by MrNybbles ( 618800 ) writes:
      
      Dant makes a good point about taking a little longer to read 64 bits than 32.
      Still, the word size of the processor is not a major factor in now fast a CPU is. Finding fater ways to process instructions, caches, and how fast you run the CPUs at make more of a difference. I am probably leaving out a lot of other major factors. Oh well.
      The article is a bit interesting although it seems very amateurish. Just my personal opinion.
      In fact the same logic means that with all else being equal an 8 bit processor
      - Re:OSNews = UnNews? (Score:5, Informative)
        
        by fitten ( 521191 ) writes: on Saturday January 24, 2004 @04:28AM (#8073790)
        
        Don't know how they could be exactly the same *except* for the word size. In order to process the two different word sizes, there will have to be differences in circuitry (ALU is wider, so are lots of things like the buffers between pipeline stages and such).
        
        One of the issues that people forget is that a 64-bit processor may be able to retire a set number of 64-bit, say, integer additions per clock cycle (NOTE: retiring an operation per clock cycle does NOT mean that the operation takes one clock cycle to perform). Well, the odds are that it will also retire the same number of 32-bit integer additions per clock cycle. It may take 5 clock cycles to do either sized addition even. So, what do you have that is different? Well, on the SPARC, most simple operations are going to be similar in execution time. Regarless of the number of register windows that the particular architecture supports (which may come into play in some codes), you still basically have 32 registers for use in your computational kernel. The only real difference between many 32-bit and 64-bit versions of the code will be the amount of data that has to be moved around.
        
        Where the 64-bit will help is when the 32-bit code has to synthesize 64-bit operations or has to do things like work on bit streams (not word/byte streams exactly) and can work on 64-bits at a time rather than doing really the same thing on 32-bits two times as much (128 bytes can be traversed in 32 32-bit operations or 16 64-bit operations - half the number of reads/operations).
        
        All of this is pretty well understood by those who have dealt with these type systems before. However, the relative newcomer Opteron has an additional twist. In 64-bit mode, there are twice as many registers that can be used compared to 32-bit mode. This may (read: will) cause some codes to be done faster simply because more data can be stored in registers rather than memory, even L1 cache is a bit slower than a register.
        
        Parent Share
        twitter facebook
  - - Re:retarded. (Score:5, Informative)
      
      by fucksl4shd0t ( 630000 ) writes: on Saturday January 24, 2004 @02:09AM (#8073372) Homepage Journal
      
      They've at best proved a supposition about a single architecture/process/compiler family. They have not proved a general case. Did they test on amd64? Alpha? Mips? No? Then why are they making unwarranted generalizations? Ah, they're retarded.
      Actually, they didn't make generalizations. He very specifically stated that he only tested on a 64-bit Sparc, and an older one at that. He pointed out that while you can make some general conclusions, you can and should run tests on other architectures.
      He also pointed out that he only tested a few applications, not a whole bunch of them. He was questioning conventional wisdom and wanted to know if there was any fact behind it, and he determined that there was. He did not determine the entire scope of the facts, and he did not claim to do so.
      Sorry, I found it to be an interesting read, but you really have to take the first page seriously when he says "I only tested these things, so I can only conclude based on these tests, and it doesn't prove the general case." If you ignore that, then yes, you'll wind up with what you took away from the article.
      
      Parent Share
      twitter facebook
      - Re:retarded. (Score:5, Insightful)
        
        by JacobO ( 41895 ) writes: on Saturday January 24, 2004 @11:55AM (#8074997)
        
        I just wonder why some are so offended by the article. I have to believe that some people feel that he has "disagreed" with them or something to have such violent reactions. It's just some benchmarks, as he infers, it's better than some people just supposing the answer to a things they are wondering about.
        
        Parent Share
        twitter facebook
  - - What purchase decision? (Score:4, Interesting)
      
      by Valdrax ( 32670 ) writes: on Saturday January 24, 2004 @05:57PM (#8077211)
      
      This is modded Insightful?
      
      You've completely missed the entire point of the test. This has nothing to do with your next purchase decision -- it's purely designed to test whether or not the common claim that using 64-bit values decreases performance due to memory latency is true. This test makes no claims whatsoever that it has anything to do with whether or not you should be using a 64-bit setup. RTFA.
      
      The "obsolete architecture" is one of the few where 64-bit and 32-bit operations have no inherent performance advantage on the processor, unlike the Opteron and Itanium processors where 64-bit mode has several advantages over 32-bit mode (extra registers or not being emulated). This makes it a perfect testbed for evaluating this claim. The speed of the processor has absolutely no relevance to the question at hand (with the exception of testing memory access starvation on system with a greater CPU to bus clock difference).
      
      It's a shame you're too wrapped up in a "buy, buy, buy" mindset to consider the value of curiosity and of testing commonly held beliefs.
      
      Parent Share
      twitter facebook
- Benchmarks (Score:3, Funny)
  
  by Anonymous Coward writes:
  
  As it needs to be said for any benchmarking story:
  
  There are 3 types lies. Lies. Damned Lies. ...and benchmarks.
  - Re:Benchmarks (Score:5, Funny)
    
    by Frymaster ( 171343 ) writes: on Saturday January 24, 2004 @12:55AM (#8073068) Homepage Journal
    
    There are 3 types lies. Lies. Damned Lies. ...and benchmarks.
    i've got some specint stats that show that damned lies are up to 30% faster.
    
    Parent Share
    twitter facebook
  - - - Re:Benchmarks (Score:5, Insightful)
        
        by Shanep ( 68243 ) writes: on Saturday January 24, 2004 @10:45AM (#8074726) Homepage
        
        Benchmarks are meant to ideally test minimal pairs
        
        And they often show disparity in their results due to being interupted. This would be a baddly carried out benchmark under less than ideal conditions. This is human error. Of course there are slight variations in subsequent runs, but these should be able to be explained and compensated for. It is most certainly not a benchmark lie though. If it took that long, then it took that long, now find out why!
        
        But in benchmarking scientific rigor is always lost
        
        Failing to retain a scientific approach is a human failing. It does not always happen and is not the benchmark telling lies, but due to poor procedure.
        
        But the benchmark choice is frequently meaningless or misleading.
        
        [poor] "Choice", "Meaningless" and "misleading" [results] each require an incompetent person. Don't blame the benchmark. Even if they wrote the benchmark, they might not understand the results.
        
        Benchmarks do not elucidate any fact.
        
        Yes they do. Very very specific facts which can later be used to make considerations for future decisions. It could be a specific application, algorithm, overall CPU ALU, FPU or single CPU instruction, it could be bus type, etc. Specific facts leading to educated decisions.
        
        You will always see in CPU tests LAME encoding. The p4 will always win against an Athlon.
        
        If this is the case, then LAME as it stands is specifically faster on a P4 than an Athlon. That would be a coarse benchmark though. Some would call it "real world". And it is. It is specific to LAME, but not specific at a lower level where it could be found why this might be the case and how to improve LAME on both P4's and Athlons seperately (with an end result that might have the Athlon out-perform the P4, due to new insight gained from benchmarking specific areas).
        
        The reviewer will not explain why this is the case and that LAME encoding is simply clock cycle dependent.
        
        So the reviewers fault becomes the benchmarks fault?
        
        Benchmarkers need to be able to explain all the dependent variables, to tell why the results happen.
        
        Thus my original statements?
        
        In graphics cards Q3 benchmarks above a certain magnitude are meaningless.
        
        Bad choice of benchmark is the fault of the benchmark?
        
        Benchmarks need to be interpetted by someone competent enough to do so. Just because someone carried out a poor benchmark procedure or could not understand the results, does not mean the benchmark lied.
        
        The reviewer with meaningless variables creates an inauthentic conditioned desire in the consumer that leads to bad and lax software and hardware engineering.
        
        Incompetent reviewer, ignorant consumer, deceitful engineering.
        
        Morrowind and other games have horrible problems with their graphics engine that can not be saved by faster GPUs and dx9.
        
        So they are CPU bound? Memory? Sounds like maybe they don't know how to profile their code too well. When profiling, it helps to know how to benchmark and make meaning out of the results.
        
        You cannot improve that which you do not understand, through anything other than luck. Benchmarks provide specific facts which, when correctly interpreted, can bring about improvements. People who can't interpret them, say they are meaningless.
        
        Parent Share
        twitter facebook
- Re: OSNews (Score:5, Funny)
  
  by DNS-and-BIND ( 461968 ) writes: on Saturday January 24, 2004 @12:57AM (#8073080) Homepage
  
  Are you kidding? This guy is a genius. Not only did he actually figure out that the UltraSPARC-II processor is 64-bit, but he can actually use the file and time utilities! Most of the "linux admin" types I know who buy old Sparcs for the novelty factor end up putting linux on them anyway..."This Solaris stuff is too hard".
  
  Parent Share
  twitter facebook
  - Re: OSNews (Score:4, Interesting)
    
    by Endive4Ever ( 742304 ) writes: on Saturday January 24, 2004 @01:26AM (#8073184)
    
    I put NetBSD on most of my Sparc hardware. Because then I can run and build from the same exact source tree of packages as I use on my Intel boxes. And run a kernel built from exactly the same source.
    
    Which brings up a point: both NetBSD/Sparc and NetBSD/Sparc64 will run on an Ultra 1, which is a 64 bit machine. Why doesn't somebody install each NetBSD port on two seperate Ultra 1 machines. Then the benchmark comparision can be between the normal apps that build on both systems, running in parallel on two identical systems. Its exactly the same codebase except for the 32 or 64 bittedness.
    
    Parent Share
    twitter facebook
- Re: OSNews (Score:3, Insightful)
  
  by BigFootApe ( 264256 ) writes:
  
  You are correct, although the issues are more subtle than your examples (not hard).
  
  A benchmark is useless without interpretation. The people at OSNews have failed to give us any technical background information on the SparcV chip (penalties running in 64-bit as well as benefits), a proper breakdown of the type of math done by the example programs, as well as analyses of bottlenecks in the benchmarks (MySQL, for instance, is possibly I/O limited).
  
  They've given us raw numbers, with no thought behind them. T
- - Re: OSNews (Score:2, Informative)
    
    by Ninwa ( 583633 ) writes:
    
    Well neither of you have provided any actual evidence proving they rock.. or sock... o.O -tromps off to OSNews to check out their benchmarks- I shall be back ^_^
    - Re: OSNews (Score:2)
      
      by juuri ( 7678 ) writes:
      
      Good luck, you'll need it.
It all depends... (Score:5, Funny)

by paul248 ( 536459 ) writes: on Saturday January 24, 2004 @12:07AM (#8072830) Homepage

It all depends on how many of those 64 bits are 1's. 1's are a lot heavier than 0's, so too many of them will slow your program down a lot. If you compare a 32-bit program with all 1's, it will run significantly slower than a 64-bit program with only a few 1's. It's simple, really.

Share
twitter facebook
- Re:It all depends... (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  How do you figure? CMOS only uses energy when transitioning between one and zero, when both transistors are in the ohmic region (drawing current). I don't see how 1 is any more heavy than 0.
  - Re:It all depends... (Score:4, Funny)
    
    by Anonymous Coward writes: on Saturday January 24, 2004 @12:11AM (#8072848)
    
    Here on planet Jokeania, we laugh at his statement.
    
    Parent Share
    twitter facebook
- Re:It all depends... (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  Pointy haired boss: Dilbert! My laptop is awfully heavy. Is there anything I can do?
  
  Dilbert: Sure! Just start randomly deleting things. All that data can be pretty heavy!!!!
  
  (later)
  Pointy haired boss: Hmmm...Windows? My house already has all I need! *click* Yes! That's gotta be like 5 pounds!
- Re:It all depends... (Score:5, Funny)
  
  by Uncle Gropey ( 542219 ) writes: on Saturday January 24, 2004 @12:48AM (#8073035) Journal
  
  It's not that the 1's are heavier, it's that they tend to snag in the system bus and take longer to travel than the smoother 0's.
  
  Parent Share
  twitter facebook
  - Re:It all depends... (Score:5, Funny)
    
    by Frymaster ( 171343 ) writes: on Saturday January 24, 2004 @12:59AM (#8073085) Homepage Journal
    
    it's that they tend to snag in the system bus and take longer to travel than the smoother 0's.
    this reminds of "back in the day" when we ran a token ring network. when end users would complain about net outage we'd simply tell them that the token got stuck or, worse yet lost. fortunately, we have a backup token on floppy back in the systems room. it's an fddi token, mind you, so it's a bit bigger but if you don't kink the cabling it should work fine for now.
    
    Parent Share
    twitter facebook
    - Re:It all depends... (Score:4, Funny)
      
      by Art Tatum ( 6890 ) writes: on Saturday January 24, 2004 @01:02AM (#8073103)
      
      Did you ever have them look for the token under their desks? More fun than telling them where the "any" key is. :-)
      
      Parent Share
      twitter facebook
    - Re:It all depends... (Score:5, Funny)
      
      by fucksl4shd0t ( 630000 ) writes: on Saturday January 24, 2004 @02:15AM (#8073402) Homepage Journal
      
      Man, I'm going offtopic, but back in my oil-changing days...
      Some new guy had started working, and his neck was redder than desert sand. He told me that his girlfriend's car had a blinker out on the left and he replaced the bulb and the light didn't come back on. I asked him if he checked his blinker fluid. He said he didn't know what blinker fluid was. I told him that blinker fluid sits in a reservoir in the middle of the car, and when you make a turn the fluid flows in the opposite direction of the turn, into the blinkers, to make sure that the electrical connection is good.
      He spent 3 hours the next morning, on his day off, calling up parts stores and asking them if they had any blinker fluid. Poor guy. I had to break it to him slowly...
      
      Parent Share
      twitter facebook
      - Re:It all depends... (Score:5, Funny)
        
        by ari_j ( 90255 ) writes: on Saturday January 24, 2004 @07:11AM (#8074162)
        
        In high school, we put a girl up to getting her blinker fluid topped off at a service station. She went and asked about it, and the next day was quite irate with us. But that didn't stop us - within a week, we sent the same girl to go have the summer air taken out of her tires, to be replaced with winter air. Apparently she went back to the same shop to have them take care of this for her.
        
        That's the difference between a natural blonde and a dyed blonde.
        
        Parent Share
        twitter facebook
    - Cleanup on aisle 3! (Score:4, Funny)
      
      by CaptainCarrot ( 84625 ) writes: on Saturday January 24, 2004 @02:55AM (#8073500)
      
      Yeah, like that one time when I tripped over a coax while walking behind a row of Apollo DN660s and yanked it clean out of the connector. Yeesh! Tokens everywhere! I had to get the mop out, and here I was not even in the janitors' union. That by itself could have gotten me fired. As it was I didn't get caught for that, but the network went down and everyone knew it was my fault because of all the squashed token guts on the bottom of my shoes.
      We were finding the damn things in the ventilators for weeks afterward.
      
      Parent Share
      twitter facebook
- Re:It all depends... (Score:5, Funny)
  
  by appleLaserWriter ( 91994 ) writes: on Saturday January 24, 2004 @12:59AM (#8073089)
  
  1's are a lot heavier than 0's
  
  On early systems, particularly before the 286, the mass differential between 0 and 1 was a serious issue. However, the 286's innovative pipeline system introduced a shift in focus from mass to width. As pipelines became increasingly narrow, words composed primarily of "1"s began to execute at a more rapid pace than those with a heavy weighting of "0"s.
  
  Parent Share
  twitter facebook
More SCO code. (Score:2, Funny)

by Ken Broadfoot ( 3675 ) writes:

From the article:

I create a very simple C file, which I call hello.c:

main()
{
printf("Hello!\n");
}

Watch out... SCO owns this bit of code too...

--ken
- - Re:More SCO code. (Score:5, Funny)
    
    by jusdisgi ( 617863 ) writes: on Saturday January 24, 2004 @12:16AM (#8072888)
    
    Hey...that's funny...I just called a post off-topic that was a direct quote from the artical.
    
    Cool.
    
    Of course, the funny thing is that I'm right (in a way).
    
    Parent Share
    twitter facebook
architectural differences... (Score:4, Informative)

by jusdisgi ( 617863 ) writes: on Saturday January 24, 2004 @12:08AM (#8072833)

I can only assume that this is only going to be limited to SPARC...I mean, we've already seen the major differences between Itanium and Opteron dealing with 32 bit apps, right? Or is this a different question, since Opteron gets to run 32bit effectively "native"? And, at this point, when running 32 bit apps on a 64 bit chip, just what can "native" mean anyway?

Share
twitter facebook
- Re:architectural differences... (Score:2)
  
  by ebbomega ( 410207 ) writes:
  
  Native is a very specific concept actually. If you can take a piece of code in its very base-bones assembler 1s and 0s and put it through a processor and obtain the expected result, it processes that code natively. If not, then it can't. Itanium cannot process 32-bit code natively. It needs an emulator which trades those 1s and 0s to a different format that the itanium can read (thus changing the input to the processor) and then the processor can read it. Obviously emulation would take a lot more clock cycl
- Re:architectural differences... (Score:5, Informative)
  
  by calidoscope ( 312571 ) writes: on Saturday January 24, 2004 @01:10AM (#8073134)
  
  I can only assume that this is only going to be limited to SPARC...
  Probably applicable to the G5 as well (and Alpha, PA-RISC, MIPS), which like the SPARC has pretty much the same architecture for 32 bits and 64 bits.
  The Itanic has an IA-32 subsystem hanging on it - performance is really poor compared to the main 64 bit core. The Opteron has more registers available in 64 bit mode than 32 bit mode and should show some performance improvements just for that reason.
  As has been said mucho times - 64 bit processors really shine when you have lots of memory to work with. Having said that, one advantage of 64 bits is being able to memory map a large file and can result in better performance even with much less than 4 GB of memory - witness the MySQL tests.
  
  Parent Share
  twitter facebook
Moving more data (Score:5, Interesting)

by Sean80 ( 567340 ) writes: on Saturday January 24, 2004 @12:10AM (#8072844)

I'm no expert in this specific area, but I remember a conversation from a few years back abour the 32-bit versus the 64-bit version of the Oracle database. The guy I was speaking with was pretty knowledgeable, so I'll take his word as truth for the sake of this post.
In his explanation, he said something of the order of "if you want speed, use the 32-bit version of the binaries, because otherwise the computer physically has to move twice as much data around for each operation it does." Only if you want the extra memory mapping capability of a 64-bit binary, he said, would you need to use the 64-bit version.
I suppose in summary, though, it depends on exactly what your binary does.

Share
twitter facebook
- Re:Moving more data (Score:4, Insightful)
  
  by renehollan ( 138013 ) writes: <.rhollan. .at. .clearwire.net.> on Saturday January 24, 2004 @12:15AM (#8072883) Homepage Journal
  
  *cough* wider data busses *cough*. 'course this does mean that 64 bit code on systems with 32 bit wide data paths will be slower, but, like, migration always involves speed bumps. I remember the a.out to elf transition pain days of Linux.
  
  Parent Share
  twitter facebook
  - Re:Moving more data (Score:5, Informative)
    
    by Waffle Iron ( 339739 ) writes: on Saturday January 24, 2004 @12:33AM (#8072959)
    
    *cough* wider data busses *cough*. 'course this does mean that 64 bit code on systems with 32 bit wide data paths will be slower
    By the same token, 32-bit code on systems with 64-bit wide data paths will move twice as many pointers in one bus cycle.
    Today's CPUs almost completely decouple buses from ALU-level operations. Buses usually spend most of their time transfering entire cache lines to service cache misses, so if your pointers take up a bigger portion of a cache line, 64-bit code is still consuming more bus clock cycles per instruction on average no matter how wide your buses are.
    BTW, 32-bit processors have been using 64-bit external data buses since the days of the Pentium-I.
    
    Parent Share
    twitter facebook
    - - Re:Moving more data (Score:3, Insightful)
        
        by renehollan ( 138013 ) writes:
        
        Well yes, but there is an advantage to smaller pointers when you can get away with them *if the processor has native support for them*. While exploited in small object allocators, it isn't always the case that the CPU can gallop through instructions as fast as they can be fed to it, multiple functional units notwithstanding. Though, clearly this is an issue only with data and pointers at the closest cache level to the processing units.
        So, memory bandwidth remains an issue, and I concede the point.
        Still,
  - Re:Moving more data (Score:5, Informative)
    
    by dfung ( 68701 ) writes: on Saturday January 24, 2004 @01:14AM (#8073145)
    
    Oh, now I'll *cough* a little too.
    
    Modern processors (which actually stretches back at least 10 years) really want to run out of cache as much as possible, both for instruction and data access. And they've never wanted to do it more than now when in the x86 world, the processor core and L1 cache are operating at 3200MHz vs. 400MHz for the RAM.
    
    One thing that has to happen is that you make a bet on locality of execution (again both for instructions and data) and burst load a section of memory into the caches (L2 and L1, and sometimes even L3). In implementation terms, it takes some time to charge up the address bus, so you increase bandwidth and execution speed by charging up address n, but doing a quick read of n+1, n+2, n+3, and more on the latest CPUs. You only have to wiggle the two low-order address lines for the extra reads, so you don't pay the pre-charge penalty that you would for access randomly in memory.
    
    That's good if you're right about locality and bad if you're wrong. That's what predictive branching in the processor and compiler optimizations are all about - tailoring execution to stay in cache as much as possible.
    
    On a 64-bit processor, those burst moves really are twice as big and they really do take longer (the memory technology isn't radically different between 32- and 64-bit architectures, although right now it would be odd to see a cost-cutting memory system on a 64-bit machine). If all the accesses of the burst are actually used in execution, then both systems will show similar performance (the 64-bit will have better performance on things like vector opcodes, but for regular stuff, 1 cycle is 1 cycle). If only half of the bursted data is used, then the higher overhead of the burst will penalize the 64-bit processor.
    
    If you're running a character based benchmark (I've never looked at gzip, but it seems like it must be char based), then it's going to be hard for the 64-bit app and environment to be a win until you figure out some optimization that utilizes the technology. If your benchmark was doing matrix ops on 64-bit ints, then you'll probably find that that Opteron, Itanium, or UltraSparc will be pretty hard to touch.
    
    A hammer isn't the right tool for every job as much as you'd like it to be. I actually think that the cited article was a reasonable practical test of performance, but extrapolating from that would be like commenting on pounding nails with a saw - it's just a somewhat irrelevant measure.
    
    I guess I'm violently agreeing with renehollan's comment about speed bumps - apps that can benefit from an architectural change are as important as more concrete details such as compiler optimizations.
    
    Parent Share
    twitter facebook
    - Re:Moving more data (Score:4, Interesting)
      
      by soundsop ( 228890 ) writes: on Saturday January 24, 2004 @04:47AM (#8073848) Homepage
      
      In implementation terms, it takes some time to charge up the address bus, so you increase bandwidth and execution speed by charging up address n, but doing a quick read of n+1, n+2, n+3, and more on the latest CPUs. You only have to wiggle the two low-order address lines for the extra reads, so you don't pay the pre-charge penalty that you would for access randomly in memory.
      
      This is incorrect. It has nothing to do with charging the address lines. Loading multiple sequential locations is slow on the first access and fast on the subsequent bytes because a whole memory row (made of multiple words) is read at once. This full memory row (typically around 1kbit) is transferred from the slower capacitive DRAM storage to faster transistor-based flip-flops. The subsequent sequential words are already available in the flip-flops so it's faster to route them off-chip since the slow DRAM access is avoided.
      
      Parent Share
      twitter facebook
- Re:Moving more data (Score:2, Funny)
  
  by momerath2003 ( 606823 ) * writes:
  
  ...because otherwise the computer physically has to move twice as much data around for each operation it does.
  
  64-bit computers have to physically move data around? I suppose I'll have to buy a grappling arm attachment for my G5 to get it to work. :(
Couldn't time fix this? (Score:4, Insightful)

by Transcendent ( 204992 ) writes: on Saturday January 24, 2004 @12:11AM (#8072846)

Aren't there certian optimizations and, in general, better coding for most 32 bit applications (on the lowest level of the code) because people have used it for so long? Couldn't it just be that we need to refine coding for 64 bit processors?

Most "tech gurus" I've talked to at my university about the benefites of 64bit processing say that it is in part due to the increase of the number of registers (allowing you to use more at the same time, shortening the number of cycles needed). Could time allow us to write more efficient kernels, etc for 64 bit processors?

So either the code isn't good enough, or perhaps there's another physical limitation (longer pipelines, etc) on the chip itself? Correct me if I'm wrong.

Share
twitter facebook
- Re:Couldn't time fix this? (Score:4, Informative)
  
  by Ken Broadfoot ( 3675 ) writes: on Saturday January 24, 2004 @12:16AM (#8072889) Homepage Journal
  
  "Most "tech gurus" I've talked to at my university about the benefites of 64bit processing say that it is in part due to the increase of the number of registers (allowing you to use more at the same time, shortening the number of cycles needed)."
  
  Not just kernels. All programs.. however this happens in the compiler. Or assembly code. Not in "kernels" unless they are assembly code kernels..
  
  Basically this test is moot without using compilers optimized for the 64 bit chips..
  
  --ken
  
  Parent Share
  twitter facebook
- Re:Couldn't time fix this? (Score:3, Informative)
  
  by drinkypoo ( 153816 ) writes:
  
  64 bit architectures do not automatically have more general purpose registers than 32 bit ones. x86-64 happens to have four times as many GPRs as x86, but that's a special case.
  The benefit of a 64 bit processor is a larger address space and the ability to work on 64 bit data types much much faster than on a 32 bit system. More GPRs is an additional, separate benefit.
gcc? (Score:5, Interesting)

by PineGreen ( 446635 ) writes: on Saturday January 24, 2004 @12:11AM (#8072847) Homepage

Now, gcc is known to produce shit code on sparcs. I am not saying 64 is always better, but to be hones, the stuff should at least have been compiled with Sun CC, possibly with -fast and -fast64 flags...

Share
twitter facebook
- Re:gcc? (Score:4, Informative)
  
  by PatMouser ( 1692 ) writes: on Saturday January 24, 2004 @12:54AM (#8073066) Homepage
  
  Yup! It turns out poorly optimized code in 32 bit mode and I shudder to think what the 64 bit code would look like.
  
  And before you start complaining, that comes from 3 years coding for a graphics company where every clock tick counts. We saw a MAJOR (like more than 20%) difference in execution speed of our binaries depending upon which compiler was used.
  
  Hell, gcc didn't even get decent x86 (where x>4) support in a timely manner. Remember pgcc vs. gcc?
  
  Parent Share
  twitter facebook
- Re:gcc? (Score:3, Interesting)
  
  by ctr2sprt ( 574731 ) writes:
  
  gcc is known to produce shit code on computers. I find these benchmarks interesting not because of what they say about the hardware, but because of what they say about gcc. It would make me nervous if my 64-bit platform of the future were tied to gcc. I hope for AMD's sake that they are working very hard either on producing their own compiler (maybe they have and I just haven't heard about it) or making gcc stop sucking quite so hard.
  - Re:gcc? (Score:3, Insightful)
    
    by orbitalia ( 470425 ) writes:
    
    You mean like this portland compiler [amd.com]
    
    Actually I wouldn't say that gcc produces particularly bad code on all computers, it's sorta average, but not bad. Certainly the 3.3.x series are alot better than 2. Pretty good at number crunching [randombit.net] and it is more standards compliant than most.
I'll save you guys the read. (Score:2)

by Sj0 ( 472011 ) writes:

Yes they are, but only by about 10-20%.

Makes me wonder what tricks AMD has managed to pull out of their hat to increase 64 bit performance by 20-30%...
- Re:I'll save you guys the read. (Score:2, Funny)
  
  by archen ( 447353 ) writes:
  
  The same tricks that boost the performance of their CPU model numbers 20-30% over their clockspeed? =P
- Re:I'll save you guys the read. (Score:4, Funny)
  
  by HardCase ( 14757 ) writes: on Saturday January 24, 2004 @12:21AM (#8072908)
  
  Makes me wonder what tricks AMD has managed to pull out of their hat to increase 64 bit performance by 20-30%...
  
  They didn't use an obsolete UltraSparc chip? ;-)
  
  Parent Share
  twitter facebook
- Re:I'll save you guys the read. (Score:5, Informative)
  
  by ParisTG ( 106686 ) writes: <tgwozdz&gmail,com> on Saturday January 24, 2004 @12:22AM (#8072912)
  
  Makes me wonder what tricks AMD has managed to pull out of their hat to increase 64 bit performance by 20-30%...
  
  They added more registers to an architecture that had very few of them. This is likely where most of the performance increase comes from in 64bit mode on the Opteron, not from the fact that it is 64bit.
  
  Parent Share
  twitter facebook
How mature are the compilers? (Score:5, Interesting)

by Anonymous Coward writes: on Saturday January 24, 2004 @12:14AM (#8072867)

The surmise that ALL 64 bit binaries are slower than 32 is incorrect...

At this stage of development for the various 64-bit architectures, there is very likely a LOT of room for improvement in the compilers and other related development tools and giblets. Sorry, but I don't consider gcc to be necessarily the bleeding edge in terms of performance on anything. It makes for an interesting benchmarking tool because it's usable on many, many architectures, but in terms of its (current) ability to create binaries that run at optimum performance, no.

I worked on DEC Alphas for many years, and there was continuing progress in their compiler performance during that time. And, frankly, it took a long time, and it probably will for IA64 and others. I'm sure some Sun SPARC-64 users or developers can provide some insight on that architecture as well. It's just the nature of the beast.

Share
twitter facebook
- Re:How mature are the compilers? (Score:5, Insightful)
  
  by T-Ranger ( 10520 ) writes: <jeffw AT chebucto DOT ns DOT ca> on Saturday January 24, 2004 @12:49AM (#8073043) Homepage
  
  GCC's primary feature is, has always been, and likey will be for a long time: portability. GCC runs on everything.
  If you want FAST code you should use the compiler from your hardware vendor. The downside is that they might cost money, and almost definitly implement things in a slightly weird way. Weird when compared to the official standard, weird when compared to the defacto standard that is GCC.
  I though this was common knowladge, at least amongst people who would be trying to benchmark compilers...
  
  Parent Share
  twitter facebook
Opteron is faster in 64 bit (Score:5, Informative)

by citanon ( 579906 ) writes: on Saturday January 24, 2004 @12:15AM (#8072874)

But that's only because it has two extra execution units for 64 bit code. 64 bit software is not inherently faster. Most people here would know this, but I just thought I might preemptively clear up any confusion.

Share
twitter facebook
- Re:Opteron is faster in 64 bit (Score:5, Informative)
  
  by fifirebel ( 137361 ) writes: on Saturday January 24, 2004 @12:26AM (#8072924)
  
  Also because in 64-bit mode, the Opteron has access to more registers. The IA-32 architecture is so register-limited that throwing more registers at any task makes a huge difference.
  
  Parent Share
  twitter facebook
And 32 bit is slower than 16 bit (Score:5, Interesting)

by gvc ( 167165 ) writes: on Saturday January 24, 2004 @12:15AM (#8072877)

I recall being very disappointed when my new VAX 11/750 running BSD 4.1 was much slower than my PDP 11/45 running BSD 2.8. All the applications I tested: cc, yacc, etc. were faster on the 16-bit PDP than the 32-bit VAX.

I kept the VAX anyway.

Share
twitter facebook
- Re:And 32 bit is slower than 16 bit (Score:3, Interesting)
  
  by trb ( 8509 ) writes:
  
  Yes, and programs compiled for 16-bit PDP-11 running on the VAX-11/780 in "compatibility mode" were faster than the same programs compiled for 32-bit VAX native mode running on the same VAX. It makes sense, they were doing pretty much the same stuff, and fetching half as much data. But of course, 11's had limited address space, and the VAX address space was relatively huge.
Not so simple for AMD64 (Score:4, Interesting)

by martinde ( 137088 ) writes: on Saturday January 24, 2004 @12:21AM (#8072904) Homepage

My understanding is that when you switch an Athlon64 or Opteron into 64bit mode, that you suddenly get access to more general purpose registers than the x86 normally has. So the compiler can generate more efficient code in 64bit mode, making use of the extra registers and so forth. I don't know if this makes a difference in real world apps or not though.

Share
twitter facebook
- Re:Not so simple for AMD64 (Score:3, Interesting)
  
  by afidel ( 530433 ) writes:
  
  AND you can get the best of both worlds because data sizes do NOT have to be 64 bit just to use the 64bit registers. In fact here is a quote from an AMD manual on driver porting for 64bit-Windows on AMD64 platforms:
  *Stupid freaking AMD engineer made the PDF world accessible but then went and encrypted it so that I can't cut and paste, print it, or do anything with it*
  Well basically INT's and LONG's remain 32 bit while Pointers, LONG LONG's and Floats are 64 bit by default.
  The paper can be found at AMD's [amd.com]
What I found most remarkable... (Score:3, Interesting)

by Grey Ninja ( 739021 ) writes: on Saturday January 24, 2004 @12:21AM (#8072906) Homepage Journal

The guy seemed to have his conclusion written before he started... Or at least that's how it seemed to me. When he was doing the SSL test, he said that the results were ONLY about 10% slower on the 64 bit version. Now I might be far too much of a graphics programmer.... but I would consider 10% to be a rather significant slowdown.

The other thing that bothered me of course was when he said that the file sizes were only 50% bigger in some cases... sure, code is never all that big, but... still...

Share
twitter facebook
If 32bit is faster than 64... (Score:5, Funny)

by CatGrep ( 707480 ) writes: on Saturday January 24, 2004 @12:27AM (#8072931)

Then 16bit binaries should be even faster then 32.

And why stop there?

8bits should really scream.

I can see it now: 2GHz 6502 processors, retro computing. The 70's are back.

Share
twitter facebook
- Re:If 32bit is faster than 64... (Score:5, Insightful)
  
  by Brandybuck ( 704397 ) writes: on Saturday January 24, 2004 @01:52AM (#8073286) Homepage Journal
  
  You're right. A 2GHz 6502 would be a screamer. But the drawbacks are numerous. When the world finally went to 32bit, I jumped for joy. Not because I thought stuff would be faster, but because I could finally use a flat memory space large enough for anything I could conceivably want. Integers were now large enough for any conceivable use. Etc, etc.
  
  Of course, my conceptions back then might be getting a bit dated now. But not too terribly much. 32 bits will probably be the optimum for general use for quite some time. There's not too many applications that need a 64 bit address space. Not too many applications need 64 bit integers. We'll need 64 bit sometime, but I don't see the need for it in *general* purpose computing for the remainder of the decade. (Longhorn might actually need to a 64 bit address space, but that's another story...).
  
  Remembering back to the 80286 days, people were always running up against the 16 bit barrier. It was a pain in the butt. But unless you're running an enterprise database, or performing complex cryptoanalysis, you're probably not running up against the 32 bit barrier.
  
  But of course, given that you're viewed as a dusty relic if you're not using a box with 512Mb video memory and 5.1 audio to calculate your spreadsheets, the market might push us into 64 bit whether we need it or not.
  
  Parent Share
  twitter facebook
Jebus christ. (Score:2, Informative)

by eddy ( 18759 ) writes:

This article sounds completely stupid. Someone didn't know that pulling 64-bits across the bus( reading/writing can take longer than 32-bits? Never thought of the caches?

Just read the GCC Proceedings [linux.org.uk], there's explanations and benchmarks of the why/how/when of x86-64 in 32 vs 64-bit mode, both speed of execution and image size.
I'd kill for a 64 bit platform... (Score:3, Interesting)

by yecrom2 ( 461240 ) writes: on Saturday January 24, 2004 @12:31AM (#8072951)

The main product I work on, which was designed in a freaking vacuum, is so tightly tied to wintel that I've had to spend the greater part of a year gutting int and making it portable. Kind of. We currently use 1.5 gig of for the database cache. If we go any higher, we run out of memory.
We tried win2k3 and the /3gb switch, but we kept having very odd things happen.
This database could very easily reach 500 gig, but anything above 150 gig and performance goes in the toilet.

My solution...

Get a low-to-midrange Sun box that can handle 16+g and has a good disk subsystem. But that's not a current option. Like I said, this thing was designed in a vacuum. The in-memory data-structures are the network data structures. That are all packed on 1-byte boundaries. Can you say SIGBUS? A Conversion layer probably wouldn't be that hard, if it weren't build as ONE FREAKING LAYER!

Sorry, I had to rant. Anyway, a single 64 bit box would enable us to replace several IA32 servers. For large databases, 64bits is a blessing.

Matt

Share
twitter facebook
More bits doesn't automatically mean more speed (Score:5, Insightful)

by leereyno ( 32197 ) writes: on Saturday January 24, 2004 @12:32AM (#8072958) Homepage Journal

The point of a 64-bit architecture boils down to two things really, memory and data size/precision.

An architecture with 32-bits of address space can directly address 2^32 or approximately 4 billion bytes of memory. There are many applications where that just isn't enough. More importantly, an architecture whose registers are 32-bits wide is far less efficient when it comes to dealing with values that require more than 32 bits to express. Many floating point values use 64 bits and being able to directly manipulate these in a single register is a lot more efficient than doing voodoo to combine two 32-bit registers.

So, if you have an problem where you're dealing with astronomical quantities of very large (or precise) values, then a 64-bit implementation is going to make a very big difference. If you're running a text editor and surfing the web then having a wider address bus and wider registers isn't going to do squat for you. Now that doesn't mean that there may not be other, somewhat unrelated, architectural improvements found in a 64-bit architecture that a 32-bit system is lacking. Those can make a big difference as well, but then you're talking about the overall efficiency of the design, which is a far less specific issue than whether 64-bits is better/worse than 32.

Lee

Share
twitter facebook
- 6502? (Score:4, Interesting)
  
  by tonywestonuk ( 261622 ) writes: on Saturday January 24, 2004 @02:09AM (#8073374)
  
  By what method is a processor judged to be 16,32 or 64 bits?...
  
  The 6502 had 8bit data, but 16 bit address bus, and was considered an '8 bit'
  68000 had 16 bits data, 32 bits address - this was a 16 bit
  
  So, why can't we just increase the address bus size of processors, to 64,while keeping the databus size at 32bits. have some new 64 bit addressing modes. The processor can still be 32 bit data, so the size of programs can stay the same....Store program code in the first 4Gigs of memory, (zero page!) , and the pointers within that code can remain 32bits, but have the option of using bigger 64bit pointers to point at data in the remaining 2^63 bytes. This should give best of both worlds of 32vs64 bit.
  
  Parent Share
  twitter facebook
  - Let's not do anything like that! (Score:3, Insightful)
    
    by iamacat ( 583406 ) writes:
    
    I guess you didn't have the "pleasure" of using near, far and huge pointers in DOS compilers. In your model, every library function would have to have two versions - one that takes 32 bit pointers and one that takes 64 bit.
    
    Uniform and simple is good...
This guy is a tool (Score:5, Interesting)

by FunkyMarcus ( 182120 ) writes: on Saturday January 24, 2004 @12:34AM (#8072970) Homepage Journal

First, anyone with half a brain already knows what his "scientific" results prove. Second, anyone with two thirds of a brain has already performed similar (but probably better) tests and come to the same conclusion.

And third, OpenSSL uses assembly code hand-crafted for the CPU when built for the 32-bit environment (solaris-sparcv9-gcc) and compiles C when built for the 64-bit environment (solaris64-sparcv9-gcc). Great comparison, guy.

Apples, meet Oranges (or Wintels).

Mark

Share
twitter facebook
- Re:This guy is a tool (Score:2)
  
  by FunkyMarcus ( 182120 ) writes:
  
  Assembly code vs. C code refers to the big-number library. No substitutions, exchanges, or refunds.
Something is wrong. (Score:5, Interesting)

by DarkHelmet ( 120004 ) writes: <mark.seventhcycle@net> on Saturday January 24, 2004 @12:34AM (#8072972) Homepage

Maybe it's me, but how the hell is OpenSSL slower in 64 bit?
It makes absolutely no sense. Operations concerning large integers were MADE for 64 bit.
Hell, if they made a 1024 bit processor, it'd be something like OpenSSL that would actually see the benefit of having datatypes that bit.
Something is wrong, horribly wrong with these benchmarks. Either OpenSSL doesn't have proper support for 64 bit data types, this fellow compiled something wrong, or some massive retard published benchmarks for the wrong platform in the wrong place.
Or maybe I'm just on crack.

Share
twitter facebook
- Re:Something is wrong. (Score:5, Insightful)
  
  by FunkyMarcus ( 182120 ) writes: on Saturday January 24, 2004 @12:43AM (#8073019) Homepage Journal
  
  Maybe it's me
  
  It's you.
  
  OpenSSL in the 32-bit environment as the guy configured it was doing 64-bit arithmetic. Just because the guy had 32-bit pointers doesn't mean that his computer wasn't pushing around 64-bit quantities at once. It's called a "long long".
  
  In fact, as he had OpenSSL configured, he was using some crafty assembly code for his 32-bit OpenSSL builds that even used 64-bit registers. His 64-bit builds were using plain old compiled C.
  
  But he didn't even know that.
  
  Big whoop.
  
  Mark
  
  Parent Share
  twitter facebook
- - Re:Something is wrong. (Score:3, Informative)
    
    by harlows_monkeys ( 106428 ) writes:
    
    How? Explain please
    All public key systems currently in use depend on doing arithmetic on large integers. Let's start with the classical algorithms for addition/subtraction/multiplication/division.
    The addition and subtraction algorithms are O(N) and multiplication/division is O(N^2), where N is the number of digits.
    What is a digit? On a 32-bit process, it will probably be 32 bits. On a 64-bit processor, it will probably be 64-bits.
    What this means is that operating on large integers, say, 1024 bits
Anyone ever used WinXP-64bit edition? (Score:4, Interesting)

by CatGrep ( 707480 ) writes: on Saturday January 24, 2004 @12:36AM (#8072983)

We've got an Itanic box at work that has WinXP 64bit edition on it so we can build & test some 64bit Windows binaries.

It's the slowest box in the place! Open a terminal (oops, command shell, or whatever they call it on Windoze) and do a 'dir' - it scrolls so slowly that it feels like I'm way back in the old days when I was running a DOS emulator on my Atari ST box.

Pretty much everything is _much_ slower on that box. It's amazingly bad and I've tried to think of reasons for this: Was XP 64bit built with debugging options turned on when they compiled it? But even if that were the case it wouldn't account for all of it - I'd only expect that to slow things down maybe up to 20%, not by almost an order of magnitude.

Share
twitter facebook
- Re:Anyone ever used WinXP-64bit edition? (Score:5, Insightful)
  
  by JanusFury ( 452699 ) writes: <kevin.gaddNO@SPAMgmail.com> on Saturday January 24, 2004 @01:37AM (#8073226) Homepage Journal
  
  The video drivers are probably not optimized for 64-bit at all. In fact, I wouldn't be suprised if the box doesn't have native drivers at all, and is using MS's standard SVGA/VESA drivers. Those drivers are slow and any PC using them is going to feel horribly sluggish, even if it has a 3Ghz P4.
  
  Parent Share
  twitter facebook
Forward thinking (Score:5, Interesting)

by Wellmont ( 737226 ) writes: on Saturday January 24, 2004 @12:45AM (#8073025) Homepage

Well considering that manufacturers have been working like crazy to produce both 64 bit hardware and software applications, one could see that there is still some stuff to be done in the field.
What most of the posts are considering and the test itself are "concluding" is that it has to be slower over all and even in the end when 64 bit computing finally reaches it's true breadth. However when the bottlenecks of the pipeline (in this case the cache) and the remaining problems are removed you can actually move that 64 bit block in the same time it takes to move a 32 bit block.
Producing to 32bit pipes takes up more space then creating a 64bit pipe in the end, no matter which way you look at it and no matter what kind of applications or processes its used for.
However the big thing that could change this theory is Hyper Compressed Carbon chips, that should replace silicon chips within a decade. (that's fairly conservative estimate.

Share
twitter facebook
A Makefile? (Score:3, Interesting)

by PissingInTheWind ( 573929 ) writes: on Saturday January 24, 2004 @12:48AM (#8073036)

From the article:
[...] you'll likely end up in a position where you need to know your way around a Makefile.

Well duh. What a surprise: compiling for a different platform might requires Makefile tweaking.

Am I the only one to think that was a dummy article wasting a spot for much more interesting articles about 64 bit computing?

Share
twitter facebook
- Re:A Makefile? (Score:3, Informative)
  
  by LoadWB ( 592248 ) writes:
  
  I accept this article as dumbed down a bit for the lower end, non-guru user who is wooed by the 64-bit "revolution" but not technically saavy enough to understand the "32-bit faster than 64-bit" comments that continue to surface in many forums. If bean counters and cheap tech workers can be made to understand that there truly ARE benefits in 64-bit technology, then progress will not be held in place by beating the 32-bit horse to death -- even if it does run at hellaspeeds.
  
  How many times have we slapped a
This is unfair comparison (Score:3, Insightful)

by superpulpsicle ( 533373 ) writes: on Saturday January 24, 2004 @12:49AM (#8073047)

Why are we comparing mature 32-bit software with 64-bit software in its infancy?

Share
twitter facebook
What 'system of belief' is he following? (Score:5, Insightful)

by swordgeek ( 112599 ) writes: on Saturday January 24, 2004 @12:58AM (#8073083) Journal

64-bit binaries run slower than 32? That's certainly the dogma in the x86 world, where 64-bit is in its infancy. That was the belief about Solaris/Sparc and the HP/AIX equivalents FIVE YEARS AGO maybe.

Running benchmarks of 32 vs. 64 bit binaries in a 64 bit Sparc/Solaris environment has shown little or no difference for us, on many occasions. If the author had used Sun's compiler instead of the substantially less-than-optimal gcc, I expect that his 20% average difference would have disappeared.

Share
twitter facebook
of course, they are (Score:5, Informative)

by ajagci ( 737734 ) writes: on Saturday January 24, 2004 @01:12AM (#8073137)

Both 32bit and 64bit binaries running on the same processor get the same data paths and the same amount of cache on many processors. But, for one thing, 64bit binaries use up more cache memory for both code and data. So, yes, if you run 32bit binaries on a 64bit processor with a 32bit mode, then the 32bit binaries will generally run faster. But the reason why they run well and all the data paths are wide is because the thing is a 64bit processor in the first place--that's really what "64bit" means.

64bit may help with speed only if software is written to take advantage of 64bit processing. But the main reason to use 64bit processing is for the larger address space and larger amount of memory you can address, not for speed. 4Gbytes of address space is simply too tight for many applications and software design started to suffer many years ago from those limitations. Among other things, on 32bit processors, memory mapped files have become almost useless for just the applications where they should be most useful: applications involving very large files.

Share
twitter facebook
- Re:of course, they are (Score:4, Interesting)
  
  by j3110 ( 193209 ) writes: <samterrell.gmail@com> on Saturday January 24, 2004 @02:01AM (#8073337) Homepage
  
  Ummm... I beg to differ on the reasons...
  
  Most 64/32bit hybrid machines probably just split the arithmatic/logic units in half (just takes a single wire to connect them anyhow). Having an extra ALU around will surely push more 32bit numbers through the pipe. It's not going be as fast as a 64bit optimized application would gain from having the combined operations should it need them though.
  
  I'm beginning to wonder these days how much CPU speed even matters though. We have larger applications that can't fit in cache, page switching from RAM that isn't anywhere near the speeds of the CPU, and hard drives that are only 40MB/s on a good day/sector, with latency averaging around 5-6ms. CPU is the least of my worries. As long as the hard disk is being utilized properly, you'll probably not see significant differences between processor speeds. I'm a firm believer that people think that 500MHz is slow because the hard drive in the machine was too slow. Unless you are running photoshop, SETI, Raytracing, etc., you probably wouldn't notice if I replaced your 3GHz processor with a 1GHz.
  
  Parent Share
  twitter facebook
  - Re:of course, they are (Score:4, Informative)
    
    by ajagci ( 737734 ) writes: on Saturday January 24, 2004 @02:58AM (#8073505)
    
    Having an extra ALU around will surely push more 32bit numbers through the pipe
    
    That's an additional reason. There are probably many other places that neither of us has thought of that have been scaled up to make a true 64bit processor and that benefit 32bit applications running on the same hardware in 32 bit mode.
    
    I'm beginning to wonder these days how much CPU speed even matters though.
    
    It matters a great deal for digital photos, graphics, speech, handwriting recognition, imaging, and a lot of other things. And, yes, regular people are using those more and more.
    
    Unless you are running photoshop, SETI, Raytracing, etc., you probably wouldn't notice if I replaced your 3GHz processor with a 1GHz.
    
    You probably would. Try resizing a browser window with "images to fit" selected (default in IE, I believe). Notice how that one megapixel image resizes in real time? CPU-bound functionality has snuck in in lots of places.
    
    Parent Share
    twitter facebook
There's always a trade-off (Score:5, Insightful)

by KalvinB ( 205500 ) writes: on Saturday January 24, 2004 @01:18AM (#8073155) Homepage

between precision and speed.

It's not surprising that 64-bit processors are rated much slower than 32-bit ones. The fastest 64-bit AMD is rated 2.0ghz while the fastest AMD 32-bit is 2.2ghz.

If you use a shovel you can move it very fast to dig a hole. If you use a backhoe you're going to move much slower but remove more dirt at a time.

Using modern technology to build a 386 chip would result in one of the highest clock speeds ever but it would be practically useless. Using 386 era technology to build a 64 bit chip would be possible but it'd be massive and horribly slow.

I'm still debating whether or not to go with 64-bit for my next system. I'd rather not spend $700 on a new system so I can have a better graphics card and then have to spend several hundred more shortly after to replace the CPU and MB again. But then again, 64-bit prices are still quite high and I'd probably be able to be productive on 32-bit for several more years before 32-bit goes away.

Ben

Share
twitter facebook
Nit-picking: LD_LIBRARY_PATH vs crle? (Score:4, Interesting)

by LoadWB ( 592248 ) writes: on Saturday January 24, 2004 @01:26AM (#8073181) Journal

The article mentions tweaking the LD_LIBRARY_PATH...

I was told a long time ago by a number of people I considered to be Solaris gurus -- not to mention in a number of books, Sun docs, etc. -- that the LD_LIBRARY_PATH variable was not only heading towards total deprecation, but introduced a system-wide security issue.

In its stead, we were supposed to use the "crle" command to set our library paths.

On all of my boxes I use crle and not LD_LIBRARY_PATH and everything works as expected.

Any pro developers or Solaris technical folks that can comment on this?

Share
twitter facebook
64Bit will be needed when Solid State memory comes (Score:5, Interesting)

by Bruha ( 412869 ) writes: on Saturday January 24, 2004 @01:36AM (#8073222) Homepage Journal

When we get solid state hard drives and if they're reliable and fast as regular ram then ram will be gone and the SSD will take over. So in essence your machine may just allocate itself a huge chunk of the drive as it's own memory space..

Imagine a machine that can grab 16g for it's memory usage and your video card having a huge chunk for itself also. Along with your terrabits of information space if things pan out well enough.

Share
twitter facebook
What else is new. This is about scaling (Score:3, Insightful)

by Anonymous Coward writes: on Saturday January 24, 2004 @01:54AM (#8073299)

Adding more/more complex features to a cpu rarely speed it up by itself, however, it might allow the next generation of CPU to scale beyond the current generation.

Both in terms of direct CPU performance and for the software that runs on it.

This has happened a bunch of times during history. Remember the introduction of MMUs for instance? Definately slows down the software running on the machine, but without an MMU we all know that it was virtually impossible to do stable multitasking.

1/2 GB of memory basically the standard these days with XP.

A lot of people are buying home computers with 1 GB or more.

Dell in Japan (where I live) has a special offer these days on a lattitude D600 with 1GB of ram. That is, they expect to sell this thing in quantities.

I think a fair amount of PC users will hit the 4GB limit within a few years. Personally, I already swear about having just 1GB in my desktop at times when I have a handful of images from my slide scanner open in photoshop + the obvious browsers/mail programs and maybe an office program or 2 open.

Introducing 64bit does not make todays HW any faster than their counterparts, but it will make it possible to continue making machines better, faster and capable of handling increasingly more complex tasks.

Share
twitter facebook
Slower? It depends. (Score:5, Informative)

by BobaFett ( 93158 ) writes: on Saturday January 24, 2004 @01:56AM (#8073313) Homepage

Depends mainly on what data the test is using. If it's floating-point heavy, and uses double, then it always was 64-bit. On 64-bit hardware it'll gain the full-width data path and will be able to load/store 64-bit floating-point numbers faster, all things being equal. If it uses ints (not longs), it is and will stay 32-bit, there will be no difference unless the hardware is capable of loading two 32-bit numbers at once, effectively splitting the memory bus in two (HP-PA RISC can do it, his old Sun cannot, newest Suns can, I don't know if Opterons can). Finally, if the test uses data types which convert from 32 to 64 bits it will become slower, but only if it does enough math on these types. The later is important, since every half-complicated program uses pointers, explicitly or implicitly, but not every program does enough pointer arithmetics compared to other operations to make a difference. However, if it does, then it'll copy pointers in and out of main memory all the time, and you can fit half as many 64-bit pointers into the cache.
That's where the slowdown comes (plus some possible library issues, early 64-bit HP and Sun system libraries were very slow for some operations).
If your process resident memory size is the same in 64 and 32-bit mode, you should not see any slowdown. If you do, it's an issue with the library of the compiler (even though the compiler in this case is the same, the code generator is not, and there may be some low-level optimizations it does differently). If resident size of 64-bit application is larger, you are likely to see slowdown, and the more memory-bound the program is the larger it'll be.

Share
twitter facebook
- Re:Slower? It depends. (Score:3, Informative)
  
  by pe1chl ( 90186 ) writes:
  
  That is why I am a bit astonished that he finds a 20% slowdown, then also examines the increased size of the executables, finds it is about 20%, and considers that a minor issue.
  
  I think the 20% increased size is the reason for the 20% worse performance, because memory access is often the bottleneck for real-life programs.
This would appear to miss the point... (Score:3, Insightful)

by Bored Huge Krill ( 687363 ) writes: on Saturday January 24, 2004 @02:04AM (#8073358)

the tests were all run on a 64-bit machine. The argument is not so much about whether 32-bit or 64-bit binaries run faster, but which is the faster architecture. I'm pretty sure we don't have any apples-to-apples test platforms for that one, though

Share
twitter facebook
Summary of discussion (Score:3, Informative)

by Gadzinka ( 256729 ) writes: <rrw@hell.pl> on Saturday January 24, 2004 @07:22AM (#8074204) Journal

It's all in benchmark. It doesn't matter what you benchmark, only what you benchmark with ;)

But there are several points

1. The results for openssl are no good because openssl for sparc32 has critical parts written in asm, while for sparc64 it is generic C.

2. The results would be much better if you did it with Sun's cc, which is much better optimised for both sparc32 and sparc64.

3. The results, even if they were accurate, are good only for sparc32 vs sparc64. Basically, sparc64 is the same processor as sparc32, only wider ;)

I don't know what's the case for ppc32 vs ppc64, but when you look at x86 vs x86-64 (or amd64 as some prefer to call it) you have to take into account much larger number of registers, both GP and SIMD.

As a matter of fact, x86 is such a lousy architecture that it really doesn't have GP registers -- every register in x86 processor has its purpose, other than the rest. It looks better in case of FP and SIMD operations, but it's ints that most of the programs deal with. Just compile your average C code to asm and look how much of it deals with swapping data between registers.

(well, full symmetry of registers for pure FP, non-SIMD operations was true until P4, when Intel decided to penalize the use of FP register stack and started to ``charge'' you for ``FP stack swap'' commands, which were ``free'' before, and are still free on amd processors)

x86-64 on the other hand in 64bit mode has twice more registers with full symmetry between them, as well as even more SIMD registers. And more execution units accessible only in 64bit mode.

But, from this chaotic notes you can already see, that writing good comparission of different processors is a little bit more than ``hey, I've some thoughts that I think are important and want to share''. And the hard work starts with proper title for the story -- in this case it should be ``Are sparc64 binaries slower than sparc32 binaries?''.

Robert

Share
twitter facebook
sizeof(int) (Score:3, Insightful)

by wowbagger ( 69688 ) writes: on Saturday January 24, 2004 @10:33AM (#8074677) Homepage Journal

The biggest fault I can see with this test depends upon sizeof(int) -

I don't know about Sun, but in some other environments in which a 32 bit and a 64 bit model exist, the compiler will always treat an int as 32 bits, so as not to cause structures to change size. Hell, even on the Alpha, which was NEVER a 32 bit platform, gcc would normally have:

sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8

Now, consider the following code:

for (int i = 0; i 100; ++i)
{
frobnicate(i);
}

IF the compiler treats an int as 4 bytes, and IF the compiler has also been informed that the CPU is a 64 bit CPU, then the compiler may be doing dumb stuff like trying to force the size of "i" to be 4 bytes, by masking it or other foolish things.

So, the question I would have is, did the author run a test to insure that the compiler was really making int's and unsigned's be 64 bits or not?

Share
twitter facebook
- Re:*Why* do I have that feeling... (Score:2)
  
  by Drantin ( 569921 ) writes:
  
  please, read the rest of the article, he just uses that as an example to show that the arguements he was passing to the compiler really were having an effect on the output, although I don't see why he ahd to do that considering what he does afterwards, that part was not the benchmark...
  - - Re:*Why* do I have that feeling... (Score:4)
      
      by fucksl4shd0t ( 630000 ) writes: on Saturday January 24, 2004 @02:32AM (#8073448) Homepage Journal
      
      OK, I'll do that. The compiler does make a difference. I'm just thinking that we don't take the whole picture into account enough here on slashdot.
      You're right about everything, except trying to imply, if that's what you were doing, that he used the 'wrong' compiler. In order to test execution speed of 32-bit vs 64-bit binaries, you need to use the same compiler to build the binaries.
      See, it gets complicated when you use different compilers. Yes, GCC is likely to build better-optimized binaries for 32-bit. Yes, GCC has a reputation for not optimizing binaries very well in the first place. But if he didn't use the same compiler for both binaries, the results would have been seriously skewed in answering the question. The results would have called into question why he used different compilers, whether or not the different compilers were equal, and so forth.
      To answer the question, he needed a compiler that could build both types of binaries to the same level of optimization, no matter how shitty. He wasn't trying to build the fastest binaries on earth, he was trying to build binaries that could be compared to one another in execution speed, using the same source code, and a compiler that would produce the same shitty executable.
      That's all. :)
      
      Parent Share
      twitter facebook
- Re:Of Course They're Going to Be Slower (Score:3, Interesting)
  
  by DeathPenguin ( 449875 ) * writes:
  
  If it makes you feel better, programs from 1995 tend to run a lot faster on modern hardware. Gzip a kernel on a 66MHz Pentium and then on 2GHz Opteron and you'll see what I mean.
- Re:Now I'm confused... (Score:4, Informative)
  
  by NerveGas ( 168686 ) writes: on Saturday January 24, 2004 @05:01AM (#8073879)
  
  The performance increase comes from a combination of lower memory latency (built-in memory controller) and an increased number of registers. The small number of registers on x86 chips has always been one of the main gripes people have had about the architecture.
  
  steve
  
  Parent Share
  twitter facebook

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Re: OSNews (Score:5, Funny)

Re: OSNews (Score:5, Insightful)

Re: OSNews (Score:4, Insightful)

Re: OSNews (Score:3, Informative)

Re: OSNews (Score:3, Insightful)

Re: OSNews (Score:3, Interesting)

Re: OSNews (Score:5, Informative)

Re: OSNews (Score:5, Informative)

Re: OSNews (Score:5, Insightful)

OSNews = UnNews? (Score:3, Flamebait)

Re:OSNews = UnNews? (Score:5, Informative)

Re:retarded. (Score:5, Informative)

Re:retarded. (Score:5, Insightful)

What purchase decision? (Score:4, Interesting)

Benchmarks (Score:3, Funny)

Re:Benchmarks (Score:5, Funny)

Re:Benchmarks (Score:5, Insightful)

Re: OSNews (Score:5, Funny)

Re: OSNews (Score:4, Interesting)

Re: OSNews (Score:3, Insightful)

Re: OSNews (Score:2, Informative)

Re: OSNews (Score:2)

It all depends... (Score:5, Funny)

Re:It all depends... (Score:2, Funny)

Re:It all depends... (Score:4, Funny)

Re:It all depends... (Score:2, Funny)

Re:It all depends... (Score:5, Funny)

Re:It all depends... (Score:5, Funny)

Re:It all depends... (Score:4, Funny)

Re:It all depends... (Score:5, Funny)

Re:It all depends... (Score:5, Funny)

Cleanup on aisle 3! (Score:4, Funny)

Re:It all depends... (Score:5, Funny)

More SCO code. (Score:2, Funny)

Re:More SCO code. (Score:5, Funny)

architectural differences... (Score:4, Informative)

Re:architectural differences... (Score:2)

Re:architectural differences... (Score:5, Informative)

Moving more data (Score:5, Interesting)

Re:Moving more data (Score:4, Insightful)

Re:Moving more data (Score:5, Informative)

Re:Moving more data (Score:3, Insightful)

Re:Moving more data (Score:5, Informative)

Re:Moving more data (Score:4, Interesting)

Re:Moving more data (Score:2, Funny)

Couldn't time fix this? (Score:4, Insightful)

Re:Couldn't time fix this? (Score:4, Informative)

Re:Couldn't time fix this? (Score:3, Informative)

gcc? (Score:5, Interesting)

Re:gcc? (Score:4, Informative)

Re:gcc? (Score:3, Interesting)

Re:gcc? (Score:3, Insightful)

I'll save you guys the read. (Score:2)

Re:I'll save you guys the read. (Score:2, Funny)

Re:I'll save you guys the read. (Score:4, Funny)

Re:I'll save you guys the read. (Score:5, Informative)

How mature are the compilers? (Score:5, Interesting)

Re:How mature are the compilers? (Score:5, Insightful)

Opteron is faster in 64 bit (Score:5, Informative)

Re:Opteron is faster in 64 bit (Score:5, Informative)

And 32 bit is slower than 16 bit (Score:5, Interesting)

Re:And 32 bit is slower than 16 bit (Score:3, Interesting)

Not so simple for AMD64 (Score:4, Interesting)

Re:Not so simple for AMD64 (Score:3, Interesting)

What I found most remarkable... (Score:3, Interesting)

If 32bit is faster than 64... (Score:5, Funny)

Re:If 32bit is faster than 64... (Score:5, Insightful)

Jebus christ. (Score:2, Informative)

I'd kill for a 64 bit platform... (Score:3, Interesting)

More bits doesn't automatically mean more speed (Score:5, Insightful)

6502? (Score:4, Interesting)

Let's not do anything like that! (Score:3, Insightful)

This guy is a tool (Score:5, Interesting)

Re:This guy is a tool (Score:2)

Something is wrong. (Score:5, Interesting)

Re:Something is wrong. (Score:5, Insightful)

Re:Something is wrong. (Score:3, Informative)

Anyone ever used WinXP-64bit edition? (Score:4, Interesting)

Re:Anyone ever used WinXP-64bit edition? (Score:5, Insightful)

Forward thinking (Score:5, Interesting)

Re:Why do I have that feeling... (Score:2)

Re:Why do I have that feeling... (Score:4)