A Review of GCC 4.0
ChaoticCoyote writes "
I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion:
Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this.
That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0.
Keep an eye on 4.0. Like a baby, it won't really show its value until it's matured a bit.
"
Expected (Score:5, Interesting)
What about... (Score:3, Interesting)
intel compiler (Score:1, Interesting)
Re:The performance of compiled code (Score:2, Interesting)
That's when you spend 10 hours tweaking compilers settings...
The Future? (Score:3, Interesting)
While I know the benefits of Fortran 95 are a big thing, calling the first incorporation of a ten-year-old standard a technological step forward seems a bit ridiculous. When I first saw this article I had to check my calendar to make sure it was May 1st and not April 1st.
Compilation Speed Test by a KDE developer (Score:5, Interesting)
Qt:
            -O0    -O2
gcc 3.3.5   23m40  31m38
gcc 3.4.3   22m47  28m45
gcc 4.0.0   13m16  19m23

KDElibs (with --enable-final):
            -O0    -O2
gcc 3.3.5   14m44  27m28
gcc 3.4.3   14m49  27m03
gcc 4.0.0    9m54  23m30

KDElibs (without --enable-final):
            -O0
gcc 3.3.5   32m56
gcc 3.4.3   32m49
gcc 4.0.0   15m15
I think KDE and Gentoo people will like GCC 4.0
Re:The performance of compiled code (Score:1, Interesting)
Why should they? Why do you care how someone spends their time? If someone wants to make their system as fast as they can, that's their business.
You're obviously a small-box user. Have you ever worked in the real world, where huge batch runs can take weeks? You think companies should splash out another million or two on new hardware, just because you use a pissy little machine?
Re:The performance of compiled code (Score:5, Interesting)
Yes.
You think companies should splash out another million or two on new hardware, just because you use a pissy little machine?
I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?
5% just gets lost in the noise. You beef up your system, making it 5% faster... And then some retard in production makes a mistake and sets you back six weeks.
Kind of a weird review (Score:5, Interesting)
Second, the runtime benchmarks were close enough to be statistically meaningless in most cases. The author concludes with:
My take would have been "in terms of raw numbers, it's not really any better yet." It's close enough to equal (and slower in few enough cases that I could accept them), though, that I'd be willing to switch to it if I could do so without having to modify a lot of incompatible code. It's clearly the way of the future, and as long as it's not worse than the current gold standard, why not?
Re:I'll tell you what the problem is... (Score:3, Interesting)
Re:intel compiler (Score:5, Interesting)
Do the new models replace or confuse old ones? (Score:4, Interesting)
I agree that this compiler is a cornerstone of free software.
But it was very frustrating to me to try to port the compiler to a new platform by modifying existing back ends for similar platforms.
After spending a few months on it (m68k in this case), I could not escape the layers of hack upon cruft upon hack upon cruft that made even fairly superficial modifications extremely difficult: everyone seemed to be using the features differently, and all the power seemed lost in hacks that made it impossible (for me, anyway) to do simple things. I am quite familiar with many assemblers and optimizing compilers.
I hope that the new work makes a somewhat clean break with the old; otherwise, I would fear yet another layer hacked onto and interwoven with the ones that already fit the back ends so poorly.
I suspect that not all back ends are the same, and perhaps a more popular target would have fared better, but it seems to me it shouldn't be that hard to create a model that is more powerful yet simpler. That would be a major step forward, enabling much greater optimization, utilization, maintainability, etc.
-ftree-* (Score:5, Interesting)
If I were him, I'd repeat the tests with the -ftree-* options enabled when building with gcc 4.0.0.
when? why? (Score:2, Interesting)
At what point (of 3's evolution) would you say it surpassed 2.95? Why?
non-x86 arch? (Score:3, Interesting)
What about the performance on MIPS? PPC? C'mon, people...enquiring minds want to know!
Re:The performance of compiled code (Score:4, Interesting)
I think that companies should re-evaluate their "need" for an extra 5% performance. Here's an idea -- if you need something 10 minutes faster, why not start the process 10 minutes sooner?
In any large organization, the process gets in the way. Some suit decides the product needs a new feature, or needs to ship sooner, or whatever, and this slowly trickles down to the developers who suddenly are put in crunch time where every minute counts. Schedules and deadlines may change daily. People's jobs may be at risk. Shit happens.
Nobody really likes it, but that is sometimes how we arrive at the point where we "need" an extra 5% performance, where we "need" the program to finish ten minutes sooner. Starting earlier is not always an option, usually because you don't know you even have to start *at all* until the last minute.
Re:What the hell? (Score:3, Interesting)
Re:Expected (Score:5, Interesting)
First off, all of the programs tested use hand-tooled assembly in their most performance-sensitive code, which means the compiler is moot in those sections.
A better test would compare three things: the hand-optimized assembly, the C code under gcc 3 (usually there's a configure switch that tells the build to ignore the hand-tuned assembly and use a C equivalent), and that same C code under gcc 4.
I think you'd see a surprising result, and if the vectorization code is good enough, you should even see a small boost over the hand-tuned assembly (since ALL of the code is being optimized this way, not just critical sections).
Re:The ? operator (Score:3, Interesting)
Re:I'll tell you what the problem is... (Score:3, Interesting)
Re:Expected (Score:5, Interesting)
The main improvement in GCC 4.0 is implementing Static Single Assignment (SSA).
SSA is not an optimization. It is a simplification. If you can assume SSA, then it opens the door to an entire class of optimizations that can help improve your performance without affecting your code's correctness.
That last bit -- optimizing code without affecting correctness -- was a big problem in the days before SSA.
In that regard, SSA is a similar technology to RISC -- it does not speed things up by itself, but it enables speedups for later on.
The lack of SSA is one thing that kept gcc out of the hands of compiler researchers. Now that gcc has it, academia can start hacking away, and the delay you expect is the time between implementing SSA and implementing all of the optimizations that really will improve code performance.
Funny you should say that - a story about sprintf (Score:4, Interesting)
Oh yeah, also, for Quake 1, John Carmack hired Michael Abrash, an assembly language guru, to help out. Well, Abrash found that GCC's memcpy() (or whatever it was) was copying byte-by-byte instead of word-by-word (or something, I don't remember), and his reimplementation of that alone doubled the frame rate!
Just some interesting counterexamples to keep in mind.
Observations on Apple's GCC4 release (Score:5, Interesting)
I've been working with the GNU GSL on my Mac a lot, and I recently updated to Tiger. The first thing I noticed when I recompiled the GSL with Apple's modified GCC 4.0 is a significant, noticeable speed increase. I'm doing intense math, SVD on 300x200 matrices, and it's shocking how much faster it is: I went from 3-5 seconds down to less than one.
I am not going to post any hard numbers because I haven't rigorously compared them yet, but I'll make some formal comparisons this week.
No, the third run is for finding bugs (Score:3, Interesting)
Still generating 386 assembly? (Score:2, Interesting)
If the energy being spent on redesigning the kernel architecture, the compiler architecture, and the command-line interfaces were spent supporting new instruction sets, GCC could probably catch up to MSVC from 2000.
It's sort of sad that instead of improving the computer's ability to perform a certain amount of work in a certain amount of time, all the energy in GCC has always gone towards the study of compiler design itself.
Re:kettle? black? (Score:3, Interesting)
Exactly. The goals in releasing software are completely different for GCC and MS.
For GCC, like a lot of open-source products, the idea behind releasing all-new x.0 versions is to get it out there so early adopters will start using it, filing bug reports, etc. It's the same reason the Linux kernel releases new even-numbered versions (2.x.0) before they're really ready for mainstream use. If they waited too long, people would avoid them, thinking they're "just development versions", and it'd take forever to get the bugs out. Unlike with commercial software, you need to know what you're doing when you use these open-source products directly, rather than using a packaged distribution. If you want a stable system, don't download GCC or Linux directly and compile from sources. Get the version that comes with your distribution.
MS, on the other hand, wants to get money and marketshare when they release all-new versions of their software. When they release a new version, they do so with the implication that this product is ready for general use, and that all the bugs are worked out (after all, you're paying a lot of money for it, so shouldn't the debugging be part of the price?). This is reinforced by the fact that most updates are not free.
If I buy a car, I expect it to work reliably, and not break down within the warranty term, as long as I perform the required maintenance. If a wheel falls off as soon as I drive out of the dealer's lot, they have to fix it at their own expense. I might even get a new car under the Lemon Law if this happens multiple times. What's more, if the car has any serious defects, these are usually fixed for free under factory recalls.
But when I buy MS software, none of this applies. There is no warranty whatsoever. If it doesn't work, there is no recourse. And if they release a new version that fixes a lot of problems, there's no guarantee it will be free. Did all the Windows ME buyers get a new version of Windows to fix that disaster of an OS? No, they had to upgrade at their own expense.
The bottom line is that the product MS releases is a shrink-wrapped product that is supposedly intended for direct use by the general public, which implies a certain level of fitness. GCC, on the other hand, does not release its product directly to the general public (though they're certainly free to download it and try it out if they wish). Its product is intended for use by software developers (who want to try out the latest, and possibly buggy compiler) and distributions. People who just want to surf the net or write documents should not be concerned with this, nor should developers who want to produce stable code. These people should all be simply using precompiled distributions, and using the software versions provided by them.
Re:Still generating 386 assembly? (Score:1, Interesting)
Re:Fast KDE compile. (Score:1, Interesting)
GCC's optimization passes try to be as machine-independent as possible. They do, of course, handle machine-specific things, like allocating and spilling registers. But some platforms have stupid rules about which registers and instructions must be used when (instruction XXX sets a flag, so instruction YYY has to be issued right after it to use that flag before it gets overwritten; or register foo has been spilled, so it has to be reloaded before ZZZ; or this value always has to live in register A; etc.). A lot of this code ends up in a function called reload, in a file by the same name.
Over the years reload has grown, and grown, and grown. No developer understands all of it. A goal of many people is to improve the register allocator and other passes so that reload can be disabled on most platforms. Reload is not exactly buggy, but it is fragile, and it causes many bugs when things change.
So when Pinski says "Reload, wow, not that unexpected really," he means it. There are other comments in that thread too, like "Though it is still a reload patch, and all reload patches are dangerous."
From what I understand of the rest of the thread, reload needs to make two hardware registers the same for an instruction, but ends up using a virtual register it shouldn't, which later confuses the rest of the compiler, since there are still notes saying not to use that virtual register.
The bug has been fixed in CVS. The patch is on the attached web page, both for khtml (to use with 4.0.0) and for GCC (if you roll your own and want to compile khtml).
Re:What about... (Score:4, Interesting)
Well, I did have a bunch of results for you, but the CRAPPY LAMENESS FILTER won't let me post them. Apparently I have to use less 'junk' characters (of course the CRAPPY PROGRAMMER didn't define what a 'junk' character is in the error message, so that's NO USE WHAT-SO-EVER.)
So, I guess I'll summarise. gcc version 4 is slightly worse than 3.3, and slightly better when the tree-vectorize option is passed and AltiVec code is generated.
Simon