A Review of GCC 4.0 429
ChaoticCoyote writes "
I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion:
Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this.
That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0.
Keep an eye on 4.0. Like a baby, it won't really show its value until it's matured a bit.
"
The performance of compiled code (Score:5, Informative)
Some people spend 10 hours tweaking compiler settings and optimizations to get an extra 5% performance from their code.
Other people spend 2 hours selecting the proper algorithm in the first place and get an extra 500% performance from their code.
To semi-quote The Matrix: One of these endeavors... is intelligent. And one of them is not.
email (Score:1, Informative)
I tried to email you about your "thethere" mistake, but apparently you don't want to talk to people. Not the most important of corrections, maybe, but anyway...
scott.ladd@coyotegulch.com
SMTP error from remote mailer after RCPT TO::
host smtp.secureserver.net [64.202.166.12]:
553 217.209.223.* mail rejected due to excessive spam
Fast KDE compile. (Score:4, Informative)
Re:Fast KDE compile. (Score:4, Informative)
Re:What about... (Score:5, Informative)
That said, remember that the submitter is talking about GCC4 on x86 platforms, and remember that Apple is putting a lot of work into making sure the PowerPC optimizations are as good as possible. Not to mention things like GCC4's auto-vectorization of code to take advantage of the Altivec unit (which has a more noticeable effect than MMXing x86 code).
It would be nice to see some test results for Apple's GCC versions 3 and 4.
Re:kettle? black? (Score:5, Informative)
Re:Screenshots? (Score:3, Informative)
Here you go:
bash$ gcc -o test main.c
bash$
Re:I'll tell you what the problem is... (Score:5, Informative)
This was meant as a joke, but for those who took this too seriously: if you have ever tried building GCC yourself, you should know that it always recompiles itself.
A gcc "stage 1" build is gcc compiled with your old compiler. The "stage 2" build is gcc compiled with the compiler created in the previous stage. This is the one that gets installed. The "stage 3" build is optional and verifies that the "stage 2" compiler creates the same output as the previous one.
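For the curious, that staged build is what a plain source build performs; a minimal sketch, with placeholder paths and tarball version:

```shell
# Build in a separate object directory, as the GCC docs recommend.
# The prefix and tarball name here are placeholders.
tar xjf gcc-4.0.0.tar.bz2
mkdir obj && cd obj
../gcc-4.0.0/configure --prefix=/opt/gcc-4.0.0
make bootstrap    # stage 1, stage 2, stage 3, plus the stage 2/3 comparison
make install
```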
Re:Kind of a weird review (Score:3, Informative)
This is expected, I think (Score:5, Informative)
"Before we get a bunch of complaints about the fact that most binaries generated by GCC 4.0 are only marginally faster (and some a bit slower) than those compiled with 3.4, let me point out a few things that I've gathered from casually browsing the GCC development lists. I'm neither a GCC contributor nor a compiler expert.
Prior to GCC 4.0, the implementation of optimizations was mostly language-specific; there was little or no integration of optimization techniques across all languages. The main goal of the 4.0 release is to roll out a new, unified optimization framework (Tree-SSA), and to begin converting the old, fragmented optimization strategies to the unified framework.
Major improvements to the quality of the generated code aren't expected to arrive until later versions, when GCC contributors will have had a chance to really begin to leverage the new optimization infrastructure instead of just migrating to it.
So, although GCC 4.0 brings fairly dramatic benefits to compilation speed, the speed of generated binaries isn't expected to be markedly better than 3.4; that latter speedup isn't expected until later installments in the 4.x series."
Re:I'll tell you what the problem is... (Score:1, Informative)
1- Compiles 4.0 with system compiler (3.4.3) and no optimizations
2- Compiles 4.0 with stage 1 compiler, full optimization
3- Compiles 4.0 with stage 2 compiler, full optimization.
4- Checks that stage 2 and 3 produced the same code. The result of stage 3 is the final compiler.
(I might be missing a stage there)
Modern GCCs even have a bootstrap target that adds an extra stage where GCC is profiled, to see which branches are taken more often, and the results are fed back into the next stage so the compiler is optimized for real world usage. Nice stuff really.
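A hedged sketch of how that extra stage is requested, assuming an already-configured GCC object directory:

```shell
# From a configured GCC build directory; the target exists in recent
# GCC 3.4/4.0 trees.
make profiledbootstrap   # builds an instrumented compiler, compiles GCC's
                         # own sources with it, then rebuilds using the profile
```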
It really does depend on the code (Score:5, Informative)
On that program (on a P4) I got an 11% reduction in runtime using GCC 4 vs. GCC 3.3.5. This was actually a big deal for my work.
The lesson here: your mileage with GCC 4.0's improvements may vary from the benchmarks, and you might want to try it on your own code.
Re:The Future? (Score:2, Informative)
GCC4 has a new optimisation architecture called "Tree SSA". This introduces a new representation (well, actually two: GENERIC and GIMPLE, although the latter is a subset of the former) for programs under compilation. The GIMPLE representation is used for advanced high-level optimisations before feeding the code into the compiler "back end" for architecture-specific optimisation and code generation.
The advantages of Tree SSA are multiple:
* cleaner architecture
* allows high level optimisations that were previously hard or impossible to do at the RTL (Register Transfer Language) level used by the backends
* despite being "high level" many optimisations that take advantage of program structure can be made language independent because of the GENERIC / GIMPLE representation
However, it'll take a while for new optimisations that have been enabled by this framework to be written. The idea is that Tree SSA breaks a fundamental barrier to the continued improvement of optimisation in GCC and should yield gains in years to come.
There are some other nifty things in GCC4 like the "mudflap" system for detecting program errors. Enhanced type-checking fussiness is also welcome as far as I'm concerned, even if it results in some compile errors.
Re:Speed/Performance Benchmarks?? (Score:3, Informative)
Also, as he hates to have pointed out, his choice of options isn't always optimal.
Quite a few applications are faster with 3.4.3 on a P4 with "-fno-regmove" as well as -O3. My AES for instance goes down from >500 cycles/block to 380 cycles/block on my Prescott P4 with this switch.
380 cycles/block is faster than Intel CC v8.0 with "-O3 -xP -ip" by about 30 cycles/block.
Also, the guy probably didn't try profiling. I can drop a fair chunk of cycles in ECC point multiplies on my P4 with GCC by doing a profiled build.
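For anyone wanting to reproduce that kind of experiment, a hedged sketch; the source file names are hypothetical, but the flags themselves are real GCC options:

```shell
# Try disabling the register-move pass on top of -O3 (the P4 case above):
gcc -O3 -fno-regmove -o aes aes.c

# Profile-directed build (flags available since GCC 3.4):
gcc -O3 -fprofile-generate -o ecc ecc.c   # instrumented build
./ecc                                     # run a representative workload
gcc -O3 -fprofile-use -o ecc ecc.c        # rebuild using the collected profile
```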
ETC!
Tom
Re:Screenshots? (Score:1, Informative)
-K
Re:Fast KDE compile. (Score:4, Informative)
GCC bug report
KDE CVS report: http://lists.kde.org/?l=kde-cvs&m=111451142117674
It involves some problem with register allocation. It seems only to miscompile KHTML, and there is already a patch attached to the GCC bug report (although the patch just disables the optimization that is causing the problem, rather than fixing the core problem itself).
4.0.0 broke backward compatibility big time (Score:5, Informative)
Recently, a discussion took place on a FreeBSD mailing list about whether the project wanted to use GCC 4.0.0 as the system compiler. Some objections were:
If I understood it right, we won't have a GCC 4.0.0 system compiler on FreeBSD anytime soon. Installing the gcc40 port is, of course, always possible.
Re:The Future? (Score:1, Informative)
SH
Re:I'll tell you what the problem is... (Score:5, Informative)
Re:-ftree-* (Score:4, Informative)
True, this is the major infrastructure change which justified the "4".
The author of this test didn't seem to notice that this stuff doesn't get enabled at -O2 or -O3, but has to be enabled by hand.
No, most tree-ssa optimizers are enabled implicitly at -O2 (they replace quite a few of the old RTL-based optimizers). Only some numerics code can benefit from loop autovectorization (which has to be enabled explicitly; for most source code, it just increases compile time).
Re:Kind of a weird review (Score:2, Informative)
As the ratio of raw CPU MIPS to memory bandwidth and latency continues to increase, systems lean more and more on caches to compensate. Since larger code eats up more of the scarce cache resources to do the same job, small code can be more important than code with the lowest raw instruction clock count. This can be especially important in C++, where redundant code generated by templates can really get out of hand if not properly controlled.
Re:Expected (Score:5, Informative)
As nice as C is, a lot of the improvements in GCC seem to have been targeted at improving its handling of C++ code. I'd particularly like to know how it fares with modern C++ style code: massively templated stuff with STL, Boost, traits and policies, smart pointers, lots of small inlined methods, etc. This test tells me nothing about that, and that's where a lot of development is these days.
Re:Expected (Score:3, Informative)
I call it the most reasonable benchmark because it has thousands of contributors and covers a wide range of code purposes and individual coding habits - and yet, performance is omitted.
As someone who has done some kernel (see this old project [ed.ac.uk]) and other programming, I would probably disagree with this statement. The code you find in the linux kernel is rather different (think concurrency, locking, I/O waiting, message passing) to the code you'd find in a number crunching application (think for loops that take forever, huge data sets, nested recursion) which would be rather different to the code you find in something like LAME (think DSP code).
As for the coding habits, the kernel development process encourages similar coding habits as it makes the code easier for others to read. There would be differences between different subsystems, which brings us to another problem: where do you start benchmarking the kernel as a whole?
Re:kettle? black? (Score:3, Informative)
The GCC folks released this with it being well documented that it wasn't going to blow the doors off for everyone in every situation, but instead that this was a major step forward for internals, which should allow them to make some major steps forward that are externally visible soon.
Re:This is expected, I think (Score:3, Informative)
Re:Tree-ssa is in there (Score:2, Informative)
Re:Expected (Score:2, Informative)
Re:Expected (Score:5, Informative)
The McCat compiler from McGill (which is what gcc borrowed the ssa rep from), C-- or the LLVM project all provide a much nicer platform. The internal representation is clearly documented, there are frameworks and examples for writing new passes and most importantly they all allow for whole program compilation.
Until gcc decides to support some of this the project will continue to be ignored by research groups. This might be fine since research compiler work can be fairly ugly and it is just easier to port what works.
Otherwise I agree that the move to SSA form is a critical step for gcc to take and it will enable it to become a "modern" compiler. More importantly, it will enable the inclusion of the large body of compiler work that is based on SSA forms.
Mark
Re:Still generating 386 assembly? (Score:5, Informative)
If you build gcc yourself, you can even make them the default by configuring with an appropriate --with-arch option.
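A sketch of both routes; the file names and install prefix are placeholders:

```shell
# Per build: pass -march explicitly.
gcc -O2 -march=pentium4 -o app app.c

# Or bake the default into the compiler itself when configuring GCC:
../gcc-4.0.0/configure --with-arch=pentium4 --prefix=/opt/gcc-4.0.0
```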
Re:4.0.0 broke backward compatibility big time (Score:4, Informative)
"not compiling cleanly" may have been a less-accurate description of the problem.
Re:intel compiler (Score:2, Informative)
Sigh... I wasn't saying that it doesn't make code run faster. I was saying that it doesn't necessarily make code run faster. Auto-vectorization is only a win in certain circumstances. There are a whole host of optimizations that only apply in specific circumstances and/or only improve performance in certain circumstances and slow things down in others. If there weren't trade-offs with optimizations, compilers would just have "-O" and wouldn't bother with tons of other optimization flags.
Re:4.0.0 broke backward compatibility big time (Score:5, Informative)
It should not surprise anyone that a .0.0 release has some bugs. It's the first release of a compiler with a completely new optimization structure (tree-ssa). I would advise waiting for 4.0.1 for a production-quality release, or going with vendor patches (by shipping Fedora Core 4 with a 4.0.0-based compiler, Red Hat will probably shake out a few more bugs).
Re:No, the third run is for finding bugs (Score:5, Informative)
Historically, GCC tends to bring out the worst in compilers. That is why, when you build GCC, the system compiler will be used once...
Unlikely but possible. Look for Ken Thompson's paper "Reflections on Trusting Trust" for a beautiful hack involving intentional miscompilations. The author basically changed the compiler so that when "login" was being compiled, the compiler inserted a back door. And when a new compiler was being compiled, the compiler would insert the code to insert the back door and to change the next compiler. And then no matter how much you checked the source to either login or the compiler, you would never notice the back door.
Re:Expected (Score:1, Informative)
Even then, a lot of current code is optimized for existing compilers. For example, GCC 4.0 finally includes full-blown scalar replacement of aggregates, so struct members can be accessed with less overhead when accessed repeatedly (as opposed to reloading address and offset). This makes a lot of difference in heavily templated code with lots of iterators. OTOH, GCC has included a simpler version since 2.95 at least, which does something similar for structs with a single member, like most of the STL iterators.
So in this case, if you use code that only uses STL iterators, you won't see any improvement (as there was already no abstraction penalty), so you have to choose benchmarks carefully. Likewise you might run into a lot of code that avoids unoptimized constructs, like passing and returning pointers to avoid depending on constructor elision and NRVO.
Smart pointers are something I'm very interested in and will be checking out as soon as I clear out enough space to build a cygwin binary. Modern smart pointers (boost::shared_ptr or std::tr1::shared_ptr) have more than one member, so scalar replacement of aggregates should be really nice in terms of execution speed.
And at some point I'll play around with the new lambda libraries. Should be nice to implicitly build up a function inside the parameter list of a transform.
Re:The Future? (Score:3, Informative)
Up until now, my PhD work has needed compilers I can't simply install without high fees, because the academic free license for proprietary compilers still sounds a bit fishy in its requirements. This is actually a major boost for the scientific computing community.
However, lots of people have only just NOW started to trust current F95 compilers (lots of academic code is still written in F77). It will be several more years until they trust the GNU Fortran 95 compiler.
Besides, just because it is called Fortran 95 does not mean it was actually in heavy use by 1995.
"problem" with gcc 4.0 (Score:1, Informative)
Question is, should they be upset at the compiler?
Recently, I found this thread [reactos.com] on the ReactOS [reactos.com] forums. It is about compiling ReactOS with gcc 4. Sure enough, there were problems. One thing that caught my eye is this:
Seems like a good opportunity to start checking code against 4.0.0 and fix them warnings before they get promoted to errors in a subsequent version...
Re:moronic review (Score:3, Informative)
I have no clue what you're talking about.
One benchmark *was* C++ (POV-Ray), and, in fact, I use KDE as my desktop. It just so happens that most code in a distribution is written in C.
I have quite a bit of heavily-templatized C++ in my library and customer code, but it is either proprietary (under NDA) or unsuitable for timing. As I state in the article, C++ programmers should seriously consider GCC 4.0 for its improved compile times, if nothing else.
Re:No, the third run is for finding bugs (Score:4, Informative)
Re:The ? operator (Score:3, Informative)
we're laughing _with_ you.
Re:4.0? (Score:3, Informative)
The discussion about recommended GCC versions for the Linux kernel regularly pops up on the kernel mailing list. For instance, you can see one such discussion here:
http://groups-beta.google.com/group/linux.kernel/
A much better indicator of GCC quality is to see which versions various Linux distributions actually ship. For instance, SuSE Pro 9.3 uses GCC 3.3.5.
Marcel