A Review of GCC 4.0
ChaoticCoyote writes "
I've just posted a short review of GCC 4.0, which compares it against GCC 3.4.3 on Opteron and Pentium 4 systems, using LAME, POV-Ray, the Linux kernel, and SciMark2 as benchmarks. My conclusion:
Is GCC 4.0 better than its predecessors? In terms of raw numbers, the answer is a definite "no". I've tried GCC 4.0 on other programs, with similar results to the tests above, and I won't be recompiling my Gentoo systems with GCC 4.0 in the near future. The GCC 3.4 series still has life in it, and the GCC folk have committed to maintaining it. A 3.4.4 update is pending as I write this.
That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC. Version 4.0.0 is laying a foundation for the future, and should be seen as a technological step forward with new internal architectures and the addition of Fortran 95. If you compile a great deal of C++, you'll want to investigate GCC 4.0.
Keep an eye on 4.0. Like a baby, it won't really show its value until it has matured a bit.
"
Re:The performance of compiled code (Score:4, Insightful)
If you really, positively need an extra 5% performance, you might as well just buy a computer that's 5% faster.
kettle? black? (Score:2, Insightful)
quote "That said, no one should expect a "point-oh-point-oh" release to deliver the full potential of a product, particularly when it comes to a software system with the complexity of GCC."
I bet no one would dare say that about a certain product from Redmond.
Nice analogy... (Score:1, Insightful)
I'll just have to make sure you never babysit for me, if babies are that value-less to you.
Re:Expected (Score:5, Insightful)
Further, on his most reasonable C benchmark (the Linux kernel), he records only compile time and binary size, not runtime performance. I call it the most reasonable benchmark because it has thousands of contributors and covers a wide range of code purposes and individual coding habits - and yet performance is the one thing omitted.
In short, I wouldn't trust this benchmark. Probably the best test would be to build a whole Gentoo system with both compilers, using identical configurations, and compare build times and runtime performance.
Re:The performance of compiled code (Score:5, Insightful)
Optimizing every single line of code is a complete waste of time, since the 80/20 rule generally applies. Use a profiler to determine where that 20% is.
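For anyone who hasn't done it, the workflow is simple enough. Here's a minimal sketch using gprof (the program and function names are made up for illustration):

    // build with instrumentation, run, then read the flat profile:
    //   g++ -O2 -pg hotspot.cpp -o hotspot
    //   ./hotspot
    //   gprof hotspot gmon.out | head
    #include <cstdio>

    // deliberately expensive helper -- this is where the profiler
    // should point the finger
    double slow_sum(int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                s += 1.0 / (1 + i + j);
        return s;
    }

    int main() {
        printf("%f\n", slow_sum(2000));
        return 0;
    }

The flat profile ranks functions by self time; spend your effort on the handful at the top and leave the rest alone.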
Re:-ftree-* (Score:1, Insightful)
My conjecture is that they require it to be enabled by hand so that people who know what they're doing can enable it, watch what code blows up, and then produce intelligent bug reports that can be directly linked to the vectorization, so that fixes can be produced for 4.0.1.
Re:kettle? black? (Score:4, Insightful)
I think you're a little mixed up there. When we criticize MS, we're often referring to the release of known buggy and badly implemented software to the general public. The submitter of this article, on the other hand, is referring to the "full potential" of the new optimization framework [gnu.org] in GCC-4.0. It will, in theory, allow much better optimizations to be performed on the internal parse tree. But for now many of the CPU models are incomplete or nonexistent, or something like that. The full potential of these optimizations will be delivered in a later release, either 4.0.x or 4.1, or perhaps a little of both. And the GCC team wouldn't have released GCC 4.0 with known, serious bugs.
Or perhaps I've just been trolled. Wouldn't be the first time. I see that this is your first comment on slashdot. Welcome. Just don't troll.
Re:The ? operator (Score:2, Insightful)
a = min(a, b)
That is one of the most self-descriptive statements I have ever seen.
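(For context, the thread is presumably about g++'s old minimum/maximum operator extensions, <? and >?, which were deprecated around the 4.0 timeframe. A sketch of the contrast:)

    #include <algorithm>

    void clamp(int& a, int b) {
        // the old g++ extension spelled this:  a = a <? b;
        // the standard spelling says exactly what it does:
        a = std::min(a, b);
    }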
Re:Gentoo (Score:2, Insightful)
He's not a dumbass because he uses Gentoo; he's a dumbass because it's pretty obvious that he doesn't know what he's talking about. Straight from TFA:
Some folk may object to my use of -ffast-math -- however, in numerous accuracy tests, -ffast-math produces code that is both faster and more accurate than code generated without it.
"I don't know about you, but if I want my math done fast and wrong I'll ask my cat" - Anonymous
Tree-ssa is in there (Score:4, Insightful)
Auto-vectorization, by the way, does not fall into the "obvious optimization win which should perhaps be enabled at -O3 by default" category. It can bring very big performance benefits in some situations, but it should be used with caution.
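For the curious, the textbook candidate looks like the loop below; note that in 4.0 the vectorizer is not switched on by any -O level, you have to ask for it explicitly (the function name is made up for illustration):

    // compile with:  g++ -O2 -ftree-vectorize -msse2 -c saxpy.cpp
    void saxpy(float* __restrict__ y, const float* __restrict__ x,
               float a, int n) {
        // unit-stride accesses, no loop-carried dependence: exactly
        // the shape the tree vectorizer can turn into SSE operations
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }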
Re:The performance of compiled code (Score:3, Insightful)
If you're talking about a program written by one person to be run by one person, or written by five people to be run by five people, or a program that will be run a limited number of times or while people are getting coffee, then absolutely you are correct.
But if you're talking about a small group of programmers making an interactive program (including simulations which people wait for the answer to before starting another run) to be run by millions of people, or to be run iteratively millions of times or over an enormous dataset of comparable size, then 5% is absolutely worth it. If you spend 10 man-hours tweaking out 5%, and you've gained a mere 100 milliseconds, you've still made out quite well once the collective time saved by those millions of people, or millions of runs, is accumulated: 100 ms saved over a million runs is roughly 28 hours, against 10 man-hours spent. And often 5% yields much more time savings than that.
If you really, positively need an extra 5% performance, you might as well just buy a computer that's 5% faster.
If you can afford all the computers that are 5% faster, then do both! Then you get 10%, and double the benefit. If the first 5% makes a significant difference for a certain application, then the second one probably will as well.
What a crappy review (Score:3, Insightful)
LAME uses hand-written assembler for vectorization. One of the new features of gcc 4 is the beginnings of a vectorization model. A good test for gcc 4 would have been to compile some C-only bignum libraries, and Ogg Vorbis! POV-Ray is also a good example, but then you need to test more than one specific run. Maybe gcc 4 makes radiosity in POV-Ray 400% faster at a 2% cost in the rest of the code?
This guy is the Tom's Hardware of Linux reviews, except he doesn't have the annoying ads, and he does not split his lack of content over 30 HTML pages.
The new warnings of gcc 4 have helped me find a bug in my code. That saved me a week. Consider how much faster gcc 4 needs to make pov-ray or lame to save you a week of work!
gcc 4 can now reorder functions according to profile feedback. That should make large C++ projects faster. Also, the new ELF visibility support should make KDE start much faster. This should have been tested!
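For reference, the visibility machinery works roughly like this: build with -fvisibility=hidden so nothing is exported by default, then mark the real API explicitly (the class name here is invented for illustration):

    // compile with:  g++ -fvisibility=hidden -fPIC -shared widget.cpp -o libwidget.so
    #define EXPORT __attribute__((visibility("default")))

    class EXPORT Widget {      // exported: part of the library's public API
    public:
        void draw();
    };

    static void helper() {}    // internal, never exported

    void Widget::draw() { helper(); }

Fewer exported symbols means a smaller dynamic symbol table and less work for the dynamic linker at startup, which is where the KDE win is supposed to come from.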
Please note that I'm not saying gcc 4 produces faster code. I don't rightly know. I do know it produces smaller code for my project dietlibc [www.fefe.de], where size matters more than speed.
Re:fpmath=sse (Score:1, Insightful)
float x, y, z;
x = y * 13.2 * z;
GCC treats 13.2 as a double, so it does half of the expression in SSE and the other half in x87 FP.
That is in fact worse.
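The usual fix, for what it's worth, is to keep the constant single-precision so the whole expression can stay in one unit (hypothetical wrapper function, just to make the snippet self-contained):

    float scale(float y, float z) {
        // 13.2 (no suffix) is a double literal, so the multiply is
        // done in double precision; 13.2f keeps the whole expression
        // in float, and with -mfpmath=sse it can stay in SSE registers
        return y * 13.2f * z;
    }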
I wish gcc had an option
--fbest-code-for-current-machine
that would enable all the options needed to produce the fastest code for the machine it is running on. No dependence on incompetent autoconf scripts, etc.: the compiler detects the CPU and turns on the appropriate options.
Re:Nice analogy... (Score:3, Insightful)
Signed, a parent of three.
Re:intel compiler (Score:3, Insightful)
GCC, on the other hand, has a different goal: get a working compiler on as many platforms as possible.
Re:The ? operator (Score:3, Insightful)
If I ever see a developer do something as stupid as this on a job application, there's no way they will ever get a job working for me.
Having clean, readable code is far, far more important than saving a few minutes over the course of a project. Using compiler-specific features is generally frowned upon, but acceptable in cases where there are significant performance or time gains. Using a compiler-specific alias to save yourself a few extra keystrokes at the extreme cost of readability is just being lazy, and not thinking about how that code will be maintained in a year.
I feel the same way about the ternary operator, actually. There are a few cases where it's clear enough to be used, and where it saves several lines of typing. However, 95% of the time that people use it, it only makes the code impossible to understand.
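A quick sketch of both sides of that 95% (made-up functions; the last two have identical behavior):

    // fine: one flat condition reads like English
    const char* plural(int n) {
        return n == 1 ? "item" : "items";
    }

    // the other 95%: nested ternaries the reader must mentally re-parse
    int m1(int a, int b, int c) {
        return a < b ? b < c ? c : b : a < c ? c : a;
    }

    // the same logic spelled out: it's just the max of three values
    int m2(int a, int b, int c) {
        int m = a;
        if (b > m) m = b;
        if (c > m) m = c;
        return m;
    }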
Re:The performance of compiled code (Score:5, Insightful)
In my real-life coding work, the places where algorithm efficiency makes a difference are far outnumbered by the places where it doesn't. And in the places where it does make a difference, the performance is rarely a critical need. For example, I just coded up some RAMDAC lookup tables, and a difference of algorithm would make a huge difference in efficiency. But this particular routine was triggered by a user event (clicking a button in a config dialog), so my dogslow but highly readable/understandable algorithm wasn't a bottleneck for anything. In this case tweaking the compiler settings would have given a 5% boost to everything, but a change in algorithm would only have given a 1/10 second boost for an event that happens approximately once a week or less.
God, this gets old. (Score:4, Insightful)
4.0.0 is a brand new compiler. Lots of techniques in it are brand new. Lots of tweaks and polish can still be applied. If you actually take the time to compare 3.4 to 3.0, you'll find that the gap is bigger than the one between 4.0 and 3.4. Furthermore, if you compare 2.95 to 3.0, you'll find 2.95 beat 3.0 by a much wider margin than 3.4 beats 4.0.
This is a misunderstanding of the nature of progress. 4.0 is a brand new compiler with brand new internal behaviors. Lots of things are at the It Works stage, instead of the It's Efficient stage. You can't compare a 3-year polished compiler to a 3-week polished compiler; it's utter nonsense.
If you want to compare 4.0 to something, compare it to 3.0, or sit down.
Re:4.0.0 broke backward compatibility big time (Score:1, Insightful)
You know, I'm sick and tired of idiots who think some standard on a piece of paper is more important than real-world code. The fact of the matter is that there is a lot of real-world C code out there that has compiled just fine for years until the GCC developers got all prissy and started deliberately breaking code.
I'm sick and tired of patching just about every C program written more than two years ago just because the GCC developers decided to break code that compiled just fine.
To hell with the standards. I want something that compiles all of the open source code which compiled fine five years ago without forcing me to make huge patches in the name of standards.
If the standards do not reflect real-world code, they need to be rewritten.
Re:The performance of compiled code (Score:4, Insightful)
How is pointing out that one optimization people crow about is largely ineffective being an asshole?
Re:The performance of compiled code (Score:2, Insightful)
Is this a troll? Most high-performance computing environments (national and local supercomputing centers), at least when they are being well-utilized, are in non-stop use. You don't get to start ten minutes sooner, because you or other users are already hammering the machine with other jobs.