Genetic Algorithms and Compiler Optimizations
mfago writes "Scott Robert Ladd has written an enlightening article and accompanying software package that utilizes genetic algorithms to optimize the optimization flags fed to a compiler. Those who have tried to squeeze the last drop of performance
from a code know that it can be very difficult to determine the set of optimization flags beyond -O3 (with gcc for example) that yields the best performance. Lots of background links included in the article."
Beware the simple, elegant solution. (Score:3, Insightful)
A good dose of skepticism should accompany any examination of the dazzlingly elegant solution.
Re:Beware the simple, elegant solution. (Score:5, Insightful)
Neural networks are good when there are massive numbers of unknowns to deal with and you're better off exploiting whatever patterns you can detect, but this is just a simple case of using a new(-ish) method to compare runs across (nearly?) all parameter combinations. Nothing to be skeptical about, really.
Re:Beware the simple, elegant solution. (Score:5, Interesting)
If, as I suspect, the search space is disorderly, then random perturbation should, within the same number of test compilations, be able to find a solution comparable to the one his GA finds.
Re:Beware the simple, elegant solution. (Score:5, Funny)
Wait, I'm confused; I thought on Slashdot we're supposed to wait for someone to mention Chess before we bring up Go?
Re:Beware the simple, elegant solution. (Score:2)
Apparently the news items were released out of order [slashdot.org].
Re:Beware the simple, elegant solution. (Score:5, Funny)
anyway, i think it *is* really cool this guy finds it interesting to tackle this 'problem' with a genetic algorithm. why not try some bayesian networks next? support vector machines? improbability drives...
ooops, i'm getting carried away. sorry 'bout that.
(not intended as flamebait but mod me to hell anyway *evil grin*)
Re:Beware the simple, elegant solution. (Score:5, Interesting)
I'm sure some people would love a few percent improvement in their SPEC results...
Personally, I have an algorithm that I'm running hundreds of millions of times as part of a larger calculation for my PhD research. I'd love a "few percent" improvement simply by changing the compiler flags.
Acovea is currently a research tool in its own right, and not quite ready for application to general problems. Some issues that require modest effort to resolve:
1) Code is currently Pentium 3/4-centric.
2) Optimizes flags for gcc 3.4 or Intel's compiler only.
3) Cannot easily handle code residing in more than one file.
Re:Beware the simple, elegant solution. (Score:2, Insightful)
Re:Beware the simple, elegant solution. (Score:5, Funny)
Slow Go player != inelegant solution
I was going to disagree with you, but I'm not even sure what your arguments are. Do you have any reason to believe that the solution for the problem discussed in the article is inelegant or inefficient, or did you just get a bad grade in your AI class?
Re:Beware the simple, elegant solution. (Score:2)
Re:Beware the simple, elegant solution. (Score:4, Interesting)
I suspect that you either do not understand genetic algorithms or do not have experience in a problem domain where a GA could have been used successfully.
Just because you don't understand GA doesn't mean that GA isn't applicable to this problem. I recently had a problem to solve similar to the problem outlined in the article (optimizing results with N variables, where N > 5) and the GA solution beat the pants off the brute-force solution.
Note that I'm actually not saying that GA is a panacea; rather, I'm saying that for some problem domains, like the one in the article, it makes a whole lot of sense.
Re:Beware the simple, elegant solution. (Score:2)
Re:Beware the simple, elegant solution. (Score:3, Insightful)
Re:Beware the simple, elegant solution. (Score:2)
Re:Beware the simple, elegant solution. (Score:2)
I would argue that Ladd's article sets up an invalid comparison between brute force and his GA search, because he seeds his search with the flag set for -O1 or -O3.
I'm not saying he should make the GA start from scratch. I'm saying that he should permit brute force to start from -O1 or -O3 and perturb from there.
Not just compiler flags? (Score:2)
const size_t read_bufsiz = 1000;
const size_t chunk_size = 5000;
const unsigned int
Re:Beware the simple, elegant solution. (Score:2)
Re:Beware the simple, elegant solution. (Score:3, Funny)
All the top-of-the-pack human players use neural networks. Your point?
Re:Beware the simple, elegant solution. (Score:4, Informative)
It's the cleverness of the algorithm that makes a computerised Go player good. Using a simple stinky algorithm and tweaking its parameters isn't going to turn a mediocre Go player program into a great one.
As always, you pick the solution that fits the problem.
Re:Beware the simple, elegant solution. (Score:2)
Have you read the article? Personally I think it's an interesting method to determine, for a specific program or algorithm, what compiler options produce the fastest results. Survival of the fittest certainly works in this case.
Sure, he didn't provide a specific set of optimization flags that are "best", but that's because it's highly implementation-specific. To imply that it's all about "buzz-words" indicates yo
Re:Beware the simple, elegant solution. (Score:2)
but it will require a good deal of math.
Isn't this the Windows technique? (Score:4, Funny)
Re:Isn't this the Windows technique? (Score:2, Funny)
No, it's the Mother Nature technique. (Score:2)
and yes, that really does consist of fiddling with the settings (each generation born counts as a twist to one of the control dials) until it's acceptable (Deer 2.0 can outrace Wolf 1.9, now Wolf needs to breed a 2.0 before they all starve).
I did this for my Ph.D. defense (Score:4, Interesting)
Re:I did this for my Ph.D. defense (Score:2)
If genetic algorithms can find better optimizations than hand tuning (which has been
Re:I did this for my Ph.D. defense (Score:2)
Given the rest of your post, I think you meant, "that hand waving will always get you the best optimizations."
SRL has been doing GA's for a loooong time! (Score:2, Informative)
Re:SRL has been doing GA's for a loooong time! (Score:2)
cool application for GA (Score:5, Interesting)
PostgreSQL (Score:2)
Skepticism (Score:5, Interesting)
However, this seems like a great candidate for GAs: there's a fitness function (i.e. speed of execution) and a nice binary vector of attributes (the flags).
It's interesting that the gains aren't that great; it would seem to suggest that the flags don't do much.
But kudos nevertheless. I'm no expert in the GA field, so it's possible someone's tackled this before, but if they haven't, extra kudos.
Winton
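For the curious, here is a minimal sketch of that encoding in C. The fitness function below is a made-up stand-in (it just pretends some bits help more than others); in a real run it would invoke gcc with the flags whose bits are set, run the benchmark, and return the measured time. This is an illustration of the idea, not Acovea's code.
/* Toy GA over a bit vector of compiler flags. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_FLAGS 40
#define POP_SIZE  20
#define GENS      100

typedef struct { unsigned char gene[NUM_FLAGS]; double time; } Individual;

/* Stand-in fitness: lower is better.  Replace with "compile with the
 * selected flags, run the benchmark, return its wall-clock time". */
static double evaluate(const unsigned char *g)
{
    double t = 100.0;
    for (int i = 0; i < NUM_FLAGS; i++)
        t -= g[i] * (i % 3 + 1) * 0.5;   /* pretend some flags help more */
    return t;
}

static int by_time(const void *a, const void *b)
{
    double d = ((const Individual *)a)->time - ((const Individual *)b)->time;
    return (d > 0) - (d < 0);
}

int main(void)
{
    Individual pop[POP_SIZE];
    srand(1234);
    for (int i = 0; i < POP_SIZE; i++)
        for (int j = 0; j < NUM_FLAGS; j++)
            pop[i].gene[j] = rand() & 1;

    for (int gen = 0; gen < GENS; gen++) {
        for (int i = 0; i < POP_SIZE; i++)
            pop[i].time = evaluate(pop[i].gene);
        qsort(pop, POP_SIZE, sizeof pop[0], by_time);

        /* Keep the fitter half; rebuild the rest by crossover + mutation. */
        for (int i = POP_SIZE / 2; i < POP_SIZE; i++) {
            const Individual *a = &pop[rand() % (POP_SIZE / 2)];
            const Individual *b = &pop[rand() % (POP_SIZE / 2)];
            int cut = rand() % NUM_FLAGS;
            memcpy(pop[i].gene, a->gene, (size_t)cut);
            memcpy(pop[i].gene + cut, b->gene + cut, (size_t)(NUM_FLAGS - cut));
            pop[i].gene[rand() % NUM_FLAGS] ^= 1;    /* point mutation */
        }
    }
    printf("best simulated run time: %.2f\n", pop[0].time);
    return 0;
}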
fragile GA, and what this is and isn't (Score:5, Informative)
It's an important distinction. When you're trying to breed some really new system using genetic algorithms, it's possible to get some very fragile results, i.e. systems that do the job incredibly efficiently, but only under certain conditions...
For instance, I remember reading about some folks who were trying to "breed" a circuit, using genetic-algorithm-like techniques, that created a "pulser" using as few gates as possible. They got something that used about one tenth the number of logic gates of the previous low record... but it didn't work when they moved the setup away from their desktop PCs. It turned out that what they "discovered" was a sensitive RF receiver that picked up (I think) something like the normal 120V cycling, and made use of it. Similarly, a lot of the bred circuits are extremely sensitive to temperature and other conditions.
So breeding for efficiency can be a bit dangerous: you get things that rely on odd side effects and environmental states. That said, I think the method the article describes isn't much more dangerous than normal optimization selection, though it is admittedly more likely to stumble over some unusual flag combination that might break under certain circumstances.
In short, *if* GAs are ever going to let software development become more like gardening than engineering (something I've heard stated as a goal), we're going to have to find ways to apply very general evolutionary pressures and complex test environments.
Re:fragile GA, and what this is and isn't (Score:3, Informative)
This is not a problem of genetic algorithms in general; it is a problem of the fitness function: the GA always tries to produce something which gets the best out of the fitness function (e.g. a pulser with as few gates as possible). If the fitness function also uses the stability of the r
Re:fragile GA, and what this is and isn't (Score:2)
The fact that these GAs exploit electrical interference and other phenomena that aren't easily applied to design by humans, in order to achieve new and lower-cost designs, isn't unexpected. But I don't envy the people whose job is to p
GA exploit environments' flaws (Score:3, Informative)
The "folks" you refer to is Professor Adrian Thompson of the University of Sussex. A paper describing his interesting experiment can be found here [nec.com]. It was actually a flawed FPGA chip he was programming.
Another example of this tendency of Genetic Algorithms to make use of helpful "flaws" in their environments can be found in the works of Karl Sims [genarts.com]. A round-off error in his physics model resulted in some weird locomotion by a branch of virtual creatures.
You will find details of both examples in this entry [hjalli.com]
Re:fragile GA, and what this is and isn't (Score:2)
You do not know what you are talking about (Score:2)
GAs are nothing more than search algorithms, but then pretty much all AI is search.
The complexity of any modern programming language is so high that using genetic algorithms to search for "a program in C that calculates fibonacci numbers" seems completely outlandish.
It may seem outlandish to you but using a form of GA known as Genetic Programming, programs to calculate Fibonacci numbers have already been evolved [google.com]. I
... :P (Score:2, Funny)
Re:... :P (Score:2)
How about optimizing compiler options? (Score:2, Interesting)
But a more interesting problem might be a meta-compiler problem. Sure, we can optimize over a set of options that we humans have determined/guessed are the most influential in building efficient code. But what about running an analysis on the options themselves?
If we want to get real optimization, why not apply some data-mining theory to the
Plus ça change, plus c'est la même chose (Score:2)
That's all right then
Simon
Re:Plus ça change, plus c'est la même chose (Score:2)
Compiler optimisations don't win you much ... (Score:3, Informative)
There is more information on his webpage [microsoft.com]. It's not strictly true -- compilers have moving targets (different architectures and hardware optimisations over time, greater complexity in languages, etc.), and for important things like scientific applications compilers really can optimise code; but in general, R&D on compiler development seems to be sorely lacking compared to other areas.
I work on IA64, which is fundamentally designed to take advantage of smarter compilers. While there are some interesting projects (ORC [sourceforge.net], for example), nothing is really close to ready to take over from gcc or Intel's compiler. We really want something that compiles once, can profile code execution, and then recompiles to optimise based on the execution profile; something along the lines of what this article talks about, but built right into the compiler.
Re:Compiler optimisations don't win you much ... (Score:2)
Sun's compiler has had this for years. From their man page:
Re:Compiler optimisations don't win you much ... (Score:2)
Why use a GA? (Score:4, Interesting)
As such, is there any justification for using a GA to do this rather than, say, simulated annealing? I would be interested to see a comparison of the two approaches, especially in light of papers like this one [helsinki.fi].
Why not use a GA? (Score:2)
The first rule of evolutionary techniques is: if there is a better-known, more specific optimization technique for the problem at hand, use it instead. In this case, there wasn't one.
Simulated annealing has a lot in common with GAs, but it tends not to do so well on multidimensional, discontinuous fitness landscapes.
I'm sure annealing could eventually be used to come up with a solution; I'm just amazed whenever somebody does something cool with Tool X, and all of slashdot immediately jumps all over i
Re:Why use a GA? (Score:2)
I understand that GA -O1 uses all the flags included in -O1 (those should never be removed) plus zero or more other flags. GA -O3 includes all -O3 flags plus zero or more other flags. Since -O3's flag set is a superset of -O1's, GA -O1 should be able to find any solution that is found by GA -O3.
Apparently it doesn't, but I have no clue what that means. Not enough generations? Found a local maximum and can't get out of it? But it happens in almost all cases.
What surprises me the most is that the
How bumpy is the problem? Do you need a GA? (Score:5, Insightful)
Any time you consider using a broad-front search algorithm like a GA, a neural net, or simulated annealing, try plain hill climbing first and compare the broad-front search against it. Only if the space being searched is dominated by local maxima (but not too many of them) do the broad-front algorithms help. And they're always slower.
If this guy had demonstrated that a GA did better than a plain hill-climber, he'd have an interesting result. But he hasn't demonstrated that.
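For reference, a plain hill climber over the same kind of flag bit vector is only a handful of lines; the evaluate() below is the same kind of made-up stand-in, and in a real comparison it would compile and time the benchmark exactly as the GA's fitness function does.
/* Greedy hill climbing over flag bits: flip one bit at a time and keep
 * the flip only if the (stand-in) measurement improves. */
#include <stdlib.h>

#define NUM_FLAGS 40

static double evaluate(const unsigned char *g)     /* stand-in fitness */
{
    double t = 100.0;
    for (int i = 0; i < NUM_FLAGS; i++)
        t -= g[i] * (i % 3 + 1) * 0.5;
    return t;
}

void hill_climb(unsigned char genome[NUM_FLAGS], int steps)
{
    double best = evaluate(genome);
    while (steps-- > 0) {
        int bit = rand() % NUM_FLAGS;
        genome[bit] ^= 1;                  /* try flipping one flag   */
        double t = evaluate(genome);
        if (t < best)
            best = t;                      /* keep the improvement    */
        else
            genome[bit] ^= 1;              /* otherwise revert        */
    }
}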
Re:How bumpy is the problem? Do you need a GA? (Score:4, Informative)
It is known that some optimisations can disable others. For example, "merge frequently called functions with parent" will penalise "store frequently used variables in registers" or "give priority to inner-most loop variables in registers". You could certainly build on the research in this article. What are the best types of optimisations for different types of algorithm (B-tree traversal, matrix-vector mathematics, text editing)? Does compiling different modules of an application with different flags provide improved performance over compiling all modules with the same set of flags?
Re:How bumpy is the problem? Do you need a GA? (Score:2)
I would fear enabling different flags for different modules, except perhaps flags that are explicitly guaranteed to be only local in scope. Even then, I'm not so sure, as I've run into optimization-dependent bugs in programs that are a real PITA.
Re:How bumpy is the problem? Do you need a GA? (Score:2)
Hill-climbing algorithms are more susceptible to getting trapped in a local minimum than genetic algorithms are. GAs tend to avoid this because of the crossover breeding that occurs.
Re:How bumpy is the problem? Do you need a GA? (Score:2)
This problem is complex enough (the flag combinations you need for good optimization of your code are fragile when other chunks of options are taken out) for a hill climber to get easily stuck in a local minimum. It's probably a good idea to use something different from plain local search, perhaps adding some annealing, tabu lists, or a population, and I think the choice of a GA is very suitable for this problem.
If this guy had demonstrated that a
Re:How bumpy is the problem? Do you need a GA? (Score:3, Insightful)
-O3 (Score:2)
Re:-O3 (Score:2)
Yes, you're right: -O3 is simply -O2 with -finline-functions and -frename-registers, neither of which is likely to make much difference, and -finline-functions could actually slow things down.
There's a fantastic freshmeat editorial from a year or two ago on all this here [freshmeat.net]
Shortcomings (Score:5, Insightful)
Having worked on applying GAs to multi-objective optimization, I don't believe that this technique can be used effectively to optimize most programs.
The main issue is comparing the individuals generated by the genetic algorithm. To do so, we need to compile the program under the specified settings and then benchmark its performance. In my current job, a full build of our main product takes over 12 hours on an 8-CPU machine. Using a pretty conservative estimate of 100 generations with 100 individuals each (10,000 builds at 12 hours apiece, or 120,000 machine-hours), we'd be talking about more than a year of CPU time for a single run of the algorithm!
Even if we could ignore the time required for compilation, we still have a second, more important flaw: most programs out there are not really that easy to benchmark automatically. Database applications might need to go back to a known DB state to be able to run the benchmark fairly, and server apps need to see the exact same load for every test if we want to compare two executables fairly. The problem gets worse when many compiler options yield only a 1% performance improvement or so; a poorly controlled setup could lead the comparison function to simply pick the wrong executable if the two executables didn't run under the exact same conditions.
I see how using GAs for this task has a high coolness factor, and how it might even be usable for applications that are by nature easier to benchmark, but don't expect this technique to be applicable to enterprise-sized server applications, or even most GUI-based apps, any time soon.
Re:Shortcomings (Score:3, Funny)
100 million LOC, really crappy makefiles, or eight 60MHz CPUs?
Re:Shortcomings (Score:3, Insightful)
a full build of our main product takes over 12 hours on an 8 CPU machine
Err... ever heard of the rule that 5% of the code does 99% of the work?
If your application does 'diverse' kinds of things (you mention DBs), then most probably when optimizing one module you're hurting others. The obvious solution is to optimize module by module, starting with the most abysmally slow one: your build environment.
Re:Shortcomings (Score:2, Interesting)
True, but you can still get some improvement with a procedure like this:
Not the point (Score:3, Informative)
Good, because that's not the original intent of the project. When he first contacted the GCC maintainers about the project which would later be called Acovea, the idea was to see which underlying -f/-m options should be turned on by the -O? options by default. More and more projects using 3.2 and 3.3 were using "-Osomething -fno-something -msomething", so he
Correct code? (Score:2)
Code-generation bugs (Score:2)
Of course, buggy code is more likely actually to fail under aggressive optimization. I've certainly had to maintain lots of buggy code, although of course I never write any of it myself. (Did you know t
Re:Code-generation bugs (Score:2, Informative)
union { char c[8]; double d; } u;
u.d = 1.0;
char c = u.c[1];
yields undefined results? The compiler is allowed (and under -O3, encouraged) to generate code that will erase your disk and impregnate your sister.)
Sorry, you're wrong, for two reasons:
- A char* can be used to point to anything.
Re:Code-generation bugs (Score:2)
Yes, a char* can point to anything. However, actually reading from uninitialized storage has undefined behavior. Writing a different member of a union doesn't count as initializing. Unions really cannot safely be used for type punning.
Casting a pointer, as you say, would be the correct way to achieve that end.
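A standalone illustration of that char-pointer route (the byte value printed depends on the platform's representation of double, but the access itself is allowed because character types may alias any object):
#include <stdio.h>

int main(void)
{
    double d = 1.0;
    /* A pointer to a character type may inspect the bytes of any object. */
    const unsigned char *p = (const unsigned char *)&d;
    printf("second byte of d: 0x%02x\n", p[1]);
    return 0;
}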
Re:Code-generation bugs (Score:2)
Sorry, that's totally false. (I hope you don't get paid for writing C like this!)
It doesn't just have "unspecified results", it is "undefined". The compiler is allowed to give you the second (or seventh!) byte of the double, but if you say -O3, you're asking it to generate the fastest code that is consistent with the standard. The fastest code, in th
Speed Racer (Score:2)
The moral of the story? Just because you're running Gentoo and you've compiled everything yourself doesn't mean you have the fastest possible system. You actually have to know what you're doing, too.
[1] http://www.debian.org/News/weekly/2003/44/
Fix the Compilers (Score:3, Insightful)
Anybody remember Green Hills C Compiler (Score:3, Interesting)
Re:Anybody remember Green Hills C Compiler (Score:2)
We looked at some of the code once, and it stored some (moderately complex) result in a register, and we couldn't figure out why until, about four pages of ASM later, it reused it!
We were very impressed....
Beyond -O3 (Score:2)
How will the genetic algorithm determine whether the app is for a server or a workstation? And how would it handle dual-purpose machines?
Re:Beyond -O3 (Score:2)
Optimization used to be all about straight-line execution and loop unrolling. These days, branch prediction pretty much obsoletes both. It's far more important to keep code size down and maintain ca
The Golden Rule of Optimization (Score:3, Insightful)
Optimizers change the constants in an algorithm's running time. They cannot change the algorithm's order.
With the exception of high-performance computing, no one needs a super-optimizing compiler any more. Instead, they need to (re-)learn basic algorithm and data structure theory.
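To put rough numbers on that (a back-of-the-envelope illustration, not a benchmark): sorting n = 1,000,000 items with an O(n^2) algorithm costs on the order of 10^12 operations, while an O(n log n) algorithm costs roughly n*log2(n), about 2 x 10^7. That's a factor of about 50,000; a compiler flag that shaves 20% off the first figure doesn't begin to close the gap.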
And your point? (Score:4, Insightful)
While I don't disagree with your observations, I'm confused as to what they have to do with Ladd's project and the results shown.
Okay, fine, I've written my decent quicksort. I know my basic algorithms and data structures. Now I would very much like to squeeze every last drop out of them. Here's a project that will tell me a good combination of options to use as a starting point. Don't tell me that I should just ignore all that and go re-re-re-learn basic algorithms. GCC is used by more than students.
More than that, it will tell the GCC maintainers which options should be on by default that currently aren't, and vice versa.
Re:The Golden Rule of Optimization (Score:2, Insightful)
Sounds kind of scary (Score:2)
Interesting, but that sounds kind of scary. Once you attempt optimizations that actually try to find other, equivalent algorithms, you start treading into the dangerous realm of undecidability and Turing completeness. Consider what optimizing Turing-complete systems like partial recursive functions or their equivalent Turing machines entails. The first, and most important, thing you need to do is decide whether or not some Turing machine runs faster (or more generally has a better performance metric
You're being too pessimistic (Score:2)
Other ideas (Score:4, Interesting)
This gets me thinking about Nat Friedman's [nat.org] GNU-rope (Grope [nat.org]) project. I heard him talk [lwn.net] at ALS back in '98 and then the project seemed to completely disappear. Searching on "gnu.rope" [google.com] leads to a few mailing list postings asking "where'd it go", but no good information about the project.
The basic idea was to reorder the functions in an executable so that locality of reference was maximized and cache hits were increased. The result is less paging and better performance and memory usage.
The really interesting bit is that the optimization is based on the usage of the program being optimized-- that is to say that my Grope-optimized version of Mozilla might be different than yours based on my differences in usage (i.e. perhaps I browse with images turned off, etc).
The tie-in to the article here is that Nat's system, Grope, used simulated annealing to traverse the n-dimensional space of potential function arrangements and profiled the memory paging of the application as a fitness function of the new arrangement. It's not a GA-- but it's functionally similar.
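A sketch of the kind of fitness function that description implies (the layout array, trace format and page size here are invented for illustration, not taken from Grope): score a candidate function ordering by how often consecutive calls in a recorded trace cross a page boundary, and let the annealer minimise that count.
#include <stddef.h>

#define PAGE_SIZE 4096

/* layout[i] = byte offset of function i under the candidate ordering;
 * call_trace[] = sequence of function indices from a profiling run.   */
static long page_transitions(const size_t *layout, const int *call_trace,
                             size_t trace_len)
{
    long crossings = 0;
    for (size_t t = 1; t < trace_len; t++) {
        size_t prev = layout[call_trace[t - 1]] / PAGE_SIZE;
        size_t curr = layout[call_trace[t]] / PAGE_SIZE;
        if (prev != curr)
            crossings++;      /* consecutive calls landed on different pages */
    }
    return crossings;         /* lower score = better locality */
}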
So-- anybody know what happened to Grope? I'd imagine that a community would spring up around it fairly quickly, given the relatively high number of "performance zealots" who are busy "tricking out" their Linux boxes by compiling everything from scratch (think Gentoo), optimized for their processor. Now they can add the "spoiler" or "racing stripe" of having executables specifically re-ordered for their personal usage patterns! *smirk*
Re:Other ideas (Score:4, Informative)
-bloo
Atlas :: empirically optimized blas/lapack (Score:3, Informative)
such as make up the core of many custom scientific computing packages (or even libraries like the "Gnu Scientific Library")
Basically the gist is that ATLAS "empirically" searches (read: uses an optimizer, for instance something like a GA, though "empirical" may actually mean brute force in this case) for the values of various parameters, tuning things like the routine's blocking to the cache size of the processor, etc. The cool thing about this is that they can get within 10% of hand-coded, machine-specific BLAS/LAPACK libraries without the pain!
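To give a concrete flavour of the kind of parameter being searched (a hand-written illustration, not ATLAS source): a blocked matrix multiply whose tile size BS is exactly the sort of knob an empirical tuner adjusts to fit the cache of the machine it is built on.
/* C = C + A*B for n x n row-major matrices; caller zero-initializes C.
 * BS is the tunable tile size an empirical search would sweep. */
#define BS 64

void dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BS)
        for (int kk = 0; kk < n; kk += BS)
            for (int jj = 0; jj < n; jj += BS)
                for (int i = ii; i < ii + BS && i < n; i++)
                    for (int k = kk; k < kk + BS && k < n; k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}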
Re:Atlas :: empirically optimized blas/lapack (Score:2)
-avi
Of course (Score:2)
Simulated Annealing for Optimization (Score:3, Interesting)
In other words, I randomly swapped pairs of instructions that could be swapped; if a swap was an improvement it was kept, and if it was a degradation it was rejected with a probability depending on how bad it made things.
After hours of running it would sometimes trim a clock cycle or two per loop off a short assembler routine.
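The accept/reject rule being described is essentially the Metropolis criterion from simulated annealing. As a sketch, with the cost values standing in for whatever cycle estimate the scheduler used:
#include <math.h>
#include <stdlib.h>

/* Always keep an improvement; keep a regression with probability
 * exp(-delta / T), which shrinks as the temperature T is lowered. */
static int accept_move(double old_cost, double new_cost, double temperature)
{
    double delta = new_cost - old_cost;
    if (delta <= 0.0)
        return 1;                          /* improvement: always accept */
    return ((double)rand() / RAND_MAX) < exp(-delta / temperature);
}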
Good solution to the wrong problem (Score:2, Insightful)
For programmers who are no longer resource limited, using a language that optimizes for the human rather than the computer is a good idea.
Re:Good solution to the wrong problem (Score:2)
Actually, we're not. Programmers are ten a penny, but a large computer system costs a fortune. If a company can get as much performance out of a machine as it can in order to hold off on an upgrade, it will. YOU may not work in an environment where performance matters, but I can assure you that a whole lot of us still do.
Re:I have a better idea (Score:3, Informative)
A scheme for automating optimization choices is not fragile (when the code changes, just re-run the optimizer) and is portable (when compiling on another machine, just re-run the optimizer). It beats assembly hands down.
Re:I have a better idea (Score:2, Interesting)
Perhaps, someone should consider a GA for autoconf...
Reliability, and porting to Macintoshes... (Score:5, Insightful)
More importantly, tweaking code for heavy optimization is not usually a good job for humans. It's fine if you've got a piece of hardware that you have to tweak the last few percent of performance out of for an application that will run for many CPU-years, but you're almost always much better off if you
Re:GAs wrong tool for the job (Score:2)
No, GAs make great sense... (Score:4, Interesting)
Consider the case in huffbench, where -fmove-all-movables, -freduce-all-givs and -ftracer were discovered to result in great improvements when used with the other GA-selected optimizations -- but not when applied directly on top of -O3.
Sure, these situations may be specific to the test cases -- but that's not to say his software can't be used to evaluate real-world scenarios as well: a company doing a final release compile of its code could let a GA churn for a few days to determine the best flags to use.
That's a Good Thing, no?
Re:GAs wrong tool for the job (Score:2)
Re:GAs wrong tool for the job (Score:3, Funny)
Sorry, couldn't resist
Re:GAs wrong tool for the job (Score:4, Insightful)
That's exactly the point. The goal is not to find a set of gcc switches that will work best on any program on any supported platform. The goal is to find the set that works best with this particular input, without human guidance.
10% or 20% improvements in compiler performance are of largely academic interest only; these days most software's time is spent waiting on user input, disk I/O, or network activity anyway.
First of all, what if you can get the 10% or 20% basically for free? That is, simply run your compile for a few unattended days to get the optimization? What is not worth human tweaking time can easily be worth machine time. Gentoo, although I do not use it myself, appears to be a vibrant Linux distribution based exactly on this point.
Secondly, most existing software runs on the small CPUs all around you, not on your desktop. The constraints of memory usage and CPU power are still very real there.
Finally, consider also that more efficient software may result in more idle hardware and lower power consumption. Imagine if all the PCs in the world could get by with 5% less electrical power.
Re:GAs wrong tool for the job (Score:2)
[TMB]
Re:GAs wrong tool for the job (Score:2)
--
Benjamin Coates
Re:Optimized Code? (Score:2)
Then you shouldn't be using C#!
Re:Optimized Code? (Score:2)
Re:Genetic mumbo-jumbo (Score:2)
Certainly genetic algorithms have been over-hyped by some people; the same thing happens to any emerging technology.
In practical terms, genetic algorithms have proven themselves valuable in search and optimization routines; they've been used to design optimal jet wings and traffic flows, and to find shortest paths through networks. GAs are not the ultimate programming concept -- but they are very useful for certain problems.
Re:Genetic mumbo-jumbo (Score:2)
There has been lots of research done on what classes of problems evolutionary algorithms are good at -- Google for it or look on CiteSeer!