Genetic Algorithms and Compiler Optimizations 222

mfago writes "Scott Robert Ladd has written an enlightening article and accompanying software package that uses genetic algorithms to optimize the set of optimization flags fed to a compiler. Those who have tried to squeeze the last drop of performance from their code know that it can be very difficult to determine which flags beyond -O3 (with gcc, for example) yield the best performance. Lots of background links are included in the article."

  • by Thinkit3 ( 671998 ) * on Tuesday November 18, 2003 @06:01PM (#7505877)
    Toss around terms like genetic algorithm and neural networks, and some are dazzled by the elegance and simplicity of it. Well, then there's reality, which is often neither elegant nor simple. In Go, there is a computer player that utilizes neural networks, yet it is near the middle of the pack.

    A good dose of skepticism should accompany any examination of the dazzlingly elegant solution.
    • by Anonymous Coward on Tuesday November 18, 2003 @06:07PM (#7505940)
      Yes, a good dose of skepticism should accompany such things... however, this particular example is one in which the search-space is finite, and you're using a genetic algorithm to cleverly navigate this space.

      Neural networks are good when there is a massive amount of unknowns to deal with, and you're better off handling any patterns you can detect, but this is just a simple case of using a new(-ish) method to compare runs with (nearly?) all parameters. Nothing to be skeptical about, really.
      • by Dachannien ( 617929 ) on Tuesday November 18, 2003 @10:22PM (#7507763)
        What we should be most skeptical about is that Mr. Ladd wanted to study the effects of using a GA on optimization, yet he neglected to run a control sample involving random perturbation of options from the start case (-O1 or -O3), without selection.

        If, as I suspect, the search space is disorderly, then random perturbation should, within the same number of test compilations, be able to find a solution comparable to the one his GA finds.
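
        A minimal sketch of the control being proposed, with a toy stand-in benchmark() in place of a real compile-and-time step (illustrative only, not Acovea code):

        // Random perturbation of a seed flag set, with no selection or breeding:
        // every trial restarts from the seed, flips a few flags, and we simply
        // record the best configuration seen.
        #include <cstddef>
        #include <random>
        #include <vector>

        // Toy stand-in fitness; a real harness would compile and time the program.
        double benchmark(const std::vector<bool>& f)
        {
            double t = 10.0;
            if (f[0]) t -= 1.0;              // pretend flag 0 helps
            if (f[1]) t += 0.5;              // pretend flag 1 hurts
            return t;
        }

        std::vector<bool> random_perturbation(std::vector<bool> seed, int trials, int flips)
        {
            std::mt19937 rng(0);
            std::uniform_int_distribution<std::size_t> pick(0, seed.size() - 1);
            std::vector<bool> best = seed;
            double best_time = benchmark(best);
            for (int t = 0; t < trials; ++t) {
                std::vector<bool> candidate = seed;        // always restart from -O1/-O3
                for (int i = 0; i < flips; ++i) {
                    std::size_t j = pick(rng);
                    candidate[j] = !candidate[j];
                }
                double time = benchmark(candidate);
                if (time < best_time) { best_time = time; best = candidate; }
            }
            return best;                                   // compare against the GA's winner
        }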

    • by tessaiga ( 697968 ) on Tuesday November 18, 2003 @06:11PM (#7505981)
      In Go, there is a computer player that utilizes neural networks, yet it is near the middle of the pack.

      Wait, I'm confused; I thought on Slashdot we're supposed to wait for someone to mention Chess before we bring up Go?

    • by koekepeer ( 197127 ) on Tuesday November 18, 2003 @06:12PM (#7505993)
      anyways think about the overkill of developing such an elegant solution for such a minor problem. i mean those few percentages of performance enhancement...

      anyway, i think it *is* really cool this guy finds it interesting to tackle this 'problem' with a genetic algorithm. why not try some bayesian networks next? support vector machines? improbability drives...

      ooops, i'm getting carried away. sorry 'bout that. /me bows in admiration

      (not intended as flamebait but mod me to hell anyway *evil grin*)
      • by mfago ( 514801 ) on Tuesday November 18, 2003 @06:22PM (#7506085)
        anyways think about the overkill of developing such an elegant solution for such a minor problem. i mean those few percentages of performance enhancement...

        I'm sure some people would love a few percent improvement in their SPEC results...

        Personally, I have an algorithm that I'm running hundreds of millions of times as part of a larger calculation for my PhD research. I'd love a "few percent" improvement simply by changing the compiler flags.

        Acovea is currently a research tool in its own right, and not quite ready for application to general problems. Some issues that require modest effort to resolve:

        1) The code is currently Pentium 3/4-centric
        2) It optimizes flags for gcc 3.4 or Intel's compiler
        3) It cannot easily handle code residing in more than one file.

        • Alternatively: add up the number of hours spent tuning compiler flags, deduct the actual time saved, multiply by your hourly rate, and then see if it would be cheaper to add (scrounge) some more processors....
    • Neural networks != genetic algorithms

      Slow Go player != inelegant solution

      I was going to disagree with you, but I'm not even sure what your arguments are. Do you have any reason to believe that the solution for the problem discussed in the article is inelegant or inefficient, or did you just get a bad grade in your AI class?
    • by pnatural ( 59329 ) on Tuesday November 18, 2003 @06:18PM (#7506048)
      Because you have a single data point from a related problem domain (that does not even use the same solution!) and the single data point does not perform well, the concepts presented in this article are what? Flawed?

      I suspect you either do not understand genetic algorithms, or that you do not have experience in a problem domain where GA could have been successfully used.

      Just because you don't understand GA doesn't mean that GA isn't applicable to this problem. I recently had a problem to solve similar to the problem outlined in the article (optimizing results with N variables, where N > 5) and the GA solution beat the pants off the brute-force solution.

      Note that I'm actually not saying that GA is a panacea; rather, I'm saying that for some problem domains, like the one in the article, it makes a whole lot of sense.

      • Someone once said: "A genetic algorithm is the second best solution to any problem"
      • Yah, I agree.. I've been noticing many /.'ers seem to have little-to-no idea what "genetic algorithm" means, and don't understand the first thing about genetic optimization techniques. Oh well. I appreciate the article, anyway :)
      • I recently had a problem to solve similar to the problem outlined in the article (optimizing results with N variables, where N > 5) and the GA solution beat the pants off the brute-force solution.

        I would point out that Ladd's article sets up an invalid comparison between brute force and his GA search, because he seeds his search with the flag set for -O1 or -O3.

        I'm not saying he should make the GA start from scratch. I'm saying that he should permit brute force to start from -O1 or -O3 and pertu
        • It should be possible to apply the same technique to other performance-tuning things like buffer sizes, or even choosing between two different algorithms for the same task. Perhaps a program could have a single header file performance.h containing

          // buffer size for reading
          const size_t read_bufsiz = 1000;

          // amount of new memory to allocate in a chunk
          const size_t chunk_size = 5000;

          // threshold for switching from quicksort to
          // merge sort when you have this few elements
          const unsigned int
    • Beware the kneejerk skeptic. :)
    • In Go, there is a computer player that utilizes neural networks, yet it is near the middle of the pack.

      All the top of the pack human players use neural networks. Your point?

    • by JaredOfEuropa ( 526365 ) on Tuesday November 18, 2003 @07:20PM (#7506549) Journal
      Neural networks are well-suited for tweaking the (possibly numerous) parameters of an established algorithm. They're good at finding good local maxima of n-parameter functions. Another technique for this is simulated annealing.

      It's the cleverness of the algorithm that makes a computerised Go player good. Using a simple stinky algorithm and tweaking its parameters isn't going to turn a mediocre Go player program into a great one.

      As always, you pick the solution that fits the problem.
    • A good dose of skepticism should accompany any examination of the dazzlingly elegant solution.

      Have you read the article? Personally I think it's an interesting method to determine, for a specific program or algorithm, what compiler options produce the fastest results. Survival of the fittest certainly works in this case.

      Sure, he didn't provide a specific set of optimization flags that are "best", but that's because it's highly implementation-specific. To imply that it's all about "buzz-words" indicates yo
  • by Anonymous Coward on Tuesday November 18, 2003 @06:03PM (#7505894)
    That is, fiddle with settings at random until it's acceptable?
  • by Anonymous Coward on Tuesday November 18, 2003 @06:04PM (#7505899)
    It turns out that hand tuning will ALWAYS get you the best optimizations, but human nature dictates that a certain portion of optimizations will never be found. Genetic algorithms will find some of these optimizations in a reasonable time, but not all. So the bottom line is, no code can ever be 100% optimized, unless it is compiled for weeks.
    • This is an interesting sounding post, but I just can't make any of it make sense. "It turns out that hand tuning will ALWAYS get you the best optimizations, but human nature dictates that a certain portion of optimizations will never be found. Genetic algorithms will find some of these optimizations in a reasonable time, but not all. So the bottom line is, no code can ever be 100% optimized, unless it is compiled for weeks."
      If genetic algorithms can find better optimizations than hand tuning (which has been
    • It turns out that hand tuning will ALWAYS get you the best optimizations

      Given the rest of your post, I think you meant, "that hand waving will always get you the best optimizations."
  • I met SRL when I was a teenager visiting Gunnison, Colorado for the Rocky Mountain SOG. That was over 15 years ago. He was well into GAs then. Good to see he's still at it!
  • by millette ( 56354 ) <robin@@@millette...info> on Tuesday November 18, 2003 @06:05PM (#7505915) Homepage Journal
    This is a very neat application of genetic algorithms! Haven't seen something so interesting since this Tron cycle game [brandeis.edu].
  • Skepticism (Score:5, Interesting)

    by wdavies ( 163941 ) on Tuesday November 18, 2003 @06:07PM (#7505936) Homepage
    It actually seems like a good idea. One problem I can imagine is that some optimization flags may have non-speed-related side effects.

    However, this seems like a great candidate for GAs - a fitness function (i.e. speed of execution) and a nice binary vector of attributes (flags).

    Interesting that the gains aren't that great; it would seem to suggest that the flags don't do much :) or maybe the baseline is already close to optimal, and so it isn't a hard problem?

    But kudos nevertheless. I'm no expert in the GA field, so it's possible someone's tackled this before, but if they haven't, extra kudos.

    Winton
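
    A minimal sketch of that encoding, with a handful of gcc -f flags as the gene bits and a toy stand-in for the compile-and-time step (illustrative only, not Acovea's actual code):

    #include <cstddef>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    const std::vector<std::string> kFlags = {
        "-funroll-loops", "-finline-functions", "-fomit-frame-pointer", "-frename-registers"
    };

    using Chromosome = std::vector<bool>;   // one gene per flag: on or off

    std::string to_command_line(const Chromosome& c)
    {
        std::string cmd = "gcc -O1";
        for (std::size_t i = 0; i < c.size(); ++i)
            if (c[i]) cmd += " " + kFlags[i];
        return cmd;
    }

    // Toy stand-in fitness: pretend certain flag combinations run faster.
    // A real harness would invoke the compiler and time the resulting binary.
    double time_with_flags(const Chromosome& c)
    {
        double t = 10.0;
        if (c[0]) t -= 0.4;              // unrolling helps this (imaginary) benchmark
        if (c[1] && !c[0]) t += 0.3;     // inlining alone bloats the code here
        return t;
    }

    int main()
    {
        std::mt19937 rng(42);
        std::bernoulli_distribution coin(0.5);
        Chromosome c(kFlags.size());
        for (std::size_t i = 0; i < c.size(); ++i) c[i] = coin(rng);
        std::cout << to_command_line(c) << "  ->  " << time_with_flags(c) << " s\n";
    }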

  • by kisrael ( 134664 ) * on Tuesday November 18, 2003 @06:08PM (#7505949) Homepage
    An important thing to note is that this isn't coming up with new optimizations, but rather finding the optimal mix of pre-existing gcc optimizations.

    It's an important distinction. When you're trying to breed some really new system using Genetic Algorithms, it's possible to get some very fragile results, i.e. systems that do the job incredibly efficiently, but only under certain conditions...

    For instance, I remember reading that some folks were trying to "breed" a circuit using Genetic Algorithm-like techniques that created a "pulser" using as few gates as possible. They got something that used like one tenth the number of logic gates of the previous low record... but it didn't work when they moved the setup away from their desktop PCs. Turned out what they "discovered" was a sensitive RF receiver that picked up (I think) something like the normal 120V cycling, and made use of it. Similarly, a lot of the bred circuits are extremely sensitive to temperature and other conditions.

    So breeding for efficiency can be a bit dangerous: you get things that rely on odd side effects and environmental states. That said, I think the method the article describes isn't much more dangerous than normal optimization selection, though admittedly it's more likely to stumble over some unusual flag combination that might break under certain circumstances.

    In short, *if* GA is ever going to let software development become more like gardening than engineering (something I've heard stated as a goal), we're going to have to find ways to apply very general evolutionary pressures in complex test environments.
    • When you're trying to breed some really new system using Genetic Algorithms, it's possible to get some very fragile results, i.e. systems that do the job incredibly efficiently, but only under certain conditions...

      This is not a problem of Genetic Algorithms in general; it is a problem of the fitness function: the GA always tries to produce something which gets the best out of the fitness function (e.g., a pulser with as few gates as possible). If the fitness function also uses the stability of the r

      • Unfortunately, it's very difficult to automate the fitness function to include "stability" in GA circuit design. Even a binary "does it still work when I move it five feet from this CRT?" is time-consuming, especially when you're planning on running large populations or many generations.

        The fact that these GAs are exploiting electrical interference and other phenomena that aren't easily applied to design by humans to achieve new and lower-cost designs isn't unexpected. But I don't envy the people whose job is to p
    • The "folks" you refer to is Professor Adrian Thompson of the University of Sussex. A paper describing his interesting experiment can be found here [nec.com]. It was actually a flawed FPGA chip he was programming.

      Another example of this tendency of Genetic Algorithms to make use of helpful "flaws" in their environments can be found in the works of Karl Sims [genarts.com]. A round-off error in his physics model resulted in some weird locomotion by a branch of virtual creatures.

      You will find details of both examples in this entry [hjalli.com]

    • It just goes to show that the scientific method is still very important even with genetic engineering methods. One cannot rely on experiments happening in a single environment; rather, they must run the GA to breed circuits in several labs around the world and diversify internal environments. Possibly, a local GA could try to breed optimal circuits, and then a global GA would take the results of all the labs' local GAs and re-GA it to balance the system.
  • ... :P (Score:2, Funny)

    by rylin ( 688457 )
    Sadly, these genetic algorithms only work for blue-eyed developers
  • The concept is good in that it can take a predefined set of available optimizations and "figure out" (via some AI magic) optimal settings. This is good.

    But, a more interesting problem might be more of a meta-compiler problem. Sure, we can optimize a set of options that we humans have determined/guessed are most influential in the building of efficient code. But what about running an analysis on the options themselves?

    If we want to get real optimization, why not apply some data-mining theory to the
  • ... now it's not that the compiler is too slow on the computer, but the super-optimiser that runs the compiler 2000 times on the code to get the speed as fast as possible, that runs too slow on the computer...

    That's all right then :-)

    Simon
  • by MrDelSarto ( 95771 ) on Tuesday November 18, 2003 @06:18PM (#7506051) Homepage
    Todd Proebsting has a "law" like Moore's law ... from his webpage
    Moore's Law" asserts that advances in hardware double computing power every 18 months. I (immodestly) assert "Proebsting's Law", which says that advances in compiler optimizations double computing power every 18 years.

    There is more information on his webpage [microsoft.com]. It's not strictly true -- compilers have moving targets (different architectures and hardware optimisations over time, greater complexity in languages, etc etc), and for important things like scientific applications compilers can really optimise code; but in general R&D towards compiler development seems to be sorely lacking compared to other areas.

    I work on IA64, which is fundamentally designed to take advantage of smarter compilers. While there are some interesting projects (ORC [sourceforge.net], for example), nothing is really close to ready to take over from gcc or Intel's compiler. We really want something that compiles once, can profile code execution and then recompile to optimise based on the execution profile; something along the lines of what this article talks about but built right into the compiler.
    • We really want something that compiles once, can profile code execution and then recompile to optimise based on the execution profile; something along the lines of what this article talks about but built right into the compiler.

      Sun's compiler had this for years. From their man-page:

      -xprofile=p Collects data for a profile or use a profile to optimize.

      p must be collect[:name], use[:name], or tcov. This option causes execution frequency data to be collected and saved during execution, then

    • gcc has something similar with -fprofile-arcs and so on, that can be used to supply branch prediction hints etc. after the profile run.
  • Why use a GA? (Score:4, Interesting)

    by eclarkso ( 179502 ) on Tuesday November 18, 2003 @06:20PM (#7506067)
    In general, the literature suggests that GAs tend to be a kind of 'jack of all trades, master of none' type approach. See, e.g., Russell & Norvig's AI: A Modern Approach textbook.

    As such, is there any justification for using a GA to do this rather than, say, simulated annealing? I would be interested to see a comparison of the two approaches, especially in light of papers like this one [helsinki.fi].

    • The first rule of evolutionary techniques is: if there is a better-known, more specific optimization technique for the problem at hand, use it instead. In this case, there wasn't one.

      Simulated annealing has a lot in common with GAs, but it tends not to do so well on multidimensional, discontinuous fitness landscapes.

      I'm sure annealing could eventually be used to come up with a solution; I'm just amazed whenever somebody does something cool with Tool X, and all of slashdot immediately jumps all over i

  • by Animats ( 122034 ) on Tuesday November 18, 2003 @06:22PM (#7506080) Homepage
    You could probably get equally good results with plain hill-climbing. Turn on all the optimizations. Then turn them off one at a time and see if things get better. Repeat until no improvement is observed.

    Any time you consider using a broad-front search algorithm like a GA, a neural net, or simulated annealing, try plain hill-climbing first. If you try any broad-front search, compare it with plain hill-climbing. Only if the space being searched is dominated by local maxima (but not too many of them) do the broad-front algorithms help. And they're always slower.

    If this guy had demonstrated that a GA did better than a plain hill-climber, he'd have an interesting result. But he hasn't demonstrated that.
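
    A minimal sketch of the plain hill-climber described above, with a toy stand-in fitness in place of a real compile-and-time step (illustrative only):

    // Greedy hill-climbing over a boolean flag vector: repeatedly toggle single
    // flags, keep any toggle that improves the measured time, and stop when no
    // single toggle helps. benchmark() is a toy stand-in, not a real build.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    double benchmark(const std::vector<bool>& f)
    {
        double t = 10.0;
        if (f[0]) t -= 1.0;              // pretend flag 0 helps
        if (f[1]) t += 0.5;              // pretend flag 1 hurts
        if (f[2] && f[0]) t -= 0.3;      // flag 2 only helps together with flag 0
        return t;
    }

    std::vector<bool> hill_climb(std::vector<bool> flags)
    {
        double best = benchmark(flags);
        bool improved = true;
        while (improved) {
            improved = false;
            for (std::size_t i = 0; i < flags.size(); ++i) {
                flags[i] = !flags[i];                          // try toggling one flag
                double t = benchmark(flags);
                if (t < best) { best = t; improved = true; }   // keep the change
                else          { flags[i] = !flags[i]; }        // revert it
            }
        }
        return flags;
    }

    int main()
    {
        std::vector<bool> all_on(8, true);                     // "turn on all the optimizations"
        std::vector<bool> tuned = hill_climb(all_on);
        std::cout << "best time found: " << benchmark(tuned) << "\n";
    }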

    • by SmackCrackandPot ( 641205 ) on Tuesday November 18, 2003 @06:49PM (#7506295)
      You could probably get equally good results with plain hill-climbing. Turn on all the optimizations. Then turn them off one at a time and see if things get better. Repeat until no improvement is observed.

      It is known that some optimisations can disable others. For example, "merge frequently called functions with parent" will penalise "store frequently used variables in registers" or "give priority to inner-most loop variables in registers". You could certainly build on the research in this article. What are the best types of optimisations for different types of algorithms (B-tree traversal, matrix-vector mathematics, text editing)? Does compiling different modules of an application with different flags provide improved performance over compiling all modules with the same set of flags?
      • Does compiling different modules of an application with different flags provide improved performance over compiling all modules with the same set of flags?

        I would fear enabling different flags for different modules, except perhaps flags that are explicitly guaranteed to be only local in scope. Even then, I'm not so sure, as I've run into optimization-dependent bugs in programs that are a real PITA.
    • You could probably get equally good results with plain hill-climbing.

      Hill-climbing algorithms are more susceptible to getting trapped in a local minimum than genetic algorithms are. GAs tend to avoid this because of the cross-over breeding that occurs.
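
      A minimal sketch of the cross-over step in question, operating on two parent flag vectors (illustrative only):

      // Single-point cross-over: the children each inherit a prefix of one parent's
      // flag settings and a suffix of the other's, so useful blocks of settings
      // discovered in different lineages can be recombined.
      #include <cstddef>
      #include <random>
      #include <utility>
      #include <vector>

      std::pair<std::vector<bool>, std::vector<bool>>
      crossover(const std::vector<bool>& a, const std::vector<bool>& b, std::mt19937& rng)
      {
          std::uniform_int_distribution<std::size_t> cut(1, a.size() - 1);
          std::size_t k = cut(rng);                              // random cut point
          std::vector<bool> child1(a.begin(), a.begin() + k);
          child1.insert(child1.end(), b.begin() + k, b.end());   // a's head + b's tail
          std::vector<bool> child2(b.begin(), b.begin() + k);
          child2.insert(child2.end(), a.begin() + k, a.end());   // b's head + a's tail
          return {child1, child2};
      }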

    • You could probably get equally good results with plain hill-climbing.

      This problem is complex enough (the combinations you need for good optimization of your code are fragile when other chunks of options are taken out) for a hill-climber to get easily stuck in a local minimum. It's probably a good idea to use something other than pure local search, and maybe add some annealing, tabu lists, or populations; I think the choice of a GA is very suitable for this problem.

      If this guy had demonstrated that a
  • by Kourino ( 206616 )
    I have a bone to pick with the summary. -O3 probably won't help your performance in the first place, and will likely degrade it. -O2 or -Os, with maybe a few -mxxxx or -fxxxx flags, would be better. Of course, optimizing your code by making it not do stupid things is the best. (Though compiling with -O2 is pretty much standard and a good thing.)
    • -O3 probably won't help your performance in the first place, and will likely degrade it.

      Yes, you're right: -O3 is simply -O2 with -finline-functions and -frename-registers, neither of which is likely to make much difference, and -finline-functions could actually slow things down.

      There's a fantastic freshmeat editorial from a year or two ago on all this here [freshmeat.net]
  • Shortcomings (Score:5, Insightful)

    by hibiki_r ( 649814 ) on Tuesday November 18, 2003 @06:31PM (#7506155)

    Having worked on applying GAs to multi-objective optimization, I don't believe that this technique can be used effectively to optimize most programs.

    The main issue is comparing the individuals generated by the genetic algorithm. To do so, we need to compile the program under the specified settings and then benchmark its performance. In my current job, a full build of our main product takes over 12 hours on an 8 CPU machine. Using a pretty conservative estimate, 100 generations with 100 individuals each, we'd be talking about more than a year of CPU time for a single run of the algorithm!

    Even if we could ignore the amount of time required for compilation, we still have a second, more important flaw: most programs out there are not really that easy to benchmark automatically. Database applications might need to go back to a known DB state to be able to run the benchmark fairly. Also, server apps need to have the exact same load for every test if we want to be able to compare two executables fairly. This problem is made worse because many compiler options will produce only a 1% performance improvement or so. A poorly controlled test setup could lead the comparison function to pick the wrong executable if the two executables didn't run under the exact same conditions.

    I see how using GAs for this task has a high coolness factor, and how it might even be usable for applications that are by nature easier to benchmark, but don't expect this technique to be applicable to enterprise-sized server applications, or even most GUI-based apps, any time soon.

    • by pmz ( 462998 )
      12 hours on an 8 CPU

      100 million LOC, really crappy makefiles, or eight 60MHz CPUs?
    • Re:Shortcomings (Score:3, Insightful)

      by psavo ( 162634 )

      a full build of our main product takes over 12 hours on an 8 CPU machine

      Err... ever heard of the rule that 5% of the code does 99% of the work?

      If your application does 'diverse' kinds of things (you mention DBs), then most probably when optimizing one module, you're hurting others. The obvious solution is to optimize module by module, starting with the most abysmally slow one: your build environment.

    • Re:Shortcomings (Score:2, Interesting)

      by Anonymous Coward

      In my current job, a full build of our main product takes over 12 hours on an 8 CPU machine. Using a pretty conservative estimate, 100 generations with 100 individuals each, we'd be talking about more than a year of CPU time for a single run of the algorithm!

      True, but you can still get some improvement with a procedure like this:

      1. Profile the code
      2. Identify the bottleneck (probably less than 1% of your source, based on what you say)
      3. Apply the genetic algorithm to that 1%, and find the best optimizati
    • Not the point (Score:3, Informative)

      by devphil ( 51341 )

      Having worked on applying GAs to multi-objective optimization, I don't believe that this technique can be used effectively to optimize most programs.

      Good, because that's not the original intent of the project. When he first contacted the GCC maintainers about the project which would later be called Acovea, the idea was to see which underlying -f/-m options should be turned on by the -O? options by default. More and more projects using 3.2 and 3.3 were using "-Osomething -fno-something -msomething" so he

  • I've always found the largest problem with optimizers to be that they sometimes generate incorrect code. He doesn't seem to test for this. Of course, maybe the benchmarks are smaller/less kludgy than the n x 10,000 line codes I've tried to optimize with flags.
  • To me, the most interesting facts about the various optimization options are which ones introduce bugs into my code. Sometimes -O2 introduces code-generation bugs that -O3 doesn't. Sometimes -O3 yields warnings that lead to fixing bugs somebody put in the source code, but which don't show up in testing.

    Of course, buggy code is more likely actually to fail under aggressive optimization. I've certainly had to maintain lots of buggy code, although of course I never write any of it myself. (Did you know t

    • by RML ( 135014 )
      Of course, buggy code is more likely actually to fail under aggressive optimization. I've certainly had to maintain lots of buggy code, although of course I never write any of it myself. (Did you know that

      union { char c[8]; double d; } u;
      u.d = 1.0;
      char c = u.c[1];

      yields undefined results? The compiler is allowed (and under -O3, encouraged) to generate code that will erase your disk and impregnate your sister.)


      Sorry, you're wrong, for two reasons:
      - A char* can be used to point to anything.
      - You can write to one member of a union and read from another.
      • RML claims, "Sorry, you're wrong, for two reasons:
        • A char* can be used to point to anything.
        • You can write to one member of a union and read from another."

        Yes, a char* can point to anything. However, actually reading from uninitialized storage has undefined behavior. Writing a different member of a union doesn't count as initializing. Unions really cannot safely be used for type punning.

        Casting a pointer, as you say:

        double d = 1.0; int i = *(int *)&d;

        would be the correct way to achieve that end.

  • Interestingly, I read a story in Debian Weekly News [1] where someone noticed that sometimes compiling a C++ program with -O3 is worse than with -O2.

    The moral of the story? Just because you're running Gentoo and you've compiled everything yourself doesn't mean you have the fastest possible system. You actually have to know what you're doing, too. :o)

    [1] http://www.debian.org/News/weekly/2003/44/
  • Fix the Compilers (Score:3, Insightful)

    by Gothmolly ( 148874 ) on Tuesday November 18, 2003 @06:43PM (#7506236)
    The compiler ought to optimize this stuff anyway! How about "set suck=(yes|no)" and "prefer=(speed|size)" as options. Let the compiler and its designers come up with what is required to meet (or approach) a programmer's wishes.
  • by BanjoBob ( 686644 ) on Tuesday November 18, 2003 @06:45PM (#7506251) Homepage Journal
    The Green Hills C compiler had all kinds of optimization techniques but they also had a way to measure the assembler output. One method would eliminate all symbols and make huge code that ran inline. The code was fast but used up a lot of disk space. They had many optimization options and dozens of flags that could be set so that the compiler would actually give you what you wanted. Analyzing the assembly code was a great way to learn the ins and outs of their compiler.

    • I loved that compiler! The 68030/68882 compiler did some incredible optimizations!

      We looked at some of the code once, and it stored some (moderately complex) result in a register, and we couldn't figure out why, until about four pages of ASM later, it re-used it!

      We were very impressed....
  • Beyond -O3 we have -Os and -O2. Both generate better code on workstations, and -Os might be the best on Celerons and VIA C3 CPUs.

    How will the genetic algorithm determine whether the app is for a server or a workstation? And how does it handle dual-purpose machines?
    • Considering that (for gcc at least) the only major difference between -O2 and -O3 is enabling function inlining and loop unrolling, that makes sense. -Os helps because small code is more likely to fit in cache. Similarly, -O2 outperformed -O3 because it doesn't inline functions and therefore doesn't increase code size.

      Optimization used to be all about straight-line execution and loop unrolling. These days, branch prediction pretty much obsoletes both. It's far more important to keep code size down and maintain ca

  • by rhysweatherley ( 193588 ) on Tuesday November 18, 2003 @07:05PM (#7506441)
    Never forget the Golden Rule of Optimization: a quicksort compiled with no optimization will invariably beat the pants off a bubblesort compiled with the best optimizer in the universe.

    Optimizers change the constants in the algorithm running time. They cannot change the algorithm's order.

    With the exception of high-performance computing, no one needs a super-optimizing compiler any more. Instead, they need to (re-)learn basic algorithm and data structure theory.

    • And your point? (Score:4, Insightful)

      by devphil ( 51341 ) on Tuesday November 18, 2003 @08:54PM (#7507270) Homepage


      While I don't disagree with your observations, I'm confused as to what they have to do with Ladd's project and the results shown.

      Okay, fine, I've written my decent quicksort. I know my basic algorithms and data structures. Now I would very much like to squeeze every last drop out of them. Here's a project that will tell me a good combination of options to use as a starting point. Don't tell me that I should just ignore all that and go re-re-re-learn basic algorithms. GCC is used by more than students.

      More than that, it will tell the GCC maintainers which options should be on by default that currently might not be, and vice versa.

    • Perhaps you should re-learn basic algorithm and data structure theory, because bubblesort can beat the pants off quicksort when the data set is small enough. Anyway, if you're going to implement quicksort, don't you want a 20-30% faster quicksort?
  • Other ideas (Score:4, Interesting)

    by adjuster ( 61096 ) on Tuesday November 18, 2003 @07:16PM (#7506519) Homepage Journal

    This gets me thinking about Nat Friedman's [nat.org] GNU-rope (Grope [nat.org]) project. I heard him talk [lwn.net] at ALS back in '98 and then the project seemed to completely disappear. Searching on "gnu.rope" [google.com] leads to a few mailing list postings asking "where'd it go", but no good information about the project.

    The basic idea was to reorder the functions in an executable so that locality of reference was maximized and cache hits were increased. The result is less paging and better performance and memory usage.

    The really interesting bit is that the optimization is based on the usage of the program being optimized-- that is to say that my Grope-optimized version of Mozilla might be different than yours based on my differences in usage (i.e. perhaps I browse with images turned off, etc).

    The tie-in to the article here is that Nat's system, Grope, used simulated annealing to traverse the n-dimensional space of potential function arrangements and profiled the memory paging of the application as a fitness function of the new arrangement. It's not a GA-- but it's functionally similar.

    So-- anybody know what happened to Grope? I'd imagine that a community would spring up around it fairly quickly, given the relatively high number of "performance zealots" who are busy "tricking out" their Linux boxes by compiling everything from scratch (think Gentoo) optimized for their processor. Now they can add the "spoiler" or "racing stripe" of having executables specifically re-ordered for their personal usage patterns! *smirk*

    • Re:Other ideas (Score:4, Informative)

      by bloosqr ( 33593 ) on Tuesday November 18, 2003 @07:45PM (#7506734) Homepage
      I think this is actually what most two-pass compilers will do. The Intel compilers [intel.com] have this option, for instance. Basically you compile w/ profiling on, and on the second pass it uses the runtime profiling data to recompile the source into new object code. (BTW, the Intel compilers are "free" for opensource/nonprofit use.) W/ regards to using GA/SA as part of the optimization, it would be (to be honest) a bit weird to have a nondeterministic compiler, since in different runs you may end up w/ different optimizations.

      -bloo

  • by bloosqr ( 33593 ) on Tuesday November 18, 2003 @07:38PM (#7506686) Homepage
    There is a clever variation of this, called ATLAS [sourceforge.net], that optimizes BLAS [netlib.org] and LAPACK [netlib.org] routines "empirically". It has been around a while, originally (and perhaps still) a research project by Clint Whaley from Tennessee (BLAS/LAPACK are numerical routines that do a slew of things people generally find useful: linear algebra on matrices and vectors). These routines will often sit "under" many mathematical packages like matlab/octave/maple/mathematica/scilab, as well as make up the core of many custom scientific computing packages (or even libraries like the GNU Scientific Library).


    Basically the gist is that ATLAS "empirically" (read: using an optimizer, for instance a GA, though empirical may actually mean brute force in this case) tunes various parameters, such as fitting the routines to the cache size of the processor, etc. The cool thing about this is that they can get within 10% of hand-coded machine-specific BLAS/LAPACK libraries without the pain!
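
    A grossly simplified sketch of that kind of empirical tuning: a blocked matrix multiply with one tunable parameter (the block size) and a search that just times a few candidate sizes on the current machine and keeps the fastest. ATLAS tunes many more parameters and generates specialised kernels, so treat this as an illustration only:

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <vector>

    // Cache-blocked matrix multiply; bs is the tunable block size.
    void blocked_matmul(const std::vector<double>& A, const std::vector<double>& B,
                        std::vector<double>& C, int n, int bs)
    {
        for (int ii = 0; ii < n; ii += bs)
            for (int kk = 0; kk < n; kk += bs)
                for (int jj = 0; jj < n; jj += bs)
                    for (int i = ii; i < std::min(ii + bs, n); ++i)
                        for (int k = kk; k < std::min(kk + bs, n); ++k)
                            for (int j = jj; j < std::min(jj + bs, n); ++j)
                                C[i * n + j] += A[i * n + k] * B[k * n + j];
    }

    int main()
    {
        const int n = 256;
        std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n);
        int best_bs = 0;
        double best_ms = 1e30;
        for (int bs : {16, 32, 64, 128}) {                 // candidate block sizes to time
            std::fill(C.begin(), C.end(), 0.0);
            auto t0 = std::chrono::steady_clock::now();
            blocked_matmul(A, B, C, n, bs);
            double ms = std::chrono::duration<double, std::milli>(
                            std::chrono::steady_clock::now() - t0).count();
            std::cout << "block size " << bs << ": " << ms << " ms\n";
            if (ms < best_ms) { best_ms = ms; best_bs = bs; }
        }
        std::cout << "best block size on this machine: " << best_bs << "\n";
    }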

  • As we all know, if your code is too slow then your compiler isn't smart enough.
  • by exp(pi*sqrt(163)) ( 613870 ) on Tuesday November 18, 2003 @09:37PM (#7507513) Journal
    Years ago I wrote some code that could tell if a pair of instructions 'commuted' in the sense that they could be reordered without affecting the functionality. Eg. "mov eax,1" and "mov ebx,2" are likely to be swappable but "mov eax,2" and "add ebx,eax" almost certainly can't be. I then used simulated annealing to walk through the space of code segments equivalent under commutation in an attempt to find the reordering optimal for the CPU pipeline.

    In other words, I randomly swapped pairs of instructions that could be swapped; if that produced an improvement I kept it, and if it produced a degradation I rejected it with a probability depending on how bad it made things.

    After hours of running it would sometimes trim a clock cycle or two per loop off a short assembler routine.
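
    That accept/reject rule is the standard simulated-annealing acceptance criterion; a minimal sketch of it (illustrative, not the original code):

    #include <cmath>
    #include <random>

    // Always keep a swap that costs fewer cycles; keep a worse one with
    // probability exp(-delta / T), so bad moves are tolerated while the
    // "temperature" T is high and almost never once it has cooled.
    bool accept_swap(double old_cycles, double new_cycles, double temperature,
                     std::mt19937& rng)
    {
        if (new_cycles <= old_cycles)
            return true;                                   // improvement: always keep
        double delta = new_cycles - old_cycles;
        std::uniform_real_distribution<double> u(0.0, 1.0);
        return u(rng) < std::exp(-delta / temperature);    // degradation: sometimes keep
    }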

  • There are domains where squeezing every last cycle out of the code matters. Many of us no longer work in those domains. Most of the code I've written in the last 4 years was developed and executed in an environment with essentially unlimited cycles, memory, and disk. The only exception was one project that got database bound, and compiler optimization won't help that.

    For programmers who are no longer resource limited, using a language that optimizes for the human rather than the computer is a good idea.
    • "Otherwise, optimize for the human. We're more important than the computer these days"

      Actually, we're not. Programmers are ten a penny, but a large computer system costs a fortune. If a company can get as much performance out of a machine as it can in order to hold off on an upgrade, they will. YOU may not work in an environment where performance matters, but I can assure you that a whole lot of us still do.

Say "twenty-three-skiddoo" to logout.

Working...