Inside the Intel Compiler

deadfx writes "Linux Journal is carrying an article authored by members of the Intel Compiler Lab examining specific optimizations employed by the compiler that allowed it to beat the gcc compiler on several benchmarks."

  • So... (Score:4, Informative)

    by zulux ( 112259 ) on Thursday February 13, 2003 @01:14PM (#5295290) Homepage Journal
    I hope the fine folks of GCC don't trade one inch of platform independence for a mile of "performance" on an Intel chip.

    The best way to get better performance out of GCC is to recompile your program - targeted to PowerPC, Alpha, MIPS or hell... AMD.

    Screwing with GCC to get better x86 performance is like putting a turbo on a Honda Civic - just buy the damn Corvette.

    • Re:So... (Score:2, Informative)

      If you take a look at the change logs (for future releases) you'll see that the optimization process is being made more granular in 3.4 (I think).
    • Funny, then, that the single highest-performing microprocessor in the world is one of those blown Civics, isn't it?

      'jfb
      • Re:So... (Score:3, Informative)

        by turgid ( 580780 )
        It's only high performance with 4 gigabytes of RAM or less. Beyond that you're back into segmentation and other hackery, just like in the "good old days" of MS-DOS. You young 'uns don't know how good you have it. In my day, your code had to fit inside 64k segments, and so did your data. No arrays bigger than 65533 bytes, and you had to do "far" jumps to code in other segments. Yes, addresses were 20 bits, but you had to make them up with two 16-bit chunks. So you needed 32 bits of storage to address 1 megabyte of RAM. Don't get me started on TSRs...
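
        For the record, the segment arithmetic is simple enough to show; a minimal sketch (the helper name is my own, not anything from a real toolchain):

        // Real-mode 8086: physical = segment * 16 + offset, so two 16-bit
        // chunks (32 bits of storage) cover the 20-bit, 1 MB address space.
        unsigned long physical(unsigned short segment, unsigned short offset)
        {
            return ((unsigned long)segment << 4) + offset;
        }
        // e.g. physical(0xB800, 0x0000) == 0xB8000, the colour text-mode buffer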
      • Wrong, the highest-performing microprocessor in the world is the Alpha EV7. Also consider the fact that it can address a full 64 bits of memory, and scale almost linearly to 64 or 128 processors, two things that the x86 architecture cannot do.
        • by oh ( 68589 )
          Are you sure? I thought the Alpha could only use 48-bit addresses. Is this something they changed in the EV67/EV7? It's still 256TB, but it isn't quite the same.

          Yes, I know the CPU registers are 64-bit, but the way it handled virtual memory limited it to 48 usable bits.
          • Even so, the addresses are still stored as 64-bit longs, so with a processor upgrade and possibly minor tweaks to the OS, the address space could be extended to the full 64-bit range... I believe it was done this way for cost reasons, but I could be wrong... That said, there aren't many uses that would require more than 256TB of RAM.
        • Let's go to the numbers. From spec.org:

          P4, 3.06ghz: 1130/1103 (int/fp)
          EV7, 1.15ghz: 795/1124 (int/fp)

          So the P4 is *1.4 times* as fast on integer maths, and two percent slower at floating point. Ouch. Care to rethink that statement about the EV7? Sure, it's 64bit, and scalable, and a much nicer architecture, to boot. But I wasn't arguing address space size, suitability for large-scale NUMA or SMP clustering, or ease of writing compiler backends, was I?

          Peace,
          'jfb

          PS: Even more horrifying for fans of elegance in their microarchitectures:

          Itanic^2, 1.0ghz: 810/1431 (int/fp)
          • 1.4 times the integer rate at almost 3 times the clock rate - imagine the speed of a 3.06ghz EV7, which HP could easily build if they wanted. They could also improve their compiler somewhat, since it doesn't even support the EV7... but they're more concerned with promoting the Itanic.
            • HPaq is definitely up to their necks in Itanic (which, btw, is already faster than EV7), but even if they weren't, I find it hard to believe that they'd have the process to shrink EV7 further and clock it higher.

              The truly scary thing is that there's plenty of headroom in the P4 core. Intel is talking about 7ghz on their next shrink. One hopes that AMD is able to keep up, if only to keep the price of gross nasty fast microprocessors within the grasp of the common man.

              'jfb
    • Because why should the users of the largest number of chips on Earth get better performance for the programs they write?
  • by josepha48 ( 13953 ) on Thursday February 13, 2003 @02:01PM (#5295746) Journal
    To me it would seem that the partial redundancy elimination would only help if the code looked like the examples they give. If you coded such that there was no partial redundancy in the code, then it probably would not be any better than gcc.

    i.e.
    x[i] += a[i+j*n] + b[i+j*n];
    could be coded as
    l=i+j*n;
    x[i] += a[l] + b[l];

    Both would yield the same result; one would use more variables, but both would only do the calc once. Would this be any different?

    Personally I can see not doing the calcs twice, but I'd think that a good programmer would code so that he's not doing them twice.
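
    Spelled out as full loops (a rough sketch -- the function names, array layout and loop bounds are guesses, not the article's actual code), the two versions being compared are:

    // Version 1: the subexpression i + j*n appears twice per iteration.
    void sum_v1(double *x, const double *a, const double *b, int n, int m)
    {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++)
                x[i] += a[i + j * n] + b[i + j * n];
    }

    // Version 2: the programmer hoists it into a temporary by hand.
    void sum_v2(double *x, const double *a, const double *b, int n, int m)
    {
        for (int j = 0; j < m; j++)
            for (int i = 0; i < n; i++) {
                int l = i + j * n;
                x[i] += a[l] + b[l];
            }
    }

    With ordinary common-subexpression elimination both versions should compile to the same inner loop; partial redundancy elimination earns its keep when the expression is redundant only along some control-flow paths.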

    • Actually this is one of the oldest tricks in the book, and I'm sure gcc does this.

    • In fact,

      a[x] = b[x] + c[x];

      probably compiles into something like (in the C equivalent):

      offset_b = x * 4 + b;
      val_b = *offset_b;
      offset_c = x * 4 + c;
      val_c = *offset_c;
      offset_a = x * 4 + a;
      val_a = val_c + val_b;
      *offset_a = val_a;

      (Set aside the fact that C automatically scales pointer arithmetic for you. Also ignore for the moment the fact that x86 lets you scale the index by 4 right in the load instruction -- pretend we are accessing structures of some large or non-power-of-two size.)

      Here the computation of x * 4 is redundant, even though we never wrote x * 4 in the original program.

      The point is, dumb code doesn't just arise because of dumb programmers, but because of the compilation process. (Also imagine you are calling a macro that computes offsets for you, etc.) Anyway, every compiler implements this level of common sub-expression elimination, even gcc, so don't worry!
  • by dpbsmith ( 263124 ) on Thursday February 13, 2003 @04:33PM (#5296869) Homepage
    Something has been bothering me ever since I was bitten by my first compiler bug several decades ago (and couldn't get the bug acknowledged by the vendor, because I couldn't get it to occur in a short code fragment).

    WHY does every vendor have optimization on by default all the time? Optimization routines in compilers are the most likely places to have bugs, and they are often extraordinarily subtle bugs that are hard to reproduce. I have personally encountered bugs in which the insertion or deletion of a COMMENT affected the generated code, and certainly many of us have encountered bugs that could not be demonstrated in a short fragment because they only occurred when things got deeply nested and the compiler ran out of temporary registers.

    Optimization also interferes with debuggers. I know that YOU are capable of doing up-front planning and writing bug-free code, but _I_ am not, and forcing me to recompile in order to use the debugger is one more hurdle in the way of getting the bugs out of the code.

    Why isn't optimization off by default, and turned on only in specific modules, or certain SECTIONS of the code (with pragmas)--those specific sections that can be demonstrated to be time-critical?
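
    For what it's worth, much later GCC releases (4.4 and up, long after this discussion) did grow a per-function knob roughly like this; a minimal sketch, assuming GNU-specific attributes (the function names are illustrative):

    // Build the whole file with -O0 -g so the debugger sees everything...
    int setup(void)
    {
        return 42;
    }

    // ...but ask GCC to optimize just the routine you've measured to be hot.
    __attribute__((optimize("O3")))
    double hot_loop(const double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i] * a[i];
        return s;
    }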
    • Some reasons we use optimisation by default:

      • The code goes faster that way
      • Flow analysis gives better warnings that way
      • Most compilers get more testing with optimisations enabled, so while there are bugs in optimisers there are also latent bugs exposed when the optimiser is turned off
      • With modern C++ code, turning the optimiser off completely can kill performance. At least some basic inlining is needed to make code work at tolerable speeds.

    • Let's see, GCC has optimization off by default. VC++ projects by default give you two project settings, one with no optimization. If you're using something advertised and marketed as an optimizing compiler (like ICC), then it's up to you to write your makefiles appropriately.
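
      For instance, a bare-bones pair of make targets along those lines (the file and target names are invented, and recipe lines must start with a tab):

      # Debug build: no optimization, full symbols, so the debugger is usable.
      CFLAGS_DEBUG   = -O0 -g
      # Release build: let the optimizer loose.
      CFLAGS_RELEASE = -O2
      CC = gcc        # or icc, if you paid for the optimizing compiler

      debug: main.c
              $(CC) $(CFLAGS_DEBUG) -o prog main.c

      release: main.c
              $(CC) $(CFLAGS_RELEASE) -o prog main.c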
  • must be dark in there
  • I'd like to see aggressively optimized subscript and iterator checking in gcc. We've been discussing this on the C++ standards newsgroup, and the consensus is that early detection of assertion failures is within the standard, so it's OK to do this.

    Here's what I'm talking about. Consider

    • vector<int> tab(10);
      ...
      int k=11;
      ...
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { total += tab[i]; }
    Now the usual way to check something like this would look like this:
    • vector<int> tab(10);
      ...
      int k=11;
      ...
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { assert(i< tab.size()); total += tab[i]; }
    Note that the assert will fail on the iteration for which i=10. (The vector's valid indices are 0..9, remember.) The first ten iterations get executed, then there's an assertion failure; if the assert wasn't there, "undefined behavior" would occur at tab[10]. This is a valid check, but it adds overhead on every iteration.

    If the compiler hoists the assert out of the loop, we get

    • vector<int> tab(10);
      ...
      int k=11;
      ...
      assert(k <= tab.size());
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { total += tab[i]; }
    and we've pulled the subscript check entirely out of the loop, eliminating almost all the cost of subscript checking. This is possible for a high percentage of subscripts in inner loops, where it matters. It's a huge win in performance, and it makes subscript checking feasible in production programs.

    There was a philosophical argument that such an optimization is illegal, because it triggers an assertion failure well before the program has actually crashed. But the consensus was that if the program is inevitably headed for undefined or illegal behavior, the compiler can take any appropriate action, including detecting the problem early. So this is an OK optimization.

    I'd like to see that in gcc.
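
    For reference, here is the hand-hoisted version assembled into a complete program (a sketch; nothing in it is gcc-specific, and the <= is the condition the hoisted check actually needs, so a loop that just fits doesn't trip it):

    #include <cassert>
    #include <vector>

    int main()
    {
        std::vector<int> tab(10);
        int k = 11;

        // One up-front check covers every index the loop will touch (0..k-1),
        // so the per-iteration assert can be dropped entirely.
        assert(k <= (int)tab.size());   // fires here, before any bad access

        int total = 0;
        for (int i = 0; i < k; i++)
            total += tab[i];
        return total;
    }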
