
Inside the Intel Compiler

deadfx writes "Linux Journal is carrying an article authored by members of the Intel Compiler Lab examining specific optimizations employed by the compiler that allowed it to beat the gcc compiler on several benchmarks."
  • So... (Score:4, Informative)

    by zulux ( 112259 ) on Thursday February 13, 2003 @12:14PM (#5295290) Homepage Journal
    I hope the fine folks of GCC don't trade one inch of platform independence for a mile of "performance" on an Intel chip.

    The best way to get better performance out of GCC is to recompile your program - targeted to PowerPC, Alpha, MIPS or hell... AMD.

    Screwing with GCC to get better x86 performance is like putting a turbo on a Honda Civic - just buy the damn Corvette.

    • Re:So... (Score:2, Informative)

      If you take a look at the change logs (for future releases) you'll see that the optimization process is being made more granular in 3.4 (I think).
    • Funny, then, that the single highest performing microprocessor in the world is one of those blown Civics, isn't it?

      'jfb
      • Re:So... (Score:3, Informative)

        by turgid ( 580780 )
        It's only high performance on 4 gigabytes of RAM or less. Then you're back into segmentation and other hackery, just like in the "good old days" of MS-DOS. You young 'uns don't know how good you have it. In my day, your code had to fit inside 64k segments, and so did your data. No arrays bigger than 65533 bytes, and you had to do "far" jumps to code in other segments. Yes, addresses were 20 bits, but you had to make them up with two 16-bit chunks. So, you needed 32 bits of storage to address 1 megabyte of RAM. Don't get me started on TSRs...
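
        For anyone who missed that era: a minimal sketch (hypothetical names, modern C++ used purely for illustration) of how a real-mode 8086 segment:offset pair turns into a 20-bit physical address -- the 16-bit segment is shifted left four bits and the 16-bit offset is added, which is exactly the "two 16-bit chunks for one 20-bit address" arrangement described above.

        #include <cstdint>
        #include <cstdio>

        // Real-mode address translation: physical = segment * 16 + offset.
        // Two 16-bit values combine into (at most) a 20-bit address, hence the 1 MB ceiling.
        unsigned physical_address(std::uint16_t segment, std::uint16_t offset) {
            return (static_cast<unsigned>(segment) << 4) + offset;
        }

        int main() {
            // 0xF000:0xFFF0 is the classic reset vector near the top of the 1 MB space.
            std::printf("0xF000:0xFFF0 -> 0x%05X\n", physical_address(0xF000, 0xFFF0));
            return 0;
        }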
      • Wrong, the highest performing microprocessor in the world is the Alpha EV7. Also consider the fact that it can address a full 64 bits of memory, and scale almost linearly to 64 or 128 processors, two things that the x86 architecture cannot do.
        • by oh ( 68589 )
          Are you sure? I thought the Alpha could only use 48-bit addresses. Is this something they changed in the EV67/EV7? It's still 256 TB, but it isn't quite the same.

          Yes, I know the CPU registers are 64-bit, but the way it handled virtual memory limited it to 48 usable bits.
          • Even so, the addresses are still stored as 64-bit longs, so with a processor upgrade and possibly minor tweaks to the OS, the address space could be extended to the full 64-bit range... I believe it was done this way for cost reasons, but I could be wrong... But saying that, there aren't many uses that would require more than 256 TB of RAM.
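
            A quick sanity check on the 256 TB figure (a trivial sketch, nothing more): 48 usable address bits cover 2^48 bytes.

            #include <cstdint>

            // 2^48 bytes = 256 * 2^40 bytes = 256 TiB, matching the figure quoted above.
            static_assert((UINT64_C(1) << 48) == UINT64_C(256) * (UINT64_C(1) << 40),
                          "48 address bits give exactly 256 TiB");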
        • Let's go to the numbers. From spec.org:

          P4, 3.06 GHz: 1130/1103 (int/fp)
          EV7, 1.15 GHz: 795/1124 (int/fp)

          So the P4 is *1.4 times* as fast on integer maths, and two percent slower at floating point. Ouch. Care to rethink that statement about the EV7? Sure, it's 64-bit, and scalable, and a much nicer architecture, to boot. But I wasn't arguing address space size, suitability for large-scale NUMA or SMP clustering, or ease of writing compiler backends, was I?

          Peace,
          'jfb

          PS: Even more horrifying for fans of elegance in their microarchitectures:

          Itanic^2, 1.0 GHz: 810/1431 (int/fp)
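
          For what it's worth, a quick check of the arithmetic behind the ratios quoted above, taking the SPEC figures exactly as given in this comment:

          #include <cstdio>

          int main() {
              // SPEC figures as quoted above (int / fp).
              const double p4_int = 1130, p4_fp = 1103;
              const double ev7_int = 795, ev7_fp = 1124;
              std::printf("integer ratio P4/EV7: %.2f\n", p4_int / ev7_int); // ~1.42
              std::printf("fp ratio P4/EV7:      %.3f\n", p4_fp / ev7_fp);   // ~0.981, i.e. about 2% slower
              return 0;
          }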
            • 1.4 times the integer rate at almost 3 times the clock rate... imagine the speed of a 3.06 GHz EV7, which HP could easily do if they wanted. They could also improve their compiler somewhat, since it doesn't even support the EV7... but they're more concerned with promoting the Itanic.
            • HPaq is definitely up to their necks in Itanic (which, btw, is already faster than EV7), but even if they weren't, I find it hard to believe that they'd have the process to shrink EV7 further and clock it higher.

              The truly scary thing is that there's plenty of headroom in the P4 core. Intel is talking about 7 GHz on their next shrink. One hopes that AMD is able to keep up, if only to keep the price of gross nasty fast microprocessors within the grasp of the common man.

              'jfb
    • Because why should the users of the largest number of chips on Earth get better performance for the programs they write?
  • by josepha48 ( 13953 ) on Thursday February 13, 2003 @01:01PM (#5295746) Journal
    To me it would seem that partial redundancy elimination would only help if the code looked like the examples they give. If you coded such that there was no partial redundancy in the code then it probably would not be any better than gcc.

    i.e.
    x[i] += a[i+j*n] + b[i+j*n];
    could be coded as
    l=i+j*n;
    x[i] += a[l] + b[l];

    Both would yield the same result; one would use more variables, but both would only do the calc once. Would this be any different?

    Personally I can see not doing the calcs twice, but I'd think that a good programmer would code so he is not doing the calcs twice.
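
    For what it's worth, here is a self-contained version of the two forms above (array sizes and values made up for the sake of the example), just to show that hand-hoisting the index computation doesn't change the result:

    #include <cassert>

    int main() {
        const int n = 4;
        double a[16], b[16];
        for (int k = 0; k < 16; ++k) { a[k] = k; b[k] = 2 * k; }
        double x[4] = {0}, y[4] = {0};
        int i = 2, j = 3;

        // Form 1: the index expression appears twice; a compiler doing common
        // subexpression elimination is expected to compute i + j*n only once.
        x[i] += a[i + j*n] + b[i + j*n];

        // Form 2: the programmer hoists the index into a temporary by hand.
        int l = i + j*n;
        y[i] += a[l] + b[l];

        assert(x[i] == y[i]);  // both forms yield the same value
        return 0;
    }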

    • Actually this is one of the oldest tricks in the book, and I'm sure gcc does this.

    • In fact,

      a[x] = b[x] + c[x];

      probably compiles into something like (in the C equivalent):

      offset_b = x * 4 + b;
      val_b = *offset_b;
      offset_c = x * 4 + c;
      val_c = *offset_c;
      offset_a = x * 4 + a;
      val_a = val_c + val_b;
      *offset_a = val_a;

      (Set aside the fact that C automatically scales pointer arithmetic for you. Also ignore for the moment the fact that x86 allows you to do a scalar multiply by 4 in the load instruction -- pretend we are accessing structures of some large or non-power-of-two size.)

      Here the computation of x * 4 is redundant, even though we never wrote x * 4 in the original program.

      The point is, dumb code doesn't just arise because of dumb programmers, but because of the compilation process. (Also imagine you are calling a macro that computes offsets for you, etc.) Anyway, every compiler implements this level of common sub-expression elimination, even gcc, so don't worry!
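
      To make that concrete, here is roughly what the redundant address arithmetic looks like before and after common subexpression elimination, written out by hand (a sketch only, not actual compiler output; the 12-byte struct is chosen so the scaling can't be folded into an x86 addressing mode, and the helper names are invented for illustration):

      #include <cstddef>

      struct Rec { int v; int pad[2]; };            // sizeof(Rec) == 12: not a power of two

      // Helpers standing in for plain load/store instructions at a byte offset.
      static int load(const char* base, std::ptrdiff_t byte_off) {
          return *reinterpret_cast<const int*>(base + byte_off);
      }
      static void store(char* base, std::ptrdiff_t byte_off, int val) {
          *reinterpret_cast<int*>(base + byte_off) = val;
      }

      // Before CSE: x * sizeof(Rec) is computed three times, even though the
      // source program (a[x] = b[x] + c[x]) never spelled it out once.
      void add_naive(char* a, const char* b, const char* c, int x) {
          int vb = load(b, x * (std::ptrdiff_t)sizeof(Rec));
          int vc = load(c, x * (std::ptrdiff_t)sizeof(Rec));
          store(a, x * (std::ptrdiff_t)sizeof(Rec), vb + vc);
      }

      // After CSE: the common subexpression is computed once and reused.
      void add_cse(char* a, const char* b, const char* c, int x) {
          const std::ptrdiff_t off = x * (std::ptrdiff_t)sizeof(Rec);
          int vb = load(b, off);
          int vc = load(c, off);
          store(a, off, vb + vc);
      }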
  • by dpbsmith ( 263124 ) on Thursday February 13, 2003 @03:33PM (#5296869) Homepage
    Something that's been bothering me ever since I was bitten by my first compiler bug several decades ago. (And couldn't get the bug acknowledged by the vendor because I couldn't get it to occur in a short code fragment).

    WHY does every vendor have optimization on by default all the time? Optimization routines in compilers are the most likely places to have bugs, and they are often extraordinarily subtle bugs that are hard to reproduce. I have personally encountered bugs in which the insertion or deletion of a COMMENT affected the code, and certainly many of us have encountered bugs that could not be demonstrated in a short fragment because they only occurred when things got deeply nested and the compiler ran out of temporary registers.

    Optimization also interferes with debuggers. I know that YOU are capable of doing up-front planning and writing bug-free code, but _I_ am not, and forcing me to recompile in order to use the debugger is one more hurdle in the way of getting the bugs out of the code.

    Why isn't optimization off by default, and turned on only in specific modules, or certain SECTIONS of the code (with pragmas)--those specific sections that can be demonstrated to be time-critical?
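
    For the record, later GCC releases did grow per-function optimization controls roughly along these lines (an attribute and a pair of pragmas; the sketch below uses invented function names and obviously postdates the gcc being discussed here):

    // Only the demonstrably hot code gets the aggressive treatment:
    #pragma GCC push_options
    #pragma GCC optimize ("O3")
    void scale_squares(float* dst, const float* src, unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            dst[i] = src[i] * src[i];
    }
    #pragma GCC pop_options

    // Everything else stays easy to step through in a debugger:
    __attribute__((optimize("O0")))
    unsigned count_lines(const char* text) {
        unsigned lines = 0;
        for (const char* p = text; *p; ++p)
            if (*p == '\n') ++lines;
        return lines;
    }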
    • Some reasons we use optimisation by default:

      • The code goes faster that way
      • Flow analysis gives better warnings that way
      • Most compilers get more testing with optimisations enabled, so while there are bugs in optimisers there are also latent bugs exposed when the optimiser is turned off
      • With modern C++ code, turning the optimiser off completely can kill performance. At least some basic inlining is needed to make code work at tolerable speeds.
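
      A small illustration of that last point: with the optimizer off, every accessor and iterator below is a real function call, so even this trivial code crawls at -O0, while at -O1/-O2 it typically inlines down to a simple loop (a sketch; exact numbers obviously vary).

      #include <numeric>
      #include <vector>

      // At -O0, begin(), end(), the iterator's operator* / operator++ and
      // std::accumulate's internals are all genuine calls; with the optimizer
      // on they normally inline away completely.
      long long sum(const std::vector<int>& v) {
          return std::accumulate(v.begin(), v.end(), 0LL);
      }

      int main() {
          std::vector<int> v(1000000, 1);
          return sum(v) == 1000000 ? 0 : 1;
      }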

    • Let's see, GCC has optimization off by default. VC++ projects by default give you two project settings, one with no optimization. If you're using something advertised and marketed as an optimizing compiler (like ICC), then it's up to you to write your makefiles appropriately.
  • must be dark in there
  • I'd like to see aggressively optimized subscript and iterator checking in gcc. We've been discussing this in the C++ standards newsgroup, and the consensus is that early detection of assertion failures is within the standard, so it's OK to do this.

    Here's what I'm talking about. Consider

    • vector<int> tab(10);
      ...
      int k=11;
      ...
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { total += tab[i]; }
    Now the usual way to check something like this would look like this:
    • vector<int> tab(10);
      ...
      int k=11;
      ...
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { assert(i< tab.size()); total += tab[i]; }
    Note that the assert will fail on the iteration for which i=10. (The vector is from 0..9, remember.) The first ten iterations get executed, then there's an assertion failure; if the assert wasn't there, "undefined behavior" would occur at tab[10]. This is a valid check, but it adds overhead on every iteration.

    If the compiler hoists the assert out of the loop, we get

    • vector<int> tab(10);
      ...
      int k=11;
      ...
      assert(k<=tab.size());
      int total=0;
      for (unsigned int i=0; i<k; i++)
      { total += tab[i]; }
    and we've pulled the subscript check entirely out of the loop, eliminating almost all the cost of subscript checking. This is possible for a high percentage of subscripts in inner loops, where it matters. It's a huge win in performance, and it makes subscript checking feasible in production programs.

    There was a philosophical argument that such an optimization is illegal, because it triggers an assertion failure well before the program has actually crashed. But the consensus was that if the program is inevitably headed for undefined or illegal behavior, the compiler can take any appropriate action, including detecting the problem early. So this is an OK optimization.

    I'd like to see that in "gcc".
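
    For anyone who wants to play with the idea, here is a self-contained sketch of the transformation, with the hoisting done by hand and assert standing in for whatever check the compiler would emit (function names are invented; the sizes are the example values from above):

    #include <cassert>
    #include <vector>

    // Per-iteration check: the assert runs on every pass through the loop.
    int sum_checked(const std::vector<int>& tab, unsigned int k) {
        int total = 0;
        for (unsigned int i = 0; i < k; i++) {
            assert(i < tab.size());
            total += tab[i];
        }
        return total;
    }

    // Hoisted check: one assert before the loop. It catches the same
    // out-of-bounds accesses, just before any iteration runs rather than on
    // the offending iteration, since the loop reads exactly tab[0]..tab[k-1].
    int sum_hoisted(const std::vector<int>& tab, unsigned int k) {
        assert(k <= tab.size());
        int total = 0;
        for (unsigned int i = 0; i < k; i++)
            total += tab[i];
        return total;
    }

    int main() {
        std::vector<int> tab(10);
        // With k == 11 both versions would abort on the assert instead of
        // reading tab[10]; with the valid k == 10 used here they both run.
        return sum_hoisted(tab, 10) == sum_checked(tab, 10) ? 0 : 1;
    }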
