Transmeta

Intel's New Compiler Boosts Transmeta's Crusoe 272

Bram Stolk writes: "Intel recently released its new C++ compiler for linux. I've been testing it on my TM5600 Crusoe. Ironically, it turns out that Transmeta's arch nemesis, Intel, provides the tools to really unlock Crusoe's full potential on linux." It doesn't support all of gcc's extensions, so Intel's compiler can't compile the Linux kernel yet, but choice is nice.
This discussion has been archived. No new comments can be posted.

  • Intel's C++ compiler still compiles code to x86. This is really great, considering that the approx. 28% speedup on Crusoe is achieved without even touching Crusoe's native instruction set. I wonder how Crusoe will fare once there is a compiler that builds straight to its native ISA.

    For me, Crusoe + icc + GNU/linux is a winning combination.

    Well, to me, that's a hasty conclusion. The P4 gains 26%, the Athlon XP gains 19%, and the plain Athlon gains 16%.

    • Re:Take Note that... (Score:2, Interesting)

      by smileyy ( 11535 )

      You're one of those people who just don't get the fact that the Crusoe gets speed gains by *not* using its native instruction set.

      • That's what I said. My point was that the 28% gain is basically on par with the P4's. The Athlon gains weren't too shabby either. Meanwhile, we understand that current Crusoe performance is pretty dismal compared to the P4 or Athlon. So a 2% difference in performance gain doesn't mean that Crusoe performance has been leveraged to a new level.

        If the code were compiled to Crusoe's native instruction set, we could then see Crusoe's raw power and compare them neck and neck. The story would have been much different.

        Note also that I am not a revisionist. I believe the Slashdot community is intelligent enough to figure out what I said.

        • Crusoe does cool things because it optimizes the code that it is morphing at runtime. If you were to run Crusoe code natively, you'd no longer get the optimization benefits, and all you'd be left with is an even slower low-power chip.

          Theoretically, you could write a Crusoe-to-Crusoe code morphing module, but that wouldn't buy you anything more than the X86-to-Crusoe morpher.

          • Well, code morphing by itself isn't worth much performance. For example, let's compare an Intel Celeron and a Crusoe at the same clock speed. I doubt that the Crusoe could beat the Celeron, even with super-optimized morphing that has run for months.

            The problem here is that no matter how good the morphing is, it is still "emulation". Whether you do pure morphing or mix it with dynamic recompilation, you cannot beat the real thing running natively. There are lots of examples.

            The real power of the Crusoe processor is that it is a VLIW processor, which can jam-pack several instructions into one word. *This* is the real power. Notice that the P4 adopts a version of this too (3 instructions to 1). An intelligent compiler can arrange the machine code so that the instruction bundles are filled very efficiently. Now, say Crusoe's instructions are 32 bits wide and its words are 128 bits wide. Theoretically, you can jam-pack 4 instructions at once, yielding 4 times the work per cycle. *This* is what I really want to see.
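
            As a toy illustration of that packing arithmetic, here is a compilable sketch (the struct and opcodes are invented for illustration; this is not Transmeta's actual encoding):

              #include <stdint.h>
              #include <stdio.h>

              /* four independent 32-bit ops packed into one 128-bit word */
              typedef struct {
                      uint32_t slot[4];
              } molecule;

              int main(void)
              {
                      molecule m = { { 0x01, 0x02, 0x03, 0x04 } }; /* made-up opcodes */
                      printf("one %u-bit fetch carries %u ops\n",
                             (unsigned)(sizeof m * 8),
                             (unsigned)(sizeof m.slot / sizeof m.slot[0]));
                      return 0;
              }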

            About the power problem: I'm really pessimistic about a processor that can prolong battery life n times (with n > 2), as claimed by Transmeta. IIRC, *the* major power drains are the LCD and the hard drive. If those hogs were attacked, I wouldn't be surprised if battery life were prolonged. But let's recall that before Crusoe came along, a P3 processor consumed only about 2W. What portion was that of the total laptop consumption? If the whole laptop draws, say, 15W, then Crusoe's cutting the CPU from 2W to a mere 1W stretches battery life by only about 7% -- and *that* was claimed as *the* dramatic power saver. I smell a rat.

            I don't want to attack your "belief" as a Crusoe adherent, but please understand the underlying problem before you answer.

            • Well, code morphing by itself isn't worth much performance. For example, let's compare an Intel Celeron and a Crusoe at the same clock speed. I doubt that the Crusoe could beat the Celeron, even with super-optimized morphing that has run for months.

              That doesn't mean code morphing is a bad idea. It means at least one of the following is true:

              • The Celeron had better designers
              • Celeron designers had more resources to work with (time, money, people)
              • Celeron and Crusoe had different design goals
              • Code morphing is a bad idea
              • Code morphing is only a good idea when combined with something we haven't tried yet

              If you look at the history of RISC CPUs, the IBM ROMP was not very fast when it first came out. As I recall it was the first commercial RISC design. That didn't mean RISC was a bad idea (RISC in fact stomped CISC's butt soundly during the first half of the 90s, and I'm not sure the most recent x86 has beaten the cold dead Alpha's corpse... and the Power4 seems a whole lot faster...). The ROMP was slow because it wasn't the best design. I think the big problem was its MMU. They did manage to fix it up, but I don't think it got a whole lot faster than the 68020 for a long time!

              I'm not saying code morphing is great either, just that a commercial failure is hardly a scientific experiment.

              The real power of the Crusoe processor is that it is a VLIW processor, which can jam-pack several instructions into one word. *This* is the real power. Notice that the P4 adopts a version of this too (3 instructions to 1).

              And the PIII did 2 instructions, the AMD K7 did three, and RISCs have been doing 4+ for a long time. As far as VLIW goes, 4-way is pretty low. The most interesting things that seem to be in the real Crusoe are the split load and split store (it can start a load and later decide to complete it -- taking a fault if need be -- or abort it; similarly, stores can be queued and later canceled).

              I'm really pessimistic about a processor that can prolong battery life n times (with n > 2), as claimed by Transmeta. IIRC, *the* major power drains are the LCD and the hard drive.

              I am too, but on my notebook (an older G3 PowerBook) the battery monitor shows about 4 hours with the LCD on high and minimal CPU use; it shows 4.5 when I put the LCD on max dim (almost unreadable in a room with 60-watt bulbs, OK with no light in the room). Leaving the monitor on dim and running a tight loop, the battery time drops to around three hours. Leave it going until the fan (temperature sensitive) kicks in and the battery time drops a bit more.

              So I would say the CPU can use more power than the backlight. No, not quite: the swing from minimal CPU use to max is more taxing than the swing from minimal backlight to max. The swing from no backlight to minimum may be bigger than the one from no CPU load to minimal CPU load.

              That's not to say the Crusoe can magically turn a one-hour battery into six hours, but it might get one hour to two.

      • Re:Take Note that... (Score:2, Informative)

        by kingdon ( 220100 )
        Exactly. One benefit of x86 instructions (the only benefit? ;-)) is that they are pretty compact. And that wouldn't be such a big deal except that it means more of them fit in cache, you can fetch more instructions in one memory cycle, and that sort of thing. So using native Transmeta instructions across the bus could easily slow things down (kind of a thought experiment, since as far as I know they haven't done it even for testing purposes).
  • At first in the writeup [2y.net] it looked as though you were planning on compiling an image, and I thought to myself "Holy crap, self! Can compilers these days make graphics from source code?" Then I realized that you were just compiling the program that makes the image. Then I looked at the example [tu-darmstadt.de], and it looks as though you are (effectively) compiling a graphic. I'm so confused... :o)
    • He's compiling a raytracer: a 3-D rendering program designed to take scene instructions and create an image from them.

      The point here is that the raytracer code generated by icc is more tuned; it's optimized and can do its job faster.

      Justin
      • Yeah, I got that. Figured it out by the end of the write up. Thanks for the post though.

        I have just never been exposed to raytracing source before. Really quite interesting, the similarities between it and computer source code. I don't know what I was expecting, but what I saw certainly wasn't it.

        Again, thanks for the kind clarification. It was much more constructive than the AC that also replied.
        • Uh, are you trolling? The raytracer is a program that runs on a computer. Thus, the raytracing source *is* computer source code. They're not similar, but the same.
          • Did any of that really sound like I'm looking for an outraged reaction from the readers? I sure hope not. It was more a statement regarding the first impressions I had with ray tracing instructions. As I stated, I'm not really sure what I was expecting, but the instructions for "drawing" that fish was not it. To be quite honest, I'd take the whole thing back if I could.
          • I think he is talking about the scene source the raytracer reads in to render the image...
  • Uh... (Score:1, Interesting)

    by davster ( 169308 )
    Intel's compiler can't compile the Linux kernel yet

    Last time I checked, the kernel was in C, not C++.
    • Re:Uh... (Score:2, Informative)

      by AaronMB ( 136741 )
      Intel's compiler can compile both C and C++.
    • Re:Uh... (Score:3, Informative)

      hmmm... The reason Intel's compiler can't compile the kernel is that the kernel uses extensions that only gcc has, like __attribute__((packed)), vararg-style macros, and embedded statement blocks:
      #define useless_macro(a, b...) ({ printf(a, ##b); })
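
      For the curious, here's a minimal compilable sketch of those two GCC-only constructs (the macro is the one above; the rest is illustrative):

        #include <stdio.h>

        /* GNU named-vararg macro: ##b swallows the comma when no extra
           args are given; the ({ ... }) statement expression is also a
           GCC extension that strict ISO C compilers reject. */
        #define useless_macro(a, b...) ({ printf(a, ##b); })

        int main(void)
        {
                useless_macro("no extra args\n");
                useless_macro("%d extra arg\n", 1);
                return 0;
        }
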
  • by Walter Bell ( 535520 ) <wcbell AT bellandhorowitz DOT com> on Friday November 09, 2001 @08:12PM (#2546970) Homepage
    The Linux results were interesting, but rather flat: everything benefited from it on my system. However I rebooted into NetBSD and gave the compiler a shot at 'make sys' and 'make world'. Because the NetBSD kernel does not use any nonstandard gcc extensions, it compiled just fine with the new compiler. What I found was:
    • The kernel showed a marked performance benefit on the TM5600. On my TM5400 the results were not noticeable.
    • Most userspace utilities appeared to be quite a bit faster on both CPUs. However, some (one notable example being /usr/local/bin/perl) were much slower with the new compiler. I verified that this was not the case on Linux, so it is unclear to me as to why this happens under NetBSD. Further investigation is needed.

    ~wally

  • Ironically, it turns out that Transmeta's arch nemesis, Intel, provides the tools to really unlock Crusoe's full potential on linux.


    Any bets on which of the next versions will spew an error about "incompatible architecture" when used on non-Intel hardware?

  • GCC extensions?? (Score:4, Insightful)

    by Reality Master 101 ( 179095 ) <`moc.liamg' `ta' `101retsaMytilaeR'> on Friday November 09, 2001 @08:14PM (#2546979) Homepage Journal

    Wait, the Kernel uses GCC extensions? I thought the Kernel was written in real C, not that bastard GCC version. I've never looked at Kernel code, so I'm not sure. Is this really true?

    If it's true, I think that's a huge mistake. The Kernel should not be at the mercy of one compiler.

    • Re:GCC extensions?? (Score:5, Informative)

      by wmshub ( 25291 ) on Friday November 09, 2001 @08:28PM (#2547016) Homepage Journal
      Yes, the kernel uses enormous numbers of GCC extensions. It gets significant performance improvements this way. Perhaps you are willing to give up kernel performance for portability, but from my experience as an instructor in a Linux device drivers class, you are in the vast minority. A kernel really needs assembly inlines (for example, the sti and cli instructions get inserted pretty frequently in critical code paths), and to do them well you have to use extensions to C.
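
      For illustration, a minimal sketch of the kind of inline-assembly wrappers meant here (the function names are made up; the real kernel hides these behind its own macros, and cli/sti only work in ring 0):

        /* disable/enable x86 interrupts via GCC inline asm; the "memory"
           clobber stops the compiler from reordering accesses past it */
        static inline void irq_disable(void)
        {
                __asm__ __volatile__("cli" : : : "memory");
        }

        static inline void irq_enable(void)
        {
                __asm__ __volatile__("sti" : : : "memory");
        }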

      There are even some places where GCC extensions make the code easier to maintain. For example, the way that device driver entry points are defined is much cleaner (using the "member: value" structure initialization syntax) and less error prone than standard C.
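
      A compilable sketch of that initializer style (the struct and handlers are invented for illustration, not actual kernel code):

        #include <stdio.h>

        struct fops {
                int (*open)(void);
                int (*release)(void);
        };

        static int my_open(void)    { puts("open");    return 0; }
        static int my_release(void) { puts("release"); return 0; }

        /* GCC's "member: value" syntax: entry points can be listed in
           any order, and anything left out defaults to NULL */
        static struct fops my_fops = {
                open:    my_open,
                release: my_release,
        };

        int main(void) { return my_fops.open() + my_fops.release(); }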

      Yes, it might have been helpful a few times to have been able to compile Linux on a non-GCC compiler, but not very often. And GCC runs almost everywhere, so limiting yourself to GCC doesn't limit the architectures you can port to. It really does seem that in this case the benefits outweigh the losses.

      • Re:GCC extensions?? (Score:3, Interesting)

        by srvivn21 ( 410280 )
        It gets significant performance improvements this way.

        But are these performance increases greater than what would be realized if the Kernel could be compiled using icc?

        This doesn't address the maintenance issue, but it's something that I am looking forward to seeing in the near future. I figure someone has the time and drive to hack the Kernel source to the point that icc will compile it. Goodness knows, I don't.
      • Perhaps, but what if gcc isn't the best compiler in the world? What if, besides caruso, Intel's compiler is actually a BETTER compiler than gcc on Intel hardware? Then we're stuck using gcc for compiling the kernel when something better is, or might some day be, available. Locking the kernel to a compiler is a BAD THING[tm].
        • Wrong... (Score:5, Informative)

          by Carnage4Life ( 106069 ) on Friday November 09, 2001 @09:46PM (#2547211) Homepage Journal
          What if, besides caruso, Intel's compiler is actually a BETTER compiler than gcc on Intel hardware? Then we're stuck using gcc for compiling the kernel when something better is, or might some day be, available. Locking the kernel to a compiler is a BAD THING[tm].

          The Linux kernel is not only available on Intel chips. It is available on ARMs, DEC Alphas, Sun SPARCs, M68000 machines (like the Atari and Amiga), MIPS and PowerPC, as well as IBM mainframes.

          Which makes more sense? Targeting a cross-platform compiler like gcc [gnu.org], or targeting individual compilers for each platform Linux runs on?
          • I believe the gist of his argument is not that the kernel should be tied to some other compiler instead of gcc, but rather to the language spec, so that any standards-compliant compiler should be able to compile it.

            • That's easier said than done. As soon as you want to do something that's not supported in the language spec, you have to pick a compiler.
        • gcc may not be the best compiler in the world, but it's open source. If there is a serious deficiency, it can be fixed without relying on a certain company. We know it will always be there.

          Now, tying an Open Source project to a single proprietary compiler or tool is certainly a bad idea. I'm trusting proprietary tool makers less and less every day (based on how Borland is handling Kylix). But tying it to Open Source tools, especially popular ones, is not a problem.
        • What if, besides caruso, Intel's compiler is actually a BETTER compiler than gcc on intel hardware?

          Another one who learned the pronunciation of "Crusoe" from the Gilligan's Island theme song!

    • Re:GCC extensions?? (Score:1, Interesting)

      by Anonymous Coward
      Being at the "mercy" of GCC is not a bad thing - it is the most portable compiler on earth. What's wrong with that?
    • Wait, the Kernel uses GCC extensions? I thought the Kernel was written in real C, not that bastard GCC version. I've never looked at Kernel code, so I'm not sure. Is this really true?

      Here's some kernel code [linux.no]. Now you've seen it.

      If it's true, I think that's a huge mistake. The Kernel should not be at the mercy of one compiler.

      Why not? The major goal of operating system design is to extract as much performance as possible with as little overhead as possible. Portable code by definition is rarely as efficient as code targeting a specific platform or compiler.
    • Re:GCC extensions?? (Score:3, Interesting)

      by rgmoore ( 133276 )
      The Kernel should not be at the mercy of one compiler.

      "At the mercy of" one compiler is a rather strange description, don't you think? After all, both Linux and gcc are released under the GPL, which means that anyone who wants to use Linux will already be willing to accept what many people view as the most obnoxious feature of gcc (the license). And it's not as though the gcc developers can yank the rug out from under Linux by making it proprietary, refusing to distribute old versions, etc. If anything it would be crazy to make serious modifications to Linux to take advantage of a compiler like Intel's that could be taken away at any minute.

    • It's a very pragmatic decision.

      Different compilers do produce different code and have different extensions.

      To enable compiling the kernel with different compilers, you have to program for the different extensions and you have to test the kernel with the different compilers. This, plus the different architectures supported, gives you an (n*m) variety of possibilities and the same number of problems.

      For the very same reason, the kernel is not merely GCC-only; it's really only for one or two different versions of GCC.
      Maybe take a look at the LKML-FAQ [tux.org], where Rafael R. Reilova gives an answer on why not to use different compilers.
  • Write their optimized code in ASM. Who needs fancy C++ compilers et al.? "But Tairan, do you know how hard it is to write perl in ASM?" you ask. My answer: use a real language :)

    Seriously, anything that is going to need the optimizations that this new compiler does should probably be written in ASM anyway. Your 'hello world' and 'count and increment an array' programs are not going to run any faster. Don't bother.

    • That's completely false. How many programs do you have that you wish were faster? If you can notice how long it takes for a program to do something, then it's not fast enough. Wouldn't you like your Perl or PHP scripts to run faster? Should you have to rewrite them in assembly? Isn't it better to recompile the interpreter with a better compiler than write it or the scripts in assembly?
    • Real men write machine code directly, in hexes. Who needs the pinko sissy commy fag Assembly Language?
      • Only LONELY geeks program in Hex or assembly!

        Real Men code in C++, Java, Fortran, or Objective C, get the necessary job done, then go home to f*ck the prom queen!

    • Re:Real Men (Score:2, Funny)

      by MrHat ( 102062 )
      Nah. Real men can write assembly in *any* language. :)
    • Re:Real Men (Score:2, Funny)

      by Mindbridge ( 70295 )
      And Real Men rewrite the entire kernel when a new processor comes along.
  • by grover ( 5492 ) on Friday November 09, 2001 @08:19PM (#2546996)
    ...because this is the first question everyone asks as soon as they find out Intel's compiler works on Linux. ;-)

    I'm not surprised the compiler helped Crusoe. GCC is a remarkable achievement in portability, but architecture-tailored compilers (MSVC, ICC) do better both in terms of code size and speed - like 30% better. But if you're going to PAY for your compiler, it better not be beaten by a free alternative.

    I hope we see distros using icc, and I also hope it spurs further development in GCC.
    • by Anonymous Coward
      Please stop saying that MSVC generates code that runs faster than GCC generated code - it just ain't so.

      I've compared MSVC 6.0 on Win32 to Cygwin's port of gcc 2.95.3 (with the -mno-cygwin switch, so gcc generates native win32 executables).

      The gcc generated stuff consistently runs faster. Ballpark 20% or more IIRC - I don't have the figures handy. These were on real world compute intensive programs that I use at work, not artificial benchmarks. And yes, I had full-tilt optimization on both compilers.

      While I don't doubt that gcc optimization could be improved further, and should be, my biggest complaint is often that the compiler itself runs slowly (particularly for C++).
      • It does on Alpha... but I'm not sure if that version of MSVC came from Digital -> Compaq -> HP or not.

        Nonetheless... gcc doesn't fare well on Alpha chips.

        Pan
        • That's because there are more people working on the Intel optimizations. It's also worth noting that some people heavily involved with the development of gcc have said there are some optimization techniques that they would like to implement, but can't because they are currently patented and are waiting for the patents to run out.

          I've always heard good things about Digital's Alpha compilers. When the Alpha division of H-Paq finally takes a dirt nap, it would be nice if they could GPL the Alpha optimizations for inclusion into gcc.

  • by brer_rabbit ( 195413 ) on Friday November 09, 2001 @08:23PM (#2547005) Journal
    I wonder if Intel's compiler is binary compatible with gcc. While it's probably against the licensing to redistribute the compiler's math or C library, I wonder if you could compile the gnu math/C library with icc and produce a shared object? An optimized math or other system library would give some decent improvement in performance.
    • There shouldn't be a lot of problems with binary compatibility for C (e.g. glibc, libcurses, X libraries). (Famous last word is "should", so unless someone does some testing and reports the results, take that with a grain of salt.) For C++, it gets a bit murkier. The Intel page [intel.com] has a section called "Compatibility with the GNU Compilers". They refer to the C++ ABI that was developed for Itanium, which I believe is basically the same ABI as GCC 3.x (it has mangled names which start with _Z). When they say they aren't compatible with g++, I suspect they mean g++ 2.95.x and maybe even 3.0 or 3.0.1; I'm not sure that statement applies to 3.0.2 or (certain unspecified) future releases of 3.x.
  • by Anton Anatopopov ( 529711 ) on Friday November 09, 2001 @08:28PM (#2547017)
    It seems as if Intel is playing a very clever game here. If Intel can unlock the true potential of Transmeta's code morphing architecture, then it is giving that approach a great deal of credibility in the industry.

    Given that Intel makes a lot of its money from selling silicon, why on earth would it develop compiler technology which legitimized the approach of one of its major competitors?

    I can only assume that Intel has some fairly advanced code morphing technology of its own, and has been using the Transmeta devices as a testbed.

    I can just see it now, a 4GHz pentium with code morphing extensions.

    I expect this one will be fought out in the patent arena. IBM and Intel are heavyweight players and I don't see either of them giving any ground willingly.

    • I doubt if this has much to do with Transmeta from Intel's standpoint. If anything, I'd suspect these optimizations are geared more towards some kind of funky x86-32 implementation under McKinley IA-64.
    • by hexix ( 9514 ) on Friday November 09, 2001 @08:53PM (#2547068) Homepage
      I think you're reading more into this than is there. I think it's more that Intel released a compiler to generate better-optimized x86 code for its processors. And since Transmeta does code morphing from x86 to whatever its instruction set is, a side effect of better-optimized x86 code is that the morphed code runs faster too.

      You make it sound like it only improves Transmeta's chips and not others. I really doubt that's what's going on here.
      • This is the same situation as with Intel's Windows compiler. They have a (very expensive) program that plugs into Microsoft's Visual C++ and does the compiling for it. As you'd expect, Intel chips, most especially the P4, see marked performance improvements. What surprises some people is that so do AMD chips. The compiler is just an all-around more efficient compiler, and it works better on all x86 chips, even those not made by Intel.

        I think the most dramatic demonstration of this was a test done by Tom's Hardware [tomshardware.com] last year. He ran a test on a bunch of different processors doing MPEG-4 encoding using FlaskMPEG. The Pentium 4 performed abysmally, coming in behind a Pentium III 1GHz. Intel then decided to download the source code of FlaskMPEG and recompile it with their compiler. This moved the P4 to the top of the heap, but it also increased all the other scores. The P4 1.5 got the biggest boost, from 3.83fps to 14.03fps; the PIII 1GHz got a lesser boost, from 4.39fps to 8.03fps. However, the Intel compiler helped out the Athlon 1.2GHz too, boosting it from 6.43fps to 11.14fps. So it even gave their competitor's hardware a 73% speed boost.

        Intel's compiler division isn't interested in trying to screw their competitors and make Intel's chips look the best; they are interested in producing the most optimized x86 code possible. Of course the Intel compiler supports all the special Intel extensions (MMX, SSE, SSE2), and I don't believe it supports things like 3DNow!, but that doesn't mean they are going to screw up their code on purpose to make it run poorly on other chips.

    • Actually, the P4 has a Trace Cache [arstechnica.com] which smells something like embryonic code morphing. Clearly it is not nearly as powerful, in terms of optimizations, but it seems a future Pentium (or even the Itanium) with code morphing is not out of the question.
  • archenemy (Score:2, Insightful)

    by Anonymous Coward
    how is Intel the 'archenemy' of us... just because Linus works at Transmeta? What chip are you running your OS on? I bet it's an Intel chip, or an Intel clone (AMD)

    /me is wintel-free, yay Mac
      What chip are you running your OS on? I bet it's an Intel chip, or an Intel clone (AMD)

      If the Athlon was an Intel clone, it wouldn't kick the P4's ass.
    • Read the slashdot post - it referred to 'Transmeta's arch nemesis, Intel' (not enemy). Nowhere did it say anything about either of them being 'our' enemy...
  • KDE performance (Score:2, Insightful)

    by joshua42 ( 103889 )
    Just a thought: might this compiler perhaps be different in a way that improves the situation with the C++ library relocation issues that bother KDE?
    • Re:KDE performance (Score:3, Interesting)

      by hexix ( 9514 )
      I was wondering the same kind of thing. Is this going to be helpful to KDE in any way?

      It's my understanding that the problem is with the GNU library linker, but I don't know much about compilers. Does this Intel compiler have its own linker, or does it still use the GNU one?

      If it does use its own, can we expect to see dramatic speed increases if we were to compile KDE with this Intel compiler?
      • No, it uses the standard GNU linker. It exports a standard ELF program that ld.so links at runtime. However, you couldn't really use it with KDE anyway. Intel C++ has a different C++ ABI than G++ (the C ABI is of course the same) and thus can't link with G++ compiled libraries. Lastly, KDE doesn't even compile with icc yet. Look at this thread [kde.org] on kde-devel.
        • That thread is from May. In the meantime, it seems that almost the entire new KDE tree is compilable with the Intel compiler (at least based on the CVS logs; I didn't check it myself).

          Now, for the expected performance increases: if I am correct, the Intel compiler is the old KAI C++ compiler, which was highly regarded in number-crunching circles as the best-optimizing, most standards-compliant compiler around.

          Still, the spectacular increases occur only in very specific cases which are amenable to optimization. Number crunching (big math computations) is the best example, and this probably applies to mp3 encoding, DivX playback and compression, image processing and other stuff like that, too. But for the average, highly heterogeneous code which goes into your typical desktop apps, the increase is significantly smaller.

          Lotzi
  • Kernel (Score:2, Redundant)

    by Old Wolf ( 56093 )
    Why don't they use ANSI C for the kernel?
    • Re:Kernel (Score:4, Informative)

      by be-fan ( 61476 ) on Friday November 09, 2001 @10:27PM (#2547301)
      Because Linux is a real project and not some theoretical programming plaything. Kernels have all sorts of weird problems to deal with (passing parameters in registers, inline ASM, structure packing, alignment, etc.) that normal application code doesn't have to bother with.
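
      A small compilable sketch of two of those (the struct is invented for illustration, not actual kernel code; register parameter passing is the separate __attribute__((regparm(n))) extension on 32-bit x86):

        #include <stdio.h>

        /* packed: no padding between fields, e.g. to match an on-disk
           or hardware-defined layout */
        struct __attribute__((packed)) desc {
                unsigned short limit;   /* 2 bytes */
                unsigned int   base;    /* 4 bytes, no 2-byte gap before it */
        };

        /* aligned: force 16-byte alignment of a buffer */
        static char buf[64] __attribute__((aligned(16)));

        int main(void)
        {
                printf("packed size: %u (vs 8 unpacked), buf at %p\n",
                       (unsigned)sizeof(struct desc), (void *)buf);
                return 0;
        }
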
  • Ummm... Price? (Score:1, Interesting)

    by Erioll ( 229536 )

    I'm surprised I haven't seen anyone else post this: Intel's compiler is EXPENSIVE! $499? Since most programmers are not exactly rich (Gates excluded), I don't think most Linux people are going to embrace this new compiler.

    $500? I paid less than that for my MS compiler!

    Erioll

  • Not surprising (Score:3, Interesting)

    by d5w ( 513456 ) on Friday November 09, 2001 @08:51PM (#2547065)
    This isn't a particularly startling result. Many of the things an x86 compiler has to optimize for these days are similar across all processors: e.g., regular branch patterns are faster than unpredictable ones; you have very few visible registers; it's helpful to have closely associated data in the same cache lines; you're usually better with the RISCy subsets of the ISA; etc. Intel would have had to go well out of its way to optimize for their own chips and pessimize for others, and I can't see Intel bothering.
  • by kijiki ( 16916 ) on Friday November 09, 2001 @08:58PM (#2547089) Homepage
    Intel's compiler boosts AMD Athlons too.

    AMD uses (or at least, used to use, I haven't checked lately) Intel's compilers for their SPEC runs.

    Intel's compiler is the best available for CPUs that implement the x86 ISA. Transmeta implements that ISA, so why does this news surprise people?
    • Optimization in a compiler is directed at a chip generation, not necessarily at an ISA. Remember that functional-unit latencies are the big problem in the pipeline, and since Transmeta is a RISC underneath, this is kinda neat! :-)
  • by Anonymous Coward
    You're benchmarking an Intel compiler which will generate optimized Intel code, but telling gcc to use "-m386"?

    Do you secretly have an 80386 machine here? Why not use optimized flags like "-mcpu=i686 -march=i686" and give a fair comparison?

    Am I the only one who sees this? C'mon people, wake up, read the manual.
  • No kernel, so what? (Score:4, Interesting)

    by labradore ( 26729 ) on Friday November 09, 2001 @10:07PM (#2547260)
    It seems that nearly everyone has missed the point of this article. POVRay is a program that makes very heavy use of the FPU. ICC speeds up POVRay's performance by 16% to 28% on the x86 architecture compared to GCC. In other words, x86 FPUs execute code from ICC-built programs faster. The Linux kernel (and almost any kernel, for that matter) has very little floating-point code. Therefore one cannot assume that ICC would improve kernel performance, even if it could compile it.
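
    To make the distinction concrete, here is the sort of FPU-bound inner loop (a dot product; purely illustrative) where x87 scheduling and register allocation decide performance, and which has no counterpart in kernel code:

      #include <stdio.h>

      #define N 1000000

      static double a[N], b[N];

      int main(void)
      {
              int i;
              double sum = 0.0;

              for (i = 0; i < N; i++) {
                      a[i] = i * 0.5;
                      b[i] = i * 0.25;
              }
              for (i = 0; i < N; i++)
                      sum += a[i] * b[i];   /* one fmul + fadd per iteration */

              printf("%g\n", sum);
              return 0;
      }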

    The real story here is that the maintainers of GCC ought to look carefully at their optimization code for x86 FPUs.

    I'm betting that Intel's developers have done their best to make use of the P4 cache. Since Transmeta CPUs do work recompiling programs on the fly, they have larger caches (128KB L1 + 512KB L2) than the Athlon (128KB L1 + 256KB L2) and the Pentium 4 (20?KB L1 and 256KB L2). ICC is probably also highly aggressive in using SSE and SSE2 instructions. Transmeta CPUs also use VLIW instructions in the core, which are by their nature highly parallel (compared to native x86). Even if the Transmeta chips can't use SSE and SSE2, they may benefit from the parallel-oriented optimizations that ICC probably makes.

    On a different note: in a program like POVRay, which executes basically the same tight loop of instructions mega-gazillions of times during a scene, the Transmeta chip's software has the opportunity to optimize the program heavily. I would like to see the stats on the second and third runs of that rendering, to see how much the Transmeta "code morphing" improved the performance. It would be very interesting if the GCC- and ICC-built POVRays performed at almost the same speed after a few runs. It would obviously be a great proof of the value of Transmeta's design. I for one have always wondered what the code morphing stuff would be able to do if it could interface with the operating system and save the recompiled system back to the hard disk as it goes through the optimization process. (I suppose that errors could be highly disastrous.)

    That's just my $0.02 and I'm no expert, so I could definitely be wrong.

    This is not a signature.

    • ICC doesn't even attempt SSE optimizations at the optimization level tested (-xMi, which means PPro and MMX instructions; you need -xMiKW to get SSE and SSE2 as well). The big wins that gcc could get would come from rewriting the scheduler and register allocator. The difference for gcc probably comes from extra loads and stores, and possibly more code in loop bodies. Function inlining may also play a part, as gcc doesn't do that very well.

      You may also be right that gcc doesn't handle the x87 stack very well, but that is likely a minor difference in comparison.
  • As I type this, I'm downloading Intel's Linux Fortran compiler. While this is slightly off-topic, it will be interesting to see if this free (non-supported) version will compile some code I have that previously relied on Compaq/Digital Fortran's fort26.dll on the Win32 platform (not my code, honest :).

    If I can get it to compile on Linux, then I can do a whole host of things my employer previously thought impossible. :)
  • What about gcc 3.0 ? (Score:4, Interesting)

    by Stormie ( 708 ) on Friday November 09, 2001 @10:42PM (#2547333) Homepage

    Interesting benchmark of Intel's compiler vs. gcc 2.95.4, but what about gcc 3.0? I'd love to see how that compares, given that I've heard such mixed opinions about whether its optimisation tends to be better, worse, or the same as the 2.95 series.

    • by the Atomic Rabbit ( 200041 ) on Saturday November 10, 2001 @02:03AM (#2547726)
      From what I gather reading the mailing lists, GCC 3.0 was a features release, and 3.0.x were bugfix releases. There is generally very little performance benefit over 2.95.x (and the occasional performance regression).

      GCC 3.1 will focus on optimization, building on the new infrastructure implemented with 3.0. If you're brave enough, you can pull from CVS and try it out for yourself.
  • gcc has gotten so far behind the specialized instruction set curve that you're better off writing hardware descriptions for an FPGA using iverilog than spending $500 to write useful software for a modern instruction set.
  • by jgarzik ( 11218 ) on Friday November 09, 2001 @11:41PM (#2547497) Homepage
    These are not surprising results. Even the gcc developers will admit that many general, not-architecture-specific optimizations done by commercial compilers are not performed in gcc. Most new CPUs, not just Intel CPUs, can benefit from a smarter compiler to take advantage of features like data prefetching, instruction bundling and pipelining, profile-based (feedback-based) optimization, data and control speculation, and much more.

    The gcc "open projects" [gnu.org] page gives people a good idea of what remains to be done on gcc. The minutes of the IA-64 GCC summit [linuxia64.org] are especially interesting and informative, because it gives a good idea of the current state of GCC and also what GCC needs to be a competitive compiler in the future.

    Bottom line: Do not be surprised when commercial compilers beat gcc performance. It's catching up, but it's still got a long way to go.

    GCC Home Page [gnu.org]

    • The minutes of the IA-64 GCC summit are especially interesting and informative

      Was that a subliminal message?
    • And just think: ten years ago, the first thing one would do with a new Sun was install gcc on it, because it was much faster than the compiler you had to buy from Sun. Especially if it was an M68K-based machine. I don't think gcc was that much slower than the SPARC compiler. It was slower than the MIPS compiler by quite a bit, though.

  • by Florian Weimer ( 88405 ) <fw@deneb.enyo.de> on Saturday November 10, 2001 @04:51AM (#2547994) Homepage
    Floating point performance doesn't tell much about integer performance and vice versa (remember the Itanium). It is well-known that GCC has got its problems with the stack-based x86 floating point unit (especially pre-3.0 versions; some people claim that 3.x is faster).

    Since the kernel doesn't use floating point instructions, it's not such a big loss that you can't compile it with icc yet. In addition, compiling the kernel (which is not written in ISO C, let alone ISO C++) might uncover a few bugs in the kernel code and the compiler, and it's not very likely that the kernel folks are able or even willing to help you if you use a strange system configuration with a proprietary compiler.
  • by ChaoticCoyote ( 195677 ) on Saturday November 10, 2001 @10:16AM (#2548383) Homepage

    ...I wrote up a short "First Look" [coyotegulch.com] regarding the "noncommercial" (i.e., no-cost) versions of Intel's C++ and Fortran 95 compilers for Linux. I look at licensing, too, and have Intel's comments [coyotegulch.com] posted as well.

    You can also look at some rudimentary benchmarks comparing gcc 3.0.1 and Intel C++ 5.0 [coyotegulch.com].


  • I tried Intel's C++ compiler on my own floating point heavy plasma simulation program. I tried some very high optimization flags, and that produced a binary which crashed.

    Using -O1 produced a binary roughly 1/2 as fast as a -O3 g++-compiled binary.

    Perhaps this compiler is a win on C code, but on C++ it sure looks like a dog to me.
