Topics: Intel, Programming, IT, Technology

Intel Updates Compilers For Multicore CPUs

Threaded writes with news from Ars that Intel has announced major updates to its C++ and Fortran tools. The new compilers are Intel's first that are capable of doing thread-level optimization and auto-vectorization simultaneously in a single pass. "On the data parallelism side, the Intel C++ Compiler and Fortran Professional Editions both sport improved auto-vectorization features that can target Intel's new SSE4 extensions. For thread-level parallelism, the compilers support the use of Intel's Thread Building Blocks for automatic thread-level optimization that takes place simultaneously with auto-vectorization... Intel is encouraging the widespread use of its Intel Threading Tools as an interface to its multicore processors. As the company raises the core count with each generation of new products, it will get harder and harder for programmers to manage the complexity associated with all of that available parallelism. So the Thread Building Blocks are Intel's attempt to insert a stable layer of abstraction between the programmer and the processor so that code scales less painfully with the number of cores."
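For readers who want a feel for what the Thread Building Blocks interface actually looks like, here is a minimal sketch in the library's classic functor style. The ScaleVector type, the data, and the grain size are invented for the example; only the tbb::parallel_for, blocked_range and task_scheduler_init names come from TBB itself.

```cpp
#include <vector>
#include <tbb/task_scheduler_init.h>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

// ScaleVector is an invented example body. TBB splits the index range into
// chunks and invokes operator() on worker threads, one chunk at a time.
struct ScaleVector {
    std::vector<float>& data;
    float factor;
    ScaleVector(std::vector<float>& d, float f) : data(d), factor(f) {}
    void operator()(const tbb::blocked_range<size_t>& r) const {
        for (size_t i = r.begin(); i != r.end(); ++i)
            data[i] *= factor;   // the inner loop remains a candidate for auto-vectorization
    }
};

int main() {
    tbb::task_scheduler_init init;        // required by early TBB releases
    std::vector<float> v(1u << 20, 1.0f);
    // The runtime decides how many chunks to create and which cores run them;
    // the code says nothing about the number of cores.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size(), 1024),
                      ScaleVector(v, 2.0f));
    return 0;
}
```

The point of the abstraction is visible even in this toy: nothing in the source mentions how many cores exist or how the range gets carved up.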
  • Re:Anyone want to... (Score:5, Informative)

    by Trigun ( 685027 ) <evil@evil e m p i r e . a t h .cx> on Tuesday June 05, 2007 @03:54PM (#19402197)
    The compiler worries about the cores so you don't have to. Is that too cretin?
  • Re:Anyone want to... (Score:5, Informative)

    by BecomingLumberg ( 949374 ) on Tuesday June 05, 2007 @03:55PM (#19402221)
    >>>So the Thread Building Blocks are Intel's attempt to insert a stable layer of abstraction between the programmer and the processor so that code scales less painfully with the number of cores.

    They found a way to make the compiler figure out how to use the machine's many CPU cores automagically when you compile a program. It is cool, since it is really hard to work out how to share a given workload 16 even ways.
  • Re:Anyone want to... (Score:2, Informative)

    by CaptainPatent ( 1087643 ) on Tuesday June 05, 2007 @03:57PM (#19402257) Journal
    Essentially, the compiler will automatically optimize thread splitting (timing and number of splits, if I'm reading this correctly), which is a very handy feature, as it will quickly become nearly impossible to manage future processors with 16+ cores by hand. They do seem to hide a lot of the true features underneath market-speak, though.
  • by Doctor Memory ( 6336 ) on Tuesday June 05, 2007 @04:01PM (#19402337)
    I was looking at the Thread Building Blocks paper, and it reads like it was somebody's hastily-scribbled draft:

    "The Intel Threading Tools automatically finds correctness and performance issues" (The tools finds?)
    "Along with sufficient task scheduler and generic parallel patterns" (Who has insufficient task scheduler?)
    "automatic debugger of threaded programs which detects many of thread-correctness issues such as data-races, dead-locks, threads stalls" (Sarcasm fails me...)

    And that's just in the first few paragraphs, I haven't even gotten to the real meat of the article!

    I'm used to informative, well-written and reasonably complete technical documentation from Intel — WTF is this?
  • intel's product page (Score:4, Informative)

    by non ( 130182 ) on Tuesday June 05, 2007 @04:07PM (#19402431) Homepage Journal
    the intel product page has somewhat more detail. it can be found here [intel.com].
  • by presearch ( 214913 ) * on Tuesday June 05, 2007 @04:43PM (#19402937)
    The Intel Compiler Lab is based in two Russian cities - Moscow and Novosibirsk.
    Probably the source of the less than optimal text.

    How's the documentation on -your- compiler coming along?

  • No and yes (Score:3, Informative)

    by Sycraft-fu ( 314770 ) on Tuesday June 05, 2007 @05:25PM (#19403657)
    No, they won't add them to GCC. Intel's compiler competes with GCC, and it is the best there is. In every test I've seen on Intel chips it comes out ahead, and I'm sure they have no interest in changing that. However, yes, the docs are out there. Intel's processors are extremely well documented and you can get everything you need. The problem isn't that the GCC people are having to guess how the processors work; the problem is that their coders aren't as good as Intel's at optimising their compiler. This isn't helped by the fact that GCC targets many architectures, whereas ICC targets only one.

    However don't expect Intel to help GCC out. Their answer will just be "buy the ICC".
  • by Anonymous Coward on Tuesday June 05, 2007 @05:31PM (#19403745)
    It would probably be best for everyone if the compiler were open source, but if Intel thinks they need to sell it as a commercial product to justify it financially, we still get all of the benefit on their future processor designs.

    If it were open source you could modify it to work on AMD processors. In the past, I specced out an Intel workstation rather than AMD specifically because my software used the Intel Math Kernel Library (MKL). Granted, it was only one computer many years ago (when AMD was faster than Intel), but when you see companies building big Beowulf clusters or considering processor/math-intensive apps, I bet there's a few extra sales to be made there.

    And yes, the MKL gave me a 60x speedup over hand-written matrix algebra. Big deal when things go from an hour to a minute.
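For context on the MKL claim above, the usual change is replacing a hand-rolled triple loop with a single BLAS call. A rough sketch, assuming Intel MKL's CBLAS interface (the matrix sizes and data here are invented; any CBLAS-compatible library exposes the same cblas_dgemm call):

```cpp
#include <vector>
#include <mkl_cblas.h>   // with Intel MKL; a generic <cblas.h> exposes the same call

int main() {
    // Invented example: multiply two 512x512 matrices of doubles.
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // Naive version the library call replaces: O(n^3) scalar loop,
    // no blocking, no SIMD, no threading.
    // for (int i = 0; i < n; ++i)
    //     for (int j = 0; j < n; ++j)
    //         for (int k = 0; k < n; ++k)
    //             C[i * n + j] += A[i * n + k] * B[k * n + j];

    // C = 1.0 * A * B + 0.0 * C, row-major storage, no transposes.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, &A[0], n, &B[0], n, 0.0, &C[0], n);
    return 0;
}
```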
  • Re:No and yes (Score:5, Informative)

    by smallfries ( 601545 ) on Tuesday June 05, 2007 @07:31PM (#19404957) Homepage
    Well, no actually you can't. If you've ever spent any time going through the 1000 page Intel Optimisation Guide for the x86 then you would know that they don't spell out all of the trade offs explicitly. They describe enough to point you in the right direction but they keep a lot back. Partially because the behaviour of these chips in certain usage patterns isn't even defined by the design - it's a side-effect of several other parts of the chip design interacting. So the best that you can do is suck it and see - and in general it changes not between major ISA revisions but on individual models.

    Now, if you're Intel then you have the time and the money to work out exactly how to exploit these tradeoffs to schedule threads effectively. But you don't want to give that away for free. From the (very scanty) marketing bullshit that was linked to, it would appear that they've produced an Intel-specific threading library (probably with a POSIX interface). Separate from this is a profiling tool and a multi-threaded debugger (the latter of which is non-trivial). While any debugger will let you skip across threads, letting you do it in a deterministic manner to look for race hazards is much harder.

    The analysis tools sound nice, but the bolt-on library is nothing special. It's purely there to win a few synthetic benchmarks and gain some marketshare for ICC, and therefore more "Made for Intel" applications in the market. I'm cynical about the library because what is broken about the threading model in C/C++ would take more than a library to fix. It would require redesigning the language from the ground up and choosing a different set of control constructs.

    So finally, when you claim that it's because Intel has "better" coders, you don't know what you're talking about. I know a few guys who work on GCC for a living, and they are grade-A coders. It is because Intel has moved the goalposts. It's not so much that GCC targets multiple architectures; it's that they are trying to stick to (relatively) standard C, whereas Intel is willing to redefine where the semantic gap sits if they can squeeze out a little more performance. Their attitude is screw portable code - talking about portability across different compiler vendors here, rather than chip vendors. If what they need to squeeze into their compiler is no longer "C" strictly speaking, then they don't care. The GCC guys do.

    Ah yes, and portable code can be a smaller window than you expect. That weighty 1000-page Intel document is sitting comfortably next to the AMD equivalent, which differs in surprising places.
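As a small illustration of the "broken threading model" point in the comment above, here is the classic hazard in plain pthreads, nothing Intel-specific. The loop count and variable names are invented for the example; the language itself (before C++11) says nothing about what two threads touching the same variable means.

```cpp
#include <pthread.h>
#include <cstdio>

// Invented toy: two threads bump the same counter with no synchronisation.
static long counter = 0;   // shared, unprotected

// Each thread performs a million read-modify-write cycles on `counter`.
// Nothing in pre-C++11 C or C++ defines concurrent access here, so the
// final value is effectively unpredictable - exactly the kind of race
// hazard a threading debugger is supposed to catch.
void* bump(void*) {
    for (int i = 0; i < 1000000; ++i)
        ++counter;
    return 0;
}

int main() {
    pthread_t a, b;
    pthread_create(&a, 0, bump, 0);
    pthread_create(&b, 0, bump, 0);
    pthread_join(a, 0);
    pthread_join(b, 0);
    std::printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```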
  • Re:Anyone want to... (Score:5, Informative)

    by James_Intel ( 1082551 ) on Tuesday June 05, 2007 @08:08PM (#19405243)
    Automagical - we try. Vectorization, parallelization - I dare say the Intel compilers are at least as good at it as any compiler ever has been. Bold statement - yeah. I believe it is true.

    A more interesting question is "Is that good enough?" For vectorization, the answer is 'usually' - so some additional work/headaches happen when it isn't enough. For parallelization, the answer is at best 'sometimes.' So I'll get flamed two ways: (1) by people very happy with it, who will say that I've understated how good it is and that it is all they need, and (2) by people with programs which don't get magical auto-parallelism to solve their needs. There are more people in #2 than #1 - but this ain't a one-size-fits-all world. Not a bad deal if it solves your problems - otherwise, you've got work to do... but that ain't the compiler's fault... parallelism requires work for most of us.

    About languages...
    Virtually every Fortran, C and C++ compiler these days supports OpenMP, which is not part of the official language standards but is there to use. It is loop oriented, is very Fortran-like and fits into C well enough... but is definitely not C++-like.

    Fortran and C/C++ don't support threading in the language; you need to write your code to be thread-safe, and you need to use a threading package like Windows threads or POSIX threads (pthreads). Boost threads offer a portable interface for the key threading needs - essentially wrappers for pthreads and Windows threads, etc. - and the standards are likely to add a portable interface officially in the future. One thing Java did have from the start.

    Intel compilers -> Intel CPUs -> all compatible processors
    The Intel compilers and libraries aim to beat other compilers and libraries regardless of the processor they run on. No one will get it right all the time - so this is not a dare to find single little code samples to prove me wrong. But if a real program doesn't get the best results from Intel - we want to know. (yep - I work at Intel - I post for myself)
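To make the OpenMP remarks in the comment above concrete: the loop-oriented style being described is just a pragma on an ordinary loop. A minimal sketch (the dot-product is an invented example; build with something like icc -openmp or gcc -fopenmp):

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    // Invented example: fill two arrays and accumulate their dot product.
    const int n = 1000000;
    static double a[n], b[n];
    double sum = 0.0;

    // Work-sharing: loop iterations are divided among the team of threads.
    // The reduction clause gives each thread a private partial sum and
    // combines the partial sums when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        a[i] = 0.5 * i;
        b[i] = 2.0 * i;
        sum += a[i] * b[i];
    }

    std::printf("threads available: %d, dot product: %g\n",
                omp_get_max_threads(), sum);
    return 0;
}
```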
  • by James_Intel ( 1082551 ) on Tuesday June 05, 2007 @08:14PM (#19405301)
    You're right - vectorization by itself can't handle step 11 depending on step 10... and assuming there isn't a magical way to rewrite the loop to remove the dependence (which is the first thing the compiler will try to discover and do for you - but usually it can't), then you need to look at pipelining - software pipelining on a single core, or parallelism on multi-core... but you'll have to have the right processor-to-processor interconnect to match the work to get multiprocessor pipelining to do what you want. Software pipelining can be very effective on loops with dependencies from iteration to iteration.
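A sketch of the distinction being drawn above, with two invented loops: one whose iterations are independent and a vectorizer can handle, and one with a loop-carried dependence where iteration i needs the result of iteration i-1.

```cpp
// Invented example loops; not taken from Intel's documentation.

void independent(float* a, const float* b, int n) {
    // No iteration reads what another iteration wrote, so the compiler can
    // typically process several values of i at once with SSE registers
    // (given it can prove or check that a and b don't alias).
    for (int i = 0; i < n; ++i)
        a[i] = b[i] * 2.0f;
}

void recurrence(float* a, int n) {
    // Iteration i needs the value produced by iteration i-1, so iterations
    // cannot simply run side by side. This is the "step 11 depends on
    // step 10" case: the fallback is techniques like software pipelining
    // rather than straight vectorization.
    for (int i = 1; i < n; ++i)
        a[i] = a[i - 1] * 0.5f + 1.0f;
}

int main() {
    float a[8] = {0}, b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    independent(a, b, 8);
    recurrence(a, 8);
    return 0;
}
```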
  • by James_Intel ( 1082551 ) on Tuesday June 05, 2007 @08:39PM (#19405505)
    (Yes - I work for Intel - post for myself - tell it like it is) Cute story if it were true. However, Intel compilers and libraries are designed to use features - but we don't come out every day with an update. The new compilers support SSE4, but on Intel only. AMD support comes after processors exist that support it. The libraries aren't quite there yet with SSE4 (I guess we hate Intel processors too - flame us). But AMD support for SSE3 is there - now that it is in their processors. It wasn't there when we developed version 9 of the compilers. We do test our compilers/libraries on other implementations - because, believe it or not, we care if it works. It doesn't always - and we adjust the compiler/library to make it work. We had a beta a few years ago which blew up on Intel processors and worked on AMD processors (yep - I said it right - imagine the embarrassment when a customer told us about that combination). Oops. I heard that was because we released support before we tested that it worked on that processor. So we learned not to do that too often. By the time we release a product - it should work on all processors. I would say "does" or "guaranteed to" - but the lawyers would freak - because nothing in life is guaranteed. We are clearly not trying to screw our customers, though - you know... the developers who count on our software. It is annoying when people suggest that might be our goal.

    My favorite complaint: Intel checks "CPUID"
    No duh - that's where the feature information is.

    Next favorite: Intel checks for "GenuineIntel".
    Another "no duh" - RTFM from Intel or AMD - the features flags checking has to come AFTER you determine the manufacturer AND family of the processor...
    unless you don't care about running on all processors
    (spare pointing out to be that you can skip the first two checks - look at the SSE flag - and it is usually right - unless say you pick just the right older processor)
    We do the checks the way Intel and AMD manuals say we have to... if that is evil... so be it.
    We even start by testing if the CPUID instruction exists (it didn't before Pentium processors).
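For the curious, the check order described above (CPUID availability, then vendor string, then feature flags) looks roughly like this. The sketch uses GCC's <cpuid.h> helper (other compilers have equivalents) and omits the family/model check for brevity; it only reads the bits, which is separate from the dispatch-policy argument in the rest of the thread.

```cpp
#include <cpuid.h>   // GCC builtin wrapper around the CPUID instruction
#include <cstdio>
#include <cstring>

int main() {
    // Illustrative only; real dispatch code also checks family/model
    // as the Intel and AMD manuals describe.
    unsigned int eax, ebx, ecx, edx;

    // Leaf 0: highest supported leaf plus the 12-byte vendor string,
    // stored in EBX, EDX, ECX (in that order).
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;   // CPUID itself unavailable (pre-Pentium class parts)

    char vendor[13];
    std::memcpy(vendor + 0, &ebx, 4);
    std::memcpy(vendor + 4, &edx, 4);
    std::memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';   // "GenuineIntel", "AuthenticAMD", ...

    // Leaf 1: feature flags. SSE/SSE2 live in EDX, SSE3/SSE4.x in ECX.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;

    std::printf("vendor: %s\n", vendor);
    std::printf("SSE:    %u\n", (edx >> 25) & 1);
    std::printf("SSE2:   %u\n", (edx >> 26) & 1);
    std::printf("SSE3:   %u\n", (ecx >> 0) & 1);
    std::printf("SSE4.1: %u\n", (ecx >> 19) & 1);
    std::printf("SSE4.2: %u\n", (ecx >> 20) & 1);
    return 0;
}
```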
  • by at0mjack ( 953726 ) on Wednesday June 06, 2007 @01:39PM (#19413439)

    Checking for 'GenuineIntel' is fine, but the actual code emitted by the compiler goes straight to 'no additional capabilities' if it detects any other string. In other words, with the 32-bit compiler any non-Intel chip is doomed to run the 'you are a bog-standard 386 with no MMX/SSE/SSE2 support' code path regardless of its actual capabilities. This 'feature' makes less difference in the 64-bit compiler (because the base level is an EM64T with SSE2, as opposed to a 386 with nothing for the 32-bit version), but as new instruction sets come online (SSE3, SSE4 and the like) this artificial crippling of AMD chips will start to show there as well.

    And yes, you say you 'tell it like it is', but I've disassembled the actual code and it doesn't accord with your story. See http://www.swallowtail.org/naughty-intel.html [swallowtail.org] for the gory details. The proof of the pudding is in the eating: if you patch one of our programs compiled with the Intel compiler to remove the Intel check it runs significantly slower on AMD chips (as in DOUBLE the runtime).

    There is no technical reason for these checks to be there: they are purely a competitive ploy to cripple performance on AMD chips. If Intel released their compiler for free, then I'd say so what: they're allowed to make it a marketing tool. OTOH, they release it as a commercial product and charge me money for it: doing that and then deliberately crippling its performance is IMHO not acceptable.
