Comparing G++ and Intel Compilers and Vectorized Code 225

Posted by timothy
from the different-lenses dept.
Nerval's Lobster writes "A compiler can take your C++ loops and create vectorized assembly code for you. It's obviously important that you RTFM and fully understand compiler options (especially since the defaults may not be what you want or think you're getting), but even then, do you trust that the compiler is generating the best code for you? Developer and editor Jeff Cogswell compares the g++ and Intel compilers when it comes to generating vectorized code, building off a previous test that examined the g++ compiler's vectorization abilities, and comes to some definite conclusions. 'The g++ compiler did well up against the Intel compiler,' he wrote. 'I was troubled by how different the generated assembly code was between the 4.7 and 4.8.1 compilers—not just with the vectorization but throughout the code.' Do you agree?"
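For readers who have not looked at auto-vectorization before, here is a minimal sketch of the kind of loop the summary is talking about. It is not code from Cogswell's article; the function name `scale_add` is made up for illustration. With g++ the auto-vectorizer is enabled at -O3, but whether a given loop is actually vectorized depends on the compiler version and target options, so the generated assembly is the only real evidence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: out[i] = a * x[i] + y[i].
// Each iteration is independent and the data is contiguous, which is
// the classic shape a vectorizer looks for.
std::vector<double> scale_add(double a,
                              const std::vector<double>& x,
                              const std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}
```

Compiling the same file with two compilers (or two versions of one compiler) and diffing the assembly for this function is roughly the experiment the article describes.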

  • by jpschaaf (313847) on Thursday December 19, 2013 @11:51AM (#45736695)

    For better or worse, I've always given the intel compiler the benefit of the doubt. They have access to documents that the GCC folks don't.

    • by Kookus (653170)

      The place I work has lots of documents generated about decisions made, why those decisions were made, etc...
      They are really helpful documents that save a bunch of time... if only people would read them 6 months later when they should.

      Nah, people seem to ask for documentation third. Google first, co-workers second. Only when they run into a co-worker who says RTFM do they get to the third option of reading it :)

    • by Curupira (1899458) on Thursday December 19, 2013 @12:27PM (#45737089)
      Yeah, on Intel processors. What about AMD and other x86 processors? Don't ever forget that ICC was once caught red-handed disabling important features when the CPUID did not return GenuineIntel...
      • Re: (Score:2, Flamebait)

        by CajunArson (465943)

        I'm sorry, you lost the right to put on the whole "Oh poor little AMD is being abused by the big bad monopolist!" the day that AMD came out with Mantle and started leveraging its 100% monopoly in the console market in a much much worse way than Intel ever did with its 70 - 80% "monopoly" in the desktop market.

        • by Immerman (2627577)

          Who is talking about AMD? We're talking about Intel's surreptitiously anti-competitive behavior as it applies to trusting the efficacy of their compiler - the target of that behavior is irrelevant to the conversation.

        • Mantle? (Score:2, Insightful)

          by Anonymous Coward

          Mantle is a good idea insofar as it should kick Microsoft and/or NVIDIA up the behind. We desperately need someone to cure us of the pain that is OpenGL and the lack of cross platform compatibility that is Direct 3D.

          Obviously NVIDIA won't play ball with Mantle but I've got a feeling they might have to eventually given that some AAA games developers are going to code a path for it. When it starts showing up how piss-poor our current high level layers are compared to what the metal can do, they'll have no choi

        • the day that AMD came out with Mantle and started leveraging its 100% monopoly in the console market

          Among consoles that aren't discontinued or battery-powered, I count Xbox 360, PlayStation 3, Wii U, Xbox One, PlayStation 4, and OUYA. Of these, two have NVIDIA graphics: PlayStation 3 has RSX, and OUYA has the same Tegra 3 that's in the first-generation Nexus 7 tablet. The forthcoming iBuyPower Steam Machine also has NVIDIA graphics.

    • by Runaway1956 (1322357) on Thursday December 19, 2013 @01:45PM (#45737941) Homepage Journal

      Yep, they have access to some cool documents. It took a lot of work to document the fact that the intel compiler was actually crippling code if it was run on AMD processors. I mean, some suspicious, somewhat paranoid people suspected that intel was crippling code on AMD processors, but it took a good deal of work to actually demonstrate it.

      That is just one of the many reasons I don't use Intel.

    • by Darinbob (1142669) on Thursday December 19, 2013 @03:05PM (#45738735)

      GCC also works with many CPUs that the Intel compiler does not. That includes x86 compatible chips from other vendors, as well as the advanced features in Intel chips that were originally introduced by competing clones. So maybe Intel is nice, but that's irrelevant if you don't even use Intel hardware in your products.

      If Intel really is basing their compiler off of secret architecture documents, then people should be able to deduce what's going on from looking at the generated assembler. Ie, find some goofy generated code that does not seem to make sense given public documents, get a benchmark to compare it, figure out there's a hidden feature, and then make use of it.

      • by Arker (91948)

        "If Intel really is basing their compiler off of secret architecture documents, then people should be able to deduce what's going on from looking at the generated assembler. Ie, find some goofy generated code that does not seem to make sense given public documents, get a benchmark to compare it, figure out there's a hidden feature, and then make use of it."

        In theory, given enough highly skilled eyes checking these things often enough, that is exactly what would happen.

        In reality, who is going to go to that

  • by serviscope_minor (664417) on Thursday December 19, 2013 @11:55AM (#45736743) Journal

    I don't think it's troubling.

    Firstly they beat on the optimizer a *lot* between major versions.

    Secondly, the compiler does a lot of micro optimizations (e.g. the peephole optimizer) to choose between essentially equivalent snippets. If they change the information about the scheduling and other resources you'd expect that to change a lot.

    Plus I think that quite a few interesting problems such as block ordering are NP-hard. If they change the parameters of their heuristic NP-hard solver, that will give very different outputs too.

    So no, not that bothered, myself.

    • by david.emery (127135) on Thursday December 19, 2013 @12:10PM (#45736909)

      Mod parent up +1 insightful.

      Unless you suspect and are trying to debug a code generator error (one of the least pleasant/most difficult debugging experiences I've had), the base assertion that you should understand your compiler's code generation is at best unrealistic, and probably just dumb. Code generation is extremely complex, requiring deep knowledge of both this specific compiler's design and this specific computer's instruction set architecture, how the caches work, pre-fetching approaches, timing dependencies in instruction pipelines, etc, etc. If you do suspect a code generator error, you're best off hiring a compiler expert at least as a consultant, and be prepared for a long hard slog.

      Maybe 30 years ago, for a PDP-8, you could assert that the C code you wrote had some semblance to the generated machine code. That hasn't been true for a very long time, and C++ is most definitely not C in this regard.

    • by zubab13 (3445193) on Thursday December 19, 2013 @12:17PM (#45736985)
      Just use something like libsimdpp and you are sure that your code stays vectorized between compiler versions. As a bonus, this and similar wrapper libraries give you an option to produce assembly for multiple instruction sets (say SSE2, AVX and NEON) from the same code.
  • Very different code (Score:4, Interesting)

    by Anonymous Coward on Thursday December 19, 2013 @11:58AM (#45736779)

    I have worked on a couple of projects that compiled and ran perfectly with GCC 4.6 and 4.7. They no longer run when compiled with the latest versions of GCC. No warnings, no errors during compilation, they simply crash when run. It's the same source code, so something has changed. The same code, when compiled with multiple versions of Clang, runs perfectly. The GCC developers are doing something different and it is causing problems. Now it may be that a very well hidden bug is lurking in the code and the latest GCC is exposing that in some way, but this code worked perfectly for years under older versions of the compiler so it's been a nasty surprise.

    • by david.emery (127135) on Thursday December 19, 2013 @12:14PM (#45736953)

      Unfortunately, that's not unique to GCC. I've seen this happen with several different compilers for different programming languages over the years. Worse, I've seen it with the same compiler, but different optimizer settings.

      In one case, our system didn't work (segfaulted) with the optimizer engaged, and didn't meet timing requirements without the optimizer. And the problem wasn't in our code, it was in a commercial product we bought. The compiler vendor, the commercial product vendor (and the developer of that product, not the same company as we bought it from) and our own people spent a year pointing fingers at each other. No one wanted to (a) release source code and then (b) spend the time stepping through things at the instruction level to figure out what was going on.

      And the lesson I learned from this: Any commercial product for which you don't have access to source code is an integration and performance risk.

      • by Gothmolly (148874)

        And trying to do it all yourself is a risk of never getting to market.

        • by david.emery (127135) on Thursday December 19, 2013 @12:33PM (#45737151)

          Well, in part that depends on your market. Most of my work has been in military systems or air traffic systems, where the cost of failure >> lost opportunity cost. That's a point a lot of people forget; not all markets (and therefore the risk calculations for bugs, etc) are created equal.

        • > And trying to do it all yourself is a risk of never getting to market.

          You don't have to maintain the compiler yourself. You just need to have source code to it, and a compiler that compiles it, for the life of your project. That way, if a newer version of the compiler breaks your project, as the original poster complained of, you always have a working compiler for the life of your project. Your compiler may not get any additional improvements. But having it work vs not work is much more importan
      • Any commercial product for which you don't have access to source code is an integration and performance risk.

        So true, I've run into the same problem. It doesn't mean you need to only use GPL, but you should try to get the source code when you sign the contract to use the product (you're probably paying enough, anyway).

      • by Darinbob (1142669)

        Yup, and the commercial product with paid support does not actually get you a fix; in fact they may often tell you that you can upgrade to a later version if you purchase it.

      • by loufoque (1400831)

        That's why you don't buy software without support.
        With support, they'd be contractually obliged to debug it.

        • We bought support. But to get support from organization X, that organization has to first admit it is -their problem-. Please re-read my post to see that no organization wanted to take ownership of the problem.

          • by loufoque (1400831)

            Depending on the support contract you negotiated with them, they shouldn't need to admit it.
            I recommend you tell management to improve their legal department.

    • it may be that a very well hidden bug is lurking in the code and the latest GCC is exposing that in some way

      I have run into this situation. The code actually depended upon a bug in the older gcc versions. When that bug was fixed, the code stopped working. In some cases, the compile failed, in others, it crashed at runtime.

      Specifically, this was around gcc version 2.7, and the bug was this: for (int i=0; i < SIZE; i++) { ... } for (i=0; .... The variable "i" should be out of scope for the 2nd loop and cause an error during compilation, but gcc didn't catch it. gcc version 2.95 caught it. I forget if tha

      • by Mr Z (6791) on Thursday December 19, 2013 @12:31PM (#45737129) Homepage Journal
        Actually, the scope of int i changed in C++. Previously, the scope would extend beyond the for. If you enable warnings, G++ will tell you all about it.
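A minimal sketch of the scoping rule being described, assuming a standard-conforming compiler; `sum_twice` is a made-up name. Under standard C++ a variable declared in the for-init statement lives only for that loop, so each loop has to declare its own.

```cpp
// Hypothetical example of standard for-loop scoping.
int sum_twice(int size) {
    int total = 0;
    for (int i = 0; i < size; ++i) {
        total += i;
    }
    // The 'i' from the first loop is out of scope here. Writing
    // 'for (i = 0; ...)' at this point is an error in standard C++,
    // which is why code relying on the old rule stopped compiling.
    for (int i = 0; i < size; ++i) {
        total += i;
    }
    return total;
}
```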
        • by drawfour (791912) on Thursday December 19, 2013 @12:54PM (#45737389)
          This is why all code should be compiled with the highest warning level enabled, and all warnings should be treated as errors. The compiler can have a very hard time guessing at what you meant, so it's best to be as explicit as you can. If, for some reason, you're positive the code needs to be the way it is, and it is correct, you can always use a "pragma warning(disable)" (with appropriate push/pop semantics) to keep your code compiling clean.
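The "pragma warning(disable)" spelling is the Microsoft one. As a rough sketch of the same push/disable/pop idea for GCC and Clang, which use `#pragma GCC diagnostic` instead: the function below is hypothetical, and it assumes the caller guarantees the values are non-negative, which is the only reason silencing the signed/unsigned comparison warning is defensible here.

```cpp
#include <cstddef>
#include <vector>

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-compare"
#endif

// Hypothetical helper: counts entries whose value is below their index.
// Assumption: every element is non-negative, so comparing the int
// element against the unsigned index cannot misbehave.
int count_below_index(const std::vector<int>& v) {
    int count = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] < i) {  // int compared with std::size_t
            ++count;
        }
    }
    return count;
}

#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Keeping the push/pop pair tight around the one function means the warning stays active for the rest of the file.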
          • I used to try to do that, but there are several problems. First, it's non-portable as hell. Second, have you ever tried turning on all the warning options in gcc (and some other compilers)? I'm not sure it's possible to write 10 lines of code that won't generate at least one warning.

            • by drawfour (791912) on Thursday December 19, 2013 @02:12PM (#45738229)
              There is a reason for warnings -- it's because you're doing something wrong. Unfortunately, the compiler lets you do it anyway, probably because there is a ton of legacy code that would suddenly "break" if they were errors by default. But that doesn't mean that you should stop trying to fix these issues. Many of these issues only appear to be benign until you stumble upon the exact issue the warning was trying to warn you about. Static code analysis tools are also your friend. That doesn't mean you can blindly trust them -- static analysis tools do have false warnings. But they're way better than inspecting the code yourself. You'll miss something way more times than the analysis tools will give you a false positive.
              • by 0123456 (636235) on Thursday December 19, 2013 @02:52PM (#45738609)

                There is a reason for warnings -- it's because you're doing something wrong.

                Uh, no. It's because you're doing something that may be wrong. If it was wrong, the compiler would give an error, not a warning.

                'if ( a = b )' for example. The compiler warns because you probably meant 'if ( a == b )'. But maybe you didn't.

                There's little reason to write such C code on a modern quad-core 3GHz CPU which spends 90% of its time idle and where the compiler will probably generate the same machine code anyway, but that doesn't make it wrong.
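For what it's worth, here is a small sketch of an intentional assignment-as-condition and the usual way to mark it as deliberate. `copy_string` is a made-up function; the extra pair of parentheses is the convention GCC and Clang recognise for "yes, I meant the assignment", and an explicit comparison against '\0' would say the same thing more loudly.

```cpp
#include <cstddef>

// Hypothetical example: copy a C string, using the value of the
// assignment (the character just copied) as the loop condition.
// Assumes dst has room for src including its terminator.
std::size_t copy_string(char* dst, const char* src) {
    std::size_t n = 0;
    // Doubled parentheses signal that '=' is intentional, not a
    // mistyped '=='. The loop stops after copying the terminator.
    while ((dst[n] = src[n])) {
        ++n;
    }
    return n;  // number of characters copied, excluding the terminator
}
```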

                • by drawfour (791912)
                  No. Just because something is legal according to the specification does not make it "right". The specification _should_ be restricted, but it cannot because there are a _lot_ of lines of code out there that would suddenly stop compiling. It _IS_ wrong to do 'if (a = b)' for a few reasons:

                  1. Someone else won't necessarily know that this was intended unless you have a comment telling them. That comment would take up just as much space as putting an 'a = b;' line before the 'if (a)' line.
                  2. If your comp
                • by Arker (91948)

                  "'if ( a = b )' for example. The compiler warns because you probably meant 'if ( a == b )'. But maybe you didn't."

                  If you did not, then you should have written something like a=b;if b {... instead. It may be technically legal code, but it's very bad and it should be flagged and warned about. You are just being cute at the expense of readability, and even you probably won't remember what the heck you did there 6 months later when you look at the code again.

                • if (a && (b = f(a)) && (c = g(b))) {
                      do stuff with a and b and c
                  }

                  If you convert that into the other format then you need to add something like six lines of code and two levels of nested if statements.

              • by Darinbob (1142669) on Thursday December 19, 2013 @04:08PM (#45739505)

                Sometimes warnings are false positives as well. Especially when turning warning levels up high they will warn about things that may be indicators of a bug or typo but which actually aren't problems, or in some cases are even intentional. Such as unused variables or parameters; is that a bug or a stylistic choice to not litter the code with extra #ifdef? An unused parameter in general seems an odd thing to complain about, usually the parameter list is fixed in an API or design document whether or not the actual implementation needs all the parameters.
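A short sketch of two common ways to keep an API-mandated but unused parameter without tripping the warning. The callback signature and function names are hypothetical.

```cpp
// Hypothetical callback type whose parameter list is fixed by an API.
using Callback = int (*)(int event, void* user_data);

// Option 1: leave the parameter unnamed. The signature still matches,
// and the compiler has nothing to report as unused.
int on_event(int event, void* /*user_data*/) {
    return event * 2;
}

// Option 2: keep the name for readability and explicitly discard it.
int on_event_alt(int event, void* user_data) {
    (void)user_data;  // intentionally unused in this implementation
    return event + 1;
}
```

Both versions can be assigned to a `Callback`, so the implementation stays compatible with the fixed parameter list.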

            • by Darinbob (1142669)

              It's not that hard really, if you start from scratch this way. Where it falls down is trying to do this with a large existing product. Also C is trickier here than C++ because C++ has more rigorous rules to begin with.

              For me the hard part with turning all warnings on is not the code developed locally for the project, but the third party libraries that come with source code. Those libraries probably generate 95% of all warnings and red flags in static analysis.

            • by Imagix (695350)
              You are just not trying hard enough. I write various commercial products which compile with _no_ warnings. And yes, this is for performance-critical services, and a few hundred thousand lines of code.
      • Stop. You are making me feel old, I remember writing code like that. (Which compiled)

        • The samples of assembler made me feel old! I'm familiar with x86 assembler, and the variations on MOV that go back to the 8086 (things like MOVSB, MOVSW), but this VMOVUPD was totally new to me. But then I have never looked at SSE, or even MMX.

    • It may just as well be that you have a subtle memory bug which never really triggered when compiled with the old compiler.

      E.g. the latest bug like this, which I encountered.
      something like:
      char data[10];
      read a file with 10 bytes into "data". Assume it is a 0 terminated string.
      Surprisingly that always worked as malloc() handed out chunks divisible by 4, zero initialized. So behind the 10 bytes were 2 more zero bytes.
      Switching to another compiler (more correctly clibrary) made that code crash, but it worked for ye
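A rough sketch of how that class of bug is usually avoided: never assume a terminator exists past the bytes you read. The helper names are made up, and the raw buffer stands in for "10 bytes read from a file".

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a string from exactly len bytes.
// Passing the length explicitly means no terminator is needed in raw.
std::string bytes_to_string(const char* raw, std::size_t len) {
    return std::string(raw, len);
}

// Hypothetical helper: copy at most dst_size - 1 bytes and always
// write the terminator ourselves instead of hoping one follows.
std::size_t copy_terminated(char* dst, std::size_t dst_size,
                            const char* raw, std::size_t len) {
    if (dst_size == 0) {
        return 0;
    }
    const std::size_t n = len < dst_size - 1 ? len : dst_size - 1;
    std::memcpy(dst, raw, n);
    dst[n] = '\0';
    return n;  // bytes copied, excluding the terminator
}
```

With a 10-byte payload the destination needs 11 bytes; a 10-byte destination gets a truncated but still terminated copy.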

      • by Darinbob (1142669)

        Alignment tends to have problems here too. Small changes in compiled code will place data differently and as soon as it's unaligned the bugs pop up (this really freaks out some programmers who previously only used Wintel and never heard of alignment before). Similarly, buffer overflows as you describe may be perfectly fine until a change in the compiler occurs; it's particularly nasty to track down if it clobbers something on the stack so that the crash doesn't occur until the calling function returns.
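As a minimal sketch of the usual defence against alignment surprises: go through `std::memcpy` rather than casting a byte pointer to a wider type. The function names are hypothetical; the value is read in the machine's native byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value from a possibly unaligned
// address. memcpy has no alignment requirement, unlike dereferencing
// a std::uint32_t* that points into the middle of a byte buffer.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical helper: the matching unaligned store.
void store_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```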

    • by PhrostyMcByte (589271) on Thursday December 19, 2013 @01:03PM (#45737481) Homepage
      Your projects were likely doing something which resulted in undefined behavior. It's been extremely rare to have GCC break working standards-compliant code.
      • by loufoque (1400831)

        I wouldn't say extremely rare. It depends entirely on what you are doing.
        Some parts of the compiler are more stable than others.

        Advanced C++ and gcc-specific extensions are two things that can break from time to time. Combine the two together, and running into bugs isn't so rare.

    • by fatphil (181876)
      Probably your code contains undefined behaviour. The optimiser has detected a situation which it is completely sure must be true (otherwise there would be undefined behaviour), and optimised something away. When the must-be-true assumption turns out to be false in reality, the code was under no obligation to work, so who cares what happens. Personally I think it's bad QOI to remove significant bits of code as an optimisation based on assumptions about the lack of undefined behaviour without some kind of wa
      • by Immerman (2627577)

        Well, first off that's clearly bad code as you point out in the comments - it will result in undefined behavior if baz is null, a situation that's clearly being expected. Granted, it's hard to imagine a scenario outside an extremely naive compiler where that particular undefined behavior is anything other than harmless - well other than this one, where it causes undefined compile-time behavior. I would agree that a warning in such a situation would be nice, in a "we all make stupid mistakes sometimes" kind

    • by loufoque (1400831)

      Most likely, you were just invoking undefined behaviour.
      GCC 4.8 has new optimizations tied to signed integer overflow, for example. a+b is the same in hardware regardless of whether the inputs are signed or not (assuming two's complement hardware), but to the compiler, that's not the case.
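To make the signed overflow point concrete, here is a hedged sketch (the function name is made up). A check written as `a + b < a` only works if the addition wraps, and since signed overflow is undefined behaviour the compiler is allowed to assume it never happens and drop the check. Testing against the limits before adding avoids executing the overflow at all.

```cpp
#include <limits>

// Hypothetical helper: returns true if a + b would overflow int.
// The comparison is done before any addition, so no undefined
// behaviour is ever executed.
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > std::numeric_limits<int>::max() - b;
    }
    if (b < 0) {
        return a < std::numeric_limits<int>::min() - b;
    }
    return false;
}
```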

  • And got completely different results!
  • by symbolset (646467) * on Thursday December 19, 2013 @12:05PM (#45736847) Homepage Journal
    Asking any audience larger than about 20 to compare the qualitative differences of object code vectorization is statistically problematic as the survey group is larger than the qualified population.
    • by Mr Z (6791)

      Also notably absent were any performance benchmarks. Two pieces of code might look very different but perform identically, while two others that look very similar could have very different performance. In any case, you should be able to work back to an achieved FLOPS number, for example, to understand quantitatively what the compiler achieved. You might have the most vectorific code in existence, but if it's a cache pig, it'll perform like a Ferrari stuck in mud.
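A rough sketch of what "work back to an achieved FLOPS number" could look like in practice. Everything here is illustrative: the kernel is a plain dot product counted as 2 floating-point operations per element, the names are made up, and a single timed call on a small input is far too noisy for a real benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Achieved floating-point operations per second.
double achieved_flops(double flop_count, double seconds) {
    return seconds > 0.0 ? flop_count / seconds : 0.0;
}

// Hypothetical kernel: one multiply and one add per element.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Times one call of dot() and converts the elapsed time to FLOPS.
// The result is written to 'result' so the call cannot be discarded.
double measure_dot_flops(const std::vector<double>& x,
                         const std::vector<double>& y,
                         double& result) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const auto start = std::chrono::steady_clock::now();
    result = dot(x, y);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return achieved_flops(2.0 * static_cast<double>(n), elapsed.count());
}
```

Running the same measurement on inputs that fit in cache and inputs that do not is one way to see the "cache pig" effect separately from the quality of the vectorized code.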

      • by loufoque (1400831)

        Why are you counting in FLOPS in the first place? Use a real unit.
        Cache is independent from vectorization. While both affect the performance of the code, when evaluating vectorization on its own you consider only how many cycles the computation would take if all data were in L1 cache.

      • by Guspaz (556486)

        Considering that Cogswell's previous works include a bunch of completely useless compiler benchmarks that tell you how fast the *compiler* produced the code, and not how fast the resulting code was... I don't think we should be surprised that he's produced another useless article.

        When I use a compiler, I don't really care what the assembly it produces looks like, I care about how it performs.

    • sudo mod parent up

    • by jbcksfrt (1167293)
      Boo this man!!! :-) I was happy to see something that only nerds would care about on Slashdot, rather than politics and BS about Bitcoin.
  • by Mr Z (6791) on Thursday December 19, 2013 @12:23PM (#45737049) Homepage Journal

    One amusing thing I discovered is that GCC 4.8.0 will actually unroll and vectorize this simple factorial function: [] Just look at that output! []

    • by DickBreath (207180) on Thursday December 19, 2013 @03:19PM (#45738909) Homepage

      Here is how I do a factorial function. No recursion, no loops, no vectorization needed. It's in Java. Converting this basic idea to C is left as an exercise for advanced readers.

              static public long factorial( int n ) {
                      switch( n ) {
                              case 0:
                              case 1: return 1L;
                              case 2: return 2L;
                              case 3: return 6L;
                              // . . . cases 4 to 18 omitted to bypass slashdot filters . . .
                              case 19: return 121645100408832000L;
                              case 20: return 2432902008176640000L;
                      }
                      // Out-of-range input: return 0 rather than throw.
                      return 0L;
              }
      • Re: (Score:3, Informative)

        by Mr Z (6791)
        Why isn't that just a lookup table? My point in mentioning factorial is that there's no point in vectorizing that thing. Even a simple loop would be small compared to the cost of a single L2 cache miss.
        • It basically *is* a lookup table that covers all possible values, but I wrote it a long time ago. (Hence why it returns 0L for non-sane inputs instead of throwing an exception.) A switch() is a lookup table. I could have used a public final static array and done my own bounds checking.
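For comparison, a sketch of the array-plus-bounds-check variant mentioned here, written in C++ rather than Java. The table name is made up; it keeps the Java version's convention of returning 0 for inputs outside 0..20, since 21! no longer fits in a signed 64-bit integer.

```cpp
#include <cstdint>

// n! for n = 0..20, the full range representable in a signed 64-bit value.
static const std::int64_t kFactorials[21] = {
    1LL,
    1LL,
    2LL,
    6LL,
    24LL,
    120LL,
    720LL,
    5040LL,
    40320LL,
    362880LL,
    3628800LL,
    39916800LL,
    479001600LL,
    6227020800LL,
    87178291200LL,
    1307674368000LL,
    20922789888000LL,
    355687428096000LL,
    6402373705728000LL,
    121645100408832000LL,
    2432902008176640000LL
};

// Returns n!, or 0 for inputs outside 0..20.
std::int64_t factorial(int n) {
    if (n < 0 || n > 20) {
        return 0;
    }
    return kFactorials[n];
}
```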
    • by loufoque (1400831)

      Is that some sort of joke? Surely you can tell this is not the optimal assembly code at all.

  • by excelsior_gr (969383) on Thursday December 19, 2013 @12:36PM (#45737177)

    This is 2013 (almost 2014!), so why are we talking about vectorization? Why don't people write code in vector notation in the first place anyway? If Matlab and Fortran could implement this 25 years ago, I am sure we are ready to move on now...

    • by mjwalshe (1680392)
      It's not C++ and OO, and for some reason C++ devs seem not to want to learn Fortran for technical programming, which is still Fortran's main use
    • by loufoque (1400831)

      Because the slowest part of the computer is memory, and vector notation leads to more cache misses.

  • by QuietLagoon (813062) on Thursday December 19, 2013 @01:15PM (#45737591)

    "...do you trust that the compiler is generating the best code for you?..."

    Trust, but verify.

    I come from the days when it was the programmer, not the compiler, that optimized the code. So nowadays, I let the compiler do its thing, but I do a lot of double-checking of the generated code.
