Forgot your password?
typodupeerror
Programming

Auto-threading Compiler Could Restore Moore's Law Gains 404

Posted by Unknown Lamer
from the sufficiently-smart-compiler dept.
New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail to take advantage of multiple cores — to get Moore's Law improvements back on track, after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) About 2004 was when Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claim a 'large project at Microsoft' have written 'several million lines of code' testing out the resulting 'safe parallelism.'" The paper is a good read if you're into compilers and functional programming. The key to operation is adding permissions to reference types allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.
This discussion has been archived. No new comments can be posted.

Auto-threading Compiler Could Restore Moore's Law Gains

Comments Filter:
  • by oodaloop (1229816) on Monday December 03, 2012 @08:18PM (#42174611)
    It turns out that Moore was observing only a small portion of a greater trend. That is, many (or most) things involving technology have a short doubling period, usually between 1 and 2 years. Total information created, processing power per dollar, resolution of brain scanning technology, et al all follow a similar doubling period to what Moore observed with transistors on integrated circuits. It's not that everyone else is misusing Moore's Law, so much as Moore didn't make make the law broad enough.
  • Re:Great potential (Score:4, Interesting)

    by MightyMartian (840721) on Monday December 03, 2012 @08:22PM (#42174643) Journal

    I'm wondering how this works. Does it scan for loops to remake them as event driven processes? Does it splice off multiple function calls and then throw the results into some sort of stack for retrieval? It's a cool idea, but man, for any kind of non-trivial program it must be some monumental piece of work.

  • by xetovss (17621) on Monday December 03, 2012 @08:31PM (#42174707) Journal

    Exactly, Moore's Law isn't a law, it is a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity where it has a near universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.

  • Re:Great potential (Score:2, Interesting)

    by Anonymous Coward on Monday December 03, 2012 @08:46PM (#42174809)

    Any code that doesn't access the same resources can be run simultaneously. Alternately, if no thread is writing to a resource, it can be shared without locking.

    Those two observations allow you to determine when code can be run together in different threads. If a method only uses 'final' or 'const' things, then you can run it. If your method only uses one isolated cluster of objects, then it can run simultaneously with another method that only uses a different isolated cluster of objects.

    Honestly, any programmer worth $120k could make something more efficient just using threads, if they thought about it. This will get you 20% of the efficiency gains without thinking of it.

  • by loneDreamer (1502073) on Monday December 03, 2012 @09:17PM (#42175065)
    Magnetic Hard drives, for instance is an example of the same curve with no transistors.
  • Re:mutable state (Score:4, Interesting)

    by betterunixthanunix (980855) on Monday December 03, 2012 @09:22PM (#42175113)
    In functional algorithms and data structures, everything is immutable (in theory) -- rather than thinking in terms of "final," think in terms of Java's String class. If you want to change one character in a String instance, you must create a new String instance. For a less trivial example, consider how a functional algorithm that removes an element from a list would work (written in Java-like syntax):

    List remove_if(List lst, Predicate pred)
    {
    if(lst == null)
    {
    return null
    }
    else if(pred.test(lst.first()))
    {
    return remove_if(lst.rest());
    }
    else
    {
    return new List(lst.first(), remove_if(lst.rest()));
    }
    }

    Notice that this constructs an entirely new list, even if none of the elements in the list pass the test. This may seem like a terrible idea, but let's put it this way: if you have 10 threads that share the list, and one of them wants to remove some nodes, you would have had to have copied the list anyway; the functional approach to remove_if is exactly what you want. Now, consider this function, which only removes the first node to match:

    List remove_first(List lst, Predicate pred)
    {
    if(lst == null)
    {
    return null;
    }
    else if(pred.test(lst.first))
    {
    return lst.rest();
    }
    else
    {
    return new List(lst.first(), remove_first(lst.rest()));
    }
    }

    Now you have a situation where lists share nodes -- and again, imagine a situation where 10 threads share the list, and one wants to perform this operation. This is one of the reasons that functional programming is so promising for parallel algorithms: you have fewer situations where explicit mutexes are needed, because you are usually copying things that you intend to modify (or more precisely, you never really modify anything).

    Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).

  • Re:Great potential (Score:3, Interesting)

    by Culture20 (968837) on Monday December 03, 2012 @09:43PM (#42175237)

    Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:

    for every bunny in list {
    bunny.hop()
    }

    And if the original intent is to make the bunnies hop in sequence like performing "the wave", making a new thread for each hop might produce a visual effect of them all hopping at once if the sleep occurred in the hop() function instead of the for loop calling the hops.

  • Re:Great potential (Score:5, Interesting)

    by Jeremi (14640) on Monday December 03, 2012 @09:55PM (#42175313) Homepage

    Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it.

    I suspect one reason they are so bad at it is they have to be very conservative in how they optimize, due to the relaxed nature of the C language. For example, if the C optimizer cannot prove beyond a shadow of a doubt that a particular memory location isn't being aliased, it can't make any assumptions about the location's value not changing at any step of the process. In practice, that means no optimizations for you.

    Given that, it would seem that the Microsoft approach (using not only the higher-level language C#, but a specially customized version of C#) gives their optimizer much greater latitude. Because the language forces the programmer to annotate his objects with readable/writable/immutable tags (think C's "const" tag, but with teeth), and because the language (presumably) doesn't allow the programmer to do sneaky low-level tricks like casting away const or aliasing pointers or pointer math, the optimizer can safely make assumptions about the code that a C or C++ optimizer could never get away with. That may allow it to be more effective than you might anticipate (or maybe not, we'll see).

  • Re:Fast First Post (Score:3, Interesting)

    by Anonymous Coward on Monday December 03, 2012 @11:03PM (#42175711)

    /casts detect troll 10 post radius. Multiprocessor technology was already established when Bill Gates was still dumpster diving for code samples, and a lot of the work that gave us modern high performance multiprocessing was done by companies like IBM, Intel (and Intel shootoffs like Sequent) back when Windows was just the latest new DOS application. At the extreme end of the spectrum, this is why all of the top 10 supercomputers run on the UNIX-alike known as Linux, and none run on a Microsoft based OS.

    I'm afraid regardless of what Microsoft would now like to present as a PR image, they've consistently been embarrassingly late to the party. Let's not belittle their contribution though; they certainly throw more money at problems than any other tech company.

  • Re:Fast First Post (Score:5, Interesting)

    by aNonnyMouseCowered (2693969) on Monday December 03, 2012 @11:13PM (#42175765)

    "MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."

    Citation need. Not disputing the first part. Just the second part about the relative size of Miscorsoft Research.

  • by ceoyoyo (59147) on Monday December 03, 2012 @11:58PM (#42175979)

    Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.

    Moore's law states that the number of components on an integrated circuit doubles approximately every two years.

  • by TheLink (130905) on Tuesday December 04, 2012 @02:25AM (#42176601) Journal

    The problem is that we are nearing the peak of what is possible with current technology in a single core

    But aren't there still plenty of things that the hardware can do to make the software stuff easier?

    Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions [wikipedia.org]

    So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.

    There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html [kegel.com]
    But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.

    I know that at least on the x86 platform checking the current time and also getting monotonic time is not as simple AND efficient as it could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) could be different from current human time, many programs need one or both.

    Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.

    On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.

    If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.

  • by chrysrobyn (106763) on Tuesday December 04, 2012 @10:20AM (#42178647)

    Except... the number of transistors in a CPU is irrelevant!

    No, it's very relevant.

    A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

    How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.

    Most of the transistors used in a CPU actually goes towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

    You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

    The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

    I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.

    Smaller transistors do help CPUs consume a little less power, but double the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

    Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980 era chips. Not even late '80s had wiring on doped silicon. Certainly the kinds of chips Moore was talking about has had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.

    That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.

    May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?

I am the wandering glitch -- catch me if you can.

Working...