Programming

Auto-threading Compiler Could Restore Moore's Law Gains

New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail of taking advantage of multiple cores — to get Moore's Law improvements back on track, after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) Around 2004, Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claim that a 'large project at Microsoft' has written 'several million lines of code' testing out the resulting 'safe parallelism.'" The paper is a good read if you're into compilers and functional programming. The key to its operation is adding permissions to reference types, allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.
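To make the summary's description concrete, here is a minimal C# sketch (not the paper's actual syntax; the Dataset class and the explicit Parallel.Invoke call are illustrative stand-ins for what the modified compiler is described as inferring automatically). Two methods that only read an object with no mutable aliases can safely be scheduled on separate cores, which is exactly the kind of fact the permission information lets the compiler prove.

    using System;
    using System.Linq;
    using System.Threading.Tasks;

    // Stand-in for an object the paper's type system would treat as immutable:
    // every reference to it is read-only, so it can be shared across threads
    // without locks.
    sealed class Dataset
    {
        private readonly int[] values;
        public Dataset(int[] values) { this.values = (int[])values.Clone(); }
        public int Sum() => values.Sum();
        public int Max() => values.Max();
    }

    static class Demo
    {
        static void Main()
        {
            var data = new Dataset(Enumerable.Range(1, 1_000_000).ToArray());

            // An auto-threading compiler would prove that Sum() and Max() only
            // read immutable state and schedule them concurrently on its own;
            // here the same decision is written out by hand with the TPL.
            int sum = 0, max = 0;
            Parallel.Invoke(
                () => sum = data.Sum(),
                () => max = data.Max());

            Console.WriteLine($"sum={sum} max={max}");
        }
    }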
This discussion has been archived. No new comments can be posted.

  • Not this shit again (Score:4, Informative)

    by Anonymous Coward on Monday December 03, 2012 @07:11PM (#42174551)

    Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

    • by xetovss ( 17621 ) on Monday December 03, 2012 @07:31PM (#42174707) Journal

Exactly, Moore's Law isn't a law, it is a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity, which has near-universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.

      • by ceoyoyo ( 59147 ) on Monday December 03, 2012 @10:58PM (#42175979)

        Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.

        Moore's law states that the number of components on an integrated circuit doubles approximately every two years.

      • by shmageggy ( 2682297 ) on Monday December 03, 2012 @11:18PM (#42176091)
        Yeah but "Moore's Heuristic" just aint as snappy.
    • by Baloroth ( 2370816 ) on Monday December 03, 2012 @07:54PM (#42174871)

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

      Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

So, if you can scale up the ability to use processor cores, then Moore's law can continue to hold for CPUs as we increase processor cores. If you can't, then it won't. It'll have to stop sometime, of course, just not necessarily today.

      • by tlhIngan ( 30335 ) <slashdot@worf.ERDOSnet minus math_god> on Tuesday December 04, 2012 @01:13AM (#42176549)

        Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4,6,8, and even 16 core systems for consumers (always been popular in supercomputers and the like). That means doubling transistor count every 2 years can be done through increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).

        Except... the number of transistors in a CPU is irrelevant!

        A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

Most of the transistors used in a CPU actually go towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

        The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

Smaller transistors do help CPUs consume a little less power, but doubling the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

        • by chrysrobyn ( 106763 ) on Tuesday December 04, 2012 @09:20AM (#42178647)

          Except... the number of transistors in a CPU is irrelevant!

          No, it's very relevant.

          A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).

          How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.

Most of the transistors used in a CPU actually go towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).

          You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.

          The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.

          I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.

Smaller transistors do help CPUs consume a little less power, but doubling the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring). It's why the caches have the most transistors yet take the smallest areas.

Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980-era chips. Not even late-'80s chips had wiring on doped silicon. Certainly the kinds of chips Moore was talking about have had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.

          That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.

          May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?

      • by TheLink ( 130905 ) on Tuesday December 04, 2012 @01:25AM (#42176601) Journal

        The problem is that we are nearing the peak of what is possible with current technology in a single core

        But aren't there still plenty of things that the hardware can do to make the software stuff easier?

        Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions [wikipedia.org]

        So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.

        There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html [kegel.com]
        But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.

I know that, at least on the x86 platform, checking the current time and getting monotonic time are not as simple AND efficient as they could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) can differ from current human (wall-clock) time, and many programs need one or both (a small sketch at the end of this comment illustrates the difference).

        Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.

        On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.

        If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.
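To illustrate the monotonic-versus-wall-clock distinction mentioned above, here is a small C# sketch (the API choices and the 250 ms figure are only illustrative): Stopwatch is backed by a monotonic high-resolution counter and is the right tool for measuring intervals, while DateTime.UtcNow reports wall-clock time and can jump whenever the clock is adjusted.

        using System;
        using System.Diagnostics;
        using System.Threading;

        static class ClockDemo
        {
            static void Main()
            {
                // Monotonic time: only ever moves forward; good for measuring durations.
                var sw = Stopwatch.StartNew();

                // Wall-clock time: what humans read; NTP or the user can move it.
                DateTime before = DateTime.UtcNow;

                Thread.Sleep(250);

                Console.WriteLine($"elapsed (monotonic):  {sw.ElapsedMilliseconds} ms");
                Console.WriteLine($"elapsed (wall clock): {(DateTime.UtcNow - before).TotalMilliseconds:F0} ms");
            }
        }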

    • by Jonner ( 189691 ) on Monday December 03, 2012 @07:56PM (#42174893)

      Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.

Try reading the headline again. Usually, clueless posters have to be reminded to read TFA, but this is ridiculous. As Moore's "law" continues to be mostly true, the added transistors are being used for more cores rather than to make one core faster. Most of the cores sit idle most of the time because few programs can use more than one of them at once.

    • by c0lo ( 1497653 ) on Monday December 03, 2012 @08:22PM (#42175117)

      Apparently the real geeks left Slashdot ages ago.

      Casted to void?

  • Moore's law is only about the number of transistors on integrated circuits.
  • by databeast ( 19718 ) on Monday December 03, 2012 @07:22PM (#42174639) Homepage

    Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but "This uses threading, I don't know this!".. Of course, looking at their resumes, a long list of single threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?

An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's what the poor SoB who has to maintain it is for...)

    • by lennier ( 44736 ) on Monday December 03, 2012 @07:42PM (#42174781) Homepage

      Or, gawd forbid.. we could teach programmers how to use threading?

      That's easy: "Don't."

      From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).

      The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.

• There are plenty of threading frameworks in most languages where you can just define threadable operations (see the sketch below). Your job is simply to ensure the task is correct, use the right framework, and trust it to the correct degree. As with many things, someone only needs to do it right once.
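        As one concrete example, here is a minimal C# sketch using the Task Parallel Library (the choice of framework and the toy workload are mine): the programmer only declares the per-element operation, and the framework takes care of partitioning, thread management, and joining.

        using System;
        using System.Linq;
        using System.Threading.Tasks;

        static class FrameworkDemo
        {
            static void Main()
            {
                double[] input = Enumerable.Range(1, 10_000).Select(i => (double)i).ToArray();
                double[] output = new double[input.Length];

                // The "threadable operation" is just the loop body; the framework
                // decides how to split the index range across cores. Each iteration
                // writes only its own slot, so there is nothing to lock.
                Parallel.For(0, input.Length, i =>
                {
                    output[i] = Math.Sqrt(input[i]) * Math.Log(input[i] + 1);
                });

                Console.WriteLine($"sum = {output.Sum():F2}");
            }
        }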

      • It is possible to prove a multi-threaded program correct.

First you might start at the memory model, and all the guarantees that one can make based on that (and, by association, all the guarantees in terms of the memory model that locking / atomic calls make), then one can move on to the library structures and routines (objects and methods, to put it in another paradigm).

        Once you have hard guarantees using primitives and low level constructs it might be easy to construct a state-based proof. One example is Cli

    • Goto is swell also. Just be sure not to make any mistakes!
    • I wrote up a very long comment that ranted and raved and didn't come close to this comment right here. Some threading problems are hard, really hard. They aren't that common.

      Man up and learn. I like it.

      • Some threading problems are hard, really hard. They aren't that common.

        Which is why compilers should be handling as many threading problems as they can: so that rather than spending time and mental effort dealing with problems that are not so hard, we can spend time and mental effort on problems that compilers cannot solve for us. There is a reason that we use higher level languages, and that reason is not "laziness."

    • Or, gawd forbid.. we could teach programmers how to use threading?

      No, we want our compilers to do these things for us, because things that compilers can do they usually do better than humans can. Once your programs become large enough and complex enough, compilers outperform humans every time -- it's not about laziness, it is about the limits of human programming ability.

      Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.

That is because of knowledge of processor details, memory details, complex operations, and, well, they [compilers] are better than us. Except that the ability to optimise depends on pure logic, of some form. As the state gets large, optimization gets more complex.

Just as quicksort(3) is far faster than bubblesort, so too is highly threadable code faster than non-threadable code. Languages do not, contrary to belief, express intent; they provide a strict set of instructions that the computer MUST respect. In

        • by betterunixthanunix ( 980855 ) on Monday December 03, 2012 @09:19PM (#42175437)

Just as quicksort(3) is far faster than bubblesort, so too is highly threadable code faster than non-threadable code

First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, up to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know whether every program can even be sped up by parallelizing it; see the NC-vs-P question:

          https://en.wikipedia.org/wiki/P-complete [wikipedia.org]

          Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.

Languages do not, contrary to belief, express intent; they provide a strict set of instructions that the computer MUST respect

Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question than giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.

          In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)

          If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:

          http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf [cr.yp.to]

          Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:

          https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm [wikipedia.org]

          10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:

          https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm [wikipedia.org]

          Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.

          So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.
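          To make the "switch below a certain problem size" point concrete, here is a small C# sketch of a hybrid sort (the cutoff of 16 is illustrative, not tuned): an asymptotically worse insertion sort handles the small subarrays because its constant factors beat quicksort's partition-and-recurse overhead at that scale.

          using System;

          static class HybridSort
          {
              const int Cutoff = 16; // illustrative threshold, not a tuned value

              public static void Sort(int[] a) => Sort(a, 0, a.Length - 1);

              static void Sort(int[] a, int lo, int hi)
              {
                  // Below the cutoff, O(n^2) insertion sort wins on constant factors.
                  if (hi - lo < Cutoff)
                  {
                      InsertionSort(a, lo, hi);
                      return;
                  }
                  int p = Partition(a, lo, hi);
                  Sort(a, lo, p - 1);
                  Sort(a, p + 1, hi);
              }

              static void InsertionSort(int[] a, int lo, int hi)
              {
                  for (int i = lo + 1; i <= hi; i++)
                  {
                      int key = a[i], j = i - 1;
                      while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
                      a[j + 1] = key;
                  }
              }

              static int Partition(int[] a, int lo, int hi)
              {
                  // Lomuto partition around the last element.
                  int pivot = a[hi], i = lo - 1;
                  for (int j = lo; j < hi; j++)
                      if (a[j] <= pivot) { i++; (a[i], a[j]) = (a[j], a[i]); }
                  (a[i + 1], a[hi]) = (a[hi], a[i + 1]);
                  return i + 1;
              }

              static void Main()
              {
                  var rng = new Random(42);
                  int[] data = new int[1000];
                  for (int i = 0; i < data.Length; i++) data[i] = rng.Next(10_000);
                  Sort(data);
                  bool sorted = true;
                  for (int i = 1; i < data.Length; i++) if (data[i - 1] > data[i]) sorted = false;
                  Console.WriteLine($"sorted: {sorted}");
              }
          }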

    • by Hentes ( 2461350 )

      Most parallel problems tend to fall into a small set of categories, and I see no problem with abstracting them away. There are many libraries already that handle the threading for you.

    • by Jonner ( 189691 ) on Monday December 03, 2012 @08:08PM (#42175001)

An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's what the poor SoB who has to maintain it is for...)

Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good nor bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptable level of performance and correctness, it is an appropriate abstraction.

      To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.

    • I've spent a lot of time in the earlier part of my career writing multithreaded applications (mostly client/server stuff) and I got very good at it. That said, it's a complete waste of my time as a developer. Developers should focus on domain problems, not how to tweak the microprocessor. In my experience, most developers don't seem to ever fully get how to write multithread apps and threading problems are a royal pain in the ass to debug. I can't see a better candidate for automation than taking this out o

  • by HuguesT ( 84078 ) on Monday December 03, 2012 @07:24PM (#42174651)

    OK, so now we shall have another way to produce parallel programs.

    Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code [futurechips.org].

    • by godrik ( 1287354 )

Exactly, I am doing parallel programming and HPC for a living. I do not believe in compiler automagic^Wautomatic parallelisation. Typing is only a small part of the problem. Most algorithms need to be written VERY differently to be able to exploit the parallelism within. Sometimes, you actually need to have variable concurrency and race conditions within your algorithm, but you have ways to fix problems whenever they appear.

      In brief, I do not think that compilers will be able to do anything more than quite

  • mutable state (Score:5, Informative)

    by jbolden ( 176878 ) on Monday December 03, 2012 @07:31PM (#42174703) Homepage

Mainstream languages have mutable state all over the code. Functional programming's big change on state issues is to carefully isolate state. The Microsoft approach means that state needs to be tracked carefully so that it can be isolated by the compiler even if it isn't isolated by the code, which is likely just as much work as isolating state yourself. And the nice thing about isolating state is that once you do it you can make use of all sorts of incredibly powerful functional paradigms like first-class functions (closures, partial application...) and laziness (infinite data structures, no need to figure out the proper order of evaluation...)

The solution to parallelism is functional programming. And no, it is not too hard. Excel is a functional programming language that lots of people know, and it does a great job of isolating state.
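    To ground the "isolate state, then lean on first-class functions and laziness" point in a mainstream language, here is a small C# sketch (the example is mine, not the poster's): an immutable record, a closure over read-only data, and a lazily evaluated, conceptually infinite sequence.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class FunctionalDemo
    {
        // Immutable value: no code can mutate a Point after construction.
        sealed record Point(double X, double Y);

        // A lazy, conceptually infinite sequence; elements are produced on demand.
        static IEnumerable<int> Naturals()
        {
            for (int i = 0; ; i++) yield return i;
        }

        static void Main()
        {
            Point origin = new Point(0, 0);

            // First-class function: a closure capturing the immutable 'origin'.
            Func<Point, double> distanceFromOrigin =
                p => Math.Sqrt(Math.Pow(p.X - origin.X, 2) + Math.Pow(p.Y - origin.Y, 2));

            // Laziness: nothing is computed until Take/Sum below enumerates it.
            var distances = Naturals().Select(i => distanceFromOrigin(new Point(i, i)));

            Console.WriteLine($"first 5 distances summed: {distances.Take(5).Sum():F3}");
        }
    }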

    • by readin ( 838620 )
I wish I knew more about functional programming than I do. In reading the article I found the concept of a language that limits mutable variables interesting. In using Java I find that making most variables "final" is helpful to me for a number of reasons. I can easily find where the variable got its current value. If I write a line of code that changes the wrong variable it is immediately obvious. It keeps me from re-using variables that shouldn't be re-used (generally a variable should have one meanin
      • Re:mutable state (Score:4, Interesting)

        by betterunixthanunix ( 980855 ) on Monday December 03, 2012 @08:22PM (#42175113)
        In functional algorithms and data structures, everything is immutable (in theory) -- rather than thinking in terms of "final," think in terms of Java's String class. If you want to change one character in a String instance, you must create a new String instance. For a less trivial example, consider how a functional algorithm that removes an element from a list would work (written in Java-like syntax):

        List remove_if(List lst, Predicate pred)
        {
            if (lst == null)
            {
                return null;
            }
            else if (pred.test(lst.first()))
            {
                // Element matches: drop it and keep filtering the rest.
                return remove_if(lst.rest(), pred);
            }
            else
            {
                // Element stays: rebuild the node on top of the filtered tail.
                return new List(lst.first(), remove_if(lst.rest(), pred));
            }
        }

Notice that this constructs an entirely new list, even if none of the elements in the list pass the test. This may seem like a terrible idea, but let's put it this way: if you have 10 threads that share the list, and one of them wants to remove some nodes, you would have had to copy the list anyway; the functional approach to remove_if is exactly what you want. Now, consider this function, which only removes the first node to match:

        List remove_first(List lst, Predicate pred)
        {
            if (lst == null)
            {
                return null;
            }
            else if (pred.test(lst.first()))
            {
                // First match found: the result simply shares the remaining tail.
                return lst.rest();
            }
            else
            {
                return new List(lst.first(), remove_first(lst.rest(), pred));
            }
        }

        Now you have a situation where lists share nodes -- and again, imagine a situation where 10 threads share the list, and one wants to perform this operation. This is one of the reasons that functional programming is so promising for parallel algorithms: you have fewer situations where explicit mutexes are needed, because you are usually copying things that you intend to modify (or more precisely, you never really modify anything).

        Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).

The benefits of using functional languages are realized in terms of program safety and a lack of defects -- not necessarily performance. I think we are all aware of the fact that there are plenty of programmers out there who care very little about introducing a few bugs for the sake of speed and looking clever.

      But even if you just use immutable data and message passing as IPC under a procedural language you are headed in the right direction. It's really a mindset and even the procedural programming folks don

    • All mainstream languages are slowly gravitating towards FP-style - C# is already halfway there, Java is on its way to join it, C++ STL is functional in spirit if not in implementation, and of course we have new languages like Scala that are explicitly designed to marry the old and the new.

      Problem is, for transparent parallelism, you need to go 100% functional / immutable, and that's just not happening. What does happen is isolating distinct units in an OOP program that are then implemented functional-style

      • by chthon ( 580889 )

        Problem is, for transparent parallelism, you need to go 100% functional / immutable

        Which comes down to re-educating programmers to learn to think about their algorithms.

One book which does this nicely, without going too deep into theoretical computer science, is How to Design Programs [htdp.org]

  • by DaneM ( 810927 ) on Monday December 03, 2012 @07:34PM (#42174733)

    About time.

  • by 140Mandak262Jamuna ( 970587 ) on Monday December 03, 2012 @08:30PM (#42175151) Journal
All these days of careful coding diligently avoiding static_casts and const_casts ... All those recompilations triggered because I had to touch one header file used by everyone and his brother to satisfy my strict insistence on const correctness... I paid and paid and paid all these days. I avoided mutable data members because Stroustrup pontificated in some vague posting in comp.lang.c++ (OMG I even forgot the usenet group name!) that "a well constructed object oriented code should not require mutables".

    This is my pay day baby! My code is going to run super fast in multicore machines without any (more) extra work from me! Wooo Hooo!

Please take pity on me and let me enjoy my daydream for a few hours, without rudely inserting reality into my reverie! Thanks slashdotters, I know I can count on you!

  • by Culture20 ( 968837 ) on Monday December 03, 2012 @08:35PM (#42175187)
    The holy grail of parallel processing is teaching programmers how to handle parallel processing (and what domains can benefit and where).
  • by segfault_0 ( 181690 ) on Monday December 03, 2012 @08:35PM (#42175193)

A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well. Also, I can think of many situations in which such functionality may not be a good idea at all. I would assume such a system would use a pool of threads to avoid constant thread-construction overhead, and if a few I/O-bound actions get threaded out naively, I think you will find your program blocking a lot more, and producing a lot less throughput, than it should (see the sketch at the end of this comment).

Also, the OP blithely stated that functional programming isn't happening -- yet features of the functional paradigm, like anonymous functions, have made their way into nearly every language of consequence, and many new languages proudly tout functional programming features (see f#, scala, rust, clojure, and go). Perhaps pure functional programming is moving pretty slowly, but its features are not.
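    To illustrate the blocking concern raised above, here is a minimal C# sketch (the Thread.Sleep and Task.Delay calls simulate I/O; the counts and durations are illustrative): synchronous waits queued to the thread pool each occupy a worker thread for their full duration, so concurrency is capped by the pool size, while awaited asynchronous operations hold no thread while they are pending.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    static class IoDemo
    {
        static async Task Main()
        {
            const int jobs = 16;
            var sw = Stopwatch.StartNew();

            // Blocking style: each simulated I/O (Thread.Sleep) occupies a thread-pool
            // worker for its whole duration, so concurrency is limited by pool size.
            await Task.WhenAll(Enumerable.Range(0, jobs)
                .Select(_ => Task.Run(() => Thread.Sleep(200))));
            Console.WriteLine($"blocking batch: {sw.ElapsedMilliseconds} ms");

            sw.Restart();

            // Async style: Task.Delay holds no thread while waiting, so all the
            // waits overlap regardless of how many pool threads exist.
            await Task.WhenAll(Enumerable.Range(0, jobs)
                .Select(_ => Task.Delay(200)));
            Console.WriteLine($"async batch:    {sw.ElapsedMilliseconds} ms");
        }
    }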

  • by Animats ( 122034 ) on Monday December 03, 2012 @08:49PM (#42175273) Homepage

Automatically parallelizing compilers aren't new. SGI had one for C ten years ago. Intel has it for C++ and Fortran now. [intel.com] It's been done for Matlab. There's plenty of theory [wikipedia.org].

    Outside of the supercomputing community, it's never gone mainstream. Microsoft is at least trying to do it for C#, which is a reasonable language. An interesting point is that this Microsoft approach exploits immutable objects. Immutable objects can be freely shared without locking, so wide use of immutable objects makes automatic extraction of parallelism easier.

    I'd looked at doing something similar for Python, to get rid of the Global Interpreter Lock, Python's boat-anchor. Python already makes wide use of immutable objects, but doesn't gain any performance from them. If everything is thread-local, immutable, or synchronized in the Java sense, you don't need global locking. But the politics of the Python community do not favor performance. (Potentially, Python could be up to at least the LISP level of performance, within a factor of 2 of C, if the language was restricted in certain ways.)

Another part of the problem is the pernicious heritage of POSIX/Linux locking primitives. Many programmers think that locking is a library or OS-level problem, not a language-level problem. Locking via library primitives means the language doesn't know what data is protected by which lock. This makes optimization and race checking very tough.

    The political and social problems here are tougher than the technical ones. So the open source community can't solve them. It takes someone with a big hammer, like Microsoft, to do it.

    "Who tells whom what to do?" - V. Lenin

    • Re: (Score:3, Informative)

      by whistl ( 234824 )

      SGI's automatic parallelizing software came from Kuck and Associates, Inc (kai.com). I worked there for 8-1/2 years, and one disappointing fact we learned was that the only people who really cared enough about parallelizing their software to analyze their code and modify the source to make it faster were either research scientists (of which there were relatively few) who mostly wanted quicker and cheaper results (because renting time on supercomputers costs $$) or marketing departments of computer hardware

  • by citizenr ( 871508 ) on Monday December 03, 2012 @10:19PM (#42175791) Homepage

Sounds like the user provides a list of structural/control/data hazards, and the compiler pipelines code into blocks that can be run in parallel.
    Sounds familiar :)

  • by 14erCleaner ( 745600 ) <FourteenerCleaner@yahoo.com> on Monday December 03, 2012 @10:20PM (#42175795) Homepage Journal
    From skimming their paper, it doesn't appear that they get any real speedup from their parallelism. This is apparent because they state, in the part about the millions of lines of code written in their language, that

    These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks

    Translation: we didn't speed them up any, or at least not by enough that we care to share any number.

Amdahl's Law [wikipedia.org] is difficult to overcome in auto-parallelizing systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.
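    A quick worked example of the bound mentioned above (the 50% figure comes from the comment; the rest is just Amdahl's formula): with parallelizable fraction p and n cores, the predicted speedup is 1 / ((1 - p) + p / n), so p = 0.5 tops out at 2x no matter how many cores you add. A small C# sketch:

    using System;

    static class Amdahl
    {
        // Speedup predicted by Amdahl's law for parallel fraction p on n cores.
        static double Speedup(double p, int n) => 1.0 / ((1.0 - p) + p / n);

        static void Main()
        {
            foreach (int cores in new[] { 2, 4, 8, 64, 1024 })
                Console.WriteLine($"p = 0.5, {cores,4} cores -> {Speedup(0.5, cores):F3}x speedup");
            // The values approach, but never reach, 2.0x.
        }
    }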
