Auto-threading Compiler Could Restore Moore's Law Gains
New submitter Nemo the Magnificent writes "Develop in the Cloud has news about what might be a breakthrough out of Microsoft Research. A team there wrote a paper (PDF), now accepted for publication at OOPSLA, that describes how to teach a compiler to auto-thread a program that was written single-threaded in a conventional language like C#. This is the holy grail of taking advantage of multiple cores, the way to get Moore's Law improvements back on track after they essentially ran aground in the last decade. (Functional programming, the other great hope, just isn't happening.) Around 2004, Intel et al. ran into a wall and started packing multiple cores into chips instead of cranking the clock speed. The Microsoft team modified a C# compiler to use the new technique, and claims a 'large project at Microsoft' has written 'several million lines of code' testing out the resulting 'safe parallelism.'"
The paper is a good read if you're into compilers and functional programming. The key to its operation is adding permissions to reference types, allowing you to declare normal references, read-only references to mutable objects, references to globally immutable objects, and references to isolated clusters of objects. With that information, the compiler is able to prove that chunks of code can safely be run in parallel. Unlike many other approaches, it doesn't require that your program be purely functional either.
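To get a feel for the idea, here is a rough C# sketch (our illustration, not the paper's syntax): the permission of each reference is noted in comments, and the Parallel.For call stands in for the code the compiler itself would generate once it has proved the loop iterations independent.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Document
{
    // Globally immutable after construction: any number of threads may read it.
    public string Text { get; }
    public Document(string text) { Text = text; }
}

static class WordCounter
{
    // 'doc' is only ever read (a read-only reference); 'count' is isolated, task-local state.
    static int Count(Document doc)
    {
        int count = 0;
        foreach (var token in doc.Text.Split(' '))
            if (token.Length > 0) count++;
        return count;
    }

    static void Main()
    {
        var docs = new List<Document> { new Document("hello world"), new Document("safe parallelism") };
        var results = new int[docs.Count];

        // What the programmer writes (sequential):
        //     for (int i = 0; i < docs.Count; i++) results[i] = Count(docs[i]);
        // What an auto-threading compiler could emit instead, because Count() reads
        // only immutable data and each iteration writes a distinct slot of 'results':
        Parallel.For(0, docs.Count, i => results[i] = Count(docs[i]));

        Console.WriteLine(string.Join(", ", results));
    }
}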
Not this shit again (Score:4, Informative)
Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Re:Not this shit again (Score:4, Interesting)
Exactly: Moore's Law isn't a law, it's a marketing plan. I don't see why so many people get so serious about it. A real law (of science) would be something like the law of gravity, which has near-universal application, whereas Moore's Law is a "law" that describes Intel's marketing plan.
Re:Not this shit again (Score:5, Interesting)
Moore's law was coined by an engineer to describe a series of observations. That is, it's a mathematical function that seems to fit some data, without any explanatory power. Just like various other "laws" such as the laws of thermodynamics, and, your favourite, Newton's laws, including his law of universal gravitation.
Moore's law states that the number of components on an integrated circuit doubles approximately every two years.
Re: (Score:3)
We also have the law of conservation of mass. Oh yeah, we had a nice demonstration of breaking that one in 1945. Then there's Kepler's laws of planetary motion: the first and second are pretty close, but not quite true (even if you're not looking at Mercury) and the third was only a statement of proportionality until it was refined later.
Notice that all the physics "laws" originate before the 20th century. They were exactly as I described: statements, preferably mathematical, that fit the data of the time.
Re:Not this shit again (Score:5, Informative)
Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Yes and no. Moore's law states that the number of transistors will double every 2 years. The problem is that we are nearing the peak of what is possible with current technology in a single core, hence all the focus on 4-, 6-, 8-, and even 16-core systems for consumers (multi-core has always been popular in supercomputers and the like). That means doubling the transistor count every 2 years can be done by increasing cores... but there is no point in doing that if programs can only use a few of them (very few consumers now need 4 cores, much less 8 or 16).
So, if you can scale up the ability to use processor cores, then Moore's law can continue to hold for CPUs as we increase processor cores. If you can't, then it won't. It'll have to stop sometime, of course, just not necessarily today.
Re:Not this shit again (Score:5, Informative)
Except... the number of transistors in a CPU is irrelevant!
A CPU doesn't have the transistor density that really benefits much from Moore's Law - because the vast majority of the space on a chip is not taken up by transistors, but by wiring. In fact, the wiring density is what's limiting transistor density (a good thing - larger transistors can give you better performance because they can drive the longer wires quicker).
Most of the transistors used in a CPU actually go towards the cache - when you're talking about 16+ MB of pure L1/L2/L3 cache, implemented as 6T SRAM cells, that's 100M transistors right there (and that doesn't include the cache line tag logic and CAM).
The thing with the highest transistor density (and thus the most benefit of Moore's Law) is actually memory structures - caches, DRAM, SRAM, flash memory, etc. This is where each transistor is vital to memory storage and packing them in close means more storage is available, in which case Moore's law states that RAM etc. will double in capacity or halve in cost every 18 months or so.
Smaller transistors do help CPUs consume a little less power, but doubling the number of transistors doesn't do a whole lot because there's a lot of empty space that the wiring forces to be transistor-free. (Non-memory parts of the CPU are effectively "random logic" where there's no rhyme or reason to the wiring.) It's why the caches have the most transistors yet take the smallest areas.
Re:Not this shit again (Score:5, Interesting)
No, it's very relevant.
How much wiring happens on doped silicon? None. The vast majority of the chip is covered in transistors, with 6-10 levels of wires on top of them. There are some designs where the I/O count demands so many pins that's what dictates the size of the chip -- so cache is filled in underneath. Heck, if your power budget allows it, you're already blowing the silicon area anyway, might as well increase your cache size! Consider your recent Core derived designs. Take away half the cache. Do you think the die area would go down? Not hardly.
You did the math right, but the cache line tag logic and coupled CAM are negligible. Sure, they may add a few million or so, but not anywhere near 5% of 100M.
I realize it's vogue for people to revisit Moore's Law and rewrite it every few years, but he was not speaking specifically about memory arrays. In fact, the chips Moore had access to at the time had very little memory on them.
Wiring never forces silicon area to be transistor-free, unless you're thinking of 1980-era chips. Not even late-'80s chips had wiring on doped silicon. Certainly the kinds of chips Moore was talking about have had no significant wiring on doped silicon in 20 years, the exceptions being only when layout designers are getting lazy. I've done layout design, I've done circuit design, and I've audited dozens of chip layouts and seen several technology manuals dating back to the 90s.
That random logic, by the way, is the subject of the most innovation in the field of chip layout and arguably in all of chip design. When your chip's entire goal is to funnel data through different units and do different things to it, you're dominated by buses. Automated tools often do split these buses up, but different algorithms can pull them together and make them more efficient. Caches are the smallest because they can be small. There's an entire periphery to them, including senseamps devoted to reading the baby FETs that can't make full rail to rail swings on the bitlines.
May I guess you're a student? Perhaps one who is learning from a professor who hasn't been in the industry since about 1985?
Re:Not this shit again (Score:4, Funny)
In fact, the chips Moore had access to at the time had very little memory on them
Well of course! That was a lot of 18 monthses ago!
Re:Not this shit again (Score:5, Interesting)
The problem is that we are nearing the peak of what is possible with current technology in a single core
But aren't there still plenty of things that the hardware can do to make the software stuff easier?
Intel has actually started adding some stuff to help: http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions [wikipedia.org]
So maybe Intel, AMD should interact more with the OS and compiler bunch and figure out how to use those billions of transistors in more useful ways.
There are things you can do in software to address the c10k problem (and similar problems): http://www.kegel.com/c10k.html [kegel.com]
But I'm sure there are things you can do in hardware to make stuff even easier. It's not like the stuff that's being done is the best way of doing things. Furthermore the OS and apps people might be writing things in certain ways because certain things aren't done by the hardware or done fast.
I know that at least on the x86 platform checking the current time and also getting monotonic time is not as simple AND efficient as it could be. It was even worse before HPET, and now even HPET isn't that great. Monotonic system time (ticks) could be different from current human time, many programs need one or both.
Same goes for scheduling stuff, serializing stuff and getting stuff to trigger on arbitrary things all with minimal overhead. I suspect many things would be easier if you can create a way of having cluster wide consistent monotonic high-res time, and also fast low latency clusterwide locking.
On a related note many programming languages seem to like to push parameters onto a stack. That's fine but in many implementations they seem to push the data onto the code/call stack (which holds return addresses). This mixing of data and code stuff is unsafe. They should have separate data and code stacks. That way even if a hacker messes with the data, it's harder to execute arbitrary code- you'd just execute the same code but with mangled data, which is more likely to produce errors instead of arbitrary code execution.
If the reason for doing such insecure stuff is performance or convenience, then perhaps Intel etc can use those transistors to make it faster and easier to do things safer.
Re:Not this shit again (Score:5, Insightful)
Moore's law has nothing to do with performance, numbnuts. Apparently the real geeks left Slashdot ages ago.
Try reading the headline again. Usually, clueless posters have to be reminded to read TFA, but this is ridiculous. As Moore's "law" continues to be mostly true, the added transistors are being used for extra cores rather than to make one core faster. Most of the cores sit idle most of the time because few programs can use more than one of them at once.
Re:Not this shit again (Score:5, Insightful)
Maybe you need to read?
Here's the headline (emphasis mine): Auto-threading Compiler Could Restore Moore's Law Gains
In the past, adding more transistors to a core meant the core worked faster (a simplification), so Moore's law indirectly led to better performance. Now a core is close to as fast/efficient as it's going to get, so we throw the extra transistors at new cores (still Moore's law). The problem is, there are only so many single-threaded processes a person can run before there will literally be one core per process. In order to see benefits from the extra transistors again ("gains" in the summary's terminology), we need a way to take advantage of them (this new compiler technique, functional programming, programmers who can actually grok threads). In the end, if we're not seeing some kind of gain from Moore's law, then the chip manufacturers will stop adding new transistors because no one will pay money for a chip that's just as good as their current chip, and Moore's law will fail.
Re:Not this shit again (Score:5, Funny)
Apparently the real geeks left Slashdot ages ago.
Casted to void?
I hate when people misuse Moore's law (Score:2, Informative)
Re: (Score:3, Informative)
You do realize that all the things you listed are a side effect of the doubling of transistor density - and not some separate independent phenomenon?
Re: (Score:3)
"resolution of brain scanning technology"
Oh, that would be fantastic. Unfortunately it's just not true. Physics intervenes. Well, that and not making patients smoke (because they're on fire, not because they crave nicotine).
Re:I hate when people misuse Moore's law (Score:5, Funny)
No no, you misunderstand. They're teaching the compiler to be so efficient that it actually starts creating new transistors to make itself smarter.
Make no mistake, this is how the rise of the machines will start.
Or.. teach devs to use threading as appropriate? (Score:5, Informative)
Or, gawd forbid.. we could teach programmers how to use threading? I am a casual developer, with no formal training beyond writing practical code and reading as much as I can from the keyboards of real developers. I've run into my fair share of "real", "professional" developers who've been tasked to work on my code and thrown their hands up in defeat - not, as I feared, because of the awfulness of my coding style, but because "This uses threading, I don't know this!".. Of course, looking at their resumes: a long list of single-threaded webapps where the actual threading is handled for them by the webserver itself.. I give up. Even some basic native GUI client development should teach developers simple threading and asynchronous callbacks? Alas, yet another talent abandoned in the age of the webapp. Is it any wonder the security issues (my actual realm of supposed 'expertise') in their code often make the architectural issues look trivial in comparison?
An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's what the poor SoB who has to maintain it is for...)
Re:Or.. teach devs to use threading as appropriate (Score:5, Insightful)
Or, gawd forbid.. we could teach programmers how to use threading?
That's easy: "Don't."
From everything I've read about threading, there's no general way a hand-coded multithreaded program can ever be proven to be correct. Threads introduce all sorts of extremely subtle timing-based logic and security bugs which even the smartest programmers in the world think they can handle but in practice don't. And most programmers are not the smartest programmers in the world (they only think they are).
The correct solution is to switch from C-based imperative languages to pure-functional implicitly parallelised languages, but that's not likely to happen before the heat death of the universe.
Re: (Score:2)
There are plenty of threading frameworks in most languages where you can just define threadable operations. Your job is simply to ensure the task is correct, use the right framework and trust it to the correct degree. As with many things, someone only needs to do it right once.
Re: (Score:2)
It is possible to prove a multi-threaded program correct.
First you might start at the memory model, and all the guarantees that one can make based on that (and, by association, all the guarantees in terms of the memory model that locking / atomic calls make), then one can move on to the library structures and routines (objects and methods, to put it in another paradigm).
Once you have hard guarantees using primitives and low level constructs it might be easy to construct a state-based proof. One example is Cli
Re: (Score:3)
This is totally true.
So long as you define trivial programs to be a subset of programs that can be proven correct.
Re: (Score:3, Insightful)
L4 isn't exactly trivial, but its proof exceeds the length of its code by whole factors.
Re:Or.. teach devs to use threading as appropriate (Score:4, Insightful)
Actually you don't need functional. Java works fine with multi-threading (well, it does for my daily use). In fact, that is the real strength of the garbage collection model of Java: it can track and manage efficient release of resources in a multi-threaded system. Also, Java has direct language support for synchronization mechanisms and extensive library support for multi-thread-friendly data structures (Doug Lea is a legend!). Lots and lots of multi-threaded programs get written in Java without needing functional languages; it's just that when we do write them they do work (after we test for various possible thread-related problems), work well, and we don't have to bitch about needing to change paradigms to get things going (e.g. needing to change to functional languages when they are not always the best fit for many classes of problem).
Of course you'll hear whinging from those still stuck on legacy languages designed without multi-threading in mind. That doesn't mean there isn't a straightforward, easy-to-learn language and development platform out there that is strong on multi-threading. There is - and it is cross-platform and general purpose too.
Re: (Score:3)
someObj.someMethod(if(x == 2) { 5; } else { 10; });
Aarrrrghhh!!!!!111 Surely you would never *teach* students such monstrosities. The idea is to keep it simple, one statement or side effect per line of code. This helps you when you wave your debugger over the code, since it will automagically highlight the result of the statement. This also helps the people reading your program (and source code is meant for people firstly). You are trying to write Java in a non-Java style; no wonder the code you gave is so hideous. Of course, the Java compiler is
Re: (Score:3)
The idea is to keep it simple, one statement or side effect per line of code
Except that you are then going to either (a) duplicate code or (b) have to introduce and keep track of more variables. In what way is that simple?
You are trying to write Java in a non-Java style
Perhaps I was not clear: that was a mistake that one of my students made. They did not know Java, they were still trying to learn it, and the point is that the Java style of programming is very unnatural. "Call f with 5 if x is 2 or else 10" is how the student was thinking about the program; they were forced to think about it instead as "if x is 2 call f with 5, otherwise call f with 10".
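For what it's worth, the idiomatic middle ground (the syntax is the same in Java and C#) gives you both: a single call site, and a single named intermediate if you want one per-line side effect for the debugger. A quick sketch; SomeMethod here is just a stand-in for the someObj.someMethod of the example, assuming it takes an int:

using System;

class Demo
{
    // Stand-in for someObj.someMethod from the example above.
    static void SomeMethod(int n) => Console.WriteLine(n);

    static void Main()
    {
        int x = 2;

        // Inline conditional expression: one call site, nothing duplicated.
        SomeMethod(x == 2 ? 5 : 10);

        // Or, if you want one side effect per line for the debugger's sake,
        // a single named intermediate instead of duplicating the call:
        int arg = (x == 2) ? 5 : 10;
        SomeMethod(arg);
    }
}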
Re: (Score:3)
Bingo.. and as many other folks have pointed out, 90% of threading is purely to decrease latency and bypass blocking operations - very few applications out there today are heavily dependent on concurrency to achieve any kind of raw horsepower... This does just seem like something that, if it were to become mainstream, would become a crutch for developers to ignore threading and blocking I/O issues entirely, because "The compiler will sort that out".. In theory, this isn't necessarily a bad thing. In theory, all th
Re:Or.. teach devs to use threading as appropriate (Score:5, Informative)
What about memory alignment and packing? Calling conventions? Near pointers and far pointers (OK maybe that one is a bit dated)? Most programmers do not think about these problems, because most programmers are trying to solve bigger problems and need to spend time thinking about the bigger picture. Getting mired down in low level details makes programmers less productive -- that is why we use programming languages.
So if we can have compilers that can automatically utilize multithreading, even if it is in a somewhat limited context, we should be happy. If you are happy to not be spending time thinking about padding data or allocating registers, you should be happy about automatic multithreading.
Re: (Score:2)
I wrote up a very long comment that ranted and raved and didn't come close to this comment right here. Some threading problems are hard, really hard. They aren't that common.
Man up and learn. I like it.
Re: (Score:3)
Some threading problems are hard, really hard. They aren't that common.
Which is why compilers should be handling as many threading problems as they can: so that rather than spending time and mental effort dealing with problems that are not so hard, we can spend time and mental effort on problems that compilers cannot solve for us. There is a reason that we use higher level languages, and that reason is not "laziness."
Re: (Score:3)
Or, gawd forbid.. we could teach programmers how to use threading?
No, we want our compilers to do these things for us, because things that compilers can do they usually do better than humans can. Once your programs become large enough and complex enough, compilers outperform humans every time -- it's not about laziness, it is about the limits of human programming ability.
Compilers surpassed humans when it comes to optimization a very long time ago, except for very small programs or very short inner loop bodies.
Re: (Score:2)
That is because of knowledge of processor details, memory details, complex operations, and, well, they [compilers] are better than us. Except that the ability to optimise depends on pure logic, of some form. As the state gets large, optimization gets more complex.
Just like quicksort(3) is far faster than bubblesort, so too is highly threadable code faster than non-threadable code. Languages do not, contrary to belief, express intent; they provide a strict set of instructions that the computer MUST respect. In
Re:Or.. teach devs to use threading as appropriate (Score:5, Insightful)
Just like quicksort(3) is far faster than bubblesort, so too is highly threadable code faster than non-threadable code
First, just to be pedantic, I'll point out that quicksort is as bad as bubblesort in the worst case, up to a constant factor (you should have picked heapsort or mergesort). That aside, it is worth noting (and I am somewhat bothered by this when it comes to TFA) that we still do not know whether every program can even be sped up by parallelizing it; see the NC-vs-P question:
https://en.wikipedia.org/wiki/P-complete [wikipedia.org]
Multithreading is not a magic bullet, and in all likelihood it is not generally applicable.
Languages do not, contrary to belief, express intent; they provide a strict set of instructions that the computer MUST respect
Wrong on all counts. Imperative languages are a way to convey instructions to a computer; declarative languages do not convey instructions, and programming in a declarative language requires an entirely different mode of thinking (it is closer to asking a question than giving instructions). It is also not strictly necessary for the computer to do exactly what a program expresses; there has been some work on compiler optimizations that have a (tunable) chance of not maintaining soundness.
In the end a good algorithm with no compiler help will beat optimized "dumb" code in all cases larger than "toy" (say, a few dozen "n" in Big-O notation)
If you are convinced of this, try implementing something more complex than the algorithms you see in Knuth's books; say, this:
http://eurocrypt2010rump.cr.yp.to/9854ad3cab48983f7c2c5a2258e27717.pdf [cr.yp.to]
Then ask yourself this: could the constant factors in your implementation be better? At the end of the day, big constant factors will hurt your performance so badly that you might as well have used an asymptotically worse algorithm; indeed, consider fast integer multiplication:
https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm [wikipedia.org]
10000 digits are needed before that algorithm actually outperforms the asymptotically worse Toom-Cook family of algorithms. Here is an even more extreme example:
https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm [wikipedia.org]
Sure, that's a better matrix multiplication algorithm in the asymptotic sense...but only for matrices that are so large that you cannot even store them on today's computers.
So really, while you are right that asymptotic improvements will always beat constant factor improvements (which is what compilers are mostly going to do for you), you are wrong to ignore the importance of constant factor improvements. In the real world, constant factors matter. In the real world, quicksort and mergesort will use asymptotically worse algorithms below a certain problem size because of constant factors. In the real world, large integer multiplication is done using Karatsuba or Toom-Cook methods, not FFT methods, because of constant factors. In the real world, if you are not using a compiler to optimize your code, your code is going to be slower than it needs to be, even if you spent hours hand-tuning it, unless you are only dealing with toy problems.
Re:Or.. teach devs to use threading as appropriate (Score:5, Insightful)
Even the best of C or C++ compilers are terrible at vectorization of code.
Yeah, and the best humans are terrible at allocating registers -- so bad, in fact, that the best C compilers ignore the register keyword. What do you think is more relevant to the general case: vectorizing multimedia operations, or allocating registers? Compilers are also better than humans at:
To put it another way, look at the Orbitz story, which is over a decade old now:
http://www.paulgraham.com/carl.html [paulgraham.com]
On the one hand, you have hand-tuned assembly language. On the other, you have a program written in Lisp (a high level, compiled language) with some C++ mixed in (for managing memory). Orbitz was able to compete on speed, but more importantly, it was returning better results. It's not that the people who wrote that mainframe assembly language were idiots -- they were taking advantage of all sorts of special hardware features, they knew how to hack their machines better than anyone else -- it is just that the Orbitz algorithm was far too complex for efficient hand-rolled assembly code, at which point compilers are really the only choice. The mainframe guys were busy thinking about how to make use of special machine features in assembly language; the ITA team was busy solving the higher-level problem, and relying on their compiler to generate good assembly language code. This is a particularly telling line:
We disassemble most every Lisp function looking for inefficiencies and have had both CMUCL and Franz enhanced to compile our code better.
[emphasis mine]. They disassembled their code...and then improved their compiler when they saw problems. They did not hand-roll the code, they made the compiler do a better job of generating code. These are not lazy programmers, nor are they programmers who do not know how to use assembly language; they are programmers who understand that they have a tool that is far better at generating assembly language than they are, and that they have more important things to do with their time.
I deal with quite a bit of crypto code in my work. I have seen lots of hand-tuned assembly language, I dealt with code that took advantage of the AESNI instructions to perform very fast encryption. I am well aware that in small, highly specialized functions (like AES), humans are better able to utilize special instructions to improve performance. Those are niche cases, and the techniques used in those cases have very limited applicability (even SSE is fairly limited in its applicability, by comparison with the sort of code programmers write and maintain every day), and the techniques scale very poorly.
Re: (Score:2)
Most parallel problems tend to fall into a small set of categories, and I see no problem with abstracting them away. There are many libraries already that handle the threading for you.
Re:Or.. teach devs to use threading as appropriate (Score:5, Insightful)
An interesting development, and much needed I fear, but yet another layer of abstraction to allow lazy developers to not have to really bother about knowing what their code is actually doing (that's for the poor SoB who has to maintain it is for...)
Developing software is all about managing complexity. Abstractions are the primary tool used to do so. They are neither inherently good nor bad. A large part of writing good software is finding the appropriate abstractions and eliminating inappropriate ones. If an abstraction allows a program to be written more easily with an acceptable level of performance and correctness, it is an appropriate abstraction.
To know what code is "actually doing" is relative. No programmer knows what his code is doing at the level of gates on the chip. It's rarely necessary or even helpful to know what the code is doing at the level of CPU instructions or microinstructions. This is especially true if the code runs on multiple CPUs.
Re: (Score:3)
I've spent a lot of time in the earlier part of my career writing multithreaded applications (mostly client/server stuff) and I got very good at it. That said, it's a complete waste of my time as a developer. Developers should focus on domain problems, not how to tweak the microprocessor. In my experience, most developers don't seem to ever fully get how to write multithread apps and threading problems are a royal pain in the ass to debug. I can't see a better candidate for automation than taking this out o
Parallelizing is easy, performance is hard (Score:5, Insightful)
OK, so now we shall have another way to produce parallel programs.
Running things safely in parallel is a very good thing, but it does not by itself guarantee significant improvements in speed. The hard bit is actually to optimize the parallel code [futurechips.org].
Re: (Score:2)
Exactly, I am doing parallel programming and HPC for a living. I do not believe in compiler automagic^Wautomatic parallelisation. Typing is only a small part of the problem. Most algorithms need to be written VERY differently to be able to exploit the parallelism within. Sometimes you actually need to have variable concurrency and race conditions within your algorithm, but you have ways to fix problems whenever they appear.
In brief, I do not think that compilers will be able to do anything more than quite
mutable state (Score:5, Informative)
Mainstream languages have mutable state all over the code. Functional programming's big change on state issues is to carefully isolate state. The Microsoft approach means that state needs to be tracked carefully so that it can be isolated by the compiler even if it isn't isolated by the code, which is likely just as much work as isolating state. And the nice thing about isolating state is that once you do it, you can make use of all sorts of incredibly powerful functional paradigms like first-class functions (closures, partial application...) and laziness (infinite data structures, no need to figure out the proper order of evaluation...).
The solution to parallelism is functional programming. And no it is not too hard. Excel is a functional programming language that lots of people know that does a great job isolating state.
Re:mutable state (Score:4, Interesting)
Notice that this constructs an entirely new list, even if none of the elements in the list pass the test. This may seem like a terrible idea, but let's put it this way: if you have 10 threads that share the list, and one of them wants to remove some nodes, you would have had to copy the list anyway; the functional approach to remove_if is exactly what you want. Now, consider the function that only removes the first node to match (see the sketch below):
Now you have a situation where lists share nodes -- and again, imagine a situation where 10 threads share the list, and one wants to perform this operation. This is one of the reasons that functional programming is so promising for parallel algorithms: you have fewer situations where explicit mutexes are needed, because you are usually copying things that you intend to modify (or more precisely, you never really modify anything).
Of course, things are more complicated in the real world. Purely functional approaches would obviously be pretty inefficient in a lot of cases, since things would be needlessly copied. Lisp, as an example, has both a non-destructive append as well as a destructive nconc, the latter being intended for use in situations where the original lists will not be used again (and can therefore be modified).
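Here is roughly what those two operations look like on an immutable linked list, sketched in C# for concreteness (illustrative only, not the original code):

using System;

sealed class Node
{
    public readonly int Value;
    public readonly Node Next;
    public Node(int value, Node next) { Value = value; Next = next; }
}

static class PersistentList
{
    // remove_if: builds a completely fresh list of the surviving nodes.
    public static Node RemoveIf(Node list, Predicate<int> drop) =>
        list == null ? null
        : drop(list.Value) ? RemoveIf(list.Next, drop)
        : new Node(list.Value, RemoveIf(list.Next, drop));

    // Remove only the first match: everything after the match is shared with the
    // original list, which is safe because nothing is ever mutated.
    public static Node RemoveFirst(Node list, Predicate<int> drop) =>
        list == null ? null
        : drop(list.Value) ? list.Next
        : new Node(list.Value, RemoveFirst(list.Next, drop));

    static void Main()
    {
        var list = new Node(1, new Node(2, new Node(3, new Node(2, null))));   // 1 -> 2 -> 3 -> 2
        var noTwos    = RemoveIf(list, v => v == 2);     // fresh nodes: 1 -> 3
        var firstGone = RemoveFirst(list, v => v == 2);  // new 1, then the original 3 -> 2 tail
        Console.WriteLine($"{noTwos.Value} {firstGone.Value} {ReferenceEquals(firstGone.Next, list.Next.Next)}");
        // Prints: 1 1 True -- the tail really is shared, and 'list' itself is untouched,
        // so other threads can keep reading it without any locking.
    }
}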
Re: (Score:2)
The benefits of using functional languages is realized in terms of program safety and a lack of defects -- not necessarily performance. I think we are all aware of the fact that there are plenty of programmers out there who care very little about introducing a few bugs for the sake of speed and looking clever.
But even if you just use immutable data and message passing as IPC under a procedural language you are headed in the right direction. It's really a mindset and even the procedural programming folks don
Re: (Score:3)
All mainstream languages are slowly gravitating towards FP-style - C# is already halfway there, Java is on its way to join it, C++ STL is functional in spirit if not in implementation, and of course we have new languages like Scala that are explicitly designed to marry the old and the new.
Problem is, for transparent parallelism, you need to go 100% functional / immutable, and that's just not happening. What does happen is isolating distinct units in an OOP program that are then implemented functional-style
Re: (Score:3)
Problem is, for transparent parallelism, you need to go 100% functional / immutable
Which comes down to re-educating programmers to learn to think about their algorithms.
One book which does this nicely, without going too deep into theoretical computer science, is How to Design Programs [htdp.org]
Auto-Threading (Score:3)
About time.
Oh! My Sweet Day. (Score:3)
This is my pay day baby! My code is going to run super fast in multicore machines without any (more) extra work from me! Wooo Hooo!
Please take pity on me and let me enjoy my daydream for a few hours, without rudely inserting reality into my reverie! Thanks slashdotters, I know I can count on you!
functional programming (Score:3)
A compiler may be able to thread your program, but it will be a long time before it understands your intent well enough to do it well. Also I can think of many situations under which such functionality may not even be a good idea at all. I would assume such a system would use a pool of threads to avoid constant thread construction overhead and if you get a few IO-busy actions threaded out in an ignorant fashion I think you will find your program blocking a lot more, and producing a lot less throughput than it should.
Also, the OP blithely stated that functional programming isn't happening -- yet features of the functional paradigm, like anonymous functions, have made their way into nearly every language of consequence, and many new languages proudly tout functional programming features (see F#, Scala, Rust, Clojure, and Go). Perhaps pure functional programming is moving pretty slowly, but its features are not.
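For instance, here is what those borrowed features already buy you in plain C# today: a PLINQ pipeline of pure lambdas that the runtime is free to spread across cores (a sketch, nothing to do with TFA's compiler):

using System;
using System.Linq;

class FunctionalBits
{
    static void Main()
    {
        // The lambdas below are pure functions of their inputs, so PLINQ is free
        // to farm them out across however many cores it likes.
        long sum = Enumerable.Range(1, 1000000)
                             .AsParallel()
                             .Where(n => n % 3 == 0)
                             .Select(n => (long)n * n)
                             .Sum();
        Console.WriteLine(sum);
    }
}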
Automatically parallelizing compilers (Score:3)
Automatically parallelizing compilers aren't new. SGI had one for C ten years ago. Intel has it for C++ and Fortran now. [intel.com] It's been done for Matlab. There's plenty of theory [wikipedia.org].
Outside of the supercomputing community, it's never gone mainstream. Microsoft is at least trying to do it for C#, which is a reasonable language. An interesting point is that this Microsoft approach exploits immutable objects. Immutable objects can be freely shared without locking, so wide use of immutable objects makes automatic extraction of parallelism easier.
I'd looked at doing something similar for Python, to get rid of the Global Interpreter Lock, Python's boat-anchor. Python already makes wide use of immutable objects, but doesn't gain any performance from them. If everything is thread-local, immutable, or synchronized in the Java sense, you don't need global locking. But the politics of the Python community do not favor performance. (Potentially, Python could be up to at least the LISP level of performance, within a factor of 2 of C, if the language was restricted in certain ways.)
Another part of the problem is the pernicious heritage of POSIX/Linux locking primitives. Many programmers think that locking is a library or OS-level problem, not a language-level problem. Locking via library primitives means the language doesn't know what data is locked by which lock. This makes optimization and race checking very tough.
The political and social problems here are tougher than the technical ones. So the open source community can't solve them. It takes someone with a big hammer, like Microsoft, to do it.
"Who tells whom what to do?" - V. Lenin
Re: (Score:3, Informative)
SGI's automatic parallelizing software came from Kuck and Associates, Inc (kai.com). I worked there for 8-1/2 years, and one disappointing fact we learned was that the only people who really cared enough about parallelizing their software to analyze their code and modify the source to make it faster were either research scientists (of which there were relatively few) who mostly wanted quicker and cheaper results (because renting time on supercomputers costs $$) or marketing departments of computer hardware
Superscalar, old is new again. (Score:3)
Sounds like the user provides a list of structural/control/data hazards, and the compiler pipelines the code into blocks that can be run in parallel. :)
Sounds familiar
The real law in play is Amdahl's (Score:5, Informative)
These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks
Translation: we didn't speed them up any, or at least not by enough that we care to share any number.
Amdahl's Law [wikipedia.org] is difficult to overcome in auto-parallelising systems that rely on anything other than loop optimizations. Basically, in straight-line code, if you make 50% of the code run on multiple cores, you're only going to get at most a 2x improvement in speed. In practice, you won't even get that much (except in loops) due to added overhead. Bottom line: you can write papers about your wonderful auto-parallelizing technique, but when the rubber hits the road this is unlikely to lead to much performance improvement.
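For reference, Amdahl's law in formula form: if a fraction p of the work can be parallelized across n cores, the overall speedup is

speedup(n) = 1 / ((1 - p) + p / n)

With p = 0.5, even infinitely many cores give at most 1 / 0.5 = 2, the 2x ceiling mentioned above; real code then pays synchronization overhead on top of that.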
Re:The real law in play is Amdahl's (Score:4, Insightful)
There is no metaphysical part of the program that is not parallelizable
require 'digest/sha2'
string = 'Parallelize this!'
# each round consumes the previous round's digest, so step N cannot start until step N-1 is done
123456789.times { string = Digest::SHA2.hexdigest(string) }
puts string
Any suggestions?
Most programs have parts that can be run in parallel, but Amdahl's point is that as you increase the number of processors you'll find that the parts that are inherently serial - even if they're quite small - will increasingly become the bottleneck.
Re:Great potential (Score:4, Interesting)
I'm wondering how this works. Does it scan for loops to remake them as event driven processes? Does it splice off multiple function calls and then throw the results into some sort of stack for retrieval? It's a cool idea, but man, for any kind of non-trivial program it must be some monumental piece of work.
Re: (Score:2)
And it must be hell trying to debug without digging deeply into the generated code.
Re: (Score:2, Interesting)
Any code that doesn't access the same resources can be run simultaneously. Alternately, if no thread is writing to a resource, it can be shared without locking.
Those two observations allow you to determine when code can be run together in different threads. If a method only uses 'final' or 'const' things, then you can run it. If your method only uses one isolated cluster of objects, then it can run simultaneously with another method that only uses a different isolated cluster of objects.
Honestly, any progr
Re:Great potential (Score:4, Insightful)
Re: (Score:3)
as can any code that does access the same memory location as long as that access is all read-only
Not entirely true. If reads on that block of memory are atomic it will work every time, otherwise you might get some strange results as it may be possible for values to change while reading part of it. This mostly happens with more complex data structures but it can happen with primitive ones depending on their size.
Re:Great potential (Score:5, Insightful)
Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:
for every bunny in list {
bunny.hop()
}
This is a simple, sequential process - bunny.hop() is called in order, for every bunny in the list, one after the other.
Now suppose you have defined the data in your program in such a way that the compiler knows that the method bunny.hop() only ever accesses read-only data, or modifies local data that is inaccessible from anywhere else. The compiler now knows that the order of execution of the bunny hops doesn't really matter, as every call of bunny.hop() is independent of everything else. This frees the compiler to spawn threads or processes to call as many bunny.hop()'s as it likes at the same time, thereby processing the list a lot faster.
Another method, bunny.eat(), actually performs write access on a shared object 'carrot' when called. If two bunnies eat the same carrot, the compiler cannot perform automatic parallelization, as running two bunny.eat() methods at once could lead to invalid state (only one piece of carrot remaining, two bunnies eating 1 piece of carrot at the same time, results in -1 pieces of carrot). In this case, the compiler will take care to run two bunnies eating away at the same carrot sequentially. However, if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot, those are again independent of each other and can again be parallelized.
The requirement to make this possible is to provide the compiler with information about what kind of data something is: is it isolated hopping? A shared carrot? Or a global Bunnygod that affects all bunnies?
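To make it concrete, here is roughly that transformation in today's C#, written by hand; the whole point of TFA is that the compiler would do this for you once the annotations prove it safe (a sketch, not the paper's output):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Bunny
{
    // Isolated, per-bunny state: Hop() touches nothing shared.
    public int Hops { get; private set; }
    public void Hop() => Hops++;
}

class Warren
{
    static void Main()
    {
        var bunnies = new List<Bunny> { new Bunny(), new Bunny(), new Bunny(), new Bunny() };

        // What the programmer wrote:
        //     foreach (var bunny in bunnies) bunny.Hop();
        // What the compiler may substitute once it has proved each Hop() only
        // touches its own bunny, so the order of the calls cannot matter:
        Parallel.ForEach(bunnies, bunny => bunny.Hop());

        foreach (var bunny in bunnies)
            Console.WriteLine(bunny.Hops);   // every bunny hopped exactly once
    }
}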
Re:Great potential (Score:5, Funny)
If two bunnies eat the same carrot...the compiler will take care to run two bunnies eating away at the same carrot sequentially. However if there are 2 bunnies eating the green carrot and another 2 bunnies eating the yellow carrot
Am I the only one who is totally horny after reading that?
Re: (Score:3, Interesting)
Imagine you have a for-loop that calls the method 'hop' on every object 'bunny' in the list:
for every bunny in list {
bunny.hop()
}
And if the original intent is to make the bunnies hop in sequence like performing "the wave", making a new thread for each hop might produce a visual effect of them all hopping at once if the sleep occurred in the hop() function instead of the for loop calling the hops.
Re:Great potential (Score:5, Insightful)
If it has a visual effect, then it's not read-only.
Re:Great potential (Score:5, Insightful)
One would assume that "world modification" (monad; in this case, visual I/O) does not fit the restriction "only ever accesses read-only data, or modifies local data that is unaccessible from anywhere else" and as soon as you are showing something to the user, that blocks the auto-threading.
Then again, if you are drawing map tiles, the order you draw them in doesn't matter - so this goes to show that in some situations, there's no way around the programmer actually having to stop, think and declare whether the program's contract allows it to do some user-visible operation in parallel.
Re: (Score:3)
I know nobody knows how to write low level code anymore but displaying something involves writing to memory.
Re: (Score:3)
Have you ever considered writing a text book on programming?
Re: (Score:3)
Now, it is not guaranteed to be pushed out to memory immediately. If you're doing some like for(i = 0;
Re:Great potential (Score:5, Informative)
Having the compiler identify things which could run in parallel and thread work to run as A B&C D would be a huge step forward as long as it doesn't introduce bugs.
Compilers already try to do this, for example with auto-vectorization. The problem is that they are usually quite terrible at it. Even Intel's compiler, which is probably the best at it, usually misses out on most of the obvious places that should be vectorized. This is why pretty much all code dealing with multimedia content (audio/video/image codecs, games, etc.) still relies on tons of hand-written SIMD to be optimized to its fullest.
Meh. Not that big a problem. (Score:5, Insightful)
Honestly, you know what this says? This just says some programmers still need to go practice in the threading space.
Threading is not that hard at all for a VERY large class of problems. There are issues -- like, the memory bandwidth will eventually choke the ability of processors to get at nearby memory regions -- and you need some kind of sane plan to deal with caching in one core as compared to the next core when you share flags or data coming in during execution from elsewhere -- and you need to figure out when thread overhead starts to chew into your efficiencies -- but really, it's entirely doable, and isn't that difficult a problem space as compared to the actual problems *serious* programmers have to solve. It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.
Then again, some problem spaces lend themselves very well to multiple core approaches -- I just finished a library that slices up images and then applies various separable algorithms to the slices in a variable number of threads (the user of the library can specify.) I wrote it in pure C, used posix threads, no sweat at all, in my 8-core machine, you get just about exactly the gains you'd think (x the number of cores) when the process is CPU cycle intensive, less as the ops are simpler and memory I/O rates rise.
Seriously. If threading seems that hard, you need to go do some studying. It's not the threading. It's you. You really should get on this bus.
Re: (Score:2)
You seem to have replied to the wrong person as I'm not seeing what your post has to do with mine.
Re:Meh. Not that big a problem. (Score:5, Funny)
Maybe it's subtle sarcasm regarding learning threaded programming? People can't identify the right thread in a conversation, but should be expected to write threaded code? :-)
Re:Meh. Not that big a problem. (Score:4, Insightful)
It may just be that your "thing" isn't really all that threadable. Some things aren't. But it should also be very obvious when they aren't.
Should it? Do you really want to bet that you understand a modern CPU, the related hardware architecture, and a production quality compiler for that platform, all well enough to predict reliably when it's worth running a few instructions in parallel and when it isn't, taking into account the inevitable overheads of fork/join operations vs. the potential performance gains of concurrency? Can you really analyse the global structure of a 1,000,000+ line project and reliably identify algorithms that are expressed sequentially but in fact could be transformed losslessly into parallelised equivalents?
Implementing efficient algorithms for embarrassingly parallel computations is (relatively) easy. Implementing efficient algorithms for basically independent parallel operations, such as a web server, is (relatively) easy. Implementing efficient algorithms for problems that might admit some asymmetric parallelism as long as all interactions are coordinated safely and where it might be worth implicit generation of some parallel code paths as long as the benefits outweigh the overheads... Well, if that doesn't seem hard to you, I look forward to reading your papers, because you just outsmarted a lot of researchers who have been investigating this problem for years.
Re: (Score:3)
I think the problem is the original article was about an approach to add another route to parallelise an existing bit of arbitrary code.
Fyngyr jumps in and says something along the lines of 'well if you just wrote with parallelism in mind in the first place when architecting the system you wouldn't have this problem.' A viewpoint I agree with and in fact I would extend to say that from my viewpoint most systems are easier to design to be parallel in the first place than to force single threaded onto them on
Re: (Score:3)
The problem with generalised parallel programming is that the possible interactions between all threads are on the order of O(n!). That is the big problem. That is also why coordinating parts of the program around synchronising objects makes the program easier to understand, because in the same way that O(n!) grows very fast, it also shrinks very fast if n can be reduced.
Another problem I see is that the current core capacity is too large for running small problems.
Take a complex mathematical expression,
Re: (Score:3)
Only when the threads share data; and that's what mutexes, etc., are for. If you share data without mutexes, it had better be 100% invariant (table of constants, etc.) Or actually not shared: different regions of an image where the process at hand doesn't have inter-region dependencies. So really, it's not O(n!) in any practical sense.
Re: (Score:3)
Fine. But that doesn't make it so.
Perhaps you and I simply have different appreciations for the word "complicated." The only "support" I require for parallel programming are conc
Nothing new here (Score:5, Informative)
This is nothing new. As in decades-old. Back when I was at DEC in the 1980s we had auto-threading compilers that worked very well on standard application code. Today, Intel has been doing auto-threading in its C++ and Fortran compilers for more than a decade and it is very effective for a large class of applications. Intel's compiler also has features that help you tweak the code for better auto-parallelism, notably the Guided Auto Parallel feature introduced in 2010. Other compiler vendors also have auto-parallelizing compilers.
I've been in the industry for decades, and every couple of years someone comes out with yet another programming language or technique that is supposed to revolutionize application parallelization. This is just another one, and many years behind the rest of the industry from what I can tell.
Re:Great potential (Score:5, Interesting)
Compilers already try to do this for example with auto-vectorization. The problem is that they are usually quite terrible at it.
I suspect one reason they are so bad at it is they have to be very conservative in how they optimize, due to the relaxed nature of the C language. For example, if the C optimizer cannot prove beyond a shadow of a doubt that a particular memory location isn't being aliased, it can't make any assumptions about the location's value not changing at any step of the process. In practice, that means no optimizations for you.
Given that, it would seem that the Microsoft approach (using not only the higher-level language C#, but a specially customized version of C#) gives their optimizer much greater latitude. Because the language forces the programmer to annotate his objects with readable/writable/immutable tags (think C's "const" tag, but with teeth), and because the language (presumably) doesn't allow the programmer to do sneaky low-level tricks like casting away const or aliasing pointers or pointer math, the optimizer can safely make assumptions about the code that a C or C++ optimizer could never get away with. That may allow it to be more effective than you might anticipate (or maybe not, we'll see).
Re: (Score:2)
Innovation doesn't mean "doing something no one has ever thought of before" it means "doing something no one has done before". If Microsoft's technique works where Carnegie-Mellon's didn't then it's innovative, if it doesn't, it probably isn't. There could still be something innovative in the way they tried which could help someone else later, but mostly what we'd be looking for from an innovation point of view is "actually does what it says in the tin".
Re: (Score:3)
There is great potential here if this sort of thing can be made to work cleanly and safely. I suspect a lot of programmers (myself included) think better in a single-threaded manner. People are mostly used to dealing with A B C D and, as such, I think we usually write code to execute steps the same way.
If you've ever done a dish with more than one casserole or making a side dish while the main dish is cooking you've done parallelization in real life, the problem is that expressing it in a computer language is hard. Functional programming works okay if you're doing lots of similar things at the same time, but not if you're doing lots of different - but parallelizable - tasks. I want to call oven.roast( meat ), stove.boil( potatoes ) and workbench.slice( salad ) but I don't particularly care about the order
Re: (Score:3)
What you describe sounds a lot like "make".
$ cat Makefile
dinner: roast_meat bake_lasagna boil_potatoes chop_salad
	turn_off stove oven
	wipe table

clean_table:
	wipe table

prepare_meat: clean_table
	cut meat
	salt meat

set_meat_temp:
	# make it fast
	chtemp 800

roast_meat: prepare_meat set_meat_temp
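Run it as "make -j dinner" and make will happily cook the independent targets (boiling, chopping, roasting) in parallel, which is exactly the declared-dependency parallelism being described.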
Re: (Score:3)
Indeed, silicon is sort of a shiny black color! OP probably once read or heard the phrase, misunderstood the context, and thought it would embiggen the post's language.
Re:How about (Score:5, Funny)
all the magic compiler is going to do is increase the speed of the System Idle Thread.
Have you seen how much processing power that thing consumes? Usually 90% and up. On all cores. It really needs some serious optimization.
Re: (Score:3, Interesting)
/casts detect troll 10 post radius. Multiprocessor technology was already established when Bill Gates was still dumpster diving for code samples, and a lot of the work that gave us modern high performance multiprocessing was done by companies like IBM, Intel (and Intel offshoots like Sequent) back when Windows was just the latest new DOS application. At the extreme end of the spectrum, this is why all of the top 10 supercomputers run on the UNIX-alike known as Linux, and none run on a Microsoft based OS.
I'm
Re:Fast First Post (Score:5, Interesting)
"MS R&D is the largest computer tech R&D in the world. Combine IBM, Intel, and AMD, and you get an idea of their size."
Citation needed. Not disputing the first part, just the second part about the relative size of Microsoft Research.
Re:Fast First Post (Score:4, Informative)
https://en.wikipedia.org/wiki/Microsoft_Research [wikipedia.org]
http://researcher.ibm.com/researcher/search.php?sn=1 [ibm.com]
Infinite Monkeys (Score:3)
I'm sure that they have produced all the world's great literature in the last 30 years as well. Their size alone doesn't make them the source of everything good.
I haven't seen any GPU or CPU design used in modern computers come from Microsoft. Sure, they have worked with it to implement it into their own software. They probably have been able to express their preferences to hardware vendors, in order to make them support the hardware and make it perform well in their software. High Performance multi-core kernel
Re:Fast First Post (Score:5, Funny)
Of those, IBM have around 439,999 project managers.
Re: (Score:3, Funny)
And the last one cleans the office and makes coffee.