Faster Chips Are Leaving Programmers in Their Dust
mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."
OS/2? (Score:5, Interesting)
Does anybody remember DeScribe?
Re:2005 Called (Score:4, Interesting)
Not without an infinite number of processors you don't.
Personal computing? (Score:5, Interesting)
Exactly what areas of "personal computing" require this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - it scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary: the game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.
So this raises the question: exactly how will the average consumer benefit from an OS and software that make optimum use of multiple cores, when the performance issues users complain about aren't CPU-bound in the first place?
Dan East
Re:The basic problem (Score:2, Interesting)
So it is you (as the programmer) who determines whether your program just sits and waits for blocked IO to complete. Alternatively, you can spawn a thread for the blocking IO calls so that your main program thread keeps executing (if that is viable in your situation).
With more processors, your program and its blocked IO calls will be scheduled more often, so even blocked IO calls will see a performance increase.
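A minimal sketch of the spawn-a-thread-for-blocking-IO idea, using C++11 std::async (the file name and the "other work" are placeholders, not anything from the parent post):

// Run a blocking read on a separate thread so the main thread keeps going.
#include <fstream>
#include <future>
#include <iostream>
#include <iterator>
#include <string>

std::string slow_read(const std::string& path) {
    std::ifstream in(path);                              // the blocking IO happens here
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

int main() {
    // Launch the blocking call on another thread; main continues immediately.
    auto pending = std::async(std::launch::async, slow_read, "data.txt");

    // ... do other useful work here while the read is in flight ...

    std::string contents = pending.get();                // block only when the result is needed
    std::cout << contents.size() << " bytes read\n";
}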
Re:How many languages have multithread support? (Score:3, Interesting)
There is a great talk by Bjarne Stroustrup (http://csclub.uwaterloo.ca/media/C++0x%20-%20An%20Overview.html [uwaterloo.ca]) about the new version of C++ coming out and some of the difficulties of getting things added. Essentially, if a new feature will only help 100,000 developers, it isn't important enough to be implemented. With such a huge developer community, all the "little" things get left to non-standard APIs; only the big features that almost everyone will find useful get added. That is probably why this version of C++ or the next will get a standard thread library: almost everyone now has access to a multicore system. Oh yeah, and it sucks that anyone with a few thousand dollars to waste can get added to the committee, but most people don't care enough to go get their feature implemented for that much money (you also have the travel/time off to attend the meetings) except big business, so guess who runs the show (I don't expect anyone to be surprised).
HPC (Score:2, Interesting)
It's kind of like when people see recursive functions for the first time. If they don't understand the base condition and the inductive step, they can easily fall into infinite loops or write bugs. Parallel code is the same way... just a bit more tricky.
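To make the base-condition/inductive-step point concrete (my own toy example, not the poster's):

// A recursive factorial. Without the base condition (n <= 1) the recursion
// never terminates and eventually blows the stack.
#include <cstdint>
#include <iostream>

std::uint64_t factorial(std::uint64_t n) {
    if (n <= 1) return 1;           // base condition
    return n * factorial(n - 1);    // inductive step: shrink the problem toward the base
}

int main() {
    std::cout << factorial(10) << "\n";   // prints 3628800
}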
Re:OS/2? (Score:2, Interesting)
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn (Score:3, Interesting)
Evolution that halted at 4 GHz.... (Score:3, Interesting)
Threads considered harmful (Score:5, Interesting)
Re:Threads Are Not the Answer (Score:5, Interesting)
There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.
Are threads perfect for everything? No. But calling them the second worst thing to happen to computing is, at best, disingenuous.
This has been coming for a while. (Score:4, Interesting)
There once was a time when debugging was part of your job. Now someone else does that, and at most the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to process except in some houses that follow *recommended coding guidelines*, but these are few and far between. Old-school coders had a process in mind that fit the project as a whole and could see the end running program. Many times now, you are asked to code an algorithm without any regard for, or concept of, how it might be used. A lot of strange stuff going on out there in the business world with this!
If there is a fundamental change in the base for C++, et al., this could well have a detrimental effect on the employment market, as there will be many who cannot conceptualize multi-threading methodologies, much less model some existing process in this paradigm, and they will leave the market.
I left the programming market because of the clash of bean counters vs. quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years, and maybe this would make an interesting re-introduction. I have personally not coded on a multi-threaded project but have the concepts down. Might be fun!
Re:2005 Called (Score:1, Interesting)
It's not a semantic difference. Threads are basically just lightweight processes...so each thread of a program execution can be thought of as a different process. OTOH, in parallel programming, a thread/task is broken down into pieces and brought back together when the pieces are done. Think SETI@Home, but on a much smaller scale.
This probably isn't all that useful for writing something like a web server, where it makes sense to thread off each connection. But for a scientific computing application like a simulation or a climate model, where you have number crunching that can be done on different subsets of data, you might want to break those calculations down so they occur on different processors. This requires some degree of sophistication, since one part of the calculation usually depends on another and you have to send data back and forth between the parts. That involves more than multithreading; it is true parallel processing.
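A minimal sketch of that decompose-and-recombine pattern (my own illustration, assuming 4 cores; a real simulation would also exchange boundary data between the pieces):

// Split an array across worker threads, let each crunch its own subset,
// then combine the partial results.
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 100000;
    const unsigned workers = 4;                    // assumed core count for this sketch
    std::vector<double> data(n, 1.0);
    std::vector<double> partial(workers, 0.0);

    std::vector<std::thread> pool;
    const std::size_t chunk = n / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            auto begin = data.begin() + w * chunk;
            auto end   = (w + 1 == workers) ? data.end() : begin + chunk;
            partial[w] = std::accumulate(begin, end, 0.0);   // crunch our subset
        });
    }
    for (auto& t : pool) t.join();                 // bring the pieces back together

    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::cout << "total = " << total << "\n";      // prints 100000
}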
The rise of Erlang and Haskell? (Score:3, Interesting)
Will the new world of concurrency cause a shift in language popularity? Or will traditional languages remain more popular, perhaps with some enhancements? C++ is gaining concurrency enhancements; C++, Python, and many other languages work well with map/reduce systems like Google MapReduce; and even with no enhancements to the language, you can decompose larger systems into multiple threads or multiple processes to better harness concurrency.
If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.
steveha
Chip makers want developers to pay for lunch (Score:3, Interesting)
Without the work of developers, multi-core chips will be like the extra transistors in transistor radios in the 1960s: good for marketing but functionally useless.
LabVIEW [& other graphical environments] (Score:3, Interesting)
My current major language (Igor Pro) will use all the cores automatically. How many languages multithread this way? Matlab(?), Octave(?)
LabVIEW, by its very nature [which is graphical - based on "G" - the "Graphical" programming language] is kinda/sorta topologically self-threading: If a piece of LabVIEW code sits off in its own connected component, then [more or less] it gets its own thread.
Of course, all your ".h" & ".c" [or ".cc"] files [& their innards] might very well break down into little distinct connected components which are ripe for running in their own threads; it's just that - unless you're some sort of super genius - you can't readily visualize all those connected components as they exist in your code.
Now you and your colleagues could try to anticipate the connected components a priori, during the "planning" phase: you could draw huge pictures on the dry-erase board, and everyone could yell and scream at each other about the topological structure which the code should ultimately embody, and then everyone would have to promise - Scout's Honor! - that they would stick to the blueprint [which they might very well resent as having been shoved down their throats by some pointy-headed suit who didn't have any clue what he was talking about]. But the beauty of LabVIEW is that THE CODE IS THE BLUEPRINT [which I think is a point that Jack Reeves used to make [c2.com]].
There's actually a Slashdotter, MOBE2001 [slashdot.org], who maintains a blog called Rebel Science News [blogspot.com], who's got some pretty interesting ideas here - he seems to be leaning towards a graphical approach to this [rebelscience.org] [realizing that the fundamental nature of the problem tends to be topological, rather than anything which we (YET!) would recognize as semantic] - but his program is very, very ambitious [if I had a couple of spare lifetimes, I might just throw one in that general direction].
Another line of thought which everyone should keep an eye on is the discipline of Petri nets [wikipedia.org] - it's kind of a big graphical/topological approach to state machines, which [if someone were to put the necessary elbow grease into it] might prove very useful in squeezing the most bang for the buck out of these massively multicore CPUs.
Sutter's article is awesome (Score:3, Interesting)
When I first started writing object-oriented code, I was somewhat dismayed to find that OO was an extension of the same ol' linear programming. It seemed to me that objects should be able to exist as if alive and react freely, but really, they were just a fancy interface to the linear runtime. Color me disappointed yet again.
It's an important paradigm shift [chrisblanc.org] to recognize parallel computing. Maybe when the world realizes the importance of parallel computing, and parallel thinking, we'll have that singularity that some writers talk about. People will no longer think in such basic terms and be so ignorant of context and timing. That in itself must be nice.
Sutter's article hits home with all of this. His conclusion is that efficient, elegant programming that takes advantage of the parallel model - rather than merely conforming to it - is the future. Judging by the chips I see on the market today, he was right 2.5 years ago, and he will continue to be right. The question is whether programmers step up to this challenge and see it as being as fun as I think it will be.
Re:2005 Called (Score:2, Interesting)
Regarding the other "he said n log n int O(n log n)" comment...well, that's already been answered (and with considerably more tact than I would have used).
Re:2005 Called (Score:4, Interesting)
Eventually, good parallel algorithm libraries will pop up. That will help some subset of problems. I'd expect frameworks to emerge as well, helping another subset. But in many cases it just comes down to changing how we write programs.
And you're right, this isn't really a desktop issue - it's mainly a server one. Desktops really don't need all the power they have now; perhaps one percent of users outside of gamers actually use it. That doesn't make it any less important a problem to solve. Although I expect in the end people will still end up disappointed - parallelization is not magic pixie dust, and you can only get so much of a speedup. I wouldn't be surprised if those 8 cores only give a 2x speedup over a single core on many apps.
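For a rough sense of why 8 cores might only buy ~2x, plug some numbers into Amdahl's law (my arithmetic, not the parent's): if a fraction p of the run time parallelizes across N cores, the speedup is S = 1 / ((1 - p) + p/N). With p = 0.6 and N = 8, S = 1 / (0.4 + 0.075) ≈ 2.1, and even with infinitely many cores that program never exceeds 1/0.4 = 2.5x.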
Re:The rise of Erlang and Haskell? (Score:2, Interesting)
Re:Thank god (Score:3, Interesting)
(I know - I had a discussion with a chap about C# thread-safe singleton initialisation. In a simple app to test performance on my little laptop, a statically initialised singleton took 1.5 seconds and lock-based initialisation took 6 seconds. No big deal, we expect that, but then I ran the same tests on a dual-CPU server and both apps took 30 seconds - the framework decided it knew best).
The cure is the Actor programming model (Score:3, Interesting)
If an object wants a result from another object, it obtains a future value that will hold the result of the computation once it is ready. When the caller wants the actual value, it blocks until the result is available.
Of course, naively blocking on a result would cause a deadlock in recursive algorithms... therefore, objects don't sit idle waiting for a result; they simply enter a new message loop at the point where they wait. When the result is ready, the callee wakes up the caller by putting a 'terminate current loop' message in the caller's message loop after the result is computed.
The Actor model, implemented as described above, not only solves the problems of classical parallel programming (deadlocks, priority inversion, etc), but it also exposes whatever parallelism is there in a program.
Synchronization is performed only in two places:
1) when inserting/removing elements in an object's queue.
2) when adding the current thread into the waiting list of a future value.
Both synchronizations are implemented via spinlocks. In the case of the queue, there is no need to synchronize on the whole queue, only at its two ends.
I have made a demo in C++, using Boehm's garbage collector (it is quite a complex system; it needs GC), and it works beautifully. With this model, there is no need to use mutexes, semaphores, wait conditions, or any other synchronization primitive.
I chose C++ because:
1) operator overloading allows future values to be treated naturally like non-future values.
2) when waiting for a result, the waiting thread puts itself in the waiting list of the future. The nodes of the list are allocated on the stack; only C/C++ can do this, and it is crucial, because it minimizes allocation.
Another advantage of this system is that tail recursion comes for free: when you call a method which you don't want the result of, the local stack is not exhausted, because there is no call, only a message placed in a queue.
Patterns like the producer/consumer pattern come for free: one object simply invokes the other.
Data parallelism comes for free: invoking a computation on an array of objects will execute the computations in parallel, on each element of the array. For example, increasing the elements of an array can take O(N) with one CPU and O(1) with N CPUs.
Of course, it is much slower on two or even four cores than the same sequential code. But given 10 or more cores, programs start to exhibit linear increase in performance, depending on algorithm of course.
The system is much like the nervous system of an animal: signals are transmitted slowly from one nerve to another, but processing is parallel, so the organism can do many things at the same time.
Another similarity between this system and the nervous system of an animal is that when a nerve wants to transmit an electrical signal to another nerve, the nerves must synchronize, much like there should be synchronization when an object puts a message in the object of another thread.
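A much-simplified sketch of the message-queue-plus-future idea (my own illustration using C++11 std::thread/std::mutex/std::future; the poster's demo uses per-object spinlocked queues, nested message loops, and the Boehm GC, none of which is reproduced here):

// A minimal "actor": one thread draining a queue of messages (closures).
// Sending a message returns a future; the caller blocks only on .get().
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

class Actor {
public:
    Actor() : worker_([this] { run(); }) {}
    ~Actor() {
        post([this] { done_ = true; });            // poison-pill message
        worker_.join();
    }

    // Send a message; the returned future becomes ready once it has run.
    template <typename F>
    auto send(F f) -> std::future<decltype(f())> {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto result = task->get_future();
        post([task] { (*task)(); });
        return result;
    }

private:
    void post(std::function<void()> msg) {
        { std::lock_guard<std::mutex> lock(m_); queue_.push(std::move(msg)); }
        cv_.notify_one();
    }
    void run() {
        while (!done_) {
            std::function<void()> msg;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                msg = std::move(queue_.front());
                queue_.pop();
            }
            msg();                                  // process one message at a time
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    std::thread worker_;
};

int main() {
    Actor adder;
    // The caller keeps going; it blocks only when it actually needs the value.
    std::future<int> sum = adder.send([] { return 2 + 3; });
    std::cout << "sum = " << sum.get() << "\n";     // prints 5
}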
Re:2005 Called (Score:2, Interesting)
Editing video is also a niche use case. Since I don't do it myself, I won't comment on how many resources it really requires. But maybe 1 in 20 or 50 people do this regularly, if that.
Touching up photos takes almost no resources on a modern machine. People were doing this efficiently on 1 GHz and slower processors.
Browsing, even with multiple windows and tabs and AJAX, is very, very low resource usage. My EeePC handles it easily, despite being a 900 MHz Celeron UNDERCLOCKED to 600 MHz.
Where the hell did I mention moving anything off the desktop? I think web-based apps suck for the most part. My point was that most computers are at 1% load or less most of the time, and sit at 30% load or less while in active use, except for those machines being used for video games. You seem to have some hugely inflated idea of what kind of resources things actually use; the common email, web surfing, office use case can easily be handled by a 300 MHz Pentium II, if not less.
Re:OS/2? (Score:4, Interesting)
The big problem is not the operating system designers, it's the CPU designers. They integrated two orthogonal concepts, protection and translation, into the same mechanism (page tables, segment tables, etc). The operating system wants to do translation so it can implement virtual memory. The userspace program wants to do protection so it can use parallel contexts efficiently. Mondrian memory protection would fix this, but no one has implemented it in a commercial microprocessor (to my knowledge).
Developers need to get out for lunch (Score:3, Interesting)
Re:2005 Called (Score:3, Interesting)
Re:Erlang (Score:1, Interesting)
I'm not trying to be mean, but talking with him, he really doesn't understand it, and I think he has never programmed anything using multiple threads. At first I was really scared, finding out that Erlang enthusiasts just couldn't grasp the basics, until I read Armstrong's book and realized that they didn't need to. Joe gets it, and the language hides it so well that you just don't need to understand it whatsoever. Just like Java developers don't need to understand pointers and OO programmers can (often) avoid recursion, Erlang coders just don't need to deal with it. I'm still a bit put off, and I wouldn't let the author near any infrastructure code in a threaded language, but Erlang really has become the VB of concurrency. Quite a feat.
High level language (Score:3, Interesting)
Why can't a Java virtual machine take on the burden of multi-core adaptation?
They have promised "write once run anywhere"!
Lazy coder
Re:Thank god (Score:3, Interesting)
for (int i = 0; i < 100; i++) {
a[i] = a[i]*a[i];
}
into
Parallel.For(0, 100, delegate(int i) {
a[i] = a[i]*a[i];
});
and the hint tells the runtime that the loop iterations are independent and may be run in parallel.
So let me get this straight: the runtime is going to decide on the fly whether splitting a loop like this across cores is worth the overhead?
Sounds like a losing proposition to me. I don't think this is the kind of parallelism that is going to bring noticeable gains.