
Faster Chips Are Leaving Programmers in Their Dust 573

Posted by CmdrTaco
from the or-maybe-they've-already-wrapped-around-to-zero dept.
mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."

  • Re:C++? (Score:1, Informative)

    by Anonymous Coward on Monday December 17, 2007 @02:08PM (#21727514)
    Almost all the desktop software that matters is written in C++, so obviously the clever minds are not where you think they are. Almost anything that has to do with graphics, like the software from Adobe, is written in C++, almost anything that has to do with video is written in C++, almost anything that has to do with audio and music is written in C++... the creative minds are using C++ while the folks banging their heads at yet another web 2.0 or financial app are using Java, Lisp or scripting languages.
  • Re:Thank god (Score:5, Informative)

    by zifn4b (1040588) on Monday December 17, 2007 @02:11PM (#21727570)

    The only significant thing that managed languages make easier with regard to multithreading other than a more intuitive API is garbage collection so that you don't have to worry about using reference counting when passing pointers between multiple threads.

    All of the same challenges that exist in C/C++ such as deadly embrace and dining philosophers still exist in managed languages and require the developer to be trained in multi-threaded programming.

    Some things can be more difficult to implement like semaphores. You also have to be careful about what asynchronous methods and events you invoke because those get queued up on the thread pool and it has a max count.

    I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.
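The "deadly embrace" mentioned above can be sketched in a few lines. This is a minimal illustration, not anything from the comment itself: the `Account` type and function names are hypothetical, and it assumes C++17 for `std::scoped_lock`. Two threads transfer in opposite directions; if each took the two mutexes one at a time in argument order, they could deadlock, whereas `std::scoped_lock` acquires both with a deadlock-avoidance algorithm.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type, used only for this illustration.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves `amount` between two distinct accounts.
// std::scoped_lock locks both mutexes together, so two opposite
// transfers cannot each hold one lock while waiting for the other.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice would be undefined
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in both directions at once and returns the combined
// balance, which the transfers should leave unchanged.
long run_opposing_transfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

The same hazard exists in a managed language: garbage collection removes the ownership problem, not the lock-ordering problem.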

  • by $RANDOMLUSER (804576) on Monday December 17, 2007 @02:12PM (#21727576)
    Well then you're not remembering very well. There was some crazy statistic floating around that a Prescott at ~25 GHz would put out as much heat per cm^2 as the surface of the sun.
  • Re:Clue by 4? (Score:1, Informative)

    by Anonymous Coward on Monday December 17, 2007 @02:19PM (#21727692)
    The problem is, most of those programs running "under the hood" on your laptop are likely small enough that they would not cause moderate strain on even a single CPU.

    The user wants the application that he is currently running to perform as well as possible. If that application is single-threaded, it may not be able to perform as well as it needs to in order to appear fast and responsive, and the net result will be that the user perceives the system as slow.

    There's only so much an OS can do to "hide" the need for concurrent design and programming.
  • by scorp1us (235526) on Monday December 17, 2007 @02:19PM (#21727696) Journal
    Full disclosure: I am a Qt developer (user). I do not work for Trolltech.

    The new Qt 4.4 (due 1Q2008) has QtConcurrent, a set of classes that make multi-core processing trivial.

    From the docs:

    The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.

    QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:

            * QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
            * QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
            * QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
            * QtConcurrent::filter() removes all items from a container based on the result of a filter function.
            * QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
            * QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
            * QtConcurrent::run() runs a function in another thread.
            * QFuture represents the result of an asynchronous computation.
            * QFutureIterator allows iterating through results available via QFuture.
            * QFutureWatcher allows monitoring a QFuture using signals-and-slots.
            * QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
            * QRunnable is an abstract class representing a runnable object.
            * QThreadPool manages a pool of threads that run QRunnable objects.

    This makes multi-core programming almost a no-brainer.
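To give a feel for the map-then-reduce style described in the list above, here is a rough standard-library analogue. It is explicitly not QtConcurrent and does not use any Qt API: `mapped_reduced` and `sum_of_squares_up_to` are hypothetical names, the chunk count is fixed by hand rather than adapted to the core count, and `std::async` stands in for Qt's thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Applies `map_fn` to every element and folds the results with `reduce_fn`,
// splitting the input into up to `chunks` contiguous pieces that run
// concurrently. `reduce_fn` must be associative and `init` must be its
// identity, otherwise the result differs from a sequential fold.
template <typename T, typename R, typename MapFn, typename ReduceFn>
R mapped_reduced(const std::vector<T>& items, R init, MapFn map_fn,
                 ReduceFn reduce_fn, std::size_t chunks) {
    assert(chunks > 0);
    std::vector<std::future<R>> parts;
    const std::size_t step = (items.size() + chunks - 1) / chunks;
    for (std::size_t begin = 0; begin < items.size(); begin += step) {
        const std::size_t end = std::min(begin + step, items.size());
        parts.push_back(std::async(std::launch::async,
            [&items, begin, end, init, map_fn, reduce_fn] {
                R acc = init;  // each chunk folds into its own accumulator
                for (std::size_t i = begin; i < end; ++i)
                    acc = reduce_fn(acc, map_fn(items[i]));
                return acc;
            }));
    }
    R result = init;
    for (auto& p : parts) result = reduce_fn(result, p.get());
    return result;
}

// Usage sketch: sum of the squares of 1..n, computed in four chunks.
long sum_of_squares_up_to(int n) {
    std::vector<int> items;
    for (int i = 1; i <= n; ++i) items.push_back(i);
    return mapped_reduced(items, 0L,
        [](int x) { return static_cast<long>(x) * x; },
        [](long a, long b) { return a + b; },
        4);
}
```

No mutex appears in the calling code, which is the point the quoted docs make; the cost is that the map and reduce functions must not touch shared mutable state.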
  • Re:2005 Called (Score:5, Informative)

    by caerwyn (38056) on Monday December 17, 2007 @02:24PM (#21727762)
    As you know, multiple threads in a program do not actually execute concurrently - processing is still serial, it's just so fast that threads can appear to execute simultaneously - and it's not just about queuing execution either.

    That holds only for multithreaded programming on a single core. As soon as there are multiple cores available, processing does, in fact, happen simultaneously.
  • Erlang (Score:5, Informative)

    by Niten (201835) on Monday December 17, 2007 @02:24PM (#21727778)

    Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang's model for concurrency. The slides are available here: []

    The presentation itself (OGG Theora video available here []) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."

    The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?

    The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.

    This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.

    For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie.
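The message-passing style described above can be approximated in C++, with the caveat that nothing in the language enforces the isolation Erlang gives you. In this sketch the `Mailbox` class and `sum_via_messages` function are hypothetical names; the mailbox uses a lock internally, but the worker's running total is private and only ever changes in response to a message.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// A minimal blocking mailbox: the only object the threads share.
template <typename T>
class Mailbox {
public:
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// The worker keeps its total to itself and reports it by message.
// An empty optional is the "stop and report" message.
long sum_via_messages(int count) {
    assert(count >= 0);
    Mailbox<std::optional<int>> inbox;
    Mailbox<long> reply;
    std::thread worker([&] {
        long total = 0;  // private state, never touched by the sender
        while (auto msg = inbox.receive()) total += *msg;
        reply.send(total);
    });
    for (int i = 1; i <= count; ++i) inbox.send(i);
    inbox.send(std::nullopt);
    const long result = reply.receive();
    worker.join();
    return result;
}
```

Sends are asynchronous in the sense that the sender never waits for the worker to process a message, only for the brief queue lock.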

  • by Animats (122034) on Monday December 17, 2007 @02:36PM (#21727932) Homepage

    I have for over 6 years been thinking... of a 3-dimensional processor that cross-communicates over a diagonal matrix instead of the traditional serial and parallel communication model.

    Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.

  • Re:2005 Called (Score:4, Informative)

    by chaboud (231590) on Monday December 17, 2007 @02:37PM (#21727936) Homepage Journal
    Well, 2005 called...

    it wants its reply back.

    The parent is exactly how I would have replied a couple of years ago. I was doing lots of threading work, and I found it easy to the point of being frustrated with other programmers who weren't thinking about threading all of the time.

    I was wrong in two ways:

    1. It's not that easy to do threading in the most efficient way possible. There's almost always room for improvement in real-world software.

    2. There are plenty of programmers who don't write thread-safe/parallel code well (or at all) that are still quite useful in a product development context. Some haven't bothered to learn and some just don't have the head for it. Both types are still useful for getting your work finished, and, if you're responsible for the architecture, you need to think about presenting threading to them in a way that makes it obvious while protecting the ability to reach in and mess with the internals.

    The first point is probably the most important. There are several things that programmers will go through on their way to being decent at parallelization. This is in no strict order and this is definitely not a complete list:

    - OpenMP: "Okay, I've put a loop in OpenMP, and it's faster. I'm using multiple processors!!! Oh.. wait, there's more?"
    Now, to be fair, OpenMP is enough to catch the low-hanging fruit in a lot of software. It's also really easy to try out on your code (and can be controlled at run-time).

    - OpenMP 2: "Wait... why isn't it any faster? Wait.. is it slower?"
    Are you locking on some object? Did you kill an in-loop stateful optimization to break out into multiple threads? Are you memory bound? Blowing cache? It's time to crack out VTune/CodeAnalyst.

    - Traditional threading constructs (mutexes, semaphores): "Hey, sweet. I just lock around this important data and we're threadsafe."
    This is also often enough in current software. A critical section (or mutex) protecting some critical data solves the crashing problem, but it injects the lock-contention problem. It can also add the cost of round-tripping to the kernel, thus making some code slower.

    - Transactional data structures: "Awesome. I've cracked the concurrency problem completely."
    Transactional mechanisms are great, and they solve the larger data problem with the skill and cleanliness of an interlocked pointer exchange. Still, there are some issues. Does the naive approach cleanly handle overlapping threads stomping on each other's write-changes? If so, does it do it without making life hell for the code changing the data? Does the copy/allocation/write strategy save you enough time through parallelism to make back its overhead?

    Should you just go back to a critical section for this code? Should you just go back to OpenMP? Should you just go back to single-threading for this section of code? (not a joke)

    Perhaps as processors get faster by core-scaling instead of clock-scaling this will become less of a dilemma, but to say that "[to do multi-threaded programming effectively] is not that difficult" is akin to writing your first ray-tracer and saying that 3D is "not that difficult." Sometimes it is. At least at this point there are places where threading effectively is a delicate dance that not every developer need think about for a team to produce solid multi-threaded software.

    That doesn't mean that I object to threading being a more tightly-integrated part of the language, of course.
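The lock-contention stage in the list above is easy to show in miniature. This is a sketch under assumptions, with hypothetical function names: each worker counts into a private variable and takes the shared mutex once at the end, instead of locking around the shared total for every element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Counts even values using `workers` threads.
// Each thread accumulates locally and locks the shared total only once,
// so the mutex is taken `workers` times rather than data.size() times.
long count_even(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> pool;
    const std::size_t step = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * step, data.size());
        const std::size_t end = std::min(begin + step, data.size());
        pool.emplace_back([&data, &total, &total_mutex, begin, end] {
            long local = 0;  // no lock needed for thread-private state
            for (std::size_t i = begin; i < end; ++i)
                if (data[i] % 2 == 0) ++local;
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& t : pool) t.join();
    return total;
}

// Usage sketch: count the even numbers in 1..n.
long count_even_up_to(int n, unsigned workers) {
    std::vector<int> data(static_cast<std::size_t>(n));
    std::iota(data.begin(), data.end(), 1);
    return count_even(data, workers);
}
```

Whether this beats the single-threaded loop still depends on data size and memory bandwidth, which is exactly the "should you just go back" question raised above.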
  • by Anonymous Coward on Monday December 17, 2007 @02:40PM (#21727994)

    So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?

    Unfortunately it doesn't beg anything. Begging the question is something completely different.
  • Re:Thank god (Score:3, Informative)

    by Fizzl (209397) on Monday December 17, 2007 @02:53PM (#21728180) Homepage Journal
    Ugly, clumsy and complicated compared to Java's way.

    I know how to do threading in C++ on every platform I have used for development. It's just that the modern languages have an elegant system, with forethought given to threading while designing the platform/language. Why would anyone new want to learn how to do clumsy non-standard threading in C++?
    I think the options are to adapt, or to continue riding the dinosaurs until they die out and be left behind. Sorry that I am sending mixed signals. I have always worked with C++ and only used these newfangled languages when forced to. I feel that I have done something stupid by being too closed-minded.
  • by mariuszbi (1113049) on Monday December 17, 2007 @03:10PM (#21728414)
    Wait a second! Have you ever coded in C++? Even if threads are not in the standard library, you have Boost and Intel's TBB (Threading Building Blocks), besides the native threading library. Do you trust your library in Java? What if the VM screws everything up? As for the compiler "optimizing" everything, there is a little keyword, volatile, that just tells the compiler not to optimize memory access for that variable. I think the real problem is working in a new programming paradigm: if you have a problem with sharing variables, code everything using pure functions.
  • Re:2005 Called (Score:3, Informative)

    by egomaniac (105476) on Monday December 17, 2007 @03:31PM (#21728862) Homepage
    The difference between parallel programming and multithreaded programming is this ... with a parallel algorithm, different parts of one task/thread are done on separate CPUs, whereas with multithreaded programming each one thread/task is done entirely on one processor.

    Wait... what? "Different parts of one thread are done on separate CPUs"?

    In what (real world, non-research) system is a single thread run on multiple processors at the same time? And why are you claiming that running each thread on a single processor, as is done by all major OSes, is not parallel programming?

    It's not a semantic difference. Threads are basically just lightweight each thread of a program execution can be thought of as a different process.

    I've re-read that about five times, and I still don't have a clue what point you're trying to make here. From an algorithmic standpoint, all that matters is "these instructions are run in sequence, and these two sets of sequential instructions can run in parallel". The terminology that generally describes the concept of a sequential set of instructions is "thread". Sure, on a given operating system you might use a lightweight process or even a full-blown process to implement each 'thread', but that's an implementation detail and has nothing to do with the algorithm. What are you trying to say?

    OTOH, in parallel programming, a thread/task is broken down into pieces and brought back together when the pieces are done. Think SETI@Home, but on a much smaller scale.

    You're referring to "data parallelism" versus "task parallelism". Breaking a single computation's data set up into parallelizable chunks a la SETI@Home is "data parallelism", whereas running two relatively unrelated tasks in parallel is "task parallelism". They are both forms of parallel programming and your assertion that only data parallelism 'counts' is simply false.
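The two forms of parallelism named above can be put side by side. This is a small sketch with hypothetical function names, using `std::async` from the C++ standard library: the first function splits one computation's data set in two, the second runs two unrelated jobs at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Data parallelism: one computation (a sum) split over two halves
// of the same data set.
long parallel_sum(const std::vector<int>& v) {
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    auto left = std::async(std::launch::async,
                           [&] { return std::accumulate(v.begin(), mid, 0L); });
    const long right = std::accumulate(mid, v.end(), 0L);
    return left.get() + right;
}

// Task parallelism: two unrelated jobs running side by side.
std::pair<long, std::size_t> sum_and_length(const std::vector<int>& v,
                                            const std::string& text) {
    auto sum = std::async(std::launch::async,
                          [&] { return std::accumulate(v.begin(), v.end(), 0L); });
    auto len = std::async(std::launch::async, [&] { return text.size(); });
    return {sum.get(), len.get()};
}

// Usage sketch: both styles produce ordinary, checkable results.
void demo() {
    const std::vector<int> v{1, 2, 3, 4, 5};
    assert(parallel_sum(v) == 15);
    assert(sum_and_length(v, "abcd").second == 4);
}
```

Both functions are built from the same primitive, which supports the point that each is a form of parallel programming.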
  • by ppanon (16583) on Monday December 17, 2007 @04:14PM (#21729912) Homepage Journal
    Well, you could parallelize recalculation of large spreadsheets. Create dependency trees for cells and split the branch recalculations among different threads. Some accountants and executives with large "what-if?"-type spreadsheets could find that quite useful.

    Browsers could have separate threads for data transfer and rendering. If the web site is using <div> tags and CSS, you could split the rendering work for each div to a separate thread. More rapid and frequent partial screen updating can provide today's generation of MTV-style re-orientation addicted workers the perception of faster performance.

    Parallelize WYSIWYG document preparation with a backend using TeX text-layout algorithms.

    But probably the biggest advantage would be obtained from more parallelism (both coarse and fine-grained) in GUI operations. That probably requires a re-architecting of display and GUI subsystems. But that's a bit of a chicken-and-egg problem because, to do that properly, you also need GPUs to become multi-core to remove the GPU as a single-thread bottleneck. GPUs are going to hit the same wall general purpose CPUs are hitting now, with a few years' delay. There's hope that today's Crossfire/SLI approaches could provide a hardware base to find an evolutionary path for that.

    I figure it will take at least another 5 years or more for a graphics subsystem redesign, and my guess is that it will happen on Linux first. I don't see Microsoft being first in re-architecting the Windows display subsystem to do it. Certainly not for the next Windows version in 2010(?), and thus they probably won't implement it until 2014 at the earliest. I think it's more likely to happen with somebody replacing large parts of the server as a PhD thesis.

    But, yeah, fundamentally the biggest bottleneck with personal computer systems is the bandwidth between the user and the computer and there's no way to parallelize the user.
  • Re:C++? (Score:5, Informative)

    by Yetihehe (971185) on Monday December 17, 2007 @04:26PM (#21730154)

    ...while all the clever folks have already started writing their scalable applications in something reasonable, like Erlang?
    From the Erlang site:

    1.4. What sort of problems is Erlang not particularly suitable for?

    People use Erlang for all sorts of surprising things, for instance to communicate with X11 at the protocol level, but, there are some common situations where Erlang is not likely to be the language of choice.

    The most common class of 'less suitable' problems is characterised by performance being a prime requirement and constant-factors having a large effect on performance. Typical examples are image processing, signal processing, sorting large volumes of data and low-level protocol termination.
    That's why most applications are still in C/C++.
  • Re:2005 Called (Score:2, Informative)

    by workdeville (1166127) on Monday December 17, 2007 @04:40PM (#21730384)
    It's not particularly unlikely that an algorithm runs in exactly n log n operations (for a sane choice of units). That's kind of the point of Big O notation. Dropping scalar factors into your units. (Though I will note that I prefer using notation like O(n^2 + n) over just O(n^2) since it is more informative, and admit that it complicates my "summary"). The point is: the GGP said n log n, not O(n log n). n/2 log n is faster than n log n, even if they're both of the same algorithmic order.
  • Re:2005 Called (Score:1, Informative)

    by Anonymous Coward on Monday December 17, 2007 @05:29PM (#21731192)
    >> In the end, you end up with something that sorts faster than n log (n).

    > Not without an infinite number of processors you don't.

    If you're sorting n items, and you have n processors, then the Odd-Even Transposition Sort has a worst-case time of O(n).
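For readers who have not met the algorithm, here is a sequential sketch of odd-even transposition sort (function names are hypothetical). It runs n phases; within a phase the compared pairs are disjoint, so with roughly one processor per pair every compare-exchange in a phase could run at the same time, giving n parallel steps. This code simulates the phases on one thread and so does O(n^2) work.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `a` in place and returns the number of phases run (a.size()).
std::size_t odd_even_transposition_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        // Even phases compare (0,1), (2,3), ...; odd phases compare
        // (1,2), (3,4), ... No element appears in two pairs of one phase.
        for (std::size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        }
    }
    return n;
}

// Convenience wrapper that returns a sorted copy.
std::vector<int> sorted_copy(std::vector<int> a) {
    const std::size_t phases = odd_even_transposition_sort(a);
    assert(phases == a.size());
    return a;
}
```

The total work is higher than for an n log n sort; the O(n) figure is elapsed time given n processors, not operation count.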
  • Re:2005 Called (Score:3, Informative)

    by pthisis (27352) on Monday December 17, 2007 @05:36PM (#21731290) Homepage Journal

    On modern systems, threads are themselves first-class constructs

    Not in, say, Linux or Plan 9. Contexts of execution are first-class constructs, and both threads and processes are special cases of COEs.

    A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).

    The latter sentence here is nonsensical on many modern systems.

    The core distinction which applies to most common modern systems (Windows, OS X, Linux, modern Unices, etc) is that:

    In a multithreaded program, the threads all share memory (aside from the stack and possibly thread-local storage). This can be alternately phrased as threads lack memory protection from each other. Processes do not share memory except what is specifically allocated as shared memory (through CreateFileMapping, mmap, shmget, or whatever).

    When you are making the choice of whether to use threads or processes, your fundamental question should be "Do I want to implicitly share all memory?", or "Do I want to throw out memory protection?". Sometimes the answer is yes, but more often it's no (in which case you probably want to go with multiple COW processes, which the Unix/Mac crowd is familiar with through fork() but the equivalent NtCreateProcess with a NULL SectionHandle is much underpublicized on Windows).

    Additionally, some operating systems support fibers.

    Fibers are pretty tangential to the conversation and can also be implemented in user space. They're not really threads (or processes) at all, they're just coroutines. Java's "green threads" are one common example.
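The memory-sharing distinction drawn above can be observed directly on a POSIX system. This is a sketch with hypothetical function names and it assumes `fork()` is available (Linux, macOS, other Unix-like systems): a forked child writes to a variable and the parent never sees the change, while a thread performing the same write changes the value for everyone.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Forks a child that overwrites its own copy of `value`, then returns
// what the parent sees afterwards. Returns -1 if fork() fails.
int value_in_parent_after_child_write() {
    int value = 1;
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        value = 42;  // touches only the child's copy-on-write page
        _exit(0);    // leave without running the parent's exit handlers
    }
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    return value;    // unchanged: processes do not share this memory
}

// The same write made from a thread is visible to the caller,
// because threads share the address space.
int value_after_thread_write() {
    int value = 1;
    std::thread t([&value] { value = 42; });
    t.join();
    return value;
}
```

The thread version only works safely here because `join()` orders the write before the read; the process version needs no such care, which is the memory-protection argument in a nutshell.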
  • by Anonymous Coward on Monday December 17, 2007 @06:37PM (#21732164)
    Yes, you can just take a standard sort algorithm and run it multithreaded. Have you heard of the Quicksort and Mergesort algorithms? Did you learn linear algebra in University?

    Both Quicksort and Mergesort are divide-and-conquer algorithms. They are recursively defined to split the data-set and sort each part independently. In fact, most any recursively defined algorithm will run multithreaded.

    Linear algebra lends itself to multithreading. A matrix product is several independent dot products, which each can run on separate cores.

    Modern Fortran compilers already support parallel matrix operations, without any code change. There is no reason why other languages' libraries couldn't also have hidden parallelism.
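As a concrete instance of the divide-and-conquer point above, here is a sketch of merge sort where the two halves are sorted concurrently. The function names and the depth limit of 2 are assumptions chosen for illustration; the limit stops the recursion from starting a new thread for every tiny sub-range.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Sorts [first, last) by merge sort. While `depth` > 0 the left half is
// sorted on another thread; below that, the recursion is sequential.
void parallel_merge_sort(std::vector<int>::iterator first,
                         std::vector<int>::iterator last, int depth) {
    assert(depth >= 0);
    if (last - first < 2) return;
    const auto mid = first + (last - first) / 2;
    if (depth > 0) {
        // The halves are disjoint ranges, so no locking is needed.
        auto left = std::async(std::launch::async, parallel_merge_sort,
                               first, mid, depth - 1);
        parallel_merge_sort(mid, last, depth - 1);
        left.get();  // wait for the left half before merging
    } else {
        parallel_merge_sort(first, mid, 0);
        parallel_merge_sort(mid, last, 0);
    }
    std::inplace_merge(first, mid, last);
}

// Usage sketch: returns a sorted copy, using up to four concurrent leaves.
std::vector<int> merge_sorted(std::vector<int> v) {
    parallel_merge_sort(v.begin(), v.end(), 2);
    return v;
}
```

The final merge at each level is still sequential, so the speedup is bounded well below the core count; the splitting is the easy part.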
  • Re:Me too (Score:3, Informative)

    by cecil_turtle (820519) on Tuesday December 18, 2007 @12:59AM (#21734956)

    I'd love to say,"Core 1, you will convert DVDs(or mp3s, or some other processor-intensive task). Core 2, run everything else."
    You can, with processor affinity. Unless you're saying your processor isn't dual-core. Even still you can just set your process priority / nice level (whatever OS you run) so that it's a lower priority so your other programs run OK.

    I don't know, I haven't owned a computer since 2003 where the processor was really a bottleneck anyway. Unless you're doing something specific like converting media files or running a distributed application (seti, folding, etc.) then normally the bottleneck is disk access. Even on servers it's not much of an issue for me, it's pretty easy to throw more CPU horsepower at a machine nowadays, but again disk performance is killer expensive.
  • by yermoungder (602567) * on Tuesday December 18, 2007 @05:11AM (#21736562)
    You could try Ada.

    Ada is a multi-paradigm language (i.e. procedural or OO) that has threads ("tasks") built in. The experience of Ada83 tasking wasn't brilliant - the OS/hardware available at the time just wasn't up to the job and was hopelessly expensive. This left a nasty taste for some, which in turn led to FUD about the language as a whole - you wouldn't believe the rubbish I've heard over the years about what Ada is or is supposed to do!

    Ada95 (and in particular the $0, Open Source GNAT compiler) changed that, making an affordable-for-the-masses, fast Ada environment available on GNU/Linux and Windows platforms. It now comes with an Eclipse plug-in too.

    Now, Ada2005 has arrived which even extends OO into the domain of active objects (i.e. extensible, polymorphic tasks).
