An Overview of Parallelism

Posted by kdawson
from the cores-galore dept.
Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley": "Generally they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less how it has been in previous years (though languages like Erlang can probably help to change this)." Read on for Mortimer.CA's summary from the paper of some "conventional wisdoms" and their replacements.
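
One generic way to picture the diminishing returns quoted above is Amdahl's law, which the summary does not itself cite. The sketch below assumes a hypothetical program in which 90% of the run time parallelizes perfectly; the function name and the 90% figure are illustrative assumptions, not numbers from the report.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time parallelizes perfectly.
    for n in (2, 8, 16, 32):
        print(f"{n:2d} processors: {amdahl_speedup(0.9, n):.2f}x")
```

Under that assumption, doubling from 16 to 32 processors raises the ideal speedup only from about 6.4x to about 7.8x.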

Old and new conventional wisdoms:
  • Old CW: Power is free, but transistors are expensive.
  • New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.

  • Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
  • New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.

  • Old CW: Multiply is slow, but load and store is fast.
  • New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.

  • Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
  • New CW: It will be a very long wait for a faster sequential computer (see above).

  • by macadamia_harold (947445) on Saturday February 10, 2007 @07:57PM (#17967086) Homepage
    Mortimer.CA writes with a recently released report from Berkley entitled "The Landscape of Parallel Computing Research: A View from Berkeley

    Would that be a Parallelograph?
  • nothing new here... (Score:3, Informative)

    by Anonymous Coward on Saturday February 10, 2007 @08:08PM (#17967166)
    pretty much the same thing Dave Patterson's been saying for a while now...in fact, the CW sounded so familiar, I went back to double check his lecture slides from more than a year ago:

    http://vlsi.cs.berkeley.edu/cs252-s06/images/1/1b/Cs252s06-lec01-intro.pdf [berkeley.edu]

    and it's pretty much identical (check out slide 3 on the first page of the pdf)
  • Erlang (Score:5, Insightful)

    by Anonymous Coward on Saturday February 10, 2007 @08:09PM (#17967176)
    Erlang only provides a way of proving parallel correctness, a la CSP. This means avoiding deadlocks and such. The primary difficulty of crafting algorithms to run efficiently over multiple CPUs still remains. Erlang does not do any automatic parallelization, and expects the programmer to write the code with multiple CPUs in mind.

    I'm waiting for a language which would parallelize stuff for you. This is most likely to be a functional language, or an extension to an existing functional language. Maybe even Erlang.

    • by LLuthor (909583)
      Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

      This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

      This is why C/C++ will dominate scalable applications for at least a few more years.
      • Re: (Score:3, Informative)

        Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

        This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

        Haskell comes pretty close, and it's designed from the beginning to be pure.

        In fact, it may be these immutable data structures that make the pure functional languages able to perform well

        • In Java, you can have threads communicate by passing objects that you treat as immutable -- a thread creates an object, passes it as an argument to invokeLater(), where it gets synchronized with the event queue, acted upon, and then garbage collected. That the thread that called invokeLater() on an object no longer retains any references to that object is a matter of programming discipline, but that style of multi-threaded programming is available.

          The question or concern I have is that if one programs th
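
The hand-off style described above (create an object, pass it to another thread, keep no reference to it) can be sketched outside Swing as well. This is a Python analogue, not Java's invokeLater(): the `Message` type, the `worker` function, and the `STOP` sentinel are hypothetical names chosen for illustration, with `queue.Queue` standing in for the event queue.

```python
import queue
import threading
from typing import NamedTuple


class Message(NamedTuple):
    """Immutable payload: fields cannot be reassigned after creation."""
    job_id: int
    text: str


STOP = None  # sentinel telling the worker to exit


def worker(inbox: queue.Queue, results: list) -> None:
    # Single consumer thread: it is the only writer to `results`.
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        results.append((msg.job_id, msg.text.upper()))


def run(texts: list) -> list:
    inbox: queue.Queue = queue.Queue()
    results: list = []
    consumer = threading.Thread(target=worker, args=(inbox, results))
    consumer.start()
    for i, text in enumerate(texts):
        # The producer builds the message and hands it off without
        # keeping a reference, so nothing can mutate it later.
        inbox.put(Message(i, text))
    inbox.put(STOP)
    consumer.join()
    return results
```

Here immutability is enforced by the type rather than left to programming discipline, which is the one place this sketch differs from the style the comment describes.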

          • by Pinky (738)
            The short answer is that the modern Java garbage collector is tuned to deal with lots of small, short-lived objects, so odds are you're all right. An interesting thing to think about is that when you do invokeLater(), a common style is to create a new Runnable(). I don't know if you're doing this, but if you are then you're creating a new Runnable object every time anyway. Also know that there is some overhead to switching threads.

            If you want to know where you're being inefficient, you've got to profile, baby... Don't
        • by zsau (266209)
          Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of. Neither has the same kind of hard-and-fast distinction between built-ins and user-defined structures of others either; I can just as easily define an infix function or a macro that doesn't expand its third argument in Haskell or Lisp (respectively) a
          • Surely functional programming languages are where the idea of syntax design as UI problem came from! Lisp on the one hand has a very simple yet incredibly powerful syntax; whereas Haskell has an incredibly beautiful syntax with more sugar than you can dream of.

            Okay, I'll be more specific: they need to address syntax as a human user interface design problem.

            Lisp. UI. Heh. Lisp syntax is a good user interface if sendmail.cf is. Its near lack of consistent visual cues disqualifies it completely.

            Haskell is a go

            • by zsau (266209)
              I don't understand. What's wrong with Haskell's syntax from a Human Interface perspective? I really can't think how it's substantially different from Python, once you account for differences of language philosophy. Do you just want brackets around parameters to a function call? (Or do you refer to the different styles Haskell allows, like the so-called point-free style for function definitions that make it completely unobvious how many arguments are being used?)

              (I realise about Lisp; I was just showing that
        • by killjoe (766577)
          This is probably an ignorant question but I will ask it anyway.

          My understanding is that in a functional language io is a bitch. In order to overcome that haskell has invented impure things called monads which are notoriously hard to understand and code.

          Given that virtually every computing task involved io why would you design a language that is not optimized specifically for io? Whether it's file access, network access or database access virtually every program ever written uses some io right?
          • Mercury is trying to make pure functional IO less annoying by developing syntactic sugar. That shows that functional IO can be easy, with the exception of being obliged to declare that your functions do IO (which makes debugging by printing quite hard).

            Now, about the GP question of parallel compilers, I've never seen one, and don't know how hard it would be to build it. The very hard problem of breaking the program into parallelizable chunks you get for free, but you still need to decide how many threads you'll u

        • languages in which the syntax design is regarded as a user-interface problem
          Lisp then is the ultimate user-interface problem :P
    • Re: (Score:3, Insightful)

      by CastrTroy (595695)
      I wrote some parallel code using MPI [lam-mpi.org] in university. It takes a lot of work to get the hang of at first, and many people I know who were good at programming had lots of trouble in this course, because programming for parallelism is very different from programming for a single processor. On the other hand, you can get much better performance from parallel algorithms. However, I think that we could do just as well sticking with the regular algorithms, and having a lot of threads each running on a diffe
    • Re:Erlang (Score:5, Informative)

      by cpm80 (899906) on Sunday February 11, 2007 @12:19AM (#17968946) Homepage
      I think using a *specific* language for automatic parallelization is the wrong way. Some GNU folks are working on language independent autoparallelization for GCC 4.3. Their implementation is an extension to the OpenMP implementation in GCC. Read OpenMP and automatic parallelization in GCC, D. Novillo, GCC Developers' Summit, Ottawa, Canada, June 2006 http://people.redhat.com/dnovillo/Papers/ [redhat.com] for details.
    • by joss (1346)
      > I'm waiting for a language which would parallelize stuff for you. T

      You may as well wait for a language which does everything for you.
      For many problems, coming up with an algorithm for efficiently solving the
      problem on multiple cores is harder than the algorithm on a single core
      [e.g., try parallelising a large matrix inversion, etc.].
      I wouldn't expect languages to come up with algorithms as fundamental
      as QuickSort any time soon [when they do, they'll be smarter than we
      are, so we won't be programming at all
    • It's a myth that functional programs are more parallelizable than imperative ones.

      First of all, most functional programs make heavy use of monads (especially the IO monad) and hence they need synchronization primitives around those monads just like imperative programs.

      Secondly, those parts that do not use monads still need to be executed in certain order, so there needs to be a scheduler which synchronizes the multiple cores (waiting for results and dispatching computations), which again means using semaphore
      • by Pinky (738)
        > It's a myth that functional programs are more parallelizable than imperative ones.

        Never heard that one.

        I think what you're getting at is:
        It's easier to write correct, parallelized code in a functional language.

        It's easier to write correct, parallelized code in functional languages because many threading errors come from race conditions when changing the state of some object or data structure. Functional programming languages avoid having mutable state or data structures. Therefore writing a race conditio
    • by Pinky (738)
      If it's not functional it will have very good support for many aspects of a functional language. One of the first things one realizes after dealing with concurrency for a while is that immutable objects are a good thing. Mutable state is bad.

      I can't see automatic parallelization happening for a long time. Too hard.. What I expect to see is a trend toward functional style programming.. Also maybe things like composable memory transactions.. Maybe thread based synchronization (which is a term I've been using
  • It's not hard (Score:5, Insightful)

    by PhrostyMcByte (589271) <phrosty@gmail.com> on Saturday February 10, 2007 @08:12PM (#17967204) Homepage

    I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking. This is probably good advice to a noob programmer but I otherwise can't stand people who are of the "absolutely, never, ever, use threads" mindset.

    Some applications have no need to be multithreaded, but when they do it is a lot easier than people make it out to be. Taking advantage of lock-free algorithms and NUMA for maximum scalability *can* be hard, but the people who need these will have the proper experience to tackle it.

    Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.
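
As a minimal sketch of the "proper locking" mentioned above, the following Python example guards a shared counter with a mutex. The `Counter` class and the `run_workers` helper are hypothetical names; the only point is that the read-modify-write happens while the lock is held.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        # The read-modify-write below is not atomic on its own;
        # holding the lock keeps two threads from interleaving it.
        with self._lock:
            self.value += 1


def run_workers(num_threads: int, increments_each: int) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run_workers(4, 1000)` always returns 4000; without it, updates could be lost depending on how the threads interleave.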

    • Re:It's not hard (Score:4, Interesting)

      by ardor (673957) on Saturday February 10, 2007 @08:23PM (#17967302)
      Well, the nondeterministic nature of multithreading is still a problem. With one thread, debugging is simple: the only thread present will be stopped. But with multiple threads, how is the debugger supposed to handle the problem? Stop all threads? Only the current one? Etc. This is relevant when debugging race conditions.

      Also, the second great problem is that thread problems are hard to find. When I write a class hierarchy, an OOP language can help me with seeing design errors (for example, unnecessary multiple inheritance), or misses in const-correctness. Threading, however, is only present as mutexes, conditions etc.

      One other issue with threads is that they effectively modify the execution sequence. Traditional single-threaded programs have a sequence that looks like a long line. Threading introduces branches and joins, turning the simple line into a net. Obviously, this complicates things. Petri nets can be useful in modeling this.
      • by cluckshot (658931)

        It's just opinion, but I think the problem with parallel programming is a substantial lack of processors. If for example a die was made with 10 million processors on it (very simple processors) with modest queue memory, there are applications in optical data processing that would become extremely efficient and much more natural than any solutions as of this time. Otherwise we just spend our time loading the queue of one or 4 processors millions of times to do exactly the same process on the data from each

        • Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.
          • Re:It's not hard (Score:4, Insightful)

            by mikael (484) on Saturday February 10, 2007 @10:07PM (#17967998)
            Not all algorithms can be parallelized that easily. Imagine e.g. a parser: You cannot parse text by having a million processors looking at one character each.

            You could have the first thread processor split the text by white space. Then each block of characters is assigned to any number of processors to find the matching token. I've seen some parsers where the entire document was
            read in, converted into an array of tokens before returning back to the calling routine.
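
The two-stage scheme described above (one sequential split on whitespace, then independent token matching) might look like the following Python sketch. The token categories, the keyword set, and the `classify` function are invented for illustration, and the approach assumes a language whose tokens really are whitespace-delimited.

```python
from concurrent.futures import ThreadPoolExecutor

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set


def classify(word: str) -> tuple:
    """Classify one whitespace-delimited chunk independently of the rest."""
    if word in KEYWORDS:
        return ("KEYWORD", word)
    if word.isdigit():
        return ("NUMBER", word)
    if word.isidentifier():
        return ("IDENT", word)
    return ("SYMBOL", word)


def tokenize(text: str, workers: int = 4) -> list:
    words = text.split()  # stage 1: sequential split on whitespace
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # stage 2: chunks are classified concurrently; map() keeps
        # the results in the original order.
        return list(pool.map(classify, words))
```

In CPython this shows the structure rather than a speedup, since the global interpreter lock serializes CPU-bound threads, and the sequential split in stage 1 remains a bottleneck either way.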
            • by vadim_t (324782)
              But in that case, your speed is limited by the speed of the splitting by whitespace. And if you've got lots of CPUs, chances are they're quite slow on their own.

              Then there's that you can hardly split by whitespace with a complex syntax. You certainly won't be able to parse C like that.
            • by drerwk (695572)
              You could have the first thread processor split the text by white space. Then each block of characters is assigned to any number of processors to find the matching token. I've seen some parsers where the entire document was read in, converted into an array of tokens before returning back to the calling routine.

              I'm in the midst of a project that is trying to use multiple threads, and this example is not far off from the thinking that is going on among some of the engineers who have not done MT before. Assum
          • I think this is a great example of where serial mind sets inhibit us. Parsing in parallel makes a whole lot of sense once you're ready for it. Granted you need parallel data storage to make good on parallel parsing, but it's feasible if you are ready to restructure your whole mind set.

            There is a very interesting theory provided by Trefethen and Bau in Numerical Linear Algebra that the existence of exact algorithms for most of the complicated interesting matrix operations may have slowed our finding a faster

          • by cluckshot (658931)

            Obviously some operations do best as serial operations. Any process whereby we do a series of operations to one unit of data that is not done to all of the rest of the data in exactly the same way has obvious advantages being done in series. It is by definition a series of events.

            To clarify, using a serial process is sort of like building a stack. You cannot add an item to a stack ahead of the item that is to precede it. If you do you break the intended order. Parallel is sort of like what could ha

            • We have developed processors that are really good. Their clock rates are unimaginably fast. Yet competing against a processor which is massively parallel and has a clock rate of less than 30 cycles per second (your brain), modern computers are at best a poor second.

              That entirely depends on the type of problem you throw at them. After all, if computers couldn't beat our brains in certain areas, there would be no point in using them.
    • by tepples (727027)

      I think the main reason people say "don't use threads" is because while single threaded apps are easy to debug, multi-threaded ones will crash and burn at seemingly random places if the programmer didn't plan ahead and use proper locking.

      Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.

      Language extensions for threading would be great, and I'm sure somebody is working on it. But until that magical threading language (maybe c++1x) comes along the current ones work just fine.

      Which "current ones"? Do POSIX threads work on Microsoft Windows?

      • Do the C standard and the C++ standard go into detail as to the semantics of this locking?

        AFAIK C++0x (i.e. the next version of the C++ standard) will.
      • by jlarocco (851450)

        Do the C standard and the C++ standard go into detail as to the semantics of this locking? No, they don't even specify what a thread is. This makes it less likely that students learn proper locking in college.

        I don't think that's relevant. The ISO standard for Ada completely specifies the semantics of threads and thread locking, but that doesn't necessarily make the problem any easier. Of course Ada usually isn't taught in schools, but (unfortunately) C and C++ aren't usually taught in schools either,

    • by MBCook (132727)

      Maybe we should teach the basics. I don't remember my CS program (this was '01-'05 or so) teaching anything about working with threads. Everything I know about threads and threaded programming has been picked up by reading books on threading, experimenting with threading, and what I've learned on my job from others who know threading. I'm not going to claim I'm that competent though.

      What do I know about NUMA and other complex issues of thread handling and modern computers? What I know in that category cam

    • by jd (1658) <imipak&yahoo,com> on Saturday February 10, 2007 @09:54PM (#17967912) Homepage Journal
      This problem was "solved" (on paper) in the mid 1970s. Instead of writing a highly complex parallel program that you can't readily debug, you write a program that the computer can generate the parallel code for. Provided the compiler is correct, the sequential source and the parallel binary will be functionally the same, even though (at the instruction level) they might actually be quite different. What's more, if you compile the sequential source into a sequential binary, the sequential binary will behave exactly the same as the parallel version (only much slower).

      Any reproducible bug in the parallel binary will be reproducible given the same set of inputs on the sequential binary, which you can then debug as you have the corresponding sequential source code.
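
The debugging argument above rests on the sequential and parallel versions producing identical results for the same inputs. A hand-written Python sketch of that property (not an auto-parallelizing compiler) follows; `step` is a hypothetical pure function, and purity is the assumption that makes the two runs interchangeable.

```python
from concurrent.futures import ThreadPoolExecutor


def step(x: int) -> int:
    """Pure function: the result depends only on the argument."""
    return x * x + 1


def run_sequential(inputs: list) -> list:
    return [step(x) for x in inputs]


def run_parallel(inputs: list, workers: int = 4) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Same function, same inputs, results returned in input order.
        return list(pool.map(step, inputs))
```

Because both functions return the same list for the same input, a wrong result seen in `run_parallel` can be reproduced and stepped through in `run_sequential`.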

      So why isn't this done? Automagically parallelizing compilers (as opposed to compilers that merely parallelize what you tell them to parallelize) are extremely hard to write. Until the advent of Beowulf clusters, low-cost SMP and low-cost multi-core CPUs, there simply haven't been enough machines out there capable of sufficiently complex parallelism to make it worth the cost. Simply make a complex-enough inter-process communication system, with a million ways to signal and a billion types of events. Any programmer who complains they can't use that mess can then be burned at the stake for their obvious lack of appreciation for all these fine tools.

      Have you ever run GCC with maximum profiling over a program, tested the program, then re-run GCC using the profiling output as input to the optimizer? It's painful. Now, to parallelize, the compiler must automatically not just do one trivial run but get as much coverage as possible, and then not just tweak some optimizer flags but run some fairly hefty heuristics to guess what a parallel form might look like. And it will need to do this not just the once, but many times over to find a form that is faster than the sequential version and does not result in any timing bugs that can be picked up by automatic tools.

      The idea of spending a small fortune on building a compiler that can actually do all that reliably, effectively, portably and quickly, when the total number of purchasers will be in the double or treble digits at most - say what you like about the blatant stupidity rife in commercial software, but they know a bad bet when they see one. You will never see something with that degree of intelligence come out of PCG or Green Hills - if they didn't go bankrupt making it, they'd go bankrupt from the unsold stock, and they know it.

      What about a free/open source version? GCC already has some of the key ingredients needed, after all. Aside from the fact that the GCC developers are not known for their speed or responsiveness - particularly to arcane problems - it would take many days to compile even SuperTuxKart and probably months when it came to X11, glibc or even the Linux kernel. This is far longer than the lifetime of most of the source packages - they've usually been patched on that sort of timeframe at least once. The resulting binaries might even be truly perfectly parallel, but they'd still be obsolete. You'd have to do some very heavy research into compiler theory to get GCC fast enough and powerful enough to tackle such problems within the lifetime of the product being compiled. Hey, I'm not saying GCC is bad - as a sequential, single-pass compiler, it's pretty damn good. At the Supercomputer shows, GCC is used as the benchmark to beat, in terms of code produced. The people at such shows aren't easily impressed and would not take boasts of producing binaries a few percent faster than GCC unless that meant a hell of a lot. But I'm not convinced it'll be the launchpad for a new generation of automatic parallelizing compilers. I think that's going to require someone writing such a compiler from scratch.

      Automatic parallelization is unlikely to happen in my lifetime, even though the early research was taking place at about the time I first started primary school. It's a hard problem that isn't being made easier by having been largely avoided.

    • Re: (Score:3, Interesting)

      by owlstead (636356)
      For a new C++ with multithreading language extensions (for manually coding the multithreading), good API and IDE/tool support, look no further. It's called Java and it has been around for ages. You really, really, really don't want multithreading in a non-"managed" language. You don't want to debug an application where *every* part of your application can be messed up by any thread. The advantage of Java is its built-in security measures.

      Things you need to have support for in a language/environmen
    • until that magical threading language (maybe c++1x) comes along

      Studied Erlang much?

    • Maybe we'll finally see functional languages come to the forefront. They make it a whole lot easier for the compiler to extract parallelism automatically. For example, see Parallel Haskell [microsoft.com]. I think this is the closest I've seen to a "magical threading language".
  • Hmmm... (Score:5, Interesting)

    by ardor (673957) on Saturday February 10, 2007 @08:13PM (#17967216)
    "but is likely to face diminishing returns as 16 and 32 processor systems are realized"

    Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?
    • Granularity.
      Neurons only work on minuscule problems. Pretty much like a single gate.
      Our programs are not yet capable of dividing problems into such small pieces.

      Plus, so far most of our parallelization efforts are focused on optimizing software. Brains are dedicated wetware machines.
      • by ardor (673957)
        In this case, the current parallelization efforts miss the point. Is there actual research into CPUs consisting of billions of minuscule neuron-like units? Something like neural net hardware? Maybe these would fare better than pure software ones...
        • I haven't heard of any hardware such as that, but I'd guess it'd be possible to do with FPGAs.
          Surely someone here could enlighten us.
          • by Simon80 (874052)
            One of my university profs does neural network research with FPGAs, perhaps that qualifies.
      • by TheLink (130905)
        Nope. Not really like a single gate at all.

        You have stuff like a single neuron that fires just for a specific abstract concept. Go search for "Halle Berry" neuron.

        So it could be more like billions of neurons watching "Sensory Channel + Stream of Consciousness", going mostly blah and then suddenly you get one little neuron yelling "It's Halle Berry!!!".

        Maybe followed by another one yelling "It's on a TV!"

        And then some neuron yelling "It's neurons screaming about Halle Berry on TV!".

        Then followed by maybe "Ch
    • Re:Hmmm... (Score:5, Funny)

      by CosmeticLobotamy (155360) on Saturday February 10, 2007 @08:32PM (#17967362)
      Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

      Brain scalability is just not that great. Trust me, putting more than four brains in one head is just asking for locking problems out the yin-yang.
    • by imsabbel (611519)
      Yeah.
      You are missing that our super-great brain isn't even able to calculate more than 1 or 2 additions per second.
      Meaning: the parallelism that's at work in our brain is absolutely useless for things we want to do with a computer.
      • by ardor (673957)
        As mentioned before, it is all a matter of how the problem is viewed. If you can translate the addition into an abstract visual model, then the brain is very quick.
      • Yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate. Here the angle and force are even overspecified and so there is a highly non-linear optimization to perform to get the minimum possible travel time to the throw. There is also an error in angle and force so a loss-function must be used to trade off accuracy for speed, in addition how important the throw is which dictates how probable an injury should be allowed. If you had a machine that let you thr
        • by timeOday (582209)

          yet a baseball player can calculate the exact force needed to throw a baseball from center field to home plate.
          No he does not. He simply learns the mapping from stimuli to actions. He's not calculating anything. That's why pitchers pitch instead of studying physics books.
          • by ardor (673957)
            Actually, he does. After several games, the brain can extrapolate pretty accurately the necessary parameters. The guy doesn't think in terms of parameters and forces, of course. But the brain adapts to the problem.
          • Okay, let's say it's as simple as you say. It still requires interpolation because a center fielder never stands in exactly the same spot and yet tries to hit the catcher's mitt right above and to the left of home plate, so previous responses would give a noisy surface from which extrapolation would be necessary. I would argue that these calculations are perhaps more intensive than the direct physics calculations.
    • Re: (Score:3, Interesting)

      by philipgar (595691)
      Is this really true? Of course for some tasks the massive parallelism of the human brain works great. The brain can analyze complex images extremely fast, comparing them in parallel to its own internal database of images, using fuzzy reasoning to detect if something is familiar etc. However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve. This is due to bad programming in the brain that sucks at d
      • by ardor (673957)
        Yes, the brain works visually, and this is the problem with math. Mathematicians often say that they imagine the mathematical problems in a visual way, but so abstract that it cannot be painted on paper. So the actual problem is the translation.
        • by TheLink (130905)
          The brain doesn't necessarily work visually. Since blind people still manage somehow :).

          And there are lots of more important life affecting problems than visual calculations.

          Like whether to go for chocolate or vanilla cake, or both. Or whether this funny smelling stuff should be eaten or not. Or whether I can make it safely up the slope. Or whether I can jump, reach that branch AND it is likely to hold me (this is not just simple visual - since you need to estimate your weight, _current_ strength factoring
      • by zsau (266209)
        However you give your brain complex math problems, and it can spend seconds, minutes, or even hours to solve it, sometimes requiring extra scratch memory to solve.

        You give an x86 computer PPC binaries and ask it to run them and they'll be slow. Your complex maths problems aren't in your brain's natural format. But anyone (experienced with driving a particular car) can tell you just how hard they'll need to brake to stop, or how fast they can go round that corner safely. And they can do this whilst telling thei
    • It depends. Our brains are vastly more complicated than a computer CPU, but they're not designed to do the same thing. There was an interesting explanation of this in a fiction book I read a while back that talked a little about AI.

      Basically, it said the human brain was designed for algorithmic complexity, not speed. If you take a brain, and make it ten times faster, it won't be any more intelligent. It'll just take 1/10th of the time to get the same answer as it used to, it still won't be able to solve
    • There are two things that we are missing:

      1) the brain does not share data. Data in the brain are copied around in the form of signals.

      2) the brain is not a generalized Turing machine. It can only do pattern matching on images/sounds/smells/touch. Don't be fooled by our ability to do arithmetic or formulate theorems: these are the result of pattern matching as well.

      Pattern matching is highly parallelizable, of course: every piece of the pattern is fed to its own 'CPU', which calculates the degree of the matc
      • by ardor (673957)
        But since we are able to do everything a Turing machine can do as well, this means the pattern matching is just another form of a Turing machine, right?
        • by master_p (608214)
          The brain is not a Turing complete machine, because the brain always terminates its calculations.

          In other words, you cannot do what a Turing machine can do.
          • by ardor (673957)
            But *why* does it terminate every calculation? In which situations? Is it actually proven that the brain always terminates its calculations?
            • by master_p (608214)
              Yeap.

              Have you seen a human entering an infinite loop? I haven't.

              Actually the brain does not do calculations, it simply does pattern matching. Pattern matching is not a calculation, it is statistics.
    • by Pinky (738)
      >The human brain provides compelling evidence that massive parallelization works. So: what are we missing?

      That depends.. In what way is the brain massively parallel?

      Sure it uses many neurons but a processor has many transistors. Do I get to say that processors are massively parallel because every time I do an addition a large number of transistors are used at once; each one only doing a small part of the operation?

      The amount of parallelism in the brain is most often over-stated. It's impressive but not m
  • by SparhawkA (608965) on Saturday February 10, 2007 @08:16PM (#17967236)
    Take a look at LabVIEW, a compiled graphical programming language from National Instruments. It natively supports SMP / multicore / multithreading. Essentially, dissociated pieces of code you write (computations, hardware I/O, etc.) are automatically scheduled in separate threads of execution in order to maximize efficiency. It's an interesting idea: here's a technical article from their website that does a better job of describing it (some marketing included as well): http://zone.ni.com/devzone/cda/tut/p/id/4233 [ni.com]
    • by Doppler00 (534739)
      As someone who has used LabVIEW extensively, I have to say it kind of sucks. They claim "self documenting" and things like that. If you've ever seen a real LabVIEW program, you'll see that it's a mess, and very difficult to understand and debug. It doesn't help that it hasn't been Object Oriented until version 8. You STILL can't spawn 0 to N threads, and its built-in library support is very weak (simple things like list manipulation, queues, string processing, etc. don't exist!).

      It's a niche product meant for d
  • From TFA (Score:5, Funny)

    by obender (546976) on Saturday February 10, 2007 @08:17PM (#17967240)

    The target should be 1000s of cores per chip
    640 cores should be enough for anyone.
  • In case anyone has missed it: http://video.google.com/videoplay?docid=-5830318882717959520 [google.com]

    I can't wait for the sequel!
  • by RAMMS+EIN (578166) on Saturday February 10, 2007 @08:25PM (#17967312) Homepage Journal
    I think parallelism can be achieved elegantly using languages that express what is to be done, rather than how it is to be done. Functional programming is a major step in the right direction. Not only do functional programs typically more clearly express what is to be done (as opposed to which steps are to be taken to get there), they also tend to cause fewer side effects (which restrict the correct evaluation orders). In particular, not using variables avoids many of the headaches traditionally involved in multithreading.
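    To make the point about side effects concrete, here is a minimal Python sketch (an illustration under stated assumptions, not code from the report). The names `square` and `parallel_map` are hypothetical. Because `square` touches no shared variables, its calls can be evaluated in any order or on any worker, and the result matches the sequential version. Note that in standard CPython, threads will not speed up CPU-bound work like this; the sketch only shows that evaluation order stops mattering.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Pure function: the result depends only on the argument."""
    return x * x


def parallel_map(fn, items, workers: int = 4):
    """Apply fn to every item using a pool of worker threads.

    Safe only because fn has no side effects; results come back
    in input order regardless of which worker ran which call.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

    Swapping `square` for a function that appends to a shared list would make the outcome depend on scheduling, which is exactly the headache the comment describes.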
    • Re: (Score:3, Insightful)

      by ardor (673957)
      Functional languages are no silver bullet, however. Things like I/O do not fit well in there. Yes, there are solutions for this, but they tend to be overly complicated. A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.
      • by MajroMax (112652)

        A hybrid functional/imperative language with safeguards for side-effects of the imperative parts seems to be the way to go.

        You just described Common Lisp.

    • by Dorceon (928997)
      Church devised the Lambda Calculus and was largely ignored. Then Turing devised the Turing Machine. Once everyone accepted the Turing Machine, Turing proved that it was equivalent to the Lambda Calculus. In other words, functional programming wasn't a step in the right direction--procedural programming was a step in the wrong direction!
  • by Animats (122034) on Saturday February 10, 2007 @08:59PM (#17967546) Homepage

    I just heard that talk; he gave it at EE380 at Stanford a few weeks ago.

    First, this is a supercomputer guy talking. He's talking about number-crunching. His "13 dwarfs" are mostly number-crunching inner loops. Second, what he's really pushing is getting everybody in academia to do research his way - on FPGA-based rackmount emulators.

    Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputers [top500.org] before you find the first real commercial customer. It's BMW, and the system is a cluster of 1024 Intel x86 1U servers, running Red Hat Linux. Nothing exotic; just a big server farm set up for computation.

    More CPUs will help in server farms, but there we're I/O bound to the outside world, not talking much to neighboring CPUs. If you have hundreds of CPUs on a chip, how do you get data in and out? But we know the answer to that - put 100Gb/s Ethernet controllers on the chip. No major software changes needed.

    This brings up one of the other major architectural truths: shared memory multiprocessors are useful, and clusters are useful. Everything in between is a huge pain. Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble. The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

    What we do get from the latest rounds of shrinkage are better mobile devices. The big wins commercially are in phones, not desktops or laptops. Desktops have been mostly empty space inside for years now. In fact, that's true of most non-mobile consumer electronics. We're getting lower cost and smaller size, rather than more power.

    Consider cars. For the first half of the 20th century, the big thing was making engines more powerful. By the 1960s, engine power was a solved problem (the 1967 turbine-powered Indy car finally settled that issue), and cars really haven't become significantly more powerful since then. (Brakes and suspensions, though, are far better.)

    It will be very interesting to see what happens with the Cell. That's the first non-shared memory multiprocessor to be produced in volume. If it turns out to be a dead end, like the Itanium, it may kill off interest in that sort of thing for years.

    There are some interesting potential applications for massive parallelism for vision and robotics applications. I expect to see interesting work in that direction. The more successful vision algorithms do much computation, most of which is discarded. That's a proper application for many-CPU machines, though not the Cell, unless it gets more memory per CPU. Tomorrow's robots may have a thousand CPUs. Tomorrow's laptops, probably not.

    • Re: (Score:3, Interesting)

      by deadline (14171)

      Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputer before you find the first real commercial customer.

      You may want to adjust your truth, as your measure of the market is wrong. The Top500 is not a marketing survey, and just because you have HPC hardware doesn't mean you run out and try to get it on the Top500. Many companies are using (HPC) parallel cluster computers, but they choose to be quiet about it for competitive reason

      • Re: (Score:3, Insightful)

        by antifoidulus (807088)
        Also keep in mind that many companies aren't interested in Linpack performance per se, at least to the extent that they will spend a lot of time and effort tweaking their computers to get really high Linpack scores, which is all that is important when it comes to the Top500.
      • That report on supercomputing doesn't indicate growth:

        "Clusters have proven themselves as capable servers to handle a sizable portion of the HPC workload."

        "More than half of the respondents expect their budgets for all HPC tools will decline (43%) or remain the same (17%) over the next two years."

        "U.S. industrial users/buyers really want and need faster computers that fit their budgets and that don't require specialized programming skills."

        Also, that study doesn't reflect "most companies". It refl

    • by RecessionCone (1062552) on Saturday February 10, 2007 @10:05PM (#17967986)
      I don't think you were listening very carefully to the talk (or know much about Computer Architecture) if you think Dave Patterson is a supercomputer guy. Perhaps you've heard of the Hennessy & Patterson Quantitative Approach to Computer Architecture book (you know, the one used at basically every university to teach about computer architecture). Patterson has been involved in a lot of different things within computer architecture over the years, including being one of the main people behind RISC and RAID (as well as being the president of the ACM). I saw his talk when it was given at Berkeley, and you really missed the point if you thought it was about supercomputing. The talk was about the future of computing in general, which is increasingly parallel, in case you're unaware of that fact. GPUs are already at 128 cores, Network processors are up to 200 cores. Intel is going to present an 80 core x86 test chip tomorrow at ISSCC. Physics won't support faster single core processors at the rate we're accustomed to, so the whole industry is going parallel, which is a sea change in the industry. Patterson's talk is aimed at the research community, since we don't have good answers as to how these very parallel systems should be architected and programmed. FPGA emulation is a great way to play around with massive multiprocessor configurations and programming strategies, which is why Patterson is advocating it (his RAMP project has researchers from MIT, Berkeley, Stanford, Texas, Washington involved (among others)). You also need to have a little more imagination about what we could do with more computing power. Try looking at Intel's presentations on RMS http://www.intel.com/technology/itj/2005/volume09issue02/foreword.htm [intel.com].
    • The author of this paper thinks that all the programming headaches of supercomputers will have to be brought down to desktop level, but that's probably not going to happen. What problem would it solve?

      It would solve the problem of software actually being able to run faster on computers that will be produced in the near future. If the status quo continues, desktop software simply won't get any faster, even when computers get faster; this seems like a real waste.
    • Re: (Score:2, Informative)

      by Anonymous Coward
      I posted most of the following comment elsewhere, but this seems to be a better place for it.

      I also saw Dave Patterson give this talk at Stanford. RecessionCone is correct, and Dave did mention this at the Stanford talk: the whole world is going parallel, not just the supercomputer space, because chip makers can't figure out how to make chips faster sequentially. Chip makers are throwing a hail Mary pass (Dave's metaphor) by putting parallel chips out there and praying that someone can figure out how to exp
  • by deadline (14171) on Saturday February 10, 2007 @09:14PM (#17967630) Homepage

    Those of us that use HPC clusters (i.e. Beowulf) have been thinking about these issues as well. For those interested, I wrote a series of articles on how one might program 10,000 cores (based on my frustrations as a programmer and user of parallel computers). Things will change, there is no doubt.

    The first in the series is called Cluster Programming: You Can't Always Get What You Want [clustermonkey.net]. The next two are Cluster Programming: The Ignorance is Bliss Approach [clustermonkey.net], and Cluster Programming: Explicit Implications of Cluster Computing [clustermonkey.net].

    Comments welcome.

  • AKA F--, The simplest explicit programming model on the planet. Brainchild of Bob Numrich, unsung hero of Cray Research in the early 90's ( & probably much before... but that was when I was lucky enough to work with him) F-- was Numrich's second great contribution to parallel programming models... the first being the shmem model for the Cray T3D, Four assembly routines which made the raw capabilities of the T3D available to massively parallel applications when every other programming model (e.g. MPI) h
  • Actually, I've been working on a programming language/model that makes programs inherently parallel. Of course, it is quite different from anything currently in existence. Basically, it uses a queue (hence the name "Que") to store data (like the stack in FORTH), but due to the nature of the queue, programs become inherently parallel. Large programs could have hundreds of processes running at the same time, if so inclined.

    If you are interested, check out my project [sourcefourge.net] (there's not much there right now), and/or

  • For those that are interested, the Berkeley View project website is at http://view.eecs.berkeley.edu/ [berkeley.edu], which includes some video interviews with the principal professors involved in the project. There is also a blog at http://view.eecs.berkeley.edu/blog/ [berkeley.edu]
  • by zestyping (928433) on Saturday February 10, 2007 @09:59PM (#17967950) Homepage
    Reliably achieving even simple goals using concurrent threads that share state is extremely difficult. For example, try this task:

    Implement the Observer [wikipedia.org] (aka Listener) pattern (specifically the thing called "Subject" on the Wikipedia page). Your object should provide two methods, publish and subscribe. Clients can call subscribe to indicate their interest in being notified. When a client calls publish with a value, your object should pass on that value by calling the notify method on everyone who has previously subscribed for updates.

    Sounds simple, right? But wait:
    • What if one of your subscribers throws an exception? That should not prevent other subscribers from being notified.
    • What if notifying a subscriber triggers another value to be published? All the subscribers must be kept up to date on the latest published value.
    • What if notifying a subscriber triggers another subscription? Whether or not the newly added subscriber receives this in-progress notification is up to you, but it must be well defined and predictable.
    • Oh, and by the way, don't deadlock.
    Can you achieve all these things in a multithreaded programming model (e.g. Java)? Try it. Don't feel bad if you can't; it's fiendishly complicated to get right, and I doubt I could do it.

    Or, download this paper [erights.org] and start reading from section 3, "The Sequential StatusHolder."

    Once you see how hard it is to do something this simple, now think about the complexity of what people regularly try to achieve in multithreaded systems, and that pretty much explains why computer programs freeze up so often.
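    For a feel of the requirements listed above, here is a minimal single-threaded Python sketch. It is an illustration under stated assumptions, not the implementation from the linked paper, and the `Subject` class and its `errors` list are hypothetical. It queues notifications instead of calling subscribers in a nested fashion, so a publish triggered from inside a notification is delivered after the one in progress, and every subscriber sees values in publish order. It deliberately ignores threads: making it safe when several threads call `publish` at once is the hard part the comment is about.

```python
from collections import deque


class Subject:
    """Observer 'Subject' that delivers notifications from a queue."""

    def __init__(self):
        self._subscribers = []
        self._pending = deque()   # (subscriber, value) pairs awaiting delivery
        self._draining = False    # True while the delivery loop is running
        self.errors = []          # exceptions raised by subscribers

    def subscribe(self, subscriber):
        # A subscriber added during delivery only sees values published later,
        # because publish() snapshots the list at publish time.
        self._subscribers.append(subscriber)

    def publish(self, value):
        for subscriber in list(self._subscribers):
            self._pending.append((subscriber, value))
        self._drain()

    def _drain(self):
        if self._draining:
            return  # re-entrant publish: the outer loop will deliver it
        self._draining = True
        try:
            while self._pending:
                subscriber, value = self._pending.popleft()
                try:
                    subscriber.notify(value)
                except Exception as exc:
                    # One failing subscriber must not block the others.
                    self.errors.append(exc)
        finally:
            self._draining = False
```

    With naive nested calls, a subscriber that republishes would cause later subscribers to see the newer value before the older one; the queue avoids that ordering problem at the cost of delaying delivery.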
    • Why does notify() allow for throwing exceptions? The only exception I can see that it throws is IllegalMonitorStateException, and that seems easy enough to fix. I vaguely recall something about runtime exceptions or whatnot always being throwable, but really, there's nothing the Observer can do about such things except carry on with the rest of notifications. This is bad for debugging, so one solution would be to hold the exceptions until all notifications have been done and then deal with them all at once. Simp
    • by master_p (608214) on Sunday February 11, 2007 @07:45AM (#17971256)
      There is a way to automate shared state concurrency! Every object should be its own thread. Computations that refer to the same object must be executed by the object's thread.

      Here is how it works:

      A computation does not return a result, but a tuple of {key, continuation}. The key is used to locate the thread to pass the continuation to. The computation is stored in the thread's queue and the thread is woken up.

      The {key, continuation} pair can be a 64-bit value (on 32-bit machines) that consists of a pointer to a memory location (the key) and a pointer to code (the continuation).

      The insertion to the thread's queue can be done using lock-free data structures.

      Threads can be user-level so there need not be a switch to kernel space.

      This design can allow for linear scaling of performance: the more cores you put in, the more performance you get (for algorithms that are not linear, that is). Linear algorithms would execute a little slower than usual, but the trade-off is acceptable: for many applications that allow for parallelization due to having lots of (relatively) independent objects, the performance boost would be tremendous.

      There are many domains of applications that would benefit from such an approach:

      -web servers/application servers that must serve thousands of clients simultaneously.
      -video games with thousands of objects.
      -simulations that have many independent agents that can run in parallel.
      -GUI apps that use the observer pattern and each observable has many observers that can be notified in parallel.

      Note: The above ideas are taken from libasync-mp and lock-free data structure programming.
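      The scheme above can be sketched roughly in Python. This is a simplified illustration with several assumptions: `ObjectThread` is a hypothetical name, the "key" is the target object itself rather than a raw pointer, it uses one OS thread per object rather than user-level threads, and `queue.Queue` uses locks internally, so it is not the lock-free queue the comment proposes. A continuation is a callable that runs on the owning object's thread and may return a `(target, continuation)` pair to hand the rest of the work to another object.

```python
import queue
import threading


class ObjectThread:
    """An object whose state is only ever touched by its own thread."""

    def __init__(self, state):
        self.state = state
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, continuation):
        """Enqueue a computation to run on this object's thread."""
        self._inbox.put(continuation)

    def _run(self):
        while True:
            continuation = self._inbox.get()
            if continuation is None:      # sentinel: shut down
                break
            result = continuation(self)
            if result is not None:
                # The computation returned {key, continuation}:
                # forward the continuation to the object it names.
                target, next_continuation = result
                target.send(next_continuation)

    def stop(self):
        self._inbox.put(None)
        self._thread.join()
```

      For example, a transfer between two accounts becomes a withdrawal that runs on the first object's thread and returns a deposit continuation addressed to the second, so neither balance is ever modified by two threads at once.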

  • As parallelism ramps up, the processor cores will get simpler and we will return to the concept of RISC computing. But the programming paradigm will be virtual processors. Furthermore, some processors will be better at certain operations than others.

    What will eventually turn into an AI subsystem, will postcompile and dynamically rewrite programs to take best advantage of the system. By AI I don't mean anything sentient, just something that intelligently recognizes programming structures and patterns and rest
  • by kwahoo (899178) on Sunday February 11, 2007 @12:12AM (#17968902)

    ...if there were, the language wars of the 80s and 90s would have produced an answer. And what new language caught on? Not Sisal, or C*, or Multilisp, etc. It was Java. And C, C++, and Fortran are still going strong.

    Part of the problem, as previous posts have observed, is that most people didn't have much incentive to change, since parallel systems were expensive, and bloated, inefficient code would inevitably get faster thanks to the rapid improvement in single-thread performance that we enjoyed until recently. So outside of HPC and cluster apps, most parallelism consisted of decoupling obviously asynchronous tasks.

    I don't think there ever will be one language to rule them all.... The right programming model is too dependent on the application, and unless you are designing a domain-specific system, you will never get people to agree. Depending on your needs, you want different language features and you make different tradeoffs on performance vs. programmability. For some applications, functional programming languages will be perfect, for others Co-Array Fortran will be, for others an OO derivative like Mentat will be, etc. And as new applications come to the fore, new languages will continue to spawn.

    I think the key is to:

    • Do your best to estimate what range of applications your chip will need to support (so you could imagine that chips for desktops/workstations might diverge from those for e-commerce datacenters--which we already see to a mild extent)
    • Deduce what range of programming language features you need to support
    • Do your best to design an architecture that is flexible enough to support that, and hopefully not close off future ideas that would make programming easier.

    If one programming model does triumph, I would predict that it will be APIs that can be used equally well from C, Fortran, Java, etc., thus allowing different application domains to use their preferred APIs. And even that model is probably not compelling enough to bring them all and in the dark bind them....

  • by soldack (48581) <soldacker@nOspaM.yahoo.com> on Sunday February 11, 2007 @01:02AM (#17969224) Homepage
    I used to work for SilverStorm (recently purchased by QLogic). They make InfiniBand switches and software for use in high performance computing and enterprise database clustering. The quality of the I/O subsystem of a cluster played a large part in determining the performance of a cluster. Latency (down to the microsecond) and bandwidth (over 10 gigabits per second) both mattered.

    Also, we found that sometimes, what made a deal go through was how well your proposed system could run some preexisting software. For example, vendors would publish how well they could run a standard crash test simulation.

    Also, I would like to see more research put into making clustered operating systems like mosix good enough so that developers can stick to what they have learned on traditional SMP systems and have their code just work on large clusters. I don't think that multicore processors eliminate the need for better cluster software.
  • Use a Database (Score:2, Insightful)

    by Tablizer (95088)
    Databases already allow a kind of parallel processing. A.C.I.D.-based techniques allow multiple users (processors) to send results to the same database in order to communicate results between each user/client. Each "client" may be single threaded, but together a client/server system is essentially a multi-threaded application, all without odd code or odd programming languages.
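    As a rough illustration of this idea, here is a small Python sketch using the standard-library `sqlite3` module. The table name `results` and the functions `worker` and `run` are hypothetical, and SQLite stands in for a real client/server database, which is an assumption made only to keep the example self-contained. Each worker is ordinary single-threaded code with its own connection; the database's transactions do the coordination.

```python
import os
import sqlite3
import tempfile
import threading


def worker(db_path: str, worker_id: int, n: int) -> None:
    """A single-threaded 'client' that records its results in the database."""
    conn = sqlite3.connect(db_path, timeout=30)  # wait if another writer holds the lock
    try:
        for i in range(n):
            with conn:  # one transaction: commit on success, roll back on error
                conn.execute(
                    "INSERT INTO results (worker, value) VALUES (?, ?)",
                    (worker_id, i * i),
                )
    finally:
        conn.close()


def run(num_workers: int = 4, n: int = 25) -> int:
    """Start the workers, wait for them, and return the number of rows stored."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE results (worker INTEGER, value INTEGER)")
        conn.commit()
        conn.close()

        threads = [
            threading.Thread(target=worker, args=(db_path, w, n))
            for w in range(num_workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        conn = sqlite3.connect(db_path)
        (count,) = conn.execute("SELECT COUNT(*) FROM results").fetchone()
        conn.close()
        return count
    finally:
        os.remove(db_path)
```

    The trade-off is that every shared result goes through the database, so this suits coarse-grained coordination rather than tight inner loops.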

