
An Overview of Parallelism

Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley": "Generally they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less as it has in previous years (though languages like Erlang can probably help to change this)." Read on for Mortimer.CA's summary from the paper of some "conventional wisdoms" and their replacements.

Old and new conventional wisdoms:
  • Old CW: Power is free, but transistors are expensive.
  • New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.

  • Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
  • New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.

  • Old CW: Multiply is slow, but load and store is fast.
  • New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.

  • Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
  • New CW: It will be a very long wait for a faster sequential computer (see above).

Comments Filter:
  • nothing new here... (Score:3, Informative)

    by Anonymous Coward on Saturday February 10, 2007 @08:08PM (#17967166)
    pretty much the same thing Dave Patterson's been saying for a while now...in fact, the CW sounded so familiar, I went back to double check his lecture slides from more than a year ago:

    http://vlsi.cs.berkeley.edu/cs252-s06/images/1/1b/Cs252s06-lec01-intro.pdf [berkeley.edu]

    and it's pretty much identical (check out slide 3 on the first page of the pdf)
  • by RAMMS+EIN ( 578166 ) on Saturday February 10, 2007 @08:25PM (#17967312) Homepage Journal
    I think parallelism can be achieved elegantly using languages that express what is to be done, rather than how it is to be done. Functional programming is a major step in the right direction. Not only do functional programs typically more clearly express what is to be done (as opposed to which steps are to be taken to get there), they also tend to cause fewer side effects (which restrict the correct evaluation orders). In particular, not using variables avoids many of the headaches traditionally involved in multithreading.
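
    For instance, a rough sketch in Haskell (assuming GHC with the parallel package, built with -threaded and run with +RTS -N; countDivisors is just a stand-in for any pure, expensive function). Because it has no side effects, parMap is free to evaluate the calls in any order, on any core:

        import Control.Parallel.Strategies (parMap, rdeepseq)

        -- A pure function: no side effects, so every call is independent
        -- of every other and can be evaluated whenever, wherever.
        countDivisors :: Int -> Int
        countDivisors n = length [d | d <- [1 .. n], n `mod` d == 0]

        main :: IO ()
        main = print (sum (parMap rdeepseq countDivisors [1 .. 20000]))

    The program says nothing about threads or scheduling; the runtime decides how to spread the work.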
  • by epiphyte(3) ( 771115 ) on Saturday February 10, 2007 @09:21PM (#17967668)
    AKA F--, the simplest explicit programming model on the planet. Brainchild of Bob Numrich, unsung hero of Cray Research in the early 90's (& probably much before... but that was when I was lucky enough to work with him). F-- was Numrich's second great contribution to parallel programming models, the first being the shmem model for the Cray T3D: four assembly routines which made the raw capabilities of the T3D available to massively parallel applications when every other programming model (e.g. MPI) had about 50x the communication overhead. This was a big factor in Cray's takeover of the short-lived MPP market in the mid 90's.

    On the topic of the thread: explicit programming models scale to thousands of processors; implicit ones peter out at 4-8. The reason is data locality. Explicit models ensure that the processor is operating on data which is local and unshared. Implicit models end up fighting for operands with competing processors, which means either heroic hardware (e.g. 70% of the Cray C-90's processor logic was concerned with memory contention resolution) or a dramatic performance dropoff.
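
    To make the explicit/implicit distinction concrete, here is a little Haskell sketch (the names and the four-worker split are purely illustrative, and it needs GHC's -threaded runtime to actually run in parallel). The data is decomposed up front and each worker owns its chunk and its own result cell, so nothing is contended for until the final combine:

        import Control.Concurrent (forkIO)
        import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
        import Data.List (foldl')

        -- Explicit data decomposition: split the input into one chunk per worker.
        chunk :: Int -> [a] -> [[a]]
        chunk n xs = go xs
          where
            size = max 1 ((length xs + n - 1) `div` n)
            go [] = []
            go ys = let (h, t) = splitAt size ys in h : go t

        parSum :: Int -> [Double] -> IO Double
        parSum nWorkers xs = do
            boxes    <- mapM spawn (chunk nWorkers xs)   -- fork one worker per chunk
            partials <- mapM takeMVar boxes              -- wait for every partial sum
            return (sum partials)                        -- combine only at the end
          where
            -- A worker's chunk and result cell are its own; nothing is shared
            -- while the partial sums are being computed.
            spawn c = do
                box <- newEmptyMVar
                _   <- forkIO (putMVar box $! foldl' (+) 0 c)
                return box

        main :: IO ()
        main = parSum 4 [1 .. 1000000] >>= print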
  • Berkeley View Links (Score:2, Informative)

    by RecessionCone ( 1062552 ) on Saturday February 10, 2007 @09:39PM (#17967778)
    For those that are interested, the Berkeley View project website is at http://view.eecs.berkeley.edu/ [berkeley.edu], which includes some video interviews with the principal professors involved in the project. There is also a blog at http://view.eecs.berkeley.edu/blog/ [berkeley.edu]
  • Re:Erlang (Score:3, Informative)

    by grammar fascist ( 239789 ) on Saturday February 10, 2007 @09:45PM (#17967834) Homepage

    Erlang also has a huge amount of overhead, and due to immutable data structures, has to spend a lot of time copying data around.

    This is the problem inherent with pure languages. Compilers/runtime systems are simply not sophisticated enough yet to reason about and perform as well as a human can with mutable data structures.

    Haskell comes pretty close, and it's designed from the beginning to be pure.

    In fact, it may be these immutable data structures that make the pure functional languages able to perform well on multiple processors where sequential languages with mutable data structures fail miserably. If you don't have to worry about a function's side effects and no specific execution order is guaranteed, you can run it on any processor you want, whenever you want, as long as you don't hold up the things that depend on it too much.

    However, pure functional languages need to be made more palatable to your average Joe Programmer. (Or computer scientists need to get more mathematical. Or both.) We need the Python and Ruby of pure functional languages: languages in which the syntax design is regarded as a user-interface problem.
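
    To make the earlier point about evaluation order concrete, here is a minimal sketch using par and pseq from GHC's Control.Parallel (fib is just a stand-in for any pure, expensive computation; build with -threaded and run with +RTS -N):

        import Control.Parallel (par, pseq)

        -- A pure, side-effect-free computation: any call can be evaluated
        -- whenever, wherever, and in whatever order the runtime likes.
        fib :: Integer -> Integer
        fib n | n < 2     = n
              | otherwise = fib (n - 1) + fib (n - 2)

        main :: IO ()
        main = a `par` (b `pseq` print (a + b))   -- spark a elsewhere, evaluate b here, combine
          where
            a = fib 30
            b = fib 31

    Nothing pins down which of a or b is computed first; because neither has side effects, either order (or both at once) gives the same answer.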
  • by RecessionCone ( 1062552 ) on Saturday February 10, 2007 @10:05PM (#17967986)
    I don't think you were listening very carefully to the talk (or know much about Computer Architecture) if you think Dave Patterson is a supercomputer guy. Perhaps you've heard of the Hennessy & Patterson Quantitative Approach to Computer Architecture book (you know, the one used at basically every university to teach about computer architecture). Patterson has been involved in a lot of different things within computer architecture over the years, including being one of the main people behind RISC and RAID (as well as being the president of the ACM).

    I saw his talk when it was given at Berkeley, and you really missed the point if you thought it was about supercomputing. The talk was about the future of computing in general, which is increasingly parallel, in case you're unaware of that fact. GPUs are already at 128 cores, Network processors are up to 200 cores. Intel is going to present an 80 core x86 test chip tomorrow at ISSCC. Physics won't support faster single core processors at the rate we're accustomed to, so the whole industry is going parallel, which is a sea change in the industry.

    Patterson's talk is aimed at the research community, since we don't have good answers as to how these very parallel systems should be architected and programmed. FPGA emulation is a great way to play around with massive multiprocessor configurations and programming strategies, which is why Patterson is advocating it (his RAMP project has researchers from MIT, Berkeley, Stanford, Texas, Washington involved (among others)).

    You also need to have a little more imagination about what we could do with more computing power. Try looking at Intel's presentations on RMS: http://www.intel.com/technology/itj/2005/volume09issue02/foreword.htm [intel.com].
  • Re:Erlang (Score:5, Informative)

    by cpm80 ( 899906 ) on Sunday February 11, 2007 @12:19AM (#17968946) Homepage
    I think using a *specific* language for automatic parallelization is the wrong way. Some GNU folks are working on language-independent autoparallelization for GCC 4.3. Their implementation is an extension to the OpenMP implementation in GCC. See "OpenMP and automatic parallelization in GCC", D. Novillo, GCC Developers' Summit, Ottawa, Canada, June 2006, http://people.redhat.com/dnovillo/Papers/ [redhat.com] for details.
  • by Anonymous Coward on Sunday February 11, 2007 @04:32AM (#17970422)
    I posted most of the following comment elsewhere, but this seems to be a better place for it.

    I also saw Dave Patterson give this talk at Stanford. RecessionCone is correct, and Dave did mention this at the Stanford talk: the whole world is going parallel, not just the supercomputer space, because chip makers can't figure out how to make chips faster sequentially. Chip makers are throwing a hail Mary pass (Dave's metaphor) by putting parallel chips out there and praying that someone can figure out how to exploit them effectively.

    There was an interesting exchange with an audience member who used to work on the Transputer: the audience member pointed out that many of these predictions had been made more than a decade ago, but they never came to pass. Dave basically said that it was because Intel managed to squeeze more sequential performance out of their chips, and that killed parallel chip projects like the Transputer. Nobody really wants parallelism. It's hard. Chip makers are going for it now simply because they have no choice; nobody has any idea whether we'll be able to develop the software techniques needed to exploit massively parallel processors effectively.

    Another audience member asked what would happen if someone figured out a way to squeeze more sequential performance out of chips. Dave smiled and said he would be very interested in talking to that person.
  • by shizzle ( 686334 ) on Monday February 12, 2007 @04:38AM (#17980330)
    Starting to feel like a curmudgeon with posts like this, but it's already been done a long time ago. Hillis and Steele presented an algorithm for parsing in parallel in logarithmic time on the Connection Machine. The article is called "Data Parallel Algorithms", Communications of the ACM, December 1986.

    Looks like if you don't have ACM digital library access and don't feel like trudging to a library you can find a copy here: http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf [stanford.edu]

    For a group of characters (substring) in the middle of the file, you can locally build a table that maps whatever the incoming parser state is (at the beginning of the substring) to what the corresponding outgoing state would be at the end, and then that table lets you process the whole substring in unit time. I like to think of it as the parsing equivalent of a carry-lookahead adder. Probably best to read the article if you're curious.
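
    If it helps, here is a toy sketch of the table idea in Haskell (not Hillis and Steele's algorithm itself, just the table-composition trick, with a two-state "am I inside a quoted string" machine standing in for a real parser; it assumes GHC with the parallel package and the -threaded runtime):

        import Control.Parallel.Strategies (evalTuple2, parMap, rseq)
        import Data.List (foldl')

        -- Toy two-state "parser": are we inside a double-quoted string?
        data S = Outside | Inside deriving (Eq, Show)

        -- A transition table: the exit state for each possible incoming state.
        type Table = (S, S)   -- (exit if we enter Outside, exit if we enter Inside)

        step :: Char -> Table
        step '"' = (Inside, Outside)   -- a quote flips the state
        step _   = (Outside, Inside)   -- anything else leaves it alone

        apply :: Table -> S -> S
        apply (o, _) Outside = o
        apply (_, i) Inside  = i

        -- Composing tables is associative, so each chunk of the input can be
        -- summarized independently and the summaries combined afterwards:
        -- the carry-lookahead trick applied to parser state.
        compose :: Table -> Table -> Table
        compose t1 t2 = (apply t2 (apply t1 Outside), apply t2 (apply t1 Inside))

        identityT :: Table
        identityT = (Outside, Inside)

        summarize :: String -> Table
        summarize = foldl' (\acc c -> compose acc (step c)) identityT

        -- Summarize fixed-size chunks in parallel, then fold the per-chunk tables.
        finalState :: Int -> String -> S
        finalState chunkSize s = apply (foldl' compose identityT tables) Outside
          where
            tables = parMap (evalTuple2 rseq rseq) summarize (chunksOf chunkSize s)
            chunksOf _ [] = []
            chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

        main :: IO ()
        main = print (finalState 8 "an \"example\" input")

    A real scanner or parser just has a bigger state set and bigger tables; the combining step is the same.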
