An Overview of Parallelism
Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley": "Generally, they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less as it has been in previous years (though languages like Erlang can probably help to change this)." Read on for Mortimer.CA's summary from the paper of some "conventional wisdoms" and their replacements.
Old and new conventional wisdoms:
- Old CW: Power is free, but transistors are expensive.
- New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.
- Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
- New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.
- Old CW: Multiply is slow, but load and store is fast.
- New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.
- Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
- New CW: It will be a very long wait for a faster sequential computer (see above).
nothing new here... (Score:3, Informative)
http://vlsi.cs.berkeley.edu/cs252-s06/images/1/1b
and it's pretty much identical (check out slide 3 on the first page of the pdf)
Re:Erlang (Score:3, Informative)
Haskell comes pretty close, and it was designed from the beginning to be pure.
In fact, it may be these immutable data structures that let the pure functional languages perform well on multiple processors where sequential languages with mutable data structures fail miserably. If you don't have to worry about a function's side effects and no specific execution order is guaranteed, you can run it on any processor you want, whenever you want, as long as you don't hold up the things that depend on it for too long.
However, pure functional languages need to be made more palatable to your average Joe Programmer. (Or computer scientists need to get more mathematical. Or both.) We need the Python and Ruby of pure functional languages: languages in which the syntax design is treated as a user-interface problem.
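The order-independence argument above can be sketched in a few lines. If a function is pure, mapping it over inputs on a pool of workers, in whatever order the scheduler picks, must give the same results as a sequential loop. This is a minimal illustration using Python's standard concurrent.futures; the function `pure_step` is a made-up example, not from any library discussed here.

```python
from concurrent.futures import ThreadPoolExecutor

def pure_step(x: int) -> int:
    # A pure function: the result depends only on the argument,
    # so it can run on any worker, in any order.
    return x * x + 1

inputs = list(range(100))

# Sequential evaluation.
sequential = [pure_step(x) for x in inputs]

# Concurrent evaluation: scheduling order is up to the pool,
# but purity guarantees the gathered results are identical.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent = list(pool.map(pure_step, inputs))

assert concurrent == sequential
```

With a side-effecting function (say, one that appends to a shared list), the concurrent version would need locks or a fixed ordering; purity is what removes that requirement.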
Re:It's a supercomputer perspective (Score:2, Informative)
I also saw Dave Patterson give this talk at Stanford. RecessionCone is correct, and Dave did mention this at the Stanford talk: the whole world is going parallel, not just the supercomputer space, because chip makers can't figure out how to make chips faster sequentially. Chip makers are throwing a hail Mary pass (Dave's metaphor) by putting parallel chips out there and praying that someone can figure out how to exploit them effectively.
There was an interesting exchange with an audience member who used to work on the Transputer: the audience member pointed out that many of these predictions had been made more than a decade ago, but they never came to pass. Dave basically said that this was because Intel managed to squeeze more sequential performance out of their chips, and that killed parallel chip projects like the Transputer. Nobody really wants parallelism; it's hard. Chip makers are going for it now simply because they have no choice, and nobody has any idea whether we'll be able to develop the software techniques needed to exploit massively parallel processors effectively.
Another audience member asked what would happen if someone figured out a way to squeeze more sequential performance out of chips. Dave smiled and said he would be very interested in talking to that person.
already been done (20 years ago) (Score:2, Informative)
Looks like if you don't have ACM digital library access and don't feel like trudging to a library, you can find a copy here: http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf [stanford.edu]
For a group of characters (a substring) in the middle of the file, you can locally build a table that maps whatever the incoming parser state is (at the beginning of the substring) to the corresponding outgoing state (at the end), and then that table lets you process the whole substring in unit time. I like to think of it as the parsing equivalent of a carry-lookahead adder. Probably best to read the article if you're curious.
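The state-mapping trick above can be sketched concretely. Below, a toy scanner (my own example, not from the paper) tracks whether it is inside or outside a double-quoted string, with hypothetical states 0 and 1. Each substring is summarized as a table mapping every possible incoming state to the outgoing state; composing those tables is associative, so the summaries can be built for each chunk independently (in parallel) and combined afterward, just as a carry-lookahead adder combines per-block carry behavior.

```python
# Toy parser state: 0 = outside a double-quoted string, 1 = inside.
STATES = (0, 1)

def step(state: int, ch: str) -> int:
    # A '"' toggles between inside/outside; any other character keeps the state.
    return 1 - state if ch == '"' else state

def summarize(substring: str) -> dict:
    # Build the table: for each possible incoming state, what outgoing
    # state results after scanning this substring?
    table = {}
    for s in STATES:
        state = s
        for ch in substring:
            state = step(state, ch)
        table[s] = state
    return table

def compose(t1: dict, t2: dict) -> dict:
    # Feed t1's output into t2: the table for the concatenated text.
    return {s: t2[t1[s]] for s in STATES}

text = 'abc"def"ghi"jk'
# Split into chunks; each summary could be built on a different processor.
chunks = [text[i:i + 4] for i in range(0, len(text), 4)]
tables = [summarize(c) for c in chunks]

combined = tables[0]
for t in tables[1:]:
    combined = compose(combined, t)

# Sequential scan for comparison.
state = 0
for ch in text:
    state = step(state, ch)

assert combined[0] == state
```

Because `compose` is associative, the combining step itself can be done as a parallel reduction (a tree of `compose` calls), which is what makes the whole-file scan scale rather than serialize on the state.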