An Overview of Parallelism

Mortimer.CA writes with a recently released report from Berkeley entitled "The Landscape of Parallel Computing Research: A View from Berkeley": "Generally, they conclude that the 'evolutionary approach to parallel hardware and software may work from 2- or 8-processor systems, but is likely to face diminishing returns as 16 and 32 processor systems are realized, just as returns fell with greater instruction-level parallelism.' This assumes things stay 'evolutionary' and that programming stays more or less the way it has been done in previous years (though languages like Erlang could help change this)." Read on for Mortimer.CA's summary of some of the paper's "conventional wisdoms" and their replacements.

Old and new conventional wisdoms:
  • Old CW: Power is free, but transistors are expensive.
  • New CW is the "Power wall": Power is expensive, but transistors are "free." That is, we can put more transistors on a chip than we have the power to turn on.

  • Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins.
  • New CW: As chips drop below 65-nm feature sizes, they will have high soft and hard error rates.

  • Old CW: Multiply is slow, but load and store is fast.
  • New CW is the "Memory wall" [Wulf and McKee 1995]: Load and store is slow, but multiply is fast.

  • Old CW: Don't bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer.
  • New CW: It will be a very long wait for a faster sequential computer (see above).
  • Hmmm... (Score:5, Interesting)

    by ardor ( 673957 ) on Saturday February 10, 2007 @08:13PM (#17967216)
    "but is likely to face diminishing returns as 16 and 32 processor systems are realized"

    Then we are doing something wrong. The human brain provides compelling evidence that massive parallelization works. So: what are we missing?
  • Re:It's not hard (Score:4, Interesting)

    by ardor ( 673957 ) on Saturday February 10, 2007 @08:23PM (#17967302)
    Well, the nondeterministic nature of multithreading is still a problem. With one thread, debugging is simple: the only thread present gets stopped. But with multiple threads, how is the debugger supposed to handle it? Stop all threads? Only the current one? Etc. This matters when debugging race conditions.

    Also, the second great problem is that threading problems are hard to find. When I write a class hierarchy, an OOP language can help me see design errors (for example, unnecessary multiple inheritance) or lapses in const-correctness. Threading, however, is visible only as mutexes, condition variables, etc.

    One other issue with threads is that they effectively modify the execution sequence. A traditional single-threaded program has an execution sequence that looks like one long line. Threading introduces branches and joins, turning that simple line into a net. Obviously, this complicates things. Petri nets can be useful for modeling it.
  • by Anonymous Coward on Saturday February 10, 2007 @08:39PM (#17967410)
    Most of the time a computer isn't churning away on a single problem that needs to be parallelized. In that respect, the solution rests more with the operating system. Something like a server could relatively easily make use of almost any number of processors, perhaps one per client (a sketch of that idea follows this comment). The trouble comes with bus contention. It is a problem similar to designing a subnet: if you have too many computers/processors, communications grind to a halt. In other words, adding processors slows things down rather than speeding them up.

    The solution to adding more processors may be architectural. Adding more buses, a la the Harvard architecture, might be an effective approach. This is the approach used in DSPs, which are a lot more powerful on a per-cycle basis than more conventional architectures.
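    A minimal Java sketch of the one-thread-per-client idea mentioned above (the port and echo behavior are made up for illustration): the listener hands each accepted connection to a pool, so independent clients run on however many cores are available.

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;
        import java.io.PrintWriter;
        import java.net.ServerSocket;
        import java.net.Socket;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        // Sketch: one task (effectively one thread) per client. Scales with
        // core count until shared resources (bus, locks, I/O) start to contend.
        public class ThreadPerClientServer {
            public static void main(String[] args) throws IOException {
                ExecutorService pool = Executors.newCachedThreadPool();
                try (ServerSocket server = new ServerSocket(8080)) {
                    while (true) {
                        Socket client = server.accept();
                        pool.submit(() -> handle(client)); // one worker per client
                    }
                }
            }

            private static void handle(Socket client) {
                try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo one line, then disconnect
                } catch (IOException e) {
                    // client went away; closing the streams closes the socket
                }
            }
        }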
  • by deadline ( 14171 ) on Saturday February 10, 2007 @09:14PM (#17967630) Homepage

    Those of us who use HPC clusters (i.e., Beowulf) have been thinking about these issues as well. For those interested, I wrote a series of articles on how one might program 10,000 cores (based on my frustrations as a programmer and user of parallel computers). Things will change, there is no doubt.

    The first in the series is called Cluster Programming: You Can't Always Get What You Want [clustermonkey.net] The next two are Cluster Programming: The Ignorance is Bliss Approach [clustermonkey.net], and Cluster Programming: Explicit Implications of Cluster Computing [clustermonkey.net].

    Comments welcome.

  • Re:Hmmm... (Score:3, Interesting)

    by philipgar ( 595691 ) <pcg2@leTOKYOhigh.edu minus city> on Saturday February 10, 2007 @09:14PM (#17967632) Homepage
    Is this really true? Of course, for some tasks the massive parallelism of the human brain works great. The brain can analyze complex images extremely fast, comparing them in parallel against its own internal database of images and using fuzzy reasoning to detect whether something is familiar. However, give your brain a complex math problem and it can spend seconds, minutes, or even hours solving it, sometimes requiring extra scratch memory. This is due to "bad programming" in the brain, which is poor at certain computations (except in the rare people who manage to do these problems fast).

    This is also true for computers. Big scientific problems can use thousands of processors; just look at some of the problems running on supercomputers. However, not all problems scale up to use thousands of processors effectively. Many desktop applications won't be able to take advantage of the power, but many of those applications don't need it.

    Basically, computers will continue to evolve, with custom hardware or cores that can adapt to the problem. If there's enough demand to solve a problem, it can be done; however, fast software designs will not be cheap, and "hardware" designs (Verilog, VHDL, etc.) will be even more expensive (even if no silicon ever has to be spun for them). Applications with sufficient demand will get done; they're just highly time-intensive. Luckily there will always be outsourcing to help us out.

    Phil
  • by FMota91 ( 1050752 ) on Saturday February 10, 2007 @09:32PM (#17967734)

    Actually, I've been working on a programming language/model that makes programs inherently parallel. Of course, it is quite different from anything currently in existence. Basically, it uses a queue (hence the name "Que") to store data, like the stack in FORTH, but due to the nature of the queue, programs become inherently parallel. Large programs could have hundreds of processes running at the same time, if so inclined.

    If you are interested, check out my project [sourcefourge.net] (there's not much there right now), and/or contact me at FMota91 at GMail dot com.
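    The project page is nearly empty, so here is only a rough Java approximation of the general queue-instead-of-stack idea described above (all names are hypothetical and surely unfaithful to Que's actual semantics): values flow through queues between stages, so independent stages run in parallel automatically.

        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;

        // Sketch of queue-based dataflow: each stage ("word") takes values
        // from an input queue and puts results on an output queue, so the
        // stages run concurrently on separate threads.
        public class QueuePipeline {
            static final int EOF = Integer.MIN_VALUE; // sentinel ending the stream

            public static void main(String[] args) throws InterruptedException {
                BlockingQueue<Integer> src = new LinkedBlockingQueue<>();
                BlockingQueue<Integer> mid = new LinkedBlockingQueue<>();

                Thread doubler = new Thread(() -> stage(src, mid));
                Thread printer = new Thread(() -> sink(mid));
                doubler.start();
                printer.start();

                for (int i = 1; i <= 5; i++) src.put(i);
                src.put(EOF);
                doubler.join();
                printer.join();
            }

            // doubles every value until the sentinel arrives, then forwards it
            static void stage(BlockingQueue<Integer> in, BlockingQueue<Integer> out) {
                try {
                    for (int v = in.take(); v != EOF; v = in.take()) out.put(v * 2);
                    out.put(EOF);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }

            static void sink(BlockingQueue<Integer> in) {
                try {
                    for (int v = in.take(); v != EOF; v = in.take()) System.out.println(v);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        }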

  • by deadline ( 14171 ) on Saturday February 10, 2007 @09:53PM (#17967902) Homepage

    "Basic truth about supercomputers - the commercial market is zilch. You have to go down to #60 on the list of the top 500 supercomputers before you find the first real commercial customer."

    You may want to adjust your "truth," as your measure of the market is wrong. The Top500 is not a marketing survey, and just because you have HPC hardware does not mean you run out and try to get it on the Top500. Many companies are using (HPC) parallel cluster computers, but they choose to be quiet about it for competitive reasons. The 2005 HPC market was well over $9 billion, and IDC predicts 9% annual growth, bringing the market to over $14 billion in 2010. Can you tell me what other markets are offering such growth rates these days?

    "Supercomputer guys fuss endlessly over elaborate interconnection schemes, but none of them are worth the trouble."

    If, as most companies have stated, they cannot compete without using HPC [compete.org], then R&D at the high end is indeed worthwhile. The endless fussing eventually makes it to commodity markets; open your computer case and look around.

  • Re:It's not hard (Score:3, Interesting)

    by owlstead ( 636356 ) on Saturday February 10, 2007 @11:00PM (#17968340)
    For a new C++ with multithreading language extensions (for manually coding the multithreading), a good API, and IDE/tool support, look no further: it's called Java, and it has been around for ages. You really, really, really don't want multithreading in a non-"managed" language. You don't want to debug an application where *every* part of it can be messed up by any thread. The advantage of Java is its built-in safety measures.

    Things you need to have support for in a language/environment:
    - All the usual multithreading design patterns in the API (producer/consumer, stream support, stacks, etc.);
    - Thread-safe collections;
    - A keyword and mechanism for locking ("synchronized" in Java; "volatile" is handy for optimizations);
    - Memory protection;
    - Multithreading-aware debuggers (at least "halt all threads" functionality);
    - A very well documented API, stating especially whether classes are thread safe and how to use them;
    - If at all possible, automated code-checking tools for multithreading (they won't catch every multithreading error, but they are very useful).
    Furthermore, it's very important to make sure you can use the APIs on different platforms if you expect to port the application in the future. Also, since threads may be expensive, a lightweight version of threads may be useful as well.

    It's not just a matter of adding a language extension for threading; support must go much further than that. The API documentation is especially important in this respect. Unfortunately, there are always parts of the API that are not really useful for multithreading, such as the Date classes in Java (they suck in all other respects as well, to be honest).

    If you are ever going to try Java, make sure not to use arrays too much. You cannot prevent write access to arrays, so any thread that has a reference to an array may change it at any time. Try to use immutable classes as much as you can (BigInteger, BigDecimal, and of course String are all immutable), and make defensive copies when needed. In C++, any thread can mess up any data element by simply casting, something that is done *way* too much in C++ anyway. You could also use C#, but a quick Google search suggests that C# multithreading is not used as much as it could be, and getting support might be much more difficult. At least you can make the collections thread safe. A sketch of the locking and defensive-copy advice follows this comment.
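    A small Java sketch of two of the points above, the "synchronized" keyword and defensive copies of arrays (class and field names are made up for illustration):

        import java.util.Arrays;

        // Shared mutable state guarded by a lock and never handed out directly.
        public class Counters {
            private final int[] counts = new int[4];

            // synchronized: only one thread may update the array at a time
            public synchronized void increment(int i) {
                counts[i]++;
            }

            // defensive copy: callers can't mutate internal state through it
            public synchronized int[] snapshot() {
                return Arrays.copyOf(counts, counts.length);
            }

            public static void main(String[] args) throws InterruptedException {
                Counters c = new Counters();
                Runnable worker = () -> {
                    for (int i = 0; i < 100_000; i++) c.increment(0);
                };
                Thread a = new Thread(worker), b = new Thread(worker);
                a.start(); b.start();
                a.join(); b.join();
                System.out.println(c.snapshot()[0]); // always 200000 under the lock
            }
        }

    Without the synchronized keyword, the two threads' read-modify-write cycles would interleave and the final count would usually come up short.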

  • by soldack ( 48581 ) <soldacker@yahoo . c om> on Sunday February 11, 2007 @01:02AM (#17969224) Homepage
    I used to work for SilverStorm (recently purchased by QLogic). They make InfiniBand switches and software for use in high-performance computing and enterprise database clustering. The quality of a cluster's I/O subsystem played a large part in determining its performance. Latency (down to the microsecond) and bandwidth (over 10 gigabits per second) both mattered.

    Also, we found that sometimes what made a deal go through was how well your proposed system could run some preexisting software. For example, vendors would publish how well they could run a standard crash-test simulation.

    Also, I would like to see more research put into making clustered operating systems like MOSIX good enough that developers can stick with what they have learned on traditional SMP systems and have their code just work on large clusters. I don't think multicore processors eliminate the need for better cluster software.
  • by master_p ( 608214 ) on Sunday February 11, 2007 @07:45AM (#17971256)
    There is a way to automate shared-state concurrency: every object should be its own thread. Computations that refer to the same object must be executed by that object's thread.

    Here is how it works:

    A computation does not return a result but a {key, continuation} tuple. The key is used to locate the thread to pass the continuation to. The continuation is stored in that thread's queue and the thread is woken up.

    The {key, continuation} tuple can be a 64-bit value (on 32-bit machines) consisting of a pointer to a memory location (the key) and a pointer to code (the continuation).

    Insertion into the thread's queue can be done using lock-free data structures.

    Threads can be user-level, so there need not be a switch to kernel space.

    This design can allow for linear scaling of performance: the more cores you put in, the more performance you get (for algorithms that are not inherently sequential, that is). Sequential algorithms would execute a little slower than usual, but the trade-off is acceptable: for the many applications that parallelize well because they have lots of (relatively) independent objects, the performance boost can be tremendous.

    There are many domains of applications that would benefit from such an approach:

    -web servers/application servers that must serve thousands of clients simultaneously.
    -video games with thousands of objects.
    -simulations that have many independent agents that can run in parallel.
    -GUI apps that use the observer pattern, where each observable has many observers that can be notified in parallel.

    Note: The above ideas are drawn from libasync-mp and lock-free data-structure programming. A minimal sketch of the design appears below.
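    A minimal Java sketch of the per-object queue idea (class names hypothetical; ConcurrentLinkedQueue is the JDK's lock-free queue, standing in for the lock-free structures mentioned above):

        import java.util.concurrent.ConcurrentLinkedQueue;
        import java.util.concurrent.locks.LockSupport;

        // Every object owns a lock-free inbox of continuations, and only the
        // object's own thread runs them, so its state never needs a lock.
        public class ActiveObject {
            private final ConcurrentLinkedQueue<Runnable> inbox = new ConcurrentLinkedQueue<>();
            private final Thread worker;

            public ActiveObject(String name) {
                worker = new Thread(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        Runnable task = inbox.poll();
                        if (task == null) LockSupport.parkNanos(1_000); // idle
                        else task.run();
                    }
                }, name);
                worker.setDaemon(true);
                worker.start();
            }

            // the "{key, continuation}" step: route work to this object's thread
            public void send(Runnable continuation) {
                inbox.offer(continuation);
                LockSupport.unpark(worker);
            }

            public static void main(String[] args) throws InterruptedException {
                ActiveObject account = new ActiveObject("account");
                int[] balance = {0}; // touched only by account's worker thread
                for (int i = 0; i < 1000; i++) account.send(() -> balance[0]++);
                account.send(() -> System.out.println("balance = " + balance[0]));
                Thread.sleep(200); // crude: let the daemon thread drain its inbox
            }
        }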

  • Just as a side note, the BlockingQueue you link to is a Java 5 feature, and indeed Java 5 added a LOT of API classes that are a great help when dealing with threading. Not sure if you meant to refer to 1.5, since I'm sure 1.4 also added some utility classes, but it's been touted as one of the major features of 5, a.k.a. 1.5.
  • by Tom Womack ( 8005 ) <tom@womack.net> on Monday February 12, 2007 @06:02AM (#17980672) Homepage
    The commercial market for large-scale computers exists; it just doesn't advertise itself on the Top500 list. Banks have big clusters for Monte Carlo work (why do you think the most-hyped benchmark application when Intel talks about eight-core Clovertown systems is an asset-pricing model?); oil companies have enormous clusters for seismic work.

    But both of those are intrinsically parallel jobs, so the clusters don't need the interconnect required to run Linpack, and there's no advantage to BP or Barclays in appearing on the Top500 list -- they're not looking for the kind of researcher who picks his employer by acreage of computers, whilst Aldermaston and the Met Office may well be.

    One of the embarrassing truths of computation is that an awful lot of jobs are either small enough to run on a single PC or too large to run on anything; there aren't that many things (climate modelling is the obvious one) constrained by the available amount of computation.
