Scalable Nonblocking Data Structures
An anonymous reader writes "InfoQ has an interesting writeup of Dr. Cliff Click's work on developing highly concurrent data structures for use on the Azul hardware (which is in production with 768 cores), supporting 700+ hardware threads in Java. The basic idea is to use a new coding style that involves a large array to hold the data (allowing scalable parallel access), atomic update on those array words, and a finite-state machine built from the atomic update and logically replicated per array word. The end result is a coding style that has allowed Click to build 2.5 lock-free data structures that also scale remarkably well."
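For readers who want to picture the coding style the summary describes, here is a minimal, hypothetical sketch (not Click's actual code) of an insert-only set built the same way: one big array, a single CAS per mutation, and a tiny per-word state machine (null -> value). The class name and the insert-only restriction are assumptions for illustration; the real structures also handle resizing, deletion, and richer state transitions.

import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: a fixed-size lock-free "insert-only set".
// Each array word moves through a tiny state machine via CAS:
//   null --CAS--> value   (slot claimed)
// Every mutation is one atomic word update, so threads never block each other.
public final class WordStateSet<T> {
    private final AtomicReferenceArray<Object> slots;

    public WordStateSet(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    /** Returns true if the value was inserted or was already present. */
    public boolean add(T value) {
        int h = value.hashCode() & 0x7fffffff;
        for (int i = 0; i < slots.length(); i++) {
            int idx = (h + i) % slots.length(); // linear probing
            Object cur = slots.get(idx);
            if (cur == null) {
                // Attempt the state transition null -> value atomically.
                if (slots.compareAndSet(idx, null, value)) return true;
                cur = slots.get(idx); // lost the race; re-read and fall through
            }
            if (value.equals(cur)) return true; // someone else inserted it
        }
        return false; // table full (a real implementation would resize)
    }

    public boolean contains(T value) {
        int h = value.hashCode() & 0x7fffffff;
        for (int i = 0; i < slots.length(); i++) {
            Object cur = slots.get((h + i) % slots.length());
            if (cur == null) return false;
            if (value.equals(cur)) return true;
        }
        return false;
    }
}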
false sharing? (Score:2, Interesting)
1. A common killer in parallelization is false sharing: threads on two processors fight over a cache line even though they are accessing independent variables. A cache line is typically bigger than an individual variable, so the approach of using adjacent elements of an array for parallelism sounds naive. One needs to pad the array (see the sketch after this list).
2. Updating a shared variable, especially a non-scalar, in an inner loop is naive. When parallelizing, reference local scalars in inner loops and have each thread update the shared variable just once, as in the sketch below. Don't reference non-scalars or shared variables in an inner loop, and don't lock in the inner loop either, if you can avoid it.
If it is not an inner loop, then the locking is probably not a big issue anyway.
3. Java, really, Java?
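A hypothetical sketch of points 1 and 2 together (the class name and the stride constant are made up for illustration): each thread accumulates into a local scalar, writes the shared array exactly once, and the partial results are spaced a cache line apart so no two threads contend for the same line.

import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of padding plus thread-local accumulation.
public final class PaddedSum {
    // Assume 64-byte cache lines: 8 longs per line, so stride the slots by 8.
    static final int STRIDE = 8;

    public static long parallelSum(long[] data, int nThreads) throws InterruptedException {
        AtomicLongArray partials = new AtomicLongArray(nThreads * STRIDE);
        Thread[] workers = new Thread[nThreads];
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long local = 0; // local scalar: nothing shared in the inner loop
                int from = id * chunk, to = Math.min(data.length, from + chunk);
                for (int i = from; i < to; i++) local += data[i];
                partials.set(id * STRIDE, local); // one shared write per thread
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < nThreads; t++) {
            workers[t].join();
            total += partials.get(t * STRIDE);
        }
        return total;
    }
}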
Re:Java???? (Score:3, Interesting)
Umm, because Azul runs Java in hardware. It *is* optimized by being in Java.
Re:I think you have it wrong (Score:1, Interesting)
The new Vega3 (announced last week) takes it to 54 cores per chip, and 864 cores in a system (16 x 54 = 864).
Re:scalable noNBLocking data sTRructures .. :) (Score:5, Interesting)
Your turn.
No, it is Lock Free (Score:2, Interesting)
Re:Java???? (Score:2, Interesting)
In any case, the author of that test failed: the test includes start-up time in the results. Java is always going to lose micro-benchmarks where start-up time is included, because loading the JVM gets counted as part of the benchmark. As anyone who knows Java well can tell you, Java's startup is extremely costly; once the JVM is running, however, it performs much better. The end result is that the startup cost eclipses the actual benchmark being run.
I'm willing to bet you'll have a hard time finding a modern benchmark for Java that doesn't make that (simple!) mistake.
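To make that concrete, here is a hand-rolled sketch of the fix (the workload and iteration counts are arbitrary assumptions): run the code enough times for the JIT to compile the hot path, discard those results, and only then start the clock, so JVM startup and compilation never enter the measurement.

// Hypothetical sketch: separate warm-up from the measured steady state.
public final class WarmedBenchmark {
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) sum += i * 31L;
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0; // accumulate results so the JIT can't dead-code the loop

        // Warm-up phase: results discarded, JIT compiles the hot path.
        for (int i = 0; i < 10000; i++) sink += workload();

        // Measured phase: only steady-state performance is timed.
        long start = System.nanoTime();
        for (int i = 0; i < 10000; i++) sink += workload();
        long elapsed = System.nanoTime() - start;

        System.out.println("ns/iteration: " + (elapsed / 10000) + " (sink=" + sink + ")");
    }
}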
However, I will admit that I was overextending my reach by suggesting that no one runs benchmarks anymore because of Java's strong performance. The reality is (as one might expect) much more complex. It's not that Java performed poorly in benchmarks; quite the contrary. By 2004, Java was cleaning up in benchmarks. The problem is that micro-benchmarking sucks: all you end up getting is the two sides fighting over who can produce the most optimized benchmark possible, which invariably defeats the purpose of benchmarking for "real-world" code.
The long and short of it is that Java was proven to be more than fast enough, with the average case easily keeping pace with or beating C++ code. Thus the statement by the author of the article you linked to:
"Unfortunately, there's also a third conclusion. It seems that it's much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster..."
Can also be done with a clean cache (Score:3, Interesting)
But, as you said, the lock doesn't require a bus lock. Please correct me if I'm wrong on this; it's been a few months since I've read that piece of code, and I'm a little hazy on the exact details.
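For what it's worth, here is a hypothetical sketch of the kind of lock being discussed, built from a single compareAndSet on one word. As I understand it (consistent with the parent's point), on modern x86 that CAS becomes a cmpxchg on the owning core's cache line rather than a global bus lock, as long as the word doesn't straddle cache lines.

import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: lock acquisition is one CAS on a single word.
public final class CasSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        // Test-and-test-and-set: spin on plain reads, which stay in the
        // local cache, and only attempt the CAS when the lock looks free.
        while (true) {
            while (held.get()) Thread.yield();
            if (held.compareAndSet(false, true)) return;
        }
    }

    public void unlock() {
        held.set(false); // volatile write releases the lock
    }
}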
Re:Java???? (Score:3, Interesting)
Of course, when your boss/customer requires Java, use Java.
I'm mixed on Lisp: Lisp is kind of high level and kind of not... I guess I'm just too lazy. Perl has CPAN, which means lots of wheels I don't have to reinvent (or document).