Sandia's Red Storm Detailed Architecture
Roland Piquepaille writes "Bill Camp & Jim Tomkins, from Sandia National Laboratories, have published a 77-page document about the architecture of the Red Storm supercluster being built by Cray Inc. The new nickname for the 40 teraflops system is "Thor's Hammer." Please read the full presentation if you have the time (PDF format, 3.54 MB). This technical analysis gives you the major characteristics of the system, which will be operational by August 2004. With its 108 compute cabinets and its 10,368 compute node processors (AMD Opteron running at 2.0 GHz), it is expected to reach 20 teraflops on MP-Linpack. The report also looks at scalability and reliability, which are essential for a system which will be expanded to 30,000 processors in the future."
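For readers wondering how the 40 teraflops figure relates to the processor count, here is a rough back-of-the-envelope sketch. It assumes each Opteron retires 2 double-precision floating-point operations per clock cycle, which is not stated in the summary; the function and constant names are made up for illustration.

```python
# Rough peak-performance estimate from the numbers in the summary.
# ASSUMPTION: 2 double-precision flops per cycle per processor
# (not given in the summary; change it if you know better).
PROCESSORS = 10_368        # compute node processors, from the summary
CLOCK_HZ = 2.0e9           # 2.0 GHz, from the summary
FLOPS_PER_CYCLE = 2        # assumed

def peak_teraflops(processors, clock_hz, flops_per_cycle):
    """Theoretical peak in teraflops: processors * clock * flops per cycle."""
    return processors * clock_hz * flops_per_cycle / 1e12

def efficiency(sustained_tf, peak_tf):
    """Fraction of theoretical peak that a sustained figure represents."""
    return sustained_tf / peak_tf

peak = peak_teraflops(PROCESSORS, CLOCK_HZ, FLOPS_PER_CYCLE)
linpack_fraction = efficiency(20.0, peak)  # 20 TF MP-Linpack target, from the summary

print(f"peak ~ {peak:.1f} TF, Linpack target ~ {linpack_fraction:.0%} of peak")
```

Under that assumption the peak comes out a little above 41 teraflops, which lines up with the quoted "40 teraflops", and the 20 teraflops MP-Linpack target would be a bit under half of peak.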
in related news (Score:5, Funny)
In related news, Sandia National Laboratories has laid off all but one of its Jaffa technicians, citing diplomatic and security concerns.
Re:in related news (Score:1)
Re:in related news (Score:2)
I actually did see that episode. I forget the stunt they did to get Teal'c through. That part seemed to be a cop-out.
I guess I'll have to get the DVDs someday.
Powerpoint? (Score:1)
Imagine (Score:1, Redundant)
er, wait! Nevermind!
network bandwidth (Score:4, Insightful)
but making the network linking up the thing.
# Sustained file system bandwidth of 50 GB/s for each color
# Sustained external network bandwidth of 25 GB/s for each color
Wow! That's not peak, but sustained. For me that's the impressive bit.
uses "only" 2 MegaWatts for power and cooling. (Score:1)
It's pretty impressive.
2 megawatts and 3000 sq. ft. is quite good.
Frightening, but given the power, quite good.
Imagine the UPS.
on the other hand... "100 hour MTBI is desirable"
Ack! It is hoped that it won't crash more than
once every three days? That is up from 40 hours
on the current one. oh. They're putting in lots of
RAS features, and they still can't target higher than that. depressing.
custom interconnect. That is the exciting part.
It looks like a lot of fun. The directions are good
and make sense.
Re:uses "only" 2 MegaWatts for power and cooling. (Score:2)
Ack! It is hoped that it won't crash more than
once every three days?
I'm sure they're talking about a single node failure, not the whole machine. Most people aren't running jobs on more than a few hundred processors, so a compute node failing will take out only one of a few dozen running jobs. And it's more like having your program crash, anyway; resubmit your job and it will simply run on a different set of nodes.
My question is: what is this operating
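To put rough numbers on the node-failure reading above, here is a small sketch. It assumes the 100-hour MTBI counts an interrupt on any one of the 10,368 compute nodes, and that nodes fail independently at a constant rate; neither assumption comes from the presentation, and the function names are hypothetical.

```python
# Sketch: what a 100-hour system-wide MTBI would mean per node and per job.
# ASSUMPTIONS (not from the presentation): the MTBI counts any single-node
# interrupt, and nodes fail independently at a constant rate, so failure
# rates simply add up across nodes.
HOURS_PER_YEAR = 24 * 365

def per_node_mtbf_hours(system_mtbi_hours, nodes):
    """Mean time between failures for one node, given the system-wide MTBI."""
    return system_mtbi_hours * nodes

def job_mtbi_hours(system_mtbi_hours, total_nodes, job_nodes):
    """Mean time between interrupts seen by a job using job_nodes nodes."""
    return system_mtbi_hours * total_nodes / job_nodes

node_mtbf = per_node_mtbf_hours(100, 10_368)
job_256 = job_mtbi_hours(100, 10_368, 256)   # a "few hundred processors" job

print(f"per-node MTBF ~ {node_mtbf / HOURS_PER_YEAR:.0f} years")
print(f"256-node job interrupted about every {job_256 / 24:.0f} days")
```

On those assumptions a few-hundred-processor job would be interrupted far less often than once every 100 hours, while a job spanning the whole machine would see the full rate.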
Re:uses "only" 2 MegaWatts for power and cooling. (Score:1)
It won't really matter until it comes time to port my code to the thing anyway. I just have a feeling sockets are going to be the deal breaker.
Re:uses "only" 2 MegaWatts for power and cooling. (Score:1)
tflops was a dsm os called osf/ad; it's being replaced by suse with some minimal cluster extensions
"Thor's Hammer"? (Score:3, Insightful)
Nickname (Score:2)
The new nickname for the 40 teraflops system is "Thor's Hammer".
Ah, curious. I guess what goes around comes around. But, shouldn't it be "Thor's Hammers"? It's got 10,368 Hammers [amd.com], y'know.