IBM Full-System Simulator Team Speaks Out 115

Posted by ScuttleMonkey
from the from-the-horses-mouth dept.
Shell writes "The IBM Full-System Simulator for the Cell Broadband Engine (Cell BE) processor, known inside IBM by the codename Mambo, is a key component of the newly posted offerings on alphaWorks. Meet some of the members of the team that pulled it together, and hear about the simulator in their own words."
This discussion has been archived. No new comments can be posted.

  • PS3? (Score:3, Funny)

    by raingrove (934820) on Tuesday November 29, 2005 @07:05PM (#14142486)
    Does this mean we can emulate PS3? lol
    • Re:PS3? (Score:5, Informative)

      by garrett714 (841216) on Tuesday November 29, 2005 @07:16PM (#14142573)
      Yes and no.

      While this "simulator" is basically an emulation of the Cell hardware, it won't allow people to run games at full speed. It's more of a developer tool that lets programmers start coding for the PS3 before they actually have the hardware. Still, it is reasonable to believe that emulation of the PS3 will be viable in the future (although not for a long time).
      • by oGMo (379)

        While this "simulator" is basically an emulation of the Cell hardware, it won't allow people to run games at full speed.

        Yeah, remember, the Cell is just one component; you've got the GPU to worry about too, and you have to match the other system components' performance (RAM bus and the like). Not impossible, but consider that it took/takes a 100-200MHz Intel system to emulate a 3MHz SNES. While other techniques are available (like dynamic recompilation and the like), these only go so far. If you cou

  • by Anonymous Coward
    Running Linux on one of these things is simply INSANE.

    I have been through a lot of chip transitions over the years and been impressed with the leaps each new generation has made.

    But Cell is something entirely different. It is such a HUGE leap in performance beyond x86 systems that to go back to using a x86 machine is unthinkable now for me. I almost feel drunk from the power I have at my hands...

    Read up all the Cell info you can at IBM's site and read the various patents IBM, Toshiba, and Sony have out ther
    • by smashr (307484) on Tuesday November 29, 2005 @07:29PM (#14142674)
      Running Linux on one of these things is simply INSANE.

      I have been through a lot of chip transitions over the years and been impressed with the leaps each new generation has made.

      But Cell is something entirely different. It is such a HUGE leap in performance beyond x86 systems that to go back to using a x86 machine is unthinkable now for me. I almost feel drunk from the power I have at my hands...

      Read up all the Cell info you can at IBM's site and read the various patents IBM, Toshiba, and Sony have out there. And find some way to get your hands on one of these...

      I can now see why the PS3 stuff we are seeing is so amazing...


      Sure, the Cell is amazing, IF you are doing the right things. You say that you simply want to leave the old x86 architecture behind, but the truth of the matter is that the two do not even begin to compare.

      It is not simply a matter of saying "OMG my Cell has 8 cores at 4GHz". The main Power Processing Element is crippled at best for simple single-threaded applications -- roughly equivalent to a PowerPC of the G3 era, but specifically in-order execution. The SPEs (the other 8 cores) are essentially mini vector computers. They can perform a massive amount of floating point calculations in parallel; however, they do not enjoy an innate ability to deal well with all sorts of code the way a standard x86 CPU does.

      The Cell designers have completely sacrificed instruction-level parallelism in exchange for thread-level parallelism. It is certainly a valid and interesting way to achieve speed, but not for single-threaded applications. -- Don't throw out your x86 just yet.
      • True, however (Score:4, Insightful)

        by geekoid (135745) <dadinportland&yahoo,com> on Tuesday November 29, 2005 @07:33PM (#14142708) Homepage Journal
        when the speed is high enough that even technically crippled single-threaded applications run fast enough, will it matter?

        If Cell is what it claims to be, developers will create new applications using multithreaded designs. Compared to 15 years ago, multi-threading is a snap.
        • ...that 256KB local store for each SPU looks like a pretty severe bottleneck. You'll have to limit your execution code and data to this window; otherwise you'll take a severe penalty on fetches to main memory. The PPU isn't much to brag about in comparison to a modern G4 or G5, so your task damn well better make use of those SPUs or performance will seriously suck in comparison to a modern CPU. So, it looks to me like this thing will be amazing for lots of small jobs like several tiny Monte Carlo sims each
        • What is this "fast enough" that you speak of?
        • when the speed is high enough that even technically crippled single-threaded applications run fast enough, will it matter?

          If performance doesn't matter, it doesn't matter. The discussion is moot. Go and buy a cheap 386.

          If Cell is what it claims to be, developers will create new applications using multithreaded designs. Compared to 15 years ago, multi-threading is a snap.

          There seems to be a difference between what the Cell claims to be and what you perceive it to be. The SPUs of the

          • Numerical computing can deal with the 32-bit floating point issue pretty easily. Do you think no one did high-precision mathematics on 16-bit CPUs? The techniques are old and well understood. Sure it costs some extra cycles, but when you have 8x4GHz going for you, you can easily afford it.

            Personally I think that anyone who does similar operations on a large set of data will LOVE the cell. If you can get a pipeline going where each SPU does one step of a larger algorithm, you can stream the data right throug
            • Yes, it'll be great for many things, but the Cell is not IEEE compliant in its 32-bit arithmetic, so algorithms that depend on denormalized numbers or infinities matching the spec will break. This actually matters for a graphics approach that would otherwise be attractive for the Cell, conformal geometric algebra, where plane primitives are infinite spheres and line primitives are infinite circles.
            • Numerical computing can deal with the 32-bit floating point issue pretty easily.

              I am no numerical analyst. But it seems to me that when you really need 8x4GHz, you are doing some spiffy stuff. The more spiffy stuff you do, the higher precision you are going to need. While I agree that there is plenty of literature on "stable" numerical algorithms, after enough iterations anything becomes less accurate. Add to that the extra cost of developing the program for the Cell (lots of workarounds for weak

          • No matter how many SPUs are in the cell processor, it won't make your builds go any faster, allow you to serve more webpages, database-clients, or whatever... For that we need general purpose CPUs.

            Actually, the new Sun T1 processors (1 floating point, 8 integer core CPUs) are what you'd want to serve more webpages for instance. Certainly not that 10GHz Intel processor coming out any day now [Tm]

            Supercomputing folks can use it, but only for 32bit operations. Depends upon the need, not solely the bits. No

        • If you have a single CPU which can only run an ADD instruction, it doesn't really matter how many thousands of times faster it does it than any other processor; you still won't be able to outperform those other processors.

          The "problem" is that the Cell architecture is highly specialized; it may take much more code to do more generic stuff, enough to render it useless. Otherwise, why did they require a PowerPC core on the die as well?

          Cell is certainly interesting, and I expect a lot of the performance of it, b
      • by epine (68316) on Tuesday November 29, 2005 @10:28PM (#14143722)
        The Cell designers have completely sacrificed instruction level parallelism in exchange for thread level parallelism. It is certainly a valid and interesting way to achieve speed, but not for single threaded applications.

        This analysis is incorrect, because it fails to recognize the fixed point. By sacrificing the out-of-order (OOO) mechanisms (which are brutal for heat production) they gained enough thermal headroom to effectively double the clock rate. In the same thermal envelope, you either get an OOO processor running at 2GHz with three or four issue pathways (three has been the rule under x86) and a very deep pipeline, or you get a processor running at 4GHz with two issue pathways and a relatively short pipeline.

        A deep pipeline grants (partial) immunity from stalls and bubbles. A short pipeline grants (partial) immunity from branch misprediction effects. To make the deep pipelines work well, huge investments are required in the branch-prediction unit, which is also infamous for throwing off a lot of heat.

        The main Power Processing Element is crippled at best for simple single threaded applications ...

        Fortunately for Cell, this is also the wrong denominator for use in this discussion. Applications might be single threaded, but systems are hardly ever single threaded. While the SPU processors handle audio, video, encryption, block I/O and other compute/bandwidth-intensive primitives that most systems engage, they also off-load cache pollution from the main Cell processor threads, both in the data space and in the task-scheduling space.

        Nothing will ever best the Pentium IV for single thread peak performance with no calorie spared. News flash: Intel has already given up on this flawed approach. The Pentium IV could easily beat the Opteron by cranking itself up to 6GHz if there was any practical way to extract 200W from a small core with no hot spots.

        OOO served its purpose in the era where cycle time was paramount and the processor to cache cycle time ratios were in closer balance. Now that heat has become the limiting factor, we'll be seeing a lot less of that from all parties.

        The reality in silicon is that we need to start rethinking those portions of the code base which only perform well under an OOO execution regime.

        This can be accomplished at so many different levels. The entire OpenSSL library can be recoded for SPU coprocessors with massive speed gains. Existing code can be recompiled with modern compilers which exploit large register sets to offset lack of hardware-level OOO. Key algorithms in system libraries can be recoded using better algorithms or memory access patterns.

        Those of you who insist on putting all your eggs into one 100W single threaded basket, it's time to step off the Moore's law express train. Hope you enjoy the milk run.


        • "The Pentium IV could easily beat the Opteron by cranking itself up to 6GHz if there was any practical way to extract 200W from a small core with no hot spots."

          Not the case. Among other things, modern code is highly dependent on memory latency. The P4 as of late hasn't even been getting 60% of clock; the Opteron gets nearly 95%.

          Your whole argument is why Intel developed the Itanium. The idea of producing a simpler CPU that is thermally more efficient is a novel one, but time and again we find that you can't erase th
          • Your whole argument is why Intel developed the Itanium. The idea of producing a simpler CPU that is thermally more efficient is a novel one, but time and again we find that you can't erase the last 15 years of CPU innovation

            Itanium still has instruction fusing (i.e. three instructions fused into a single instruction issue) and extensive HW branch support.
        • I do agree with your assessments of the value of non-OOO processors.

          But there's one thing OOO does that these processors will never do. That is efficiently run code that was not properly scheduled.

          Now, why would you generate code with the wrong scheduling? Well, you wouldn't do so on purpose. But in the field PCs frequently encounter it. This code is code that was scheduled for a different processor. As instruction latencies, CPU clocks and memory latencies change the optimal instruction order changes.

          So on
          • Additionally, to mix in other arguments, I agree the P IV could deliver significant performance if it didn't run out of thermal headroom. You would need good caches and such, but despite what the other poster says, both Intel and AMD are affected similarly by memory latency and bandwidth issues. Perhaps AMD fares somewhat better. But not so much better that a P4 running at double its current clock rate wouldn't mop the floor with the AMD.

            And you could make the exact same argument about AMD m

            • Except unlike P IV, AMD's chips were designed properly.

              The P IV was designed to run at 6GHz or something. And gate-delay wise, they could probably do it with minimal changes. Except then it produces so much heat from transistor switching that it can't be cooled properly.

              AMD's chips however, were designed to run at the speeds they are running at. To make them go 4.4GHz would require redesigning them. But yes, they would also be much faster at those speeds.

              So, the argument could be made for AMD, but it's not a
        • This analysis is incorrect...

          Note that this dual-issue PPE core has a 21-stage pipeline (similar to the P4 Northwood), while AMD's K8 is a 12-stage integer and 17-stage FP combo. The PPE is not a PPC 7447A, nor is it a PPC 750FX.

          "you either get an OOO processor running at 2GHz with three or four issues pathways (three has been the rule under x86)"

          Not quite, since a K8 macro-op (fixed length) fuses two instructions (one of the instructions must be an address-type instruction). The K8 issues three macro-op i
        • The entire OpenSSL library can be recoded for SPU coprocessors with massive speed gains.

          A minor point, but this probably isn't a sensible thing to do. OpenSSL already supports crypto accelerators, so it would be better to write a kernel module that provided /dev/crypto using an SPU or two (or more, in very high load situations, like an eCommerce server).

        • Existing code can be recompiled with modern compilers which exploit large register sets to offset lack of hardware-level OOO.

          I saw this quote, and wondered why CPU manufacturers don't create a chip that is flexible. So instead of 8 registers, or 32, or 64, it would allow the programmer to address L1 cache as "registers" and to set aside a variable portion of L1 cache for the program's needs.

          • *cough* 8051 [wikipedia.org] *cough*
            • *cough* 6502 *cough* (the 6502's the one with page 0 and all)

              Interestingly enough, it might very well lower performance, rather than improve it, since that makes much, much more state to save during thread switches. It would also just about kill any chance of older programs (even for the same arch) running as well as new ones on each new iteration.
      • So... when can I buy a video card with one of these on it?

    • Since parent is an Anonymous Coward, it's a bit hard to believe, but if it does have one PowerPC-like thingy controlling 7 or 8 SPEs [wikipedia.org], it will be teh h0ttz0r (if used correctly - that PS3 better have good antialiasing this time; I notice the jaggies with the PS2 and I don't like 'em).

      (offtopic: To whoever has made post #14142136 (sqrt(2)*10000000-rounded-to-nearest integer): please reply, and congratulations. Hopefully you'll get the 31415927th too.)
    • Here's an article [ign.com] about Sony possibly using Linux on the PS3. The chances of this happening are good, we all remember how Sony released the Linux kit for the PS2.
    • by adisakp (705706) on Tuesday November 29, 2005 @07:37PM (#14142741) Journal
      Running Linux on one of these things is simply INSANE.

      I almost feel drunk from the power I have at my hands

      Here's some advice from someone who has access to a REAL CELL chip. I hate to disappoint you, but aside from custom libraries specifically optimized for CELL, Linux ain't going to run fast on this machine. All the generic open source code targeted towards a general CPU is going to run faster on a dual-core Intel or dual-proc/dual-core Mac. The actual CPUs in this machine are simple pipelined (think Pentium I level of optimizations) vs. current-gen CPUs (the P4 has out-of-order execution, speculative execution, register renaming, branch prediction, etc). While simple C code runs at roughly the same speed, complicated C++ constructs run 2-10X slower on CELL's simplified PowerPC core versus the G5s you'll find in a Mac.

      Code needs to be rewritten specifically to take advantage of the actual SPE/SPU's (Synergistic Processing Engines/Units - I prefer SPE since Sony calls their PS1/PS2 sound chip the SPE). Until those Linux libraries appear, CELL isn't going to run anything faster. Not to mention that it will have to be custom code libraries that DON'T run on the MAIN CPU since the SPE's execute different machine code.
      • Ok, you appear to be a developer who has experience with all the "major" chips. My question is this:
        For gaming, specifically games with a 3D engine, will the CELL be better than a top-of-the-line P4 or Athlon 64? Let us assume that the entire code base has been engineered for each chip. I believe the question that a lot of people have is whether the XBOX chip is less powerful than the Cell chip in the PS3. Again, they want to know: if someone wrote World of Warcraft or EQII for both platforms, and optimized both to
        • Every last bit of every demo for the PS3 was pre-rendered, or at least, that's the safest assumption to make. Sony did this with their last two systems, too. To be honest, I don't think we'll know how the games really look until the machine is released. I would be very surprised if it was significantly better than the other next-gen consoles. It's going to be about the same, just with more hype, as usual.
          • Considering nVIDIA's engineers have publicly stated that they hadn't finished designing (let alone debugging/testing) the silicon when the PS3 videos were made public, I'd say it's a safe bet to say there's no way in hell the graphics weren't pre-rendered. If the graphics chip wasn't designed yet, then it's not possible to have one fabricated and rendering the movies for E3.

            Nintendo has already said that while the Revolution will definitely be an improvement over the GameCube, it won't have the kind of qua
          • This is not true. Almost the exact opposite is true. Sony originally told everyone that there was no pre-rendered footage shown, but then found out that some of it was. They informed people, but they couldn't comment on what was and what wasn't. But understand that Sony went out of their way to state that the stuff shown was NOT pre-rendered. They did not do this with the PS2. Yes, this is Sony, but it would look very bad on them to lie about this. Time will tell.

      • P4 has out-of-order execution, speculative execution, register renaming, branch prediction

        All of those features were introduced with the Pentium Pro, which was savaged at the time relative to the Pentium (which is far more like the Cell) because the pre-NT Windows codebase ran like crap in that regime (one factor was partial register stalls, but there were many issues). A decade later, the compilers and general codebase have become extremely tweaked in the other direction.

        After the new code optimization fram
        • All of those features were introduced with the Pentium Pro, which was savaged at the time relative to the Pentium

          The Pentium Pro ran Windows NT much faster than an equivalent speed Pentium. A lot of the old 16-bit instructions, however, were microcoded rather than being natively executed, and took a few clocks longer. Since much legacy code at the time (games, anything with win16 roots including Window 95) made use of 16 bit instructions, they ran slower. Comparing Windows NT 4 on a 200MHz Pentium Pro

        • FWIW, another NextGen game system :) has very similar problems with a different compiler, since the simplified PPC cores are nearly identical. Complicated C++ code simply runs 2-10X slower on these simple pipelined chips. Straight "C" code runs at nearly the same speed.

          To see loss in the 2-10 range suggests to me that the Cell is blocking on memory loads far more often than it should be, which could be a compiler fault.

          Here is a sequence that's hard to handle at the compiler level lacking OOO in hard
      • I prefer SPE since Sony calls their PS1/PS2 sound chip the SPE

        As far as I know, Sony calls their PS1/PS2 sound chips the SPU and SPU2.
        • Yes, you're correct... that was a typo on my part. I prefer SPE for the Cell Synergistic unit so it *DOESN'T* conflict with the current SPU term we use for the PS1/PS2 sound chips. And while the PS2 chip's official name is SPU2, nearly all developers (i.e. at PS2 Devcon) simply refer to it as the SPU when discussing PS2 (that extra "2" is annoying to say a hundred times in a speech).
    • "I can now see why the PS3 stuff we are seeing is so amazing..."

      Because it's prerendered on Cell processors, naturally ;)
    • Except the Cell doesn't support out-of-order execution, making it unsuitable for any normal operating system... i.e. Linux. This is the same reason why Apple rejected the Cell.
      • It is not the only reason. If it were the only reason, there would be an AMD option at least for workstations.

        Apple made an exclusive agreement with Intel.

        Half of the stuff Steve Jobs says are lies.

        If Apple hadn't become a white-box Intel builder, there would be a desktop variant of the Cell processor. They have chosen to bitch about PowerPC and remove legit benchmark results embarrassing Intel's CISC stuff from their site.

        They trusted the "cult like" zealotry behind them. They were proven right.
  • by donour (445617)
    I thought Mambo was just a generic PowerPC machine emulator, not the Cell...
  • It's great that we keep hearing about these things, and we know they're out there with some great PS3 demos, but all of this comes down to the point that I'm tired of hearing about them until I can turn on my Cell workstation. The news I want is the workstation release!
  • Mambo is the name of an open-source CMS: http://www.mamboserver.com/ [mamboserver.com] You would think these guys would get out on the net and do a little research before naming a product.
    • From TFA:
      It had to be called something. Before, it was based on a previous product called SIM OS for PowerPC®, and we had to have a new name for it when we made it an IBM-only, proprietary tool. So, it was just a name that didn't have the word SIM in it, since there are so many simulators that have 'SIM' in their name. Then, for alphaWorks, we were forced to give it a more docile name. So, on alphaWorks I guess there is a reference that internally we call it Mambo, but it's called the IBM Full-System
    • by FooAtWFU (699187)
      It's called a 'codename'. The real name is apparently 'IBM Full-System Simulator for the Cell Broadband Engine processor'.

      Yes, all of IBM's products are named like that. I mean, every now and again they try to go for something neat and spiffy sounding like "WebSphere", but then they have to munge it all up with "Websphere Application Server" (WAS) and "Websphere Client Technologies Mobile Edition" (WCTME) and so on and so forth. This is normal for IBM, and this is why they really need code-names.
      A related s

      • Distinguished Engineer. IBM-speak for "This guy is so valuable that he can do anything he wants, go anywhere he wants, study anything he wants... and write his own paycheque"

        Only the very best get that designation.
    • Every word is already used as the name of a product. And the Mambo simulator was named around 2000 or 2001; when did the Mambo CMS start?
    • I know this is offtopic, but this whole "Mambo" name reminded me of a funny website my friend showed me years and years ago.

      Has anyone else been to www.zombo.com [zombo.com]? The infinite is possible at zombocom! The unattainable is unknown at zombocom! Welcome to ZOMBOCOM!!


      LOL, it's the most pointless site on the web outside of a good laugh, but the funniest thing is that it's been up for years. I wonder who pays for the hosting?
  • Praise for Cell (Score:5, Informative)

    by acidblood (247709) <decio AT decpp DOT net> on Tuesday November 29, 2005 @08:34PM (#14143131) Homepage
    I've been running the simulator here, and managed to port the distributed.net client to it. The performance of current cores on the PPE is so-so (worse than the G4 in my Mac Mini), although I'm sure it would improve with proper optimization. The SPE is a completely different matter though. I wrote an RC5-72 core for it that should achieve ~190 Mkeys/s on 8 SPEs at 3.2 GHz, which is by itself almost ten times faster than the current fastest processor (G5 at 2.7 GHz, which clocks in at 20 Mkeys/s, IIRC). For embarrassingly parallel applications like key cracking, this thing is a dream.

    Some technical details: the SPE's instruction set could be thought of as "AltiVec plus". It has most of the functionality of AltiVec (so far I've only missed a byte addition instruction), but quite a few improvements, like immediate operands for many instructions, immediate loads with much better range than AltiVec's splat instruction, the addition of double precision floating point operations, etc. I'm sure there are more improvements, but these are the ones I noticed from my limited experience with AltiVec. Instruction scheduling for this processor is remarkably similar to that of the first Pentium: it's dual issue with static scheduling, there are some conditions on pairable instructions and their ordering to ensure dual issue, and so on. The high latencies for instructions (2 for most integer arithmetic, 4 for shifts and rotates) are problematic, but the huge register file of 128 entries is very helpful to implement techniques like software pipelining which help mask these latencies. The local store is a mixed bag -- dealing with arrays larger than the local store should be challenging, but if you don't have to worry about it, it's great to have a fixed latency of 6 cycles for loads and stores, no need to worry about cache effects and so on. Actually, the local store behaves a lot like a programmer-addressable cache, which has some benefits compared to traditional cache: specifically, less control overhead per memory cell (so more logic can be packed in the same space) and, as a consequence, the potential for higher speeds and/or smaller latencies.

    Overall, I'm very impressed with Cell, but for now I've only programmed toy examples and I'm sure to hit some limits of the architecture once I start looking at real-world code.
    • Hi.

      Could you speak more to performance issues when dealing with code/data that exceeds the 256K SPU local store? It looks to me like fetches from RAM are a real bottleneck, so if you want performance you need to keep code/data within each SPU. If you can chain a series of algorithms and move data down the chain, this is a win. But if you need to manipulate a huge data block you're SOL. I can see the Cell being a huge win for, say, a series of Monte Carlo sims running in each SPU, but it looks like a lose on
      • Re:Praise for Cell (Score:3, Informative)

        by acidblood (247709)

        Could you speak more to performance issues when dealing with code/data that exceeds the 256K SPU local store?

        I'll try, but take my opinion with a grain of salt as I didn't do anything beyond coding an RC5-72 core, which doesn't involve external memory accesses.

        It looks to me like fetches from RAM are a real bottleneck, so if you want performance you need to keep code/data within each SPU. If you can chain a series of algorithms and move data down the chain this is a win. But if you need to manipulate a huge

    • Well, RC5 IS pointless and embarrassingly parallel...
      (I still remember when distributed.net was running RC5-56 or something for 8 months on 100k machines, and some people just made ASICs that had 100+ parallel key-pipelines and built a machine that could exhaust the keyspace in 3 days or so...

      So I wouldn't be too optimistic because of that little performance point....
      • Well, RC5 IS pointless

        Not as much as it seems at first glance. But that's a discussion for another day.

        (I still remember when distributed.net was running RC5-56 or something for 8 months on 100k machines, and some people just made ASICs that had 100+ parallel key-pipelines and built a machine that could exhaust the keyspace in 3 days or so...

        The chips were built by the EFF, and actually they cracked not RC5-56 but DES, which is also a 56-bit-key cipher but far more widespread. Also, by the time of the

        • Just for anyone wondering what the real-world point of such crypto power is outside of specialised circles:

          Using an SPE initialised with an AES decoder and encoder would mean that every single block loaded from or stored to the disk (including the swap file) could be AES encrypted with very little performance penalty. This would be a very nice feature in a laptop, since anyone who stole it would have no way of accessing the original user's files.

  • Mambo [mambo.com.au] is the name of a clothing brand as well. The company is run by Reg Mombassa [wikipedia.org] who was in an Australian band called Mental As Anything [wikipedia.org]. In retrospect it sounds apt. The whole six degrees of separation thing works! You'd have to be "mental as anything" to think that cell is going to make x86 obsolete so easily.
  • Amazing Cell Demo (Score:5, Interesting)

    by doctor_no (214917) on Tuesday November 29, 2005 @08:53PM (#14143261)
    Here is an impressive "virtual mirror" demo using the Cell processor, put on by Toshiba. Basically, using a video camera, it can make a 3D model of the person in front of the camera on the fly. Then it can manipulate the 3D model to change make-up, hair-styles, etc. -- basically a virtual magic mirror. It really demonstrates the unique features these more powerful processors will offer.

    http://techon.nikkeibp.co.jp/lsi/images/toshiba_cell.mpg [nikkeibp.co.jp]

    http://techon.nikkeibp.co.jp/english/NEWS_EN/20051013/109623/ [nikkeibp.co.jp]

    • Damn!! Was that really real-time? I'm almost wanting to call its bluff and say it was all choreographed. With the right AI program optimized for multi-threading, we could have HAL if enough CELL chips were thrown at it. It may be crude, but it's worth a shot! Imagine the real-world applications.

      "HAL: how much unread e-mail do I have?"

      "HAL: please set my alarm for 7:30am"

      "HAL: using google maps, please tell me how many miles and ETA it will be going from X to Y"

      and my favorite...

      "HAL: based on historical trends i
    • Re:Amazing Cell Demo (Score:1, Informative)

      by Anonymous Coward
      Apologies for A/C. This is probably a little less than full 3D model construction. Having seen a real-time demo of a "morphable model", they almost certainly use priors on face shape.

      "First, the applications capture a user's face with a camera and detect the position of key features of the face, including the eyes, nose and mouth, using image recognition technology."

      this can be done in real time quite effectively right now:

      http://citeseer.ist.psu.edu/rd/95418640%2C476373%2C1%2C0.25%2CDownload/http%3AqSqqSqwww [psu.edu]
  • I don't own a machine that meets the simulator's minimum system requirement (namely, 2.0GHz or higher), but I'm so curious about it that I'm willing to buy a new box just to try Mambo with the CBE sim. So, what hardware platform is best for the simulator software?
    • PPC would probably be best, as it's the closest relative; however, I think you are missing the point. Unless you are a developer with lots and LOTS of experience coding, this simulator will most likely be worthless to you. It's not going to show you anything meaningful about the PS3 (once again, unless you are a developer) and you are most likely to become confused by even trying to run it. I'm not trying to say that I personally would have any better luck running it. It's just that this really isn't a "t
      • Thanks for the suggestion, Mr. Garrett. I'll go pick up a PPC. That's also the processor used inside BlueGene, no?
        As for my plans for the Cell, I was thinking of writing either:
        • A toy hard real-time OS or
        • A toy Fortran compiler for it
        I don't care for the PS3/games, although I'll probably pick one up as well, just to get inside the Cell. There should be a stand-alone Cell workstation, really.
        (Ex-IBMer here! Good job guys!).
    • The highest-clocked K8 is probably your best bet; a 3.8GHz Pentium 4 probably wouldn't be bad either.
    • Actually the 2GHz requirement is overstated. We (ich bin ein IBMer) have run the simulator on laptops in the 1GHz range without any problems. But don't let me ruin your excuse to get a nice new computer!
    • I wonder about that "GHz" too. Is it Intel GHz or PowerPC GHz?

      A 1600 MHz G5 can easily count as a 2 GHz P4, for example.

      (Don't tell Mr. Jobs about it)
  • The term "speaks out" has connotations of revealing a dirty secret, which doesn't seem to be the case here. I think it would be prudent to choose one's headlines a little more carefully.
  • Daddy loves mambo...

    /sorry

  • it still takes 15 seconds to open Adobe Acro... Oh, nevermind. -M
  • "This story has nothing to do with mactel"

    When there is an IBM and a SORT OF (read: zealots) PowerPC story like this, you have to concentrate awfully hard not to think about Mactel.

    It is my personal point of view, and I am kind of embarrassed that the whole Mac community became Intel zealots overnight.
  • As everyone seems to agree that running general-purpose code (e.g. Linux) on a Cell is going to be unpleasant thanks to the dumbing down of the PowerPC at the core, I was wondering what the odds are of seeing this as an add-on for doing vector-friendly operations. While I don't see people rushing out to install a Cell just for the hell of it, what are the chances that e.g. future crypto-offload accelerators or even 3D video cards might use one of these puppies?
