
SW Weenies: Ready for CMT? 378

Posted by Hemos on Monday June 13, 2005 @09:20AM from the step-on-up dept.

tbray writes "The hardware guys are getting ready to toss this big hairy package over the wall: CMT (Chip Multi Threading) and TLP (Thread Level Parallelism). Think about a chip that isn't that fast but runs 32 threads in hardware. This year, more threads next year. How do you make your code run fast? Anyhow, I was just at a high-level Sun meeting about this stuff, and we don't know the answers, but I pulled together some of the questions."


Ready for CMT? Hell no! (Score:3, Funny)

by iostream_dot_h ( 824999 ) writes: on Monday June 13, 2005 @09:25AM (#12801905)

Now my hardware will force me to support CMT [cmt.com] on my computer? This is as bad as DRM.

- Re:Ready for CMT? Hell no! (Score:2, Funny)
  
  by ksheff ( 2406 ) writes:
  
  CMT is manufactured pop-country music at its worst. Yuck!
- Re:Ready for CMT? Hell no! (Score:4, Funny)
  
  by NoData ( 9132 ) writes: <_NoData_@y a h o o . com> on Monday June 13, 2005 @10:04AM (#12802219)
  
  Seriously! And why foist this garbage on the Star Wars (SW) weenies? Has John Williams gone country?
  
  - Re:Ready for CMT? Hell no! (Score:3, Funny)
    
    by MarkGriz ( 520778 ) writes:
    
    "Has John Williams gone country?"
    
    No, that's his brother, Hank.
Schism Growing (Score:2, Insightful)

by SirCyn ( 694031 ) writes:

I see a deep schism growing in the processor industry. There are two main camps: the parallel processors and the screaming single processors.

The parallel processors are used for intense processing: research, servers, clusters, databases; anything that can be divided into many little jobs and run in parallel.

The other camp is the average user who just wants fast response time and to play Doom 3 at 100+ fps.
- Re:Schism Growing (Score:2, Insightful)
  
  by GoatMonkey2112 ( 875417 ) writes:
  
  This will go away once there are games that take advantage of multiple processors. Eventually the game user will start to see the advantage of multiple processors. It's already starting to become clear when you look at the architectures of the next generation consoles.
  - how much for the best of both worlds? (Score:2)
    
    by nounderscores ( 246517 ) writes:
    
    If price were no object, someone could design a chip with more than two cores in it, with each core still running as fast as any single-core chip out there.
    
    Just the existence of one such device would heal the rift immediately. Everyone would say... aha! It is only a matter of time before blazing speeds and hardware threading come to the desktop.
    - Re:how much for the best of both worlds? (Score:5, Informative)
      
      by InvalidError ( 771317 ) writes: on Monday June 13, 2005 @10:51AM (#12802633)
      
      Hardware threading has been mainstream for more than two years in the form of HyperThreading.
      
      Simultaneous Multi-Threading is a CPU's ability to concurrently execute mixed instructions from multiple threads. Intel's HT is simply 2-way SMT.
      
      Chip Multi-Threading is a CPU's ability to hold execution states for multiple threads, executing instructions from only one of them at a time unless the chip is also SMT.
      
      In Sun's case, the mid-term plan is to eventually offer 8-way SMT with 32-way CMT: the CPU can hold states for up to 32 threads and have in-flight instructions from as many as eight of them.
      
  - Re:Schism Growing (Score:4, Interesting)
    
    by timford ( 828049 ) writes: on Monday June 13, 2005 @09:52AM (#12802138)
    
    You're right that the latest generation console CPU architectures reflect the trend of concurrent thread execution. That said, however, there seems to be a parallel trend developing that involves separating the general purpose CPU into independent single-purpose processors.
    
    The most obvious example of this is the GPU, which has been around for a long time. The latest moves toward this trend rumored to be in development are PPUs, Physics Processing Units. How long until game AI evolves enough that we have the need for AIPUs also?
    
    This approach obviously doesn't make too much sense in a general purpose computer because the space of possible applications and types of code to be run are just too large. It makes perfect sense in computers that are built especially to run games though, because we have a very good idea of the different kinds of code most games will have to run. This approach allows each type of code to be run on a processor that is most efficient at that type of code, e.g. graphics code being run on processors that provide a ton of parallel pipelines.
    
- Re:Schism Growing (Score:5, Interesting)
  
  by philipgar ( 595691 ) writes: <pcg2NO@SPAMlehigh.edu> on Monday June 13, 2005 @09:42AM (#12802060) Homepage
  
  Actually, from what I've heard, the entire industry is moving in this direction. The whole idea of out-of-order processors (OOP) has become outdated. OOP was great: it enabled massive single-threaded performance. However, the costs (in terms of area and heat dissipation) are enormous.
  
  I just came back from the DaMoN [cmu.edu] workshop where the keynote was delivered by one of the lead P4 developers. He explained the future of microprocessors and said that the 10-15% extra performance that OOP enables just isn't worth it. The Pentium 4 has 3 issue units, but as things stand it rarely issues more than 1 instruction per cycle.
  
  We can squeeze more performance out of them, but not much. The easiest method is to go dual core. However, if an application must be multithreaded to get the best performance, what would you rather have . . . 2 highly advanced cores, or 8-10 simple cores that can issue half as many instructions per cycle as the dual core design? Then consider the fact that each core enables 4 threads to run (switch on cache miss/access). It doesn't take a rocket scientist to see that overall performance is improved with this.
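
  A back-of-envelope sketch of that trade-off follows. Everything in it is an assumption made for illustration: the function names, the reading of "half as many instructions per cycle" as half the issue width per core, and the toy model in which each hardware thread is stalled on memory independently for a fixed fraction of the time. It is not a model of any real chip.

  ```cpp
  #include <cmath>

  // Fraction of cycles in which a core has at least one runnable thread,
  // assuming each of `threads` hardware threads is independently stalled
  // a fraction `stall` of the time (a deliberate simplification).
  double core_utilization(double stall, int threads) {
      return 1.0 - std::pow(stall, threads);
  }

  // Relative chip throughput: cores * issue width * utilization.
  double chip_throughput(int cores, double issue_width, double stall, int threads) {
      return cores * issue_width * core_utilization(stall, threads);
  }
  ```

  With a stall fraction of 0.5, `chip_throughput(2, 2.0, 0.5, 1)` gives 2.0 for two wide single-threaded cores, while `chip_throughput(8, 1.0, 0.5, 4)` gives 7.5 for eight narrow cores with four threads each. The gap shrinks quickly as the stall fraction falls, which is where the fast single-threaded core earns its keep.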
  
  The other option is the hybrid core. A single really fast x86 core combined with multiple simpler x86 cores. That way single threaded apps can run fast (until they're converted) and you can get overall throughput from the system without blowing away your power budget on OOP optimizations.
  
  Granted, most of this is in the future (within the next 5 years), but IBM's going that way (à la Cell), it's within Intel's roadmap, Sun is pushing that route, etc. I assume AMD has plans to create a supercomputer on a chip . . . unless they wish to be obsoleted.
  
  Phil
  
  - - Re:Schism Growing (Score:2)
      
      by jedidiah ( 1196 ) writes:
      
      Am I the only person that opens up the Windows task mangler and sees hordes of processes running?
      
      If you're not using an Atari ST, you can probably already exploit a multi-core cpu to your immediate benefit.
    - Re:Schism Growing (Score:5, Interesting)
      
      by swillden ( 191260 ) * writes: <shawn-ds@willden.org> on Monday June 13, 2005 @11:29AM (#12802949) Journal
      
      Exploring parallelism is a hard issue for many problems. For instance, most of my time I'm compiling C++ code. Usually I just need to compile one file (the one I changed and want to test), and this is not a parallel process.
      
      You'll still benefit from parallelism in two ways. First, a modern computer is rarely doing just one thing. The OS has some threads managing I/O and performing housekeeping operations, and you're probably also listening to some music, and you probably have some other apps running that occasionally need a little computation. So none of that stuff will impede your compile.
      
      Second, even a compiler can benefit from multiple threads, though current compilers don't do it. There are multiple stages in compilation, like pre-processing, lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization and code generation. The stages don't need to wait until the previous stage has completed its work on the entire file, so the stages can be parallelized to a large extent. It might even make sense to have multiple threads working on different chunks of the code for more computation-intensive stages, like optimization (which becomes even more important without out-of-order execution).
      
      It seems to me that linking could also be done in parallel with compilation, to some degree. To a very large degree if you can guarantee that you don't have any symbols that override library symbols (else a use of a symbol could be linked against a library definition of that symbol before the compiler got around to noticing that you'd supplied another definition).
      
      Perhaps the biggest problem with parallelizing compilation and linking to that degree will be I/O. On second thought, probably not, because modern machines have huge amounts of RAM for caching disk files.
      
      In an 8+ core machine, it may make sense to dedicate a core to memory management, also. Even with manual memory management (malloc/free), allocating and releasing memory consumes significant CPU cycles, so I could see value in offloading that to another thread. A "free" operation, from a compute thread's point of view, would be nothing more than notifying the memory manager thread that this block is now available for re-use. The memory manager thread would then take care of all of the bookkeeping needed. The manager could also arrange to have a list of blocks of commonly-needed sizes ready for instant allocation, and could even spend some CPU cycles on analyzing the allocation patterns of the compute threads to try to ensure that blocks are always available when needed. Obviously, pushing that idea further leads naturally to full-blown garbage collection, with fewer concerns about GC pauses.
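
      The "free is just a notification" idea can be sketched roughly as below. The class name `DeferredFreer` and its methods are hypothetical, and the hand-off queue is guarded by a plain mutex to keep the sketch short; a real design would probably want a lock-free or per-thread queue so that the hand-off itself does not become the bottleneck.

      ```cpp
      #include <condition_variable>
      #include <cstdlib>
      #include <mutex>
      #include <queue>
      #include <thread>

      // Hypothetical sketch: compute threads hand blocks to a manager thread
      // instead of calling free() themselves.
      class DeferredFreer {
      public:
          DeferredFreer() : worker_([this] { run(); }) {}

          ~DeferredFreer() {
              {
                  std::lock_guard<std::mutex> lock(mutex_);
                  done_ = true;
              }
              wake_.notify_one();
              worker_.join();  // the manager drains the queue before exiting
          }

          // Called by compute threads: enqueue the block and return at once.
          void release(void* block) {
              {
                  std::lock_guard<std::mutex> lock(mutex_);
                  pending_.push(block);
              }
              wake_.notify_one();
          }

          // Number of blocks the manager thread has freed so far.
          std::size_t freed() {
              std::lock_guard<std::mutex> lock(mutex_);
              return freed_;
          }

      private:
          void run() {
              std::unique_lock<std::mutex> lock(mutex_);
              for (;;) {
                  wake_.wait(lock, [this] { return done_ || !pending_.empty(); });
                  while (!pending_.empty()) {
                      void* block = pending_.front();
                      pending_.pop();
                      lock.unlock();
                      std::free(block);  // allocator bookkeeping runs off the compute thread
                      lock.lock();
                      ++freed_;
                  }
                  if (done_) return;
              }
          }

          std::mutex mutex_;
          std::condition_variable wake_;
          std::queue<void*> pending_;
          std::size_t freed_ = 0;
          bool done_ = false;
          std::thread worker_;  // declared last so the other members exist before it starts
      };
      ```

      A compute thread would call `release(ptr)` where it would otherwise call `free(ptr)`. Pre-building free lists of common sizes, as the post suggests, would be a further job for the same manager thread.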
      
      Although it's true that not all computations can be sped up by multi-threading, lots of them can, including lots that we're used to thinking of as inherently serial processes.
      
  - - Re:Schism Growing (Score:5, Interesting)
      
      by philipgar ( 595691 ) writes: <pcg2NO@SPAMlehigh.edu> on Monday June 13, 2005 @01:09PM (#12803880) Homepage
      
      This is true. On a 500MHz machine OOP makes a huge difference. However when we move to a 4GHz machine that requires 400 cycles to access main memory, 25 cycles to access L2 cache and 4 cycles to access L1 cache, the difference between OOP and in-order starts to fall away. Even the best code on the best processors of today isn't getting a huge speedup from OOP. Also, just because the processor is in-order doesn't mean memory, FP and integer instructions can't all run in parallel, depending on how it's designed (however, they must be retired in order). The primary factor, however, is the memory hierarchy. If most applications are waiting on main memory or cache half of the time, even the most efficient processing can cut run time by at most 50%, i.e. a 2x speedup at best (Amdahl's law).
      
      Phil
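
      To put numbers on the Amdahl's law point, here is a minimal sketch. The helper names are made up for illustration, and the only figure taken from the post is the "waiting on memory half of the time" scenario.

      ```cpp
      // Amdahl's law: overall speedup when a fraction `p` of run time is
      // accelerated by a factor `s` and the remainder is left unchanged.
      double amdahl_speedup(double p, double s) {
          return 1.0 / ((1.0 - p) + p / s);
      }

      // Upper bound as `s` grows without limit: only the untouched
      // fraction (1 - p) of the run time remains.
      double amdahl_limit(double p) {
          return 1.0 / (1.0 - p);
      }
      ```

      With `p = 0.5` (half the time computing, half stalled on memory), `amdahl_limit(0.5)` is 2.0, and making the compute half twice as fast, `amdahl_speedup(0.5, 2.0)`, yields only about 1.33.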
      
      - Re:Schism Growing (Score:3, Informative)
        
        by CTho9305 ( 264265 ) writes:
        
        However when we move to a 4GHz machine that requires 400 cycles to access main memory, 25 cycles to access L2 cache and 4 cycles to access L1 cache, the difference between OOP and in-order starts to fall away.
        Actually, a major point of OO execution is to hide small delays - with the window sizes of modern processors, you can easily hide L1 latencies and possibly hide the latencies of L2, and only pay severely for accesses to main memory. An in-order core is the one that really loses performance as L1 and
- Don't worry (Score:2, Informative)
  
  by StupidKatz ( 467476 ) writes:
  
  You can have your parallel processors and still play DOOM III at insane fps. At worst, it will just take a bit for folks to start writing programs to take advantage of the additional processors/cores.
  
  BTW, your "average" user hasn't even played DOOM I, let alone DOOM III. Surfing the web and using e-mail doesn't usually put a lot of strain on a PC.
- Re:Schism Growing (Score:2)
  
  by selderrr ( 523988 ) writes:
  
  l33ts who want Doom3 at 100+fps can also benefit from massive parallelism: the graphics are offloaded to the GPU anyway, so what's left for the CPU is projectile & object positioning, and AI.
  
  imagine a future PC with 32656 CPUs, all running at a measly 40MHz, but each one dedicated to a single object in the game. All they have to do is calc the position of that single object. Might give some interesting results
  - Re:Schism Growing (Score:2)
    
    by rpresser ( 610529 ) writes:
    
    imagine a future PC with 32656 CPUs,
    
    What are the other 111 processors doing (32656 + 111 = 2^15-1)? Enforcing DRM?
    
    --
    Why the heck doesn't slashcode let me use <sup> and <sub>?
    - Re:Schism Growing (Score:3, Funny)
      
      by MynockGuano ( 164259 ) writes:
      
      Managing the cooling system and blue case LEDs.
  - Re:Schism Growing (Score:2)
    
    by Jeremy Erwin ( 2054 ) writes:
    
    imagine a future PC with 32656 CPUs, all running at a measly 40MHz,
    
    Ah, a PC where latency is king. Rather difficult to optimize, I should imagine.
- Re:Schism Growing (Score:2)
  
  by LWATCDR ( 28044 ) writes:
  
  "The other camp is the average user who just wants fast response time and to play Doom 3 at 100+ fps."
  I am afraid that is NOT the average user. Maybe the average high end gamer but not user.
  Parallel will be of use for the average user. Your typical PC runs about 43 processes. Yes even games will benefit once the game programmers start writing multi threaded code. For your average user I can see where you might even have a bunch of integer processors "sharing" a few blindingly fast CPUs. Sort of like a rever
- - Re:Schism Growing (Score:2)
    
    by trentblase ( 717954 ) writes:
    
    Yeah, the downside you're missing is the cost of having both.
Niagara Myths (Score:5, Insightful)

by turgid ( 580780 ) writes: on Monday June 13, 2005 @09:27AM (#12801920) Journal

I am totally not privy to clock-rate numbers, but I see that Paul Murphy is claiming over on ZDNet that it runs at 1.4GHz.
Whatever the clock rate, multiply it by eight and it's pretty obvious that this puppy is going to be able to pump through a whole lot of instructions in aggregate.
Ho hum.

On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second, provided it has 8 independent threads not blocking on I/O to execute.

It only has one floating-point execution unit attached to one of those 8 cores, so if you have a thread that needs to do some FP, it has to make its way over to that core and then has to be scheduled to be executed, and then it can only do one floating-point instruction.

Superb.

The thing is, all of the other CPU vendors with have super-scalar, out-of-order 2- and 4- core 64- bit processors running at over twice to three times the clock frequency.

You do the mathematics.
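
Doing that mathematics as a rough sketch: peak issue rate is cores times clock times instructions per cycle. The 8 cores, 1.4GHz and one integer instruction per core per cycle come from the post (taking the rate as per clock cycle). The rival chip's 4 cores, 3.5GHz clock and sustained rate of 1 instruction per cycle are assumptions picked for illustration, and peak issue rate ignores memory stalls entirely.

```cpp
// Peak instructions per second = cores * clock (Hz) * instructions per cycle per core.
// A back-of-envelope bound, not a benchmark.
double peak_ips(int cores, double clock_hz, double ipc_per_core) {
    return cores * clock_hz * ipc_per_core;
}

// Figures from the post: 8 cores, 1.4 GHz, 1 integer instruction per core per cycle.
double niagara_peak() { return peak_ips(8, 1.4e9, 1.0); }

// Assumed rival: 4 out-of-order cores at 3.5 GHz (2.5x the clock),
// each sustaining about 1 instruction per cycle.
double rival_peak() { return peak_ips(4, 3.5e9, 1.0); }
```

Under these assumptions the rival peaks at 14 billion instructions per second against 11.2 billion for Niagara. Drop the rival to 2 cores and the comparison flips, so the answer depends heavily on which numbers you plug in.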

- Argh! (Score:3, Informative)
  
  by turgid ( 580780 ) writes:
  
  Today I have diarrhea in the guts as well as the mind. I should have previewed that before I posted it.
  
  On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second,
  
  I meant per clock cycle, of course, not per second.
  
  The thing is, all of the other CPU vendors with have
  
  I meant "will have" not "with have".
  
  /me LARTS himself with a big stick.
- Shame (Score:4, Interesting)
  
  by gr8_phk ( 621180 ) writes: on Monday June 13, 2005 @09:53AM (#12802146)
  
  That's really a shame about the FP performance. My hobby project is ray tracing, and my code is just waiting to be run on parallel hardware. The preferred system would have multiple cores sharing cache, but separate cache would be fine too. Memory is not the bottleneck, so higher GHz and more cores/threads will be very welcome so long as they each have good performance. The code scales well with multiple CPUs as pixels can be rendered in parallel with zero effort - the code was designed for that. As it sits, I'm hoping my Shuttle (SN95G5v2) will support an AMD64x2 shortly. We're still not up for RT Quake, but interactive (read very jerky 1-2 fps) high-poly scenes are possible today.
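
  The "pixels can be rendered in parallel" property might look something like the sketch below. It is not the poster's renderer: `shade` is a hypothetical stand-in for a real per-pixel ray trace, and OpenMP is simply one convenient way to spread the rows over cores.

  ```cpp
  #include <cstddef>
  #include <vector>

  // Stand-in for a real per-pixel ray trace: any pure function of (x, y).
  // Hypothetical placeholder for illustration.
  float shade(int x, int y) {
      return static_cast<float>((x * 31 + y * 17) % 256) / 255.0f;
  }

  // Render a width x height image. Each pixel writes only its own slot,
  // so the iterations are independent and need no locking.
  std::vector<float> render(int width, int height) {
      std::vector<float> image(static_cast<std::size_t>(width) * height);
      #pragma omp parallel for schedule(dynamic)
      for (int y = 0; y < height; ++y) {
          for (int x = 0; x < width; ++x) {
              image[static_cast<std::size_t>(y) * width + x] = shade(x, y);
          }
      }
      return image;
  }
  ```

  Built with OpenMP enabled (for example `-fopenmp` on GCC), the rows are handed out across threads. Without it the pragma is ignored and the loop runs serially with the same output.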
  
  - Re:Shame (Score:4, Insightful)
    
    by Knetzar ( 698216 ) writes: on Monday June 13, 2005 @11:03AM (#12802744)
    
    It sounds like you want a cell.
    
    - Dude, you're gettin' a Cell! (Score:3, Funny)
      
      by spun ( 1352 ) writes:
      
      Sorry, sorry, sorry...
      I couldn't help it.
- Re:Niagara Myths (Score:4, Insightful)
  
  by Shalda ( 560388 ) writes: on Monday June 13, 2005 @09:58AM (#12802185) Homepage Journal
  
  Well, as you might expect, Sun has only a server mentality. The typical server runs few floating point instructions. In a lot of ways, Niagara would be very good at crunching through a database or serving up web pages. On the other hand, such a processor would be worthless on a desktop or a research cluster. I'd like to see actual real-world performance on these processors. I'd also like to see what Oracle charges them for a license. :)
  
- Re:Niagara Myths (Score:3, Funny)
  
  by rwyoder ( 759998 ) writes:
  
  On a good day, with a following wind, Niagara might be able to do 8 integer instructions per second...
  
  Uh, I believe they said it was 1.4GHz, not 1Hz.
  - Re:Niagara Myths (Score:5, Funny)
    
    by turgid ( 580780 ) writes: on Monday June 13, 2005 @10:34AM (#12802472) Journal
    
    Uh, I believe they said it was 1.4GHz, not 1Hz.
    Yes, and I corrected myself straight away in another post. In true slashdot style, the post where I corrected myself got modded down to Offtopic.
    
Steam Engine - Diesel (Score:5, Insightful)

by kpp_kpp ( 695411 ) writes: on Monday June 13, 2005 @09:27AM (#12801921)

Some people have predicted this move for quite some time. I remember hearing about it back in the late '80s or early '90s and I'm sure it goes way back before then. The analogy was to steam engines and why they lost out to diesels. You can only make a steam engine so big, and you cannot connect them together to get more power. With diesels you can hook many of them together for more power. Chips are finally getting to the same point -- it is more cost-efficient to chain them together than to create a monstrous one. I'm surprised it has taken this long to get to this point.

- Re:Steam Engine - Diesel (Score:5, Insightful)
  
  by turgid ( 580780 ) writes: on Monday June 13, 2005 @09:40AM (#12802040) Journal
  
  The problem has been the cost of software development. It's almost always cheaper to throw more hardware at a problem than invest in cleverer code. Highly parallel designs require very clever code. The Pentium 4 debacle has finally shown that we're now at the stage where we're going to have to bite the bullet and develop that cleverer code. With ubiquitous high-level languages running on virtual machines (e.g. Java) this is becoming more feasible, since a lot of the gory details and dangers can be hidden from the average programmer.
  
  - Re:Steam Engine - Diesel (Score:4, Informative)
    
    by arkanes ( 521690 ) writes: <arkanes&gmail,com> on Monday June 13, 2005 @10:28AM (#12802423) Homepage
    
    You cannot hide the gory details and also thread for (pure) performance, at least not to any significant degree, and not with our current ability to analyze programs. Some current compilers/languages can squeeze out some parallelism via analysis, but to prevent bugs they must be conservative, so you rarely get significant performance boosts. The key to parallelizing for performance is minimizing information sharing, and that's a design/architectural issue that can't really be addressed automatically. It's not simply a matter of higher-level languages or cleverer code - the inherent complexities and dangers of multi-threaded programming are quite large, to the point where it's almost impossible to prove the correctness of any significantly multithreaded application while still gaining a performance boost.
    Note that I am talking about pure performance gain here, not perceived performance, such as keeping a GUI responsive during long actions - that kind of MT is generally slower than the single-threaded alternative, and is fairly easy to keep correct.
    
    Gaining performance via multithreading requires you to separate out multiple calculations, with minimal dependencies between them. The number of applications that can benefit from this is much smaller than you might think. I doubt very much that we'll see very many applications get a boost from dual/many core processors, and it's not just a matter of "re-writing legacy apps". What we will see is overall system speed increases on multi-threaded OSes.
    
    - Re:Steam Engine - Diesel (Score:2)
      
      by turgid ( 580780 ) writes:
      
      You're right. I'm full of shit.
    - Re:Steam Engine - Diesel (Score:3, Insightful)
      
      by TopSpin ( 753 ) * writes:
      
      I doubt very much that we'll see very many applications get a boost from dual/many core processors, and it's not just a matter of "re-writing legacy apps".
      
      I think this is a foolish thing to doubt. As supercomputing evolved into parallelism the same thing was said; it's too hard, some things can't be done in parallel. Yet solutions have been found for most cases and there is no lack of desire for more parallel capacity today.
      
      Put enough cores in front of a twenty something Carmack wannabe and he'll figur
- Re:Steam Engine - Diesel (Score:2, Insightful)
  
  by spotvt01 ( 626574 ) writes:
  
  It's all about the scalability in processor architecture. And unfortunately, your analogy about diesel engines only goes so far. You can only chain so many pistons together before you have to worry about how efficiently you can transfer the energy to the drive train. There is an upper bound of effectiveness. Concentrating on the number of pistons and ignoring each piston's capabilities will leave you with a lot of horsepower but little torque. The same problem exists in multiple core designs, namely: only
- Re:Steam Engine - Diesel (Score:2)
  
  by flaming-opus ( 8186 ) writes:
  
  This has been going on for years. IBM gave up on bigger single CPUs about 1980; so did Cray, Cyber, and Unisys. Everyone has been doing multiprocessors for decades now. The only new thing is that they are sticking lots of them on a single piece of silicon, instead of one per chip (or multiple chips per CPU, as the case may be).
WTF? (Score:5, Funny)

by Timesprout ( 579035 ) writes: on Monday June 13, 2005 @09:28AM (#12801929)

and we don't know the answers, but I pulled together some of the questions."

What is this now, Questions for Nerds? Stuff we don't know?

well at least he seems to understand the problems (Score:5, Interesting)

by Anonymous Coward writes: on Monday June 13, 2005 @09:28AM (#12801936)

from TFA:
"Problem: Legacy Apps You'd be surprised how many cycles the world's Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren't getting thrown away any time soon. There aren't enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.

Obviously it's not just Sun, I bet every kind of computer you can think of carries its share of this kind of good old code. I guarantee that whoever wrote that code wasn't thinking about threads or concurrency or lock-free algorithms or any of that stuff. So if we're going to get some real CMT juice out of these things, it's going to have to be done automatically down in the infrastructure. I'd think the legacy-language compiler teams have lots of opportunities for innovation in an area where you might not have expected it."

- Re:well at least he seems to understand the proble (Score:2)
  
  by jstott ( 212041 ) writes:
  
  "Problem: Legacy Apps You'd be surprised how many cycles the world's Sun boxes spend running decades-old FORTRAN, COBOL, C, and C++ code in monster legacy apps that work just fine and aren't getting thrown away any time soon. There aren't enough people and time in the world to re-write these suckers, plus it took person-centuries in the first place to make them correct.
  
  Well, the Fortran programs have an easy solution---just recompile with a modern compiler designed for these CPUs. Any loop that can
  - Re:well at least he seems to understand the proble (Score:2)
    
    by daVinci1980 ( 73174 ) writes:
    
    Any loop that can be automatically unrolled can be parallelized instead.
    
    Please unroll the following loop automatically (not FORTRAN, but simple enough to translate):
    // Sums the integers 1 through N-1.
    int AccumulateLoopCount(int N) {
        int accumulator = 0;
        for (int i = 1; i < N; ++i) {
            accumulator += i;
        }
        return accumulator;
    }
    
    Now make the code parallel.
    
    (I realize that this solution could actually be computed at compile-time for any known value of N, and I realize that there is a formula to compute this answer in constant
    - Re:well at least he seems to understand the proble (Score:2, Informative)
      
      by babble123 ( 863258 ) writes:
      
      Can I use OpenMP [openmp.org]? I
      
      int AccumulateLoopCount(int N) {
          int accumulator = 0;
          // Each thread sums into a private copy; the copies are added together at the end.
          #pragma omp parallel for reduction(+:accumulator)
          for (int i = 1; i < N; ++i) {
              accumulator += i;
          }
          return accumulator;
      }
      
      (I'm not actually an OpenMP programmer, so this syntax might be wrong...)
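
      What a reduction does underneath can be sketched by hand with plain threads. This is only an illustration of the idea, not what any particular OpenMP runtime emits; the function name `parallel_sum` and the `workers` parameter are made up, and `workers` is assumed to be at least 1.

      ```cpp
      #include <thread>
      #include <vector>

      // Sum 1..n-1 (the same bounds as the loop above) using `workers` threads.
      // Each thread accumulates privately; the partial sums are combined at the end.
      long long parallel_sum(int n, int workers) {
          std::vector<long long> partial(workers, 0);
          std::vector<std::thread> pool;
          for (int w = 0; w < workers; ++w) {
              pool.emplace_back([&partial, n, w, workers] {
                  long long local = 0;  // private accumulator: nothing shared inside the loop
                  for (int i = 1 + w; i < n; i += workers) {
                      local += i;
                  }
                  partial[w] = local;  // one write per thread, each to its own slot
              });
          }
          long long total = 0;
          for (int w = 0; w < workers; ++w) {
              pool[w].join();
              total += partial[w];
          }
          return total;
      }
      ```

      The loop-carried dependency on the accumulator goes away because addition is associative: the order in which the partial sums are combined does not change the integer result.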
- Re:well at least he seems to understand the proble (Score:2)
  
  by strider44 ( 650833 ) writes:
  
  But those decade old apps can easily be done by one core in its spare time. I'm not sure why this is an issue.
  - Re:well at least he seems to understand the proble (Score:2)
    
    by Sique ( 173459 ) writes:
    
    Because sometimes the sheer amount of data those applications have to process has increased. Or because a calculation that was once done weekly, over the weekend, on several machines with separate data groups in parallel is now done as an instant report at the fingertips of a clueless manager who just wants the 'numbers to be up-to-date' (of course THIS calculation can be parallelized, not in an algorithmic way, but by separating independent data).
- Re:well at least he seems to understand the proble (Score:2)
  
  by archeopterix ( 594938 ) * writes:
  
  from TFA:
  I guarantee that whoever wrote that code wasn't thinking about threads or concurrency or lock-free algorithms or any of that stuff.
  Well, perhaps it's a job for the compiler to make that code thread-aware, at least to some degree. Two consecutive function calls that you (the compiler) know to be independent? Execute them in parallel. A loop running over 10000 independent objects? Split it into k loops, 10000/k objects each.
  Of course the compiler has severe limits as to what it can really guess
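
  The "split it into k loops" step is mostly index arithmetic, which can be sketched as below. The helper name `chunk_range` is hypothetical, and the hard part a compiler faces, proving the iterations really are independent, is not addressed here at all.

  ```cpp
  #include <algorithm>
  #include <utility>

  // Half-open index range [first, last) for chunk `i` of `k` over `n` items.
  // The first n % k chunks get one extra item so the split stays balanced.
  std::pair<int, int> chunk_range(int n, int k, int i) {
      int base = n / k;
      int extra = n % k;
      int first = i * base + std::min(i, extra);
      int last = first + base + (i < extra ? 1 : 0);
      return {first, last};
  }
  ```

  For 10000 objects and k = 4, chunk 0 covers indices 0 up to (but not including) 2500, and each of the k threads would run the original loop body over its own range.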
How is this different from having multiple cores? (Score:2, Offtopic)

by MichaelSmith ( 789609 ) writes:

...and isn't this the challenge being addressed by DragonFly BSD? [dragonflybsd.org]

Software people use threads already, as long as the VM and OS are up to the task. I don't see why it should matter if some of the threads are implemented in hardware.
Vader vs. Brooks? (Score:3, Funny)

by hraefn ( 627340 ) writes: on Monday June 13, 2005 @09:29AM (#12801941) Homepage

I almost thought this was going to be about Star Wars nerds being forced to watch something on Country Music Television.

- Re:Vader vs. Brooks? (Score:2, Insightful)
  
  by Aspasia13 ( 700702 ) writes:
  
  I almost thought this was going to be about Star Wars nerds being forced to watch something on Country Music Television.
  
  Look out! It's Garth Vader!
One Weenie's Perspective (Score:5, Funny)

by Anonymous Coward writes: on Monday June 13, 2005 @09:30AM (#12801953)

Well I am a Star Wars weenie, and I am definitely NOT ready for Country Music Television.

Not really an issue (Score:3, Insightful)

by MemoryDragon ( 544441 ) writes: on Monday June 13, 2005 @09:30AM (#12801954)

given the fact that I haven't programmed a single-threaded program in years.

Question: What needs multiple threads? (Score:2, Interesting)

by dostert ( 761476 ) writes:

As a scientific programmer, all I know is that this will eventually be a huge benefit to all my MPI and OpenMP codes.

I really only know the "scientific" programming languages, but most all math specific routines are already written for parallel machines. I'm a bit curious, what else really needs multiple threads? Isn't the benefit of dual-core procs the ability to not have a slow-down when you run two or three apps at a time? Don't games like DOOM III and Half-Life II depend mostly on the GPU (which I'm gu
- Re:Question: What needs multiple threads? (Score:5, Insightful)
  
  by Frit Mock ( 708952 ) writes: on Monday June 13, 2005 @09:52AM (#12802141)
  
  In games the AI of non-player-characters (-objects) can profit a lot from threading.
  
  But for common apps ... I don't expect a big gain from multiple threads. I guess typical apps like browsers, word processors and so on have a hard time utilizing more than 3-4 threads for the most common operations a user does.
  
  - Re:Question: What needs multiple threads? (Score:3, Informative)
    
    by flithm ( 756019 ) writes:
    
    Actually that's not necessarily true. It's definitely true right now though. Most developers haven't really been taught to think in terms of parallelism when designing software, but that's starting to change.
    
    It's all about the algorithms. Once multi-core chips have been mainstream for a while, all the algorithms out there will start to get converted to take advantage of parallel processing. And there are already algorithms out there that do this... this page [cmu.edu] has a small repository of parallel implement
- Re:Question: What needs multiple threads? (Score:3, Insightful)
  
  by TheKidWho ( 705796 ) writes:
  
  umm, better physics and AI for games is what I can think of off the top of my head =)
- Re:Question: What needs multiple threads? (Score:3, Interesting)
  
  by James McP ( 3700 ) writes:
  
  The simplest example is OS runs on one, the game another. But it's really not that simple. Let's take a typical Windows box since it's the bulk of the market.
  
  Thread 1: OS kernel
  Thread 2: firewall
  Thread 3: GUI
  Thread 4: print server
  Thread 5-7: various services (update, power, etc)
  Thread 8: antivirus
  Thread 9: antivirus manager/keep-alive
  Thread 10-16: spyware (I said a typical Windows box)
  Thread 17+: applications
  
  Yeah, CMT will be handy out of box as long as the OS is aware. I expect it will be wasteful the
  - Re:Question: What needs multiple threads? (Score:2)
    
    by CrayzyJ ( 222675 ) writes:
    
    While appealing on the surface, your idea does not really work. The cache will thrash like mad, the IO bus will be clogged, and paging may be a bottleneck. What if all 17 threads make a system call at the exact same time? The locking will bring the system to a screeching halt.
    
    What you propose (not a horrible idea, btw) requires much more than just some threads in the CPU.
- Can use, not needs! (Score:2, Interesting)
  
  by try_anything ( 880404 ) writes:
  
  If single-threaded performance improvements slow down, and the available computing power is spread out among multiple cores, anyone persisting in writing single-threaded code will fall behind in performance.
  
  Remember the old days when people used fancy tricks to implement naturally concurrent solutions as single-threaded programs? The future is going to be just the opposite. Any day now we'll see a rush toward languages with special support for quick, clear, safe parallelism, just like we've seen scripting
- Re:Question: What needs multiple threads? (Score:2, Informative)
  
  by timford ( 828049 ) writes:
  
  You high-and-mighty scientific code snobs looking down on us game programmers! =)
  Actually there is a whole lot more to games like DoomIII and HL2 than what can be run on the GPU. First of all, a lot of the graphics-related code is never run on the GPU; it's run on the CPU (for example shadow-processing code), which then passes on the info to the GPU to do the actual rendering.
  
  Secondly, multiple-core GPUs don't make that much sense to me. The nature of graphics processing is completely SIMD (like much o
- Re:Question: What needs multiple threads? (Score:2)
  
  by bradkittenbrink ( 608877 ) writes:
  
  The argument is that GPUs are good for turning polygons into pretty pixels, but not much else. Physics and AI are nice and all, but they probably use only a couple of threads each before you can't parallelize any more. The truly scalable benefits of multi-core design will come from "procedural generation": if your game is running on a 4 core cpu, you can send say 400,000 polygons to the gpu, if your game is running on a 32 core cpu you can send 3,200,000 polys to the gpu, if your game is running on a
Big Hairy Package (Score:5, Funny)

by Tweak232 ( 880912 ) writes: on Monday June 13, 2005 @09:37AM (#12802018)

"The hardware guys are getting ready to toss this big hairy package over the wall:"

Vivid imagery...

Screw CMT; Time to use wasted CPU (Score:2)

by WindBourne ( 631190 ) writes:

Look, if you have 32 threads operating at 1/32 of GHz, or you have 1 thread operating at 2GHz, then it is a basic wash (not really, but close enough).

I would be far more interested in taking advantage of all the CPU cycles that run all over at Businesses. Think of how many wasted cycles there are running a screen saver or a Word document. By distributing the load amongst the systems, a large number of things can be done.
- Re:Screw CMT; Time to use wasted CPU (Score:3, Interesting)
  
  by David McBride ( 183571 ) writes:
  
  I would be far more interested in taking advantage of all the CPU cycles that run all over at Businesses.
  
  Condor [wisc.edu].
Programming isn't up to it (Score:5, Interesting)

by Toby The Economist ( 811138 ) writes: on Monday June 13, 2005 @09:38AM (#12802026)

32 threads in hardware on one chip is the same as 32 slow CPUs.

Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code.

Accordingly, multi-threading is currently handled by the programmer; which by and large doesn't happen, because programmers are not used to it.

A lot of applications these days are weakly multi-threaded - Windows apps for example often have one thread for the GUI, another for their main processing work.

This is *weak* multi-threading, because the main work done occurs within a single thread. Strong multi-threading is when the main work is somehow partitioned so that it is processed by several threads. This is difficult, because a lot of tasks are inherently serial; stage A must complete before stage B, which must complete before stage C.

The main technique I'm aware of for making good use of multi-threading support is that of worker-thread farms. A main thread receives requests for work and farms them out to worker threads. This approach is useful only for a certain subset of problem types, however, and within the processing of *each* worker thread, the work done itself remains essentially serial.
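
For anyone who hasn't seen a worker-thread farm, here is a minimal sketch in Python. The names `handle_request` and `farm_out` are made up for illustration, and the "work" is just a sum. Note that in standard CPython the global interpreter lock keeps CPU-bound threads from truly running side by side, so this shows the shape of the technique rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(n):
    """Hypothetical unit of work: serial inside, independent of other requests."""
    total = 0
    for i in range(n + 1):  # each step runs in order within one worker
        total += i
    return total


def farm_out(requests, workers=4):
    """Main thread hands each request to a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the requests
        return list(pool.map(handle_request, requests))


if __name__ == "__main__":
    print(farm_out([10, 100, 1000]))
```

The farm only helps when requests are independent of each other; inside `handle_request` everything is still one serial loop, which is exactly the limitation described above.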

In other words, clock speeds have hit the wall, transistor counts are still rising, the only way to improve performance is to have more CPUs/threads, but programming models don't yet know how to actually *use* multiple CPU/threads.

El problemo!

--
Toby

- Re:Programming isn't up to it (Score:2)
  
  by Ann Elk ( 668880 ) writes:
  
  32 threads in hardware on one chip is the same as 32 slow CPUs.
  
  So, Sun managed to put an NCR Voyager on a single chip? Uhh... cool?
- Re:Programming isn't up to it (Score:5, Interesting)
  
  by flaming-opus ( 8186 ) writes: on Monday June 13, 2005 @10:17AM (#12802323)
  
  You are absolutely incorrect.
  Multi-threaded programming is the predominant programming model on servers. Some tasks, such as web serving, mail serving, and to some degree database machines scale almost linearly with the number of processors. All of the first tier, and some of the second tier server manufacturers have been selling 32+-way SMP boxes for years. They work pretty damn well.
  
  Sun is not trying to create a chip to supplant pentiums in desktops. They are not going for the best Doom3 performance. They want to handle SQL transactions, and IMAP requests, and most likely are targeting this at JSP in a big way.
  
  As a user of a slightly aged sun SMP box, I'd rather have those many slow CPUs and the accompanying I/O capability, than a pair of cores that can spin like crazy waiting for memory.
  
  - Re:Programming isn't up to it (Score:2)
    
    by Toby The Economist ( 811138 ) writes:
    
    > Some tasks, such as web serving, mail serving, and
    > to some degree data-base machines scale almost
    > linearly with the number of processors. All of the
    > first tier, and some of the second tier server
    > manufacturers have been selling 32+-way SMP boxes
    > for years. They work pretty damn well.
    
    I explicitly described this method of multi-threading in my reply.
    
    I also noted that when the work done by each thread is examined, it is performing serial tasks; e.g. it is internally single-threaded,
  - Re:Programming isn't up to it (Score:3, Informative)
    
    by johnhennessy ( 94737 ) writes:
    
    Could be wrong here, but I thought that the main reason for implementing a CMT chip with "hardware threads" was to make the context switch less painful.
    
    On single processor systems, when it wants to switch between two threads, it usually executes a context switch - it needs to dump one set of registers to memory, load the other set from memory and change the instruction pointer.
    
    That usually adds up to two separate memory accesses to different parts of memory. What's more, is that it is not always possible
  - Re:Programming isn't up to it (Score:3, Informative)
    
    by be-fan ( 61476 ) writes:
    
    Multithreading is dominant because it's the only way to wring parallelism out of legacy languages like C. And nobody claims multithreading is easy, natural, or anything but error-prone. The future is really in languages that have formal abstractions for concurrency, so programmers can specify at a high level what tasks can be concurrent and let the compiler do the low-level locking. Basically, you want languages based on a concurrent calculus of computation (eg: Pi-calculus), instead of languages based on l
    - - Re:Programming isn't up to it (Score:3, Informative)
        
        by be-fan ( 61476 ) writes:
        
        Eh? Locks suck for debugging, nobody in their right mind likes debugging multithreaded code. If anything, this will make it easier to debug parallel code. Once you have a formal model for expressing concurrency, it makes it much easier to reason about the code and figure out where something went wrong. Further, since the compiler can understand a formal concurrency model in a way it cannot understand an ad-hoc concurrency model, the compiler can offer tools to aid in debugging concurrent applications.
        
        Now,
        
        Re:Programming isn't up to it (Score:3, Informative)
        
        by be-fan ( 61476 ) writes:
        
        The slowest possible way to find a problem is to "reason about the code"!
        
        It's the only Right Way (TM) to find a problem. Now, I don't recommend reasoning about the code to find typos, but then again, you shouldn't be making typos anyway.
        
        In my experience, problems with multi-threaded code almost never come from a lack of understanding of how to write multi-threaded code.
        
        Most people would disagree. Almost invariably, problems with multithreaded code are the fault of the programmer. Race conditions, dea
- Re:Programming isn't up to it (Score:2)
  
  by rabtech ( 223758 ) writes:
  
  It has very little to do with programmers "not being used to it".
  
  Many problems require the result of operation X to complete operation Y; in other words the algorithms are naturally serial in nature and are not easily amenable to parallelism.
  
  There are a few clever tricks but in some cases making a serial operation parallel gives vastly decreasing performance gains (i.e. two threads = 110% of one thread, four threads = 105% of two threads, etc).
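
The numbers above are roughly what Amdahl's law predicts (the comment doesn't name it, so treat this as one way to read those figures). A small sketch, assuming about 18% of the run time can be parallelized; that fraction is picked purely so the output lands near the 110% and 105% quoted:

```python
def speedup(parallel_fraction, threads):
    """Amdahl's law: speedup over one thread when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


# Assumed for illustration: 2/11 (about 18%) of the work can run in parallel.
P = 2 / 11

if __name__ == "__main__":
    two = speedup(P, 2)
    four = speedup(P, 4)
    print(f"2 threads vs 1 thread:  {two:.2f}x")
    print(f"4 threads vs 2 threads: {four / two:.2f}x")
```

However many threads you add, the speedup can never exceed 1 / (1 - P), which for this assumed P is about 1.22x.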
- Re:Programming isn't up to it (Score:4, Interesting)
  
  by Dark Fire ( 14267 ) writes: <clasmcNO@SPAMgmail.com> on Monday June 13, 2005 @10:36AM (#12802491)
  
  "Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code."
  
  I agree.
  
  However, I believe that Functional programming languages would seem to have the best chance of successfully taking advantage of multiple threads of execution. Google has 100,000+ computers doing this now using functional programming ideas.
  
  As pointed out in other posts, not every problem will benefit from parallelism. With research and time, this might change. Many problems can be represented in both procedural constructs and recursive constructs. The procedural has been considered the most comprehensible and implementable for the past three decades. This may have to change in light of the direction the hardware technology is going.
  
  - Re:Programming isn't up to it (Score:2)
    
    by sleepingsquirrel ( 587025 ) writes:
    
    CMT, meet CTM [ucl.ac.be]
  - - Re:Programming isn't up to it (Score:4, Insightful)
      
      by Dark Fire ( 14267 ) writes: <clasmcNO@SPAMgmail.com> on Monday June 13, 2005 @05:59PM (#12807009)
      
      From the parent post:
      
      "Current programming languages are insufficiently descriptive to permit compilers to generate usefully multi-threaded code."
      
      The portion of importance is:
      
      "insufficiently descriptive"
      
      In C, C++, and Java, you must program with concurrency in mind to obtain any benefit from multiple threads of execution. In a functional programming language, the restrictions placed on the behavior of functions often imply concurrency without the programmer necessarily intending that as the result. If you write a C program without concurrency in mind and want to adapt your solution later to take advantage of multiple threads, you may need to code a completely different solution and also locate a compiler that knows how to take advantage of concurrency. In a functional language, you may only need to get an updated version of your compiler/interpreter. This is why C, C++, and Java are in the "insufficiently descriptive" category and functional programming languages are not.
      
OLTP systems (Score:3, Informative)

by bunyip ( 17018 ) writes: on Monday June 13, 2005 @09:43AM (#12802070)

Now of course, the room was full of Sun infrastructure weenies, so if there's something terribly obvious in records management or airline reservations or payroll processing that doesn't parallelize, we might not know about it.

Well, since I work in airline reservations systems, I'll add my $0.02 worth...

Most OLTP systems will benefit from CMT and multi-core processors. We had a test server from AMD about a month before the dual-core Opteron was announced; we did some initial testing and then put it in the production cluster and fired it up. No code changes, no recompile, no drama.

IMHO, the single-user applications, such as games and word processors, will be harder to parallelize.

Alan.

What a totally vague and useless post, yipee! (Score:2, Insightful)

by tomstdenis ( 446163 ) writes:

First off, performance + java != good idea. Not trying to camp fanbois here but if you really need "down to the metal" performance you're writing in C with assembler hotspots.

So the observation that there is too much locking in Java's standard API is informative but not on-topic. The fact that the standard solution is to use a completely new class [e.g. StringBuilder] is why I laughed at my college profs when they were trying to sell their Java courses by saying "and Java is well supported with over 900
- You might want to go back to school... (Score:5, Insightful)
  
  by putaro ( 235078 ) writes: on Monday June 13, 2005 @11:05AM (#12802762) Journal
  
  and take some advanced architecture courses.
  
  The BEST a single core multi-thread design can hope for is the performance of a single core single thread design...
  
  I'm sorry but that turns out not to be the case.
  
  When you have a system that is running lots of different threads simultaneously the amount of time that it takes to do a context switch from one thread to another becomes an issue. In the real world, threads often do things like I/O which cause them to block or they wait on a lock. If you can do a fast context switch you get back the time that you would have wasted saving registers off to RAM and pulling back another set. Faster thread switching means that your multi-thread single core now runs its total load (all of the threads) faster than a single core single thread design. Also, things like microkernels become a lot more feasible (microkernels are notorious for being slow because context switches are slow).
  
  When you have looked beyond your desktop machine maybe you'll have earned the right to sneer at your professors. I don't think you're there yet.
  
- - Re:What a totally vague and useless post, yipee! (Score:2)
    
    by tomstdenis ( 446163 ) writes:
    
    The idea is adding register sets [re: threads] somehow makes the process more efficient. My comment is that if your ALU pipeline is well stuffed another thread won't have the execution resources it needs [and likely just get in the way anyways].
    
    So if you make a shoddy ALU that stalls a lot another register set can get you better performance overall [but not for individual threads] and if you make a good ALU your extra register set in hardware buys you VERY LITTLE.
    
    A dual core cpu is something else. that'
We all are (Score:2)

by 3770 ( 560838 ) writes:

We all are.

If one of your favorite applications happens to be multithreaded then that's gravy.

But you'll benefit anyway. If you bring up your process list you'll see that you have probably at least 10 processes. These will now be able to run independently.

Also, the Windows kernel itself can benefit from hardware threads.
- Re:We all are (Score:2)
  
  by tomstdenis ( 446163 ) writes:
  
  This is total f'ing hype. If you have an efficient ALU multi-threading won't help crap [in the hardware front, it does in software where you may have blocked threads, etc...].
  
  Think about it this way. You have one car that can carry you and your buddies to work at 50mph and two cars that can take you and your buddies to work at 30mph.
  
  Sure the two cars let you do independent things but when you're working on one task [getting to work] you're not ahead.
  
  In a video game context for instance, you do have mul
  - Re:We all are (Score:2)
    
    by 3770 ( 560838 ) writes:
    
    My point was, you'll benefit from multiple hardware threads (dual cores or more) even if your applications aren't multithreaded.
    
    Do you disagree with that?
  - I disagree (Score:2)
    
    by NigelJohnstone ( 242811 ) writes:
    
    "Sure the two cars let you do independent things but when you're working on one task [getting to work] you're not ahead."
    
    But you're not, you never are working on only 1 task.
    
    Look at the threads running on a PC and its hundreds, you have file cache threads, communications threads, all kinds of stuff running.
    
    A whole convoy of cars all sitting in one lane waiting for the car in front.
    You keep the speed limit the same, make the highway 8 lane and 8 times the cars can pass through.
    
    Also you would save the thr
What doesn't scale (and what does) (Score:2)

by davecb ( 6526 ) * writes:
Last year I was at a big commercial shop, looking at performance of a bunch of billing-like programs, and noticed:
- Some older C, C++ and embedded-SQL programs are written without consideration of parallelization: they're single-process single-thread.
- If the customer is large, the majority of the single-process single-thread programs have been rewritten to allow one to run multiple instances, so they can use more than one CPU.
The latter can scale on multi-processors, and mostly do. Much of our performa
Need a breakthrough in hiding concurrency (Score:3, Insightful)

by argent ( 18001 ) writes: <peter AT slashdo ... taronga DOT com> on Monday June 13, 2005 @09:53AM (#12802154) Homepage Journal

Every time someone exposes concurrency at some layer as a way of improving performance, rather than because you're implementing a process that's inherently concurrent, it's a huge clusterfuck. Doesn't matter whether it's asynchronous I/O, out-of-order execution, multithreaded code, or whatever. Even when you're dealing with a concurrent environment like a graphical user interface the most successful approaches involve breaking the problem down into chunks small enough you can ignore concurrency.

One of UNIX's most important features is the pipe-and-filter model, and one of the really great things about it is that it lets you build scripts that can automatically take advantage of coarse-grained concurrency. Even on a single-CPU system, a pipeline lets you stream computation and I/O where otherwise you'd be running in lockstep alternating I/O and code.

That's where the big breakthroughs are needed: mechanisms to let you hide concurrency in a lower layer. Pipelines are great for coarse-grained parallelism, for example, but the kind of fine grain you need for Niagara demands a better design, or the parallelism needs to be shoved down to a deeper level. Intel's IA64 is kind of a lower level approach to the same thing where the compiler and CPU are supposed to find parallelism that the programmer doesn't explicitly specify, but it suffers from the typical Intel kitchen-sink approach to instruction set design.
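
To make the pipe-and-filter point concrete, here is a rough sketch of the same idea inside one program: each filter runs in its own thread and the "pipes" are bounded queues. `run_pipeline`, `stage`, and `DONE` are names invented for this example; a real shell pipeline uses separate processes and OS pipes, not threads.

```python
import queue
import threading

DONE = object()  # sentinel marking end of stream


def stage(func, inbox, outbox):
    """Run one filter: read items, transform them, pass them downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))


def run_pipeline(items, *funcs):
    """Chain filters with queues, one thread per filter, like cmd1 | cmd2 | cmd3."""
    queues = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)  # blocks when the first pipe is full
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"], str.upper, lambda s: s + "!"))
```

The filter functions themselves never mention threads or locks; the concurrency lives entirely in the plumbing, which is the kind of hiding argued for above.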

Hdw multi-thread vs multi-CPU (Score:2)

by Intron ( 870560 ) writes:

Isn't the big issue cache? On a multi-CPU system running one thread per CPU, each thread has its own cache. On HMT, the cache is shared. Threads running in different sections of code on different data will tend to reduce cache hits, offsetting the performance gain of the multiple threads. The limit on increasing the number of threads is that most of the threads will be waiting on cache misses.
The bottlenecks (Score:4, Interesting)

by davecb ( 6526 ) * writes: <davecb@spamcop.net> on Monday June 13, 2005 @09:58AM (#12802184) Homepage Journal

CMT is a good approach for dealing with the speed mismatch between CPUs and memory, our current Big Problem.
I'll misquote Fred Weigel and suggest that the next problem is branching: Samba code seems to generate 5 instructions between branches, so suspending the process and running something else until the branch target is in I-cache seems like A Good Thing (;-)).

Methinks Samba would really enjoy a CMT processor.

--dave

dead end (Score:3, Insightful)

by cahiha ( 873942 ) writes: on Monday June 13, 2005 @10:02AM (#12802205)

Threads are actually one of the simplest forms of parallelism to deal with and we have had decades of experience with them. That's why Sun loves them: it fits in well with their big-iron philosophy and hardware and makes it easy for their customers to migrate to the next generation.

But the future of high-end computing, both in business and in science, will not look like that. Networks of cheap computing nodes scale better and more cost-effectively. Many manufacturers have already gone over to that for their high-end designs. That's where the real software challenges are, but they are being addressed.

Processors with lots of thread parallelism will probably be useful in some niche applications, but they will not become a staple of high-end computing.

How to make code run fast? (Score:3, Interesting)

by Apreche ( 239272 ) writes: on Monday June 13, 2005 @10:06AM (#12802234) Homepage Journal

Easy. In present days there are some assembly instructions that can be executed simultaneously. With a chip like this however, all bets would be off. Instead of just a meager few instructions that could be executed simultaneously you would be able to execute any number of instructions simultaneously.

So if you have a function that say does 10 additions and 10 moves you would first figure out if any of them needed to be done before or after each other. Then see which ones don't matter. Then write the function to do as many at once as possible.

It really doesn't matter for anyone other than the compiler writers. Those guys will write the compiler to do this kind of assembly level optimization for you. The trick is writing a high level language, or modifying an existing one, so the compiler can tell which things must be executed in order and which can be executed side by side.
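
The "figure out which ones must wait, then do as many at once as possible" step can be sketched as a simple dependency sort. This is only an illustration of the idea, not how any particular compiler does it; the operation names are made up, and every operation must appear as a key in `deps`.

```python
def parallel_batches(deps):
    """Group operations into batches; everything in one batch can run at once.

    deps maps each operation name to the set of operations it must wait for.
    """
    remaining = {op: set(needs) for op, needs in deps.items()}
    batches = []
    while remaining:
        # Operations with no unfinished prerequisites are ready now.
        ready = sorted(op for op, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        batches.append(ready)
        for op in ready:
            del remaining[op]
        for needs in remaining.values():
            needs.difference_update(ready)
    return batches


if __name__ == "__main__":
    # Hypothetical function body: three loads, an add, a multiply, a store.
    deps = {
        "load_a": set(),
        "load_b": set(),
        "load_c": set(),
        "add": {"load_a", "load_b"},
        "mul": {"add", "load_c"},
        "store": {"mul"},
    }
    for step, batch in enumerate(parallel_batches(deps), start=1):
        print(step, batch)
```

In this made-up example only the three loads can overlap; the add, multiply, and store form a chain, so extra hardware threads have nothing to do for most of the function.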

My CPU left me, and the Flatscreen died.... (Score:2)

by the_weasel ( 323320 ) writes:

Am I the only person who was wondering why slashdot was talking about Country Music Television for a moment there?

* crickets *

Time to hand in my nerd badge I guess, and slink off into the sunset.

Seriously, though - thanks for clarifying the meaning of CMT in the blurb. A big step forward from the usual Slashdot blurb.
What is the problem here? (Score:2)

by borud ( 127730 ) writes:

Every time someone mentions systems with more processors or more cores, there is a lot of whining from people who think that making software take advantage of more processors is such a monumental task.
It isn't. And it isn't just scientific data chugging which would benefit from increased availability of actual concurrent processing in typical desktop computers; there are currently many of these PCs that already do things that can be parallelized.

For instance image processing. For many kinds of imag
Compilers and "Events Model" (Score:2)

by xtracto ( 837672 ) writes:

The main problem with parallelism for the general application is the current model. The "Event Model" that is used nowadays as the basic processing model for applications specifies that the program will stay idle until the user presses a key or moves the mouse (or pushes buttons).

With this model it is kind of hard to use the multithreading processors. Of course after the user has triggered an action then the program could make use of the threading capability to improve its performance.

Next comes the problem of
- Re:Compilers and "Events Model" (Score:2)
  
  by putaro ( 235078 ) writes:
  
  Strangely, in the market that Sun is targeted at, the server market, applications are written to be multi-threaded and do not run off an event model because they do not have GUIs!
  
  Another way to use multithreading could be from the Operating System, so the programs [that do not require] multithreading won't have to deal with it BUT the operating system would use the multithreading capacities to schedule the processes execution... in this way we may get [AT LAST] a [REAL] multiprocess OS (and not the illusio
Sun needs more raw performance (Score:2)

by PureCreditor ( 300490 ) writes:

An UltraSparc that runs 32 threads of CMT, but delivers merely a few hundred MIPS combined, is worse than an IBM Power or AMD Opteron that requires software context switches, but crunches out thousands of MIPS. Sun needs a clearer server/CPU strategy than throwing a whole new paradigm on the table PER UPGRADE CYCLE.
Old news for IBM, this is just Sun catching up (Score:2, Interesting)

by The Mad Duke ( 222354 ) writes:

IBM started SHIPPING Power5 with SMT capability August 31 of last year - IBM has SMT running on 1.9 GHz processors today. Sun is getting farther and farther behind.
New job - new tools (Score:2)

by el_womble ( 779715 ) writes:

Traditional languages that have had threads bolted on, like C/C++, make threading more challenging than it needs to be. Java, as long as you understand the principles of concurrency, makes it a breeze. I would be interested to see whether a well coded JVM / JIT could outperform traditional languages on these new CPUs - especially if you could dedicate a couple of the hardware threads to JIT and GC threads.
The author doesn't understand Java class locking (Score:3, Informative)

by putaro ( 235078 ) writes: on Monday June 13, 2005 @10:51AM (#12802624) Journal

From the article:

The standard APIs that came with the first few versions of Java were thread safe; some might say fanatically, obsessively, thread-safe. Stories abound of I/O calls that plunge down through six layers of stack, with each layer posting a mutex on the way; and venerable standbys like StringBuffer and Vector are mutexed-to-the-max. That means if your app is running on next year's hot chip with a couple of dozen threads, if you've got a routine that's doing a lot of string-appending or vector-loading, only one thread is gonna be in there at a time.

Classes such as StringBuffer and Vector are locked (synchronized) on a per-object basis. As long as you aren't trying to access the same object from different threads you won't block. And if you are trying to access the same object from different threads you will be happy that they were thread-safe!

The performance problems of having these classes being obsessive about thread safety do not result from the locking forcing singlethreadedness. The performance problems stem from the cost of locking objects.
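
A rough analogue of per-object locking, written in Python rather than Java to keep it short. `SyncBuffer` is a made-up stand-in for a StringBuffer-style class, not the real thing: each instance carries its own lock, so threads only wait on each other when they touch the same object.

```python
import threading


class SyncBuffer:
    """Hypothetical stand-in for a StringBuffer-style class: one lock per object."""

    def __init__(self):
        self._lock = threading.Lock()  # belongs to this instance only
        self._parts = []
        self.length = 0

    def append(self, text):
        with self._lock:  # blocks only threads using this same object
            self._parts.append(text)
            self.length += len(text)

    def value(self):
        with self._lock:
            return "".join(self._parts)


def hammer(buf, text, times):
    """Append the same text to one buffer many times."""
    for _ in range(times):
        buf.append(text)


def run(buffers, times=1000):
    """Start one thread per entry in `buffers`; entries may repeat an object."""
    threads = [threading.Thread(target=hammer, args=(b, "x", times)) for b in buffers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    shared, private = SyncBuffer(), SyncBuffer()
    run([shared, shared, private])  # two threads share one buffer, one has its own
    print(shared.length, private.length)
```

The thread with its own buffer never waits on the other two, but every `append` still pays for acquiring and releasing a lock even when nobody else is around, which is the cost referred to above.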

I didn't see garbage collection in his list (Score:3, Interesting)

by alispguru ( 72689 ) writes: <bob...bane@@@me...com> on Monday June 13, 2005 @11:16AM (#12802849) Journal

Those of you who are up on the current state of the art here, please help me out. I was under the impression that multiple threads and automatic storage management were still not on good terms with each other, and that this was a big unsolved problem.

Why the future of SMT is bleak (Score:5, Informative)

by spockvariant ( 881611 ) writes: on Monday June 13, 2005 @12:54PM (#12803750)

I'm a researcher working on high performance computing and have used various configurations of Simultaneous Multithreading (aka Hyperthreading aka CMT) (Intel Xeon, IBM POWER5). The result is always the same - at the end, memory latencies and OS overheads kill most of the gains of instruction level parallelism coming from SMT. Look at it this way - the typical latencies of operations on most modern processors are of the order of 1 nanosecond, whereas DRAM latencies are of the order of 200ns. As long as you can't do anything about this latency, there's no point in cutting down on processing times. There's a very nice paper in this year's ACM SIGMETRICS that gives real experimental data to illustrate this fact - http://www.cs.princeton.edu/~yruan/XeonSMT/smt.pdf [princeton.edu]. The paper shows that the speedups obtained using SMT in practice are meagre. The reason that the simulation results coming from the original UWashington research on the subject - http://www.cs.washington.edu/research/smt/ [washington.edu] - looked far better was their use of unreasonably large caches in their simulations, and that they completely ignored the OS overhead of enabling SMT - which is non-negligible - and is a thing that has been pointed out often on the Linux Kernel mailing list as well.
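
A back-of-the-envelope version of the latency argument, using the 1 ns and 200 ns figures from the comment. The 2% miss rate is an assumption picked for illustration, not a measurement, and the model ignores any overlap between computation and memory access.

```python
CPU_NS = 1.0      # per-operation latency, from the comment
DRAM_NS = 200.0   # DRAM latency, from the comment
MISS_RATE = 0.02  # assumed: fraction of operations that go all the way to DRAM


def avg_ns_per_op(cpu_ns, miss_rate=MISS_RATE, dram_ns=DRAM_NS):
    """Average time per operation when each miss stalls for a full DRAM access."""
    return cpu_ns + miss_rate * dram_ns


if __name__ == "__main__":
    base = avg_ns_per_op(CPU_NS)
    faster = avg_ns_per_op(CPU_NS / 2)  # pretend the compute part got twice as fast
    waiting = MISS_RATE * DRAM_NS / base
    print(f"baseline: {base:.1f} ns/op, {waiting:.0%} of it waiting on memory")
    print(f"compute twice as fast: {faster:.1f} ns/op, speedup {base / faster:.2f}x")
```

With these assumed numbers about 80% of the time goes to memory stalls, so doubling compute speed buys only around an 11% overall gain.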

- Re:Why the future of SMT is bleak (Score:3, Insightful)
  
  by CTho9305 ( 264265 ) writes:
  
  The reason that the simulation results coming from the original UWashington research on the subject - http://www.cs.washington.edu/research/smt/ [washington.edu] - looked far better was their use of unreasonably large caches in their simulations, and that they completely ignored the OS overhead of enabling SMT - which is non-negligible - and is a thing that has been pointed out often on the Linux Kernel mailing list as well.
  
  I didn't read most of the princeton paper... but you're arguing that caches need to be big to get
- Re:Windows Articles, Slashdot and Pragmatism (Score:2)
  
  by gabebear ( 251933 ) writes:
  
  Wow, I've been noticing some out of place posts on Slashdot for a couple days now but this one just proves Slashdot has a serious problem.
  
  I'm sure you didn't mean to, but your post ended up showing up as the first post in an Article about CMT [slashdot.org]. What's really weird is that it showed up after a bunch of other posts...
- Re:EPIC? (Score:3, Interesting)
  
  by HidingMyName ( 669183 ) writes:
  
  That is hard to say. EPIC is a very long instruction word architecture (VLIW) which supports up to 3 concurrent non-interfering instructions which requires static (compile time) scheduling, since the instructions must be in contiguous memory. Getting efficient scheduling is hard, since the complexity is pushed back on the compiler, which may need to do some serious code reordering. Additionally, EPIC was designed to support speculative execution, which has efficiency issues if the wrong prediction is made


Re:Programming isn't up to it (Score:2)

Re:Programming isn't up to it (Score:4, Insightful)

OLTP systems (Score:3, Informative)

What a totally vague and useless post, yipee! (Score:2, Insightful)

You might want to go back to school... (Score:5, Insightful)

Re:What a totally vague and useless post, yipee! (Score:2)

We all are (Score:2)

Re:We all are (Score:2)

Re:We all are (Score:2)