Panic in Multicore Land

Posted by Zonk on Tuesday March 11, 2008 @07:30AM from the multi-cores-no-waiting dept.

MOBE2001 writes "There is widespread disagreement among experts on how best to design and program multicore processors, according to the EE Times. Some, like senior AMD fellow Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks. Others disagree on the grounds that heterogeneous processors would be too hard to program. The only emerging consensus seems to be that multicore computing is facing a major crisis. In a recent EE Times article titled 'Multicore puts screws to parallel-programming models', AMD's Chuck Moore is reported to have said that 'the industry is in a little bit of a panic about how to program multicore processors, especially heterogeneous ones.'"

This discussion has been archived. No new comments can be posted.


Panic? (Score:4, Insightful)

by jaavaaguru ( 261551 ) writes: on Tuesday March 11, 2008 @07:34AM (#22713926) Homepage

I think "panic" is a bit of an over-reaction. I use a multicore CPU. I write software that runs on it. I'm not panicking.

- Re: (Score:2)
  
  by dnoyeb ( 547705 ) writes:
  
  Is it April 1st already?
  
  We have been writing multi-threaded software for years. There is nothing special about multicore. It's basically a cut-down version of a dual-CPU box. The only people who should have any concern at all are the scheduler writers. And even then there is no cause for "panic".
  - Re:Panic? (Score:5, Insightful)
    
    by leenks ( 906881 ) writes: on Tuesday March 11, 2008 @07:57AM (#22714100)
    
    How is an 80-core cpu a cut down version of a dual-CPU box? This is the kind of technology the authors are discussing, not your Core2 duo MacBook...
    
    - No problems for servers (Score:5, Insightful)
      
      by TheLink ( 130905 ) writes: on Tuesday March 11, 2008 @09:25AM (#22714978) Journal
      
      For servers the real problem is I/O. Disks are slow, network bandwidth is limited (if you solve that then memory bandwidth is limited ;) ).
      
      For most typical workloads most servers don't have enough I/O to keep 80 cores busy.
      
      If there's enough I/O there's no problem keeping all 80 cores busy.
      
      Imagine a slashdotted webserver with a database backend. If you have enough bandwidth and disk I/O, you'll have enough concurrent connections that those 80 cores will be more than busy enough ;).
      
      If you still have spare cores and mem, you can run a few virtual machines.
      
      As for desktops - you could just use Firefox without noscript, after a few days the machine will be using all 80 CPUs and memory just to show flash ads and other junk ;).
      
    - Re: (Score:3, Interesting)
      
      by carlmenezes ( 204187 ) writes:
      
      I'd like to ask a few related questions from a developer's point of view :
      
      1) Is there a programming language that tries to make programming for multiple cores easier?
      2) Is programming for parallel cores the same as parallel programming?
      3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
      - Re: (Score:3, Informative)
        
        by leenks ( 906881 ) writes:
        
        Read http://view.eecs.berkeley.edu/wiki/The_Landscape_of_Parallel_Computing_Research:_A_View_From_Berkeley [berkeley.edu] (specifically the white paper linked from it)
      - Re: (Score:3, Informative)
        
        by Mad Merlin ( 837387 ) writes:
        
        I'd like to ask a few related questions from a developer's point of view :
        
        1) Is there a programming language that tries to make programming for multiple cores easier?
        2) Is programming for parallel cores the same as parallel programming?
        3) Is anybody aware of anything in this direction on the C++ front that does not rely on OS APIs?
        1) Yes.
        2) Maybe.
        3) Yes [openmp.org].
    - - Re:Panic? (Score:5, Informative)
        
        by Penguin Follower ( 576525 ) writes: <scrose1978@noSPAM.gmail.com> on Tuesday March 11, 2008 @09:56AM (#22715382) Journal
        
        Unless you're speaking of AMD SMP systems, the Intel systems up until recently share the FSB among all the CPUs. So from the Intel side of things, SMP vs multi-core is nearly the same (save for L2 cache sharing and whatnot). The only notable exception, on the Intel side, that I have noticed is that the recent Xeon systems (within like the last two years) seem to be using two "northbridges". For example, my "quad-core" Mac Pro tower that I bought in April of 2007. It has two dual-core Xeons and the motherboard has two northbridges (though Intel doesn't refer to their chipsets that way last I checked. They like to talk about "hubs".).
        
  - Re:Panic? (Score:5, Informative)
    
    by coats ( 1068 ) writes: on Tuesday March 11, 2008 @10:57AM (#22716432) Homepage
    Ditto. And the principles are pretty generic; they haven't changed since a decade before a seminar I gave six years ago at EPA's Office of Research and Development [baronams.com] .
    And frankly, it helps a lot to write code that is microprocessor-friendly to begin with:
    
    Algorithms are important; that's where the biggest wins usually are.
    
    Memory is much slower than the processors, and is organized hierarchically.
    
    ALUs are superscalar and pipelined;
    
    Current processors can have as many as 100 instructions simultaneously executing in different stages of execution, so avoid data dependencies that break the pipeline.
    
    Parallel "gotchas" are at the bottom of this list...
    
    If the node-code is bad enough, it can make any parallelism look good to the user. But writing good node-code is hard;-( As a reviewer, I have recommended rejection for a parallel-processing paper that claimed 80% parallel efficiency on 16 processors for the author's air-quality model. But I knew of a well-coded equivalent model that outperformed the paper's 16-processor model-result on a single processor -- and still got 75% efficiency on 16 processors (better than 10x the paper-author's timing).
    fwiw.
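
    For anyone who wants to check the arithmetic behind that comparison, here is a rough Python sketch. Parallel efficiency is taken as speedup divided by processor count. The run times are not given above, so the paper's 16-processor time is normalised to 1.0 and the well-coded model's single-processor time is assumed to be at most that; the names are made up for this illustration.

    ```python
    def speedup(efficiency: float, processors: int) -> float:
        """Speedup over one processor implied by a parallel efficiency."""
        return efficiency * processors

    # Paper's model: 80% efficiency on 16 processors.
    paper_speedup = speedup(0.80, 16)   # about 12.8x over its own serial run

    # Well-coded model: 75% efficiency on 16 processors.
    tuned_speedup = speedup(0.75, 16)   # 12x over its own serial run

    # Normalise the paper's 16-processor wall time to 1.0 (assumption).
    paper_time_16 = 1.0
    # The well-coded model on ONE processor beat that, so 1.0 is an upper bound.
    tuned_time_1 = 1.0
    tuned_time_16 = tuned_time_1 / tuned_speedup

    # How much faster the well-coded model is on the same 16 processors.
    advantage = paper_time_16 / tuned_time_16   # at least about 12x

    print(f"paper speedup {paper_speedup:.1f}x, tuned speedup {tuned_speedup:.1f}x")
    print(f"tuned model is at least {advantage:.0f}x faster on 16 processors")
    ```

    So a higher efficiency figure says nothing about absolute speed: the paper's model scales slightly better and is still an order of magnitude slower.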
- Re:Panic? (Score:4, Insightful)
  
  by shitzu ( 931108 ) writes: on Tuesday March 11, 2008 @07:42AM (#22713986)
  
  Still, the fact remains that the x86 processors (due to the OS-s that run on them, actually) have not gone much faster in the last 5-7 years. The only thing that has shown serious progress is power consumption and heat dissipation. I mean - the speed the user experiences has not improved much.
  
  - Re:Panic? (Score:5, Insightful)
    
    by Saurian_Overlord ( 983144 ) writes: on Tuesday March 11, 2008 @08:51AM (#22714598) Homepage
    
    "...the speed the user experiences has not improved much [in the last 5-7 years]."
    
    This may almost be true if you stay on the cutting edge, but not even close for the average user (or the power-user on a budget, like myself). 5 years ago I was running a 1.2 GHz Duron. Today I have a 2.3 GHz Athlon 64 in my notebook (which is a little over a year old, I think), and an Athlon 64 X2 5600+ (that's a dual-core 2.8 GHz, for those who don't know) in my desktop. I'd be lying if I said I didn't notice much difference between the three.
    
    - Re:Panic? (Score:4, Insightful)
      
      by TemporalBeing ( 803363 ) writes: <bm_witness@yaho[ ]om ['o.c' in gap]> on Tuesday March 11, 2008 @10:01AM (#22715438) Homepage Journal
      
      "...the speed the user experiences has not improved much [in the last 5-7 years]."
      
      This may almost be true if you stay on the cutting edge, but not even close for the average user (or the power-user on a budget, like myself). 5 years ago I was running a 1.2 GHz Duron. Today I have a 2.3 GHz Athlon 64 in my notebook (which is a little over a year old, I think), and an Athlon 64 X2 5600+ (that's a dual-core 2.8 GHz, for those who don't know) in my desktop. I'd be lying if I said I didn't notice much difference between the three.
      
      Do notice that in 5 years we have barely increased the clock frequency of the CPUs
      
      Do notice that multi-cores don't increase the overall clock frequency, just divide the work up among a set of lower clock frequency cores - yet most programs don't take advantage of that. ;-)
      
      Do notice that despite clock frequencies going from 33 MHz to 2.3 GHz, the user's perceived performance of the computer has either stayed the same (most likely) or diminished over that same time period.
      
      Do notice that programs are more bloated than ever, and programmers are lazier than ever.
      ...
      In the end the GP is right.
      
      - Re: (Score:3, Interesting)
        
        by TemporalBeing ( 803363 ) writes:
        
        Clock frequency is not indicative of CPU performance. For example, the Core 2 chips, despite generally operating at a lower frequency than the Pentium 4s, outperform them significantly.
        Each core would perform nearly the same as a similarly clocked P4, of course, optimizations in the instructions have changed since then too. But they would still perform similarly. Of course, comparing a P4 to a Core2 is like comparing Apples to Oranges as there are architecture changes across the whole chip that would c
        
        Re:Panic? (Score:4, Insightful)
        
        by TemporalBeing ( 803363 ) writes: <bm_witness@yaho[ ]om ['o.c' in gap]> on Tuesday March 11, 2008 @11:57AM (#22717646) Homepage Journal
        
        What do the OS's that run on them have to do with the processors' performance? Recent processors have had significant improvements in performance in the last 5-7 years, which makes the GP incorrect.
        Perhaps you missed my statement about the user's perceived performance. It is true, I grant you, that hardware performance has gotten better. But the user's perception of that performance has not - it's gone the opposite way. Some of that is because programmers rely on a single faster core to correct for their inept programming, lack of optimization, added abstraction layers, etc. However, that is no longer how processors function - they are now two slower processors working together.
        
        And yes, the OS can, and has been able to for years since SMP first came about, spread loads across multiple processors and cores. But that cannot change how a single program functions in and of itself - it cannot make that single program work at any given moment on more than one single core if it was not designed to do so (i.e. if the program is not designed to use multiple threads or processes).
        
        All-in-all, the OP is correct.
        
        
        Re: (Score:3, Funny)
        
        by JoelKatz ( 46478 ) writes:
        
        Bluntly, it doesn't sound like you have any idea what you're talking about, as nothing about what you said makes any sense at all. Why not stick to talking about things you understand? I'll just pick one example, but there are dozens:
        
        "But you're still left with the issue that the application may not be written to be thread safe - so now, your kernel does something (even if that is thread safe!) on a different core whilst the program continues on the original core and it has an adverse effect on the applicat
        
        Re: (Score:3, Informative)
        
        by default luser ( 529332 ) writes:
        
        Clock frequency is not indicative of CPU performance. For example, the Core 2 chips, despite generally operating at a lower frequency than the Pentium 4s, outperform them significantly.
        
        But massive instruction per clock improvements do not happen very often in the x86 chip industry. In fact, I can count all the major improvements for the last 15 years on one hand:
        
        1995: Intel Pentium Pro (approximately 2 INT, 2 FP operations per clock, best case) introduces real time instruction rescheduling to the x86 wo
- Re:Panic? (Score:5, Informative)
  
  by Cutie Pi ( 588366 ) writes: on Tuesday March 11, 2008 @08:03AM (#22714146)
  
  Yeah, but if you extrapolate to where things are going, we're going to have CPUs with dozens if not hundreds of cores on them. (See Intel's 80 core technology demo as an example of where their research is going). Can you write or use general purpose software that takes advantage of that many cores? Right now I expect there is a bit of panic because it's relatively easy to build these behemoths, but not so easy to use them efficiently. Outside of some specialized disciplines like computational science and finance (that have already been taking advantage of parallel computing for years), there won't be a big demand for uber-multicore CPUs if the programming models don't drastically improve. And those innovations need to happen now to be ready in time for CPUs of 5 years from now. Since no real breakthroughs have come however, the chip companies are smart to be rethinking their strategies.
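
  One way to put numbers on "not so easy to use them efficiently" is Amdahl's law. It is not mentioned above, but it is the standard back-of-the-envelope model: if only a fraction p of a program's run time can be spread across n cores, the best possible speedup is 1 / ((1 - p) + p / n). A minimal Python sketch, with values of p picked for illustration rather than measured from any real program:

  ```python
  def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
      """Upper bound on speedup when only part of the work parallelises."""
      serial_fraction = 1.0 - parallel_fraction
      return 1.0 / (serial_fraction + parallel_fraction / cores)

  # Illustrative parallel fractions (assumed, not measured).
  for p in (0.50, 0.90, 0.99):
      print(f"p={p:.2f}: 8 cores -> {amdahl_speedup(p, 8):.1f}x, "
            f"80 cores -> {amdahl_speedup(p, 80):.1f}x")
  ```

  Even with 90% of the work parallelised, 80 cores come out at roughly 9x rather than 80x, which is why the programming model matters more as the core count grows.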
  
  - Multicores, but not on a chip (Score:5, Interesting)
    
    by Kim0 ( 106623 ) writes: on Tuesday March 11, 2008 @08:24AM (#22714302)
    
    This trend with multiple cores on the CPU is only an intermediate phase,
    because it over saturates the memory bus, which is easy to remedy by
    putting the cores on the memory chips, of which there are a number
    comparable to the number of cores.
    
    In other words, the CPUs will disappear, and there will be lots of smaller
    core/memory chips, connected in a network. And they will be cheaper as well,
    because they do not need so high a yield.
    
    Kim0
    
    - Re: (Score:2)
      
      by richlv ( 778496 ) writes:
      
      and on a larger scale there's this wicked idea about plan9.
      as for parallel processing, i don't think it is feasible to be implemented in each app separately - more likely it would be built upon some higher level api, where app would simply tell "these things can run in parallel, this one should wait for that one to finish, and this one can start as soon as that one sends a particular signal".
      it would be somewhat more work, but something like that is already being implemented with kde4 and i expect it only to
      - Re: (Score:3, Informative)
        
        by richlv ( 778496 ) writes:
        
        well, yes, i believe the thingie taking care of that is even called "threadweaver" :)
        
        yep, here it is : http://api.kde.org/4.0-api/kdelibs-apidocs/threadweaver/html/index.html [kde.org]
    - Re:Multicores, but not on a chip (Score:5, Informative)
      
      by jo42 ( 227475 ) writes: on Tuesday March 11, 2008 @09:44AM (#22715214) Homepage
      
      lots of smaller core/memory chips, connected in a network
      You mean like the Transputer [wikipedia.org] back in the '80s?
      
  - Re:Panic? (Score:5, Insightful)
    
    by johannesg ( 664142 ) writes: on Tuesday March 11, 2008 @09:56AM (#22715380)
    
    Let's not be too harsh on ourselves. In most systems today, the bottleneck is the hard disk, not the CPU. No amount of threading will rescue you if your memory has been swapped out.
    
    I write large and complex engineering applications. I have a few threads around, mostly for the purpose of doing calculation and dealing with slow devices. But I'm not going to add in more threads just because there are more cores for me to use. I'll add threads when performance issues require that I add threads, and not before.
    
    Most software today runs fine as a single thread anyway. The specialized software that requires maximum CPU performance (and is not already bottle-necked by HD or GPU access) will be harder to write, but for everything else the current model is just fine.
    
    If anything, Intel should worry about 99% of all people simply not needing 80 cores to begin with...
    
    - - Re: (Score:3, Informative)
        
        by johannesg ( 664142 ) writes:
        
        Ah, sorry: I didn't mean to imply that it is unnecessary for the applications of tomorrow. Where I work we also do those massive simulations mentioned by another poster, and we welcome _any_ number of cores (one thing we simulated was the ATV, mentioned a few days ago on slashdot. The simulator runs on two machines with a total of ten cores between them, and when we started the work, we were afraid our state of the art 1GHz CPU's (single core, at that time) might not be fast enough. Hahaha, it seems so quai
  - - Re: (Score:3, Insightful)
      
      by mollymoo ( 202721 ) * writes:
      
      The 386 could run existing 16-bit code faster than the processors it replaced, so there was a market for it despite the lack of 32-bit code. This is not the same situation; an 80-core processor won't run today's code any faster than an 8-core processor (assuming the cores are the same). Nobody will buy an 80-core processor till there is software which would benefit from it.
      - Re:Panic? (Score:4, Insightful)
        
        by cens0r ( 655208 ) writes: on Tuesday March 11, 2008 @11:52AM (#22717548) Homepage
        
        If the 80 core processor can run 10 virtual machines as fast as one machine on the 8 core processor, I would be interested.
        
      - Re:Panic? (Score:4, Funny)
        
        by Sloppy ( 14984 ) writes: on Tuesday March 11, 2008 @06:07PM (#22722664) Homepage Journal
        
        Nobody will buy an 80-core processor till there is software which would benefit from it.
        Fortunately, we already have that software. It's "make" with the "-j 80" option. Intel just needs to run a "Get Gentoo Now!" advertising campaign and their hardware marketing problem is solved.
        
    - Re: (Score:3, Insightful)
      
      by DarkOx ( 621550 ) writes:
      
      It's not the same as before though. In 1986 I could get something for my money buying a 386, even if there was no new software in my plans. You got speed. Moving your DOS-based accounting package from that PC-AT at 6 MHz to a 386 running at 20 MHz let you do your payroll cycle faster.
      
      Assuming clock rates don't increase much, and they have not been; and instruction sets don't improve much, and they have not been; then beyond 3-4 cores I don't get any kind of improvement in the desktop world. I don't even see
      - Re: (Score:3, Funny)
        
        by bdjacobson ( 1094909 ) writes:
        
        Might take a look at Gentoo again with 80 cores. I'd be done compiling in just 2 days!
- Re:Panic? (Score:5, Insightful)
  
  by Chrisq ( 894406 ) writes: on Tuesday March 11, 2008 @08:04AM (#22714148)
  
  Yes panic is strong, but the issue is not with multi-tasking operating systems assigning processes to different processors for execution. That works very well. The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem. Various solutions, such as functional programming, threads with spawns and waits, etc. have been proposed, but none are as easy as just using a simple procedural language.
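
  For the easy case, where the work happens to cut cleanly into independent pieces, "threads with spawns and waits" looks roughly like the Python sketch below. The function names and the sum-of-squares workload are invented for illustration. Note the assumption: in standard CPython builds threads do not execute Python bytecode on several cores at once, so this shows the structure only; swapping in `ProcessPoolExecutor` is the usual way to get real multi-core use.

  ```python
  from concurrent.futures import ThreadPoolExecutor


  def sum_of_squares(lo: int, hi: int) -> int:
      """The CPU-heavy work for one chunk: squares of lo..hi-1."""
      return sum(i * i for i in range(lo, hi))


  def split_range(n: int, parts: int) -> list[tuple[int, int]]:
      """Cut range(n) into `parts` contiguous, nearly equal chunks."""
      step, extra = divmod(n, parts)
      bounds, lo = [], 0
      for k in range(parts):
          hi = lo + step + (1 if k < extra else 0)
          bounds.append((lo, hi))
          lo = hi
      return bounds


  def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
      """Spawn one task per chunk, wait for all of them, combine the results."""
      with ThreadPoolExecutor(max_workers=workers) as pool:
          futures = [pool.submit(sum_of_squares, lo, hi)
                     for lo, hi in split_range(n, workers)]
          return sum(f.result() for f in futures)
  ```

  The hard part described above starts when the chunks are not independent, for example when each step needs the result of the previous one. Then there is nothing to hand out to the other workers.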
  
  - Re:Panic? (Score:5, Insightful)
    
    by ObsessiveMathsFreak ( 773371 ) writes: <obsessivemathsfreak@nOSPam.eircom.net> on Tuesday March 11, 2008 @08:47AM (#22714552) Homepage Journal
    
    That works very well. The problem is when you have a single CPU-intensive task, and you want to split that over multiple processors. That, in general, is a difficult problem.
    
    It is in general, an impossible problem.
    
    Most existing code is imperative. Most programmers write in imperative programming languages. Object orientation does not change this. Imperative code is not suited for multiple CPU implementation. Stapling things together with threads and messaging does not change this.
    
    You could say that we should move to other programming "paradigms". However, in my opinion, the reason we use imperative programs so much is because most of the tasks we want accomplished are inherently imperative in nature. Outside of intensive numerical work, most tasks people want done on a computer are done sequentially. The availability of multiple cores is not going to change the need for these tasks to be done in that way.
    
    However, what multiple cores might do is enable previously impractical tasks to be done on modest PCs. Things like NP problems, optimizations, simulations. Of course these things are already being done, but not on the same scale as things like, say, spreadsheets, video/sound/picture editing, gaming, blogging, etc. I'm talking about relatively ordinary people being able to do things that now require supercomputers, experimenting and creating on their own laptops. Multi core programs can be written to make this feasible.
    
    Considering I'm beginning to sound like an evangelist, I'll stop now. Safe money says PCs stay at 8 CPUs or below for the next 15 years.
    
    - Re: (Score:2)
      
      by GreatBunzinni ( 642500 ) writes:
      
      However, what multiple cores might do is enable previously impractical tasks to be done on modest PCs. Things like NP problems, optimizations, simulations. Of course these things are already being done, but not on the same scale as things like, say, spreadsheets, video/sound/picture editing, gaming, blogging, etc. I'm talking about relatively ordinary people being able to do things that now require supercomputers, experimenting and creating on their own laptops. Multi core programs can be written to make t
      - Re:Panic? (Score:4, Insightful)
        
        by mollymoo ( 202721 ) * writes: on Tuesday March 11, 2008 @09:51AM (#22715302) Journal
        
        No matter how easy they make knitting I'm never going to do it, because I don't want to knit my own clothes. I just want ones which look good and work. No matter how easy you make programming most people just aren't going to do it, because they don't want to write their own programs. They just want programs that work.
        
        
        Re: (Score:3, Insightful)
        
        by TuringTest ( 533084 ) writes:
        
        Ah, but they DO want their tedious tasks automated. If you provide users with a way to automate their tasks without them writing a whole program, just by learning what they do often [wikipedia.org], they will program the machine without knowing.
    - Re: (Score:2)
      
      by Chrisq ( 894406 ) writes:
      
      Impossible might be too strong. I don't think anyone has proved that you can't take a program written in a normal procedural language and somehow transform it to run on multiple processors. It's just that nobody has any idea of how it could be done. The fact that a skilled programmer may be able to look at a process and identify isolated components that can run in parallel means that some day a computer may be able to do the same.
    - Re:Panic? (Score:5, Insightful)
      
      by Alsee ( 515537 ) writes: on Tuesday March 11, 2008 @11:16AM (#22716806) Homepage
      
      spreadsheets, video/sound/picture editing, gaming, blogging
      
      Odd selection of examples. The processing of cells can almost trivially be allocated across 80 cores. Media work can almost trivially be split into chunks across 80 cores. Games are usually relatively easy to split, either by splitting the graphics into chunks or through parallelizable physics or other parallelizable simulation aspects.
      
      Oh, and blogging.
      My optical mouse has enough processing horsepower inside for blogging.
      
      OPTICAL MOUSE CIRCUITRY:
      Has the user pressed a key?
      No.
      Has the user pressed a key?
      No.
      Has the user pressed a key?
      No.
      (repeat 1000 times)
      Has the user pressed a key?
      No.
      Has the user pressed a key?
      No.
      Has the user pressed a key?
      Yes.
      OOOO! YES!
      QUICK QUICK QUICK! HURRY HURRY HURRY! PROCESS A KEYPRESS! YIPEE!
      
      -
      
- Re:Panic? (Score:5, Funny)
  
  by divisionbyzero ( 300681 ) writes: on Tuesday March 11, 2008 @08:47AM (#22714554)
  
  Developers aren't panicking. Their kernels are! Ha! Oh, that was a good one. Where's my coffee?
  
Self Interest (Score:3, Informative)

by quarrel ( 194077 ) writes: on Tuesday March 11, 2008 @07:35AM (#22713936)

AMD's Chuck Moore presumably has a lot of self interest in pushing heterogeneous cores. They are combining ATI+AMD cores on a single die and selling the benefits in a range of environments including scientific computing etc.

So take it all with a grain of salt

--Q

- Re:Self Interest (Score:5, Informative)
  
  by The_Angry_Canadian ( 1156097 ) writes: on Tuesday March 11, 2008 @07:47AM (#22714032)
  
  The article covers many points of view, not only Chuck Moore's.
  
- Re: (Score:3, Insightful)
  
  by davecb ( 6526 ) * writes:
  
  If he's saying that his multicore processors are going to be hard to program, then self-interest suggests he be very very quiet (;-))
  Seriously, though, adding what used to be a video board to the CPU doesn't change the programming model. I suspect he's more interested in debating future issues with more tightly coupled processors.
  --dave
- Re: (Score:2)
  
  by xouumalperxe ( 815707 ) writes:
  
  Sure, I'll take it with a grain of salt. But he does have Moore as a surname, and the other guy pretty much nailed it. :)
- Re: (Score:2, Informative)
  
  by Hanners1979 ( 959741 ) writes:
  
  AMD's Chuck Moore presumably has a lot of self interest in pushing heterogeneous cores. They are combining ATI+AMD cores on a single die...
  
  It's worth noting that Intel will also be going down this route in a similar timeframe, integrating an Intel graphics processor onto the CPU die.
Should Mimick The Brain (Score:5, Interesting)

by curmudgeon99 ( 1040054 ) writes: on Tuesday March 11, 2008 @07:39AM (#22713962)

Well, the most recent research into how the cortex works has some interesting leads on this. If we first assume that the human brain has a pretty interesting organization, then we should try to emulate it.

Recall that the human brain receives a series of pattern streams from each of the senses. These pattern streams are in turn processed in the most global sense--discovering outlines, for example--in the v1 area of the cortex, which receives a steady stream of patterns over time from the senses. Then, having established the broadest outlines of a pattern, the v1 cortex layer passes its assessment of what it saw the outline of to the next higher cortex layer, v2. Notice that v1 does not pass the raw pattern it receives up to v2. Rather, it passes its interpretation of that pattern to v2. Then, v2 makes a slightly more global assessment, saying that the outline it received from v1 is not only a face but a face of a man it recognizes. Then, that information is sent up to v4 and ultimately to the IT cortex layer.

The point here is important. One layer of the cortex is devoted to some range of discovery. Then, after it has assigned some rudimentary meaning to the image, it passes it up the cortex where a slightly finer assignment of meaning is applied.

The takeaway is this: each cortex layer does not just do more of the same thing. Instead, it does a refinement of the level below it. This type of hierarchical processing is how multicore processors should be built.
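
In software terms this is a staged pipeline: each stage runs on its own worker and hands a summary, not the raw input, to the next stage. Here is a minimal Python sketch of that shape, assuming one thread per stage linked by queues. The stage functions `v1_outline`, `v2_recognise` and `v4_label` are hypothetical stand-ins, not a model of real vision, and a stage that raises an exception is not handled.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def _stage_worker(func, inbox, outbox):
    """Apply one stage to every item, handing each result to the next stage."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # tell the next stage we are finished
            return
        outbox.put(func(item))


def run_pipeline(stages, inputs):
    """Run each stage in its own thread; results come back in input order."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage_worker,
                         args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the v1 -> v2 -> v4 levels.
def v1_outline(pixels):      # raw input -> coarse summary
    return {"outline": "face" if sum(pixels) > 10 else "unknown"}


def v2_recognise(summary):   # coarse summary -> finer interpretation
    return {**summary, "known": summary["outline"] == "face"}


def v4_label(summary):       # finer interpretation -> final label
    return "familiar face" if summary["known"] else "nothing recognised"
```

Calling `run_pipeline([v1_outline, v2_recognise, v4_label], frames)` pushes each frame through all three levels, with later frames entering v1 while earlier ones are still in v2 or v4.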

- Re:Should Mimick The Brain (Score:5, Funny)
  
  by El_Muerte_TDS ( 592157 ) writes: on Tuesday March 11, 2008 @08:00AM (#22714124) Homepage
  
  If we first assume that the human brain has a pretty interesting organization, then we should try to emulate it.
  
  I think it's pretty obvious there are serious design flaws in the human brain. And I'm not only talking about stability, but also reliability and accuracy.
  Just look at the world.
  
- Re: (Score:2)
  
  by doublebackslash ( 702979 ) writes:
  
  Yeah, okay, that is all well and good, but not everything can be arbitrarily broken down into parallel tasks.
  Take your example. Imagine v1 takes 2s of CPU time and cannot be split into smaller pieces to be processed. However v2 takes 25s and cannot be broken up into parallel tasks. v4 will execute slightly sooner because parts of v2 started processing slightly sooner than if there were no parallelism between v1 and v2, but the speedup is minimal since the large wait is on v2.
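
  To put numbers on that, here is a small Python sketch using the 2 s and 25 s figures. It assumes a stream of n independent inputs, with v1 free to start on the next input while v2 is still busy with the previous one; the function names are made up for this example.

  ```python
  V1_SECONDS = 2.0    # stage v1, cannot be split further
  V2_SECONDS = 25.0   # stage v2, cannot be split further


  def serial_time(n_items: int) -> float:
      """One core does v1 then v2 for each item in turn."""
      return n_items * (V1_SECONDS + V2_SECONDS)


  def pipelined_time(n_items: int) -> float:
      """v1 and v2 on separate cores; the slower stage, v2, sets the pace."""
      if n_items == 0:
          return 0.0
      # v2 can only start once the first v1 result exists, then never idles.
      return V1_SECONDS + n_items * V2_SECONDS


  def pipeline_speedup(n_items: int) -> float:
      """Ratio of serial to pipelined time for n_items >= 1."""
      return serial_time(n_items) / pipelined_time(n_items)
  ```

  With a single item nothing overlaps and the speedup is exactly 1. With many items it approaches 27 / 25 = 1.08, so the second core buys about 8%.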
  
  Take that down to a smaller leve
  - Re: (Score:2)
    
    by curmudgeon99 ( 1040054 ) writes:
    
    I think you missed my point. In the brain v1 focuses on a broad task. v2 focuses on a finer task. v4 on still a finer task. The results of the work done by v1 are sent in summary form to v2. v2 also sends its summary up to v4. Likewise, the upper levels will send down their summary to the lower area to help focus. So, it is not to the point of discrete processes working on pieces of the same issue. Rather, each cortex layer focuses on a qualitatively different task.
- Re: (Score:3, Interesting)
  
  by radtea ( 464814 ) writes:
  
  Models from nature are rarely the best way to go. Heavier than air flight only got off the ground when people stopped looking to birds and bats for inspiration. Wheeled vehicles have no resemblance to horses. Interestingly, we are still trying to understand the nuanced details of the flight of birds based on the aerodynamics we learned building highly un-bird-like flying machines.
  
  So while there's nothing wrong in looking at our radically imperfect understanding of the brain, which is in no better state t
Let's see the menu (Score:4, Interesting)

by Tribbin ( 565963 ) writes: on Tuesday March 11, 2008 @07:39AM (#22713966) Homepage

Can I have... errr... Two floating point, one generic math with extra cache and two RISC's.

- Re:Let's see the menu (Score:5, Funny)
  
  by imikem ( 767509 ) writes: on Tuesday March 11, 2008 @08:34AM (#22714406) Homepage
  
  Would you like fries with that?
  
OpenMP? (Score:2, Informative)

by derrida ( 918536 ) writes:

It is portable, scalable, standardized and supports many languages.
Languages (Score:2, Informative)

by PsiCollapse ( 809801 ) writes:

That's why it's so important that languages begin to adopt threading primitives and immutable data structures. Java does a good job. Newer languages, like Clojure, are built from the ground up with concurrency in mind.
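
As a rough illustration of how the two ideas fit together, here is a Python sketch (not Java or Clojure) in which the data itself is immutable and a single lock guards the one place where a new value is swapped in. `Account`, `AccountCell` and `run_deposits` are hypothetical names for this example, and `frozen=True` only prevents reassigning fields; it does not deep-freeze nested objects.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Immutable value: safe to hand to any thread without copying."""
    owner: str
    balance: int


class AccountCell:
    """The single mutable slot; a lock guards swapping in a new value."""

    def __init__(self, account: Account) -> None:
        self._lock = threading.Lock()
        self._account = account

    def get(self) -> Account:
        return self._account  # readers get an immutable snapshot

    def deposit(self, amount: int) -> None:
        with self._lock:  # the read-modify-write must not interleave
            current = self._account
            self._account = replace(current, balance=current.balance + amount)


def run_deposits(cell: AccountCell, threads: int, per_thread: int) -> int:
    """Several threads each deposit 1 unit repeatedly; return the final balance."""
    def work() -> None:
        for _ in range(per_thread):
            cell.deposit(1)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return cell.get().balance
```

Because every `Account` a reader holds can never change underneath it, the only code that needs a threading primitive is the few lines inside `deposit`.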
- Re:Languages (Score:5, Informative)
  
  by chudnall ( 514856 ) writes: on Tuesday March 11, 2008 @08:35AM (#22714428) Homepage Journal
  
  *cough*Erlang [wikipedia.org]*cough*
  
  I think the wailing we're about to hear is the sound of thousands of imperative-language programmers being dragged, kicking and screaming, into functional programming land. Even the functional languages not specifically designed for concurrency do it much more naturally than their imperative counterparts.
  
  - Re:Languages (Score:4, Interesting)
    
    by TheRaven64 ( 641858 ) writes: on Tuesday March 11, 2008 @09:12AM (#22714824) Journal
    
    For good parallel programming you just need to enforce one constraint:
    Every object (in the general sense, not necessarily the OO sense) may be either aliased or mutable, but not both.
    Erlang does this by making sure no objects are mutable. This route favours the compiler writer (since it's easy) and not the programmer. I am a huge fan of the CSP model for large projects, but I'd rather keep something closer to the OO model in the local scope and use something like CSP in the global scope (which is exactly what I am doing with my current research).
    
  - Re: (Score:3, Informative)
    
    by matsh ( 30900 ) writes:
    
    *cough*Scala [wikipedia.org] *cough*
    
    An object oriented AND functional language built on top of the JVM, with seamless interoperability with Java, immutable collections and an Actors framework pretty much the same as in Erlang. I think it is the solution to the multi-core problem.
- Re:Languages (Score:5, Informative)
  
  by Westley ( 99238 ) writes: on Tuesday March 11, 2008 @08:47AM (#22714560) Homepage
  
  Java doesn't do a good job. It does a "better than abysmal" job in that it has some idea of threading with synchronized/volatile, and it has a well-defined memory model. (That's not to say there aren't flaws, however. Allowing synchronization on any reference was a mistake, IMO.)
  
  What it *doesn't* do is make it easy to write verifiably immutable types, and code in a functional way where appropriate. As another respondent has mentioned, functional languages have great advantages when it comes to concurrency. However, I think the languages of the future will be a hybrid - making imperative-style code easy where that's appropriate, and functional-style code easy where that's appropriate.
  
  C# 3 goes some of the way towards this, but leaves something to be desired when it comes to assistance with immutability. It also doesn't help that the .NET 2.0 memory model is poorly documented (the most reliable resources are blog posts, bizarrely enough - note that the .NET 2.0 model is significantly stronger than the ECMA CLI model).
  
  APIs are important too - the ParallelExtensions framework should help .NET programmers significantly when it arrives, assuming it actually gets used. Of course, for other platforms there are other APIs - I'd expect them to keep leapfrogging each other in terms of capability.
  
  I don't think C# 3 (or even 4) is going to be the last word in bringing understandable and reliable concurrency, but I think it points to a potential way forward.
  
  The trouble is that concurrency is hard, unless you live in a completely side-effect free world. We can make it simpler to some extent by providing better primitives. We can encourage side-effect free programming in frameworks, and provide language smarts to help too. I'd be surprised if we ever manage to make it genuinely easy though.
  
Comment removed (Score:5, Informative)

by account_deleted ( 4530225 ) writes: on Tuesday March 11, 2008 @07:42AM (#22713980)

Comment removed based on user account deletion

- Re:Not *that* Chuck Moore (Score:5, Funny)
  
  by Hal_Porter ( 817932 ) writes: on Tuesday March 11, 2008 @07:52AM (#22714064)
  
  Those +1 Informative links go to wikipedia [wikipedia.org], an online encyclopedia.
  
What about... (Score:2, Informative)

by aurb ( 674003 ) writes:

...functional programming languages? Or flow programming?
Should Mimic DNA/cell process. (Score:2)

by cabazorro ( 601004 ) writes:

Sounds wasteful, I know (data replication everywhere). But there is a reason for that. The process becomes resilient to unexpected changes (corruption). The bus is the enzymes, the cpu is the cell and thread of execution is, well, the DNA. The replication and communication process is autonomous.
The future is here (Score:5, Insightful)

by downix ( 84795 ) writes: on Tuesday March 11, 2008 @07:48AM (#22714038) Homepage

What Mr Moore is saying does have a grain of truth, that generic will be beaten by specific in key functions. The Amiga proved that in 1985, being able to deliver a better graphical solution than workstations costing tens of thousands more. The key now is to figure out which specifics you can use without driving up the cost or compromising the design ideal of a general purpose computer.

- Re: (Score:2, Insightful)
  
  by funkboy ( 71672 ) writes:
  
  The Amiga proved that in 1985, being able to deliver a better graphical solution than workstations costing tens of thousands more. The key now is to figure out which specifics you can use without driving up the cost or compromising the design ideal of a general purpose computer.
  
  The key now is figuring out what to do with your Amiga now that no one writes applications for it anymore.
  
  I suggest NetBSD :-)
My heterogeneous experience with Cell processor (Score:5, Interesting)

by DoofusOfDeath ( 636671 ) writes: on Tuesday March 11, 2008 @08:01AM (#22714130)

I've been doing some scientific computing on the Cell lately, and heterogeneous cores don't make life very easy. At least with the Cell.

The Cell has one PowerPC core ("PPU"), which is a general purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (for the Playstation 3) or 8 (other computers) "SPE" cores that you can program. Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL). They also have poor speed for double-precision floats.

The SPEs are pretty fast, and they have a very fast interconnect bus, so as a programmer I'm constantly thinking about how to take better advantage of them. Perhaps this is something I'd face with any architecture, but the high potential combined with difficult constraints of SPE programming make this an especially distracting aspect of programming the Cell.

So if this is what heterogeneous-cores programming means, I'd probably prefer the homogeneous version. Even if they have a little less performance potential, it would be nice to have a 90%-shorter learning curve to target the architecture.
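
As a rough illustration of the bookkeeping that a 256k working memory forces on you, here is a sketch in Python of splitting an array of doubles into tiles small enough to fit. This is not Cell SDK code. The `plan_tiles` function, the 64 KB reserved for code and stack, and the use of two buffers (so one can be filled while the other is processed) are all assumptions made for the example.

```python
def plan_tiles(n_items, item_bytes=8, local_store_bytes=256 * 1024,
               reserved_bytes=64 * 1024, buffers=2):
    """Split n_items into (start, stop) ranges that fit a small local memory.

    reserved_bytes (space kept for code and stack) and buffers (2 = double
    buffering) are assumptions for illustration, not Cell SDK values.
    """
    usable = local_store_bytes - reserved_bytes
    per_tile = usable // (buffers * item_bytes)
    if per_tile <= 0:
        raise ValueError("local store too small for even one item")
    return [(start, min(start + per_tile, n_items))
            for start in range(0, n_items, per_tile)]


# 100,000 double-precision values (8 bytes each) need several round trips.
tiles = plan_tiles(n_items=100_000)
```

Under these assumed numbers each tile holds 12,288 values, so even a modest array has to be streamed through in pieces, and every piece is a transfer the programmer has to schedule.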

- Re:My heterogeneous experience with Cell processor (Score:5, Interesting)
  
  by nycguy ( 892403 ) writes: on Tuesday March 11, 2008 @08:16AM (#22714222)
  
  I agree. While a proper library/framework can help abstract the difficulties associated with a heterogeneous/asymmetric architecture away, it's just easier to program for a homogeneous environment. This same principle applies all the way down to having general-purpose registers in a RISC chip as opposed to special-purpose registers in a CISC chip--the latter may let you do a few specialized things better, but the former is more accommodating for a wide range of tasks.
  And while the Cell architecture is a fairly stationary target because it was incorporated into a commercial gaming console, if these types of architectures were to find their way into general purpose computing, it would be a real nightmare, since every year or so a new variant of the architecture would come out that would introduce a faster interconnect here, more cache memory there, etc., so that one might have to reorganize the division of labor in one's application to take advantage (again a properly parameterized library/framework can handle this sometimes, but only post facto--after the variation in features is known, not before the new features have even been introduced).
  
  - Re:My heterogeneous experience with Cell processor (Score:4, Insightful)
    
    by neomunk ( 913773 ) writes: on Tuesday March 11, 2008 @09:48AM (#22715272)
    
    Heterogeneous cores are already in almost every PC I've seen so far this millennium. Anyone with a GPU is running heterogeneous cores in their machine. How do we handle it? The first half of your second sentence; libraries and frameworks. OpenGL, DirectX and whatnot provide the frameworks we need while the various manufacturers provide the drivers to maintain compatibility with the various APIs. We'll see soon enough (as a result of the Cell) if the same thing (2 or more different libraries for the same processor; one for each of its core types) becomes the norm for other heterogeneous-core systems. I think so, but it may be overlooked by manufacturers who want to view a processor as a unit instead of a compilation of various units. They'll figure it out, these guys aren't MBAs, they're the truly educated. :-D
    
- Re:My heterogeneous experience with Cell processor (Score:5, Interesting)
  
  by epine ( 68316 ) writes: on Tuesday March 11, 2008 @09:01AM (#22714712)
  
  So if this is what heterogeneous-cores programming means, I'd probably prefer the homogeneous version.
  Your points are valid as things stand, but isn't it a bit premature to make this judgment? Cell was a fairly radical design departure. If IBM continues to refine Cell, and as more experience is gained, the challenge will likely diminish.
  
  For one thing, IBM will likely add double precision floating point support. But note that SIMD in general poses problems in the traditional handling of floating point exceptions, so it still won't be quite the same as double precision on the PPU.
  
  The local-memory SPE design alleviates a lot of pressure on the memory coherence front. Enforcing coherence in silicon generates a lot of heat, and heat determines your ultimate performance envelope.
  
  For decades, programmers have been fortunate in making our own lives simpler by foisting tough problems onto the silicon. It wasn't a problem until the hardware ran into the thermal wall. No more free lunch. Someone has to pay on one side or the other. IBM recognized this new reality when they designed Cell.
  
  The reason why x86 never died the thousand deaths predicted by the RISC camp is that heat never much mattered. Not enough registers? Just add OOO. Generates a bit more heat to track all the instructions in flight, but no real loss in performance. Bizarre instruction encoding? Just add big complicated decoders and pre-decoding caches. Generates more heat, but again performance can be maintained.
  
  Probably with a software architecture combining the hairy parts of the Postgres query execution planner with the recent improvements in the FreeBSD affinity-centric ULE scheduler, you could make the nastier aspects of SPE coordination disappear. It might help if the SPUs had 512KB instead of 256KB to alleviate code pressure on data space.
  
  I think the big problem is the culture of software development. Most code functions the same way most programmers begin their careers: just dive into the code, specify requirements later. What I mean here is that programs don't typically announce the structure of the full computation ahead of time. Usually the code goes to the CPU "do this, now do that, now do this again, etc." I imagine the modern graphics pipelines spell out longer sequences of operations ahead of time, by necessity, but I've never looked into this.
  
  Database programmers wanting good performance from SQL *are* forced to spell things out more fully in advance of firing off the computation. It doesn't go nearly far enough. Instead of figuring out the best SQL statement, the programmer should send a list of *all* logically equivalent queries and just let the database execute the one it finds least troublesome. Problem: sometimes the database engine doesn't know that you have written the query to do things the hard way to avoid hitting a contentious resource that would greatly impact the performance limiting path.
  
  These are all problems in the area of making OSes and applications more introspective, so that resource scheduling can be better automated behind the scenes, by all those extra cores with nothing better to do.
  
  Instead, we make the architecture homogeneous, so that resource planning makes no real difference, and we can thereby sidestep the introspection problem altogether.
  
  I've always wondered why no-one has ever designed a file system where all the unused space is used to duplicate other disk sectors/blocks, to create the option of vastly faster seek plans. Probably because it would take a full-time SPU to constantly recompute the seek plan as old requests are completed and new requests enter the queue. Plus if two supposedly identical copies managed to diverge, it would be a nightmare to debug, because the copy you get back would be non-deterministic. Hybrid MRAM/Flash/spindle storage systems could get very interesting.
  
  I guess I've been looking forward to the end of artificial scaling for a long time (clock freq. as the
- Re:My heterogeneous experience with Cell processor (Score:4, Insightful)
  
  by TheRaven64 ( 641858 ) writes: on Tuesday March 11, 2008 @09:19AM (#22714888) Journal
  
  Well, part of your problem is that you're using a language which is a bunch of horrible syntactic sugar on top of a language designed for programming a PDP-11 on an architecture that looks nothing like a PDP-11.
  You're not the only person using heterogeneous cores, however. In fact, the Cell is a minority. Most people have a general purpose core, a parallel stream processing core that they use for graphics and an increasing number have another core for cryptographic functions. If you've ever done any programming for mobile devices, you'll know that they have been using even more heterogeneous cores for a long time because they give better power usage.
  
- Re: (Score:3, Interesting)
  
  by Karellen ( 104380 ) writes:
  
  "you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL)"
  
  OK, I have to ask - why on earth can't you use C++ exceptions on them?
  
  After all, what is an exception? It's basically syntactic sugar around setjmp()/longjmp(), but with a bit more code to make sure the stack unwinds properly and destructors are called, instead of longjmp() being a plain non-local goto.
  
  What else is there that makes C++ exceptions unimplementable?
- Re: (Score:3, Informative)
  
  by ufnoise ( 732845 ) writes:
  
  The Cell has one PowerPC core ("PPU"), which is a general purpose PowerPC processor. Nothing exotic at all about programming it. But then you have 6 (for the Playstation 3) or 8 (other computers) "SPE" cores that you can program. Transferring data to/from them is a pain, they have small working memories (256k each), and you can't use all C++ features on them (no C++ exceptions, thus can't use most of the STL). They also have poor speed for double-precision floats.
  
  I find the most useful parts of the STL, don
- - Re: (Score:2)
    
    by DoofusOfDeath ( 636671 ) writes:
    
    The horrors! How are the teams at Microsoft going to fit bloat in them, then!?!
    
    Actually, it's been a good exercise to have to work under those constraints. I found that a tight environment like that forced me to carefully reconsider the design of my code and my algorithm. It probably led to an implementation that not only had fewer lines of code, but was also more readable, than the original version.
Well, I'm panicked... (Score:5, Interesting)

by argent ( 18001 ) writes: <peter@slashdot.2 ... a r o n g a.com> on Tuesday March 11, 2008 @08:02AM (#22714134) Homepage Journal

The idea of having to use Microsoft APIs to program future computers because the vendors only document how to get DirectX to work doesn't exactly thrill me. I think panic is perhaps too strong a word, but sheesh...

There's only three approaches (Score:2)

by gilesjuk ( 604902 ) writes:

1. Change operating systems to be able to use all the available CPU power even when running single threaded applications.

2. Change programming languages to make multicore programming easier.

3. Both 1 and 2.

What the end user should be able to dictate however is how many cores should be in use. It's not for the programmer of the application to dictate how processing of any data should occur.
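
One way to read that last point is that the degree of parallelism should be a setting rather than something baked into the application. A minimal sketch in Python, assuming a made-up environment variable `APP_MAX_CORES` and a hypothetical helper `worker_count`:

```python
import os


def worker_count(env=None, available=None):
    """Pick a worker count from a user setting, capped by the machine.

    APP_MAX_CORES is a made-up variable name used only for this sketch.
    """
    env = os.environ if env is None else env
    available = available or os.cpu_count() or 1
    requested = env.get("APP_MAX_CORES")
    if requested is None:
        return available          # no preference: use every core
    try:
        n = int(requested)
    except ValueError:
        return available          # ignore an unparsable setting
    return max(1, min(n, available))
```

The application would then size its thread or process pool from `worker_count()` instead of a hard-coded number.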
- Re: (Score:2)
  
  by slashbart ( 316113 ) writes:
  
  >> 1. Change operating systems to be able to use all the available CPU power even when running single threaded applications.
  
  So how should the operating system be able to figure out what program flow dependencies there are in a binary? You can make an O.S that schedules your single threaded application so that it uses 100% of 1 core, but automatically multithreading a single threaded application, no way, not now, and not for the foreseeable future.
  - Re: (Score:2)
    
    by TheRaven64 ( 641858 ) writes:
    
    It's easier with heterogeneous multicore. Your single-threaded game happily makes use of two cores (the CPU and the GPU). Your single-threaded server happily makes use of two cores (your CPU and your crypto coprocessor). The functionality of the extra cores is exposed in both cases via a library (OpenGL or OpenSSL). If you want to design a fast multicore system then profile your existing workloads and see which libraries are using most of the CPU. Then add a core that implements their functionality in
  - Re: (Score:2)
    
    by MadKeithV ( 102058 ) writes:
    
    Actually things like the .NET and Java runtime, combined with JIT compiling and optimization, could do exactly that.
    It could recompile a reasonably abstract definition of a program into exactly the kind of code that your current system needs, on-demand
he is right, but it depends on the application (Score:5, Interesting)

by CBravo ( 35450 ) writes: on Tuesday March 11, 2008 @08:13AM (#22714200)

As I demonstrated in my thesis [tudelft.nl] a parallel application can be shown to have certain critical and less critical parts. An optimal processing platform matches those requirements. The remainder of the platform will remain idle and burn away power for nothing. One should wonder what is better: a 2 GHz processor or 2x 1 GHz processors. My opinion is that, if it has no impact on performance, the latter is better.

There is an advantage to a symmetrical platform: you cannot misschedule your processes. It does not matter which processor takes a certain job. On a heterogeneous system you can make serious errors: scheduling your video process on your communications processor will not be efficient. Not only is the video slow, the communications process has to wait a long time (impacting comm. performance).
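
The misscheduling argument can be put into a toy model. The sketch below is in Python. All the run times are invented numbers, and `makespan` is a hypothetical helper that simply adds up the work queued on each core and reports when the slowest core finishes.

```python
# Hypothetical run times in milliseconds for each job on each core type.
COST_MS = {
    ("video", "video_core"): 10,
    ("video", "comm_core"): 80,
    ("comm", "video_core"): 40,
    ("comm", "comm_core"): 5,
}

# On a symmetric platform a job costs the same wherever it runs.
SYMMETRIC_MS = {
    ("video", "core0"): 10, ("video", "core1"): 10,
    ("comm", "core0"): 5, ("comm", "core1"): 5,
}


def makespan(assignment, cost=COST_MS):
    """Return the finish time when each core runs its assigned jobs in turn."""
    busy = {}
    for job, core in assignment:
        busy[core] = busy.get(core, 0) + cost[(job, core)]
    return max(busy.values())


matched = [("video", "video_core"), ("comm", "comm_core")]
swapped = [("video", "comm_core"), ("comm", "video_core")]
# Video lands on the comm core and the comm job has to queue behind it.
crowded = [("video", "comm_core"), ("comm", "comm_core")]
```

With these made-up costs, the matched assignment finishes in 10 ms, the swapped one in 80 ms, and the crowded one in 85 ms. With `SYMMETRIC_MS` either placement gives the same result, which is the advantage described above.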

- +1 Optimistic (Score:4, Funny)
  
  by Sapphon ( 214287 ) writes: on Tuesday March 11, 2008 @09:03AM (#22714736) Journal
  
  The height of optimism: posting proof in the form of a 70-odd page thesis on Slashdot.
  I don't think we'll be Slashdotting your server any time soon, CBravo ;-)
  
Multithreading is not easy but it's doable (Score:5, Interesting)

by pieterh ( 196118 ) writes: on Tuesday March 11, 2008 @08:20AM (#22714260) Homepage

It's been clear for many years that individual core speeds had peaked, and that the future was going to be many cores and that high-performance software would need to be multithreaded in order to take advantage of this.

When we wrote the OpenAMQ messaging software [openamq.org] in 2005-6, we used a multithreading design that lets us pump around 100,000 500-byte messages per second through a server. This was for the AMQP project [amqp.org].

Today, we're making a new design - ØMQ [zeromq.org], aka "Fastest. Messaging. Ever." - that is built from the ground up to take advantage of multiple cores. We don't need special programming languages, we use C++. The key is architecture, and especially an architecture that reduces the cost of inter-thread synchronization.

From one of the ØMQ whitepapers [zeromq.org]:
Inter-thread synchronisation is slow. If the code is local to a thread (and doesn't use slow devices like network or persistent storage), execution time of most functions is tens of nanoseconds. However, when inter-thread synchronisation - even a non-blocking synchronisation - kicks in, execution time grows by hundreds of nanoseconds, or even surpasses one microsecond. All kind of time-expensive hardware-level stuff has to be done... synchronisation of CPU caches, memory barriers etc.

The best of the breed solution would run in a single thread and omit any inter-thread synchronisation altogether. It seems simple enough to implement except that single-threaded solution wouldn't be able to use more than one CPU core, i.e. it won't scale on multicore boxes.

A good multi-core solution would be to run as many instances of ØMQ as there are cores on the host and treat them as separate network nodes in the same way as two instances running on two separate boxes would be treated and use local sockets to pass messages between the instances.

This design is basically correct, however, the sockets are not the best way to pass message within a single box. Firstly, they are slow when compared to simple inter-thread communication mechanisms and secondly, data passed via a socket to a different process has to be physically copied, rather than passed by reference.

Therefore, ØMQ allows you to create a fixed number of threads at the startup to handle the work. The "fixed" part is deliberate and integral part of the design. There are a fixed number of cores on any box and there's no point in having more threads than there are cores on the box. In fact, more threads than cores can be harmful to performance as they can introduce excessive OS context switching.

We don't get linear scaling on multiple cores, partly because the data is pumped out onto a single network interface, but we're able to saturate a 10Gb network. BTW ØMQ is GPLd so you can look at the code if you want to know how we do it.
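
The two ideas in the whitepaper excerpt, a fixed number of worker threads and passing messages by reference instead of copying them through a socket, can be sketched in a few lines of Python. This is not ØMQ's code or API. `run_workers` is a hypothetical function, and CPython's global interpreter lock means this shows only the shape of the design, not its performance.

```python
import os
import queue
import threading


def run_workers(messages, n_workers=None):
    """Process messages on a fixed pool of threads sized to the core count."""
    n_workers = n_workers or os.cpu_count() or 1
    inbox = queue.Queue()
    outbox = queue.Queue()

    def worker():
        while True:
            msg = inbox.get()
            if msg is None:          # sentinel: no more work
                break
            # The queue hands over a reference; the payload is not copied.
            outbox.put((id(msg), len(msg)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for msg in messages:
        inbox.put(msg)
    for _ in threads:
        inbox.put(None)              # one sentinel per worker
    for t in threads:
        t.join()

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


payloads = [bytearray(500) for _ in range(8)]   # 500-byte messages, as in the post
seen = run_workers(payloads, n_workers=4)
```

Each worker reports the identity of the object it received, and those identities match the original payloads, so nothing was copied on the way. The thread count is chosen once at startup and never grows.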

- - Re: (Score:2)
    
    by pieterh ( 196118 ) writes:
    
    Good question. The answer is "no, not as far as we're aware"; the patent covers the distribution of transactions across network nodes, invisibly to applications, and is specifically aimed as implementing GUIs. From the patent, "The invention disclosed broadly relates to graphical user interfaces (GUI's) and particularly relates to the software architectures used to implement them."
    
    However, all software patents have the problem of "creep", so that if a market emerges that looks within reach of the claims,
- - Re:Multithreading is not easy but it's doable (Score:4, Interesting)
    
    by pieterh ( 196118 ) writes: on Tuesday March 11, 2008 @10:45AM (#22716206) Homepage
    
    It's unfair to compare blob messaging with a protocol that has to process XML, but let's look. I'm using http://www.ejabberd.im/benchmark [ejabberd.im] as a basis:
    
    - eJabberd latency is in the 10-50msec range. 0MQ gets latencies of around 25 microseconds.
    - eJabberd supports more than 10k users. 0MQ will support more than 10k users.
    - eJabberd scales transparently thanks to Erlang. 0MQ squeezes so much out of one box that scaling is less important.
    - eJabberd has high-availability thanks to Erlang. 0MQ will have to build its own HA model (as OpenAMQ did).
    - eJabberd can process (unknown?) messages per second. 0MQ can handle 100k per second on one core.
    
    Sorry if I got some things wrong, ideally we'd run side-by-side tests to get figures that we can properly compare.
    
    Note that protocols like AMQP can be elegantly scaled at the semantic level, by building federations that route messages usefully between centers of activity. This cannot be done in the language or framework, it is dependent on the protocol semantics. This is how very large deployments of OpenAMQ work. I guess the same as SMTP networks.
    
    0MQ will, BTW, speak XMPP one day. It's more a framework for arbitrary messaging engines and clients, than a specific protocol implementation.
    
    I've seen Erlang used for AMQP as well - RabbitMQ - and by all accounts it's an impressive language for this kind of work.
    
Why choose? (Score:2, Insightful)

by Evro ( 18923 ) writes:

Just build both and let the market decide.
Heterogeneous is a natural thing to do (Score:4, Interesting)

by A beautiful mind ( 821714 ) writes: on Tuesday March 11, 2008 @08:21AM (#22714268)

If you have 80 or more cores, I'd rather have 20 of them support specialty functions and be able to do them very fast (it would have to be a few (1-3) orders of magnitude faster than the general counterpart) and the rest do general processing. This of course needs the support of operating systems, but that isn't very hard to get. With 80 cores caching and threading models have to be rethought, especially caching - the operating system has to be more involved in caching than it currently is, because otherwise cache coherency won't be able to be done.

This also means that programs will need to be written not just by using threads, "which makes it okay for multi-core", but with CPU cache issues and locality in mind. I think VMs like JVM, Parrot and .NET will be much more popular as it is possible for them to take care of a lot of these issues, which is impossible, or possible only in a limited way, for languages like C and friends with static source code inspection.

CPU != BRAIN (Score:2)

by v(*_*)vvvv ( 233078 ) writes:

There is this view held by some (of which some are posting here) that somehow CPUs are primitive brains and that improving them will eventually result in a non-primitive brain. Hello, there is nothing remotely human about what my computer has done for me lately. Computers and humans *do* very different things, and *are* very different things.

I beg that the distinction between acquiring hints from brain structure vs creating brain structure not be blurred, and that no moderator marks "brains are like this so
- Re: (Score:2)
  
  by curmudgeon99 ( 1040054 ) writes:
  
  Please do not pooh-pooh our ideas, unless YOU HAVE A BETTER ONE. Please correct me if I'm wrong but I see modern computers only coming close to simulating on the most rudimentary level the functions of the LEFT hemisphere. No one has attempted to replicate the right hemisphere's function. So, I'm waiting for your better idea...
Specialisation is inevitable (Score:3, Insightful)

by adamkennedy ( 121032 ) writes: <adamk@c[ ].org ['pan' in gap]> on Tuesday March 11, 2008 @08:26AM (#22714318) Homepage

I have a 4-core workstation and ALREADY I get crap usage rates out of it.

Flick the CPU monitor to aggregate usage rate mode, and I rarely clear 35% usage, and I've never seen it higher than about 55% (and even that for only a second or two once an hour). A normal PC, even fairly heavily loaded up with apps, just can't use the extra power.

And since cores aren't going to get much faster, there's no real chance of getting big wins there either.

Unless you have a specialized workload (heavy number crunching, kernel compilation, etc) there's going to simply be no point having more parallelism.

So as far as I can tell, for general loads it seems to be inevitable that if we want more straight line speed, we'll need to start making hardware more attuned for specific tasks.

So in my 16-core workstation of the future, if my Photoshop needs to apply some relatively intensive transform that has to be applied linearly, it can run off to the vector core, while I'm playing Supreme Commander on one generic core (the game) two GPU cores (the two screens) and three integer-heavy cores (for the 3 enemy AIs), and the generic System Reserved Core (for interrupts, and low-level IO stuff) hums away underneath with no pressure.

Heterogeneity also has economics on its side.

There's very little point having specialized cores when you've only got two.

Once there's no longer scarcity in quantity, you can achieve higher productivity by specialization.

Really, any specialized core that you can keep the CPU usage rates running higher than the overall system usage rate, is a net win in productivity for the overall computer. And over time, anything that increases productivity wins.
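
The arithmetic behind that last claim is just an average. A small sketch in Python with invented utilization figures (`aggregate_utilization` is a hypothetical helper, and the per-core numbers are made up to average 35%):

```python
def aggregate_utilization(per_core):
    """Mean utilization across cores, each given as a fraction in [0, 1]."""
    return sum(per_core) / len(per_core)


# Made-up figures: four general cores averaging 35% busy.
general = [0.50, 0.40, 0.30, 0.20]
before = aggregate_utilization(general)

# Add one specialized core that stays 60% busy (above the current average).
after_busy = aggregate_utilization(general + [0.60])

# Add one specialized core that is only 10% busy (below the average).
after_idle = aggregate_utilization(general + [0.10])
```

A core that stays busier than the current average pulls the aggregate up (here from 35% to 40%), and one that sits mostly idle pulls it down (to 30%). This only measures how busy the silicon is, not whether the work got done faster.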

- Re: (Score:2)
  
  by makapuf ( 412290 ) writes:
  
  There's very little point having specialized cores when you've only got two.
  Like, say, a CPU and a GPU ? I would have thought it was pretty efficient.
  
  I think it all breaks down to how many specialized-but-still-generic, computationally intensive tasks we define and then implement in hardware.
  
  And, finally, it's the same specialized vs generic hardware wheel of reincarnation (see http://www.catb.org/~esr/jargon/html/W/wheel-of-reincarnation.html [catb.org])
- - Re: (Score:3, Interesting)
    
    by adamkennedy ( 121032 ) writes:
    
    Two is totally doable. I can fill two (or the equivalent of two) of my four cores.
    
    Trouble is, filling four cores is quite a bit more iffy.
Brain (Score:2)

by slashflood ( 697891 ) writes:

Take some advice from mother nature: as far as I know, our brain works like a heterogeneous multicore processor. We don't have multiple generic mini-brains in our head, we have one brain with highly specialized brain areas for different tasks. Seems to be the right concept for a computer processor.
- Re: (Score:2)
  
  by Anne Thwacks ( 531696 ) writes:
  
  as far as I know, our brain works like a heterogeneous multicore processor
  Then your brain needs an upgrade.
  The brain has a (virtual) single serial processor, and a great bundle of "neural networks" which are essentially procedures built from hardware. (Kind of like early mainframes had a circuit board per instruction, and then gated the results of the selected instruction onto the bus.)
  The self-modifying neural network architecture is interesting, but not to people who want to buy reliable computing eng
Occam and Beyond (Score:3, Insightful)

by BrendaEM ( 871664 ) writes: on Tuesday March 11, 2008 @08:33AM (#22714404) Homepage

Perhaps panic is a little strong. At the same time, programming languages such as Occam, which are built from the ground up for concurrency, seem very provocative now. Perhaps Occam's syntax could be modified to a Python-type syntax for more popularity.

[Although, personally, I prefer Occam's syntax over that of C's.]

http://en.wikipedia.org/wiki/Occam_programming_language [wikipedia.org]

I think that a thread-aware programming language would be good in our multi-core world.

Help me understand the distinction (Score:3, Interesting)

by Junior J. Junior III ( 192702 ) writes: on Tuesday March 11, 2008 @08:35AM (#22714418) Homepage

I'm curious how having specialized multi-core processors is different from having a single-core processor with specialized subunits. I.e., a single core x86 chip has a section of it devoted to implementing MMX, SSE, etc. Isn't having many specialized cores just a sophisticated way of re-stating that you have a really big single-core processor, in some sense?

- Re: (Score:2)
  
  by photon317 ( 208409 ) writes:
  
  The difference is that the subunits are instructed on what to do via a single procedural stream of instructions from the compiler's point of view. The CPU does some work to reorder and parallelize the instruction stream to try to keep all the subunits busy if it can, but it doesn't always do a great job, and the compiler also knows the rules for how a given CPU does the re-ordering/parallelization and tries to optimize the stream to better the outcome. This scheduling is taking place at a very low level w
I like this, more complexity - better jobs! (Score:2)

by slashbart ( 316113 ) writes:

The way I see it, to get max. performance out of these chips, you need a deeper understanding of them, i.e. it requires higher skills, i.e. better quality jobs, better money, the works. Consider the fact that a lot of programmers have a really hard time dealing with concurrency at a thread level; these coming chips will only make it harder.
I don't think most concurrency problems can be automated away, it's the concepts and implementation of the concurrent algorithms that are hard, not so much the implementat
better idea (Score:3, Funny)

by timster ( 32400 ) writes: on Tuesday March 11, 2008 @08:43AM (#22714504)

See, the thing to do with all these cores is run a physics simulation. Physics can be easily distributed to multiple cores by the principle of locality. Then insert into your physics simulation a CPU -- something simple like a 68k perhaps. Once you have the CPU simulation going, adjust the laws of physics in your simulation (increase the speed of light to 100c, etc) so that you can overclock your simulated 68k to 100Ghz. Your single-threaded app will scream on that.

P.S.: I know why this is impossible, so please don't flame me.

How is heterogeneous CPU different to separate GPU? (Score:2)

by tomalpha ( 746163 ) * writes:

Genuine question that I don't know the answer to:

How are heterogeneous CPU cores different conceptually to a modern PC system with say:

2 x General purpose cores (in the CPU)
100 x Vector cores (in the GPU)
n x Vector cores (in a physics offload PCI card)

How is moving the vector (or whatever) cores onto the CPU die different to the above setup, apart from allowing for faster interconnects?
Current state of software development (Score:5, Funny)

by Alex Belits ( 437 ) * writes: on Tuesday March 11, 2008 @08:55AM (#22714650) Homepage

Ugg is smart.
Ugg can program a CPU.
Two Uggs can program two CPUs.
Two Uggs working on the same task program two CPUs.
Uggs' program has a race condition.
Ugg1 thinks, it's Ugg2's fault.
Ugg2 thinks, it's Ugg1's fault.
Ugg1 hits Ugg2 on the head with a rock.
Ugg2 hits Ugg1 on the head with an axe.
Ugg1 is half as smart as he was before working with Ugg2.
Ugg2 is half as smart as he was before working with Ugg1.
Both Uggs now write broken code.
Uggs' program is now slow, wrong half the time, and crashes on that race condition once in a while.
Ugg does not like parallel computing.
Ugg will bang two rocks together really fast.
Ugg will reach 4GHz.
Ugg will teach everyone how to reach 4GHz.

Invention? (Score:2)

by SharpFang ( 651121 ) writes:

Some, like senior AMD fellow, Chuck Moore, believe that the industry should move to a new model based on a multiplicity of cores optimized for various tasks

And let's give the cores names like Paula, Agnus, Denise...
One Fast Core, Multiple Commodity ones (Score:3, Interesting)

by Brit_in_the_USA ( 936704 ) writes: on Tuesday March 11, 2008 @09:21AM (#22714922)

I have read many times that some algorithms are difficult or impossible to multi-thread. I envisage the next logical step is a two-socket motherboard, where one socket could be used for an 8+ core CPU running at a low clock rate (e.g. 2-3GHz) and another socket for a single core running at the greatest frequency achievable by the manufacturing process (e.g. x2 to x4 the clock speed of the multi-core) with whatever cache size compromises are required.

This would help get around the yield issues of getting all cores to work at a very high frequency, and the related thermal issues. This could be a boon to general-purpose computers that have a mix of hard-to-multi-thread and easy-to-multi-thread programs, assuming the OS could be intelligent about which cores the tasks are scheduled on. The cores could or could not have the same instruction sets, but having the same instruction sets would be the easy first step.
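
A rough way to see the trade-off, as a sketch with made-up numbers: assume a fraction `serial` of a job cannot be threaded, the 8-core part runs at relative clock 1.0, and the single fast core runs at 3.0 (the middle of the x2 to x4 range above). The function names and the model itself are assumptions for illustration; it ignores cache, memory and scheduling effects entirely.

```python
def time_on_multicore(serial, cores=8, clock=1.0):
    """Modelled run time when only the parallel part spreads across the cores."""
    parallel = 1.0 - serial
    return (serial + parallel / cores) / clock


def time_on_fast_core(clock=3.0):
    """Modelled run time when the whole job runs on the one fast core."""
    return 1.0 / clock


def better_socket(serial):
    """Pick the socket with the lower modelled run time."""
    multi = time_on_multicore(serial)
    fast = time_on_fast_core()
    return "multicore" if multi < fast else "fast core"


for serial in (0.0, 0.1, 0.5, 1.0):
    print(serial, round(time_on_multicore(serial), 3),
          round(time_on_fast_core(), 3), better_socket(serial))
```

With these assumed numbers the fast core starts to win once roughly a quarter of the work is serial, which is the kind of decision the OS scheduler would have to make per task.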

8087 - Been There Done That (Score:3, Insightful)

by Nom du Keyboard ( 633989 ) writes: on Tuesday March 11, 2008 @11:53AM (#22717580)

Others disagree on the ground that heterogeneous processors would be too hard to program.

Been there, done that, already. The 8087 and its 80x87 follow-on co-processors were exactly that: specialized processors for specific tasks. Guess what? We managed to use them just fine a mere 27 years ago. DSPs have come along since and been used as well. Graphics card GPUs are specialized co-processors for graphics-intensive functions, and we talk to them just fine. They're already on the chipsets, and soon to be on the processor dies. I don't think this is anything new, or anything that programming can't handle.

Heterogeneous is the key word! (Score:3, Interesting)

by Terje Mathisen ( 128806 ) writes: on Tuesday March 11, 2008 @12:47PM (#22718454)

It has been quite obvious to several people in the usenet news:comp.arch [comp.arch] newsgroup that the future should give us chips that contain multiple cores with different capabilities:

As long as all these cores share the same basic architecture (i.e. x86, Power, ARM), it would be possible to allow all general-purpose code to run on any core, while some tasks would be able to ask for a core with special capabilities, or the OS could simply detect (by trapping) that a given task was using a non-uniform resource like vector fp, mark it for the scheduler, and restart it on a core with the required resource.

An OS interrupt handler could run better on a short-pipeline in-order core, a graphics driver could use something like Larrabee, while SPECfp (or anything else that needs maximum performance from a single thread) would run best on an Out-of-Order core like the current Core 2.
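
The trap-and-migrate idea described above can be sketched as a toy simulation. Everything here is hypothetical: `Core`, `Task`, `Trap` and `schedule` are illustrative names, a "feature" is just a string, and the task resumes at the faulting step rather than restarting from scratch. A real OS would do this in its fault handler and scheduler, not in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    features: frozenset          # e.g. {"int"} or {"int", "vector_fp"}


@dataclass
class Task:
    name: str
    ops: list                    # feature each step needs, in order
    required: set = field(default_factory=set)  # filled in by traps
    pc: int = 0                  # index of the next step to run


class Trap(Exception):
    """Raised when a task uses a feature the current core lacks."""


def run_on(core, task):
    """Run steps until the task finishes or hits a missing feature."""
    while task.pc < len(task.ops):
        op = task.ops[task.pc]
        if op not in core.features:
            raise Trap(op)
        task.pc += 1


def schedule(task, cores):
    """Start on the first core; on a trap, mark the task and move it.

    Returns the names of the cores the task ran on, in order.
    Raises StopIteration if no core offers the required features.
    """
    history = []
    core = cores[0]
    while True:
        history.append(core.name)
        try:
            run_on(core, task)
            return history
        except Trap as trap:
            task.required.add(trap.args[0])
            core = next(c for c in cores if task.required <= c.features)


cores = [Core("small", frozenset({"int"})),
         Core("big", frozenset({"int", "vector_fp"}))]
print(schedule(Task("fft", ["int", "vector_fp", "int"]), cores))
```

The point of the sketch is that the task never declares its needs up front: the first unsupported operation marks it, and from then on the scheduler only considers cores that satisfy the recorded requirement.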

The first requirement is that Intel/AMD must develop the capability to test & verify multiple different cores on the same chip, the second that Microsoft must improve their OS scheduler to the point where it actually understands NUMA principles not just for memory but also for CPU cores. (I have no doubt at all that Linux and *BSD will have such a scheduler available well before the time you & I can buy a computer with such a CPU in it!)

So why do I believe that such cpus are inevitable?

Power efficiency!

A relatively simple in-order core like the one that Intel just announced as Atom delivers maybe an order of magnitude better performance/watt than a high-end Core 2 Duo. With 16 or 80 or 256 cores on a single chip, this will become really crucial.

Terje

PS As other posters have noted, keeping tomorrow's multi-core chips fed will require a lot of bandwidth; this is neither free nor low-power. :-(

How to use so many cpu's (Score:4, Insightful)

by John Sokol ( 109591 ) writes: on Tuesday March 11, 2008 @02:03PM (#22719658) Homepage Journal

Back in 2000 I realized that 50 million transistors' worth of 4004s (the first microprocessor ever created) would outperform a P4 with the same transistor count, done in the same fab and running at the same clock rate. It would be over 10x faster, by my estimate. But how to use such a device?
I had been working with a 100-PC cluster of P4-based systems to do H.264 HDTV compression in realtime. I spread the compression function across the cluster, using each system to work on a small part of the problem and flowing the data across the CPUs.

Based on this I wanted to build an array of processors on one chip, but I am not a silicon person, just software, drivers and some basic electronics. So I looked at various FPGA cores, ARM, MIPS, etc. Then I went to a talk given by Chuck Moore, author of the language FORTH. He had been building his own CPUs for many years using his own custom tools.

I worked with Chuck Moore for about a year in 2001/2002 on creating a massively multi-core processor based on Chuck's stack processor.

The idea was, instead of having 1, 2 or 4 large processors, to have 49 (7 * 7) small, light but fast processors in one chip. This would be for tackling a different set of problems than your classic CPUs. It wouldn't be for running an OS or word processing, but for multimedia, cryptography, and other mathematical problems.

The idea was to flow data across the array of processors.
Each processor would run at 6GHz, with 64K words of RAM each.
21-bit-wide words and bus (based on the F21 processor);
this allows for 4x 5-bit instructions on a stack processor that only has 32 instructions.
Since it's a stack processor, it runs more efficiently. So in 16K transistors, 4000 gates,
the F21 at 500MHz performed about the same as a 500MHz 486 with JPEG compress and decompress.
With the parallel core design, instead of a common bus or network between the processors there would only be 4 connections into and out of each processor. These would be 4 registers that are shared with its 4 neighboring processors, which are laid out in a grid. So each processor would have a north, south, east and west register.
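
To make the word layout concrete: four 5-bit slots use 20 of the 21 bits, and 5 bits is exactly enough to number 32 instructions. The sketch below packs and unpacks such a word. The slot order (first instruction in the highest bits) and leaving the 21st bit unused are assumptions for illustration; the post does not say how the F21 actually arranges them.

```python
SLOT_BITS = 5
SLOTS = 4
SLOT_MASK = (1 << SLOT_BITS) - 1   # 0b11111, so opcodes 0..31


def pack(opcodes):
    """Pack four 5-bit opcodes into one word, first opcode highest."""
    if len(opcodes) != SLOTS:
        raise ValueError("need exactly four opcodes")
    word = 0
    for op in opcodes:
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("opcode out of range")
        word = (word << SLOT_BITS) | op
    return word


def unpack(word):
    """Recover the four opcodes in their original order."""
    return [(word >> (SLOT_BITS * i)) & SLOT_MASK
            for i in reversed(range(SLOTS))]


print(unpack(pack([1, 2, 3, 4])))
```

One memory fetch therefore delivers four instructions, which is part of why such a small core can stay busy without a wide bus.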

Data would be processed in what's called a systolic array, where each core would pick up some data, perform operations on it and pass it along to the next core.
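
As a minimal sketch of that dataflow, here is a single west-to-east row of cores simulated one clock tick at a time. The function name `systolic_row`, the use of `None` to mean "register empty", and the one-hop-per-tick timing are all assumptions for illustration, not details of the actual chip.

```python
def systolic_row(stages, inputs):
    """Push inputs through a row of cores, one hop per clock tick.

    stages: one function per core, applied west to east.
    Returns the values leaving the east edge, in order.
    Values must not be None, since None marks an empty register.
    """
    n = len(stages)
    east_regs = [None] * n    # register each core shares with its east neighbour
    pending = list(inputs)
    outputs = []
    # Enough ticks for the last input to cross all n cores and be drained.
    for _ in range(len(pending) + n):
        # Drain the east edge first, then move data starting from the far end,
        # so each value advances exactly one core per tick.
        if east_regs[-1] is not None:
            outputs.append(east_regs[-1])
            east_regs[-1] = None
        for i in range(n - 1, 0, -1):
            if east_regs[i - 1] is not None:
                east_regs[i] = stages[i](east_regs[i - 1])
                east_regs[i - 1] = None
        if pending:
            east_regs[0] = stages[0](pending.pop(0))
    return outputs


row = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(systolic_row(row, [1, 2, 3]))
```

Once the row is full, every core works on a different item during the same tick, which is where the throughput comes from; no core ever talks to anything but its neighbour.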

The chips with a 7x7 grid of processors would expose the 28(4x7) bus lines off the edge processors, so that these could be tiled into a much larger grid of processors.

Each chip could perform around 117 Billion instructions per second at 1 Watt of power.

Unfortunately I was unable to raise money, partly because I couldn't get any commitment from Chuck.

Below are some links and other miscellaneous information on this project. Sorry it's not better organized.
This was my project.

---------
http://www.enumera.com/chip/ [enumera.com]
http://www.enumera.com/doc/Enumeradraft061003.htm [enumera.com]
http://www.enumera.com/doc/analysis_of_Music_Copyright.html [enumera.com]
http://www.enumera.com/doc/emtalk.ppt [enumera.com]

--------
This was Jeff Fox's independent web site; he worked on the F21 with Chuck.

http://www.ultratechnology.com/ml0.htm [ultratechnology.com]

http://www.ultratechnology.com/f21.html#f21 [ultratechnology.com]
http://www.ultratechnology.com/store.htm#stamp [ultratechnology.com]

http://www.ultratechnology.com/cowboys.html#cm [ultratechnology.com]

------
http://www.colorforth.com/ [colorforth.com] 25x Multicomputer Chip

Chuck's site. 25x has been pulled down, but it's accessible on archive.org.
http://web.archive.org/web/*/www.colorfo [archive.org]
A problem that isn't getting solved anytime soon (Score:3, Insightful)

by pcause ( 209643 ) writes: on Tuesday March 11, 2008 @05:12PM (#22722132)

The issue of the lack of progress in creating tools to simplify multithreaded programming has been a topic of discussion for well over a decade. Most programmers just don't make much use of multithreading. They take advantage of multithreading because their Web server and database support it and the Web server runs each request in a separate thread. Even then, some activity is complex and is usually not further parallelized. Operating systems programmers and some realtime programmers tend to be good at multithreading and parallel programming, but this is a small minority of programmers. Heck, look at Rails, one of the most popular Web frameworks: it isn't thread safe!
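
The "free" parallelism described above looks roughly like this sketch, in which the handler is ordinary sequential code and only the server hands requests to threads. `handle_request` and `serve` are hypothetical names, and Python's standard `ThreadPoolExecutor` stands in for whatever a real Web server does internally.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(path):
    """Plain sequential application code; it knows nothing about threads."""
    return f"200 OK {path}"


def serve(paths, workers=4):
    """Stand-in for the server: hand each request to a pool thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in request order, whichever thread ran them.
        return list(pool.map(handle_request, paths))


print(serve(["/", "/about", "/login"]))
```

The application programmer writes only `handle_request`; as long as requests share no state, nothing inside it has to be thread-aware, which is why this style scales across cores without the programmer ever thinking in parallel.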

Look at most people's screens. Even if they have multiple programs running, they tend to have the one they are working on full screen. Studies have shown that people who multitask are less efficient than people who do one job at a time. Perhaps we are not educated to look at problems as solvable in a parallel fashion or perhaps there is some other human based problem. Maybe like many other skills, being able to think and program in a multithreaded fashion is a talent that only a small fraction of the population has.

This "panic" isn't going away and there is NO quick fix on the programming horizon. The hardware designers can stuff more cores in the box, but programmers won't keep up. What can consume the extra CPU power are things like speech recognition, handwriting and gesture recognition, and rich media. Each of these can run in its own 1-4 cores and help us serial humans interact with those powerful computers more easily.
