Inside Intel's $20M Multicore Research Program

Posted by Zonk on Thu Apr 03, 2008 02:32 PM
from the that-is-a-lot-of-cores dept.
An anonymous reader writes "You may have heard about Intel's and Microsoft's efforts to finally get multi-core programming into gear so that there actually will be a developer who can program all those fancy new multicore processors, which may have dozens of cores on one chip within a few years. TG Daily has an interesting article about the project, written by one of the researchers. It looks like there is a lot of excitement around the opportunity to create a new generation of development tools. Let's hope that we will soon see software that can exploit those 16+ core babies. 'The problem of multi-core programming is staring at us right now. I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth. It calls for a much closer collaboration between the centers and the companies than it appears at first sight.'"
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • It's easy (Score:5, Funny)

    by Anonymous Coward on Thursday April 03 2008, @02:35PM (#22955272)
    ./configure --num-cores=16
  • by OrangeTide (124937) on Thursday April 03 2008, @02:37PM (#22955296) Homepage Journal
    The thing is, most PCs have plenty of computing power as a single core system. The hard sell is getting people to upgrade those machines mainly used for email and browsing and video playback. I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware. Unless the next version of Windows requires a core dedicated to the OS or something in the future.
    • by pla (258480) on Thursday April 03 2008, @02:55PM (#22955524) Journal
      Unless the next version of Windows requires a core dedicated to the OS or something in the future.

      So, uh, you haven't used Vista yet, I see...
    • The software currently in use does not involve computationally complex problems, and so the computers appear to have "plenty of computational power." This is likely to be the case for a very long time, but there are useful but complex tasks computers might do. For example, a computer that might interact with its user purely by voice -- more advanced voice and language recognition systems are likely to require significantly more cores and computational power than is currently in wide use. Even more advanc
      • by peragrin (659227) on Thursday April 03 2008, @03:15PM (#22955758)
        Yes, but voice processing is done best by dedicated hardware rather than generic hardware. Wouldn't a voice chip that can do that processing, and only that processing, be far more efficient? Call it the VPU; it can go next to the GPU and PPU, or it can be one of the 8 cores surrounding a Cell processor. The trend that generic processors can do everything will end. Maybe we'll get a plug and pray architecture where you can pick which cores you want installed on your system.

      • You can wonder whether it is sane to control your computer by interfaces which chew the bulk of your available computing power. But I think that when such systems enter the market, they will have one or two dedicated cores for them (though, while facial expression may seem complex, language recognition and interpretation really is a lot harder) and leave the rest of the cores alone.
      • Re: (Score:3, Informative)

        MPEG-4 decompression is far more complex than voice recognition. The processing involved is simply not that great, even for "more advanced voice and language recognition". The difficulty lies in better algorithms to do it. Turns out dynamic voice control and interpretation is not something that can be brute forced.

        Game physics needs computational power, but I'm not considering game systems.

        Scientific and Engineering projects need computational power and benefit from cost reduction in high performance process
    • by KillerCow (213458) on Thursday April 03 2008, @03:09PM (#22955674)

      The thing is, most PCs have plenty of computing power as a single core system


      And 640k ought to be enough for anyone.

      I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware.


      My last purchase (6 to 8 months ago) was a "low-end" machine. I chose carefully to make sure that it was low-end and not bargain-basement. It has two cores. I don't think it's even possible to buy a single core machine through mainstream channels anymore. Today's low-end (multi-core) is more than adequate for most users to use over the next few (read: four) years.

      Unless the next version of Windows requires a core dedicated to the OS or something in the future.


      You do not understand how the scheduler works.
      • by OrangeTide (124937) on Thursday April 03 2008, @04:08PM (#22956600) Homepage Journal

        And 640k ought to be enough for anyone.
        Funny, you quoted my response to that issue immediately after: "I think as time moves on and quad core becomes the "low-end" you will see less demand for higher end hardware."

        I don't think it's even possible to buy a single core machine through mainstream channels anymore.
        Conroe-L's are still shipping. And Intel has a single core ultra low power chip on the horizon designed to compete with ARM. Your phone, PDA, heart monitor, etc. won't be symmetric multiprocessor any time soon.

        You do not understand how the scheduler works.
        Xbox 360 already works this way: three cores, 2 for the game, 1 for the OS.

        As a professional kernel developer, I realize that locking cores into specific tasks is a lot easier than writing a general purpose scheduler that performs equivalently.
    • I wish I could count the number of times I've heard variations of this. I think the first time I heard it was when Intel released the 80387. Didn't seem to be accurate then either.

      Pretty soon social networking will include 1080p video mail or 50 megapixel photos of Jr. or there will be another DOOM II or something like that golf game that had every executive upgrading their Windows 95 'business' computers. Or perhaps the latest 4x1080p 3D media encoder will have us all wanting something faster.

      But that's
    • The thing is, most PCs have plenty of computing power as a single core system. . . .
      Rather than multi core technology resulting in elegant new software to take advantage of it, I suspect that software will get worse (think loop until done, rather than schedule an interrupt). Faster processors have not made software better; rather, they have resulted in an abundance of bad software!
    • I think you're absolutely right in the short to medium term. We'll need another technology revolution in order for more than 1-2 cores to be really beneficial to the average home user. Outside the web/db server market there's not a lot of use that isn't somewhat fringe. Of course, the web/db server market is huge.
      • Re: (Score:3, Insightful)

        It's a valid point.... if the 'speed' of cars increased at the rate it did during the beginning of the century, we'd be driving 400mph cars around.
        We are certainly capable of making cars that are that fast, but they wouldn't really be any more useful or provide more utility than a slower car.
        • That's just untrue. The only issue with not having faster cars is safety. Turning an hour each way commute to an eighth of the time would definitely change the world. That would give a very large number of people an extra 1:45 hours a day. That's 35:00 daylight hours of extra free time. To me, that would be WAY more useful.
        • Re: (Score:2, Insightful)

          No production car can even approach 400 mph. Not even close. You'd be doing very well to spend half a million on a "production car" that can crack 400 km/h.

          That leaves out the questions of range (you'd be lucky to get three miles to the gallon), and, you know, being able to actually maneuver on the public roads at that speed.

          You're nuts.

          -Peter
  • In the new octagon (8-way processors), a battle of the ages, Crapware vs AntiCrapware

    Most of the new cores are being used to isolate crapware and anticrapware in a Battle Royal.

    And it looks like Crapware is going to win in a submission tapout at the current rate.
  • I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth.
    Maybe my brain needs a new compiler. This must be a multi-core sentence.
  • Multicore Programs (Score:4, Insightful)

    by Ironsides (739422) on Thursday April 03 2008, @02:42PM (#22955376) Homepage Journal
    Software that will exploit 16+ cores already exists. The problem is, it is not consumer (home/office) software. There does not yet exist an application that people use that really needs multiple cores. Video encoding is getting there, but most people will never use it.
      • I think by "that people use", he probably means normal applications one would use and interact with (as opposed to mesh-network distributed computing applications, which clearly fit in a very different problem domain). And in that light, the OP is, I think, absolutely correct.
      • People need to stop thinking that 'I don't have a program that uses 16 cores (16 real threads), so I don't need a 16 core system.'

        On a desktop PC, the IO system is going to be the source of contention far more often than the processor(s). How often do most people run several CPU bound tasks simultaneously on a desktop anyway? Extremely rarely.

        Imagine splitting the CPU cycles of 1 core for all these tasks, and sharing them fairly, against splitting the cycles of 2..4..16 cores.

        If the CPUs you currently have aren't being heavily utilized, then having more of them isn't going to give you any perceptible improvements. This is really a matter of scaling horizontally as opposed to vertically, and they both suit entirely different workloads. The average workload of

  • I hope they do better at getting useful coding tools into the hands of home coders than GPU manufacturers have done at exposing the parallel, programmable nature of modern GPUs.
  • It should not be very hard... The algorithm begs for multi-threading — once you divide your array, you apply the same algorithm to the two parts, recursively. The parts can be sorted in parallel — this has the potential for huge performance gains in database servers (... ORDER BY ...), etc.

    Anyone?
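
    For what it's worth, here is a minimal sketch of that idea, assuming the algorithm meant is a divide-and-conquer sort such as quicksort. The names quicksort_range, parallel_quicksort and kTaskCutoff are made up for illustration, and OpenMP tasks are just one possible way to hand the two parts to different cores. With GCC or Clang it needs -fopenmp to actually run in parallel; without it the pragmas are ignored and it runs as a plain serial sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Ranges at or below this size are sorted inline instead of as a new task.
// The value is an arbitrary assumption, not a tuned number.
constexpr std::ptrdiff_t kTaskCutoff = 4096;

// Hypothetical helper: quicksorts the half-open range v[lo, hi).
static void quicksort_range(std::vector<int>& v, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    assert(0 <= lo && lo <= hi && hi <= static_cast<std::ptrdiff_t>(v.size()));
    if (hi - lo < 2) return;

    // Three-way partition around the middle element:
    // [lo, m1) < pivot, [m1, m2) == pivot, [m2, hi) > pivot.
    const int pivot = v[lo + (hi - lo) / 2];
    const auto first = v.begin() + lo;
    const auto last = v.begin() + hi;
    const auto mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
    const auto mid2 = std::partition(mid1, last, [pivot](int x) { return x == pivot; });
    const std::ptrdiff_t m1 = mid1 - v.begin();
    const std::ptrdiff_t m2 = mid2 - v.begin();

    // The two parts share no elements, so one can be handed to another
    // thread while this thread sorts the other.
    #pragma omp task shared(v) firstprivate(lo, m1) if (hi - lo > kTaskCutoff)
    quicksort_range(v, lo, m1);

    quicksort_range(v, m2, hi);
    #pragma omp taskwait
}

// Sorts v in ascending order. Built without OpenMP support, the pragmas are
// ignored and this is an ordinary serial quicksort.
void parallel_quicksort(std::vector<int>& v) {
    #pragma omp parallel
    #pragma omp single
    quicksort_range(v, 0, static_cast<std::ptrdiff_t>(v.size()));
}
```

    Calling parallel_quicksort(v) on a std::vector<int> sorts it in place. The cutoff is there because spawning a task for a tiny range costs more than sorting it, which is probably the main reason this is less of a free lunch than it looks.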

  • Sun? (Score:4, Funny)

    by Anne Thwacks (531696) on Thursday April 03 2008, @02:44PM (#22955398)
    Of course some of you will know that Sun have had 8/16/32 cores for quite a while, and that Solaris, *BSD, and probably even Linux support this stuff just fine.

    It's only you peasants that persist in using old-hat Wintel stuff that are so last-year. Get with it people! You too could be running NetBSD on your toaster (it will probably outperform Windows Vista on a 4-core Pentium anyway). Hell it might even eat Nandos peri-peri Vista for breakfast!

    • Re: (Score:3, Informative)

      Of course some of you will know that Sun have had 8/16/32 cores for quite a while, and that Solaris, *BSD, and probably even Linux support this stuff just fine.

      The NT kernel has supported SMP for 10 years. So what?

      It's all about the applications. Sure, there are some development tools in *nix for multicore. I doubt they are efficient and accessible though. Can y'all tell me how great GCC is with 16 cores and thread level parallelism? I'm sure some academic and/or low level solutions exist everywhere. However,

  • by stratjakt (596332) on Thursday April 03 2008, @02:54PM (#22955514) Journal
    SMT processors of this type are only useful for accelerating a certain type of problem set, and useless for most general computing.

    We've had SIMD multicore PCs forever, and they're useless as desktops. I write this from a quad Xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle. It's been like that for more than a decade, and it'll be like that until MIMD hits the street with a whole new paradigm of programming languages behind it - a handful of C compiler #pragma directives from Intel isn't going to make this work.

    It's not simply a matter of "coders don't know how to do it." It's a matter of these multi-core "general purpose" CPUs are only really useful for a fairly limited set of specific problems.

    E.g., writing a game engine with a video thread, audio thread and an input thread still leaves 13 cores idle. You really can't thread those much further (the ridiculously parallel problem of rendering is handled by the GPU).

    Simply starting processes on different procs doesn't help all that much, since they all fight over memory and I/O time. The point of diminishing returns is reached fairly quickly.

    But hey, if all you do is run Folding@home so you can compare your e-cock with the other kids on hardextremeoverclockermegahackers.com, well I have some good news!

    As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.

    Savain says it best at rebelscience.org: "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."
    • "E.g., writing a game engine with a video thread, audio thread and an input thread still leaves 13 cores idle. You really can't thread those much further (the ridiculously parallel problem of rendering is handled by the GPU)."

      Whoopsie. I think you presume that games don't need more processing before the GPU so much.

      What if you could thread out, and preprocess the video? We don't know, cause it's not yet practical. The tools to write that software don't exist.

      Actually, if we get enough cores as CPU, when do w
    • by everphilski (877346) on Thursday April 03 2008, @03:58PM (#22956496) Journal
      a handful of C compiler #pragma directives from intel isn't going to make this work.

      That's OpenMP, and depending on the program, it can work wonders. In an hour I parallelized 90% of a finite element CFD code with it. Yes, it sucks for fine-grained parallelization.

      Intel's product is Threading Building Blocks, and is not built around pragmas, and is both commercial and OSS. It's pretty slick and will let you do the more fine-grained optimizations.
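
      For anyone who hasn't seen the pragma approach, here is a minimal sketch of what it looks like. This is not the CFD code mentioned above; scaled_add and sum_of_squares are made-up stand-ins for the kind of per-element loop where every iteration is independent. With GCC or Clang, build with -fopenmp; without it the pragmas are ignored and the loops run serially with the same results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-element update: y[i] = a * x[i] + y[i].
// Each iteration touches only its own element, so the loop can be split
// across threads with a single pragma.
void scaled_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical stand-in for a global sum. The reduction clause gives each
// thread a private partial total and combines them at the end of the loop.
double sum_of_squares(const std::vector<double>& x) {
    double total = 0.0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}
```

      The coarse-grained case is easy because the loop body is left untouched. It gets harder once iterations share state, which is where the fine-grained complaint comes from.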

      It's a matter of these multi-core "general purpose" CPUs are only really useful for a fairly limited set of specific problems.

      Not entirely true, it's just useful for problems that need a processor.

      I write this from a quad xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle.

      ... obviously, you have more processor than you need. I, on the other hand, have a quad core Opteron that is currently over 350% utilization. I tank it almost 24/7.

      the ridiculously parallel problem of rendering is handled by the GPU

      Not for long. Raytracing is making a comeback.

      As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.

      If you can't even tank one core of your Xeon, it's doubtful.

      "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."

      I'd caveat that by saying "threading arbitrary program X is a pain in the ass." There are plenty of useful programs that are easily parallelized.
    • Re: (Score:3, Interesting)

      The desktop PC should be idle most of the time. User input is really slow and in general the machine is waiting on the user, not the other way around. However, ask yourself whose time is more valuable, the machine you bought for $1,500 that lasts 3 years (at least, that's the hardware update cycle around my work), or the person you pay $150,000 over a similar time frame? (give or take on location, entry-level position) Pay 10% more ($150) for the computer to save the person 0.1% ($150) of their time? That's an
  • by Anonymous Coward on Thursday April 03 2008, @02:55PM (#22955526)
    The structure of VHDL is inherently parallel as all processes (blocks of hardware) run at the same time. Only the code within the processes is evaluated sequentially (in most cases).

    Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?
    • Re: (Score:2, Interesting)

      Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?

      Excellent suggestion. This is precisely what the COSA software model is about. A pulsed neural network is my preferred metaphor for an ideal model of parallel computing. Intel and the others are on the verge of losing billions of dollars because they are already deeply committed to the hard to program multithreading model, a complete failure even after decades of resea
    • Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel centric computer programming language?

      Sure. In fact, VHDL is based (closely) on Ada, which allows pretty similar things. The relevant differences are less between the languages than how they're used. Ada that was written in the same style as most VHDL would have a high level of parallelism as well.

      That's rarely done though, because designing hardware in VHDL (or Verilog, etc.) is expensive, large

  • We all already have networks of servers all running in parallel. Multi core processing is simply squashing the network onto a little bit of silicon.
     
  • Didn't we already see this one? Intel (this time AMD also) develops radical new processor arch that will be insanely great once a quantum leap in developer tools is made to utilize it.

    Itanic crashed, burned and sank against the rocks of the compiler tech not being able to keep up. I see it happening again.

    Yes we will find ways to make a quad core system stay busy enough to sell em to corporate desktops and home users. Hell, you can assign one to the virus/crapware scanner. Waste another or two doing ev
    • Itanic crashed, burned and sank against the rocks of the compiler tech not being able to keep up

      There is a fundamental flaw in the itanic design philosophy that no compiler will ever be able to make up for. There are some optimisations that have to be done at run time. They can't be done at compile time. itanic was conceived out of 1970's supercomputer research before out-of-order, speculative execution, dynamic branch prediction RISC processors had been invented.

      There was an itanic forerunner back in t

  • by MarkEst1973 (769601) on Thursday April 03 2008, @03:11PM (#22955702)

    Forget software not being written for multi-cores, the entire infrastructure around the computer needs to "go wide" for massive parallelism, not just the software. This includes disk, memory, front-side bus, etc.

    I'm doing highly concurrent projects (grid computing) for my company and we're finding that some things parallelize just fine, but others simply move the pain and bottleneck to a piece of infrastructure that hasn't quite caught up yet.

    For example, my laptop has a dual-core 2.2GHz processor, which you'd think is great for development. It's no better than a single CPU machine because my disk IO light is on all the time. IntelliJ pounds the disk. Maven and Ant pound the disk. Outlook pounds the disk. Even surfing the web puts pages into disk cache, so browsing while building a project is slow. Until I get a SCSI drive, I'm still limited on disk IO, so those extra cores don't help that much.

    All the cores are great on the server, though. I've recently completed a massive integration project where I grid-enabled my company's enterprise apps. All those cores running grid nodes are giving us very high throughput. Our next bottleneck is the database (all those extra grid nodes pounding away at another bottleneck resource...)

    Terracotta Server as a Message Bus [markturansky.com]. It's been a very interesting project.

    • Outlook I can understand. It needs to flush the emails to disk before replying back to the server.

      However, there's no reason why the web browser needs to ensure that the data hits the disk cache right away, so it should be just fine sitting in RAM until the disk frees up. Similarly, intellij, maven, and ant should be slow the first time but faster later on since they should be reading from the page cache.

      There's no reason for your disk I/O light to be on unless you don't have enough RAM or the disk algori
  • The solution is right in front of our faces. If you use virtualization then you can easily make use of a 16 core system. I can have IIS, Exchange, a Linux Apache Server, and a Terminal Server all on the same physical machine.
    • This is exactly what Intel is trying to avoid. They want people to do more computing, not consolidate their purchases. They need to sell more PCs with more CPUs. They want people to find uses for the extra computing power. In business, there are plenty of uses [ampl.com] for that power. If people's applications are using just one core at a time, the consolidation effect will occur and Intel's sales will plummet. If applications make good use of the multiple cores, the applications themselves become more usef
    • Man, I hope this was a tongue-in-cheek post. Virtualization, used in this manner, is precisely equivalent to scheduling multiple processes across cores, only you also get the virtualization overhead. It's most definitely not a solution to the problem Intel is trying to solve (making it easy for developers to write individual pieces of software whose problems can be naturally broken down and distributed across multiple cores).
  • Scala is a JVM based language that has good features for working well with multiple cores (Actors, immutable collections, functional language, etc), so why not sponsor it?

    Mats
  • ...for Linux, Mac and Windows [intel.com] supporting multicore and also cluster [intel.com] architectures.
    Obviously it would be better if these worked better and were easier to use, but many people are unaware of the tools that are available right now.
  • Seriously, Folks, who can do anything for a mere $20M today, let alone change the entire programming paradigm of the last 65 years?
  • by EEPROMS (889169) on Thursday April 03 2008, @05:28PM (#22957536)
    I've got a dual core machine sitting on the desk before me and the CPU rarely goes above 20% load. The strange thing though is it is still slow when loading programs, and this is due to the hard disk (SATA II) being the bottleneck on my system. I could fix this to some degree with a RAID setup, but the real question is why isn't this being looked at more closely?
    • Do you even know WHY you dislike the x86 architecture? Seriously... so many people bitch about it, and I get the feeling that they don't understand what they aren't missing.

      Don't get me wrong... I've used Power-based machines, etc. I've programmed on them. With the way that x86 is designed, it's pretty much a RISC core with an x86 wrapper, which gives the programmers and compilers a much easier time optimizing, as well as still running fast.

      Seriously... what is this x86 hatred that's flying around so muc
      • You might as well acquiesce to the fact that it ain't gonna happen. Unless a company comes along that is influential enough to "get things done."

        There is no such company.

        Microsoft originally designed Windows NT to be portable, and ported it to most of the popular architectures, including MIPS, PowerPC and the DEC Alpha (and probably SPARC, though it never made it to market). All of those have died, because they didn't sell well enough to stay on the market. Right now, Windows is available for the

    • Re:stupid much? (Score:5, Informative)

      by Jerry Coffin (824726) on Thursday April 03 2008, @04:01PM (#22956532)

      Instead of trying to convince everyone on Earth to change all existing software, why doesn't Microsoft just make the next version of Windows have a process handler that can process single threads on multiple cores at once? Actually technically I think Intel could do that internally on their processors too sort of like RAID for cores.


      Intel's been doing that (to some degree) since the Pentium, and they increased it a lot in the Pentium Pro/Pentium II. It works reasonably well up to a point (modern chips typically execute an average of two instructions per clock cycle) but definitely has limits.

      Compilers to automatically detect when instructions can be executed in parallel have been around for years. Cray had vectorizing compilers by the late 1970's, and within rather specific limits, they worked perfectly well. Just for example, if you wrote a loop like:

      for (int i = 0; i < 256; i++)
          a[i] = b[i] * c[i];

      they'd break the loop down into four actual executions of a loop, each of which worked on 64 items in parallel. It had independent execution units, so at a given time it'd normally be loading one set of 64 items into one set of registers, executing multiplications on a second set of 64 items, and storing results from a third set of 64 registers.
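
      Written out by hand, that transformation (usually called strip-mining) looks roughly like the sketch below. This is plain C++ for illustration, not anything Cray-specific: multiply_strip_mined, strip_count and kVectorLength are names I made up, with 64 taken from the register length described above. On a vector machine the inner loop would be a single vector multiply rather than 64 scalar ones.

```cpp
#include <cassert>
#include <cstddef>

// Assumed vector register length, taken from the "64 items" described above.
constexpr std::size_t kVectorLength = 64;

// Number of strips needed to cover n elements (the last one may be partial).
std::size_t strip_count(std::size_t n) {
    return (n + kVectorLength - 1) / kVectorLength;
}

// Strip-mined form of: for (i = 0; i < n; i++) a[i] = b[i] * c[i];
void multiply_strip_mined(float* a, const float* b, const float* c, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
    for (std::size_t start = 0; start < n; start += kVectorLength) {
        // Length of this strip: a full vector, or whatever is left at the end.
        const std::size_t len = (n - start < kVectorLength) ? (n - start) : kVectorLength;
        // Stand-in for one vector load / multiply / store over len elements.
        for (std::size_t j = 0; j < len; ++j) {
            a[start + j] = b[start + j] * c[start + j];
        }
    }
}
```

      For n = 256 the outer loop runs four times, matching the four executions mentioned above.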

      That has a couple of problems though. First of all, if you're not careful, it's pretty easy to create loops with (apparent) dependencies from one iteration to the next, so the compiler can't parallelize the code. Second, this works well for vector processors, but probably not nearly so well for a large number of completely independent processors (which have higher communication overhead, meaning that starting up things to happen in parallel is more expensive).
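
      A small sketch of the "apparent dependency" problem, using a made-up function add_one. Nothing in the signature tells the compiler that dst and src don't overlap, so it has to allow for the case where one iteration reads what the previous one wrote.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: dst[i] = src[i] + 1 for i in [0, n).
// With separate arrays every iteration is independent. If the caller passes
// overlapping pointers, iteration i may read a value written by iteration
// i - 1, and the compiler has to preserve that order.
void add_one(int* dst, const int* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + 1;
    }
}
```

      Called with two separate three-element arrays where src is all zeros, this leaves dst as 1, 1, 1. Called as add_one(buf + 1, buf, 3) on a four-element buf of zeros, it leaves buf as 0, 1, 2, 3, because each iteration consumes the previous result. The compiler sees only the one function, so unless it can prove the pointers never overlap it must assume the second case.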

      If you're willing to provide the compiler with a little help, it can do quite a bit more, such as with MPI. The standard MPI interface is pretty low-level, but if you want to do the job in C++, Boost.MPI helps out quite a bit (cheap plug: if you want to know more, consider attending Boostcon '08 [boostcon.com]).