Inside Intel's $20M Multicore Research Program
An anonymous reader writes "You may have heard about Intel's and Microsoft's efforts to finally get multi-core programming into gear, so that there will actually be developers who can program all those fancy new multicore processors, which may have dozens of cores on one chip within a few years. TG Daily has an interesting article about the project, written by one of the researchers. It looks like there is a lot of excitement around the opportunity to create a new generation of development tools. Let's hope that we will soon see software that can exploit those 16+ core babies. 'The problem of multi-core programming is staring at us right now. I am not sure what Intel's and Microsoft's expectations are, but it is quite possible that they are in fact looking at fundamental results from the academic centers to leverage their large work force to polish and realize the ideas that come forth. It calls for a much closer collaboration between the centers and the companies than it appears at first sight.'"
It's easy (Score:5, Funny)
Most PCs are fast enough (Score:3, Insightful)
Re:Most PCs are fast enough (Score:5, Funny)
So, uh, you haven't tried Vista yet, I see...
Re: (Score:3, Insightful)
Re:Most PCs are fast enough (Score:4, Insightful)
Re: (Score:2)
It doesn't sound like a very good way to allow end users to make efficient use of their hardware.
If users want a cost effective piece of hardware that's good for a lot of things, I don't see how generic hardware that can be load balanced isn't a win.
Re: (Score:2)
Is AMD doing so poorly that that's their only hope?
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Informative)
Game physics needs computational power, but I'm not considering game systems.
Scientific and Engineering projects need computational power and benefit from cost reduction in high performance process
Re:Most PCs are fast enough (Score:4, Insightful)
And 640k ought to be enough for anyone.
My last purchase (6 to 8 months ago) was a "low-end" machine. I chose carefully to make sure that it was low-end and not bargain-basement. It has two cores. I don't think it's even possible to buy a single core machine through mainstream channels anymore. Today's low-end (multi-core) is more than adequate for most users to use over the next few (read: four) years.
You do not understand how the scheduler works.
Re:Most PCs are fast enough (Score:4, Insightful)
As a professional kernel developer, I realize that locking cores into specific tasks is a lot easier than writing a general purpose scheduler that performs equivalently.
Re: (Score:2)
Re: (Score:2)
>"And 640k ought to be enough for anyone."
No not really, but I think PCs will follow a progression similar to cars. When cars were first available they were a mere 10-20 horsepower. As the technology developed, engineers learned to make better cars until the 1950s when a family car might have 400-500 horsepower. In theory engineers could have continued building more-and-more powerful cars, so that we would have 4
Re: (Score:2)
Pretty soon social networking will include 1080p video mail or 50 megapixel photos of Jr. or there will be another DOOM II or something like that golf game that had every executive upgrading their Windows 95 'business' computers. Or perhaps the latest 4x1080p 3D media encoder will have us all wanting something faster.
But that's
Re:Most PCs are fast enough (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
However a modern 2007-designed CPU should be able to handle that movie just fine.
So why upgrade if the 2007-designed CPU can play 1080p movies flawlessly? What possible *real world* (not star trek) application could make someone want to get a
Re: (Score:2)
The thing is, people don't really need a more powerful machine. The hardware is capable of handling the workloads you described. And when someone said that we won't need anything faster than a 387 FPU, or more than 640K, or whatever, they were right. And they were never proven wrong. Just because a market is out there for flashy gadgets that don
Re: (Score:2)
Sure, anyone would. Barring some major breakthroughs in superconducting circuits, however, it's not going to happen anytime soon.... well, not unless you want to run one of those liquid-helium cooled machines....
Re: (Score:2)
Re: (Score:2)
Well...
I gotta disagree. It's a nice modern feature to be able to hear REAL music instead of sid-music (C=64) or watch REAL movies instead of 5-minute graphics demos (Commodore Amiga). There have bee
Re: (Score:2, Insightful)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Insightful)
We are certainly capable of making cars that are that fast, but they wouldn't really be any more useful or provide more utility than a slower car.
Re: (Score:2)
Re:Most PCs are fast enough (Score:4, Insightful)
Terrible, terrible idea. Definitely not thought out.
Re: (Score:3, Informative)
Re: (Score:2, Insightful)
That leaves out the questions of range (you'd be lucky to get three miles to the gallon), and, you know, being able to actually maneuver on the public roads at that speed.
You're nuts.
-Peter
Re: (Score:2)
"Car flies into air at 400mph, does a 1080, lands in school yard 2 miles away and kills 19 children."
P.S. Don't forget potheads, either. People don't drive safely at 5 mph with their cellphones, makeup, DVDs, nav systems, iPods, McDonalds...you get the idea
Re: (Score:2)
It wasn't that long ago when people were under the impression that 640k of memory was more than enough.
640K was more than enough .. for the sorts of things you could do in DOS. When it was said, there were already systems that could support up to 8MB and 16MB of RAM, and universities and the aerospace industry were building maxed-out configurations. It's absurd to interpret it as meaning that 640K is enough for all cases, all situations, and all people. It's as absurd as thinking it means the entire world could share 640K between a billion people.
640K RAM (and a hard drive) is plenty for
You missed a huge marketplace (Score:2)
We love multiple threads and in some cases multiple processes and/or multiple machines. DBMS's, transaction processing systems, web servers, middle tier data servers....
The code is already in place and the more cores you throw at us the faster we can run. It's not mathematically complex tasks but it is workhorse stuff that parallelises beautifully, though not in a particularly fine-grained way.
There's more to computing than consumer desktops and science/engineering departments
UFC's next fight (Score:2)
Most of the new cores are being used to isolate crapware and anticrapware in a Battle Royal.
And it looks like Crapware is going to win in a submission tapout at the current rate.
Will more cores help me decipher this run-on? (Score:5, Funny)
Re: (Score:2)
Take research results from universities, give them to Marketing.
Multicore Programs (Score:4, Insightful)
Re: (Score:2)
There is also a recent paper that shows how the MapReduce pattern can be easily applied to just about every machine-learning algorithm with near-linear speedup. This stuff isn't just going to be used to make the next Clippy, but for more interesting stuff like video processing, speech recognition, sensor fusion in "smart" handhelds, etc.
The applications and the need exist, but so far they have not
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
CPU is not bottleneck on desktop (Score:3, Interesting)
People need to stop thinking that 'I don't have a program that uses 16 cores (16 real threads), so I don't need a 16 core system.'
On a desktop PC, the IO system is going to be the source of contention far more often than the processor(s). How often do most people run several CPU-bound tasks simultaneously on a desktop anyway? Extremely rarely.
Imagine splitting the CPU cycles of 1 core for all these tasks, and sharing them fairly, versus splitting the cycles of 2..4..16 cores.
If the CPUs you currently have aren't being heavily utilized, then having more of them isn't going to give you any perceptible improvements. This is really a matter of scaling horizontally as opposed to vertically, and they both suit entirely different workloads. The average workload of
I hope they do better than GPU manufacturers. (Score:2)
Multi-threaded qsort() anyone? (Score:2)
It should not be very hard... The algorithm begs for multi-threading — once you divide your array, you apply the same algorithm to the two parts, recursively. The parts can be sorted in parallel — this has the potential for huge performance gains in database servers (... ORDER BY ...), etc.
Anyone?
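To make the ask concrete, here's roughly the shape of it: a minimal sketch using C++11 std::thread. The function name, pivot choice, and depth cutoff are all just illustrative, and this is not yet the transparent drop-in qsort() I'm after.

    // Sketch: partition once, then sort the two halves in parallel.
    // Hypothetical interface, not a drop-in qsort() replacement.
    #include <algorithm>
    #include <thread>

    void parallel_quicksort(int* lo, int* hi, int depth = 3) {
        if (hi - lo < 2) return;
        int pivot = *(lo + (hi - lo) / 2);
        // Elements < pivot go left, == pivot stay in the middle, > pivot go right.
        int* mid  = std::partition(lo, hi, [pivot](int x) { return x < pivot; });
        int* mid2 = std::partition(mid, hi, [pivot](int x) { return x == pivot; });
        if (depth > 0) {
            std::thread left(parallel_quicksort, lo, mid, depth - 1); // one half on a new thread
            parallel_quicksort(mid2, hi, depth - 1);                  // other half on this thread
            left.join();
        } else {
            std::sort(lo, mid);   // deep enough: finish each piece sequentially
            std::sort(mid2, hi);
        }
    }

Call it as parallel_quicksort(v.data(), v.data() + v.size()); with depth = 3 you get roughly 8 threads doing the leaf sorts. A real library version would want a size cutoff and a thread pool rather than spawning a thread per recursion.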
Re:Multi-threaded qsort() anyone? (Score:4, Informative)
One of several parallelised standard algorithms [209.85.135.104].
Re: (Score:2)
Re: (Score:2)
Even in 128GB of memory? Even if the comparison function (qsort()'s last argument) takes a while to complete?
Re: (Score:2)
Uhm, yes, something like that... But it ought to be transparent to the caller — I just want to keep calling qsort() from my (portable) code and have it take advantage of the multiple CPUs, when available.
Sun? (Score:4, Funny)
It's only you peasants that persist in using old-hat Wintel stuff that are so last-year. Get with it, people! You too could be running NetBSD on your toaster (it will probably outperform Windows Vista on a 4-core Pentium anyway). Hell, it might even eat Nandos peri-peri Vista for breakfast!
Re: (Score:3, Informative)
The NT kernel has supported SMP for 10 years. So what?
It's all about the applications. Sure, there are some development tools in *nix for multicore. I doubt they are efficient and accessible though. Can y'all tell me how great GCC is with 16 cores and thread level parallelism? I'm sure some academic and/or low-level solutions exist everywhere. However,
Re: (Score:2)
The NT kernel has supported SMP for 10 years. So what?
It sucked at it compared to OS/2 and probably Solaris 10+ years ago, and because of how poorly it did threads, most Windows apps did what Microsoft did and pretty much stayed away from threading. And to be relevant to the current discussion, Windows threading did not cross CPU/core boundaries while OS/2's threading did 10+ years ago.
So, are you saying that Windows (XP and/or Vista) threading can cross core boundaries? If so, why would Microsoft be trying to come up with a way to get developers to target mult
Re: (Score:2)
It sucked at it compared to OS/2 and probably Solaris 10+ years ago, and because of how poorly it did threads, most Windows apps did what Microsoft did and pretty much stayed away from threading. And to be relevant to the current discussion, Windows threading did not cross CPU/core boundaries while OS/2's threading did 10+ years ago.
What do you mean by "cross CPU/core boundaries"? Windows NT has been able to schedule arbitrary threads onto arbitrary processors since *at least* NT 4.0 (and probably 3.1).
Re: (Score:2)
What do you mean by "cross CPU/core boundaries"? Windows NT has been able to schedule arbitrary threads onto arbitrary processors since *at least* NT 4.0 (and probably 3.1).
You are right, since I found a couple of pages which say that NT threads can cross CPU boundaries. Interesting, since when I was working on OS/2 and NT apps in the mid 90's, NT performance was really bad on dual-CPU systems with a heavily threaded app. It was explained to me at the time that the NT kernel didn't let a process's threads spread out across the CPUs. Whatever it was, the OS/2 port was much faster on the same dual-CPU system as the NT port, and that was before any 32bit data structure alignment wa
Re: (Score:2)
Secondly, GCC doesn't care about threading scalability. It's all up to you as the application architect to design a parallel system.
Academic and real-world examples are well known. Once you get the basic ideas down, the vast majority of throughput bottlenecks parallelise out ve
Re: (Score:2)
Firstly, NT supports SMP, but it doesn't scale well to utilise it. Windows Server 2008 might be tolerable, but it's not going to compete with current, let alone future, Linux, and the higher the core count, the bigger the divide gets.
Benchmarks?
Show me the money Intel. (Score:5, Insightful)
We've had SIMD multicore PC's forever, and they're useless as desktops. I write this from a quad xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle. It's been like that for more than a decade, it'll be like that until MIMD hits the street with a whole new paradigm of programming languages behind it - a handful of C compiler #pragma directives from intel isn't going to make this work.
It's not simply a matter of "coders don't know how to do it." It's a matter of these multi-core "general purpose" CPUs only really being useful for a fairly limited set of specific problems.
E.g., writing a game engine with a video thread, audio thread and an input thread still leaves 13 cores idle. You really can't thread those much farther (the ridiculously parallel problem of rendering is handled by the GPU).
Simply starting processes on different procs doesn't help all that much, since they all fight over memory and I/O time. The point of diminishing returns is reached fairly quickly.
But hey, if all you do is run Folding@home so you can compare your e-cock with the other kids on hardextremeoverclockermegahackers.com, well I have some good news!
As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.
Savain says it best at rebelscience.org: "Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."
Re: (Score:2)
Woopsie. I think you presume that games don't need more processing before the GPU so much.
What if you could thread out, and preprocess the video? We don't know, cause it's not yet practical. The tools to write that software don't exist.
Actually, if we get enough cores as CPU, when do w
Re: (Score:2)
When CPU's get better at churning out FP math solutions. The whole purpose of the GPU is it's a massive net of FPUs. I think Cell style technology is going to be more similar to the type of chip we see in 10 years than an Intel C Core w/ 100 Pentium type cores in it. Ideally, I think you are looking at a processor 'office' for each thread - 1 supervisor core, multiple FPUs, a couple of CPU cores, perhaps 1 or 2 GPUs & a few FGP
Re:Show me the money Intel. (Score:5, Insightful)
That's OpenMP, and depending on the program, it can work wonders. In an hour I parallelized 90% of a finite element CFD code with it. Yes, it sucks for fine-grained parallelization.
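For the curious, a sketch of what that looks like. The loop and array names below are made up, not from the actual CFD code, but the coarse-grained case really is about one pragma on an independent loop:

    #include <vector>

    // Hypothetical per-element assembly loop: iterations are independent,
    // so a single OpenMP pragma spreads them across the available cores.
    void assemble(std::vector<double>& residual,
                  const std::vector<double>& u, int n_elements) {
        #pragma omp parallel for
        for (int e = 0; e < n_elements; ++e) {
            residual[e] = 2.0 * u[e] - u[e] * u[e];  // stand-in for the real per-element work
        }
    }

Build with -fopenmp (gcc) or the Intel compiler's OpenMP switch and the pragma takes effect; without it the code still compiles and just runs serially, which is part of the appeal.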
Intel's product is Threaded Building Blocks, and is not built around pragmas, and is both commercial and OSS. It's pretty slick and will let you do the more fine-grained optimizations.
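Roughly, the same hypothetical loop in TBB is a library call instead of a pragma (this uses the lambda-friendly form, so it assumes a reasonably recent compiler; exact headers and overloads may differ by version):

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>
    #include <vector>

    // Same illustrative loop, expressed as a range that TBB's scheduler
    // splits into chunks and distributes across its worker threads.
    void assemble_tbb(std::vector<double>& residual,
                      const std::vector<double>& u, std::size_t n_elements) {
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n_elements),
            [&](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t e = r.begin(); e != r.end(); ++e)
                    residual[e] = 2.0 * u[e] - u[e] * u[e];
            });
    }

Because the body is ordinary code rather than a pragma, it composes better with the more fine-grained stuff (pipelines, concurrent containers, task scheduling).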
It's a matter of these multi-core "general purpose" CPUs only really being useful for a fairly limited set of specific problems.
Not entirely true, it's just useful for problems that need a processor.
I write this from a quad xeon machine, repurposed as my dev box, as CPU1 grinds away at about 75% all day long, the rest idle.
the ridiculously parallel problem of rendering is handled by the GPU
Not for long. Raytracing is making a comeback.
As for me, I'm seeing AMD's multiple specific purpose core approach as being more viable, as far as actually making my next desktop computer perform faster.
If you can't even tank one core of your Xeon, it's doubtful.
"Even after decades of research and hundreds of millions of dollars spent on making multithreaded programming easier, threaded applications are still a pain in the ass to write."
I'd caveat that by saying "threading arbitrary program X is a pain in the ass." There are plenty of useful programs that are easily parallelized.
Re: (Score:3, Interesting)
Re: (Score:2)
None of the people that argued could come up with a single real-world problem that couldn't be hacked into working on multi-core systems. When you say "SMT" I assume you've made a typo and mean SMP. SMP is a
Re: (Score:2)
Those acronyms, I don't think they mean what you think they mean. SIMD refers to a Single Instruction operating on Multiple Data values in parallel... think Altivec or SSE. MIMD is Multiple Instructions, Multiple Data... i.e. the multiple CPU machines you and I have been running for years.
Re: (Score:2)
Hi! My name is AI, I'll be happy to eat any number of cores you throw at me!
NOT SO FAST, AI! I'm RAY, RAY TRACING AND....
Anyways, you get the picture. 640k ram yadda yadda.
Re: (Score:2)
Yesterday's Paradigms (Score:2)
http://en.wikibooks.org/wiki/Ada_Programming/Tasking [wikibooks.org]
But you can use them - just use "gcc -x ada"
Martin
[1] You need a fully installed GNU Compiler Collection for it to work.
Hardware description to parallel programming lang? (Score:3, Interesting)
Although VHDL is a hardware description language, couldn't similar concepts be used to make a parallel-centric computer programming language?
Re: (Score:2, Interesting)
Excellent suggestion. This is precisely what the COSA software model is about. A pulsed neural network is my preferred metaphor for an ideal model of parallel computing. Intel and the others are on the verge of losing billions of dollars because they are already deeply committed to the hard-to-program multithreading model, a complete failure even after decades of resea
Re:Hardware description to parallel programming la (Score:2)
Sure. In fact, VHDL is based (closely) on Ada, which allows pretty similar things. The relevant differences are less between the languages than how they're used. Ada that was written in the same style as most VHDL would have a high level of parallelism as well.
That's rarely done though, because designing hardware in VHDL (or Verilog, etc.) is expensive, large
Re: (Score:2)
Why not use what's already there? (Score:2)
http://en.wikibooks.org/wiki/Ada_Programming/Tasking [wikibooks.org]
Martin
Already done: Ada (Score:2)
http://en.wikibooks.org/wiki/Ada_Programming/Tasking [wikibooks.org]
Martin
Re:Hardware description to parallel programming la (Score:2)
Re:Hardware description to parallel programming la (Score:2)
We do it already (Score:2)
Re: (Score:2)
We have already seen this one, it ended badly (Score:2)
Itanic crashed, burned and sank against the rocks of the compiler tech not being able to keep up. I see it happening again.
Yes we will find ways to make a quad core system stay busy enough to sell em to corporate desktops and home users. Hell, you can assign one to the virus/crapware scanner. Waste another or two doing ev
Re: (Score:2)
Itanic crashed, burned and sank against the rocks of the compiler tech not being able to keep up
There is a fundamental flaw in the Itanic design philosophy that no compiler will ever be able to make up for. There are some optimisations that have to be done at run time; they can't be done at compile time. Itanic was conceived out of 1970s supercomputer research, before RISC processors with out-of-order execution, speculative execution, and dynamic branch prediction had been invented.
There was an Itanic fore-runner back in t
Re: (Score:2)
Even your humble word processor can be broken into hundreds of threads; I would bet OpenOffice has dozens. Today, threads are considered expensive, as RAM once was, but in the 16 core world they will b
Moving the bottleneck... (Score:5, Interesting)
Forget software not being written for multi-cores: the entire infrastructure around the computer needs to "go wide" for massive parallelism, not just the software. This includes disk, memory, front-side bus, etc.
I'm doing highly concurrent projects (grid computing) for my company and we're finding that some things parallelize just fine, but others simply move the pain and bottleneck to a piece of infrastructure that hasn't quite caught up yet.
For example, my laptop has a dual-core 2.2GHz processor, which you'd think is great for development. It's no better than a single-CPU machine because my disk IO light is on all the time. IntelliJ pounds the disk. Maven and Ant pound the disk. Outlook pounds the disk. Even surfing the web puts pages into disk cache, so browsing while building a project is slow. Until I get a SCSI drive, I'm still limited on disk IO, so those extra cores don't help that much.
All the cores are great on the server, though. I've recently completed a massive integration project where I grid-enabled my company's enterprise apps. All those cores running grid nodes is giving us very high throughput. Our next bottleneck is the database (all those extra grid nodes pounding away at another bottleneck resource...)
Terracotta Server as a Message Bus [markturansky.com]. It's been a very interesting project.
Re: (Score:2)
why so much disk I/O? (Score:3, Informative)
However, there's no reason why the web browser needs to ensure that the data hits the disk right away, so it should be just fine sitting in RAM until the disk frees up. Similarly, IntelliJ, Maven, and Ant should be slow the first time but faster later on, since they should be reading from the page cache.
There's no reason for your disk I/O light to be on unless you don't have enough RAM or the disk algori
Re: (Score:2)
I also found that getting faster RAM makes more difference than a faster CPU, which suggests that too many programs have poor cache behavior.
Re: (Score:2)
I've been running this configuration myself for over two years without a hitch.
Your comment is typical Slashdot troll fearmongering.
Virtualization (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Obviously you've never used a Microsoft product before. They aren't exactly stable. The virtualization means that one thing doesn't take down another. Also certain apps don't get along. MSSQL for instance running with anything else gives you headaches. Plus I mentioned running multiple operating systems (Windows Server / Linux) on the same machine.
How to deliver ever improving performance? (Score:2, Interesting)
How does Microsoft sell you new licenses if you don't buy a new computer?
Virtualization at the OS image level only allows you to run multiple different applications. Running more applications at once isn't the primary goal of the average user. They want the application which has the focus of their attention to be slick and fast.
Multicore CPUs do not allow you to run a single application faster. Intel's PC market
Why not sponsor Scala? (Score:2)
Mats
Intel already has some good threading tools (Score:2)
Obviously it would be better if these worked better and were easier to use, but many people are unaware of the tools that are available right now.
Seriously Folks (Score:2)
Re: (Score:2)
I do, and it's not even remotely the same.
Faster CPU's are not the problem (Score:3, Insightful)
Re: (Score:2)
And people tried to make fun of Vista using free RAM for advanced HD caching... Weird how Microsoft was on top of that, and even stranger is the Linux project to mimic the intelligent caching of SuperFetch, all the while Slashdot people were making fun of it, until a few people realized how beneficial it was to overall performance.
BTW HD bottleneck technologies are being looked at more closely, as on a Vista system with I/O priority and intellig
Re: (Score:2)
Don't get me wrong... I've used Power-based machines, etc. I've programmed on them. With the way that x86 is designed, it's pretty much a RISC core with an x86 wrapper, which gives the programmers and compilers a much easier time optimizing, as well as still running fast.
Seriously... what is this x86 hatred that's flying around so muc
Re: (Score:2)
Even using
Re: (Score:2)
There is no such company.
Microsoft originally designed Windows NT to be portable, and ported it to most of the popular architectures, including MIPS, PowerPC and the DEC Alpha (and probably SPARC, though it never made it to market). All of those have died, because they didn't sell well enough to stay on the market. Right now, Windows is available for the
Re:stupid much? (Score:5, Informative)
Intel's been doing that (to some degree) since the Pentium, and they increased it a lot in the Pentium Pro/Pentium II. It works reasonably well up to a point (modern chips typically execute an average of two instructions per clock cycle) but definitely has limits.
Compilers to automatically detect when instructions can be executed in parallel have been around for years. Cray had vectorizing compilers by the late 1970's, and within rather specific limits, they worked perfectly well. Just for example, if you wrote a loop like:
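Something along these lines, say (the array names and the exact operation are just illustrative, picked to match the four-strips-of-64 description that follows):

    /* 256 independent multiplications; a vectorizing compiler can
       strip-mine this into four passes over 64-element vector registers. */
    double a[256], b[256], c[256];

    void scale(void) {
        for (int i = 0; i < 256; i++)
            c[i] = a[i] * b[i];
    }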
they'd break the loop down into four actual executions of a loop, each of which worked on 64 items in parallel. It had independent execution units, so at a given time it'd normally be loading one set of 64 items into one set of registers, executing multiplications on a second set of 64 items, and storing results from a third set of 64 registers.
That has a couple of problems though. First of all, if you're not careful, it's pretty easy to create loops with (apparent) dependencies from one iteration to the next, so the compiler can't parallelize the code. Second, this works well for vector processors, but probably not nearly so well for a large number of completely independent processors (which have higher communication overhead, meaning that starting up things to happen in parallel is more expensive).
If you're willing to provide the compiler with a little help, it can do quite a bit more, such as with MPI. The standard MPI interface is pretty low-level, but if you want to do the job in C++, Boost.MPI helps out quite a bit (cheap plug: if you want to know more, consider attending Boostcon '08 [boostcon.com]).
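To give a flavor of the low-level vs. helped-out difference, here is a minimal Boost.MPI-style sketch. The partial-sum example is mine, not from anyone's real code, so check the Boost.MPI docs for the exact overloads in your version:

    #include <boost/mpi.hpp>
    #include <functional>
    #include <iostream>

    namespace mpi = boost::mpi;

    // Each rank computes a partial sum of 0..999; reduce() combines them on rank 0.
    int main(int argc, char* argv[]) {
        mpi::environment env(argc, argv);
        mpi::communicator world;

        double local = 0.0;
        for (int i = world.rank(); i < 1000; i += world.size())
            local += i;

        double total = 0.0;
        mpi::reduce(world, local, total, std::plus<double>(), 0);

        if (world.rank() == 0)
            std::cout << "sum = " << total << "\n";
        return 0;
    }

Compare that with raw MPI_Init/MPI_Reduce and the explicit MPI_DOUBLE/MPI_SUM bookkeeping, and the "helps out quite a bit" claim is easy to believe.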