Slashdot Log In
Faster Chips Are Leaving Programmers in Their Dust
Posted by
CmdrTaco
on Mon Dec 17, 2007 12:42 PM
from the or-maybe-they've-already-wrapped-around-to-zero dept.
from the or-maybe-they've-already-wrapped-around-to-zero dept.
mlimber writes "The New York Times is running a story about multicore computing and the efforts of Microsoft et al. to try to switch to the new paradigm: "The challenges [of parallel programming] have not dented the enthusiasm for the potential of the new parallel chips at Microsoft, where executives are betting that the arrival of manycore chips — processors with more than eight cores, possible as soon as 2010 — will transform the world of personal computing.... Engineers and computer scientists acknowledge that despite advances in recent decades, the computer industry is still lagging in its ability to write parallel programs." It mirrors what C++ guru and now Microsoft architect Herb Sutter has been saying in articles such as his "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software." Sutter is part of the C++ standards committee that is working hard to make multithreading standard in C++."
Related Stories
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading... please wait.
2005 Called (Score:5, Funny)
Seriously - any developer writing modern desktop or server applications that doesn't know how to do multi-threaded programming effectively deserves to be on EI anyway. It is not that difficult.
Re:2005 Called (Score:5, Insightful)
Parent
Re:2005 Called (Score:4, Interesting)
Not without an infinite number of processors you don't.
Parent
Re:2005 Called (Score:4, Insightful)
Parent
Re:2005 Called (Score:5, Funny)
Parent
Re:2005 Called (Score:5, Funny)
Parent
Re:Threads Are Not the Answer (Score:5, Interesting)
There is, and will always be, overhead associated with parallelization. It may sound great to say "oh, we can farm out parts of this data set to other cores!", but that requires a lot of start-up and tear-down synchronization. It's not at all uncommon for overall performance to be improved by doing something *unrelated* at the same time, requiring less synchronization overhead.
Are threads perfect for everything? No. But calling them the second worse thing to happen to computing is, as best, disingenuous.
Parent
Re:2005 Called (Score:4, Interesting)
Eventually, good parallel algorithm libraries will pop up. That will help some subset of problems. I'd expect frameworks to pop up as well, helping another. But in many cases it just comes down to changing how we write programs.
And you're right, this isn't really a desktop issue- its mainly a server one. Desktops really don't need all the power they have now, perhaps one percent of users outside of gamers actually use it. That doesn't make it any less important of a problem to solve. Although I expect in the end people will still end up disappointed- parallelization is not magic pixie dust, you can only get so much of a speedup. I wouldn't be surprised if those 8 corse only give a 2x speedup over a single core on many apps.
Parent
Re: (Score:3, Insightful)
Re:2005 Called (Score:4, Informative)
it wants its reply back.
The parent is exactly how I would have replied a couple of years ago. I was doing lots of threading work, and I found it easy to the point of being frustrated with other programmers who weren't thinking about threading all of the time.
I was wrong in two ways:
1. It's not that easy to do threading in the most efficient way possible. There's almost always room for improvement in real-world software.
2. There are plenty of programmers who don't write thread-safe/parallel code well (or at all) that are still quite useful in a product development context. Some haven't bothered to learn and some just don't have the head for it. Both types are still useful for getting your work finished, and, if you're responsible for the architecture, you need to think about presenting threading to them in a way that makes it obvious while protecting the ability to reach in and mess with the internals.
The first point is probably the most important. There are several things that programmers will go through on their way to being decent at parallelization. This is in no strict order and this is definitely not a complete list:
- OpenMP: "Okay, I've put a loop in OpenMP, and it's faster. I'm using multiple processors!!! Oh.. wait, there's more?"
Now, to be fair, OpenMP is enough to catch the low-hanging fruit in a lot of software. It's also really easy to try out on your code (and can be controlled at run-time).
- OpenMP 2: "Wait... why isn't it any faster? Wait.. is it slower?"
Are you locking on some object? Did you kill an in-loop stateful optimization to break out into multiple threads? Are you memory bound? Blowing cache? It's time to crack out VTune/CodeAnalyst.
- Traditional threading constructs (mutices, semaphores): "Hey, sweet. I just lock around this important data and we're threadsafe."
This is also often enough in current software. A critical section (or mutex) protecting some critical data solves the crashing problem, but it injects the lock-contention problem. It can also add the cost of round-tripping to the kernel, thus making some code slower.
- Transactional data structures: "Awesome. I've cracked the concurrency problem completely."
Transactional mechanisms are great, and they solve the larger data problem with the skill and cleanliness of an interlocked pointer exchange. Still, there are some issues. Does the naive approach cleanly handle overlapping threads stomping on each-others' write-changes? If so, does it do it without making life hell for the code changing the data? Does the copy/allocation/write strategy save you enough time through parallelism to make back its overhead?
Should you just go back to a critical section for this code? Should you just go back to OpenMP? Should you just go back to single-threading for this section of code? (not a joke)
Perhaps as processors get faster by core-scaling instead of clock-scaling this will become less of a dilemma, but to say that "[to do multi-threaded programming effectively] is not that difficult" is akin to writing your first ray-tracer and saying that 3D is "not that difficult." Somtimes it is. At least at this point there are places where threading effectively is a delicate dance that not every developer need think about for a team to produce solid multi-threaded software.
That doesn't mean that I object to threading being a more tightly-integrated part of the language, of course.
Parent
Re:2005 Called (Score:5, Informative)
That holds only for multithreaded programming on a single core. As soon as there are multiple cores available, processing does, in fact, happen simultaneously.
Parent
Re:2005 Called (Score:4, Insightful)
On modern systems, threads are themselves first-class constructs, and it runs somewhat like this:
A process has things like memory-tables for virtual memory, handles for objects, files, socket connections, etc. A process always contains at least one thread (this isn't always true while a process is being set up or torn down, but it's true when most anyone's code is running).
A thread generally has a stack (in the host-process's virtual address space, so everyone can read it), some thread-local storage to make life easier for some api's (you don't need to care about this in most cases), and lives in a process. This means that threads can use virtual addresses for memory interchangeably with other threads in the same process.
Additionally, some operating systems support fibers. A fiber is like a thread except that it has to be explicitly or cooperatively (not quite the same thing) multi-tasked. Fibers use even less memory than threads, and you really don't have to care about them.
When you're in, say, Visual Studio, there's a "threads" window for all of the threads of the process that you are debugging. You can end up stepping through code on one thread while other threads are running.
The modern hardware designs lead to interesting performance side-effects from cache location and memory location. It's not quite as hard as systems that have asymmetric access to resources (e.g. Playstation 2), but it makes for fun work.
Parent
M$ programmers should be already capable (Score:5, Funny)
hhooppee tthheeyy ffiixx tthhiiss ssoooonn (Score:5, Funny)
Re:hhooppee tthheeyy ffiixx tthhiiss ssoooonn (Score:5, Funny)
Parent
YOUR eyes?! (Score:3, Funny)
Re: (Score:3, Interesting)
OS/2? (Score:5, Interesting)
Does anybody remember DeScribe?
Re:OS/2? (Score:4, Insightful)
I'm of the opposite opinion; it's a shame that so many people equate parallel processing with threads. When there's not much shared data, using multiple processes keeps memory protection between your parallel "things", decreasing coupling, increasing isolation, and generally resulting in a more stable system (and for certain things where you can avoid some cache coherency problems, a faster system). Your example is perfect; there's really no good reason to use a thread for such lookups. Another process would do, or even better just use select() and avoid all the pain (and bugs) of a multithreaded solution.
OS developers spent a lot of engineering time implementing protected memory. Threads throw out a huge portion of that; a good programmer won't do that without very good reasons. Some tasks, where there really are tons of complicated data structures to be shared, are good candidates for threading. More commonly, though, threads are used either because the programmer doesn't know any better or because they allow you to be a slacker about defining exactly what is shared and mediating access to it. The latter is especially dangerous; defining exactly what (and how) things are shared goes most of the way toward eliminating multiprocessing bugs, and threads make it easy to slack off on that and get a "mostly working" solution that occasionally deadlocks, fails to scale, etc.
Use processes or state machines when you can, and threads when you must.
Parent
Re:OS/2? (Score:4, Interesting)
The big problem is not the operating system designers, it's the CPU designers. They integrated two orthogonal concepts, protection and translation, into the same mechanism (page tables, segment tables, etc). The operating system wants to do translation so it can implement virtual memory. The userspace program wants to do protection so it can use parallel contexts efficiently. Mondrian memory protection would fix this, but no one has implemented it in a commercial microprocessor (to my knowledge).
Parent
Re:OS/2? (Score:4, Insightful)
Parent
Thank god (Score:5, Funny)
Guess I had it coming.
Re:Thank god (Score:5, Informative)
The only significant thing that managed languages make easier with regard to multithreading other than a more intuitive API is garbage collection so that you don't have to worry about using reference counting when passing pointers between multiple threads.
All of the same challenges that exist in C/C++ such as deadly embrace and dining philosophers still exist in managed languages and require the developer to be trained in multi-threaded programming.
Some things can be more difficult to implement like semaphores. You also have to be careful about what asynchronous methods and events you invoke because those get queued up on the thread pool and it has a max count.
I would say managed languages are "easier" to use but to be used effectively you still have to understand the fundamental concepts of multithreaded programming and what's going on underneath the hood of your runtime environment.
Parent
Re: (Score:3, Informative)
I know how to do threading in C++ on every platform I have used for development. It's just that the modern languages have elegant system with forethought given to threading while desinging the platform/language. Why would anyone new want to learn how to do clumsy non-standard threading in C++?
I think the options are to adapt or continue riding the dinosaur untill they die out and be left behind. Sorry that I am sending mixed singnals. I have always worked
The basic problem (Score:5, Insightful)
So far, multiple cores have boosted performance mostly because the typical user has multiple applications running at a time. But as the number of cores increases, the beneficial effects diminish dramatically.
In addition, most applications these days are not CPU bound. Having eight cores doesn't help you much when three are waiting on socket calls, four are waiting on disk access calls and the last is waiting for the graphics card.
Re:The basic problem (Score:4, Insightful)
They diminish, but they never disappear. Even in algorithms where you completely have to wait the results of previous computation to go on, you can still get a speedup with branch prediction. In essence, while your one core is cracking the numbers, other cores do the what if work, and even if you mispredict in lots of cases, you can still get speedups with large datasets, because in some cases, when your first core comes up with a result, you will discover that the what if computation started out with a right guess.
Hey, i hear they are doing essentially the same stuff with all those newfangled multiscalar processors and branch prediction anyway.
Parent
Personal computing? (Score:5, Interesting)
Exactly what areas of "personal computing" are requiring this horsepower? The only two that come to mind are games and encoding video. The video encoding part is already covered - that scales nicely to multiple threads, and even free encoders will use the extra cores to their full potential. That leaves gaming, which is basically proprietary. The game engine must be designed so that AI, physics, and other CPU-bound algorithms can be executed in parallel. This has already been addressed.
So this begs the question, exactly how will average consumer benefit from an OS and software that can make optimum use of multiple cores, when the performance issues users complain about are not even CPU-bound in the first place?
Dan East
Re: (Score:3, Insightful)
AOL 10.0 will say "You got mail!"
Re: (Score:3, Insightful)
There are plenty of tasks that people do routinely on computers that are not "instantaneously" fast (spreadsheets, photo-editing, etc.). Furthermore there are many aspects of modern user interfaces that would be better if they were faster (generating thumbnail previews, sorting entries, rescanning music collections, searching,
Sameless Plug: Qt 4.4 (Score:5, Informative)
The new Qt4.4 (due 1Q2008) has QtConcurrent [trolltech.com], a set of classes that make multi-core processing trivial.
From the docs:
The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores. Programs written with QtConcurrent automaticallly adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future.
QtConcurrent includes functional programming style APIs for parallel list prosessing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications:
* QtConcurrent::map() applies a function to every item in a container, modifying the items in-place.
* QtConcurrent::mapped() is like map(), except that it returns a new container with the modifications.
* QtConcurrent::mappedReduced() is like mapped(), except that the modified results are reduced or folded into a single result.
* QtConcurrent::filter() removes all items from a container based on the result of a filter function.
* QtConcurrent::filtered() is like filter(), except that it returns a new container with the filtered results.
* QtConcurrent::filteredReduced() is like filtered(), except that the filtered results are reduced or folded into a single result.
* QtConcurrent::run() runs a function in another thread.
* QFuture represents the result of an asynchronous computation.
* QFutureIterator allows iterating through results available via QFuture.
* QFutureWatcher allows monitoring a QFuture using signals-and-slots.
* QFutureSynchronizer is a convenience class that automatically synchronizes several QFutures.
* QRunnable is an abstract class representing a runnable object.
* QThreadPool manages a pool of threads that run QRunnable objects.
This makes multi-core programming almost a no-brainer.
Evolution that halted at 4 ghz.... (Score:3, Interesting)
Re:Evolution that halted at 4 ghz.... (Score:5, Informative)
I have for over 6 years been thinking..of a 3d-dimmension processor that cross communicates over a diagonal matrix instead of the traditional serial and parallel communication model.
Six years, and you haven't discovered all the machines built to try that? This was a hot idea in the 1980s. Hypercubes, connection machines, and even perfect shuffle machines work something like that. There's a long history of multidimensional interconnect schemes. Some of them even work.
Parent
Re:Evolution that halted at 4 ghz.... (Score:5, Funny)
Parent
And so it goes...... (Score:5, Insightful)
Translation:
Code will get even more inefficient / bloated and require faster hardware to do the same thing you are doing now. While I'm all for better / faster computer hardware, most if not all Jane and Joe Sixpack users never need Super Computer power to surf the net, read e-mail and watch videos.
Erlang (Score:5, Informative)
Oddly enough, I just watched a presentation about this very topic, with an emphasis on Erlang [erlang.org]'s model for concurrency. The slides are available here:
http://www.algorithm.com.au/downloads/talks/Concurrency-and-Erlang-LCA2007-andrep.pdf [algorithm.com.au]
The presentation itself (OGG Theora video available here [linux.org.au]) included an interesting quote from Tim Sweeney, creator of the Unreal Engine: "Shared state concurrency is hopelessly intractable."
The point expounded upon in the presentation is that when you have thousands of mutable objects, say in a video game, that are updated many times per second, and each of which touches 5-10 other objects, manual synchronization is hopelessly useless. And if Tim Sweeney thinks it's an intractable problem, what hope is there for us mere mortals?
The rest of this presentation served as an introduction to the Erlang model of concurrency, wherein lightweight threads have no shared state between them. Rather, thread communication is performed by an asynchronous, nothing-shared message passing system. Erlang was created by Ericsson and has been used to create a variety of highly scalable industrial applications, as well as more familiar programs such as the ejabberd Jabber daemon.
This type of concurrency really looks to be the way forward to efficient utilization of multi-core systems, and I encourage everyone to at least play with Erlang a little to gain some perspective on this style of programming.
For a stylish introduction to the language from our Swedish friends, be sure to check out Erlang: The Movie [google.com].
Threads considered harmful (Score:5, Interesting)
This has been coming for a while. (Score:4, Interesting)
There once was time when debugging was part of your job. Now; someone else does that and at most, the better coders do some unit testing to ensure their code snippet does what it is supposed to. There generally isn't any "standard" with regard to processes except in some houses that follow *recommended coding guidelines* but these are few and far between. Old school coders had a process in mind to fit a project as a whole and could see the end running program. Many times now, you are to code an algorithm without any regard or concept as to how it might be used. A lot of strange stuff going on out there in the business world with this!
If there is a fundamental change in the base for C++, et al., this is going to possibly have a detrimental effect on the employment market as there will be many who cannot conceptualize multi-threading methodologies much less modeling some existing processing in this paradigm; and leave the markets.
I left the programming markets because of the clash of bean counters vs quality, and maybe this will have a telling change in that curve. I always did enjoy some coding over the years and maybe this would make an interesting re-introduction. I have personally not coded in a multi-threading project but have the concepts down. Might be fun!
There's not much hope for the C++ committee (Score:4, Insightful)
I have little hope for the C++ standards committee. It's dominated by people who think really l33t templates are really cool. Everything has to be a template feature. They're fooling around with a proposal for declaring variables atomic through something like atomic<int> n; This allows really l33t programmers to write really l33t code using really l33t lockless programming. But without the proofs of correctness needed to make that actually work reliably.
It's also long been Strostrup's position that concurrency is a library problem. As long as the OS provides threads and locking, it's not a language problem. This isn't good enough.
The fundamental problem is that, as currently defined, a C++ compiler has no idea which variables are shared between threads, and which are never shared. The compiler has no notion of critical sections. Fixing this requires some fundamental changes to the language. It's known what to do; Modula, Ada, and Java all have synchronization and isolation built into the language. But there's nothing like that in C++, and the designers of C++ don't want to admit their mistakes.
It's not just a C++ problem. Python has a similar issue. Python as a language doesn't deal with concurrency adequately. The main implementation, CPython, has a "global interpreter lock" that slows the thing down to single-CPU speed.
The rise of Erlang and Haskell? (Score:3, Interesting)
Will the new world of concurrency cause a shift in language popularity? Or will traditional languages remain more popular, perhaps with some enhancements? C++ is gaining concurrency enhancements; C++, Python, and many other languages work well with map/reduce systems like Google MapReduce; and even with no enhancements to the language, you can decompose larger systems into multiple threads or multiple processes to better harness concurrency.
If you know Haskell and Erlang, please comment: do those languages bring enough power or convenience for concurrency that they will rise in popularity? People grow very attached to their familiar languages and tools; to displace the entrenched languages, alternative languages need to not just be better, they need to be a lot better.
steveha
Chip makers want developers to pay for lunch (Score:3, Interesting)
Without the work of developers, multi-core chips will be like the extra transistors in transistor radios in the 1960s: good for marketing but functionally useless.
Microsofts view on cores (Score:4, Funny)
Core one: For the OS
Core two: Anti-virus
Core three: Anti-Spyware / Windows Defender
Core four: Firewall
Core five: Windows update notifications and installations
Core six: Windows Genuine advantage checks
Core seven: Eye Candy (Vista) with XP you get a bonus CPU
Core eight: What ever the user wants to run, except when you get a virus, then
you have to share it with the SPAM bot.
Guess we will be waiting for 16 core CPU's.
Oh and don't start me on memory requirements
Re: (Score:3, Insightful)
Huhhh?
My guess is that you never wrote any code.
Linux doesn't do any more heavy lifting for you than Windows does. I doubt that OS/X does.
So what are you talking about.
An OS will never figure out what part of your program is going to need to be in which thread. A compiler MAY at som
Re: (Score:3, Informative)
Re: (Score:3, Interesting)
There is a great talk by Bjarne Stroustrup (http://csclub.uwaterloo.ca/media/C++0x%20-%20An%20Overview.htm [uwaterloo.ca]
LabVIEW [& other graphical environments] (Score:3, Interesting)
my current major language (Igor pro) will use all the cores automatically, and how many languages do multithread this way? Matlab(?), Octave(?)
LabVIEW, by its very nature [which is graphical - based on "G" - the "Graphical" programming language] is kinda/sorta topologically self-threading: If a piece of LabVIEW code sits off in its own connected component, then [more or less] it gets its own thread.
Of course, all your ".h" & ".c" [or ".cc"] files [& their innards] might very well break down int
Re:Oh, wow (Score:5, Insightful)
Actually, according to the latest Dr Dobbs, Herb is the *chair* of the ISO C++ Standards committee. (He had an article on lock hierarchies being used to avoid deadlock)
He's really going to know what he's talking about, then.
As chair of the committee, I'd say there's a pretty fair chance that he *does*.
I really love people who bash things just because Microsoft is involved. Contrary to what seems to be a popular belief here, they have some incredibly intelligent people who are very good at what they do there.
Parent
Re: (Score:3, Insightful)
Thank God for that.
I'm glad that coders today can use high-level tools and languages without having to spend half their time on performance tweaking.
Take as an example a game like Halo (or Guitar Hero, or World of Warcraft, or whatever your favorite modern game is). If the developers of these
Re:Wait for the new C++ standard before you switch (Score:3, Informative)
Re:Diaspora (Score:4, Insightful)
Making code easy to read and maintain is critical to maximizing the efficiency of the programmer. The efficiency of the code is generally a secondary issue, and is only a factor if the code in question is found to be a bottleneck.
Brian Kernighan once said,
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"
Parent
Re:C++? (Score:5, Informative)
Parent