Technology

Fundamentals Of Multithreading (122 comments)

Bob Moore writes "SystemLogic has got a very thorough article on multithreading. It deals with Amdahl's Law, Latencies and Bandwidth, On-Chip Multiprocessing, Coarse-Grained Multithreading, Fine-Grained Multithreading, Simultaneous Multithreading, and Applications Of Multithreading. This is definitely a good one."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    You might want to have a look at Sys Admin's performance tests, which benchmark several operating systems on threaded, asynchronous, and process-based SMP. It is an excellent read [sysadminmag.com].
  • by Anonymous Coward
    This has probably been hashed out before, but why can't Andover cache these pages a la Google? They've obviously got the bandwidth to sustain such a spike, whereas nearly everyone they link to (besides msnbc of course 8) ) generally isn't expecting it.

    It actually seems quite self-defeating to spend the time to dig up this obscure gem, only to pummel the server so hard that no one can see it anyway.
  • by Anonymous Coward
    -so I wonder if google's gotta get permission?
    I guess I can see where they're coming from. I'm too lazy right now to check Google's FAQ to find out, so I'm gonna just shoot the question out there and hope the answer comes to me instead.
    So what about malicious use of the SDE? -could get ugly.
  • by Anonymous Coward
    Pointers, explicit memory allocation, anything!

    Your chosen selection isn't a very fair one. Try "while loop, functional decomposition..."

    Sure, all programming constructs carry some danger, but they don't all carry the same amount. The term "fragile" applies well to all your examples - threads, pointers, or explicit memory allocation each not only have an immediate cost but also make the whole program a bit harder to work on once they've been put into one part of it. Their "badness" consists of having an illusion of ease. A while loop may also cause problems, but those problems are more limited in scope.

    Joe Solbrig

  • Well, the article's been /.ed. Having done a moderate amount of multi-threaded programming, my main complaint with the designs I've seen is that they often equate a conceptual task with a thread. This results in apps that bloat into fifty threads when all they need is two. Indeed, lots of apps don't need multiple threads at all. Threads are costly in terms of the time spent switching tasks and in terms of the mental energy required to consider the different paths of execution. Joe Solbrig
  • by Anonymous Coward on Friday June 15, 2001 @11:54AM (#148395)
    We must ban multi-threading, as it is evil, and causes our poor child processes to become corrupt. Do we really want our child processes to go around crashing all the time?


    WON'T SOMEBODY PLEASE THINK OF THE CHILD PROCESSES!!!!

  • by Anonymous Coward on Friday June 15, 2001 @09:24AM (#148396)

    I am the owner of SystemLogic.net, just to let you know, we officially got /.'ed. I'm trying to log in to the site now to take off a couple scripts in order to lighten the load but I can't get in.

    Just letting you guys know I'm working on it.

  • by lars ( 72 )
    Maybe you should read more carefully before you're so quick to resort to insults next time. That doesn't make for a good impression.

    I am pretty sure I know exactly what your point was. An explicitly implemented state machine (so I said finite automaton - sue me) is indeed usually an ugly design. I make no bones about that. My point is that coroutines were invented to solve exactly this problem. You do not need the added complexity of threads. Simple coroutines also have the advantage of being easy to simulate, without the need for a true threading system - for example, using macros or function objects in C/C++. Coroutines would also normally be implemented at user level, so you avoid polluting the kernel's scheduler with a bunch of threads that behave as one thread.

    Maybe you should learn a little more about concurrent programming instead of preaching about it here if you don't know what coroutines are.
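    To make that concrete, here's a minimal sketch (mine, not from the article or from Buhr's book) of the macro trick in plain C: a switch-based coroutine that resumes where it left off, so there is no hand-written state-transition table.

        /* A minimal sketch (mine) of the switch-based macro trick: a coroutine in
           plain C that resumes where it left off, with no explicit FSM table. */
        #include <stdio.h>

        #define CR_BEGIN(state)     switch (state) { case 0:
        #define CR_YIELD(state, v)  do { state = __LINE__; return (v); case __LINE__:; } while (0)
        #define CR_END()            }

        /* A tiny generator that yields 1, 2, 3 across calls; its "state" is just
           the saved case label plus the static loop counter. */
        static int next_value(void) {
            static int state = 0;
            static int i;
            CR_BEGIN(state);
            for (i = 1; i <= 3; i++)
                CR_YIELD(state, i);
            CR_END();
            return -1;                      /* exhausted */
        }

        int main(void) {
            int v;
            while ((v = next_value()) != -1)
                printf("%d\n", v);
            return 0;
        }

    Each call to next_value() picks up right after the last CR_YIELD - the "state" is just a saved case label - which is the same convenience fibers or real coroutines give you, only done with the preprocessor.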
  • by lars ( 72 )
    Unless you have multi-processor hardware, threads don't increase the raw processing power you have to work with. If you have multiprocessor hardware, you don't need more threads than the number of processors to take full advantage of the CPUs.

    Strictly speaking, you are right - multiple threads don't increase the maximum capabilities of your computer. However, I hope you realize that in practice having multiple threads is absolutely essential to get the best performance. Without multiple threads, the only way the second sentence above holds true is if you have complete control of the hardware at the lowest level (your application is the OS), or your application NEVER has any processing it could easily do while blocked waiting for I/O operations to complete, or it is completely CPU intensive and never blocks in the first place. For typical applications, though, especially anything interactive, it is simply IMPOSSIBLE to achieve the same kind of throughput with a single-threaded architecture as what you would get with an appropriately designed multithreaded one.

    I do agree with you that a lot of developers often use threads as if they are each going to be executed on their own CPU. Of course this is wrong. The previous poster's example of using synchronized threads to implement a finite automaton with implicit state management was particularly appalling. All that is needed for that is single-threaded coroutines. Introducing the non-determinism of multiple threads is a recipe for disaster if you don't know what you're doing. One synchronization bug that never shows up on your single-CPU development box is all it takes.

    Anyway, I see what you're getting at, but your comment that you never need more than a single thread for a single CPU is misleading, as it isn't true in many cases. In your own example, you are using multiple threads (with good reason) on what is presumably a single CPU.
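    To illustrate the overlap (a minimal sketch of my own, not from the article; the volatile flag is a simplification - real code would use a lock or an atomic):

        /* Sketch: the main thread blocks on I/O while a worker keeps doing CPU
           work, which is exactly what a single-threaded design gives up.
           Compile with -pthread. */
        #include <pthread.h>
        #include <stdio.h>

        static volatile int done = 0;          /* set by main when input runs out */
        static unsigned long iterations = 0;   /* written only by the worker */

        static void *crunch(void *arg) {
            (void)arg;
            while (!done)                      /* keep computing while main is blocked */
                iterations++;
            return NULL;
        }

        int main(void) {
            pthread_t worker;
            char line[256];

            pthread_create(&worker, NULL, crunch, NULL);

            /* Main thread blocks here on I/O; the worker keeps running. */
            while (fgets(line, sizeof line, stdin) != NULL)
                fputs(line, stdout);

            done = 1;
            pthread_join(worker, NULL);
            printf("worker ran %lu iterations while main was blocked\n", iterations);
            return 0;
        }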

  • by lars ( 72 )
    What is your problem? Seriously, I am not sure where all of the contempt and vitriol comes from.

    First, I was not talking about overlapped I/O (otherwise known as asynchronous or non-blocking I/O in most circles). You've completely misinterpreted my original post. See below.

    Second, fibers in NT/2000 are NOT coroutines. They are similar, but they are not the same thing. You could probably turn them into true coroutines with a global variable and some preprocessor magic, but I haven't tried. I also know nothing about your specific scenario; I am just going by what you posted - namely, that you wanted to implement a FSM without explicitly maintaining state. That is exactly one of the problems coroutines were invented to solve. In practice you don't see them often, which is mainly because most programmers simply don't know a lot about concurrent programming. Few schools have entire courses on the subject like UW does. There are always other solutions to the problem you mentioned - e.g. using certain OO design patterns - which are often sufficient.

    Since you apparently aren't interested in having a civilized conversation in this forum, I invite you to email me. Maybe things will work better through that medium. I can even give you examples of specific products I've worked on in the industry where coroutines were used. In fact, consider that a challenge. I kind of doubt you will take me up on this, as looking at your posting history shows you to be somewhat of a troll.
  • by lars ( 72 ) on Friday June 15, 2001 @02:28PM (#148400)
    Check out Peter Buhr's excellent book (I don't think it's ever actually been published, but it's used in the undergraduate "Control Structures" course at Waterloo):

    ftp://plg.uwaterloo.ca/pub/uSystem/uC++book.ps.gz [uwaterloo.ca]

    It approaches the subject from a more theoretical rather than applied point of view, but if you understand all of the concepts in this book you will have a better working knowledge of concurrent programming than 99% of the programmers in the industry!

  • And, it looks like you did a good job sorting it out.

    Great site, and a great service. Thank you!
  • This is a sweet story and a lot better than the usual rants. I agree.
  • Check http://www.googlebot.com/bot.html [googlebot.com], the link they keep leaving in my webserver logs. To make a long story short, they do support a "noarchive" option in the "robots" META tag that tells them not to cache your pages, but it's an opt-out rather than opt-in thing, that is, they cache you unless you tell them not to.

    --

  • The problem is, that advice is completely useless. There are usually NO tools that you NEED to use -- for any proposed tool, there is usually an alternate solution that uses different tools. But you have to use SOME tools, so ordinarily you end up using tools that you didn't, strictly speaking, need to use. The question you have to answer is which tools to use, given that you must use some tools but there are no particular ones you need. Under these circumstances (which is about 99% of the time), you simply can't follow that advice (only use what you need); you must in fact decide to use tools you could have solved the problem without.

    --

  • by bjb ( 3050 ) on Friday June 15, 2001 @09:28AM (#148405) Homepage Journal
    I can't tell you how many people I've worked with over the years who either have no idea how to write multithreaded programs, or claim that they can but in practice end up writing code that isn't thread-safe.

    I can say that as a programmer, my value is significantly increased by being proficient in multithreaded programming (beyond Java, FWIW). If this article sparks any interest in people, do read further and practice, practice, practice! The people that I've interviewed in the past who have a strong working knowledge of multithreading get a lot of points in my book, and I'm sure other "aware" employers do the same.

    Keep in mind, however, that just knowing how to launch a thread isn't enough. If your code isn't reentrant and thread safe, launching a thread isn't worth a damn.

    Good article... now if I could only get to it ;-)

    --

    I'm writing code mostly in QNX, and I do multi-thread; I had big problems with some parts of their TCP/IP stack not being thread-safe :-(
    But when you do multi-threading well, I can assure you performance increases.
    Now with QNX RTP using SMP I guess it'll be better, maybe not like BeOS, wait and see...
    --
  • Imagine you are writing a server, like say, Apache. So, this server has to handle, oh, about a thousand requests at the same time. Please tell me what other way to solve this than with threads.

    The obvious answer is to use another process, and to use some fancy IPC mechanism to communicate. However, having the process context takes up an enormous amount of memory; having more threads generally only means another stack. Further, spawning threads is an order of magnitude faster than spawning processes.

    Another example is a GUI. Imagine you have an image processing application. This application has a particular filter which takes ten minutes to execute. Using an IPC mechanism, the signals or events or whatever from the windowing system will get buffered for ten minutes. People get impatient if a system doesn't respond to mouse clicks in about a third of a second. If you used threads, then the GUI code would never, ever have to wait.

    It comes down to performance. Some applications don't exist in an environment where performance is critical. A surprising number of them, however, do.

    By the way, your example of recursion has it backwards. Using processes and IPC is like using recursion. Threading is the iterative way.
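    Going back to the server case, here's a minimal thread-per-connection sketch (my own illustration, not Apache's actual design; error handling is mostly omitted):

        /* Sketch: each accepted socket gets its own detached thread, so one slow
           client never blocks the others.  Compile with -pthread. */
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <pthread.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <unistd.h>

        static void *handle_client(void *arg) {
            int fd = *(int *)arg;
            free(arg);
            char buf[512];
            ssize_t n;
            while ((n = read(fd, buf, sizeof buf)) > 0)   /* echo until the client closes */
                write(fd, buf, (size_t)n);
            close(fd);
            return NULL;
        }

        int main(void) {
            int listener = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;
            memset(&addr, 0, sizeof addr);
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(7777);

            bind(listener, (struct sockaddr *)&addr, sizeof addr);
            listen(listener, 128);

            for (;;) {
                int *fd = malloc(sizeof *fd);             /* heap copy so the thread owns it */
                *fd = accept(listener, NULL, NULL);
                pthread_t tid;
                pthread_create(&tid, NULL, handle_client, fd);
                pthread_detach(tid);                      /* the thread cleans up after itself */
            }
        }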

  • It's not an article on Slashdot. It's just a link to somebody else's article.
  • Some interesting papers on the design of the Cray (nee Tera) MTA (multi-threaded architecture) machine are here [tera.com]
  • by Doctor Memory ( 6336 ) on Friday June 15, 2001 @09:41AM (#148410)
    Is this, like, during midterms, when you're trying to study for all your classes at once?
  • I wouldn't call it flamebait, but rather very topical and insightful humor given the situation. I got a chuckle out of it myself.

  • Totally disagree, in that multithreading often simplifies the flow of execution in software, making maintaining and debugging the code much, much easier. If given the choice between a state-machine-type code architecture and one that spawns a thread that performs some synchronous task in a linear fashion, I will most certainly choose the latter. At the very least, NT/2000 provides fibers (a lightweight, user-controlled version of threading), which largely eliminates synchronization issues (because the user is in control) but still allows for multiple threads of execution.

    There are several messages regarding this article claiming that multithreading is wrong philosophically or architecturally, and I think that is a gross simplification. There are a lot of people who use threads when they are not appropriate, but conversely there are certainly a lot of people who don't use threads when they would be appropriate.

  • The previous poster's example of using synchronized threads to implement a finite automaton with implicit state management was particularly appalling.

    Perhaps we have a bit of a terminology, oh master, however obviously you don't have the slightest clue what I was talking about. Thank you for coming out though.

    P.S. Your great example of when to use threading is called overlapped I/O: Just about every modern OS has it. Pretty piss poor example of when to use threading.

  • Ughhh..make that terminology difference. I gotta start previewing.

    In any case my point (that Mac Daddyo Waterloo man apparently dissed) was that I have seen numerous STATE MACHINE designs where people hold the state of various operations and sit in a loop checking the various operation status flags, calling off to functions to transition states when relevant. Debugging such systems is often convoluted and fraught with errors, and the state machine system is just like a cooperative Windows task scheduler: such designs often lend themselves to multithreading (wanko I'm sure is going to pipe in "WHAT ABOUT SYNCHRONIZATION!": no shit idiot, I think that's a given when discussing multithreading). Blegh.

  • Maybe you should learn a little more about concurrent programming instead of preaching about it here if you don't know what coroutines are.

    Wow! For someone who preaches that one should use threads for simple overlapped I/O, it's quite astounding seeing you giving lessons and actually referring to my comment as "appalling" when it is still painfully obvious that you don't have the slightest clue what I was talking about. P.S. A "coroutine", known in NT/2000 land as a fiber, is a thread that doesn't have OS scheduling, and it is grossly inappropriate for the scenario I laid out. I'm sure you know that though from your extensive analysis and clear mastery of concurrent programming (especially by your demonstrated mastery of the term "coroutine", which is a term which only has relevance for a small number of languages on a small number of platforms. Genius! Good old ivory towers always keep us laughing). I presume it is, however, what you happened to learn about in this week's "How to Program 101", so you felt the need to run to Slashdot to demonstrate your great wisdom.

    When they discuss mutexes and standard synchronization objects I presume you'll be back to extol your wisdom wherever possible.

  • While it is extremely well written and very informative and interesting, I have a feeling that a lot of developers will read it through expecting information that it is not providing. This article is, at least from my interpretation, largely analyzing the designs of various processors/hardware platforms. It is not (again IMHO) discussing software-development multithreading techniques, so if you're looking to it for information on how to pthread your application, or the pitfalls of multithreading, or whether a state machine is more efficient than synchronous threads, you won't find what you're looking for. Do read the intro page about RC5 & S@H, though, as that is fascinating, even if it applies primarily to the constraints of the hardware system.

    Still very interesting though.

  • Based on the description of the article, I looked up some things. What can I say? Somebody modded me down, so I'm at 49, and I'm incomplete without that karma point.

    Amdahl's law [ameslab.gov]

    Amdahl's law [wlu.edu]

    On chip multiprocessing [sun.com]

    Simultaneous multithreading [washington.edu]

  • Don't bother reading that article unless you are interested in using Lyris' closed-source SPAM-generating software. The sum total of their "benchmark" consists of SPAMming 200,000 copies of the same message to a list of email addresses. They didn't know the first thing about configuring their Unix systems (bumping up the maximum number of file descriptors--the only tuning they did--does NOT increase the maximum number of simultaneous connections, and it was clear that FreeBSD especially suffered from resource starvation as a result of misconfiguration). They didn't actually compare the various paradigms (single/multiple thread, single/multiple process, async/sync I/O). No, they just tell you which OS, naively configured, pumps out SPAM the fastest.

    This is not the place to go to see how threaded/non-threaded solutions compare. It's a thinly-disguised commercial for a "bulk email" product.

    -Ed
  • 'Tis true, bulk emailers don't necessarily need to be used for SPAM, so perhaps someone felt that was "flamebait." Ordinarily, I'd have cut Lyris some slack, but it seemed pretty dishonest that their CEO and Technical Manager would write an article comparing the performance of their product on several OSes and claim it was a useful measure of OS performance. That's a clear conflict of interest, and pushes the benchmark fallacy one step further; benchmarks at least can claim to be somewhat application-neutral (which is why they should be taken with a large grain of salt in predicting the performance of any particular application). This "benchmark" measures just one thing, and that's the performance of a particular product on four (nearly) untuned OSes.

    That's all. The stuff on multithreading/multitasking/async was pretty much fluff, generated via undocumented microbenchmarks plus variants of their application and reported using a few vague percentages.

    -Ed
  • It's like a multi-cpu machine, except instead of only sharing the bus, and possibly the L2 cache, it shares the L1 cache, the TLB, and a few other things. Also, the second CPU is on the same fleck of silicon as the first.


  • I can't agree with this more!

    A better analogy is multithreading and recursion. Just because you CAN use recursion to solve a problem, doesn't mean you should. An iterative solution is (almost?) always faster and simpler, and often more elegant.

    Just because you can make your application multi-threaded, doesn't mean you necessarily should. Ask yourself what problem are you solving with threads that you can't do otherwise (besides quenching your thirst for pain and suffering)?

    I've talked to programmers who think that making an application multi-threaded will, in and of itself, make the application "better." When pressed for reasons they usually end up shaking their heads muttering something about multiple CPUs or modularized code.
  • Da! Yes! I agree!!

    Seriously, when I first saw the article, I thought exactly the same thing. I don't remember the last time /. has had a technical article of this level, and I like it. THIS is the reason that I started reading /. in the first place.

    Of course, since the article was posted on /., I had a hard time reading it, but if it wasn't on /. I wouldn't have known about it at all, so...

  • This article is very informative. As a developer who uses threads every day, I learned a lot from it. This is the kind of article that helps you create software that wastes very few resources, unlike most software written today.

    Looks like the site has been slashdotted! Hopefully someone has mirrored it otherwise lots of people will not read it!!
  • First off, you posted at 12:16 PM, the article was posted at 11:57 am. That gives you a maximum of 19 minutes of reading time.

    I read quite a few of the sections that interested me. A couple of sections I skimmed over, leaving them for later reading, and I found many of the points noted in the article very informative.

    Second, how do you know its slashdotted? If you read the article, then you read it already. Why are you going back?

    I printed the article as soon as I saw it; this kind of article is something I jump at. However, when recommending it to a friend, he told me he couldn't reach it. I tried to reach it again, and the site was only responding with a MySQL error.

    I think you are a troll. Other opinions? Anyone?

    You are entitled to your own opinion, but from what I have seen most trolls go and post "first post", "goatse" and those kind of non-intelligent posts. I am sorry if I have appeared as such, I meant to appear to be advocating a good read of this article.

  • There are demonstrably better abstractions for almost all problems that threads can solve. Co-routines

    IMHO, code that uses co-routines rather than threads for virtual concurrency looks a lot uglier and is therefore harder to read and understand. It may reduce the number of places in the code where control can switch from one place to another, and it is potentially easier to debug, but if you write good code then it shouldn't matter where and when the control switches occur. Updating co-routine-based code in the future is a lot harder and can be more error-prone than updating multithreaded code.

  • I feel that being able to multithread code effectively in Java would make a programmer advanced in that topic.

    Here's the best book on the subject:

    Doug Lea, Concurrent Programming in Java. Second Edition: Design Principles and Patterns


    book home page [aw.com]

    author home page [oswego.edu] (pointer to online supplement for the book [oswego.edu])

    at Fatbrain [fatbrain.com]

    at Amazon [amazon.com]
  • It's worth noting that Sun's MAJC architecture already has an implementation that is commercially available (MAJC 5200) and employs several of the multithreading strategies outlined in the article (including CMP and CMT).

    Sun has an excellent in-depth explanation of those multithreading technologies in its whitepaper [sun.com].
  • ...it's "beeotch". Masculine (who'd have thought), 2nd declension.
  • Fear: When you see B8 00 4C CD 21 and know what it means

    Silly rabbit. That's a mov. Now the tricky part is whether you're in 16-bit mode or 32-bit mode. You're loading 0x21cd4c into a register. I don't know what's so special about that number though.

    I can pretty much tell what kernel you're running by looking at the first 200 bytes or so. (They change the boot code every major revision.)

    Here's one for you: fc fa

  • the difference between programming and multi threaded programming is that, of all people programming only about 20% have the skill required to do so, while of all people programming multi threaded, only 0.5% have the skill required to do so.

    the rest sort of gets away with it because threading bugs are hard to find/debug. having looked at a lot of multi threaded programs in my time, i have yet to find one that i could prove to be bug free. and i find that i usually seem to understand the concept of parallel execution better than others.
  • It's not a "Patch to Photoshop" which allows multiprocessing on the Mac OS, but rather usage of the Multiprocessor Services API. Any application can spawn a thread to be scheduled preemptively among however many processors are on the system. While it's true that the main system event loop (think message pump) is restricted to the main processor, these MP threads can execute network and disk IO. It's not as nice as SMP, but it's not the hack the ill-informed author of the article seems to think it is.
  • If coarse it is ...
    ---
  • The first rule of multithreaded programming is the same as the first rule of optimization: Don't do it

    I don't know if this is a troll, so I'm responding. Obviously you have never talked to anyone who has written an application designed to run on a machine with more than 1 processor. Sure you can make a bunch of simplifications if your application is single threaded. You don't have to worry about the whole concept of resource contention. On the other hand, if you want your application to scale at all then you have to have some form of multithreading or multiprocessing. Running your application as multiple processes, you basically have 2 design decisions: do I write my multiprocess application like it was threaded and share big chunks of memory between the processes, or do I make a bunch of producers and consumers to solve the problem? In the first case you might just as well use threads, because they usually will be faster and the code will be simpler. In the second case you end up generating massive amounts of IPC and complicated queuing systems with fairness algorithms to guarantee that longer jobs can be interrupted by shorter jobs and higher-priority jobs finish sooner. The end result of the second choice is far more complicated and usually a LOT slower because of all of the overhead due to the IPC mechanisms. Compare that to the easy-to-grasp idea that each 'job' runs and locks the data it needs to access before it touches it and unlocks it when it's done.

    As a beginning programmer, you shouldn't try any complicated locking schemes designed to make your code scale; you should just make a couple of big global locks and pray you don't make any critical mistakes in the design that preclude good MP scalability (I call that the Linux solution, because it's the way the SMP problem was solved: one big global kernel lock, with later versions making it more fine-grained), or ask someone more experienced to design and write the parts that require complicated locking schemes.

    Optimization also has a purpose. Do you think Quake would have run as well as it did on an old Pentium without a couple of routines being hand-optimized?

    Of course this whole thread has nothing to do with CPU threading!
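    For what it's worth, here's a minimal sketch (mine, not from the article) of that "lock the data before you touch it" idea - four jobs updating one shared total under a single mutex:

        /* Sketch: several worker threads update one shared counter, each taking
           the mutex only around the touch.  Compile with -pthread. */
        #include <pthread.h>
        #include <stdio.h>

        #define JOBS     4
        #define PER_JOB  100000L

        static long total = 0;
        static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

        static void *job(void *arg) {
            long i;
            (void)arg;
            for (i = 0; i < PER_JOB; i++) {
                pthread_mutex_lock(&total_lock);    /* lock the data this job needs... */
                total++;                            /* ...touch it... */
                pthread_mutex_unlock(&total_lock);  /* ...unlock it when done */
            }
            return NULL;
        }

        int main(void) {
            pthread_t t[JOBS];
            int i;
            for (i = 0; i < JOBS; i++)
                pthread_create(&t[i], NULL, job, NULL);
            for (i = 0; i < JOBS; i++)
                pthread_join(t[i], NULL);
            printf("total = %ld (expected %ld)\n", total, JOBS * PER_JOB);
            return 0;
        }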

  • The author of this article failed to mention the only processor in common use that is coarse-grained. That processor would be the POWER/PPC-compatible RS64 IV [ibm.com], which is used in the iSeries and pSeries (AS/400 and RS/6000) from IBM. There is a nice write-up on its threading capabilities and the performance results here. [ibm.com]
  • Any chance of some references here for those of us who have never heard of co-routines?

    (Admittedly I have so far done no more than scan the main article but I didn't see a mention)...

    Cheers
  • One thing I've learned is that if you can avoid threading, then do. There are so many hidden ways you can cause race conditions, deadlocks and all sorts of other unforeseen dependencies in your code that it just isn't worth it.

    Most libraries (including the standard C lib) aren't properly reentrant, and if they are it's because they put a big mutex around functions. Remember this when you call the functions - you may block!!

    About the only good reasons I can think of for multithreading are:

    (i) You are CPU bound and need to take advantage of SMP machines.
    (ii) The OS or library you are using doesn't support async I/O, so you need to block in a separate thread to handle multiple clients (cough..java..cough..).
    (iii) The OS can only select() or WaitForMultipleObjects() on so many things at once and you need another thread to get around that limitation.

    In any other case, avoid the temptation to multi-thread your code. It just isn't worth the pain of debugging those cases where one thread stomps all over the memory space of another thread, your application deadlocks in 3rd-party libraries, or you end up thrashing the context between 2 different threads so fast that more time is spent in the OS context-switch code than in your application.

    Multithreaded code is the last choice you should make when designing - it's an optimisation and not a core design philosophy in most cases.
  • Actually, my experience is mainly coming from Windows.

    There's all sorts of hidden gotchas waiting for you with multithreaded code that you'll only find when you put it on a dual-processor machine. Take the STL, for example: the entire library will screw up in multithreaded code. std::string is reference counted, but not in a thread-safe manner, so you get nasty bugs turning up about one in a million times; std::map has a great big static mutex (no, not a critical section, a mutex) around the entire class, so actions on one map (especially a for_each algorithm) will block out all other maps in your process; deque has thread-safety bugs as well, so you have to be very careful there.

    If you write yourself a nice thread class that is independent of your worker classes, you won't have coupling between different operations on the same thread (assuming they aren't long-lived), so the tying there is virtually non-existent.

    The "thread safety" of the MS runtime library is a real performance killer. Every memory operation has a critical section around it that holds out every other thread while you are doing a new or delete.

    Basically I'm saying that contrary to popular belief there is a very good chance that a multithreaded app will be slower, less stable and much harder to maintain than a similar single threaded app.
  • The CPU has a separate set of registers, flags, segment tables (on x86) and so on for each thread that it is running simultaneously. When the instructions enter the pipelines they are tagged with bits to say which thread they are actually operating on and the execution units of the CPU use that set of registers and flags for the data required to run the instruction.

    If you think of a four-way SMT CPU as having four sets of everything then you are starting to get the idea. For example on an x86 you would have four different CS:IP pairs, each one providing instructions for a different thread. Once those instructions are loaded and decoded the instructions in the pipelines are tagged with which CS:IP pair provided them. On execution if a register is referenced, then the register is loaded from the register set that corresponds to that tag (mov EAX,0 would affect the EAX #1 if it came from CS:IP #1, affect EAX #2 if it came from CS:IP #2 etc).

    As there are implicitly no register dependencies between these instructions, any stalled instruction which is waiting for results from another execution unit does not hold up execution of instructions from other threads.

    This model of execution speeds up performance considerably without the requirement of having multiple CPU cores on the one piece of silicon as you are getting far more efficiency from the one set of execution units.

    For more information, Paul DeMone had an article over at RealWorldTech a while back on the EV8 and how it worked. Take a look - it was quite interesting.
  • If you are memory bound, increasing the number of threads a CPU can run will help a little, but not significantly as sooner or later all instructions are going to be stalled waiting for memory. If you have a rich register set then you have the ability to run more instructions before you have to stall on memory loads.

    Unless you have a well designed memory interface that can support multiple outstanding transactions, and sufficient registers to allow other threads to continue without accessing memory while one is blocked on memory access you aren't going to be gaining the full performance from SMT.

    Of course, I've been known to be wrong.
  • Interesting paper. It's certainly a novel way of using SMT to combat the memory/internal clock disparity. Possibly even more effective on x86 where you really want a lot of stuff in L1 because you hit memory all the time!
  • Dinkumware, of course (well, actually the one that ships with MSVC). The painful thing is even though we have the source it is so damn hard to read that any changes we make couldn't accurately be ported through upgrades.

    We considered going to SGI, but ended up just working around the bugs when we saw the SGI implementation of std::string wasn't ref counted at all (performance loss bigtime). Hopefully some of them will be figured out by the time VC.NET ships.

    Is there any other way to program an MVS/390 if you are pathologically allergic to COBOL? ;-)
  • by throx ( 42621 ) on Friday June 15, 2001 @10:29AM (#148443) Homepage
    If there are more threads than register sets, you have to do normal context switching. I think 4 threads is about the limit at the moment.

    To my knowledge there are no CPUs available at the moment that do SMT. I'm not even sure if there are operating systems that support it (you need OS support to load the thread specific context of each CPU register set). The Alpha EV8 will probably be the first mainstream CPU to support it, though there were plenty of rumors that an upcoming revision of the P4 Xeon will support it as well.

    It should be noted that SMT does nothing for you if the CPU is tied down in memory stalls, thus the x86 architecture is probably going to gain the least from this as it is very register starved and hence dumps things to memory all the time. Running more threads just increases the required memory bandwidth and so you need a very fast memory system (which the EV8 has) to keep up with everything.
  • by throx ( 42621 ) on Friday June 15, 2001 @09:30AM (#148444) Homepage
    SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.

    Naturally, it requires OS support for it to work, but most CPU manufacturers are looking to go this way in the near future.
  • I just knew I'd spell something wrong in that post. :-p

    misspelled
    misspelled
    misspelled
    misspelled
    misspelled
    misspelled
    misspelled
    misspelled
    misspelled...
  • I've done Be programming, and because of the design, it really isn't hard to keep the interactions straight. Windows probably does have a problem with threads, mainly because of the C style of the API. However, in Be, multi-threading is explicitly done in window threads. Since the window interface is contained within a set of C++ classes, all one has to remember is to be aware of thread interactions in any code that you put into the window subclass. Also, most threads are wrapped in Loopers (essentially message loops), so it again comes down to making sure that any code within a particular object is thread-safe. BeOS does have some problems, but instability in the apps is not one of them. (Not more than in any other OS, anyway) As for the OS itself, I can't say if Be has better programmers (though I'd like to think so ;) but I think the emphasis on keeping the design extremely simple and clean has a lot to do with the stability of the threading model. BeOS is not a feature-laden OS. It has some well-chosen luxuries that really enhance the user experience, but it doesn't try to make everyone happy.
  • by be-fan ( 61476 ) on Friday June 15, 2001 @11:03AM (#148447)
    Any programming construct can be harmful. Pointers, explicit memory allocation, anything! However, if interfaces are clearly defined and the code is kept simple (i.e. lack of feature creep, something that the "new" (GNOME, KDE, etc.) UNIX guys don't seem to understand), threading is just as harmless as pointers. I'm not going to get theoretical here, but I'll give you an actual example: BeOS. Say what you will about it being dead or the company being stupid or whatever, it has a kick-ass threading implementation. The app_server regularly runs with 60+ threads and the damn thing only crashes on me when I'm playing with a kernel driver. The apps, too, are stable, even though they are forced into using multi-threading due to the GUI architecture. If you want to see why this is the case, take a look at the BeBook (the API). Every time there is a possible thread interaction, they warn you about it. Just as you have to keep memory ownership clear, you have to do the same thing for threads. Theoretical rules aside, an entire platform begs to differ with you.
  • SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.

    Interesting. I don't understand how the processor can switch from one thread to another without doing a context switch. Would you mind expanding on that a little bit?
  • The processor doesn't switch at all; it actually runs multiple threads at the same time (hence "simultaneous") because it has a separate set of registers for each thread.

    Thanks for this explanation, and to the others who responded. I guess I did not properly understand the original explanation by throx. I take it that if there are suddenly more threads than there are separate register sets, the processor will then be forced to do context switching. What off-the-shelf processors support SMT, do you know?
  • It's such a pity transputers didn't do better; good C compilers with CSP support would have helped... communicating sequential processes are just so much more elegant than threads communicating through shared memory without any other protection against the various possible contentions than the one the programmers try (and inevitably fail) to ensure themselves :(

    Are you saying that threads should communicate via message passing? Or are you saying each thread should have its own processor and memory? Please clarify.
  • An API which only allowed message passing would, to me, seem sufficient for most people's uses. That doesn't mean you can't pass references for efficiency; you'd just make it so that passing one implicitly puts what's referenced out of scope. Shared memory where multiple processes read the memory and one or more processes perform unsynchronized writes is the exception; why make it the common case in the language and put all the burden for safety on the developer?

    In my opinion, there should be no threads at all, or rather just one thread, the OS thread. Software should be super-parallel. It should be a collection of primitive objects or cells that just sit there waiting for a signal to do something and send another signal to alert other cells that they're done. The operating system and the CPU should support this paradigm at the fundamental level. I envision that the OS would maintain two lists of cells: input cells and output cells. It would process one list while inserting cells into the other. This is kind of like the way some neural networks are implemented. Programming would then consist of dragging cells into a workspace and connecting them together to form higher-level objects. Just one man's opinion.
  • by Louis Savain ( 65843 ) on Friday June 15, 2001 @09:22AM (#148452) Homepage
    I can't get on the site. In the meantime, can someone tell me what they mean by "simultaneous multithreading?" It sounds somewhat redundant.
  • by EXTomar ( 78739 ) on Friday June 15, 2001 @09:29AM (#148453)
    The server appears to be slashdotted. Time to make up conversation while it recovers. ^_^

    Regardless of whether your system/OS has multiple processors or can handle multiple process execution, writing your code to be multithreaded might be a good design choice. If nothing else, it forces and enforces abstraction and "compartmentalization" of the design and code.

    The one huge drawback to writing code that is multithreaded is the syntax baggage you must carry around and use to keep the system sane. Even in thread-friendly languages like Java, where the language semantics try to help users write clean multithreaded code, it's still a non-trivial thing to support. Bugs can be very obscure and extremely non-trivial to solve in multithreaded code, not to mention tools can become cumbersome (which stack am I asking for the value of "counter" on?).

    It's too bad that writing threaded code is still considered to be an "advanced coding skill".
  • And how to clean up their threads... I ran into a server program (that I had to write an EAI connector for) that in one case bloated up to 2000+ threads because it just kept launching them without cleaning any up.

    It made sense to have multiple threads of execution inside it, as it had to handle many client processes simultaneously, but apparently when the thread execution stopped, they just went off into oblivion. That's something that will bring a system to its knees...
  • You're right... perhaps I should have spoken more clearly.

    I *meant* to say that the threads failed to clean *themselves* up. They had no timers or other means of detecting that they had stalled, and therefore couldn't release any held resources, cause themselves to exit, etc., etc.
  • But your last comment applies to anything--at least if you follow KISS-keep it simple, stupid. Don't use any tool unless you need it (whether immediately or for the future--extensibility).

    Andrew.
  • You don't clean up threads... that is an odd thing to suggest. When a thread is done with what it is supposed to process, it should terminate itself. It is very tricky to properly terminate one thread from another, if it is at all possible within your programming environment. At the very least, threads need to be aware of shutdown situations... and then gracefully exit if that is possible.
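    A minimal sketch of what I mean by being aware of shutdown (my own illustration, not from any particular product): the worker checks a flag on every pass and exits on its own when asked.

        /* Sketch: graceful shutdown via a flag the worker polls each iteration.
           Compile with -pthread. */
        #include <pthread.h>
        #include <unistd.h>

        static int shutting_down = 0;
        static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

        static int should_stop(void) {
            int stop;
            pthread_mutex_lock(&state_lock);
            stop = shutting_down;
            pthread_mutex_unlock(&state_lock);
            return stop;
        }

        static void *worker(void *arg) {
            (void)arg;
            while (!should_stop()) {
                /* ...do one unit of work, then come back and check again... */
                usleep(100 * 1000);
            }
            /* release any held resources here, then terminate ourselves */
            return NULL;
        }

        int main(void) {
            pthread_t t;
            pthread_create(&t, NULL, worker, NULL);
            sleep(1);

            pthread_mutex_lock(&state_lock);
            shutting_down = 1;                 /* ask the thread to finish... */
            pthread_mutex_unlock(&state_lock);
            pthread_join(t, NULL);             /* ...and wait for it to exit on its own */
            return 0;
        }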
  • The advantage of using the threads, though, is that you don't arbitrarily couple the code for all the things that you are select()ing or WaitingForMultipleObjects() on.

    Each thread can be a reusable class that doesn't depend on what other things you are waiting for.

    I know on Windows, with Visual C++, debugging multiple threads in the IDE is a piece of cake.
    Maybe it is harder with the command line based tools in Unix/Linux.
    Also, Microsoft provides thread safe versions of the runtime library.

    But if you are writing portable code, then you don't know if you can depend on the runtime library or not.
    You could always synchronize the calls in your application code.
    Then you are not dependent on the thread safety of the runtime library.
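    For example (a sketch of my own; localtime() is just a handy non-reentrant libc call standing in for whatever your runtime doesn't protect): wrap the call with your own mutex and copy the result out while holding it.

        /* Sketch: serialize access to localtime()'s static buffer in application
           code rather than trusting the runtime.  Compile with -pthread. */
        #include <pthread.h>
        #include <stdio.h>
        #include <time.h>

        static pthread_mutex_t localtime_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Thread-safe wrapper: each caller passes its own struct tm to fill in. */
        static void my_localtime(time_t t, struct tm *out) {
            pthread_mutex_lock(&localtime_lock);
            *out = *localtime(&t);             /* copy out of the shared static buffer */
            pthread_mutex_unlock(&localtime_lock);
        }

        int main(void) {
            struct tm now;
            my_localtime(time(NULL), &now);
            printf("%02d:%02d:%02d\n", now.tm_hour, now.tm_min, now.tm_sec);
            return 0;
        }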
  • If you want to do some multithreaded programming, a nice way to do it in C++ is to use ZooLib [sourceforge.net]. Here the threads and locks are easy-to-use C++ objects.

    ZooLib supports Linux, BeOS for x86, Mac OS PowerPC and 68k, and Windows out of the box. It can be bound to other platforms in a straightforward way.

    I believe the Mozilla framework [mozilla.org] is multithreaded as well.


    Mike [goingware.com]

  • In MacOS versions after 7.5.5 but before X, the OS DOES have MP support (much improved in Mac OS 9, btw) but it's asymmetric MP rather than symmetric. Furthermore, the OS itself does not take any advantage of it, and ALL processes are spawned on CPU 0. However, applications can execute threads on either CPU, which means that applications (quicktime, photoshop, itunes, Quake 3...) can take advantage of multiple CPUs.

    This is completely different from the SMP model in Mac OS X, Linux, Solaris, Windoze NT/2000, etc.
  • Well, you just answered why it's an advanced programming topic. Figure out a way to let the language/OS take care of the nightmare situations you encounter if you would like to make it simpler. I don't see any other way out of it except for using better-trained programmers (which won't happen anytime in my lifetime).
  • The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes.

    No, the whole point is to put them through the same execution units. If you duplicated all of the execution units, there is no advantage since you have just duplicated the whole chip. The only thing you need to duplicate (or partition) for SMT are things such as register allocation tables, store forwarding buffers, TLB's, etc. which are unique to each thread, and then tag each operation as it goes through so it knows where to get its data. Almost everything in the processor is shared. It thus requires very little additional die space, but for some applications it offers substantial performance improvement.
  • If you'd bothered to read the linked article you'd have your answers...
  • When reading Slashdot, you *are* multithreading. Here's a quick cross-section of my processes:
    • Skim article to gather enough content for witty reply.
    • Ponder whether to defend Microsoft or Linux to generate karma.
    • Wonder what Jon Katz would say about this.
    • Terminate previous process.
    • Wonder who got first post.
    • Wonder if anyone in the cubicle across from me sees me posting on Slashdot.
  • Java unfortunately makes threading so easy, it's one of it's shortcommings
    (sic)
    And if it made it difficult people would complain too. The fact that you can start using threads and not worry too much about language details and concentrate on the concepts should be seen as a strength.
    I feel that being able to multithread code effectively in Java would make a programmer advanced in that topic.
    True, but then I'd just say that being able to program effectively would make an advanced programmer :-)
    ----
  • Sorry, I was being light-hearted and not trying to be arrogant, which you might be reading into my comment.
    I wasn't being specific to/about multithreading, I was (humorously hence the smiley) commenting on what makes an advanced programmer - an ability to design and write programs rather than just churn out code. i.e. be effective. Very few can be good at all aspects of programming and I certainly am not expert at threading, but I would expect to get better as I need to. In other areas/languages I can definitely call myself advanced and can get into the arguments to prove it. ;-)
    I was being general.
    ----
  • It seems to me that pretty much any time Hemos posts a main page story about science or technology (or pretty much anything) it is well worth checking out. No, I'm not Hemos :-)

  • I thought the main advantage of SMT was that the processor can switch and work on another thread while the first is stalled for data - and thus make use of cycles that would otherwise be wasted. In this light, wouldn't the x86 have the most to gain from SMT?

    -
  • http://cide1.dhs.org/thread.html
  • From the article (in reference to SMP OS support):

    There are exceptions to this rule, such as the Mac, where Photoshop has a patch which has multithreaded support for G4's even though no Mac OS below Mac OS X supports multiprocessing. Generally speaking, the computer needs OS support as well as application support to take advantage of multiprocessing.

    Maybe I am not reading this correctly, but is it saying that a patch to an APPLICATION gave the OS SMP support? I don't see how this could work... SMP support needs to come from the OS. Not only would the OS need to be designed from the ground up to support multiple CPUs (i.e., the scheduler, interrupts, etc.), but it would need to be thread-safe itself (proper locking, etc.).

    So, either tell me "yah, of course, the article is wrong" or can anyone explain how a patch to PhotoShop could give it SMP support on a non-SMP-aware OS?

    Thanks,

    Robert
  • The article is just wrong, that's all. Mac OS 9 has MP support, but it's weird and only a few apps (like Photoshop) use it.
  • The processor doesn't switch at all; it actually runs multiple threads at the same time (hence "simultaneous") because it has a separate set of registers for each thread.

    More info: http://www.cs.washington.edu/research/smt/
  • The first rule of multithreaded programming is the same as the first rule of optimization: Don't do it.

    Threads are popular (hype aside) because they are a simple abstraction. Simple is usually good--but it's not when it introduces as many pitfalls as threads do. If you don't believe me, I'm probably not going to convince you--but reading a balanced treatment (eg, a systems textbook, not pro-Pthreads, -Win32, -Java hype) might. Most programmers shouldn't write threaded production code, period. Almost every experienced programmer I've talked to agrees.

    There are demonstrably better abstractions for almost all problems that threads can solve. Co-routines, continuations, event models, message queues, sockets, shared memory. "Demonstrably" means they get the job done, but clearly introduce fewer possibilities for error and are easier to debug. They have higher conceptual overhead than threads, but they usually pay off. If you think you absolutely need threads for performance--prove it, with hard numbers.

    If you use threads, be sure to understand exactly why you're using them and spec your model precisely. Review threaded code and perform load tests early and often.

  • Imagine you are writing a server, like say, Apache. So, this server has to handle, oh, about a thousand requests at the same time. Please tell me what other way to solve this than with threads.

    This is a gimme! It's been shown time and again that state-based web servers blow threaded servers out of the water. Find any treatment of web server performance for a demonstration.

    Another example is a GUI. Imagine you have an image processing application. This application has a particular filter which takes ten minutes to execute.

    If your filter is long-running, there is no problem having it communicate via a pipe, with the main program using non-blocking read (so the GUI never stalls). I think that GIMP plug-ins do or can work in this way.
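    Roughly like this (a minimal sketch of my own, not actual GIMP plug-in code): the filter runs as a child process, the parent marks its end of the pipe non-blocking, and the event loop polls it without ever stalling.

        /* Sketch: a slow "filter" child, a non-blocking pipe, and a parent loop
           that keeps servicing the GUI while waiting for the result. */
        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void) {
            int pfd[2];
            pipe(pfd);

            if (fork() == 0) {                       /* child: the ten-minute filter */
                close(pfd[0]);
                sleep(3);                            /* pretend to crunch pixels */
                write(pfd[1], "filter done\n", 12);
                _exit(0);
            }

            close(pfd[1]);
            fcntl(pfd[0], F_SETFL, O_NONBLOCK);      /* the parent never blocks on the read */

            for (;;) {
                char buf[64];
                ssize_t n = read(pfd[0], buf, sizeof buf);
                if (n > 0) {
                    fwrite(buf, 1, (size_t)n, stdout);  /* result arrived */
                    break;
                } else if (n == 0) {
                    break;                           /* filter exited without output */
                } else if (errno == EAGAIN) {
                    /* nothing yet: keep servicing the GUI (stubbed out here) */
                    printf("handling mouse clicks...\n");
                    usleep(500 * 1000);
                }
            }
            return 0;
        }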

  • I admit I don't have a response for this (because I don't know much about Be). The only explanation that comes to mind would be that most thread interaction is done in the framework code (ie, the server model or the GUI model), and that was carefully designed and written by the best programmers. If the apps themselves have to think about threads, I find it incredibly hard to believe that they get it right, no matter how clean the design is and how clear the documentation. I distinctly recall apps getting flaky when Microsoft started pushing threads. Maybe Be just has better programmers :-)

    PS. If be-fan reads this: do you happen to know a Carlin Wiegner, of the Be world some years ago?

  • by Slashdolt ( 166321 ) on Friday June 15, 2001 @09:15AM (#148479)
    Perhaps if their server was multi-threaded, I'd be able to access the page...

    :-/
  • This is a great article to use for education. It would have saved me some headache when I first tried learning about threads.
  • by CraigoFL ( 201165 ) <slashdot&kanook,net> on Friday June 15, 2001 @09:11AM (#148484)
    ...but I'm reading Slashdot right now, and I can't do two things at once.
  • It's such a pity transputers didn't do better; good C compilers with CSP support would have helped... communicating sequential processes are just so much more elegant than threads communicating through shared memory without any other protection against the various possible contentions than the one the programmers try (and inevitably fail) to ensure themselves :(
  • The second would be nice, but quite impossible... even if you just give them their own memory space, context switches will kill you - the same problem as using a strict microkernel (in languages like Java it wouldn't be an issue with CSP instead of threads; once you start juggling pointers in C it becomes one, though). I'd really like to see future processors allow much more fine-grained access control; a process ID on the page identifier would be interesting... for SMT systems this is especially important, since without it they can only run threads from the same process concurrently. But that's all beside the point.

    An API which only allowed message passing would, to me, seem sufficient for most people's uses. That doesn't mean you can't pass references for efficiency; you'd just make it so that passing one implicitly puts what's referenced out of scope. Shared memory where multiple processes read the memory and one or more processes perform unsynchronized writes is the exception; why make it the common case in the language and put all the burden for safety on the developer?
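    As a concrete sketch of that kind of API (my own illustration, nothing standard): a one-slot mailbox with send and receive built on a mutex and condition variables, so the threads exchange messages instead of touching each other's data.

        /* Sketch: message passing between threads through a one-slot mailbox.
           Compile with -pthread. */
        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
        static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
        static int slot, full = 0;

        static void send_msg(int msg) {
            pthread_mutex_lock(&lock);
            while (full)                          /* wait until the slot is free */
                pthread_cond_wait(&not_full, &lock);
            slot = msg;
            full = 1;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }

        static int recv_msg(void) {
            int msg;
            pthread_mutex_lock(&lock);
            while (!full)                         /* wait until something arrives */
                pthread_cond_wait(&not_empty, &lock);
            msg = slot;
            full = 0;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            return msg;
        }

        static void *producer(void *arg) {
            int i;
            (void)arg;
            for (i = 1; i <= 5; i++)
                send_msg(i);
            send_msg(-1);                         /* sentinel: no more messages */
            return NULL;
        }

        int main(void) {
            pthread_t t;
            int msg;
            pthread_create(&t, NULL, producer, NULL);
            while ((msg = recv_msg()) != -1)
                printf("got %d\n", msg);
            pthread_join(t, NULL);
            return 0;
        }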
  • Indeed. What I like especially is a mixture between userland and kernel threads. The userland threads give you the advantage of the threading design model and you only use kernel level threads when absolutely necessary.

    Take a look at txObject [txobject.org], a C++ library which has both its own threading engine (userland) and an abstraction of kernel threads. Unfortunately txObject threads currently don't work too well together with Linux's pthreads (due to some quirks in pthread), but this is being worked on.

    This is a really nice library, sort of Java-like. With abstractions of threads, sockets, locks, timers, events etc. etc.. And a complete object model. It builds on Win32 and various *nixes and is GPL'ed.
  • BUT... when properly done, you can gain a LOT of performance you couldn't get any other way.

    As far as I'm concerned, there are three best places to multithread. First is in user interface. I can't tell you how many times I've cursed Internet Explorer because Microsoft programmers didn't bother to launch a separate thread when IE connects to an FTP server. If it's down, the program is unresponsive until the attempted FTP control connection times out. It can't even paint itself.

    Second is when you write a server. You either multithread or multiprocess. Otherwise, when it gets busy, connections start to time out.

    Of course, those have nothing to do with performance. The performance part comes in when you have two processes you need to complete, and both are independent of each other and tie up different resources. For example, if you have a big disk read or write and a bunch of calculations to do, and they're independent, thread one of them.

    Other than in those situations, I'm hard-pressed to think of a time when threading is actually worth the context switch and the synchronization headaches. Can some ultra-smart person here tell me if I've left something out?
  • by plcurechax ( 247883 ) on Friday June 15, 2001 @09:13AM (#148493) Homepage
    It's not an article on Slashdot. It's just a link to somebody else's article.

    To be pedantic, there is an article on /., pointing to an article on System Logic about multithreading.

    I want to see more articles on /. about technical topics, not editorials, which I feel are often poorly constructed. Whether the /. article is a "pointer" or not is secondary to me.

  • by plcurechax ( 247883 ) on Friday June 15, 2001 @09:00AM (#148494) Homepage
    These are the sort of articles that I like seeing on /., not the wind-up, half-thought-out "essays" that didn't get published elsewhere.

    more technical content, please.

  • by DivineOb ( 256115 ) on Friday June 15, 2001 @11:44AM (#148498) Homepage Journal
    About the memory stalls: what you said is actually false. I wrote a paper, being presented in 2 weeks, on a way to use SMT to attack memory stalls. You can view it here: http://www-cse.ucsd.edu/users/tullsen/isca2001.pdf
  • There are demonstrably better abstractions for almost all problems that threads can solve. Co-routines, continuations, event models, message queues, sockets, shared memory. "Demonstrably" means they get the job done, but clearly introduce fewer possibilities for error and are easier to debug

    There are better models, but the ones you have just listed are the ones that Comp Sci moved away from twenty-odd years ago, finding them complex and likely to introduce bugs.

    The principal abstraction in threads is Hoare's monitors (there is some dispute over who invented monitors, Hoare credits Dijkstra, Dijkstra Brinch-Hansen and Brinch-Hansen Hoare). A couple of years after the monitors paper Hoare developed the Communicating Sequential Processes model (CSP) which was later used as the basis for the occam programming language - which remains pretty much the only mainstream programming language with decent support for parallelism.

    Unfortunately the threads programming model does not provide decent abstractions for communicating between threads, except for shared memory, which is severely limiting in the multiprocessor context; there are very good reasons why shared-memory-bus SMP machines have consistently had a maximum of 16 processors, with 4 being the more common limit.

    Incidentally, the Amdahl's 'law' that kicks off the piece was originally marketing propaganda to persuade folk that faster processors (like Amdahl made) could not be replaced by vector-processing add-on boxes. The argument is valid, but the framing of the argument is deceptive. Very few problems have the type of fine-grained parallelism that can be exploited by vector boxes; however, most engineering problems have parallelism at coarser granularities.

    People can get into trouble with threads, but they can get themselves into much more trouble with half-baked multi-process tweaks. Five years ago the quality of threads implementations generally was so poor that multi-process hacks were the only way to go. Today that does not apply.

  • As an experienced programmer, I have to disagree.

    Threads have their problems (especially when mixed with C++ and cross-platform code)... BUT... when properly done, you can gain a LOT of performance you couldn't get any other way (i.e., no IPC overhead necessary).

    Of course, good threading support is relatively new, and a lot of oldskool programmers won't 'get it'. (Probably the same ones that bash OO every chance they get :)

    Threading is not the golden solution to every problem. It's just another tool in the toolbox, and for some problems they are a better tool than say, multiprocesses communicating over sockets or shared memory.
  • Last week at the JavaOne conference, Allen Holub of Holub Associates gave an excellent presentation on taming Java threads. Among the reasons Java's threads are tough is that the spec on how they are handled is not well written. The talk ran two hours (most sessions were only one), and because of other items on my agenda I couldn't stick around for the second hour, but his slides are at his website www.holub.com [holub.com]. I did find his first hour very intriguing in how Java implements some concepts from multi-threading.

    It's too bad that writing threaded code is still considered to be an "advanced coding skill".

    Java unfortunately makes threading so easy, it's one of it's shortcommings. Anyone can write code to take advantage of multiple threads, but without knowing exactly what's happening under the hood, unnecessary things happen to degrade the performance on the Java VM. I know my first few multi-threaded programs in Java didn't turn out as I had hoped - performance was significantly worse than a comparable single-threaded version.

    I feel that being able to multithread code effectively in Java would make a programmer advanced in that topic.

  • Hey guys, just to let you know, I'm the owner of SystemLogic.net. Yeah, we got /.ed. I'm trying my best to get things up and running once again, but it's hard when you can't log in to FTP, web, or telnet! I'm talking to my hosting company now to see if there's anything we can do. In the meantime, if anybody has access to a good server with PHP installed I could technically get a mirror up... dave@systemlogic.net
