Optimizing distcc 201
IceFox writes "Having fallen in love with distcc and its ability to speed up compiling (as anyone who compiles a lot, like Gentoo users or Linux developers, will understand), I recently got the chance to dive deeper into distcc. By itself distcc will decrease your build times, but did you know that if you tweak a few things you can get far better compile times? Through a lot of trial and error, tips from others, profiling, testing and just playing around with distcc, I have put together a nice big article. It shows how developers can get a bigger bang for their buck out of their old computers and distcc with just a few changes."
strlen (Score:5, Funny)
Re:strlen (Score:2)
Wow... (Score:5, Funny)
This is so weird.
I must drink now.
"I do NOT suffer from a mental condition. I'm enjoying every second of it."
Re:Wow... (Score:2, Informative)
Website bit slow... (Score:5, Funny)
Re:Website bit slow... (Score:3, Insightful)
Re:Website bit slow... (Score:2)
Nice big article (Score:5, Funny)
/.-ed already? (Score:5, Funny)
Re:/.-ed already? (Score:3, Funny)
Re:/.-ed already? (Score:2, Informative)
Re:/.-ed already? (Score:2)
anal retentive admin (Score:3, Funny)
From the article:
I even found different colored cable for the different areas of my cube.
I wonder if he also sealed the empty packaging, waste paper, and dead hardware in neat little foil packets before disposing of them in the proper receptacle, which, of course, sits right next to the cozy for his server. ;)
Re:anal retentive admin (Score:2)
Big messes of cables and wires are a real pain in the ass.
Relief for the /. site (Score:2, Informative)
and how to compile kdelibs from scratch in six minutes
If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc, sending compile jobs to other computers when they are free, thus distributing compiles and dramatically decreasing build times. Best of all, it is very easy to set up.
This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.
Re:Relief for the /. site (Score:2)
I guess he doesn't mind a lot of noise...
Re:Relief for the /. site (Score:3, Informative)
Re:Relief for the /. site (Score:2)
Putting 12 older PCs in the cubicle and having the same level of noise could mean that either you put some work into making them quiet, or it's quite noisy already :D
Re:Relief for the /. site (Score:2)
There are quite a few Pentium II and Pentium III era PCs that only had one fan in the whole system (some Compaqs at least).
Server's last words (Score:2, Funny)
"Dieing Ben-ja-min" - Short Circuit 2
ccache (Score:5, Interesting)
Re:ccache (Score:3, Informative)
When to use distcc and ccache (Score:3, Informative)
On the same tack, the performance of distcc will (to an extent) depend on the nature of the compilation task used in the test (I am not familiar with
Copy of my article... (Score:4, Redundant)
distcc optimizations - March 30th 2004
and how to compile kdelibs from scratch in six minutes
If you don't already know about distcc I recommend that you check it out. Distcc is a tool that sits between make and gcc, sending compile jobs to other computers when they are free, thus distributing compiles and dramatically decreasing build times. Best of all, it is very easy to set up.
This, of course, leads to the fantastic idea that anyone can create their own little cluster or farm (as it is often referred to) out of their extra old computers that they have sitting about.
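For anyone who hasn't tried it, the basic invocation is only a couple of lines. As a sketch (the host names here are placeholders for your own farm):

    # tell distcc which machines may take jobs; localhost first is common
    export DISTCC_HOSTS='localhost rosalind viola portia'
    # run enough parallel jobs to keep the remote boxes busy
    make -j8 CC=distcc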
Before getting started: In conjunction with distcc there is another tool called ccache, a caching pre-processor for C/C++ compilers, which I won't be discussing here. For all of the tests it was turned off to properly determine distcc's performance, but developers should also know about this tool and use it in conjunction with distcc for the best results and shortest compile times. There is a link to the homepage at the end of this article.
Farm Groundwork and Setup
As is the normal circle of life for computers in a corporate environment, I was recently lucky enough to go through a whole stack of computers before they were recycled. From the initial lot of forty or so computers I ended up with twelve desktop computers that ranged from 500MHz to 866MHz. The main limit on my choosing was that I only had room in my cube for fifteen computers. With that in mind I chose the computers with the best CPUs. Much of the RAM was evened out so that almost all of the final twelve have 256MB. Fast computers with bad components had the bad parts swapped out for good components from the slower machines. Each computer was set up to boot from the CD-ROM and not output errors when booting if there wasn't a keyboard/mouse/monitor. They were also set to turn on when connected to power.
Having enough network administration experience to know better, I labeled each of the computers and the power and network cords attached to them. I even found different colored cable for the different areas of my cube. The first label specified the CPU speed and RAM size so that later, when I was given faster computers, finding the slowest machine would be easy. The second label on each machine was the name of the machine, which was one of the many female characters from Shakespeare's plays. On the server side a DHCP server was set up to match each computer with its name and IP for easy diagnosis of problems down the line.
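For reference, a host entry of the kind described might look like this in an ISC dhcpd.conf (the MAC address and IP here are, of course, made up):

    # pin each farm machine to a fixed name and address
    host ophelia {
        hardware ethernet 00:0a:0b:0c:0d:0e;
        fixed-address 192.168.1.101;
    }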
For the operating system I used distccKNOPPIX. distccKNOPPIX is a very small Linux distribution that is 40MB in size and resides on a CD. It does little more than boot, get the machine online, and start the distcc daemon. Because it didn't use the hard disk at all, preparation of the computers required little more than testing to make sure that they all booted off the CD and could get an IP.
Initially, all twelve computers (plus the build master) were plugged into a hub and switch that I had borrowed from a friend. The build master is a 2.7GHz Linux box with two network cards. The first network card pointed to the Internet and the second card pointed to the build network. This was done to reduce the network latency as much as possible by removing other network traffic. More on this later though.
A note on power and noise: the computers all have on-board components. Any unnecessary PCI cards that were found in the machines were removed. Because nothing is installed on the hard disks, they were set to spin down shortly after the machines are turned on. (I debated just unplugging the hard disks, but wanted to leave the option of installation open for later.) After boot-up, and after the first compile when gcc is read off the CD, the CD-ROM also spins down. With no extra components and no spinning CD-ROM or hard disk drives, the noise and heat level in my cube really didn't change any that I could notice.
Distccd for cygwin (Score:5, Informative)
Re:Distccd for cygwin (Score:2)
Re:Distccd for cygwin (Score:2)
Re:Distccd for cygwin (Score:2)
A similar technique to the distcc + cygwin install can be used to allow a distcc host to provide a GCC version other than its system GCC version. For example, my setup:
1.7 GHz P4-M (Gentoo box, always is the controlling node)
1.1 GHz Athlon (RedHat 7.3, sys GCC is 2.96, but I have a 3.3 tree in another location that won't interfere with the 2.96 tree)
WinXP box with an Athlon XP 1?00+
The XP box has 256M RAM, the other two 512M. Works great.
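The trick boils down to putting the alternate tree first in the daemon's PATH; roughly like this, assuming the 3.3 tree lives under /opt/gcc-3.3:

    # serve compile jobs with gcc 3.3 instead of the system 2.96
    PATH=/opt/gcc-3.3/bin:$PATH distccd --daemon --allow 192.168.1.0/24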
Martin Pool interview (Score:5, Informative)
http://web.zdnet.com.au/builder/program/work/st
Re:Martin Pool interview - clickable link (Score:4, Informative)
Re:Martin Pool interview (Score:3, Informative)
Mirror (Score:5, Informative)
http://hackish.org/~rufus/distcc.php.html [hackish.org]
Re:Mirror (Score:2)
I've wanted to mirror files for
Re:Mirror (Score:2)
Gentoo Impact(s) (Score:2)
Re:Gentoo Impact(s) (Score:5, Informative)
http://www.gentoo.org/doc/en/distcc.xml [gentoo.org]
Re:Gentoo Impact(s) (Score:2)
behind the XCode curtain (Score:5, Insightful)
Re:behind the XCode curtain (Score:5, Informative)
Anyway, you can see distcc running when you have Xcode's distributed builds enabled and running.
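If you're curious, something as simple as this in a terminal will show the daemons once a distributed build kicks off:

    ps ax | grep distccd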
--jim
Comment removed (Score:5, Interesting)
Mirror (Score:2, Informative)
Improving builds. (Score:2, Informative)
(2) Use --jobs=2 (or however many processors you have).
Build times will be greatly improved - and it's cross platform as well.
In my opinion - especially if you have a complicated project - distcc isn't worth it. The machine takes so long pre-processing everything (including header files) that you lose whatever advantage you might gain from offloading the actual compilation work. It's especially useless with MSVC once you start using precompiled headers.
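For point (2) above, the invocation is nothing fancier than:

    # run two compile jobs at once; scale the number to your CPU count
    make --jobs=2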
Or... You could do it properly. (Score:5, Informative)
[1] http://gridengine.sunsource.net/
Re:Or... You could do it properly. (Score:2)
Re:Or... You could do it properly. (Score:2)
Re:Or... You could do it properly. (Score:2)
s/a doddle to/a piece of cake to/g
Re:Or... You could do it properly. (Score:2)
http://suned.sun.com/US/catalog/courses/WE-1600 - 90
Our grid gets jobs out to an execution host and started in less than a second. All of our applications are distributed out over the execution nodes; Editors, word processors, spreadsheets, The Gimp, software builds, *everything*.
In fact, the less than 1 second latency incurred submitting a grid job is easily and by far overcome by the reduction in time given by starting a proces
Why wasn't a factorial experiment used? (Score:5, Informative)
http://www.itl.nist.gov/div898/handbook/pri/sec
It appears that in this case we have a variety of factors and a response of "elapsed time" for compilation, and it is a minimization problem. Instead of looking at factors individually, a factorial DOE would have allowed interactions to be analyzed and a search for a global optimum, rather than just optimizing individual factors and then tossing them all together; it doesn't work that way a lot, if not most, of the time.
If the author of this article is present: Why wasn't a factorial experiment used?
Re:Why wasn't a factorial experiment used? (Score:4, Interesting)
Factorial DOE is useful if you have multiple measurable, continuous or quasi-continuous [0] factors, and want to optimise, particularly when there is some trade-off. In this case, however, most of the variables that were altered were clearly discrete (this version of make or that version of make, for example), or it was clear that the optimum was at an extreme (more CPU speed is always good, for example).
So, the only factor I can see that would be suitable for a factorial DOE is the number of machines in the farm. Except each machine is different, so that's effectively an n-dimensional set, with 2 options on each dimension, for n machines. If you're going to do the stats, you'd want to do them properly, so no handwaving them all together there.
Plus, this is a deterministic situation. There is no real need for empirical analysis - you can do it all from first principles, which would be much more efficient, I think. And, indeed, that's what the author did - by looking at the theoretical background of it all and using different makes and so on to optimise.
Finally, if you think that a factorial DOE will get you a global optimum solution, then you're sadly mistaken. It's a good procedure for optimising, and it can avoid some local minima - but it's not guaranteed to find a global minimum. The only guaranteed method I'm aware of is simulated annealing - and if you've got a faster method, I, and a large number of people doing numerical calculations, would love to hear it.
Oh, and the aim here was _not_ to find a global minimum. It was to get something that was good enough. Trying for better than that is wasted effort.
[0] For example, the set of integers from 0 to 1000 is quasi-continuous. It's not really continuous, but it's close enough for real purposes.
Re:Why wasn't a factorial experiment used? (Score:3, Insightful)
DOE is widely implemented, especially in manufacturing processes, but with just basic knowledge of DOE it is easy to see the applications to non-manufacturing processes as well. DOE is readily available in just about any statistics software worth using - R, SAS, Minitab, S-Plus, etc. - so even if you don't have m
Electric Cloud (Score:3, Informative)
Re:Electric Cloud (Score:2)
run _everything_ in parallel
What, even things that shouldn't be parallel? Screw that.
Damned if I'm letting an electric cloud near my machine room.
automake - unsermake (Score:2)
PHP article? (Score:4, Insightful)
Missed the best point (Score:5, Informative)
There are some problems, though. Which do you do first, ccache or distcc? (The answer, on my benchmarks, is ccache: if it isn't in the cache, send it out on the network.) How fast is your "build" machine? This is critical: the build machine is responsible for preprocessing the file, checking whether it is in the cache, and then sending it out to be turned into an object. Especially when you combine the results of ccache (most of your builds are just the same files over and over, with very few "changed" files) and distcc, most of your time is spent in the first-pass compiler.
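In practice that ordering (ccache first, distcc only on a cache miss) is exactly what ccache's CCACHE_PREFIX hook gives you; a sketch, with made-up host names:

    # ccache checks its cache first; on a miss it hands the job to distcc
    export CCACHE_PREFIX=distcc
    export DISTCC_HOSTS='localhost xeon1 xeon2'
    make -j8 CC='ccache gcc'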
In our environment we had boatloads of dual Xeon machines around - they made wonderful build machines, and it didn't hurt that we connected them with Gig Ethernet either. Did wonders for our build times.
Overall, distcc and ccache are wonderful tools that should be in every large compile environment, making compiles that used to take days take mere minutes. But you want to make sure that the interplay between ccache and distcc works optimally in your environment.
Re:Missed the best point (Score:3, Informative)
Actually I mentioned it in the first paragraph...
Re:Missed the best point (Score:2)
Seems to me - he is ignoring the hard part of getting the best benefit out of the tool package... Kinda like talking about optimizing c
Re:Missed the best point (Score:4, Insightful)
-Benjamin Meyer
Re:Missed the best point (Score:2)
The point is that to a person unfamiliar with "compiler-intermediary" tools like distcc and ccache, the way to use them simultaneously is nonobvious.
Does the master host keep the cache, and farm out jobs on cache misses? Or does each box keep its own ccache, which is used to fulfill compilation jobs from the master? (Obviously, one of those options is drastically worse than the other)
Since you alluded to the possibility of distcc+ccache in the introduct
Perfect timing! (Score:2)
Hell yeah!
Re:Perfect timing! (Score:2)
The motivation for my work in 1991 was not much different than this, although back then my problem was building the X11 distro, and all of the imake crap that was in there. Since the paper itsel
jobs/cpu? (Score:2, Interesting)
Re:jobs/cpu? (Score:2, Insightful)
"Weak" computers are usefull ... (Score:2)
whoa, your server's been compromised.. (Score:2)
Re:whoa, your server's been compromised.. (Score:2)
Can distcc model be used for other apps? (Score:2)
If it's generalized, it would be cool to see it used for other CPU intensive tasks.. Video processing comes to mind. I would love to have a cluster bring down the times needed to:
- Convert MiniDV home video to MPEG2 DVDs. There are professional tools to do this; a hobbyist tool that could do clustering would be excellent.
- Convert HDTV captures to MPEG2 for DVD archival. 1080i video processi
Re:Can distcc model be used for other apps? (Score:2, Informative)
What really happens is that you can use the so-called "masquerading" method of installation, which basically means you set up symlinks called gcc, g++ and whatever to the distcc binary. Prefix your PATH with this directory and calling `gcc` will work.
In my opinion this is easier (and better) than doing `make CC='distcc gcc'`
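A sketch of that masquerade setup (the directory is the one the distcc documentation suggests; adjust to taste):

    # make plain 'gcc' and 'g++' calls resolve to distcc first
    mkdir -p /usr/lib/distcc/bin
    ln -s /usr/bin/distcc /usr/lib/distcc/bin/gcc
    ln -s /usr/bin/distcc /usr/lib/distcc/bin/g++
    export PATH=/usr/lib/distcc/bin:$PATH
    make -j8    # no CC= juggling needed now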
Mostly... (Score:2)
First off the generalised methods you allude to are MPI, the older PVM, and there's Mosix too.
MPI and PVM are framework libraries that allow for code to be written to take parallelism into account. They tend to be used for numerics calculations (which was their birthplace), simply because numerics are CPU bound. There are others that are even more numerics centric (HPF - a Fortran variant, for example), but MPI should probably be the target of choice for new code, including non-numerics base
Re:Mostly... (Score:2)
MPI is not what he wants. Both of the applications tji asked for are video recompression tasks. Those fall deep into the "trivially parallelisable" category.
Just split up the input file into megabyte chunks, allow each helper computer to convert one chunk, then concatenate the results on the master. There is no need for the helper computers to communicate amongst themselves while the calculation is going on, which is the ability MPI provides.
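A rough shell sketch of that chunk-and-farm idea (the host names and the convert_chunk tool are hypothetical, and a real implementation would have to split on frame boundaries rather than raw bytes):

    # carve the input into chunks and round-robin them over the helpers
    split -b 64m input.dv chunk.
    set -- rosalind viola portia
    for c in chunk.*; do
        h=$1; shift; set -- "$@" "$h"    # rotate the host list
        ssh "$h" convert_chunk < "$c" > "$c.mpg" &
    done
    wait                                 # note: this launches every chunk at once
    cat chunk.*.mpg > output.mpg         # glob order reassembles the stream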
Recursive Make Considered Harmful (Score:5, Informative)
Unfortunately, the makefile creator most people use, automake, creates only recursive makefiles. Maybe a replacement like unsermake will get automake developers thinking about radical changes. I wouldn't mind seeing M4 go away, for one.
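For the curious, the non-recursive style the paper advocates boils down to one make instance that includes per-directory fragments instead of recursing into each directory; a minimal sketch with invented names:

    # top-level Makefile: one make sees the whole dependency graph
    include src/module.mk    # defines SRC_OBJS and their rules
    include lib/module.mk    # defines LIB_OBJS and their rules
    app: $(SRC_OBJS) $(LIB_OBJS)
    	$(CC) -o $@ $^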
Re:Recursive Make Considered Harmful (Score:4, Interesting)
Seconded.
When I was at Be, Inc. (RIP), one of our engineers, motivated largely by the above-referenced article, converted our entire build environment to a non-recursive structure using gmake. The result was a large speedup, as well as more effective use of multiple processors (which BeOS utilized very well). gmake would grovel over the build tree for a minute or two, then launch build commands in very quick succession. 'Twas great.
Schwab
You're a bit outdated. (Score:2)
And there's a damn good reason for it, too, but that's neither here nor there. Anyhow, this was fixed so you can do non-recursive stuff if you want to now.
Unfortunately, the very latest automake versions are trying to be way, way too clever, thereby breaking stuff in lots of projects. Time to throw it out and use something else.
Automake is a Perl script.
Re:Recursive Make Considered Harmful Considered Du (Score:2)
That paper makes a spectacularly bad case
It makes a fine case. The worst part is that it exaggerates the value of its own minor insight. The grandiose title harkens to the famous "Goto Considered Harmful", which in its time was a more insightful position.
Nobody should be surprised that globally correc
How do you do all of this? (Score:2)
Re:How do you do all of this? (Score:2)
Scaling (Score:2)
Re:Scaling (Score:2)
Re:Scaling (Score:2)
-Benjamin Meyer
Re:Scaling (Score:2)
Re:Scaling (Score:2)
Until he gets his electric bill, that is.
Re:Scaling (Score:2)
Using Teambuilder, you don't have to muck about with heaps of settings, trying to discover which one works best; it just works. Out of the box.
distcc isn't so great (Score:3, Informative)
Re:distcc isn't so great (Score:2, Informative)
If I had more time I would trace through things and try to figure out why they failed. But I don't have that much time.
I still like the idea behind distcc and hope that someday (soon) they'll get it working correctly.
Re:distcc isn't so great (Score:3, Informative)
My friend recently had the same thing happen, and the conclusion we came to was that the compiler versions were different on the distcc servers (3.2.2) versus the client (3.2.3), and the preprocessed code had syntax errors or something of the like when it was sent off (something to do with one of the new options in the latest gcc). I don't recall exactly what option it was or what package(s) were failing.
distributed codebase (Score:3, Interesting)
Plug for Xcode... (Score:3, Informative)
Question for ccache (Score:2)
But I thought that the 'make' program does exactly that: if a source code file is newer than the object file, then the source file is compiled; if not, the current object file is used.
What exactly is it that ccache does that make does not?
Re:Question for ccache (Score:2)
ccache will cache the previous compiles and, if the sources haven't changed at all, use the cached results. This allows the certainty of a clean build to be gained in significantly less time. Make won't do that, because the tree was just cleaned.
Additionally, I believe that ccache uses a global cache. So if, for example, you are compiling a couple of Linux kernels, each patched differently, some of the compilations will be the same between both trees. ccache will recognise this, and only compile each one once.
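A quick way to see the difference on any tree:

    make clean && make CC='ccache gcc'   # first pass: real compiles fill the cache
    make clean && make CC='ccache gcc'   # second pass: mostly cache hits, far faster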
If you want faster builds (Score:2)
build smaller things
the record for compiling a plan9 kernel is 15s
I built & installed the kernel and the whole distributed userland in 45 mins on an 800MHz Duron.
... room in my cube for fifteen computers ... (Score:2)
I wonder how much noise and heat is generated by 15 PCs running in a small cubeacular office environment....
Re:I wonder... (Score:5, Insightful)
Re:I wonder... (Score:5, Insightful)
Re:Article Text (Slashdotted Server) (Score:3, Funny)
Re:Article Text (Slashdotted Server) (Score:3, Informative)
Re:Article Text (Slashdotted Server) (Score:2)
Perhaps you're attributing motivations to this behavior (making a useful post) that don't apply.
Re:Article Text (Slashdotted Server) (Score:2)
Read about it on slashdot, oddly enough.
Re:Article Text (Slashdotted Server) (Score:3, Funny)
You mean you haven't been promoted yet? Ha! n00b...
Re:Article Text (Slashdotted Server) (Score:2, Interesting)
He couldn't. It's simply a risk you take when posting the article. The moderation system is intended to improve things for the reader, not to judge his (undoubtedly good) intentions. You have a point though, maybe Redundant moderations shouldn't decrease karma, just like Funny doesn't increase it.
btw, posting the article as non-AC is viewed by many as karma whoring, so it's not recommended anyway.
Re:I don't have long compile times (Score:2)