Introduction to Distributed Computing
dosten writes "ExtremeTech has a nice intro article on distributed and grid computing." Someday someone will successfully implement something like Progeny's NOW, and all of these assorted hacks at building distributed computing systems will be superseded.
We could spend millions to do this... (Score:3, Funny)
Time warp? (Score:1, Troll)
Um, maybe it's me, but how could it be sooner than now? If these guys have a working time machine, maybe they ought to try to capitalize on that instead of writing an OS.
Re:Time warp? (Score:1)
Very funny, but I'll bite. Anything that has already happened, happened sooner than now.
Re:Time warp? (Score:1)
NOW == Network Of Workstations
Re:Imagine.... (Score:2)
Interesting distributed computing (Score:1, Troll)
Of course, computation in a vacuum (ha ha) is useless. Information on the results of the computations is carried around via cosmic rays, neutrinos and the like.
The really exciting thing is that the conclusion of the paper calls for research into the general direction that cosmic rays are flowing which may lead us right to the location of God Himself!
Re:Interesting distributed computing (Score:1)
How very sad (Score:1)
Re:Like all honest scientists... (Score:1)
Is it not far more logical to say, "We exist in this universe because it is the one that has the correct conditions for our existence"?
And that bloody argument that the "vertebrate eye is too complex to have come about by evolution, therefore evolution is wrong." How do people persist in using this absurd statement, despite the fact that there are organisms possessing every gradation from a simple light-sensitive nerve in some worms on up to the vertebrate eye? If you study biology you can see all the stages of the evolution of biological optics. And yet just last week I saw the "vertebrate eye" argument quoted in a newspaper as proof of intelligent design.
Ye gods, what fools these mortals be.
Distributed... (Score:3, Interesting)
Re:Distributed... (Score:2)
Except for the "paying people" part, United Devices [ud.com] does just that.
The downside of distributed computing is figuring out how to split a given problem into pieces that can be processed separately. Not all problems can be split up, and for those that can be split, figuring out the best way to do so isn't always trivial.
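To make the splitting point concrete, here is a minimal sketch (Python; the work function and chunk count are made up for illustration) of the easy case, where the pieces are completely independent and the partial results just get combined at the end:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for the real work; here we just sum squares over the chunk.
    return sum(x * x for x in chunk)

def split(data, n_pieces):
    # Carve the input into roughly equal, independent slices.
    size = (len(data) + n_pieces - 1) // n_pieces
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1000000))
    chunks = split(data, 8)
    with Pool(processes=8) as pool:
        partials = pool.map(process_chunk, chunks)  # each piece runs separately
    print(sum(partials))                            # combine the partial results
```

The hard cases are the ones where process_chunk would need data from the other chunks; then the split itself becomes the research problem.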
Re:Distributed... (Score:2, Insightful)
This stuff is just overhyped by some companies that think they can make big bucks off it.
The whole article at once (Score:5, Informative)
Rather than a popup ad per page.
Re:The whole article at once (Score:1)
Doug
Ya know ... (Score:2)
With a large-scale distributed system using the distributed translation project, things like this may in the future look like the following:
"My buddies and I are wimps so we pretend to be big shots online. So therefore we have created a small group called cs group. Online we are also seen as [tgk] to signify our uniqueness from you. We (being cs group) would like to point out the fact that we know a lot on the topic of distributed systems and would like to tell you our thoughts. We know all our posts will get 5's"
I can just see it now... (Score:1)
Nice introduction to DISTRIBUTED POPUP ADS (Score:1)
Condor wasn't mentioned (Score:1)
Re:Condor wasn't mentioned (Score:1)
The irony of these comments... (Score:1)
"That was all right at the time, because it was easy to raise money for ambitious development projects such as NOW that could take years to develop and, thus, that might not pay off for years."
and
"...Most new hires came in to work on projects that had the potential to bring in revenue sooner than NOW"
check this out (Score:4, Informative)
There is a short description of all the distributed computing projects, plus lots of other stuff.
My university just got a grant to do grid comp. (Score:2, Interesting)
Re:My university just got a grant to do grid comp. (Score:1)
Re:My university just got a grant to do grid comp. (Score:1)
(At least the NSF and NSERC are consistent.)
Try a non-Linux distributed protocol... (Score:3, Informative)
Fly in the ointment (Score:2)
Another networking subject that really interests me is wireless networking. I think that someday in the not too distant future we will see neighborhood networks forming and then a linking of various neighborhood networks to form a new kind of "internet." One that is absolutely not controlled by any group.
Re:Fly in the ointment (Score:2)
They really could accomplish nearly anything. The problem was that the 'details' of everyday life got missed.
Re:Fly in the ointment (Score:1)
That's the most "American" thing I've heard in a very long time...
Related links (Score:1)
There's a Distributed Computing Forum over at Anandtech [anandtech.com]
The problem with distributed computing... (Score:2, Insightful)
If a company/organization has an *actual* need for processor cycles (say, genome research), it's cheaper to buy 1000 boxes and admin the stuff in-house. Even ignoring issues such as sending valuable company data to thousands of internet users, most applications that require large computation also require large amounts of bandwidth, generally provided over a LAN.
This is why you'll never get to render a frame for Toy Story 5: Pixar will need to send you 5GB of data just to get back a 2k image.
Once you weigh the costs of admining a network and writing/distributing your code against the tangible financial benefit from the results, few companies will have a reason to turn to outsiders for a few minutes on their machines.
Re:The problem with distributed computing... (Score:1)
While it's true that sending valuable company data across the Internet is a problem, not all problems are going to require that. Also, while not every problem lends itself to distributed computing, a program that properly implements a problem that does lend itself to distribution won't require large amounts of bandwidth.
You know, you could simply check that statement against current distributed projects. For example, neither Seti@home nor the distributed.net client has large bandwidth demands, but both have high computational demands.
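As a rough back-of-the-envelope illustration of that ratio (the work-unit size and crunch time below are assumptions picked for the sketch, not measured figures for either project):

```python
# Bandwidth needed per client for a Seti@home-style workload.
# Both numbers are illustrative assumptions, not measurements.
work_unit_bytes = 350 * 1024      # assume a few hundred KB downloaded per work unit
crunch_time_hours = 10            # assume hours of CPU time to process it

bytes_per_second = work_unit_bytes / (crunch_time_hours * 3600)
print(f"~{bytes_per_second:.0f} bytes/s sustained")  # on the order of 10 bytes/s
```

A modem can keep up with that; the bottleneck is entirely the CPU.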
Face it, if Pixar had to pass around that much data to render individual frames, its own network would be overwhelmed.
I will have you know, though, that processor cycles are far from free. Building a good supercomputer that can do the work of a distributed system is very expensive no matter what route you take. (The purchase, infrastructure, development, and administration of even a few hundred machines is pricey. Ask Clemson's PARL.)
Re:The problem with distributed computing... (Score:2)
The companies looking to get into this are hoping to make money. I'm saying that's a bad business plan.
And yes, Pixar already passes about that much data. Large scenes/complicated renders can even go higher per-frame.
Re:The problem with distributed computing... (Score:2)
Sorry, that's a bad example. Pixar's existing compute farm doesn't need much networking.
Re:The problem with distributed computing... (Score:2)
But it sure needs confidentiality, both of the rendering code itself and the data it is working on. Otherwise we will all see random frames from every Pixar movie in advance.
Plus the rendering code is quite likely huge and has a lot of dependencies on proprietary codebases. I doubt the stuff would run well on DirectX.
The liquid-metal effect in Terminator cost a million or so to develop and sold for about that the first time, after which it was quickly copied, so now you can get it in a movie for a few tens of thousands of dollars.
The idea of using the internet to do distributed computing is as old as the net itself. We were building SETI type configurations back in the mid 80s, as soon as the price performance of the workstation rendered mainframes obsolete.
Believe it: if Pixar needs more compute cycles, they will go to Dell and buy a room full of cheapo machines. It will cost much less to manage than scraping up processing time from around the net.
Re:The problem with distributed computing... (Score:4, Informative)
Consider that a University of Wisconsin study showed that, on average, desktop computers are idle at least 60% of the time. And that doesn't count the cycles lost between keystrokes --- I'm talking about extended periods of time. For example, almost all desktop machines are idle overnight. That's 50% already. Now add lunch time, meetings, etc.
That's where systems like Condor [wisc.edu] come in. Researchers at Wisconsin got hundreds of years of CPU time on machines they already had, without impacting others.
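A quick back-of-the-envelope sketch of how that adds up (the machine count below is an assumption for illustration, not a figure from the Wisconsin study):

```python
# Cycle scavenging, roughly: idle time on machines you already own.
machines = 1000        # desktops already sitting on the network (assumed)
idle_fraction = 0.60   # idle at least 60% of the time, per the study above

cpu_years_per_year = machines * idle_fraction
print(f"~{cpu_years_per_year:.0f} CPU-years of otherwise wasted time per year")
# -> ~600 CPU-years, which makes "hundreds of years of CPU time" quite plausible.
```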
Coming back to your argument, the counterargument is that you may not even need to buy additional boxes --- just use the ones you already have more efficiently by running a distributed computing system on them.
As far as the "freeness" of processor cycles goes, let me tell you that optimization researchers can soak up as much CPU as you can possibly throw at them. Also, if you look up the Particle Physics Data Grid (PPDG) and GriPhyN, you'll find that many distributed computing problems are I/O driven.
++Rajesh
Re:The problem with distributed computing... (Score:2)
Many of us need that 60% idle time to keep our CPUs running at a reasonable temperature. I have my CPU and case cooling under control, but now I think I need to put multiple A/C zones in my house thanks to distributed.net.
Re:The problem with distributed computing... (Score:2)
However, there are half a dozen companies now that think they're going to make money off people using these programs for large projects.
The reality of the matter is, if d.net had to support itself financially, it'd get rid of the internet users and stick to in-house boxes.
I'm not dissing distributed computing: it has its benefits. But it will probably always be limited to research/educational projects.
My point is that if I'm a CGI guy who needs CPU cycles today, it's cheaper to buy them myself than to farm the work out to a third party. So long as Moore's law holds up, this will remain true. There's a study on this I can't find right now.
Notes and comments (Score:3, Informative)
One of the great benefits of Grid computing over distributed computing is access to resources, such as storage. This is what PPDG seeks to do: give physicists near-real-time access to the results of experiments. The problem is that the experiments may be performed at CERN while the researcher is at CalTech. That's normally fine for a telnet session or whatnot, but it is a problem when an experiment can produce petabytes of data. For more information, see http://www.ppdg.org [ppdg.org]. There is another project called NEESGrid [neesgrid.org] that will provide remote access to earthquake simulation equipment. Truly cool.
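To see why petabyte-scale data is the sticking point, a quick calculation (the link speed is an assumed figure for illustration only):

```python
# How long does it take to move a petabyte? The link speed is an assumption
# for illustration; real research links vary widely.
petabyte_bits = 1e15 * 8       # one petabyte, in bits
link_bits_per_second = 1e9     # assume a dedicated 1 Gbit/s path CERN -> CalTech

seconds = petabyte_bits / link_bits_per_second
print(f"~{seconds / 86400:.0f} days per petabyte")  # roughly three months at 1 Gbit/s
```

Moving the data around is as hard a problem as finding the CPU cycles.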
I also encourage you to check out Globus [globus.org]. Using a system like the Globus Toolkit along with MDS, I can locate a machine and execute my program on it transparently. This transparency is taken care of through a network of resource managers, proxies and gatekeepers. It's pretty cool and is pretty easy to install on your favorite Linux box.
Programming Grid enabled applications is pretty easy. There are software libraries called CoG Kits [cogkits.org] that provide simple APIs for Java, Python and a few other languages. In just a few lines of code you can have a program that looks up a server to run your executable on, connects, executes and returns the data to you.
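I can't paste the actual CoG Kit calls from memory, but here is a toy sketch of the same look-up / connect / execute / return pattern using nothing but Python's standard library; the "registry" and "gatekeeper" below are invented stand-ins for MDS and the real Globus services:

```python
# Toy sketch of "find a machine, run a job there, get the result back".
# NOT the CoG Kit API: the registry and job interface are invented for
# illustration, using only the Python standard library.
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client
import subprocess
import threading

def start_gatekeeper(port=8000):
    """Stand-in 'gatekeeper': runs a command locally and returns its output."""
    server = SimpleXMLRPCServer(("localhost", port), logRequests=False)

    def run_job(command):
        return subprocess.run(command, capture_output=True, text=True).stdout

    server.register_function(run_job)
    threading.Thread(target=server.serve_forever, daemon=True).start()

# Toy directory service mapping resource names to endpoints (the MDS role).
REGISTRY = {"compute-node-1": "http://localhost:8000/"}

if __name__ == "__main__":
    start_gatekeeper()
    endpoint = REGISTRY["compute-node-1"]                    # 1. look up a machine
    node = xmlrpc.client.ServerProxy(endpoint)               # 2. connect to it
    output = node.run_job(["echo", "hello from the grid"])   # 3. execute remotely
    print(output)                                            # 4. data comes back to us
```

The real toolkit adds the parts that matter in practice: the proxies and gatekeepers for authentication, resource brokering, and staging your data to and from the remote machine.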
The current push right now is towards OGSA [globus.org], the Open Grid Services Architecture, which will form the basis for Globus 3.0. OGSA will take ideas from web services, like WSDL and service advertisement, and apply them to create Grid services. This will be the next big thing, with services easily able to advertise themselves and clients easily able to find them.
ignore the speeds and feeds (Score:3, Informative)
Render farms are embarrassingly parallel, requiring no communication with neighbors while rendering a frame. They do require a large amount of data before starting on the next frame, but you can either pipeline that (which they usually don't do) or double up on the number of compute nodes (which is more common).
Suppose instead you want to solve a big mesh problem, like a 3D cube with 10^10 points on a side, and it's a fairly simple computation. You might need 10^5 or 10^6 nodes, and the data traffic between nodes would look like a DoS attack if it took place on the internet.
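A rough sketch of why the mesh case hammers the network (the mesh size, node count, and bytes per point below are assumptions chosen for the sketch, not the numbers from the comment above): with a domain decomposition, every node has to exchange the faces of its sub-cube with its neighbors on every timestep.

```python
# Per-step halo-exchange traffic for a domain-decomposed 3D mesh.
# All figures are illustrative assumptions.
points_per_side = 10_000    # global mesh: 10^4 points per side, 10^12 total
nodes = 100_000             # 10^5 compute nodes
bytes_per_point = 8         # one double per point

points_per_node = points_per_side ** 3 / nodes
local_side = points_per_node ** (1 / 3)               # side of each node's sub-cube
halo_bytes = 6 * local_side ** 2 * bytes_per_point    # six faces exchanged per step

print(f"~{halo_bytes / 1e6:.1f} MB per node, per timestep")
print(f"~{halo_bytes * nodes / 1e9:.0f} GB total across the machine, every step")
```

A couple of megabytes per node per step is nothing on a cluster interconnect, but aggregated across a hundred thousand hosts on every step it is exactly the kind of traffic the public internet cannot carry.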
And then there is the rich space of possibilities between these two extremes, and the cross-product with storage. It is a fascinating area to work in because there is much yet to learn, and the possibilities for new networks, processors and storage evolve all the time. Things that were impossible last year are within reach this year or next.
But.... just as 100 Volkswagen Beetles may have the same horsepower as a huge earthmoving machine, the Beetles cannot readily move mountains... and 100 or 1000 or 10000 PCs with a low-cost interconnect are not equal to a supercomputer or a supercluster that may support a communications-to-computation ratio 10^6 times greater - and thus a much greater range of useful distribution algorithms.
For a more technical introduction... (Score:2)
This one covers issues such as parasite attacks, spoiler attacks, etc.
Slashdot rejected my guide when I submitted it. Whine whine gripe gripe.
TNC (Score:2)
They weren't nearly the last ones to announce that they had done such a thing. For a while, in the mid-80's, it was somewhat of an inside joke. It seemed that everyone was making their own distributed unix system using the same design.
I built one myself, and so did a fellow down the hall from me (at Project Athena at MIT). We both spent about a month of our spare time on it, and both of ours worked. One of my demos consisted of a Makefile with source scattered across as many machines as I could get accounts on. I showed that, despite the fact that the clocks on some machines were off by hours or days, my code correctly adjusted for clock skews and compiled the right things. I didn't need to modify make or the compiler, I just linked them to my libcnet.a, which replaced all the system calls with my distributed routines, and they corrected for the clock problems.
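For anyone curious what the clock-skew trick looks like, here is a tiny sketch of the idea (this is not the original libcnet.a, just an illustration in Python; the hosts and offsets are made up): keep a per-host clock offset and normalize every timestamp to one reference clock before deciding whether a target is out of date.

```python
import os

# Hypothetical per-host clock offsets, in seconds, relative to the local clock.
# In a real system these would be measured, not hard-coded.
CLOCK_OFFSET = {
    "localhost": 0.0,
    "athena-3": -7200.0,    # this host's clock runs two hours behind
    "athena-7": 86400.0,    # this one is a full day ahead
}

def adjusted_mtime(host, path):
    """Return a file's modification time normalized to the local clock."""
    raw = os.stat(path).st_mtime   # stand-in for the remote stat() call
    return raw - CLOCK_OFFSET[host]

def needs_rebuild(target_host, target_path, sources):
    """make-style check: rebuild if any source is newer than the target,
    after correcting every timestamp for its host's clock skew."""
    target_time = adjusted_mtime(target_host, target_path)
    return any(adjusted_mtime(h, p) > target_time for h, p in sources)
```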
The problem isn't the difficulty in building a truly distributed system. Any competent software engineer should be able to do that. The problem is that the commercial world has no interest in selling such a thing, and the non-commercial world remains ignorant of things like this that were demoed several decades ago.
One of the true frustrations from having built such a system is having to work with things like NFS, that still can't get its clocks right (at least not without requiring super-user permissions on every subsystem). When I decided to solve this problem so that make would work, it took me a morning, and I didn't use super-user permissions anywhere.
BTW, the Newcastle system was used internally in a number of corporations. But the many attempts to make it more widespread just hit brick walls. So now we have the kludgery of HTTP and URLs rather than the simple, elegant schemes that the various distributed-system people have used.
We're all goin ta Hell... (Score:1)
What if CureTheCommonCold@Home is really help-pfizer-make-$10-a-dose-cold-medicine-that-tu