
Introduction to Distributed Computing 95

dosten writes "ExtremeTech has a nice intro article on distributed and grid computing." Someday someone will successfully implement something like Progeny's NOW and all of these assorted hacks at building a distributed computing system will be superseded.

  • Or we could just spend 8 hours finding a buffer overflow in Brilliant's Distributed Kazaa software and do it that way.
  • Time warp? (Score:1, Troll)

    by Pedrito ( 94783 )
    Most new hires came in to work on projects that had the potential to bring in revenue sooner than NOW...

    Um, maybe it's me, but how could it be sooner than now? If these guys have a working time machine, maybe they ought to try to capitalize on that instead of writing an OS.
  • There is a paper due out in Science next week that presents a mathematical model of the universe as a distributed computation. The properties of the various masses (velocities, sizes, angular momenta, etc) can be considered to be the results of computations carried out by the interactions between them.

    Of course, computation in a vacuum (ha ha) is useless. Information on the results of the computations is carried around via cosmic rays, neutrinos and the like.

    The really exciting thing is that the conclusion of the paper calls for research into the general direction that cosmic rays are flowing which may lead us right to the location of God Himself!

    • The universe by definition is a distributed system of objects. That's why OO is such a Good Thing(tm). It takes advantage of the way we understand the universe, to let us say what we need to say more succinctly. A really good system is based on reactions. It can keep a relatively stable state, no matter what. It's buffered. As for the location of God Himself(tm), what part of 'omnipresent' didn't you understand?
  • Distributed... (Score:3, Interesting)

    by Renraku ( 518261 ) on Friday April 05, 2002 @04:47PM (#3292728) Homepage
    Distributed computing is actually a pretty simple idea to come up with, seeing as how a lot of things are 'distributed' such as manufacturing, selling products, etc. The thing that makes distributed computing attractive is the speed of data and the unused potential of your average computer. It would be nice to see a company that needed a lot of data processed, and paid people for every data pack they processed and completed. Rules would have to be set up to prevent abuse, but it would be a nice system. Everyone wins.
    • Except for the ``paying people'' part, United Devices [ud.com] does just that.

      The downside of distributed computing is figuring out how to split a given problem into pieces that can be processed separately. Not all problems can be split up, and for those that can, figuring out the best way to do so isn't always trivial.
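
      A minimal sketch of that splitting step for the easy case -- an embarrassingly parallel task chunked across local worker processes (the task and the chunk count here are made up for illustration):

        from multiprocessing import Pool

        def process_chunk(chunk):
            # Stand-in for the real per-work-unit computation.
            return sum(x * x for x in chunk)

        def split(data, n_chunks):
            # The hard part in real problems: producing pieces with no dependencies.
            size = (len(data) + n_chunks - 1) // n_chunks
            return [data[i:i + size] for i in range(0, len(data), size)]

        if __name__ == "__main__":
            data = list(range(1_000_000))
            with Pool() as pool:
                partial_results = pool.map(process_chunk, split(data, 8))
            print(sum(partial_results))   # the combine step: just a sum here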

    • Re:Distributed... (Score:2, Insightful)

      by Krapangor ( 533950 )
      While the principle is simple, the idea itself is massively overrated these days. It's not that distributed computing is exactly a new idea: massively parallel machines have been around for decades, and distributed computing is just using the computers on a (large-scale) network as a massively parallel machine. But history has already shown that many problems can't be solved by parallel computation, which limits the power of distributed computing. The only new benefit is that you don't need to spend $$$ on Cray systems etc. Just buy some processing power in a grid. However, there won't be as many customers as you would expect.
      This stuff is just overhyped by some companies which think that they can make the big buck.
  • by Spackler ( 223562 ) on Friday April 05, 2002 @04:50PM (#3292741) Journal
    The whole thing [extremetech.com]

    Rather than a popup ad per page.
  • April 14, 2011: I sit down at my new copy of Windows 2010 to fill out my tax return. A dialog box pops up - "Sorry, but your computer and 36 million of its comrade workstations are busy working on Bill Gates' tax return. Try back again in a day or two..."
  • That's what I got when I went to that site.

  • Condor [wisc.edu] is a very good Grid system that is freely available for Linux (binaries only).
  • From the site...

    "That was all right at the time, because it was easy to raise money for ambitious development projects such as NOW that could take years to develop and, thus, that might not pay off for years."

    and

    "...Most new hires came in to work on projects that had the potential to bring in revenue sooner than NOW"
  • check this out (Score:4, Informative)

    by emir ( 111909 ) on Friday April 05, 2002 @05:07PM (#3292843)
    if you are interested in distributed computing over the internet, check out this url: http://www.aspenleaf.com/distributed/ [aspenleaf.com].

    there are short descriptions of all the distributed computing projects plus lots of other stuff.
  • My University just got a $395,000 grant from the NSF. For more info: http://inside.binghamton.edu/March-April/4apr02/grid.html
  • by Frobnicator ( 565869 ) on Friday April 05, 2002 @05:12PM (#3292863) Journal
    ... like the dogma project [byu.edu] at Brigham Young University [byu.edu] is a distributed application system currently used on a few thousand machines. It is written in pure Java, requires no persistent storage on the local machine, can be interrupted at any time, and is OS independent, to name a few things.
  • I guess I really like the idea of distributed computing. In a world where everyone works together with common goals we would be able to achieve almost anything. The flies in the ointment, however, are the few individuals who would get their rocks off by ruining it for everyone else, the same type of people who write virii.

    Another networking subject that really interests me is wireless networking. I think that someday in the not too distant future we will see neighborhood networks forming and then a linking of various neighborhood networks to form a new kind of "internet." One that is absolutely not controlled by any group.

  • There's a Distributed Computing Forum over at Anandtech [anandtech.com]

  • For most intents and purposes, processor cycles are free.

    If a company/organization has an *actual* need for processor cycles (say genome research), it's cheaper to buy 1000 boxes and admin the stuff in-house. Even when ignoring issues such as sending valuable company data to thousands of internet users, most applications that require large computation also require large amounts of bandwidth, generally provided over a LAN.

    This is why you'll never get to render a frame for Toy Story 5: Pixar will need to send you 5GB of data just to get back a 2k image.

    Once you weigh the costs of admining a network and writing/distributing your code against the tangible financial benefit from the results, few companies will have a reason to turn to outsiders for a few minutes on their machines.
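
    Rough numbers behind that last point -- the 5GB-per-frame figure is from the Toy Story example above, while the link speeds and the two-hour render time are assumptions for illustration:

      # Transfer time vs. compute time for shipping one render job to a volunteer.
      scene_bytes = 5 * 1024**3     # 5 GB of scene data per frame (from the post above)
      render_hours = 2.0            # assumed CPU time to render the frame

      for name, bits_per_sec in [("56k modem", 56e3),
                                 ("1 Mbit DSL", 1e6),
                                 ("100 Mbit LAN", 100e6)]:
          transfer_hours = scene_bytes * 8 / bits_per_sec / 3600
          print(f"{name:13s} transfer {transfer_hours:7.1f} h   compute {render_hours:.1f} h")

      # Over a modem the transfer takes ~200 hours for 2 hours of rendering;
      # only on a LAN does moving the data cost less than the computation.
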
    • I take it this is why companies like IBM have some of their research software run as a distributed program that eats up the processor cycles of all of their non-research PCs?

      While it's true that sending valuable company data across the Internet is a problem, not all problems are going to require that. Also, while not every problem lends itself to distributed computing, a program that properly implements a problem that does lend itself to distribution won't require large amounts of bandwidth.

      You know, you could simply check that statement against current distributed projects. For example, neither Seti@home nor the distributed.net client has large bandwidth demands, but both have high computational demands.

      Face it, if Pixar had to pass around that much data to render individual frames, its own network would get overwhelmed.

      I will have you know, though, that processor cycles are far from free. Building a good supercomputer that can do the work of a distributed system is very expensive no matter what route you take. (The purchase, infrastructure, development, and administration of even a few hundred machines is pricey. Ask Clemson's PARL.)
      • Yes, but neither Seti@home nor d.net are making any money. They're largely research projects.

        The companies looking to get into this are hoping to make money. I'm saying that's a bad business plan.

        And yes, Pixar already passes about that much data. Large scenes/complicated renders can even go higher per-frame.

    • Sorry, that's a bad example. Pixar's existing compute farm doesn't need much networking.

      • Sorry, that's a bad example. Pixar's existing compute farm doesn't need much networking.

        But it sure needs confidentiality, both of the rendering code itself and the data it is working on. Otherwise we will all see random frames from every Pixar movie in advance.

        Plus the rendering code is quite likely huge and has a lot of dependencies on proprietary codebases. I doubt the stuff would run well on Direct-X.

        The liquid metal effect in Terminator cost a million or so to develop and sold for about that the first time; after that it was quickly copied, so now you can get it in a movie for a few $10K.

        The idea of using the internet to do distributed computing is as old as the net itself. We were building SETI-type configurations back in the mid 80s, as soon as the price/performance of the workstation rendered mainframes obsolete.

        Believe it, if Pixar needs more compute cycles they will go to Dell and buy a room full of cheapo machines. It will cost much less to manage than scraping processing time up from around the net.

    • by Rajesh Raman ( 115274 ) on Friday April 05, 2002 @06:22PM (#3293243)
      You're missing the point. Distributed computing is not only about running on machines that aren't yours, but also about efficiently utilizing the machines that are yours (or that you at least have easy access to).

      Consider that a University of Wisconsin study showed that, on average, computers on desktops are idle at least 60% of the time. And that doesn't count the cycles lost between keystrokes --- I'm talking about extended periods of time. For example, almost all desktop machines are idle during the night. That's 50% already. Now add lunch time, meetings, etc.

      That's when systems like Condor [wisc.edu] come in. Researchers at Wisconsin got hundreds of years of CPU time on machines they already had without impacting others.

      Coming back to your argument, the counter argument is that you may not even need to buy additional boxes --- just use the ones you already have more efficiently by utilizing distributed computing systems.
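
      A very crude sketch of the cycle-scavenging idea -- this is not how Condor actually works (Condor also checkpoints, migrates jobs and yields to the interactive user); the toy below just polls the Unix load average and crunches while the box looks idle:

        # Toy cycle scavenger: only crunch work units while the machine looks idle.
        # Load average is a crude stand-in for real idle detection; Unix only.
        import os
        import time

        IDLE_LOAD = 0.5          # 1-minute load average below this counts as idle

        def crunch_one_unit():
            # Stand-in for: fetch a unit from the coordinator, compute, send it back.
            return sum(i * i for i in range(1_000_000))

        while True:
            one_minute_load, _, _ = os.getloadavg()
            if one_minute_load < IDLE_LOAD:
                crunch_one_unit()
            else:
                time.sleep(60)   # back off while the machine's owner is using it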

      As far as "freeness" of processor cycles, let me tell you that the optimization researchers can soak up as much cpu as you can possibly throw at them. Also, if you look up Particle Physics Data Grid (PPDG) and GriPhyn, you'll find out that many distributed computing problems are I/O driven.

      ++Rajesh
      • on average, computers on desktops are idle at least 60% of the time

        Many of us need that 60% idle time to keep our CPUs running at a reasonable temperature. I have my CPU and case cooling under control, but now I think I need to put multiple A/C zones in my house thanks to distributed.net. :)
      • I've got no problem with research projects that use distributed computing. I myself run d.net and have thrown cycles to Seti@home and Genome@home. It's a great way to pick up free cpu cycles cheaply, if you've got the time.

        However, there's half a dozen companies now that think they're going to make money off people using these programs for large projects.

        The reality of the matter is, if d.net had to support itself financially, it'd get rid of the internet users and stick to in-house boxes.

        I'm not dissing distributed computing: it has its benefits. But it will probably always be limited to research/educational projects.

        My point is that if I'm a CGI guy who needs cpu cycles today, it's cheaper to buy them myself than to farm them out to a third party. So long as Moore's law holds up, this will remain true. There's a study on this I can't find right now.
  • Notes and comments (Score:3, Informative)

    by pridkett ( 2666 ) on Friday April 05, 2002 @06:20PM (#3293227) Homepage Journal
    First of all, be sure to check out the links at the end of the article to some of the projects that are going on right now. Some of the ones that I find more interesting are the Particle Physics Data Grid and the Access Grid (no link in article).

    One of the great benefits of Grid computing over distributed computing is the access to resources, such as storage. This is what PPDG seeks to do: give physicists access, in near real time, to the results of experiments. The problem is that the experiments may be performed at CERN while the researcher is at CalTech. That normally isn't an issue for a telnet session or whatnot, but it is when an experiment can produce petabytes of data. For more information on that see http://www.ppdg.org [ppdg.org]. There is another project called NEESGrid [neesgrid.org] that will provide remote access to earthquake simulation equipment. Truly cool.

    I also encourage you to check out Globus [globus.org]. Using a system like the Globus Toolkit along with MDS, I can locate a machine and execute my program on it transparently. This transparency is taken care of through a network of resource managers, proxies and gatekeepers. It's pretty cool and is pretty easy to install on your favorite Linux box.

    Programming Grid enabled applications is pretty easy. There are software libraries called CoG Kits [cogkits.org] that provide simple APIs for Java, Python and a few other languages. In just a few lines of code you can have a program that looks up a server to run your executable on, connects, executes and returns the data to you.
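
    For flavor, here is that submit-and-return pattern in toy form using nothing but Python's standard library XML-RPC. This is not the CoG Kit or Globus API, and it skips everything a real Grid adds (MDS lookup, security proxies, gatekeepers); the hostname is made up:

      # Toy "ship a task to a remote resource and get the result back".
      # NOT the CoG Kit / Globus API; plain stdlib XML-RPC just to show the shape.

      # --- on the compute resource ---
      from xmlrpc.server import SimpleXMLRPCServer

      def run_task(n):
          # Stand-in for the real executable.
          return sum(i * i for i in range(n))

      def serve():
          server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
          server.register_function(run_task)
          server.serve_forever()

      # --- on the submitting client ---
      from xmlrpc.client import ServerProxy

      def submit(host="compute1.example.org"):     # hypothetical hostname
          resource = ServerProxy(f"http://{host}:8000/")
          return resource.run_task(1_000_000)      # runs remotely, result comes back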

    The current push right now is towards OGSA [globus.org] which is Open Grid Services Architecture. This will form the basis for Globus 3.0. OGSA will take ideas from web services, like WSDL, service advertisement, etc, and implement them to create Grid services. This will be the next thing with services easily able to advertise themselves and clients easily able to find services.
  • by xtp ( 248706 ) on Friday April 05, 2002 @07:12PM (#3293486)
    These projects, when described in the lay press, nearly always skip over any analysis of the kinds of algorithms that can work well on a distributed system. The first metric to look at is the ratio of communication to computation. That is, how many bytes of data does a compute node exchange with its neighbor(s) before continuing with the next step of the computation?

    Render farms are embarrassingly parallel, requiring no communication with neighbors while rendering a frame. They do require a large amount of data before starting on the next frame, but you can either pipeline that (which they usually don't do) or double up on the number of compute nodes (which is more common).

    Suppose instead you want to solve a big mesh problem like a 3D cube with 10^10 points on a side, and it's a fairly simple computation. You might need 10^5 or 10^6 nodes, and the data traffic between nodes would look like a DOS attack if it took place on the internet.
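
    To put rough per-node numbers on that ratio for a generic 3D stencil code (everything below is an illustrative assumption, not a measurement):

      # Communication-to-computation ratio for one node's block of a 3D stencil
      # solve, per timestep. All per-node figures are assumptions.
      n = 100                    # grid points per side of this node's block
      bytes_per_point = 8        # one double precision value per point
      flops_per_point = 10       # cheap update rule

      flops = flops_per_point * n**3
      halo_bytes = 6 * n**2 * bytes_per_point    # six faces traded with neighbours

      print(f"{halo_bytes} bytes exchanged vs {flops} flops "
            f"-> {halo_bytes / flops:.3f} bytes per flop, every timestep")

      # Roughly half a megabyte in and out of every node, every step: fine on a
      # cluster interconnect, hopeless as internet traffic. A SETI@home work unit
      # is the other extreme: hours of CPU per small packet of I/O.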

    And then there is the rich space of possibilities between these two extremes, and the cross-product with storage. It is a fascinating area to work in because there is much yet to learn, and the possibilities for new networks, processors and storage evolve all the time. Things that were impossible to do last year are within reach this year or next year.

    But.... just as 100 Volkswagen Beetles may have the same horsepower as a huge earthmoving machine, the beetles cannot readily move mountains... and 100 or 1000 or 10000 PCs with a low-cost interconnect are not equal to a supercomputer or a supercluster that may support a 10^6 greater communications-to-computation ratio - and thus a much greater range of useful distribution algorithms.
  • Have a read of my guide, it's at http://www.bacchae.co.uk/docs/dist.html [bacchae.co.uk]

    This one covers issues such as parasite attacks, spoiler attacks, etc.

    Slashdot rejected my guide when I submitted it. Whine whine gripe gripe.

  • by jc42 ( 318812 )
    This shows a profound lack of knowledge of the Computing literature. Back in 1982 (December issue IIRC), there was an article published describing The Newcastle Connection. This was a fully-distributed unix system built on exactly the same model. It was a unix system that incorporated other systems as components, treating the network as a bus. The result was a large multi-processor unix system.

    They weren't nearly the last ones to announce that they had done such a thing. For a while, in the mid-80's, it was somewhat of an inside joke. It seemed that everyone was making their own distributed unix system using the same design.

    I built one myself, and so did a fellow down the hall from me (at Project Athena at MIT). We both spent about a month of our spare time on it, and both of ours worked. One of my demos consisted of a Makefile with source scattered across as many machines as I could get accounts on. I showed that, despite the fact that the clocks on some machines were off by hours or days, my code correctly adjusted for clock skews and compiled the right things. I didn't need to modify make or the compiler, I just linked them to my libcnet.a, which replaced all the system calls with my distributed routines, and they corrected for the clock problems.
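
    The skew correction itself is simple. A toy sketch -- not libcnet.a; it assumes passwordless ssh and GNU date/stat on the remote hosts -- looks something like this:

      # Compare file mtimes across hosts whose clocks disagree: measure each
      # host's offset from the local clock once, then normalise every timestamp.
      import subprocess
      import time

      def remote_epoch(host):
          return int(subprocess.check_output(["ssh", host, "date", "+%s"]))

      def remote_mtime(host, path):
          return int(subprocess.check_output(["ssh", host, "stat", "-c", "%Y", path]))

      def clock_offset(host):
          # Seconds to add to a remote timestamp to express it in local time.
          return time.time() - remote_epoch(host)

      def needs_rebuild(target, sources):
          # make's freshness rule, with every mtime normalised to the local clock.
          offsets = {}
          def mtime(host, path):
              if host not in offsets:
                  offsets[host] = clock_offset(host)
              return remote_mtime(host, path) + offsets[host]
          target_time = mtime(*target)
          return any(mtime(h, p) > target_time for h, p in sources)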

    The problem isn't the difficulty in building a truly distributed system. Any competent software engineer should be able to do that. The problem is that the commercial world has no interest in selling such a thing, and the non-commercial world remains ignorant of things like this that were demoed several decades ago.

    One of the true frustrations from having built such a system is having to work with things like NFS, that still can't get its clocks right (at least not without requiring super-user permissions on every subsystem). When I decided to solve this problem so that make would work, it took me a morning, and I didn't use super-user permissions anywhere.

    BTW, the Newcastle system was used internally in a number of corporations. But the many attempts to make it more widespread just hit brick walls. So now we have the kludgery of HTTP and URLs rather than the simple, elegant schemes that the various distributed-system people have used.

  • I don't mind giving away cycles to seti@home or d.net but I'm anticipating that something evil is on the way here.
    /me grabs tinfoil hat
    What if CureTheCommonCold@Home is really help-pfizer-make-$10-a-dose-cold-medicine-that-turns-out-to-be-carcinogenic@home or Help-Monsanto-make-deadly-pesticide@home? What if those have been running under Kazaa this whole time?
