The Internet

Modelling P2P Networks

Nathan Kennedy writes: "Mihajlo A. Jovanovic did his Master's project at the University of Cincinnati on modelling P2P networks with Gnutella as a case study. You can view his project along with source code, stunningly pretty pictures, an applet and a paper on scalability."
This discussion has been archived. No new comments can be posted.

  • by glh ( 14273 )
    It would be nice to see the source code for this! Anyone know how he did it? He just mentioned using Java RMI, but I'm curious how it works under the covers. He had the binaries posted but not the source. :(
  • Some lawyer is looking at this page and trying to figure out what part of the DMCA has been violated and who they should call to ask for a job taking Mihajlo to court for them.
  • Visuals (Score:2, Insightful)

    by Mattygfunk ( 517948 )
    All of those visualizations seem like every other visual representation of the internet.

    This doesn't look any more interesting just because it's Gnutella.

    • All of those visualizations seem like every other visual representation of the internet.

      Only on a fairly shallow level - internet and p2p representations both have lots of lines and squiggles.

      The important point being made here is that the internet is very much a client/server mechanism - lots and lots of clients (browsers, ftp clients, etc) connecting to fewer servers, with clusters of servers in associated areas.

      Peer to peer networks are by definition non-client/server, i.e. they are more distributed, and any clustering is worthy of further study, due to demographic correlation on a particular IP range, with a certain ISP, or due to academic or organisational density.

      Pretty pictures, all - but look below, see what they represent.


    • What is important: these networks are highly clustered, and the current Gnutella algorithms don't take advantage of that and do a lot of duplicate work, leading to an inefficient use of bandwidth and problems in scaling.

      This conclusion is rather interesting for someone who is into distributed processing (although I'm not sure how "d'uh!" obvious it is to most) but even more interesting is that it's all based upon empirical evidence (of course using a mathematical basis).

      So don't just look at the pretty pictures!
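To make the "duplicate work" point concrete, here is a minimal sketch of query flooding on two toy graphs. It is not taken from the paper: the `flood` function, the example graphs, and the TTL of 7 are all assumptions chosen for illustration. Each node forwards a query once, to every neighbour except the one it came from, and a message that reaches a node which already has the query is counted as a duplicate.

```python
from collections import deque

def flood(adjacency, source, ttl):
    """Flood a query from `source`; return (nodes_reached, messages_sent, duplicates)."""
    seen = {source}
    messages = 0
    duplicates = 0
    # Each queue entry is (node, node it received the query from, remaining TTL).
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops = queue.popleft()
        if hops == 0:
            continue  # TTL exhausted, the query is not forwarded further
        for neighbour in adjacency[node]:
            if neighbour == sender:
                continue  # never send the query straight back
            messages += 1
            if neighbour in seen:
                duplicates += 1  # neighbour already has the query: wasted message
            else:
                seen.add(neighbour)
                queue.append((neighbour, node, hops - 1))
    return len(seen), messages, duplicates

# Hypothetical toy topologies: a tree (no clustering) and a 4-node clique (fully clustered).
TREE = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1], 4: [1], 5: [2], 6: [2]}
CLIQUE = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}

if __name__ == "__main__":
    print("tree  ", flood(TREE, 0, 7))
    print("clique", flood(CLIQUE, 0, 7))
```

On these toy graphs the tree reaches 7 nodes with 6 messages and no duplicates, while the clique reaches 4 nodes with 9 messages, 6 of them duplicates. Real Gnutella traffic is messier, but the trend is the one described above: the more a node's neighbours are also neighbours of each other, the more flooded messages are wasted.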
    • nor would i consider them stunningly pretty.
      I don't know about you, but I find I just can't administer a network without some real stunningly pretty pictures of my topology.
      Or at least a thinkgeek internet poster [thinkgeek.com].
  • Mihajlo A. Jovanovic...Mihajlo A. Jovanovic..
    try saying that three times fast! on a more serious note, it looks like this may be next in line for the p2p executions going around...
  • Firewalls (Score:1, Troll)

    by Chardish ( 529780 )
    • Most people who use P2P networks have high speed connections.
    • Most people with high-speed connections have firewalls.
    • P2P is a major security risk, and the only way to get a P2P connection going is to let your firewall drop its guard for a second.
    Therefore, P2P is a major security risk. What's wrong with client-server? I think it's great.

    -Evan
  • Interesting... (Score:4, Interesting)

    by ZigMonty ( 524212 ) <slashdot&zigmonty,postinbox,com> on Thursday February 28, 2002 @09:26AM (#3083567)
    The pictures are cool but wouldn't one of the Fasttrack [fasttrack.nu] based P2P networks be a better example? I've got nothing against Gnutella but Morpheus and co seem to have scaled better. Is it because Gnutella is easier to test, its protocol more open, etc? Can anyone enlighten me as to why Gnutella would be better? Not trying to be a troll, just curious.
    • Re:Interesting... (Score:1, Insightful)

      by jsmyth ( 517568 )
      The pictures are cool but wouldn't one of the Fasttrack [fasttrack.nu] based P2P networks be a better example? ...Can anyone enlighten me as to why Gnutella would be better?

      Three possible reasons I can think of:

      • Gnutella's more widely known, and would generate more interest in the researcher's peer group. We all know academia's all about interested peer groups :-)
      • The researcher may have been more familiar with Gnutella
      • Gnutella's been around for longer, and is less centralised (if such a comparison is possible in the P2P world), has more diverse clients on different OSs, and is in continual development in so many different projects. The research was on a protocol and distributed application, and gnutella matches these in a fairly well known way.

      There may be more, but research is tough, and any shortcuts to getting data are usually welcomed with open arms...

    • One major reason would be that Gnutella is open source and Fasttrack is not.
    • Gnutella Ultrapeers (Score:4, Informative)

      by chrohrs ( 302592 ) on Thursday February 28, 2002 @10:40AM (#3083874) Homepage
      One problem I see with this study is it doesn't account for ultrapeers [limewire.com], technology that was released by LimeWire [limewire.com] in early January. Ultrapeers increase scalability by offloading most of the bandwidth burden to dynamically-elected high-speed hosts. Unlike Fasttrack, ultrapeers use an open protocol with open-source [limewire.org] implementations. I believe BearShare [bearshare.com] is also adding ultrapeer support.

      One problem with LimeWire's initial implementation is that ultrapeers didn't respond to "crawler pings" with "leaf pongs". (We've since changed that.) So as pretty as these pictures are, they're probably not accurate. I would love to see updated results that accounted for ultrapeers.

      The Gnutella network is evolving rapidly, and it would be great if academic papers considered these changes. The Gnutella Developer Forum [yahoo.com] (GDF) is the primary location for protocol development.

      Christopher Rohrs
      Sr. Software Engineer
      LimeWire

    • Re:Interesting... (Score:5, Informative)

      by jilles ( 20976 ) on Thursday February 28, 2002 @11:03AM (#3084001) Homepage
      Morpheus is apparently going to switch to gnutella in the next version of their client. Due to the unexpected exclusion of their existing client from the fasttrack network the release will probably be in a couple of days.

      The Gnutella protocol and the fasttrack protocol are actually very similar since limewire added superpeers. In a couple of days (when morpheus releases their gnutella stuff), gnutella will be put to the test. Theoretically it should scale at least as well as fasttrack.

      Gnutella has a few advantages over fasttrack:
      - Gnutella is an open, simple to implement protocol. Fasttrack is a proprietary implementation of a proprietary protocol. Unauthorized implementations are being banned from the network.
      - there are implementations of Gnutella in various languages for various platforms. The best are free (both in terms of speech and beer).
      - shutting down one gnutella client doesn't affect other clients. So if bearshare is shut down you just switch to limewire or something else and connect to the same gnutella network.
      - unlike fasttrack, gnutella has no dependencies on a central server. It only needs the ip of one other client in the network to connect itself. Typically clients contact a webserver to get a list of such clients, however this is optional.
      - It's theoretically just as scalable and potentially even more scalable (due to future innovations in clients).
      - Most of the clients are stable and will survive more than a few searches (morpheus consistently crashes on me)

      However there are disadvantages:
      - Gnutella doesn't specify how to handle queries. Consequently some clients are better at this than others and you may get crappy search results. Limewire has metasearch abilities in the works which could potentially be just as good as morpheus (or better). However, until all (or most) clients on the network support this, this will be relatively useless.
      - The gnutella network is currently smaller than fasttrack (typically around 50000 hosts at the moment).

      If you want to give gnutella a try and don't want the spyware, I recommend that you don't download the windows installer but the installer for "other" platforms instead. This will get you a nice Java-only version without all the crap (except for ads). Also be sure to run it using the latest jdk (1.4.0) as it is somewhat faster than previous versions.
      • Silly me, I was of course referring to the limewire client in the last paragraph of my previous post.

        IMHO still the best client but feel free to disagree.
  • The focus on P2P networks in the past has been solely on filesharing. But you can also exchange other data using a P2P network. There's an open source project with a P2P network for exchanging recommendations for web resources. Help us test the scalability of our network, just grab the tar.gz [berlios.de] and run the software!
    • P2P's other fruits may include:

      * Distributed storage like FreeNet
      * Instant Messaging
      * Anything else you can think of that may need to be decentralized.

      The advantage of a P2P network is that no single entity controls it and everybody shares the costs.

      • Re:True true... (Score:3, Interesting)

        by Salamander ( 33735 )
        Distributed storage like FreeNet

        Freenet is more of a data transmission method than a true data store. Even Ian says so, when pressed on the data-loss issue. MNet, OceanStore, Farsite or CFS would all be better examples of actual distributed storage.

  • ...US masters theses aren't accepted as a diploma thesis in most european countries. Guess why.
  • Love the ~doesn't work with MS Internet Explorer~ part. At least one college kid has it right!
  • by Anonymous Coward on Thursday February 28, 2002 @09:55AM (#3083676)
    Looking at the charts it's really hard to get an accurate sense of gnutella's performance and scalability. Does the length of the lines correspond to the ping time, or physical location? What's the purpose of the circular chart? It would really be beneficial to include captions on each graph, to make it easier to understand. It would be nice to see a chart that relates the bandwidth, latency, number of connections, actual transfer rate, and average connections all in a 3D visualization (not 3d bar chart). He could make a panorama of it. It might look like total garbage, or an organic structure.
    • He has those tables where he calculated the Clustering Coefficient C and the average path length L... Those are your labels.

      And he directly comments on scalability and performance: he says it doesn't scale well, and does a lot of duplicate work and wastes time and bandwidth when you are so highly clustered.
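For readers who haven't met C and L before, here is a small sketch of how the two numbers are usually computed, assuming the standard small-world definitions (C as the average, over nodes, of the fraction of a node's neighbour pairs that are themselves linked; L as the mean shortest-path length between connected nodes). The paper's exact definitions may differ, and the function names and the four-node example graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Average over all nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue  # a node with fewer than two neighbours contributes 0
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length (in hops) over all pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop counts from `source` to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for n in adj[node]:
                if n not in dist:
                    dist[n] = dist[node] + 1
                    queue.append(n)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical example: a triangle (0, 1, 2) with one extra node hanging off node 2.
EXAMPLE = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

if __name__ == "__main__":
    print("C =", clustering_coefficient(EXAMPLE))
    print("L =", average_path_length(EXAMPLE))
```

For this example C works out to 7/12 and L to 4/3. A network with a high C and a low L is what is usually called a "small world", which is the kind of structure the thread is describing.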
      • You're right it says that once I read through the html version of the paper, but that's no excuse for not labeling the charts. If the guy plans on submitting it to a peer review journal, he'd better have captions. Even though he makes those statements, it's still not totally clear how the charts relate to data/findings. It's the responsibility of the writer to make it easy to read and understand.
  • Hey, I think I saw my IP address in one of those pictures....

  • And I was always told that masters thesi (OK, I know theses is correct but thesi sounds cooler) were only ever used to balance wobbly table legs in professors' offices. Live and learn.

    My thesis project [lsu.edu] involved building a visualization system for sensor fusion - how boring! Did have an applet [lsu.edu] though.

  • I clicked the link. I read 2 lines and already I found something I disagree with totally.
    Although P2P computing has existed for some time as a basis for network applications such as FTP, Telnet, instant messaging, ICQ, and Microsoft's MSN Messenger Service and NetMeeting

    Right, FTP is client - server, Telnet is client - server, instant messaging is client - server, ICQ is client - server, MSN Messenger is client - server. About the only one he got right out of his list of examples is NetMeeting.

    Now his paper doesn't mention these examples, so why include them in the introduction page?

    The paper is well written and from a first skimming seems accurate and interesting - but that first paragraph on the introduction does nothing to enhance the paper.
    • If FTP is client-server, then Gnutella & Co. must be too. You select a file from a list of available files on a remote machine, and download it. And if NetMeeting is P2P, then so is telnet: you're invoking a communication channel directly to a remote machine.

      Not that there's anything wrong with client-server. Or P2P. But what do the terms really mean? Perhaps client-server is more usefully described as a system where a 3rd party (the "server") intermediates, while P2P requires only the 2 directly-involved machines. So, ICQ is client-server, to the extent that a message is sent to a 3rd party (an ICQ server), before being retrieved by the recipient. (Of course, ICQ communications can also be P2P).

      Or is P2P really a statement about a web of peers acting in concert? This becomes a bit problematical -- is DNS P2P? Is UUCP? Perhaps what distinguishes P2P is that the participants need to be the only players (i.e. not themselves acting as servers)? But now we're really splitting hairs.

      Perhaps P2P just isn't a new phenomenon at all. It seems that as soon as you try to really define it, it disappears on you!
      • I'd say it's simply that P2P is where the same nodes are both client & server. This is the case with Gnutella, Morpheus etc, but is not the case (typically) with ICQ, Messenger, et al. FTP and Telnet are interesting, as it's perfectly possible for the same machine to run both telnet and telnetd, but I would still argue that's the exception rather than the rule. The protocol was designed with a simple concept of server and client.

        As for DNS, from my hazy understanding (please correct me if I'm wrong) the client makes a request to one server, which redirects it if required to a different server. So the client lookup part of the protocol is certainly client server. The other parts (syncing changes etc) are outside of my realm of knowledge, but that could be considered p2p I guess.

        I certainly agree that p2p is nothing new, just a new phenomenon in terms of popularity.
        • OK -- let's look at the case where the same nodes are both client and server. That's exactly how DNS or UUCP "servers" work. With DNS, the "peers" are distributing a huge dynamic database of IP addresses. With UUCP, they're distributing messages. In both cases, the peers are symmetrical -- same "binaries" running on all machines (of course, the machines may be different architectures, running different versions of the software, etc., so they're not really the same binaries -- but you know what I mean...)

          But, you may argue, DNS and UUCP aren't truly P2P, because they themselves are servers. Is that the only thing which differentiates the new wave of P2P apps? The fact that they are monolithic? What if you split a Gnutella client into two parts -- a "server" which does the communication with other Gnutella nodes, and a "client" which is the GUI front-end? In that case, you'd have a system which was pretty much exactly analogous to UUCP and DNS. Especially if you could run the Gnutella client on a different machine from the Gnutella server.

          So how is Gnutella different, exactly? What if you made a combined UUCP client-and-server? You use the same program to both connect to other UUCP nodes (downloading newsgroups or whatever), and to read the news. Would _that_ be a full-fledged P2P app? If so, the definition for P2P seems pretty lame.

          Is the P2P innovation merely the integration of a traditional server with a traditional client?
          • I agree that these definitions may seem arbitrary.

            DNS is a complicated example because there are _many_ things going on there. You've got the resolver code on each workstation and it's interaction with a recursive nameserver. You've got a recursive nameserver's interaction with authoritative nameservers. And you've got the interaction between master and slave nameservers (sharing zone data).

            Frankly I think that's all a red herring. I would modify my earlier statement of defining P2P as containing both client and server functions in the same binary (and yes, that is an important piece of the puzzle... if you separate them, that means that you can run them exclusively on different machines) in the following way: that any given data could be both requested and served from the same node. That is to say that any Gnutella node could source the same file.

            This is different from DNS master/slave in that although two nodes could play either role, the specific master/slave role is not arbitrary for a specific zone(data).

          • But, you may argue, DNS and UUCP aren't truly P2P, because they themselves are servers. Is that the only thing which differentiates the new wave of P2P apps? The fact that they are monolithic? What if you split a Gnutella client into two parts -- a "server" which does the communication with other Gnutella nodes, and a "client" which is the GUI front-end? In that case, you'd have a system which was pretty much exactly analogous to UUCP and DNS. Especially if you could run the Gnutella client on a different machine from the Gnutella server.


            There's not a lot of difference - to my mind it's to do with usage & populations. In the case of DNS, the servers are peers of each other, but there is a secondary population of "pure" clients, i.e. desktop machines which never act as servers. From their point of view, DNS is purely client server. With Gnutella however, as it stands, every client is also a server - there's only one (heterogeneous) population. If you split the binaries as you suggest, you would create the potential for people to run either only the server or only the client, and thus, parts of the network would become client server.


            What does this mean? Really that there is very little technical difference (if any) - it's more that the "p2p" apps encourage/mandate that all users are equal (i.e. peers), pure clients or servers do not exist. Maybe we could think of it as a grey scale, with pure p2p at one end and pure client server at the other. Most protocols are somewhere along the line, very few are at either end!

            • Personally, I think the term P2P is best thought of in socio-political terms, rather than technological terms. It refers to a program (or a system of programs) designed to run on large numbers of PCs, doing things traditionally only done by servers. It empowers the Common Man -- letting groups of like-minded Common Men (ok, People) do things which traditionally required a powerful, expensive, legally-constrained central server.

              On that basis, DNS and UUCP clearly fail. So do FTP, telnet, and NetMeeting. Gnutella and FreeNet are clearly in. SETI@Home and Napster are not clearly in (they require that central server, even though the power comes from millions of the Common Man).
              • The real thing going on here I think is that these new networks (Gnutella, Freenet) are all self-discovering. DNS requires you to build a tree of servers that all know about each other; you can't just start a dns server without telling a root server about yourself (this has to be done by hand). Unlike other protocols, gnutella and kin keep track of all the servers that are running.

                I guess the real problem with all these servers is that they are trying to do the impossible, that is, they want to search the entire internet.
                • Two thoughts:

                  Gnutella isn't truly self-discovering. You need the address of at least one other Gnutella node to get hooked in. Perhaps there is some novelty in terms of the self-balancing mechanics, but I wouldn't be surprised to find even that isn't novel.

                  And if self-discovery is the key ingredient, then that definitely eliminates Napster and SETI@Home.
      • I'd define P2P (and I'm sure that someone will point out some authoritative formal definition somewhere) as when the same binary is used to both initiate and accept connections. FTP isn't P2P because it's got a client (ftp) and a server (ftpd). Same for DNS and telnet. NetMeeting is because it's one binary that plays both roles. I, however, think that while the paper is interesting (this is the first one of these I've bothered to browse), I am astonished at its poor grammar. Is English the first language of any of these students?

    • it is also possible to use FTP server to server (ftp from a to b, transfer files from b to c) thanks to the overly complicated protocol and control channel. Maybe that's what he was talking about.

      i'm with you on the rest though..
    • The FTP protocol is p2p. It's the way the software's implemented that makes it client-server.
  • Some people may be right that his pictures aren't that pretty, or some statements are incorrect, or there may be elements we've seen before, but hey, aren't we forgetting something? His Master's Thesis got posted on Slashdot!!! I mean, how great is that. When and if I ever get to my master's, I'd sure love that kind of recognition. I think he at least gets credit for reaching such a large audience, and for covering a topic we all know and can discuss.
  • A few fundamental questions:

    My primary research interest is scalability issues in peer-to-peer computing networks. Although P2P computing has existed for some time as a basis for network applications such as FTP, Telnet, instant messaging, ICQ, and Microsoft's MSN Messenger Service and NetMeeting, recently it has managed to capture a lot of attention.

    Unless I am much mistaken (and as has also been noted in previous posts) all the above mentioned are still CLIENT/SERVER models. Except maybe ICQ and MSN Messenger (and Yahoo messenger also, btw), that now auth with the server, and then communicate peer to peer.

    Indeed, the sudden emergence of new applications like SETI@Home, Groove, Napster, mobile communications, and Gnutella is threatening to replace the traditional client-server architecture of the web and bring rise to a new era in personal computing.

    I thought he just said that FTP, Telnet, etc. were peer to peer? Also, is SETI@Home a P2P app or a client-server model (which is a lot more likely, as there is a fixed (set of) entity(ies) that hands out the data to be computed)?

    My recent work has focused on Gnutella as a model of a purely distributed computing system. Gnutella allows users to share information by directly connecting to each other forming a high-level network.

    High Level Network? What exactly is that supposed to be?

    One of the biggest problems in analyzing performance of distributed computing networks such as Gnutella as a function of size, is that even simple protocols result in complex network interactions.

    Huh? How so, can someone please care to explain? Agreed, there are a few more setup/terminate requests going around, but that's about it. (I've not analysed the actual packets of either gnutella-like networks or IM networks, but I'm guessing here)

    In order to gain a deeper understanding of the nature of those interactions, an accurate model of the system is needed. A first step toward such a model is understanding the topology of the network. To discover the topology of the Gnutella network, I have developed a distributed computing system using Java RMI. This program allows instances of Gnutella's topology to be obtained in constant time, an extremely important feature when studying a highly dynamic network such as Gnutella.

    Am I missing something here? I would think that using Java RMI to distribute code to a remote machine for execution would be required here. How can a program sitting at one place determine in constant time the instance of topology of the network? Looks like he used something similar to traceroute. In that case, however, how can the topology be retrieved in constant time?

    Just wondering .....
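One way to picture a topology crawl, offered purely as a guess and not as the thesis author's method: start from a seed node, ask every newly found node for its neighbour list, and repeat in rounds. In the sketch below `crawl`, `ask_neighbours`, and the `NETWORK` table are hypothetical stand-ins; in a real crawler `ask_neighbours` would be a network request, and all requests within one round could be issued concurrently by separate workers.

```python
def crawl(seed, ask_neighbours):
    """Discover a topology in rounds; returns (known_nodes, edges, rounds_needed)."""
    edges = set()
    known = {seed}
    frontier = [seed]
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier = []
        # A distributed crawler could contact every node of this round in parallel.
        for node in frontier:
            for peer in ask_neighbours(node):
                edges.add(frozenset((node, peer)))  # undirected link, stored once
                if peer not in known:
                    known.add(peer)
                    next_frontier.append(peer)
        frontier = next_frontier
    return known, edges, rounds

# Hypothetical in-memory stand-in for "ask this host who it is connected to".
NETWORK = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}

if __name__ == "__main__":
    nodes, links, rounds = crawl("a", NETWORK.__getitem__)
    print(len(nodes), "nodes,", len(links), "links,", rounds, "rounds")
```

With parallel workers, the number of rounds depends on how many hops the farthest node is from the seed rather than on the total number of nodes. Whether that is what the thesis means by "constant time" is not something this sketch can settle.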
