
Modelling P2P Networks 73
Nathan Kennedy writes: "Mihajlo A. Jovanovic did his Master's project at the University of Cincinnati on modelling P2P networks with Gnutella as a case study. You can view his project along with source code, stunningly pretty pictures, an applet and a paper on scalability."
Source? (Score:1)
Someone is looking at this page and... (Score:3, Funny)
Re:Someone is looking at this page and... (Score:2, Insightful)
B) That is a master's dissertation? Fuck me, I wish I could have written my dissertation as a couple of pages of HTML. Education must be a damned sight easier over that side of the pond.
Re:Someone is looking at this page and... (Score:1)
You must've missed this sentence on their page:
The following are selected samples of my research results.
Re:Someone is looking at this page and... (Score:1)
They are pretty pictures though - my dissertation didn't look nearly as pretty.
Re:Someone is looking at this page and... (Score:1)
Re:Someone is looking at this page and... (Score:1)
Gotta stick up for a fellow Bearcat...
Visuals (Score:2, Insightful)
This doesn't look any more interesting just because it's Gnutella.
Re:Visuals (Score:1)
Only on a fairly shallow level - internet and p2p representations both have lots of lines and squiggles.
The important point being made here is that the internet is very much a client/server mechanism - lots and lots of clients (browsers, ftp clients, etc) connecting to fewer servers, with clusters of servers in associated areas.
Peer to peer networks are by definition non-client/server, i.e. they are more distributed, so any clustering is worthy of further study, whether it is due to demographic correlation on a particular IP range, with a certain ISP, or to academic or organisational density.
Pretty pictures, all - but look below, see what they represent.
the visuals are NOT important! (Score:3, Interesting)
What is important: these networks are highly clustered, and the current Gnutella algorithms don't take advantage of that. They do a lot of duplicate work, leading to inefficient use of bandwidth and problems in scaling.
This conclusion is rather interesting for someone who is into distributed processing (although I'm not sure how "d'uh!" obvious it is to most), but even more interesting is that it's all based upon empirical evidence (using a mathematical basis, of course).
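A toy Python simulation can illustrate why clustering causes duplicate work. This is a simplified model, not code from the project: the adjacency-list dict and the flood function are hypothetical, and duplicates are detected with a per-node "seen" set, loosely mirroring the way Gnutella servents drop queries they have already seen.

```python
from collections import deque

def flood(adjacency, source, ttl):
    """Flood a query from source, hop by hop, until the TTL runs out.

    Returns (messages_sent, duplicates, nodes_reached). A copy arriving at a
    node that has already seen the query counts as a duplicate and is dropped.
    """
    seen = {source}
    messages = duplicates = 0
    # Queue entries: (node, neighbour it arrived from, hops still allowed).
    queue = deque([(source, None, ttl)])
    while queue:
        node, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for neighbour in adjacency[node]:
            if neighbour == came_from:
                continue  # never send the query straight back
            messages += 1
            if neighbour in seen:
                duplicates += 1
            else:
                seen.add(neighbour)
                queue.append((neighbour, node, hops_left - 1))
    return messages, duplicates, len(seen)

# Hypothetical four-node networks: fully connected (clustered) vs. a star.
clique = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(flood(clique, 0, ttl=2))  # (9, 6, 4)
print(flood(star, 0, ttl=2))    # (3, 0, 4)
```

Both toy networks reach the same four nodes, but the clustered one spends 9 messages to do it, 6 of them wasted duplicates, while the star needs only 3.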
So don't just look at the pretty pictures!
Re:Visuals (Score:2)
I don't know about you, but I find I just can't administer a network without some real stunningly pretty pictures of my topology.
Or at least a thinkgeek internet poster [thinkgeek.com].
nothing much (Score:1)
try saying that three times fast! on a more serious note, it looks like this may be next in line for the p2p executions going around...
Firewalls (Score:1, Troll)
-Evan
Interesting... (Score:4, Interesting)
Re:Interesting... (Score:1, Insightful)
Three possible reasons I can think of:
There may be more, but research is tough, and any shortcuts to getting data are usually welcomed with open arms...
Re:Interesting... (Score:2)
Gnutella Ultrapeers (Score:4, Informative)
One problem with LimeWire's initial implementation is that ultrapeers didn't respond to "crawler pings" with "leaf pongs". (We've since changed that.) So as pretty as these pictures are, they're probably not accurate. I would love to see updated results that accounted for ultrapeers.
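To see how a non-answering ultrapeer can skew a crawl, here is a toy model (an assumption, not LimeWire's or the project's crawler): the crawler learns a node's neighbours only if that node answers, so leaves attached to a silent ultrapeer never show up. The dict layout and node names are made up for illustration.

```python
from collections import deque

def crawl(neighbours, responds, seed):
    """Breadth-first topology crawl over a toy network.

    neighbours: true topology, node -> list of neighbours.
    responds: nodes that answer the crawler with their neighbour list.
    Returns the set of nodes the crawler learns about.
    """
    discovered = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if node not in responds:
            continue  # silent node: its other neighbours stay invisible
        for other in neighbours[node]:
            if other not in discovered:
                discovered.add(other)
                queue.append(other)
    return discovered

# Hypothetical network: two ultrapeers, each shielding some leaves.
net = {
    "U1": ["U2", "L1", "L2"],
    "U2": ["U1", "L3", "L4"],
    "L1": ["U1"], "L2": ["U1"], "L3": ["U2"], "L4": ["U2"],
}
everyone = set(net)
print(len(crawl(net, everyone, "U1")))           # 6
print(len(crawl(net, everyone - {"U2"}, "U1")))  # 4: L3 and L4 are missed
```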
The Gnutella network is evolving rapidly, and it would be great if academic papers considered these changes. The Gnutella Developer Forum [yahoo.com] (GDF) is the primary location for protocol development.
Christopher Rohrs
Sr. Software Engineer
LimeWire
Re:Interesting... (Score:5, Informative)
The Gnutella protocol and the FastTrack protocol are actually very similar since LimeWire added superpeers. In a couple of days (when Morpheus releases their Gnutella stuff), Gnutella will be put to the test. Theoretically it should scale at least as well as FastTrack.
Gnutella has a few advantages over FastTrack:
- Gnutella is an open, simple-to-implement protocol. FastTrack is a proprietary implementation of a proprietary protocol. Unauthorized implementations are being banned from the network.
- There are implementations of Gnutella in various languages for various platforms. The best are free (both as in speech and as in beer).
- Shutting down one Gnutella client doesn't affect other clients. So if BearShare is shut down, you just switch to LimeWire or something else and connect to the same Gnutella network.
- Unlike FastTrack, Gnutella has no dependency on a central server. It only needs the IP address of one other client in the network to connect. Typically clients contact a web server to get a list of such clients; however, this is optional.
- It's theoretically just as scalable, and potentially even more scalable (due to future innovations in clients).
- Most of the clients are stable and will survive more than a few searches (Morpheus consistently crashes on me).
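The single-address bootstrap can be sketched as follows. This is a hypothetical in-memory model: learning addresses from a contacted host's neighbour list stands in for Gnutella's ping/pong host discovery, and max_degree is an illustrative parameter, not a value taken from any client.

```python
def join(network, new_node, seed, max_degree=3):
    """Attach new_node to the overlay, starting from one known address (seed).

    network: dict mapping each node to the set of its neighbours.
    Returns the set of neighbours new_node ends up with.
    """
    network[new_node] = set()
    candidates = [seed]
    tried = set()
    while candidates and len(network[new_node]) < max_degree:
        host = candidates.pop(0)
        if host in tried or host == new_node:
            continue
        tried.add(host)
        network[new_node].add(host)
        network[host].add(new_node)
        # Ask the host who else it knows (stand-in for collecting pongs).
        candidates.extend(sorted(network[host] - tried - {new_node}))
    return network[new_node]

net = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
print(sorted(join(net, "N", seed="A")))  # ['A', 'B', 'C']
```

Only "A" is known up front; "B" and "C" are learned from it, and the loop stops once the hypothetical degree limit is reached.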
However, there are disadvantages:
- Gnutella doesn't specify how to handle queries. Consequently some clients are better at this than others, and you may get crappy search results. LimeWire has metasearch abilities in the works which could potentially be just as good as Morpheus (or better). However, until all (or most) clients on the network support this, it will be relatively useless.
- The Gnutella network is currently smaller than FastTrack (typically around 50000 hosts).
If you want to give Gnutella a try and don't want the spyware, I recommend that you don't download the Windows installer but the installer for "other" platforms instead. This will get you a nice Java-only version without all the crap (except for ads). Also be sure to run it using the latest JDK (1.4.0), as it is somewhat faster than previous versions.
Re:Interesting... (Score:2)
IMHO still the best client but feel free to disagree.
P2P network not only for filesharing (Score:1)
True true... (Score:2)
* Distributed storage like Freenet
* Instant Messaging
* Anything else you can think of that may need to be decentralized.
The advantage of a P2P network is that no single entity controls it and everybody shares the costs.
Re:True true... (Score:3, Interesting)
Freenet is more of a data transmission method than a true data store. Even Ian says so, when pressed on the data-loss issue. MNet, OceanStore, Farsite or CFS would all be better examples of actual distributed storage.
Re:now I know where all the funny headlines come f (Score:1)
Well... (Score:1)
no IE!! (Score:1)
Re:no IE!! (Score:1)
Re:no IE!! (Score:1)
Missing charts and captions (Score:3, Interesting)
Re:Missing charts and captions (Score:1)
And he directly comments on scalability and performance: he says it doesn't scale well, and that it does a lot of duplicate work and wastes time and bandwidth when the network is so highly clustered.
Re:Missing charts and captions (Score:1)
There I am.... (Score:1)
Hey, I think I saw my IP address in one of those pictures....
So one actually surfaced... (Score:1)
My thesis project [lsu.edu] involved building a visualization system for sensor fusion - how boring! Did have an applet [lsu.edu] though.
Sorry but I was instantly jaded. (Score:3, Interesting)
Right, FTP is client-server, Telnet is client-server, instant messaging is client-server, ICQ is client-server, MSN Messenger is client-server. About the only one he got right out of his list of examples is NetMeeting.
Now, his paper doesn't mention these examples, so why include them on the introduction page?
The paper is well written and from a first skimming seems accurate and interesting - but that first paragraph on the introduction does nothing to enhance the paper.
What does P2P really mean? (Score:1)
Not that there's anything wrong with client-server. Or P2P. But what do the terms really mean? Perhaps client-server is more usefully described as a system where a 3rd party (the "server") intermediates, while P2P requires only the 2 directly-involved machines. So, ICQ is client-server, to the extent that a message is sent to a 3rd party (an ICQ server), before being retrieved by the recipient. (Of course, ICQ communications can also be P2P).
Or is P2P really a statement about a web of peers acting in concert? This becomes a bit problematical -- is DNS P2P? Is UUCP? Perhaps what distinguishes P2P is that the participants need to be the only players (i.e. not themselves acting as servers)? But now we're really splitting hairs.
Perhaps P2P just isn't a new phenomenon at all. It seems that as soon as you try to really define it, it disappears on you!
Re:What does P2P really mean? (Score:2)
As for DNS, from my hazy understanding (please correct me if I'm wrong), the client makes a request to one server, which redirects it if required to a different server. So the client lookup part of the protocol is certainly client-server. The other parts (syncing changes, etc.) are outside my realm of knowledge, but they could be considered P2P, I guess.
I certainly agree that p2p is nothing new, just a new phenomenon in terms of popularity.
Re:What does P2P really mean? (Score:1)
But, you may argue, DNS and UUCP aren't truly P2P, because they themselves are servers. Is that the only thing which differentiates the new wave of P2P apps? The fact that they are monolithic? What if you split a Gnutella client into two parts -- a "server" which does the communication with other Gnutella nodes, and a "client" which is the GUI front-end? In that case, you'd have a system which was pretty much exactly analogous to UUCP and DNS. Especially if you could run the Gnutella client on a different machine from the Gnutella server.
So how is Gnutella different, exactly? What if you made a combined UUCP client-and-server? You use the same program to both connect other UUCP nodes (downloading newsgroups or whatever), and to read the news. Would _that_ be a full-fledged P2P app? If so, the definition for P2P seems pretty lame.
Is the P2P innovation merely the integration of a traditional server with a traditional client?
Re:What does P2P really mean? (Score:1)
DNS is a complicated example because there are _many_ things going on there. You've got the resolver code on each workstation and its interaction with a recursive nameserver. You've got a recursive nameserver's interaction with authoritative nameservers. And you've got the interaction between master and slave nameservers (sharing zone data).
Frankly, I think that's all a red herring. I would modify my earlier definition of P2P as containing both client and server functions in the same binary (and yes, that is an important piece of the puzzle: if you separate them, you can run them exclusively on different machines) in the following way: any given data could be both requested and served from the same node. That is to say, any Gnutella node could source the same file.
This is different from DNS master/slave in that although two nodes could play either role, the specific master/slave role is not arbitrary for a specific zone(data).
Re:What does P2P really mean? (Score:2)
But, you may argue, DNS and UUCP aren't truly P2P, because they themselves are servers. Is that the only thing which differentiates the new wave of P2P apps? The fact that they are monolithic? What if you split a Gnutella client into two parts -- a "server" which does the communication with other Gnutella nodes, and a "client" which is the GUI front-end? In that case, you'd have a system which was pretty much exactly analogous to UUCP and DNS. Especially if you could run the Gnutella client on a different machine from the Gnutella server.
There's not a lot of difference; to my mind it's to do with usage and populations. In the case of DNS, the servers are peers of each other, but there is a secondary population of "pure" clients, i.e. desktop machines which never act as servers. From their point of view, DNS is purely client-server. With Gnutella, however, as it stands, every client is also a server, so there's only one (heterogeneous) population. If you split the binaries as you suggest, you would create the potential for people to run only the server or only the client, and thus parts of the network would become client-server.
What does this mean? Really, that there is very little technical difference (if any); it's more that the "p2p" apps encourage or mandate that all users are equal (i.e. peers), and pure clients or servers do not exist. Maybe we could think of it as a grey scale, with pure P2P at one end and pure client-server at the other. Most protocols are somewhere along the line; very few are at either end!
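The grey-scale idea can be made concrete with a crude, hypothetical measure: the fraction of participating nodes seen acting as both requester and server. This is just one way to score it, not an established definition, and the transfer logs below are invented for illustration.

```python
def peer_score(transfers):
    """Score a system from 0.0 (pure client-server) to 1.0 (pure P2P).

    transfers: iterable of (requester, server) pairs observed on the network.
    The score is the share of participating nodes that play both roles.
    """
    requesters, servers = set(), set()
    for requester, server in transfers:
        requesters.add(requester)
        servers.add(server)
    nodes = requesters | servers
    if not nodes:
        return 0.0
    return len(requesters & servers) / len(nodes)

web = [("pc1", "www"), ("pc2", "www"), ("pc3", "www")]
dns_like = [("pc1", "resolver"), ("pc2", "resolver"), ("resolver", "auth")]
gnutella_like = [("a", "b"), ("b", "c"), ("c", "a")]
print(peer_score(web), peer_score(dns_like), peer_score(gnutella_like))
# 0.0 0.25 1.0
```

The loosely DNS-shaped log lands in the middle because only the resolver plays both roles, which matches the "somewhere along the line" point above.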
Re:What does P2P really mean? (Score:1)
On that basis, DNS and UUCP clearly fail. So do FTP, telnet, and NetMeeting. Gnutella and FreeNet are clearly in. SETI@Home and Napster are not clearly in (they require that central server, even though the power comes from millions of the Common Man).
Re:What does P2P really mean? (Score:2)
I guess the real problem with all these servers is that they are trying to do the impossible, that is, search the entire internet.
Re:What does P2P really mean? (Score:1)
Gnutella isn't truly self-discovering. You need the address of at least one other Gnutella node to get hooked in. Perhaps there is some novelty in terms of the self-balancing mechanics, but I wouldn't be surprised to find even that isn't novel.
And if self-discovery is the key ingredient, then that definitely eliminates Napster and SETI@Home.
Re:What does P2P really mean? (Score:1)
well... (Score:1)
it is also possible to use FTP server-to-server (FTP from a to b, transfer files from b to c), thanks to the overly complicated protocol and control channel. Maybe that's what he was talking about.
i'm with you on the rest though..
Re:Sorry but I was instantly jaded. (Score:2)
Master's (Score:1)
One question -- HOW? (Score:1)
My primary research interest is scalability issues in peer-to-peer computing networks. Although P2P computing has existed for some time as a basis for network applications such as FTP, Telnet, instant messaging, ICQ, and Microsoft's MSN Messenger Service and NetMeeting, recently it has managed to capture a lot of attention.
Unless I am much mistaken (as has also been noted in previous posts), all of the above are still CLIENT/SERVER models, except maybe ICQ and MSN Messenger (and Yahoo Messenger too, btw), which now authenticate with the server and then communicate peer to peer.
Indeed, the sudden emergence of new applications like SETI@Home, Groove, Napster, mobile communications, and Gnutella is threatening to replace the traditional client-server architecture of the web and bring rise to a new era in personal computing.
I thought he just said that FTP, Telnet, etc. were peer to peer? Also, is SETI@Home a P2P app or a client-server model? The latter seems a lot more likely, as there is a fixed (set of) entity(ies) that hands out the data to be computed.
My recent work has focused on Gnutella as a model of a purely distributed computing system. Gnutella allows users to share information by directly connecting to each other forming a high-level network. High Level Network? What exactly is that supposed to be?
One of the biggest problems in analyzing performance of distributed computing networks such as Gnutella as a function of size, is that even simple protocols result in complex network interactions.
Huh? How so? Can someone please explain? Agreed, there are a few more setup/terminate requests going around, but that's about it. (I haven't analysed the actual packets of either Gnutella-like networks or IM networks, so I'm guessing here.)
In order to gain a deeper understanding of the nature of those interactions, an accurate model of the system is needed. A first step toward such a model is understanding the topology of the network. To discover the topology of the Gnutella network, I have developed a distributed computing system using Java RMI. This program allows instances of Gnutella's topology to be obtained in constant time, an extremely important feature when studying a highly dynamic network such as Gnutella.
Am I missing something here? I would think that using Java RMI to distribute code to a remote machine for execution would be required here. How can a program sitting in one place determine an instance of the network's topology in constant time? It looks like he used something similar to traceroute. In that case, however, how can the topology be retrieved in constant time?
Just wondering .....
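One possible reading of "constant time" (pure speculation; the page doesn't explain it) is a crawl whose queries run in parallel, so elapsed time tracks the number of rounds rather than the number of nodes. A toy level-by-level sketch, with made-up topologies and no RMI involved:

```python
def crawl_rounds(neighbours, seed):
    """Level-synchronous crawl: all frontier nodes are queried in the same round.

    Returns (nodes_found, rounds). If each round's queries run in parallel,
    elapsed time grows with the rounds (about the depth from the seed), not
    with the total node count.
    """
    found = {seed}
    frontier = [seed]
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier = []
        for node in frontier:  # conceptually issued all at once
            for other in neighbours[node]:
                if other not in found:
                    found.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return found, rounds

# Ten nodes each: a shallow star and a deep chain.
star = {0: list(range(1, 10)), **{i: [0] for i in range(1, 10)}}
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(crawl_rounds(star, 0)[1], crawl_rounds(chain, 0)[1])  # 2 10
```

Even under this reading the time is bounded by the network's depth, so "constant" would only hold if that depth stays roughly fixed as the network grows.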