Six Degrees of Wikipedia

Posted by kdawson
from the finding-the-center dept.
An anonymous reader notes that someone has applied the game Six Degrees of Kevin Bacon to the articles in Wikipedia. Instead of the relation being "in the same film," he used "is linked to by." From the blog post: "We'll call the 'Kevin Bacon number' from one article to another the 'distance' between them. It's then possible to work out the 'closeness' of an article in Wikipedia as its average distance to any other article. I wanted to find the centre of Wikipedia, that is, the article that is closest to all other articles (has minimum [distance])."

  • by Palmyst (1065142) on Tuesday May 27, 2008 @05:54PM (#23562913)
    Ignoring obvious stuff like the main page, index, etc., is it not possible that there could be two articles that are not in the same transitive closure at all?
  • Where All... (Score:4, Interesting)

    by TheLazySci-FiAuthor (1089561) <thelazyscifiauthor@gmail.com> on Tuesday May 27, 2008 @05:54PM (#23562919) Homepage Journal
    It's sometimes eerie to think of an idea and then see that someone has done it over the weekend and posted it on slashdot.

    Last Friday at work I was researching different chemicals on Wikipedia (a favorite pastime of mine) and thought it would be pretty neat if there were a way to find how related two articles were - or to have some way to query the links between two articles to find similarities.

    What I really wanted was a very simple query. My SQL is very rusty, so a plain English version might be, 'show links where link exists in article_a and article_b'

    Is there a way to execute SQL queries on Wikipedia without having to actually download the entire database? I asked Google, but was presented with the SQL page on Wikipedia....
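
The plain English query above ("show links where link exists in article_a and article_b") can be sketched as an SQL INTERSECT. This is only an illustration: the `links(from_title, to_title)` table, the `common_links` helper, and the sample rows are all hypothetical, made up for the example, and do not reflect Wikipedia's real database schema or real article links.

```python
import sqlite3

# Hypothetical schema: one row per link, (from_title, to_title).
# The rows below are made-up toy data, not real Wikipedia links.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (from_title TEXT, to_title TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        ("Benzene", "Carbon"),
        ("Benzene", "Hydrogen"),
        ("Benzene", "Aromaticity"),
        ("Toluene", "Carbon"),
        ("Toluene", "Hydrogen"),
        ("Toluene", "Methyl group"),
    ],
)


def common_links(conn, article_a, article_b):
    """Return the titles that both articles link to, sorted by title."""
    rows = conn.execute(
        """
        SELECT to_title FROM links WHERE from_title = ?
        INTERSECT
        SELECT to_title FROM links WHERE from_title = ?
        ORDER BY to_title
        """,
        (article_a, article_b),
    )
    return [title for (title,) in rows]


shared = common_links(conn, "Benzene", "Toluene")
print(shared)
```

Running this against the real data would still need a copy of the link table from somewhere; the sketch only shows the shape of the query.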

  • Interestingly (Score:1, Interesting)

    by Anonymous Coward on Tuesday May 27, 2008 @05:58PM (#23562963)
    Slashdot's favorite Star Wars Prequel actress Natalie Portman "... is among a very small number of professional actors with a defined Erdős–Bacon number. [wikipedia.org]"

    Math AND movies. Mmmm ...
  • by Intron (870560) on Tuesday May 27, 2008 @05:59PM (#23562971)
    In theory. I haven't found two articles with a separation greater than 4, tho.

    Orca
    Argentina
    Saxophone
    Oboe
    3 clicks needed
  • Link distance (Score:5, Interesting)

    by ninjapiratemonkey (968710) on Tuesday May 27, 2008 @06:01PM (#23563021)
    The distance going from Article A to Article B is not necessarily the same as from Article B to Article A. For example, the Slashdot [wikipedia.org] page links to the HTTP [wikipedia.org] page, but not vice versa. It would be interesting to know if he took that into consideration when counting links, or whether he would have counted it as one in either direction.
  • by certron (57841) on Tuesday May 27, 2008 @06:04PM (#23563049)
    While the results are interesting (I won't spoil it by posting the answers, although I'm sure someone else has already cut to the chase and done it), the way they arrived at their results is more interesting. I'm sure this could be extended to some pretty maps of what links where, or deep/shallow topics in different fields. I had tried to find the number of links between Kevin Bacon and Nuclear Physics, but it didn't like my input. Instead, I discovered that it takes 3 clicks to go from Bacon to Physics, passing through Columbia University and BDSM on the way.

    Off-topic, but this is as good a place as any: There was a project hosted on some academic server a few years ago that linked song lyrics together. Clicking on the lyric 'creep' in the lyrics of the Radiohead song of the same title would bring up links to the TLC and Stone Temple Pilots songs of the same title, as well as any other song that used that word in their lyrics. Two songs that shared certain words would be linked by at most 2 clicks. I'm sure it has been buried in Google-cruft in the years since someone figured out that lyrics pages could be slurped up and turned into banner ad farms, but I had been thinking about how this could be re-implemented using a Wiki that would turn every word into a link and then link to a 'what links here' page. Does anyone know where this original project is or what happened to it? Any hints on re-implementing the behavior with a wiki?
  • "What is the use... (Score:3, Interesting)

    by jd (1658) <imipak@yaCOLAhoo.com minus caffeine> on Tuesday May 27, 2008 @06:17PM (#23563185) Homepage Journal
    ...in staying up all night arguing over whether there is or isn't a God, if the machine only gives you his bleedin' phone number in the morning!"

    You're not the only one with this problem, I fear.

  • by stedo (855834) on Tuesday May 27, 2008 @06:25PM (#23563295) Homepage
    Yes, there are. Read the rest of TFA for exactly how this is handled, but the gist is: closeness of an article = [total length of all shortest paths from this article]/[number of articles reachable from here]. There are a couple of disjoint sets, but they don't actually affect the results much as they're all tiny (disambig pages, etc)
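
The closeness formula quoted above (total length of all shortest paths from an article, divided by the number of articles reachable from it) can be sketched with a breadth-first search. This is a minimal sketch, not the blog author's code: the `toy` graph and the function names are invented for illustration, and treating "reachable" as excluding the article itself is an assumption.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: clicks needed from start to each reachable article."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


def closeness(graph, article):
    """Average shortest-path length from article to the articles it can reach."""
    dist = distances_from(graph, article)
    reachable = len(dist) - 1  # assumption: the article itself is not counted
    if reachable == 0:
        return None  # nothing reachable, so closeness is undefined
    return sum(dist.values()) / reachable


# Made-up directed link graph: each article maps to the articles it links to.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}

for name in toy:
    print(name, closeness(toy, name))
```

Note that dividing by the number of reachable articles means an article that reaches only a handful of neighbours can score a very low average, so a real ranking would presumably have to account for the tiny disjoint sets in some way.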
  • Well, that depends. (Score:3, Interesting)

    by jd (1658) <imipak@yaCOLAhoo.com minus caffeine> on Tuesday May 27, 2008 @06:32PM (#23563375) Homepage Journal
    The six degrees of separation is an easily misunderstood concept, so it is important that what people are looking for is also what they think they are looking for.

    The next thing to consider is that Wikipedia is produced by self-selecting contributors who are (necessarily) selective as to what facts (and what references) are to be used, making this a definitely non-random sample using incomplete data out of a population that may have unexpected biases.

    What matters, then, is that even under heavily sub-optimal conditions, we are getting the same results as we'd expect from near-perfect data. What also matters is that the incompleteness of the data is not significantly perturbing the distance between any two articles. You would expect it to, but it doesn't.

  • Re:Link distance (Score:4, Interesting)

    by jd (1658) <imipak@yaCOLAhoo.com minus caffeine> on Tuesday May 27, 2008 @06:34PM (#23563405) Homepage Journal
    In mathematical terms, this makes Wikipedia a non-simply-connected space. This has two consequences. Firstly, it makes the topology much harder to describe. Secondly, it means that topologists should have enough research material to write books and papers on the dynamics of Wikispace for years to come.
  • by mfarah (231411) <miguel.farah@cl> on Tuesday May 27, 2008 @06:36PM (#23563447) Homepage
    So far, my "personal best" has been 5 clicks:

    Shortest path from Pelagius of Asturias to Pham Nuwen

    Pelagius of Asturias
    Iberian Peninsula
    Africa
    Zheng He
    A Deepness in the Sky
    Pham Nuwen

    5 clicks needed


    I've found several others that require 5 links.

    I wish Stephen Dolan would have posted which article(s) has(have) the BIGGEST number as well...
  • What about language? (Score:5, Interesting)

    by kylehase (982334) on Tuesday May 27, 2008 @06:38PM (#23563473)
    The 6 degrees theory claims that everyone in the world is connected. That means you'd have to include every Wikipedia page in other languages as well, not just English.

    I tested some random Japanese Wikipages and the test failed. I then tried some very common English pages and those failed as well "Unknown article...". So I think their server might be having the /. effect.

    In any case it doesn't look like they included other languages in their setup.
  • by Gat0r30y (957941) on Tuesday May 27, 2008 @06:43PM (#23563551) Homepage Journal
    Those aren't linked from any other articles, but they link to other Wikipedia articles. Since it's a directional graph he's using (from what I gathered), it would appear to me that these would only be disjoint in a one-way sort of style, i.e. you can get from A to B in a finite number of steps but you cannot get from B to A. He appears to measure the minimum distance. However, I was able to get these:

    Shortest path from Agassaim to bananas
    No path found

    However, that is not always the case for "orphaned" pages:

    Shortest path from Aldous to Gould
    Aldous
    Aldous Huxley
    1949
    Western Pacific Railroad
    Gould
    4 clicks needed

    And since he is using a directional graph:

    Shortest path from Gould to Aldous
    No path found
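
The one-way behaviour described in this comment can be reproduced with a breadth-first search over a directed graph. This is a rough sketch, assuming a toy graph that contains only the links named in the Aldous to Gould path; the real articles have many more links, and `shortest_path` is an illustrative helper, not the tool's actual code.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the articles on a shortest directed path, or None if there is none."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            path = []
            while page is not None:  # walk back to the start via parent links
                path.append(page)
                page = parent[page]
            return path[::-1]
        for target in graph.get(page, ()):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return None


# Toy directed graph holding only the links on the path quoted in the comment.
links = {
    "Aldous": ["Aldous Huxley"],
    "Aldous Huxley": ["1949"],
    "1949": ["Western Pacific Railroad"],
    "Western Pacific Railroad": ["Gould"],
    "Gould": [],
}

forward = shortest_path(links, "Aldous", "Gould")
backward = shortest_path(links, "Gould", "Aldous")
print(forward, len(forward) - 1, "clicks needed")
print(backward if backward else "No path found")
```

Because every link is followed in one direction only, the forward search succeeds in 4 clicks while the reverse search finds nothing, which matches the two results quoted above.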
  • by Redacted (1101591) on Tuesday May 27, 2008 @07:04PM (#23563833)
    Shortest path from Nikon D300 to Ossa

    No path found

    What do I win?
  • by Anonymous Cowpat (788193) on Tuesday May 27, 2008 @07:11PM (#23563949) Journal
    well, according to "Complexity vs. stability in small-world networks." (Sitabhra Sinha. Journal of Physics. 2004):

    The number of links per node (bi-directional), k, must be >> ln(N), where N is the number of nodes, to avoid a fragmented network (assuming undirected link distribution).
    So: figure out the number of pages (nodes) in Wikipedia, take the natural log of it, and double the result because Wikipedia's links are one-way. You need much more than that many links per node to avoid fragmentation.

    So, you need much more than ~29 links per node to ensure no fragmentation.
    That leads me to conclude that there are well over 61.2m individual inter-article links on Wikipedia.
    I wonder if that's accurate.

    Also, I thought of that algorithm first and it's called HPSAUCE!
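
The back-of-the-envelope estimate in this comment can be checked with a few lines of arithmetic. The article count is an assumption: the comment never states N, and 2.11 million is simply the value implied by its own figures (61.2m links divided by about 29 links per node). The function name is made up for the example.

```python
import math


def min_links_estimate(num_articles):
    """Follow the comment's reasoning: ln(N) links per node, doubled for one-way links."""
    per_node_undirected = math.log(num_articles)
    per_node_directed = 2 * per_node_undirected
    total_links = per_node_directed * num_articles
    return per_node_undirected, per_node_directed, total_links


# Assumption: N is back-derived from the comment's numbers, not an official count.
N = 2_110_000
undirected, directed, total = min_links_estimate(N)
print(f"ln(N) is about {undirected:.1f}")
print(f"doubled for directed links: about {directed:.0f} per article")
print(f"lower bound on total links: about {total / 1e6:.1f} million")
```

The small gap between this total and the 61.2m in the comment comes from rounding the per-article figure down to 29 before multiplying.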
  • Meh. There's a path I found manually in about a minute. There's probably a shorter one, though:
    Ossa
    Motorcycle
    Toyota
    Honda
    Nikon
    Nikon D300
  • Re:I know the center (Score:3, Interesting)

    by Slashidiot (1179447) on Wednesday May 28, 2008 @04:04AM (#23567723) Journal
    I've been delighted to find out the shortest path from A to B. Two clicks, through ASCII. So it's not a straight line, as people try to make us believe...
  • by teapot7 (1265806) on Wednesday May 28, 2008 @06:38AM (#23568361)
    It exists for rock/pop/whatever music and cover versions too:

    The path from Rob Zombie to Dusty Springfield isn't that long:

    - Rob Zombie covered Blitzkrieg Bop by Ramones
    - Ramones covered Surf City by Jan & Dean
    - Jan & Dean covered Lightnin' Strikes by Lou Christie
    - Lou Christie covered If Wishes Could Be Kisses by Dusty Springfield

    http://covertrek.com/findLinksBetween.html [covertrek.com]

  • Re:I know the center (Score:3, Interesting)

    by Fumus (1258966) on Wednesday May 28, 2008 @08:41AM (#23569177)
    Funny that. Start to end has five clicks needed.

    Shortest path from start to end
    Start
    Start signal
    Code
    Computer printer
    Black
    End
    5 clicks needed