Six Degrees of Wikipedia
An anonymous reader notes that someone has applied the game Six Degrees of Kevin Bacon to the articles in Wikipedia. Instead of the relation being "in the same film," he used "is linked to by." From the blog post: "We'll call the 'Kevin Bacon number' from one article to another the 'distance' between them. It's then possible to work out the 'closeness' of an article in Wikipedia as its average distance to any other article. I wanted to find the centre of Wikipedia, that is, the article that is closest to all other articles (has minimum [distance])."
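As a rough illustration of the "closeness" and "centre" definitions quoted above, here is a minimal Python sketch. It is not the blog author's code: the toy link graph, the function names, and the choice to average only over articles that can actually be reached are all assumptions made for the example.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: clicks from source to every reachable article."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def closeness(graph, source):
    """Average distance from source to the other articles it can reach.

    Assumption: unreachable articles are ignored; a page that reaches
    nothing gets infinity so it can never be picked as the centre.
    """
    dist = distances_from(graph, source)
    others = [d for page, d in dist.items() if page != source]
    return sum(others) / len(others) if others else float("inf")

def centre(graph):
    """Article with the minimum closeness value."""
    return min(graph, key=lambda page: closeness(graph, page))

# Hypothetical toy graph: article -> articles it links to.
toy_links = {
    "Hub": ["A", "B", "C"],
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
}

print(centre(toy_links))
```

On real data the treatment of unreachable articles matters a lot (a page with a single outgoing link to a dead end would look very "close" under this sketch), so that choice would need more care than it gets here.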
Why wouldn't there be disjoint partitions? (Score:4, Interesting)
Where All... (Score:4, Interesting)
Last Friday at work I was researching different chemicals on Wikipedia (a favorite pastime of mine) and thought it would be pretty neat if there were a way to find how related two articles are, or to query the links between two articles to find similarities.
What I really wanted was a very simple query. My SQL is very rusty, so a plain English version might be: 'show links where link exists in article_a and article_b'.
Is there a way to execute SQL queries on Wikipedia without having to actually download the entire database? I asked Google, but was presented with the SQL page on Wikipedia...
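For what it's worth, the plain English query above maps fairly directly onto an SQL INTERSECT. The sketch below runs it in Python against an in-memory SQLite database. The `links` table, its column names, and the sample rows are all made up for illustration; this is not Wikipedia's actual schema or data, and it does not answer the question of querying Wikipedia remotely.

```python
import sqlite3

# Hypothetical schema: one row per link, from one article title to another.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (from_title TEXT, to_title TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        # Illustrative rows only, not real link data.
        ("Ethanol", "Alcohol"),
        ("Ethanol", "Solvent"),
        ("Ethanol", "Fuel"),
        ("Methanol", "Alcohol"),
        ("Methanol", "Solvent"),
        ("Methanol", "Formaldehyde"),
    ],
)

def shared_links(conn, article_a, article_b):
    """Return titles that both article_a and article_b link to."""
    rows = conn.execute(
        """
        SELECT to_title FROM links WHERE from_title = ?
        INTERSECT
        SELECT to_title FROM links WHERE from_title = ?
        ORDER BY to_title
        """,
        (article_a, article_b),
    )
    return [title for (title,) in rows]

common = shared_links(conn, "Ethanol", "Methanol")
print(common)
```

The size of the overlap relative to each article's total link count could then serve as a crude "how related are these two" score.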
Interestingly (Score:1, Interesting)
Math AND movies. Mmmm
Re:Why wouldn't there be disjoint partitions? (Score:5, Interesting)
Orca
Argentina
Saxophone
Oboe
3 clicks needed
Link distance (Score:5, Interesting)
From Bacon to Physics, 3 clicks. (Score:3, Interesting)
Off-topic, but this is as good a place as any: There was a project hosted on some academic server a few years ago that linked song lyrics together. Clicking on the lyric 'creep' in the lyrics of the Radiohead song of the same title would bring up links to the TLC and Stone Temple Pilots songs of the same title, as well as any other song that used that word in their lyrics. Two songs that shared certain words would be linked by at most 2 clicks. I'm sure it has been buried in Google-cruft in the years since someone figured out that lyrics pages could be slurped up and turned into banner ad farms, but I had been thinking about how this could be re-implemented using a Wiki that would turn every word into a link and then link to a 'what links here' page. Does anyone know where this original project is or what happened to it? Any hints on re-implementing the behavior with a wiki?
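One possible starting point for re-implementing this, with or without a wiki, is a plain inverted index: map every word to the set of songs that contain it, which is effectively the 'what links here' page for that word. The Python sketch below assumes that design; the song titles and lyrics are placeholders, and the function names are invented for the example.

```python
import re
from collections import defaultdict

def build_index(songs):
    """Map each word to the set of song titles whose lyrics contain it."""
    index = defaultdict(set)
    for title, lyrics in songs.items():
        for word in re.findall(r"[a-z']+", lyrics.lower()):
            index[word].add(title)
    return index

def what_links_here(index, word):
    """Songs that use the given word, i.e. the word's 'what links here' page."""
    return sorted(index.get(word.lower(), ()))

# Placeholder data, not real lyrics.
songs = {
    "Song A": "placeholder words about rain",
    "Song B": "more placeholder rain",
    "Song C": "nothing shared here",
}

index = build_index(songs)
print(what_links_here(index, "rain"))
```

Clicking a word takes you to its index entry (one click) and from there to any other song using it (a second click), which matches the "at most 2 clicks" behaviour described above.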
"What is the use... (Score:3, Interesting)
You're not the only one with this problem, I fear.
Re:Why wouldn't there be disjoint partitions? (Score:2, Interesting)
Well, that depends. (Score:3, Interesting)
The next thing to consider is that Wikipedia is produced by self-selecting contributors who are (necessarily) selective as to what facts (and what references) are to be used, making this a definitely non-random sample using incomplete data out of a population that may have unexpected biases.
What matters, then, is that even under heavily sub-optimal conditions, we are getting the same results as we'd expect from near-perfect data. What also matters is that the incompleteness of the data is not significantly perturbing the distance between any two articles. You would expect it to, but it doesn't.
Re:Link distance (Score:4, Interesting)
Re:Why wouldn't there be disjoint partitions? (Score:5, Interesting)
Shortest path from Pelagius of Asturias to Pham Nuwen
Pelagius of Asturias
Iberian Peninsula
Africa
Zheng He
A Deepness in the Sky
Pham Nuwen
5 clicks needed
I've found several others that require 5 links.
I wish Stephen Dolan had also posted which article(s) have the BIGGEST number...
What about language? (Score:5, Interesting)
I tested some random Japanese Wikipages and the test failed. I then tried some very common English pages and those failed as well ("Unknown article..."). So I think their server might be having the
In any case it doesn't look like they included other languages in their setup.
Re:Why wouldn't there be disjoint partitions? (Score:3, Interesting)
Shortest path from Agassaim to bananas
No path found
However that is not always the case for "orphaned" pages -
Shortest path from Aldous to Gould
Aldous
Aldous Huxley
1949
Western Pacific Railroad
Gould
4 clicks needed
And since he is using a directional graph -
Shortest path from Gould to Aldous
No path found
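The asymmetry is easy to reproduce with a breadth-first search over a directed link graph. The Python sketch below is not the tool's actual code; it is a minimal version that contains only the four links reported in the Aldous to Gould result above, and treats Gould as having no outgoing links, which is an assumption made to mirror the "No path found" result.

```python
from collections import deque

def shortest_path(links, start, goal):
    """Breadth-first search over directed links; returns a list of pages or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk back through the parent pointers to rebuild the path.
            path = []
            while page is not None:
                path.append(page)
                page = parents[page]
            return path[::-1]
        for target in links.get(page, ()):
            if target not in parents:
                parents[target] = page
                queue.append(target)
    return None

# Only the links from the reported path; everything else is omitted.
links = {
    "Aldous": ["Aldous Huxley"],
    "Aldous Huxley": ["1949"],
    "1949": ["Western Pacific Railroad"],
    "Western Pacific Railroad": ["Gould"],
    "Gould": [],  # assumption: no outgoing links that lead back
}

forward = shortest_path(links, "Aldous", "Gould")
backward = shortest_path(links, "Gould", "Aldous")
print(forward, backward)
```

Because each link is stored in one direction only, the forward search finds a 4-click path while the reverse search runs out of pages to visit and returns None.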
Re:Why wouldn't there be disjoint partitions? (Score:2, Interesting)
No path found
What do I win?
Re:Why wouldn't there be disjoint partitions? (Score:3, Interesting)
The number of (bi-directional) links per node, k, must be >> ln(N), where N is the number of nodes, to avoid a fragmented network (assuming undirected link distribution).
So: figure out the number of pages (nodes) in Wikipedia, take the natural log of it, double that, and each page then needs much more than that many links to avoid fragmentation.
So, you need much more than ~29 links per node to ensure no fragmentation.
That leads me to conclude that there are well over 61.2m individual inter-article links on Wikipedia.
I wonder if that's accurate.
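The arithmetic can be checked with a few lines of Python. The article count is not stated in the comment; the roughly 2.11 million used below is an assumption back-derived from the quoted figures (61.2m divided by 29), and the doubling step simply follows the comment's own recipe.

```python
import math

def min_links(article_count):
    """Follow the comment's recipe: ln(N), doubled, times N articles."""
    threshold = math.log(article_count)   # ln(N)
    links_per_article = 2 * threshold     # the comment's doubling step
    total_links = links_per_article * article_count
    return threshold, links_per_article, total_links

# Assumed article count, inferred from 61.2m / 29; not given in the comment.
ASSUMED_ARTICLES = 2_110_000

threshold, per_article, total = min_links(ASSUMED_ARTICLES)
print(f"ln(N) = {threshold:.2f}")
print(f"links per article = {per_article:.1f}")
print(f"total links = {total / 1e6:.1f} million")
```

With that assumed N, ln(N) comes out a little above 14.5 and the doubled value a little above 29, so the ~29 figure is consistent; the total lands slightly above 61.2m because the comment rounds down to 29 before multiplying.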
Also, I thought of that algorithm first and it's called HPSAUCE!
Re:Why wouldn't there be disjoint partitions? (Score:3, Interesting)
Ossa
Motorcycle
Toyota
Honda
Nikon
Nikon D300
Re:I know the center (Score:3, Interesting)
Another type of six degrees of freedom... (Score:2, Interesting)
The path from Rob Zombie to Dusty Springfield isn't that long:
- Rob Zombie covered Blitzkrieg Bop by Ramones
- Ramones covered Surf City by Jan & Dean
- Jan & Dean covered Lightnin' Strikes by Lou Christie
- Lou Christie covered If Wishes Could Be Kisses by Dusty Springfield
http://covertrek.com/findLinksBetween.html [covertrek.com]
Re:I know the center (Score:3, Interesting)
Shortest path from start to end
Start
Start signal
Code
Computer printer
Black
End
5 clicks needed