Researcher's Wikipedia Big Data Project Shows Globalization Rate
Nerval's Lobster writes "Wikipedia, which features nearly 4 million articles in English alone, is widely considered a godsend for high school students on a tight paper deadline. But for University of Illinois researcher Kalev Leetaru, Wikipedia's volumes of crowd-sourced articles are also an enormous dataset, one he mined for insights into the history of globalization. He made use of Wikipedia's 37GB of English-language data — in particular, the evolving connections between various locations across the globe over a period of years. 'I put every coordinate on a map with a date stamp,' Leetaru told The New York Times. 'It gave me a map of how the world is connected.' You can view the time lapse/data visualization on YouTube."
Not "big data" (Score:3, Insightful)
Come on, 37G isn't big data. You'd have a hard time arguing 37TB is big data.
Cool stuff though.
Re: (Score:1)
there are some common dumps, like http://wiki.dbpedia.org/Downloads37 [dbpedia.org]
the ending of that movie (Score:2)
looks exponential :)
Re: (Score:2, Interesting)
Just like stars. If you consult a starmap, it's much denser near Earth than farther away. So looking at a star catalogue, we'd be correct to surmise we're the center of the universe, since all stars cluster around us, right? Wrong.
Sampling bias. Starmaps cluster stars around us because the stars in our vicinity are better sampled than those farther away.
The movie looks exponential because the density of articles dealing with the present is higher than the density of articles dealing with events lon
Re: (Score:3, Insightful)
"looks exponential :)"
As much as I'd like to think that meant the world is rapidly connecting, it's much more likely due to the fact that Wikipedia has only been around for a decade or so, and people are more inclined to write about things that are happening now (or have happened recently) than about things that happened many years ago.
If Wikipedia had been available for the entirety of those 200 years, and had been consistently popular through that time and uniformly across the globe with no language bias, then the res
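A toy sketch of the recency-bias argument above, not anything taken from Leetaru's study. It assumes the world produces a constant number of "connections" per year, and that the chance of one being written up halves for every `half_life` years it lies in the past. The function names, the flat rate of 100, the 25-year half-life, and the end year are all made-up illustrative values.

```python
def observed_counts(true_rate, years, present, half_life):
    """Expected recorded events per year under an assumed coverage decay.

    Coverage is 1.0 for the present year and halves every `half_life`
    years into the past (a purely hypothetical model of editor interest).
    """
    counts = {}
    for year in years:
        age = present - year
        coverage = 0.5 ** (age / half_life)
        counts[year] = true_rate * coverage
    return counts


def corrected_counts(counts, present, half_life):
    """Undo the assumed coverage decay to estimate the underlying rate."""
    return {
        year: n / (0.5 ** ((present - year) / half_life))
        for year, n in counts.items()
    }


# Hypothetical inputs: a flat true rate sampled every 50 years over 200 years.
years = list(range(1812, 2013, 50))
obs = observed_counts(true_rate=100.0, years=years, present=2012, half_life=25.0)
fixed = corrected_counts(obs, present=2012, half_life=25.0)

for year in years:
    print(year, obs[year], fixed[year])
```

The observed column grows by a constant factor per step even though the underlying rate never changes, so an exponential-looking time lapse can't be told apart from exponential growth in coverage unless you know, or can estimate, the coverage curve.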
To paraphrase Slashdot... (Score:3, Insightful)
If you're using Wikipedia as a metric to measure anything, you're insane.
Study about perspectives not history (Score:2)
From reading the NYT article, I understand this is a study of the English version of Wikipedia. That alone should raise a red flag about the significance of the study beyond being a survey of the interests or obsessions of Wikipedia editors.
It's useful only as a survey of a clearly unrepresentative sample of the world population. It's clearly biased against those who can't write English, and those who can are themselves a much smaller subset of those who can claim some fluency in English.
It tells us less about history and more abou
"Sentiment" Analysis? (Score:2)
After reading the article (yeah, I know) and viewing the video, it seems like "negative" entries appear most often around periods of time when there's a lot of war. Interesting and obvious... but I'd like to know if periods of religio