Yahoo Releases Open Source Hadoop Distribution
ruphus13 writes "Yahoo has been a vociferous Apache Hadoop user and supporter for several years now, and uses it extensively within its Search technologies. Hadoop has been gaining popularity in the Cloud Computing space, with companies like the NYTimes converting 4TB and 11 million articles to PDFs in under 24 hours using Hadoop and EC2 in late 2007. Hadoop has been made available in Amazon's cloud and Yahoo has now released its own Hadoop version. From the article: 'At today's Hadoop Summit in Silicon Valley, Yahoo! announced the availability of the Yahoo! Distribution of Hadoop, a source-only version of Apache Hadoop that Yahoo! uses within its own search engine. [Hadoop] is an open source software framework that helps process very large data sets, and is widely used in large-scale data mining applications as well as in search tools at sites like Facebook and many others. For developers and users interested in Hadoop, it's worth noting that the Yahoo! Distribution of Hadoop has been widely tested and developed at Yahoo! for years now.'"
Re:Yahoo! and OSS (Score:3, Interesting)
It's also worth noting that Yahoo has made major contributions to PHP as Rasmus is a Yahoo himself.
Why is this a big deal? (Score:3, Interesting)
I'll admit to knowing basically nothing about Hadoop, but if I saw the same article with "Hadoop" replaced by "GCC", "Postfix", or "OpenOffice", I wouldn't see it as being a good thing.
Re:Hadoop is awesome (Score:3, Interesting)
We also use it extensively in Rackspace's Email division. We generate about 200GB/day of logs from Postfix and Dovecot installs, and Hadoop with MapReduce lets us pull all sorts of metrics and diagnostic information in very short timeframes. It helps our customer-facing support reps, lets us give more demanding customers the statistics and metrics they want, and helps with capacity planning and a bunch of other stuff.
And it's designed to run on commodity hardware.
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
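For anyone wondering what "pulling metrics out of mail logs with MapReduce" looks like, here is a rough sketch in plain Python. It is not our actual job: the log lines are made-up, simplified Postfix-style delivery lines, the names `map_line`, `reduce_counts`, and `run_job` are hypothetical, and `run_job` only imitates on one machine the sort-and-group step that Hadoop performs between map and reduce across a cluster.

```python
import re
from itertools import groupby

# Assumed, simplified Postfix-style delivery line; real logs vary by version and config.
STATUS_RE = re.compile(r"to=<[^>]*@([^>]+)>.*\bstatus=(\w+)")

# Made-up sample data for illustration only.
SAMPLE = [
    "Jun 10 12:00:01 mail1 postfix/smtp[101]: A1: to=<alice@example.com>, relay=mx.example.com[192.0.2.1]:25, status=sent (250 ok)",
    "Jun 10 12:00:02 mail1 postfix/smtp[102]: A2: to=<bob@example.com>, relay=mx.example.com[192.0.2.1]:25, status=sent (250 ok)",
    "Jun 10 12:00:03 mail1 postfix/smtp[103]: A3: to=<carol@example.org>, relay=none, status=deferred (connection timed out)",
    "Jun 10 12:00:04 mail1 dovecot: imap-login: Login: user=<alice>",
]

def map_line(line):
    """Map step: emit ((recipient domain, status), 1) for each delivery line."""
    m = STATUS_RE.search(line)
    if m:
        yield (m.group(1).lower(), m.group(2)), 1

def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)

def run_job(lines):
    """Local stand-in for the shuffle: sort map output, group by key, reduce."""
    pairs = sorted(kv for line in lines for kv in map_line(line))
    return dict(
        reduce_counts(key, (count for _, count in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    )

if __name__ == "__main__":
    for (domain, status), count in sorted(run_job(SAMPLE).items()):
        print(domain, status, count)
```

The map and reduce functions never share state, which is what lets a framework like Hadoop spread them over many cheap machines and many log files at once.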
~Wx