
Yahoo Releases Open Source Hadoop Distribution

Posted by timothy
from the spread-it-out-in-little-chunks dept.
ruphus13 writes "Yahoo has been a vociferous Apache Hadoop user and supporter for several years now, and uses it extensively within its Search technologies. Hadoop has been gaining popularity in the Cloud Computing space, with companies like the NYTimes converting 4TB and 11 million articles to PDFs in under 24 hours using Hadoop and EC2 in late 2007. Hadoop has been made available in Amazon's cloud and Yahoo has now released its own Hadoop version. From the article: 'At today's Hadoop Summit in Silicon Valley, Yahoo! announced the availability of the Yahoo! Distribution of Hadoop, a source-only version of Apache Hadoop that Yahoo! uses within its own search engine. [Hadoop] is an open source software framework that helps process very large data sets, and is widely used in large-scale data mining applications as well as in search tools at sites like Facebook and many others. For developers and users interested in Hadoop, it's worth noting that the Yahoo! Distribution of Hadoop has been widely tested and developed at Yahoo! for years now.'"

  • Re:Yahoo! and OSS (Score:3, Interesting)

    by linguizic (806996) on Wednesday June 10, 2009 @06:50PM (#28286687)
    THANK YOU!!!! I have found YDN enormously useful.

    It's also worth noting that Yahoo has made major contributions to PHP as Rasmus is a Yahoo himself.
  • Re:Yahoo! and OSS (Score:5, Interesting)

    by hairyfeet (841228) <bassbeast1968@NOsPAM.gmail.com> on Wednesday June 10, 2009 @08:50PM (#28287737) Journal

    And folks like to make fun of Yahoo search, but after switching from Google I just can't even think about going back. The more/concept tab (that is, the blue button below the search box) is just too nice to give up.

    Example: I just picked up "Blacksite: Area 51" for $5. I type in "blacksi" and there it is. From "Blacksite: Area 51" in the search box, under more/related I have cheats, patch, system reqs, PS3, Xbox 360, Midway Games West, multiplayer modes, squad-based shooters, release date by region, etc. Just from typing "blacksi" and picking Area 51 from the drop-down I have all those different avenues related to my search right there at the top where they are easy to get at. It really lets me home in on an area, and in some cases, like movies, it finds me interviews with the director, when I often don't even know who directed a particular flick.

    So those that haven't tried their search in a few years really ought to give it a whirl. The more/related concept tab at the top makes it so easy to drill down in a search. Plus Yahoo has an opt out [yahoo.com] for ad matching if you are concerned about privacy. I looked and I don't think Google even has an "opt out" short of using ABP. So give it a go, it's free and you might find the more/concepts button as useful as I do. And competition is always a good thing, right?

  • by Eric Smith (4379) <eric@brouha[ ]com ['ha.' in gap]> on Wednesday June 10, 2009 @08:57PM (#28287795) Homepage Journal
    Does the world need another Hadoop distribution? In a case like this, isn't a "distribution" just a fork going by a different name that has a more positive connotation? Is there some good reason they did it this way rather than just pushing their changes upstream to Apache? Did Apache not want them?

    I'll admit to knowing basically nothing about Hadoop, but if I saw the same article with "Hadoop" replaced by "GCC", "Postfix", or "OpenOffice", I wouldn't see it as being a good thing.

  • by linguizic (806996) on Wednesday June 10, 2009 @11:28PM (#28288903)
    Does the world need another Linux distribution? The folks at Ubuntu thought so, and they've made an indelible mark on Linux. Just like Yahoo! is doing with Hadoop [slashdot.org].
  • Re:Hadoop is awesome (Score:3, Interesting)

    by zerocool^ (112121) on Thursday June 11, 2009 @01:20PM (#28296385) Homepage Journal

    We also use it extensively at the Rackspace Email division. We generate about 200GB/day of logs from Postfix and Dovecot installs, and Hadoop with MapReduce lets us pull all sorts of metrics and diagnostic information in very short timeframes. It helps our customer-facing support reps, lets us give more demanding customers the statistics and metrics they want, and helps us with capacity planning and a bunch of other stuff.

    And it's designed to run on commodity hardware.

    http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data [highscalability.com]

    ~Wx
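
For readers unfamiliar with the pattern, the kind of log-metrics job described in the comment above can be sketched roughly as follows. This is only an in-memory illustration of the map, shuffle, and reduce steps in plain Python, not Hadoop itself and not Rackspace's actual code. The sample log lines, the `status=` field pattern, and every function name here are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Assumed pattern: mail-delivery log lines carry a "status=<word>" field.
STATUS_RE = re.compile(r"status=(\w+)")

def map_line(line):
    """Map step: emit (status, 1) for each line that has a delivery status."""
    match = STATUS_RE.search(line)
    if match:
        yield match.group(1), 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

# Invented sample data, for illustration only.
SAMPLE_LOG = [
    "Jun 11 13:20:01 mx1 postfix/smtp[2101]: A1B2C3: to=<a@example.com>, status=sent (250 OK)",
    "Jun 11 13:20:02 mx1 postfix/smtp[2101]: A1B2C4: to=<b@example.com>, status=bounced (550 no such user)",
    "Jun 11 13:20:03 mx2 postfix/smtp[2207]: A1B2C5: to=<c@example.com>, status=deferred (connection timed out)",
    "Jun 11 13:20:04 mx2 postfix/smtp[2207]: A1B2C6: to=<d@example.com>, status=sent (250 OK)",
    "Jun 11 13:20:05 mx2 dovecot: imap-login: Login: user=<d@example.com>",
]

if __name__ == "__main__":
    for status, count in sorted(run_job(SAMPLE_LOG).items()):
        print(status, count)
```

On a real cluster the map and reduce functions would run in parallel across many machines and the framework would handle the shuffle; the single-process version here only shows the shape of the computation.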
