Yahoo Launches a Hadoop Blog

Posted by jamie on Wed Nov 14, 2007 08:15 PM
from the what-part-don't-you-understand dept.
Interesting news on the massively-parallelizable software front. The Hadoop project at Yahoo, after showing great results over the past year, has launched a blog today. Hadoop is open-source, and enables grid computation through a Map-Reduce implementation and a large distributed file system. It's written in Java but any client language can interact with it. And I guess it plays well with EC2. If you have a petabyte to crunch you probably already know about it... if you might have one next year, check out their blog.
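The Map-Reduce model the story mentions can be illustrated with a toy, single-process word count in plain Java. This is only a sketch of the programming model, assuming all input fits in memory. It does not use Hadoop's own API, and the class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: split one input line into words and emit (word, 1) for each.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum every count emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group the emitted values by key (the "shuffle"),
    // then call reduce once per distinct key. TreeMap keeps keys sorted, loosely
    // mirroring the sort that precedes the reduce phase in a real framework.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        // prints {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(run(input));
    }
}
```

In a real cluster the map calls would run in parallel on chunks of the distributed file system and the grouping step would move data between machines; here everything happens in one process purely to show the shape of the computation.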
  • How does this compare to Google's map-reduce implementation, from a technical standpoint?

    • Don't you mean "Hadoop this compare to Google's implementation"?

      Hadoop!

    • by eklitzke (873155) on Wednesday November 14 2007, @09:45PM (#21358653)

      How does this compare to Google's map-reduce implementation, from a technical standpoint?

      Uh, it's freely available. Unless you work at Google the issue is moot.
          • Re: (Score:2, Funny)

            Oh, I understand now. We need more information about Google MapReduce so we can figure out how to keep Google from using it to destroy our cities.

            Mr. President, we must not allow... a MapReduce gap!
    • I believe no one at Google can comment on that. The rest of us can only guess.

      Things that I guess are different:
      1. Ours is open source and available to everyone. OK, I know this one. *smile*
      2. Our implementation is Java instead of C++.
      3. Our interfaces are object-based instead of raw bytes.
      4. Our reduces can output different types than the reduce inputs. Furthermore, our reduces can generate both keys and values instead of just values.
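Points 3 and 4 can be illustrated with a small generic sketch in plain Java. The interfaces below (`Emitter`, `Mapper`, `Reducer`) and the driver `run` are hypothetical simplifications written for this example, not Hadoop's actual classes. They only show how typed, object-based map and reduce functions let a reducer emit a key and a value whose types differ from the ones it received: here the reducer turns (word, counts) into (count, word).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TypedMapReduceSketch {

    // Hypothetical callback used by map and reduce to emit a key AND a value.
    interface Emitter<K, V> {
        void emit(K key, V value);
    }

    // Hypothetical typed mapper: input (K1, V1) -> intermediate (K2, V2).
    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Emitter<K2, V2> out);
    }

    // Hypothetical typed reducer: (K2, [V2, ...]) -> output (K3, V3).
    // K3 and V3 do not have to match K2 and V2.
    interface Reducer<K2, V2, K3, V3> {
        void reduce(K2 key, List<V2> values, Emitter<K3, V3> out);
    }

    // Map: (line offset, line text) -> (word, 1).
    static final Mapper<Long, String, String, Integer> WORD_MAPPER = (offset, line, out) -> {
        for (String w : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                out.emit(w, 1);
            }
        }
    };

    // Reduce: (word, [1, 1, ...]) -> (count, word). The emitted key is an
    // Integer and the emitted value a String, the reverse of the input types.
    static final Reducer<String, Integer, Integer, String> INVERT_REDUCER = (word, ones, out) -> {
        int sum = 0;
        for (int n : ones) {
            sum += n;
        }
        out.emit(sum, word);
    };

    // Minimal in-memory driver: map every record, group by key (sorted),
    // then reduce each group and collect the emitted (K3, V3) pairs in order.
    public static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> run(
            Map<K1, V1> input, Mapper<K1, V1, K2, V2> mapper, Reducer<K2, V2, K3, V3> reducer) {
        Map<K2, List<V2>> grouped = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input.entrySet()) {
            mapper.map(record.getKey(), record.getValue(),
                    (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : grouped.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue(), (k, v) -> output.add(Map.entry(k, v)));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<Long, String> lines = new TreeMap<>();
        lines.put(0L, "to be or not to be");
        lines.put(1L, "that is the question");
        for (Map.Entry<Integer, String> e : run(lines, WORD_MAPPER, INVERT_REDUCER)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because the reducer's output types are separate type parameters, the compiler checks that each stage's outputs line up with the next stage's inputs, which is the practical benefit of object-based interfaces over raw bytes.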
  • Attribution? (Score:4, Insightful)

    by IWannaBeAnAC (653701) on Wednesday November 14 2007, @08:30PM (#21357853)
    Full marks to Yahoo for supporting the project, but it looks to me like the Apache Lucene [apache.org] project deserves the credit here. Yahoo had nothing to do with the software development itself (although a few minutes of browsing wasn't enough to determine to what extent Yahoo employees are contributing to the code).
    • Yahoo provided free search services to the developers to find reference material and examples.
    • It helps to read.... (Score:5, Interesting)

      by keepper (24317) on Wednesday November 14 2007, @09:24PM (#21358419)
      To build the necessary software infrastructure, we could have gone off to develop our own technology, treating it as a competitive advantage, and charged ahead. But we've taken a slightly different approach. Realizing that a growing number of companies and organizations are likely to need similar capabilities, we got behind the work of Doug Cutting (creator of the open source Nutch and Lucene projects) and asked him to join Yahoo to help deploy and continue working on the [then new] open source Hadoop project.

      But of course, the goog fanboys never give Yahoo any credit. Bah... when was the last time Google supported something of this scale? (And Yahoo gives more than that: it runs hundreds of nodes of Nutch, which is the biggest install base, and it has also hired key employees, provided financial support, etc.)

      Disclaimer: I did work for Yahoo.
      • Great, I was looking for something like that, but I didn't come across it in my few minutes of looking, and this being Slashdot, I wanted to get in before the crapflood...
      • by chrisd (1457) * <(moc.anobid) (ta) (dsirhc)> on Thursday November 15 2007, @02:30AM (#21360743)
        Do you mean when was the last time we supported Hadoop? We put in the user-permission patches last summer. Or do you mean supporting open source? In that case, I'd imagine you'd want to see the code we've released on code.google.com [google.com] or look at the Summer of Code [google.com]: we conservatively generated over 4m lines of code last summer, our 3rd time running it.

        But Yahoo's support of Hadoop is pretty cool, for sure.

        Chris

        Disclaimer: I still work at Google :-)

        • Re: (Score:1, Offtopic)

          we conservatively generated over 4m lines of code last summer, our 3rd time running it.
          Yet there is still no Gtalk for Linux... I wish you guys over there would do something about that.

          The Summer of Code is amazing, by the way.
          • Gtalk is a standard Jabber service and pretty much any IM client that works on Linux can be used with it.
            • My comment was referring to the official client, not the numerous alternatives that do not support VoIP or many of the other features that the Windows client has.
              • The official client has features? Its distinct lack of ability to do anything other than send and receive messages was always one of its better points, I thought...
                • There's a lot of functionality that I miss, like having my text logs sent to my Gmail account, being able to leave VoIP messages in other people's Gmail accounts, and of course VoIP itself.

                  It's getting late in the game now for Gtalk. Linux already has a Skype client with VoIP and video, which I am using.
                  • Ah, fair enough, that makes sense. I don't use any of that stuff, so truth be told, I never really noticed it was there...
        • Re: (Score:2, Informative)

          Unfortunately, Google's eyes were bigger than their wallet (ok, really their interns' time budget *smile*) and the patch was not finished. A Yahoo engineer is currently working on getting file permissions in. I really appreciate that Google is supporting Hadoop and I love the educational materials [google.com] they have done for Hadoop. They just need to choose more realistic projects. *grin*

          Disclaimer: I'm a Hadoop committer and am paid by Yahoo to work on Hadoop full time.
        • Well, I don't think those changes have been "put in" yet; I think they are still in development (at least judging from HADOOP-1298). Also, don't forget generating curriculum for students and donating hardware to schools to learn about distributed computing (I took the UW's distributed computing class in the spring).

          Andrew

          PS: Hi Chris!
        • Submitting a patch, while VERY commendable and encouraged, isn't the same thing as supporting the WHOLE project.

          This would be akin to goog releasing part of their MapReduce or GFS implementation, or Bigtable, or some other technology related to a core part of their business. (Disclaimer again: Hadoop isn't a core part of Y! search, but it's being explored for many core uses and, at this point, is probably being used for some other search and distributed processing projects.)

          Yes Chris, everyone knows google do
    • Re:Attribution? (Score:4, Informative)

      by allenw (33234) on Thursday November 15 2007, @02:07AM (#21360619)
      At this point in time, most (but certainly not all!) of the core Hadoop code is being written by Yahoo! employees. This also includes a lot of work on using, testing and debugging Hadoop at pretty significant scales (Yahoo!'s clusters are currently the largest known installations).

      It's probably worth pointing out that Yahoo! also recently contributed Pig [apache.org], an SQL-like language for use with MapReduce and in particular Hadoop, to the Apache Incubator.
    • When I ran the numbers out of curiosity last month, 70% of the fixed JIRA issues for Hadoop (80% if you count only the framework and not contrib) were contributed by Yahoo engineers.
  • seriously, it sounds syncopated, like jazz
    and jazz and programming don't mix
  • I have a question.

    Who comes up with these stupid names?

    Monad -> Gonad

    Hadoop -> HadaPoop