
MapReduce Goes Commercial, Integrated With SQL

Posted by kdawson
from the patterns-in-the-data dept.
CurtMonash writes "MapReduce sits at the heart of Google's data processing — and Yahoo's, Facebook's and LinkedIn's as well. But it's been highly controversial, due to an apparent conflict with standard data warehousing common sense. Now two data warehouse DBMS vendors, Greenplum and Aster Data, have announced the integration of MapReduce into their SQL database managers. I think MapReduce could give a major boost to high-end analytics, specifically to applications in three areas: 1) Text tokenization, indexing, and search; 2) Creation of other kinds of data structures (e.g., graphs); and 3) Data mining and machine learning. (Data transformation may belong on that list as well.) All these areas could yield better results if there were better performance, and MapReduce offers the possibility of major processing speed-ups."
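
For readers new to the model, the first application area in the summary (text tokenization and indexing) can be sketched as a tiny single-process map/shuffle/reduce pipeline. This is only an illustrative sketch, not how Google, Greenplum, or Aster Data implement it; the function names (map_phase, shuffle, reduce_phase, build_index) and the whitespace tokenizer are assumptions made up for the example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (token, doc_id) pair per whitespace-separated token."""
    for token in text.lower().split():
        yield token, doc_id


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(token, doc_ids):
    """Reduce step: collapse the doc ids for one token into a sorted, unique list."""
    return token, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in one process to build an inverted index.

    `docs` is a hypothetical mapping of document id to raw text.
    """
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {1: "map reduce", 2: "Reduce sql"}
    print(build_index(docs))
```

In a real deployment the map and reduce calls would run in parallel across many machines and the shuffle would move data over the network; the speed-ups mentioned in the summary come from that parallelism, which this sketch does not attempt.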

  • by Anonymous Coward on Tuesday August 26, 2008 @04:01PM (#24756425)

    People who don't know LISP are bound to reinvent it, badly.

  • by Anonymous Coward on Tuesday August 26, 2008 @04:05PM (#24756475)

    they go together like paint and peanut butter.

    Map/Reduce is better suited for read-only data mining situations.

  • by Anonymous Coward on Tuesday August 26, 2008 @04:42PM (#24756879)

    I am with Bjarne on this one.
    Bjarne Stroustrup, creator of the C++ programming language, claims that C++ is experiencing a revival and
    that there is a backlash against newer programming languages such as Java and C#. "C++ is bigger than ever.
    There are more than three million C++ programmers. Everywhere I look there has been an uprising
    - more and more projects are using C++. A lot of teaching was going to Java, but more are teaching C++ again.
    There has been a backlash," said Stroustrup.

    He continues... What would the world be like without Google?... Only C++ can allow you to create applications as powerful as MapReduce, which allows them to create fast searches.

    I totally agree. If Java (or Python etc., for that matter) were fast enough, why did Google choose C++ to build their insanely fast search engine? MapReduce rocks. No Java solution can even come close.
    I rest my case.

  • by johanatan (1159309) on Tuesday August 26, 2008 @06:02PM (#24757629)

    To most people, C++ is C. :-) Unfortunate but true.

  • by Anonymous Coward on Tuesday August 26, 2008 @06:07PM (#24757689)

    Probably someone who read the post and knows how wrong he is. Like you traverse the web every time you want to look up a search term or how a map is really the same as load balancing...

  • wrong argument? (Score:2, Insightful)

    by fragbait (209346) on Tuesday August 26, 2008 @08:09PM (#24759017) Homepage

    Though this post is my introduction to both MapReduce and the argument, it strikes me that the people arguing are arguing about the wrong problem.

    While MapReduce might be used against some structured data, it looks to be something for unstructured data and dynamically inventing structures in unstructured data. Additionally, you might want to keep that new structure around for a while. You might want to load it up with terabytes of data. At the same time, this data is less and less useful over time.

    Think about two of the key pieces of data Google has, web pages and user interaction and preference data. Web pages change over time. Web sites come and go. Some change a lot (news sites) and some change very little.

    There is a LOT of user interaction data. Clicks on pages, JavaScript that fires to DoubleClick, etc. Preferences change over time, too. Also, marketers want to dynamically react to the clicks and even the minute change of a preference that generates a buck.

    With such a large, changing, and time sensitive dataset, how could it be structured into something as relatively static as a schema? You would box yourself in by making it a schema and defining all the possible relationships.

    So, you take it up one abstraction level and make a "schema" for making relationships. Furthermore, there is a narrow window within which you even care about data and how it is structured. Granted, you want the webpage/site data to stick around for queries. But even that is marginally useful. Think about how many pages deep you go into a query on Google. I'm sure that will vary by person, but I'd also bet that in practice it is pretty small.

    Maybe everyone else gets that and I'm just late to the party. But my point is that the wrong argument is being made that this should follow all the RDBMS work that has come to date.

    Sure, I do agree that they shouldn't completely ignore all of the research, but to suggest it has to have a schema, indices, etc. just comes across as arguing all data problems belong in a traditional database.

    Or maybe I can take a different approach to this... my brain doesn't have an index. It does categorize data and it can categorize the same piece of data in multiple ways. As I learn new things, my brain creates new "indices" of sorts. A large portion of the data in my brain is time sensitive, or indexed over time. The older I get, the less the minutiae of life (what I had for dinner this evening) matter, and they lose their categorization. I don't have a schema for my brain; rather, I have multiple and I invent and dissolve them over time. I don't know what new one I'll need in the future. I can't know that, and without that, I can't make a schema for it. I also can't be constantly modifying the same schema in place. It is easier for me to invent a new one as I go and just abandon the old ones. Sure, new schemas will have parts of the old, but it is still a new schema with the old one still in place and referencing the same data that the new one will soon reference.

    -fragbait
