MapReduce Goes Commercial, Integrated With SQL

Posted by kdawson
from the patterns-in-the-data dept.
CurtMonash writes "MapReduce sits at the heart of Google's data processing — and Yahoo's, Facebook's and LinkedIn's as well. But it's been highly controversial, due to an apparent conflict with standard data warehousing common sense. Now two data warehouse DBMS vendors, Greenplum and Aster Data, have announced the integration of MapReduce into their SQL database managers. I think MapReduce could give a major boost to high-end analytics, specifically to applications in three areas: 1) Text tokenization, indexing, and search; 2) Creation of other kinds of data structures (e.g., graphs); and 3) Data mining and machine learning. (Data transformation may belong on that list as well.) All these areas could yield better results if there were better performance, and MapReduce offers the possibility of major processing speed-ups."

  • by MarkWatson (189759) on Tuesday August 26, 2008 @04:02PM (#24756441) Homepage

    Data warehousing (here I mean databases stored in column order for faster queries, etc.) may get a lift from using map reduce over server clusters. This would get away from using relational databases for massive data stores for problems where you need to sweep through a lot of data, collecting specific results.

    I think it is interesting, useful, and cool that Yahoo is supporting the open source Nutch system, which implements map reduce APIs for a few languages. That makes it easier to experiment with map reduce on a budget.

  • First they attack it (Score:4, Interesting)

    by Intron (870560) on Tuesday August 26, 2008 @04:10PM (#24756545)
  • What a silly name... (Score:1, Interesting)

    by Anonymous Coward on Tuesday August 26, 2008 @04:15PM (#24756583)

    In functional programming map and reduce is very very old knowledge (and, yup, functional programming has its use and, yes, there are some very good and very successful programs written using functional languages).

    What's next? A product called DepthFirstSearch (notice the uber broken camel case for a product name) that has nothing to do with the depth-first search algorithm?

    Google? Allo?
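
For readers who have not met the functional-programming primitives this comment refers to, here is a minimal sketch using Python's built-in `map` and `functools.reduce`. The function name `sum_of_squares` and the sample input are made up for illustration; this shows the classic single-machine primitives, not Google's distributed system.

```python
from functools import reduce


def sum_of_squares(numbers):
    """Illustrative helper (hypothetical name): map, then reduce."""
    # map: apply the same function to every element independently
    squared = map(lambda x: x * x, numbers)
    # reduce: fold the mapped values into one result, starting from 0
    return reduce(lambda acc, x: acc + x, squared, 0)


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4]))
```

Because each element is mapped independently, the map step is the part that could be spread across many machines, which is the idea the MapReduce name borrows.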

  • by roman_mir (125474) on Tuesday August 26, 2008 @04:25PM (#24756705) Homepage Journal

    Except that relational databases are not just indexed objects copied across a large network of cheap PCs. What's good for Google may not be suitable for other database users, who actually care about the ACID properties of transactions and do not necessarily have the infrastructure to run highly parallel select queries.

  • Got what right? (Score:4, Interesting)

    by argent (18001) <peter@NOsPam.slashdot.2006.taronga.com> on Tuesday August 26, 2008 @04:59PM (#24757053) Homepage Journal

    I don't think you can credit Bjarne with "compiled code is faster than interpreted code" (or the 21st century version: "compilers can perform better optimizations than JIT translators").

    C++ happens to be the most popular fully compiled language, having edged Fortran out of that position some time near the end of the last century.

    Back in the early '80s, when he was coming up with C++, the big Fortran savants were saying stuff like "Fortran is bigger than ever. There are more than X million Fortran programmers. Everywhere I look there has been an uprising... a lot of teaching was going to Pascal, but more are teaching Fortran again. There has been a backlash."

    ----

    And that's not the only thing C++ has in common with Fortran, either.

  • by samkass (174571) on Tuesday August 26, 2008 @05:40PM (#24757425) Homepage Journal

    If Java (or Python etc. for that matter) were fast enough why did Google choose C++ to build their insanely fast search engine?

    Because their developers knew it better? Because it had better 64-bit support when they started it? Because full GC's weren't compatible with their use case and IBM's parallel GC VM hadn't been released yet? Because they could get and modify all the source to all the libraries?

    I don't know the answer, but there are a lot of possibilities besides speed. You're jumping to an awfully big conclusion there, Mr. Coward.

  • Re:Got what right? (Score:3, Interesting)

    by johanatan (1159309) on Tuesday August 26, 2008 @06:08PM (#24757711)

    (or the 21st century version: "compilers can perform better optimizations than JIT translators").

    Actually, JITters can do some optimizations that compilers can't--by splitting the compilation into a frontend and a backend. The front end is essentially just a parser, and the later the back-end compile happens, the more opportunities for optimizations actually open up (including such things as utilizing specific instruction sets for given architectures and fine tuning the compile based on run time statistics).

    See the LLVM for more info: http://llvm.org/ [llvm.org]

    (or .NET for that matter--but we're anti-MS around here. :-)

  • by Bazouel (105242) on Tuesday August 26, 2008 @07:19PM (#24758441)

    From a comment made about the article:

    You [the article's authors] seem to be under the impression that MapReduce is a database. It's merely a mechanism for using lots of machines to process very large data sets. You seem to be arguing that MapReduce would be better (for some value of better) if it were a data warehouse product along the lines of TeraData. Unfortunately the resulting tool would be less effective as a general purpose mechanism for processing very large data sets.
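
To make the "mechanism, not a database" point concrete, here is a rough single-process sketch of the MapReduce programming model applied to word counting. All names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made up for this illustration; a real deployment would run the map and reduce calls on many machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (word, 1) pair per token; each document is handled independently."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of doc_id -> text."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = {"a": "the cat", "b": "The dog the"}
    print(word_count(docs))
```

Nothing here stores or indexes data the way a DBMS does. The model only says how to split work into independent map calls and per-key reduce calls, which is why it can sit on top of flat files or, as the vendors in the story propose, inside a SQL database.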
