
Database Bigwigs Lead Stealthy Open Source Startup 187

BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And he's not alone: he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise: a Linux-based system that handles queries 100 times faster than traditional relational database management systems."
This discussion has been archived. No new comments can be posted.

Comments Filter:
  • buzzword enabled (Score:4, Insightful)

    by hey ( 83763 ) on Wednesday February 14, 2007 @05:24PM (#18016620) Journal
    "grid-enabled, column-oriented relational database management system"
    What does that mean?
    If anything.
  • Re:Partners (Score:5, Insightful)

    by AKAImBatman ( 238306 ) * <akaimbatman@gmaYEATSil.com minus poet> on Wednesday February 14, 2007 @05:25PM (#18016644) Homepage Journal

    I was a little surprised by HP, since they have been trying to get the word out about their own data warehousing and BI stuff.

    It's called "hedging your bets". If the little company doesn't work out, no big deal. If it does, then HP is in a position to either benefit from contractual relations, acquire it, or squash it. Whichever happens to be their fancy.
  • by stoolpigeon ( 454276 ) * <bittercode@gmail> on Wednesday February 14, 2007 @05:32PM (#18016758) Homepage Journal
    Smaller in number, but I'm willing to bet much more profitable and growing rapidly. We've been looking at data warehousing options, and frankly most of them suck in one way or another. If someone can do it right, they can make a killing.
  • by georgewilliamherbert ( 211790 ) on Wednesday February 14, 2007 @05:33PM (#18016774)
    Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time.

    This is some new Network World definition of "Stealthy", apparently...
  • Best of luck (Score:5, Insightful)

    by 140Mandak262Jamuna ( 970587 ) on Wednesday February 14, 2007 @05:40PM (#18016836) Journal
    I don't want to rain on their parade. But typically, whenever people start with a spec like "100 times better than what they can do", they assume the incumbents will continue to perform at current levels while the newcomers take years to develop and mature their technology. In the real world, the traditional methods improve too, and unless the newcomers can maintain that 100x lead continually, the new technology flops.

    What happened to gallium arsenide replacing silicon? What happened to solid state memory completely replacing magnetic disks? The technology field is littered with such fiascos.

  • by georgewilliamherbert ( 211790 ) on Wednesday February 14, 2007 @05:47PM (#18016910)

    A column oriented relational database? I'd like some more details on how that works.

    Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).

    Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.

    Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.

    Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.

    Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.

    Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.

    Column-oriented storage also lets you compress the data on the fly efficiently: all the records in a column are the same data type (string, integer, date, whatever), and lists of the same data type compress well. They typically decompress far faster than you can pull them off disk, so you can just do it automatically for all the data and save both space and time...
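    The row-versus-column layout described above can be sketched in a few lines of Python. This is a toy illustration only; the table and field names are made up and have nothing to do with Vertica's actual storage format:

```python
# Same table stored two ways: row-oriented vs. column-oriented.
rows = [
    {"user_name": "alice", "user_id": 1, "fav_admin": "Phred"},
    {"user_name": "bob",   "user_id": 2, "fav_admin": "CmdrTaco"},
    {"user_name": "carol", "user_id": 3, "fav_admin": "Phred"},
]

# Column-oriented: one sequence per column, sharing positional indexes.
columns = {
    "user_name": ["alice", "bob", "carol"],
    "user_id":   [1, 2, 3],
    "fav_admin": ["Phred", "CmdrTaco", "Phred"],
}

# Row store: the search touches every field of every row.
row_hits = [r["user_name"] for r in rows if r["fav_admin"] == "Phred"]

# Column store: scan just the one column, then index into the others
# to pull out the matching values.
hit_positions = [i for i, v in enumerate(columns["fav_admin"]) if v == "Phred"]
col_hits = [columns["user_name"][i] for i in hit_positions]

assert row_hits == col_hits == ["alice", "carol"]
```

    Both scans find the same users; the difference is how much data each has to touch along the way.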

  • by Qbertino ( 265505 ) <moiraNO@SPAMmodparlor.com> on Wednesday February 14, 2007 @06:01PM (#18017058)
    ... for a long time.
    Classic RDBMSes are crutches. A forced-upon necessity we have to put up with so our app models can latch on to real-world hardware and its limitations. A historically grown mess with an overhead so huge it's insane. With a database PL and 30+ dialects of it from back in the days when we flew to the moon using a slide rule as the primary means of calculation.
    If what they claim is true, these guys are probably finally ditching the omnipresent, redundant n-fold layers of user and connection management in favour of a lean system that at last does away with the distinction between filesystem, database and data access layer. Imagine a persistence layer with no SQL, no extra user management, no extra connection layer, no filesystem under it and native object support for any PL you wish to compile in.
    I tell you, finally ditching classic RDBMSes is *long* overdue; they're basically all the same ancient pile of rubble, from MySQL up to Oracle. If these guys are up to taking on this deed (or part of it), and they get finished when solid state finally relieves our current super-slowpoking spinning metal disks on a broad scale, we'll feel like being in heaven compared to the shit we still have to put up with today.
    I wish these guys all the best. They appear to have the skills to do it and the authority to emphasise that today's RDBMSes and their underlying concepts are a relic of the past.
    My 2 cents.
  • by flyingfsck ( 986395 ) on Wednesday February 14, 2007 @06:11PM (#18017168)
    Yup, it is all about making the individual files smaller and more regular. Kinda the opposite of XML.
  • by Splab ( 574204 ) on Wednesday February 14, 2007 @06:33PM (#18017396)
    Uhm... wtf?

    Seriously, you tested MySQL vs. other databases with "out of the box" setups? MySQL isn't a real database when running the MyISAM engine; you simply cannot compare that with anything else. And on top of that, try doing a proper insertion in MySQL, one single transaction with a few million rows, and see how well that does. Oh, and did you ever stop to think about _why_ MySQL performs so much faster on that test? Try doing it on an InnoDB table with the standard setup; even at 600k rows it slows to a crawl. (Easily fixable, but requires some optimizations.)

    Seriously, the reason big vendors have a clause in their EULA forbidding benchmarks is exactly people like you: you have no idea what you are comparing, you just figured that setting something up out of the box would give a good insight into its speed. Sheesh.

    Oh, and the 100-fold increase in speed is very likely to happen - on certain types of queries. With a column-wise (vertically partitioned) representation you can do a sequential scan over only the part of the data you need, not the entire set, which should be very, very fast.
  • Re:Best of luck (Score:3, Insightful)

    by einhverfr ( 238914 ) <chris...travers@@@gmail...com> on Wednesday February 14, 2007 @10:04PM (#18019400) Homepage Journal
    For certain applications (particularly BI), I think that 100x speedups are practical, but I would not expect it in general OLTP systems.

    Let me give you an example.

    Suppose you have a table with, say, 100 billion rows. You want to create a report which provides aggregated data on a very large subset of a few columns of the table. With a traditional RDBMS, you have to read through every single one of the 100 billion rows to aggregate the data (indices don't help if you are going to be searching through a sizeable percentage of disk pages).

    Most systems currently tackle this problem using massive parallelism, i.e. you break the table into little pieces and store those pieces on different systems. Now imagine that, in addition to this, you break each column out into its own storage. Now you have fewer disk pages to search through: fewer memory and disk bandwidth issues, faster performance.

    Now, this would be less useful if you were trying to do more complex queries on larger numbers of columns, and inserts/updates suck.

    So like many things, it is a tradeoff.
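    A toy version of that aggregation scenario, with a hypothetical "amount" column standing in for the one column the report needs: only that column's storage is read at all, and it can itself be split into chunks and summed in parallel (everything here is illustrative, not how Vertica actually distributes data):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column store: each column lives in its own list ("file").
# The report only needs "amount"; other columns are never read.
amount_col = list(range(1_000_000))

def chunk_sum(chunk):
    # Aggregate one piece of the column.
    return sum(chunk)

# Break the single column into pieces, aggregate each piece in
# parallel, and combine the partial results.
n = 4
size = len(amount_col) // n
chunks = [amount_col[i * size:(i + 1) * size] for i in range(n)]
with ThreadPoolExecutor(max_workers=n) as pool:
    total = sum(pool.map(chunk_sum, chunks))

assert total == sum(amount_col)
```

    The parallel partial sums agree with a straight scan; the win is that the other 99 columns of the table never leave disk.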
  • by Anonymous Coward on Wednesday February 14, 2007 @11:36PM (#18019962)
    Score 3, Insightful?

    You have got to be kidding me. This comment is the most nonsensical idiotic drivel ever. Read a goddamned database textbook before making such an asinine comment to at least learn the fundamentals of why RDBMS systems are used (and used successfully when used properly).

    Yes, SQL isn't all that great, but you can take your object persistence system and shove it right up your ass. Most of them are total pieces of shit.
  • by AKAImBatman ( 238306 ) * <akaimbatman@gmaYEATSil.com minus poet> on Thursday February 15, 2007 @01:33AM (#18020562) Homepage Journal

    Isn't that what indexes do?

    Yes! No! Sort of!

    Indexes only optimize some types of queries. To get the absolute maximum performance out of your database, you have to make sure that there is a specific index for each query you run, and that your indexes are properly rebuilt and optimized for least-time search. Suffice it to say, this rarely happens in the real world. So there's almost always some scanning, even after the indexes narrow things down a bit. By going with a column-oriented storage design, the scan can be streamed at higher levels of throughput than is possible with row-oriented databases.

    The downside is that you're sacrificing individual-row access time, so if you're pulling and reassembling millions of full rows of data, this might actually be slower than a traditional row-oriented database. Updates are almost guaranteed to be slower, as you have to write to several column-oriented data stores rather than a single row-oriented store.

    Still, column orientation makes a lot of sense for a variety of today's database applications. So if you need to query a multi-terabyte table, this product may be just what the (senior database) administrator ordered.
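    The index tradeoff in that comment can be sketched with toy structures (a plain dict as the "index"; none of this reflects any real database's internals): an index answers its one query shape instantly, but has to be maintained on every update, while a bare column scan needs no upkeep at all:

```python
# Toy index for one column: value -> list of row positions.
fav_admin = ["Phred", "CmdrTaco", "Phred", "Hemos"]

def build_index(col):
    idx = {}
    for pos, val in enumerate(col):
        idx.setdefault(val, []).append(pos)
    return idx

index = build_index(fav_admin)
assert index["Phred"] == [0, 2]   # indexed lookup: no scan needed

# An update silently invalidates the index; it must be maintained
# (or, as here, rebuilt wholesale) to stay correct...
fav_admin[1] = "Phred"
index = build_index(fav_admin)
assert index["Phred"] == [0, 1, 2]

# ...whereas a plain column scan always reflects the current data
# with zero upkeep, at the cost of touching the whole column.
assert [p for p, v in enumerate(fav_admin) if v == "Phred"] == [0, 1, 2]
```

    This is the "you don't have to maintain the indexes" point from the column-store comment above: the scan is never stale, and a column store keeps that scan cheap.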
  • by Kjella ( 173770 ) on Thursday February 15, 2007 @09:18AM (#18022480) Homepage
    Under ideal conditions, I don't have a problem seeing that:

    1. Make up lots of 100-column+ tables
    2. Select one column from each table
    3. If you're IO bound, you should now see about a 100:1 reduction in data read, i.e. roughly a 100x speedup

    However, most real data models don't work that way. Usually you put stuff that's useful at the same time in the same table, in which case it probably won't make much of a difference.
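    The back-of-the-envelope arithmetic behind that 100:1 estimate (the row count and value size are made-up illustrative numbers):

```python
# 100 columns, 8 bytes per value, 1 billion rows; the query reads one column.
n_rows, n_cols, bytes_per_val = 10**9, 100, 8

row_store_scan = n_rows * n_cols * bytes_per_val   # must read whole rows
col_store_scan = n_rows * 1 * bytes_per_val        # reads just the one column

ratio = row_store_scan // col_store_scan
print(ratio)  # → 100: the IO ratio, and the ideal-case speedup if IO bound
```

    As the comment says, the ratio only holds when the query really touches one column out of a hundred; select most of the columns and the advantage evaporates.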
