Database Bigwigs Lead Stealthy Open Source Startup

Posted by ScuttleMonkey on Wed Feb 14, 2007 04:17 PM
from the hope-it-isn't-vaporcorp dept.
BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And not just him: he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise: a Linux-based system that handles queries 100 times faster than traditional relational database management systems."
  • Partners (Score:5, Informative)

    by stoolpigeon (454276) * <bittercode@gmail> on Wednesday February 14 2007, @04:22PM (#18016580)
    (http://thepeckfamily.us/ | Last Journal: Saturday November 10, @10:49AM)
    The article mentions that redhat and hp are listed among their partners. i'm not surprised by red hat or informatica (another partner though they aren't mentioned in the article) but i was a little surprised by hp - since they have been trying to get the word out [hp.com] about their own data warehousing and bi stuff. i wonder what that indicates about how they regard this new player.
     
    also interesting is the wikipedia article on Michael Stonebraker [wikipedia.org] if you aren't already familiar with him.
  • Column oriented databases (Score:2, Interesting)

    by Anonymous Coward on Wednesday February 14 2007, @04:22PM (#18016582)
    The article seems to describe the big advantage as being column oriented.

    How does this differ from KX Systems' kdb (www.kx.com), which IIRC is similar in that way and is already in use at many if not most major financial institutions (see their customer list)?
  • When Will This Be Ported? (Score:4, Funny)

    by Anonymous Coward on Wednesday February 14 2007, @04:23PM (#18016610)
    The question is when will this be ported to a mainstream OS such as Windows?

  • by varmittang (849469) on Wednesday February 14 2007, @04:24PM (#18016616)
    (http://www.ducktapeandglue.com/)
    It was LAMP, now it's LAVA. Much cooler name.
  • Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica ... The promise -- a Linux-based system that handles queries 100 times faster than traditional relational database management systems.

    Yeah, but what does its radar signature look like?
  • buzzword enabled (Score:4, Insightful)

    by hey (83763) on Wednesday February 14 2007, @04:24PM (#18016620)
    (Last Journal: Thursday December 08 2005, @04:33PM)
    "grid-enabled, column-oriented relational database management system"
    What does that mean?
    If anything.
    • Re:buzzword enabled (Score:5, Informative)

      by c0nst (655115) on Wednesday February 14 2007, @04:59PM (#18017032)
      Here you go:
      Stonebraker, Mike; et al. (2005). C-Store: A Column-oriented DBMS [mit.edu] (PDF). Proceedings of the 31st VLDB Conference.
      From the paper:
      Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures.
      :-)
    • Re:buzzword enabled (Score:5, Funny)

      by Jherek Carnelian (831679) on Wednesday February 14 2007, @05:39PM (#18017472)

      "grid-enabled, column-oriented relational database management system"
      What does that mean?

      Uh, a spreadsheet?
    • Re:buzzword enabled (Score:5, Informative)

      by perfczar (1064296) on Wednesday February 14 2007, @05:54PM (#18017616)
      Buzzwords, yes, but they have a little bit of meaning left. Grid-enabled means that it works on a "shared nothing" environment, that you can use a networked cluster of commodity computers if one isn't enough to hold the data, and so on. This is in contrast to using one big huge box (big computer, big storage array, or whatever). Of course many databases are similarly grid-enabled. Column-oriented means that data is stored on disk by column, this makes it fast to process a subset of columns that touch lots of rows, as is typical in data warehouse applications. This is a key architectural difference among databases; Oracle, DB2, etc., are "row stores", while Sybase IQ, Vertica, etc. are "column stores". Note: I work for Vertica Systems
    • Re:buzzword enabled (Score:5, Informative)

      by ChrisA90278 (905188) on Wednesday February 14 2007, @05:54PM (#18017618)
      Column oriented means it can read data in from one column from the disk without pulling in all the other bytes in the row. Possibly much less I/O bandwidth usage, depending on the query. (Kind of like if you turned the normal file structure sideways.)

      Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access to a huge amount of computing power. The typical DBMS runs on at best a multi-processor server. This is kind of like a DBMS server running on a "seti at home" type network.

      Going solely by the developer's reputation, this could be a big deal. He is not some random hacker. He is a well known university professor who has several times in the past led projects that have been revolutionary and turned the field around. His ideas are widely used. Still, "100X faster" is a big claim. Lots of smart people have been working on DBMSes for many years; a two order of magnitude improvement is an "I will have to see it to believe it" type claim.

      I'm using PostgreSQL to handle some telemetry data right now. If my 45 minute run times can be reduced to seconds, I'll be happy.

      • Big claims are backed (Score:4, Informative)

        by Virtual_Raider (52165) on Wednesday February 14 2007, @09:29PM (#18019562)
        (http://virtualraider.livejournal.com/)

        Still "100X faster" is a big claim. Lots of smart people have been working on DBMSes for many years, a two order of magnitude improvement is a "I will have to see it to believe it" type claim

        Oh ye of little faith, here I present thee with The Facts. Or a paper at the very least: One size fits all? a Benchmark [mit.edu]

      • welcome to 1994 by kpharmer (Score:2) Thursday February 15 2007, @12:45AM
      • Re:buzzword enabled (Score:4, Insightful)

        by Kjella (173770) on Thursday February 15 2007, @08:18AM (#18022480)
        (http://slashdot.org/)
        Under ideal conditions, I don't have a problem seeing that:

        1. Make up lots of 100-column+ tables
        2. Select one column from each table
        3. If you're IO bound, you should now see about a 100:1 increase

        However, most real data models don't work that way. Usually you put stuff that's useful at the same time in the same table, in which case it probably won't make much of a difference.
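The arithmetic in the three steps above can be sketched roughly as follows. This assumes equal-width columns and a scan that is purely I/O bound; both are simplifying assumptions for illustration, not anything Vertica has published, and `io_speedup` is a made-up helper name.

```python
def io_speedup(total_columns, columns_needed):
    """Best-case I/O reduction of a column store over a row store.

    Assumes every column is the same width and the query is purely
    I/O bound, so time is proportional to bytes read from disk.
    A row store reads all columns; a column store reads only the
    columns the query touches.
    """
    if not 0 < columns_needed <= total_columns:
        raise ValueError("columns_needed must be between 1 and total_columns")
    return total_columns / columns_needed


# The ideal case from the list: 100 columns, query touches 1.
ideal = io_speedup(100, 1)

# A more typical model: related fields share a table, most are used.
typical = io_speedup(10, 8)
```

Under these assumptions the ideal case gives a factor of 100, while a query that uses 8 of 10 columns gains only a factor of 1.25, which is the point of the last paragraph.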
      • Re:buzzword enabled by kpharmer (Score:1) Thursday February 15 2007, @12:37AM
    • Re:buzzword enabled by bytesex (Score:2) Thursday February 15 2007, @04:15AM
  • Column oriented? (Score:2)

    by JLavezzo (161308) on Wednesday February 14 2007, @04:24PM (#18016624)
    (http://www.westxylophone.com/)
    A column oriented relational database? I'd like some more details on how that works. I don't suppose it's just a regular SQL db with Excel's Pivot Tables run on it...

    Seriously, though, the target market for grid-based high volume data-warehousing type dbs is a lot smaller than the MySQL crowd. Not as big a deal as it seems, but it'd be nice to have if you needed it.
    • Re:Column oriented? by stoolpigeon (Score:3) Wednesday February 14 2007, @04:32PM
    • Re:Column oriented? by MrAnnoyanceToYou (Score:2) Wednesday February 14 2007, @04:33PM
    • Re:Column oriented? by truthsearch (Score:2) Wednesday February 14 2007, @04:35PM
    • Re:Column oriented? (Score:5, Informative)

      A column oriented relational database? I'd like some more details on how that works.

      http://en.wikipedia.org/wiki/Column-oriented_DBMS [wikipedia.org]

      It's basically an optimization of the current data access patterns. Databases have been row-oriented for decades, because they evolved from fixed width flat files. Once we eliminated COBOL-style accesses to databases, the full row data became less important. It became far more important to be able to scan a column as fast as possible. For example:

      select * from names where lastname LIKE '%son'

      The above query might have an index available to find what it needs. But it's just as likely that the database will need to do a table-scan. Since table-scans involve looking through every record in the database, you can imagine that it would be faster to just load the lastname column rather than loading every row in the database just to discard 90% of that data.
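A toy sketch of that table-scan difference, in Python rather than inside a real database engine. The table contents, column names and helper functions are all made up for illustration; real engines work on disk pages, not Python lists.

```python
# Hypothetical toy table stored row by row: (firstname, lastname, city, born).
rows = [
    ("Ann", "Johnson", "Boston", 1971),
    ("Bob", "Smith", "Austin", 1980),
    ("Cy", "Larson", "Denver", 1965),
]

# The same data stored column by column: one list per column.
columns = {
    "firstname": [r[0] for r in rows],
    "lastname": [r[1] for r in rows],
    "city": [r[2] for r in rows],
    "born": [r[3] for r in rows],
}


def scan_rows(rows):
    """Row-store scan: whole rows are loaded just to test one field."""
    values_read = sum(len(r) for r in rows)
    hits = [i for i, r in enumerate(rows) if r[1].endswith("son")]
    return hits, values_read


def scan_column(columns):
    """Column-store scan: only the lastname column is loaded."""
    lastnames = columns["lastname"]
    hits = [i for i, name in enumerate(lastnames) if name.endswith("son")]
    return hits, len(lastnames)
```

On this toy table both scans find the same two matching rows, but the row scan touches 12 values and the column scan touches 3. Because the query is `select *`, a column store would still have to fetch the other columns for the matching rows afterwards.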
    • Re:Column oriented? by prog99 (Score:1) Wednesday February 14 2007, @04:43PM
    • Re:Column oriented? (Score:5, Insightful)

      by georgewilliamherbert (211790) on Wednesday February 14 2007, @04:47PM (#18016910)

      A column oriented relational database? I'd like some more details on how that works.

      Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).

      Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.

      Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.

      Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.

      Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.

      Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.

      Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both space and time...
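To see why a single column compresses well, here is a minimal sketch using run-length encoding. The comment above does not say which compression scheme is meant, so RLE is only an assumed example, and the column contents are invented.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical "Favorite slashdot admin" column with long runs of repeats.
favorite_admin = ["Phred"] * 4 + ["Taco"] * 3 + ["Phred"] * 2
runs = rle_encode(favorite_admin)
```

Nine stored values shrink to three (value, count) pairs here. The same trick does little for a row store, where adjacent values on disk belong to different columns and rarely repeat.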

    • Re:Column oriented? by Anonymous Coward (Score:1) Wednesday February 14 2007, @04:53PM
    • Re:Column oriented? by mysticgoat (Score:2) Wednesday February 14 2007, @11:16PM
  • Awesome (Score:2, Interesting)

    by Fyre2012 (762907) on Wednesday February 14 2007, @04:25PM (#18016658)
    (http://www.sevenl.net/ | Last Journal: Sunday January 16 2005, @12:15AM)
    This is totally what we need.

    With commodity hardware getting faster and cheaper by the minute, having a system that can handle a higher than average load with optimized software is, imho, a winner.

    I'm sure everyone here can add some anecdotal evidence to how they had a heavy-hardware, database serving machine die on them because of some software bug.
    This is one of the reasons I've been looking forward to ZFS. Hopefully the DB gurus will take the best of what's good about software, drop the legacy crap and really deliver something that's going to handle the kind of load that a good slashdotting delivers with hardware that didn't require a lease to be affordable.
    • Re:Awesome by Grinin (Score:1) Wednesday February 14 2007, @04:55PM
  • open source? (Score:1)

    by Anonymous Coward on Wednesday February 14 2007, @04:30PM (#18016724)
    how is this open-source?
  • by StikyPad (445176) on Wednesday February 14 2007, @04:31PM (#18016728)
    (http://slashdot.org/)
    The promise -- a Linux-based system that handles queries 100 times faster than traditional relational database management systems... ...using the power of oxygen!
  • Perfect timing (Score:4, Interesting)

    by defile (1059) on Wednesday February 14 2007, @04:31PM (#18016736)
    (http://michael.bacarella.com/ | Last Journal: Friday November 01 2002, @06:19PM)

    Loading a million random records out of a set of one hundred million records is an enormously difficult task for an RDBMS on commodity hardware (e.g. magnetic rotating disks). This is a more common task than you would think. ORM systems backed by an RDBMS, such as Ruby on Rails, Django, Hibernate, have exactly this requirement and will only demand more as these models become more mainstream. Think about what search engines have to do: find millions among billions, all to show a user a dozen.

    These problems are solvable now, but there's a lot of duplication of effort going on that a smart database vendor could solve for us.

  • Good..If it works (Score:1)

    by Gomer79 (43434) on Wednesday February 14 2007, @04:32PM (#18016744)
    (http://www.divinenatureflowers.com/)
    Without any benchmarks of any kind and a lack of data I remain skeptical, but if it works this could be a huge breakthrough for database management as data storage amounts continue to skyrocket. I am curious if it will be ported to Windows or other proprietary systems and if so what effect it will have on the speed claims. Because if the speed claims are true and it stays on Linux, I would think companies would have to consider moving to Linux to realize the speed gains.
  • by georgewilliamherbert (211790) on Wednesday February 14 2007, @04:33PM (#18016774)
    Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time.

    This is some new Network World definition of "Stealthy", apparently...
  • This sounds great but will it work with Windows applications? How proprietary is their system? Do they have a suitable set of signed ODBC drivers that will let my legacy applications talk to their system? Do they have .NET enabled database connectors so I can dump it into my project? How well has their DB been tested against chatty network environments like a mix of Windows and Macs or weird routing? What is their DB management system like? Is it CLI or GUI?

    I can claim my custom written DOS database system is 20X faster than anything on the market (which it is), but if it can't easily work in a Windows and/or Linux environment (which it can't) then it is worthless as a marketable product. (But you should see what it can do on a serial network.)
    • MOD PARENT UP! by Dysfnctnl85 (Score:1) Wednesday February 14 2007, @04:52PM
    • Re:Sounds great but.. (Score:4, Informative)

      by perfczar (1064296) on Wednesday February 14 2007, @06:10PM (#18017784)
      The Vertica business model is to sell a database engine (software to store and query data). Clearly use of standard interfaces is important, otherwise nobody would be able to make use of the product (which really ends up being a component of a larger system or strategy) without going to a heap of trouble. So of course Vertica has:

      • A JDBC driver
      • An ODBC driver
      • An interactive SQL client
      • A growing list of tested integrations with other software

      Note: I work for Vertica
  • Best of luck (Score:5, Insightful)

    by 140Mandak262Jamuna (970587) on Wednesday February 14 2007, @04:40PM (#18016836)
    (Last Journal: Wednesday October 31, @08:33AM)
    I don't want to rain on their parade. But typically whenever people start with a spec like "100 times better than what the incumbents can do", they assume the incumbents will continue to perform at current levels while the newcomers take years to develop and mature their new technology. In the real world, the traditional methods improve too, and unless the newcomers can maintain a 100x lead continually the new technology flops.

    What happened to Gallium Arsenide replacing silicon? What happened to solid state memory completely replacing magnetic disks? The technology field is littered with such fiascos.

  • Patent Problems (Score:3)

    by IflyRC (956454) on Wednesday February 14 2007, @04:42PM (#18016860)
    Watch...they'll run into patent problems with patents held by Oracle, Sybase, and MS.
  • open source? (Score:2)

    by oohshiny (998054) on Wednesday February 14 2007, @04:42PM (#18016862)
    Where does it say that Vertica is going to be open source?

    In any case, if people wonder how they get 100x speedups, it's probably related to Stonebraker's previous company called Streambase [streambase.com].
    • never mind by oohshiny (Score:2) Wednesday February 14 2007, @05:08PM
  • by WindBourne (631190) on Wednesday February 14 2007, @04:50PM (#18016934)
    (Last Journal: Friday December 01 2006, @10:51AM)
  • Speculation (Score:5, Informative)

    by cartman (18204) on Wednesday February 14 2007, @04:55PM (#18016998)

    I noticed that Stonebraker is the company founder. Stonebraker has contributed extensively to database research over the years.

    He's known for advocating the "shared-nothing" approach to parallel databases. The shared-nothing approach means that nodes in the parallel database don't attempt memory or cache synchronization, and each node has its own commodity disk array. In a shared-nothing parallel database, the data is "partitioned" across servers. So, for example, rows with IDs 1-10 would be on the first server, 11-20 on the second server, etc. Executing the SQL query "select * from table where id < 1000" would send requests to multiple commodity servers and then aggregate the results. The optimizer is modified to take into account network bandwidth and latency, etc.
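The partition-then-aggregate idea can be sketched in a few lines. This is a single-process toy under stated assumptions: three "nodes" that are just Python dicts, range partitioning of ten rows per node as in the example above, and invented names throughout. A real system would send the sub-queries over the network in parallel.

```python
NODE_COUNT = 3       # assumed cluster size for the sketch
ROWS_PER_NODE = 10   # rows 1-10 on node 0, 11-20 on node 1, and so on


def node_for(row_id):
    """Range partitioning: map a row ID to the node that owns it."""
    return (row_id - 1) // ROWS_PER_NODE


# Each node holds only its own slice of the table; nothing is shared.
nodes = [dict() for _ in range(NODE_COUNT)]
for row_id in range(1, NODE_COUNT * ROWS_PER_NODE + 1):
    nodes[node_for(row_id)][row_id] = f"row-{row_id}"


def select_where_id_less_than(limit):
    """Scatter the predicate to every node, then gather and merge results."""
    results = []
    for node in nodes:
        # Each node filters its local partition independently.
        partial = [(rid, val) for rid, val in node.items() if rid < limit]
        results.extend(partial)
    return sorted(results)
```

Calling `select_where_id_less_than(15)` pulls ten rows from the first node, four from the second and none from the third, then merges them into one sorted result.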

    My guess on what they're doing: they're working on a shared-nothing parallel RDBMS with an in-memory client similar to Oracle TimesTen.

    There are a few drawbacks to the shared-nothing approach: 1) the RDBMS software is more difficult to implement; 2) since the data is partitioned, any transaction that updates tuples on more than one database node requires a two-phase distributed commit, which is much more expensive; and 3) some queries are more expensive because they require transmitting large amounts of data over the network rather than a memory bus, and in rare cases that network overhead cannot be eliminated by the optimizer.
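For drawback 2, a bare-bones sketch of a two-phase commit shows where the extra cost comes from: every participating node is contacted twice. The `Node` class and its methods are hypothetical, and timeouts, logging and crash recovery are left out entirely.

```python
class Node:
    """Hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: the node votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(nodes):
    """Commit on all nodes or on none; returns True if committed."""
    # Phase 1: one round trip to every node to collect votes.
    votes = [node.prepare() for node in nodes]
    # Phase 2: a second round trip to announce the outcome.
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.rollback()
    return False
```

A transaction that stays on one node needs neither round of messages, which is why the cross-node case is so much more expensive.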

    The advantage, of course, is linear scalability by adding commodity hardware. No more need for $3M+ boxes.

  • by Qbertino (265505) on Wednesday February 14 2007, @05:01PM (#18017058)
    ... for a long time.
    Classic RDBMSes are crutches. A forced-upon necessity we have to put up with for our app models to latch on to real world hardware and its limitations. A historically grown mess with an overhead so huge it's insane. With a Database PL and 30+ dialects of it from back in the days when we flew to the moon using a slide rule as primary means of calculation.
    If what they claim is true, these guys are probably finally ditching the omnipresent redundant n-fold layers of user and connection management in favour of a lean system that at last does away with the distinction of filesystem and database and data access layer. Imagine a persistence layer with no SQL, no extra user management, no extra connection layer, no filesystem under it and native object support for any PL you wish to compile in.
    I tell you, finally ditching classic RDBMSes is *long* overdue, they're basically all the same ancient pile of rubble, from MySQL up to Oracle. If these guys are up to taking on this deed (or part of it) and they get finished when solid-state finally relieves our current super-slowpoking spinning metal disks on a broad scale we'll feel like being in heaven compared to the shit we still have to put up with today.
    I wish these guys all the best. They appear to have the skills to do it and the authority to emphasise that today's RDBMSes and their underlying concepts are a relic of the past.
    My 2 cents.
  • Given that... (Score:5, Informative)

    MonetDB [monetdb.cwi.nl] is similarly column oriented AND open source, and appears to clean the clock of most of the major commercial and open source databases for huge data set queries (see the benchmarks at axyana.com [axyana.com] for an example), so where is Vertica's market advantage supposed to be?

    In other words: Vertica is obviously well-researched and well funded as a startup, but MonetDB is well-researched, already benchmarked and available now. So why would I wait to invest my time, energy, and $$ in a proprietary future product rather than put that time and energy into developing market leadership in my chosen corporate area in the present?

    • Re:Given that... (Score:5, Informative)

      by perfczar (1064296) on Wednesday February 14 2007, @06:46PM (#18018116)

      Here are a few of the technical reasons one might choose Vertica over Monet; I'll not get into business issues.


      Vertica is designed for large amounts of data, and is optimized for disk based systems. Monet does benchmarks against TPC-H Scale Factor 5 (30 million records, an amount which would fit in main memory) running on Postgres; Vertica does TPC-H Scale Factor 1000 (6 billion records) against commercial row stores tuned by people who do such work for a living.

      Vertica runs on multi-node clusters, allowing the cluster to grow as the amount of data grows, while Monet doesn't scale to multiple machines.

      There are numerous differences in the transaction systems, update architecture, tolerance of hardware failure, and so on, that make Vertica better suited to the enterprise DW market.


      Note: I work for Vertica
      • Re:Given that... by CodeShark (Score:2) Thursday February 15 2007, @10:05AM
      • Re:Given that... by Circuit Breaker (Score:2) Thursday February 15 2007, @11:03AM
    • Re:Given that... by fivelittlemonkeys (Score:1) Thursday February 15 2007, @02:44AM
  • Comprable? (Score:2)

    by Pinback (80041) on Wednesday February 14 2007, @05:31PM (#18017378)
    (http://web.mac.com/r.../Site/Blog/Blog.html | Last Journal: Monday October 16 2006, @05:58PM)
    I wonder how this compares to Netezza: http://en.wikipedia.org/wiki/Netezza [wikipedia.org].
    • Re:Comprable? by rla3rd (Score:1) Thursday February 15 2007, @09:16AM
  • Google uses this approach (Score:3, Informative)

    by russryan (981552) on Wednesday February 14 2007, @06:09PM (#18017774)
    See http://en.wikipedia.org/wiki/Bigtable [wikipedia.org] for a description of Google's column oriented database.
  • More Scalability (Score:2)

    by Doc Ruby (173196) on Wednesday February 14 2007, @06:26PM (#18017964)
    (http://slashdot.org/~Doc%20Ruby/journal | Last Journal: Thursday March 31 2005, @01:48PM)
    How about a database with the exact same query API (not just "but it's all SQL") as, say, Oracle or MS-SQL, or even Postgres, that allows any number of parallel query servers to work against a single datastore?

    In other words, instead of yet another incompatible database, how about one that we could just switch to from an existing one, that is arbitrarily scalable against shared data. If you're going to get clever and act like you can solve hard problems, why not give people what we need, and not just what you think you can give us?
  • by ramakant (256472) on Wednesday February 14 2007, @06:37PM (#18018052)
    This looks like it will be a commercial version of C-Store, the column-oriented DBMS developed by Michael Stonebraker and MIT:
    - Web site: http://db.lcs.mit.edu/projects/cstore/ [mit.edu]
    - Wikipedia Entry: http://en.wikipedia.org/wiki/C-Store [wikipedia.org]
    They distribute the source with a fairly liberal license, so this looks like something the open source community could pick up and run with.
  • An issue with column orientation (Score:2, Informative)

    by jfroelich (1022159) on Wednesday February 14 2007, @08:15PM (#18018996)
    Is that you do not scale as well to a large number of columns. To access a set of X records with 100 columns, you have 100 asynchronous I/O calls to the separate column stores. I sell analytical software that does just this, and it is a technical issue that should not just be ignored. In some regards the single-file row-oriented system has less I/O overhead. We have come up with some ways to reduce the file system overhead, but while it is small, it is noticeable, more so on systems not designed to have a large number of simultaneously open files. All that really happened is that part of the bottleneck moved to rely less on the product architecture and more on the system architecture. Whether you think that is wise, well, that's up to you.

    BTW, first post, I am no longer an eavesdropper, yay

    Josh
  • Stealthy? (Score:2, Funny)

    by plasmacutter (901737) on Wednesday February 14 2007, @08:33PM (#18019154)
    (Last Journal: Tuesday November 06, @02:39PM)
    it's on the front page of slashdot.. how stealthy can it be?
  • by BillAtHRST (848238) on Wednesday February 14 2007, @08:43PM (#18019242)
    What does this have to do with StreamBase? Is Stonebraker just throwing StreamBase under the bus? Are they complementary? How can one person (even someone with his abilities) function as CTO of two separate companies?
  • by bestguruever (666273) on Wednesday February 14 2007, @09:20PM (#18019504)
    Any relation to Required technologies? Unfortunately, that's what I think of when I hear about a column store
  • Stupid question: Still SQL? (Score:3, Interesting)

    by WoTG (610710) on Wednesday February 14 2007, @10:54PM (#18020050)
    (http://print-bingo.com/ | Last Journal: Monday August 04 2003, @12:43AM)
    I've never heard of column based databases prior to this article. Would I be correct in assuming that you still can work with these using regular SQL?
  • by MadnessASAP (1052274) <madnessasap@gmail.com> on Wednesday February 14 2007, @11:25PM (#18020230)
    Is that a Microsoft 100x increase or a Linux 100x increase? (For reference 100x(Microsoft) = 1.1x(Linux))
  • by DoChEx (558465) on Thursday February 15 2007, @02:18AM (#18021040)
    Column oriented sounds to me a lot like an index with a single field. Even if you place all columns into a single block, other columns could still be required when displaying the record; if you have multiple columns like this, it could mean a lot of blocks being read once you actually know the direct block locations of your data. So do you combine row and column styles, then, using some DDL to mark which columns to store independently of the rest of the row?
  • by mkersten (1065114) on Friday February 16 2007, @04:52PM (#18044762)
    Vertica has indeed made the business/venture steps to
    follow the MonetDB approach to exploit column-based stores
    for large scale datawarehouse solutions.
    Its science library provides many studies on the underlying
    technology.

    MonetDB has already built a business history in the
    area of analytical CRM solutions available through SPSS.
    In the area of datamining PROXIMITY is a leading
    product for relational mining.

    Not to mention its support for both SQL and XQuery
    engines. This all in the context of an open-source
    community activity for several years.

    See http://monetdb.cwi.nl/ [monetdb.cwi.nl]
    http://monetdb.cwi.nl/projects/monetdb/Development/Credits/Partners/index.html [monetdb.cwi.nl]
  • I had a long chat with Mike Stonebraker a few weeks ago, and came away with the following tentative opinions about Vertica's prospects [dbms2.com], and those for columnar systems in general.

            * Pinpoint data lookup doesn't seem like a great fit for columnar systems. Indeed, traditional rows-and-B-trees would seem to be best.
            * Constrained query and reporting would seem to be a sweet spot, even though it's a sweet spot for some of the best competition as well.
            * Cube-filling calculations involve big intermediate result sets. I'm not sure that's a great fit for columnar systems.
            * Hardcore tabular data crunching would seem in many cases to be another sweet spot, again against a lot of competition, at least in some of its sub-categories.
            * Text and media search are best done by specialized systems that, at least in the case of text, wind up being quasi-columnar. The same goes for other specialty areas. Systems like Vertica's have nothing to offer directly to these applications. However, it might be possible for Vertica to integrate with them fairly quickly, given that they're starting from vaguely similar philosophical roots.

    There also are some technical details in that article; a link to a short, somewhat hagiographic intro to Mike himself; and so on.

  • Re:Omg top 5 (Score:3, Funny)

    by bob.appleyard (1030756) on Wednesday February 14 2007, @05:31PM (#18017366)
    You're 100 times faster than anyone else, obviously.
  • by georgewilliamherbert (211790) on Wednesday February 14 2007, @06:47PM (#18018132)
    It takes balls to say things like that about Michael Stonebraker in the database field... ...and lack of brains or historical clue...