Catch up on stories from the past week (and beyond) at the Slashdot story archive

Database Bigwigs Lead Stealthy Open Source Startup 187

Posted by ScuttleMonkey on Wednesday February 14, 2007 @05:17PM from the hope-it-isn't-vaporcorp dept.

BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And not just him, he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise — a Linux-based system that handles queries 100 times faster than traditional relational database management systems."

This discussion has been archived. No new comments can be posted.

Database Bigwigs Lead Stealthy Open Source Startup

Load All Comments

Search 187 Comments Log In/Create an Account

Comments Filter:

Partners (Score:5, Informative)

by stoolpigeon ( 454276 ) * writes: <bittercode@gmail> on Wednesday February 14, 2007 @05:22PM (#18016580) Homepage Journal

The article mentions that redhat and hp are listed among their partners. i'm not surprised by red hat or informatica (another partner though they aren't mentioned in the article) but i was a little surprised by hp - since they have been trying to get the word out [hp.com] about their own data warehousing and bi stuff. i wonder what that indicates about how they regard this new player.

also interesting is the wikipedia article on Michael Stonebraker [wikipedia.org] if you aren't already familiar with him.

Share
twitter facebook
- Re:Partners (Score:5, Insightful)
  
  by AKAImBatman ( 238306 ) * writes: <.akaimbatman. .at. .gmail.com.> on Wednesday February 14, 2007 @05:25PM (#18016644) Homepage Journal
  
  i was a little surprised by hp - since they have been trying to get the word out about their own data warehousing and bi stuff.
  
  It's called "hedging your bets". If the little company doesn't work out, no big deal. If it does, then HP is in a position to either benefit from contractual relations, acquire it, or squash it. Whichever happens to be their fancy.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by stoolpigeon ( 454276 ) * writes:
    
    based on stonebraker's history -- somebody acquiring it at some point seems a relatively safe bet.
Column oriented databases (Score:2, Interesting)

by Anonymous Coward writes:

The article seems to describe the big advantage as being column oriented.

How does this differ than KX System's kdb (www.kx.com) which IIRC is similar in that way; and is alredy in use at many if not most major financial institutions (see their customer list)?
- Re:Column oriented databases (Score:5, Informative)
  
  by georgewilliamherbert ( 211790 ) writes: on Wednesday February 14, 2007 @05:37PM (#18016808)
  
  KX is primarily in-memory. The competing column-oriented product is primarily Sybase IQ, which has been on the market for a while now.
  
  Parent Share
  twitter facebook
  - - Re: (Score:2)
      
      by georgewilliamherbert ( 211790 ) writes:
      
      http://www.sqlite.org/whentouse.html [sqlite.org]
      
      I haven't worked with it up close (browse around the website regularly, but don't run it), but all the docs I have seen say SQLite uses a B-tree, not a column store. Do you have an alternate reference to such?
      
      SQLlite says not to use it for more than a few gb to tens of gb of data. Sybase IQ, for example, is routinely run with TB plus quantities of data. It's been tested to a trillion plus rows of data and 155 TB of input data (which autocompressed down to 55 TB of disk
      - Re: (Score:2)
        
        by sonofagunn ( 659927 ) writes:
        
        We use Sybase IQ where I work, and other than rebooting it every few days, frequent crashes, and slow running queries, it's pretty good. We are right now offloading a huge reporting database into SQL Server because our queries run about 6x faster. This is a star-schema data warehouse with a few TB of data. The big difference is that SQL Server supports partitioning.
- Re: (Score:2)
  
  by JimDaGeek ( 983925 ) writes:
  
  Well, kdb+ is proprietary and expensive. Maybe this product will be Open Source or at the very least kill kdb+ on price? There are not many real players in this market, the more the better IMO. What would be best is a competitive Open Source offering in this space. The Open Source product could steal away most of the market share, or at the very least, really drive down prices! :-)
When Will This Be Ported? (Score:4, Funny)

by Anonymous Coward writes: on Wednesday February 14, 2007 @05:23PM (#18016610)

The question is when will this be ported to a mainstream OS such as Windows?

Share
twitter facebook
- Re: (Score:3, Funny)
  
  by Mad Merlin ( 837387 ) writes:
  
  The question is when will this be ported to a mainstream OS such as Windows?
  
  Where by mainstream, you mean useless?
- Re: (Score:2)
  
  by JimDaGeek ( 983925 ) writes:
  
  No offense, but you must be as dumb as a rock. This is not a "mainstream" application. This is a very business specific application. If the app does what a company needs, not running on MS Windows is _really_ no big deal.
  
  I have been a senior developer for more than a decade now and have worked at 2 fortune 500 companies and 1 fortune 1000 company. All of the big companies use a multi-OS server setup. While most of the desktops are MS Windows, a lot of the servers are *nix. In fact, all of the reall
Everyone, we are moving to ASP now (Score:4, Funny)

by varmittang ( 849469 ) writes: on Wednesday February 14, 2007 @05:24PM (#18016616)

It was LAMP, now its LAVA. Much cooler name.

Share
twitter facebook
- You're bound to get some strange looks... (Score:5, Funny)
  
  by Anonymous Coward writes: on Wednesday February 14, 2007 @05:51PM (#18016956)
  
  during the transition when you tell people your business runs on LAVA-LAMP technology.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by smittyoneeach ( 243267 ) * writes:
    
    ...and, if some bonehead intern melts it down, you can always market the leftovers on eBay as a Pet Rock.
    Once you've another internet connection, of course.
- Re: (Score:2)
  
  by SatanicPuppy ( 611928 ) * writes:
  
  If only you could get A on LA with V...The OSS version would be LAVM(ono).
  
  Bah, it's no use...This system is already doomed like Postgres because it has no cool acronym.
  - Re: (Score:2)
    
    by einhverfr ( 238914 ) writes:
    
    Bah, it's no use...This system is already doomed like Postgres because it has no cool acronym.
    
    You have something against LAPPs?
    
    Racist!
buzzword enabled (Score:4, Insightful)

by hey ( 83763 ) writes: on Wednesday February 14, 2007 @05:24PM (#18016620) Journal

"grid-enabled, column-oriented relational database management system"
What does that mean?
If anything.

Share
twitter facebook
- Re:buzzword enabled (Score:5, Informative)
  
  by c0nst ( 655115 ) writes: on Wednesday February 14, 2007 @05:59PM (#18017032)
  
  Here you go:
  Stonebraker, Mike; et al. (2005). C-Store: A Column-oriented DBMS [mit.edu] (PDF). Proceedings of the 31st VLDB Conference.
  From the paper:
  Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of columnoriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures
  :-)
  
  Parent Share
  twitter facebook
- Re:buzzword enabled (Score:5, Funny)
  
  by Jherek Carnelian ( 831679 ) writes: on Wednesday February 14, 2007 @06:39PM (#18017472)
  
  "grid-enabled, column-oriented relational database management system"
  What does that mean?
  
  Uh, a spreadsheet?
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by smittyoneeach ( 243267 ) * writes:
    
    Soooo last decade:
    s/spreadsheet/XML/
- Re:buzzword enabled (Score:5, Informative)
  
  by perfczar ( 1064296 ) writes: on Wednesday February 14, 2007 @06:54PM (#18017616)
  
  Buzzwords, yes, but they have a little bit of meaning left. Grid-enabled means that it works on a "shared nothing" environment, that you can use a networked cluster of commodity computers if one isn't enough to hold the data, and so on. This is in contrast to using one big huge box (big computer, big storage array, or whatever). Of course many databases are similarly grid-enabled. Column-oriented means that data is stored on disk by column, this makes it fast to process a subset of columns that touch lots of rows, as is typical in data warehouse applications. This is a key architectural difference among databases; Oracle, DB2, etc., are "row stores", while Sybase IQ, Vertica, etc. are "column stores". Note: I work for Vertica Systems
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by shmlco ( 594907 ) writes:
    
    I guess on a column-oriented database I don't want to do a 'SELECT * FROM X' ?
- Re:buzzword enabled (Score:5, Informative)
  
  by ChrisA90278 ( 905188 ) writes: on Wednesday February 14, 2007 @06:54PM (#18017618)
  
  Column oriented means it can read data in from one column from the disk without pulling in all the other bytes in the row. Possibly much less reduced I/O bandwidth usage depending on the query. (kind of like if you turned the normal file structure side ways.)
  Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access to a huge amount of computing power. The typical DBMS runs on at beat a multi-processor server. Thi sis kind of like a DBMS server running a a "seti at home" type network.
  Going solely by the developer's reputation, this could be a big deal. He is not some random hacker. He is a well known university professor who has several times in the past lead projects that have been revolutionary and turned the field around. His ideas are widely used Still "100X faster" is a big claim. Lots of smart people have been working on DMBSes for many years, a two order of magnitude improvement is a "I will have to see it to believe it" type claim
  I'm using PostgreSQL to handle some telemetry data right now. If my 45 minute run times can be reduced to seconds, I'll be happy.
  
  Parent Share
  twitter facebook
  - Big claims are backed (Score:4, Informative)
    
    by Virtual_Raider ( 52165 ) writes: on Wednesday February 14, 2007 @10:29PM (#18019562)
    
    Still "100X faster" is a big claim. Lots of smart people have been working on DMBSes for many years, a two order of magnitude improvement is a "I will have to see it to believe it" type claim
    
    Oh ye of little faith, here i present thee with The Facts. Or a paper at the very least: One size fits all? a Benchmark [mit.edu]
    
    Parent Share
    twitter facebook
  - welcome to 1994 (Score:2)
    
    by kpharmer ( 452893 ) writes:
    
    > Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access
    > to a huge amount of computing power. The typical DBMS runs on at beat a multi-processor server. Thi sis kind of like a DBMS
    > server running a a "seti at home" type network.
    
    Or like teradata in around, what? 1992? Informix around 1994? db2 around 1995? Oracle isn't there yet since their grid solution is more about failover than partitioning.
    
    This is now lower-end functionality i
  - Re:buzzword enabled (Score:4, Insightful)
    
    by Kjella ( 173770 ) writes: on Thursday February 15, 2007 @09:18AM (#18022480) Homepage
    
    Under ideal conditions, I don't have a problem seeing that:
    
    1. Make up lots of 100-column+ tables
    2. Select one column from each table
    3. If you're IO bound, you should now see about a 100:1 increase
    
    However, most real data models don't work that way. Usually you put stuff that's useful at the same time in the same table, in which case it probably won't make much of a difference.
    
    Parent Share
    twitter facebook
- Re: (Score:2)
  
  by bytesex ( 112972 ) writes:
  
  'grid enabled' like a beowolf cluster
  'column oriented' like a table, but then turned on its side.
  'relational database management system' you've got me there. I have no idea.
Column oriented? (Score:2)

by JLavezzo ( 161308 ) writes:

A column oriented relational database? I'd like some more details on how that works. I don't suppose it's just a regular SQL db with Excel's Pivot Tables run on it...

Seriously, though, the target market for grid-based high volume data-warehousing type dbs are a lot smaller than the MySQL crowd. Not as big a deal as it seems, but it'd be nice to have if you needed it.
- Re: (Score:3, Insightful)
  
  by stoolpigeon ( 454276 ) * writes:
  
  smaller in number - but i'm willing to bet much more profitable and growing rapidly. we've been looking at data warehousing options and frankly most of them suck in one way or another. if someone can do it right - they can make a killing.
  - Re: (Score:3, Interesting)
    
    by Virtual_Raider ( 52165 ) writes:
    
    I've worked in DW for a time, and I can tell you that it's not easy to "get it right" because so far it's not something that can be packaged. You can get the data models and fancy machinery, but you will most definitely need an architect to tailor it to the particular organization because all companies work differently on the inside. And that architect will have a dickens of a time understanding how the company works because the bigger they are, the more likely not even their own employees do. As long as th
- Re: (Score:2)
  
  by MrAnnoyanceToYou ( 654053 ) writes:
  
  Depends. Reporting and data warehousing are pretty important; Business Objects / Crystal Reports / etc. all seem to be slower than they could. If you were to be able to throw in the rows as quickly as in MySQL or Postgres and then report on it with ten times the efficiency, you've got a decent demand in store for you. If, say, Google or Amazon could run with 1/10th the overall servers I have this feeling they would. Just a guess though. It's always possible a new approach to the old problems has result
  - Re: (Score:3, Interesting)
    
    by stoolpigeon ( 454276 ) * writes:
    
    info week just ran an article on hp [informationweek.com] getting into data warehousing and bi that had this paragraph pretty early on: Until sitting down with InformationWeek recently, the company has been mum on the initiative--not so much as a peep from its normally talkative marketing team. Indeed, it's an unlikely move into a sector where IBM, Oracle, SAS Institute, and Teradata have years of experience, well regarded products, and loyal customers. Those four vendors--along with Microsoft, which has muscled in on the streng
  - - Re: (Score:2)
      
      by MrAnnoyanceToYou ( 654053 ) writes:
      
      Heh.
      
      I'm always amazed at the vehemence.
      
      What good is a huge pile of data with no order? Someone has to pay the bills, someone has to see where the profits are, someone has to see which shipments went out late. These are reporting functions. You create data once, but use it many, many times if you are paying attention to it. I'm sorry you feel differently.
      
      Execution of insert queries is extremely important and time sensitive. Execution of everything else is often not quite as mission critical, but i
- Re: (Score:2)
  
  by truthsearch ( 249536 ) writes:
  
  A lot of web sites that started out with small MySQL databases are now using replication. It can be a tough transition if not accounted for in the original development of the site. But if those sites started out with something that's "grid-based" maybe it would be much easier to grow (maybe). I have the feeling the market may be bigger than many people realize, especially if they start with something free.
- Re:Column oriented? (Score:5, Informative)
  
  by AKAImBatman ( 238306 ) * writes: <.akaimbatman. .at. .gmail.com.> on Wednesday February 14, 2007 @05:43PM (#18016880) Homepage Journal
  
  A column oriented relational database? I'd like some more details on how that works.
  
  http://en.wikipedia.org/wiki/Column-oriented_DBMS [wikipedia.org]
  
  It's basically an optimization of the current data access patterns. Databases have been row-oriented for decades, because they evolved from fixed width flat files. Once we eliminated COBOL-style accesses to databases, the full row data became less important. It became far more important to be able to scan a column as fast as possible. For example:
  
  select * from names where lastname LIKE '%son'
  
  The above query might have an index available to find what it needs. But it's just as likely that the database will need to do a table-scan. Since table-scans involve looking through every record in the database, you can imagine that it would be faster to just load the lastname column rather than loading every row in the database just to discard 90% of that data.
  
  Parent Share
  twitter facebook
  - - Re: (Score:3, Insightful)
      
      by AKAImBatman ( 238306 ) * writes:
      
      Isn't that what indexes do?
      Yes! No! Sort of!
      
      Indexes only optimize some types of queries. To get the absolute maximum performance out of your database, you have to make sure that there is a specific index for each query you run, and that your indexes are properly rebuilt and optimized for least-time search. Suffice it to say, this rarely happens in the real world. So there's almost always some scanning, even after the indexes narrow things down a bit. By going with a column-oriented storage design, the scan
- Re:Column oriented? (Score:5, Insightful)
  
  by georgewilliamherbert ( 211790 ) writes: on Wednesday February 14, 2007 @05:47PM (#18016910)
  
  A column oriented relational database? I'd like some more details on how that works.
  
  Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).
  Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.
  Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.
  Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.
  Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.
  Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.
  Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both speed and time...
  
  Parent Share
  twitter facebook
  - Re: (Score:3, Insightful)
    
    by flyingfsck ( 986395 ) writes:
    
    Yup, it is all about making the individual files smaller and more regular. Kinda the opposite of XML.
  - Re: (Score:2)
    
    by SanityInAnarchy ( 655584 ) writes:
    
    Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both speed and time...
    Given enough spare CPU cycles, yes. LZO compression is probably good for that. In fact, this is part of the theory behind which Hans Reiser claimed Reiser4 will be over
  - Re: (Score:2)
    
    by swilver ( 617741 ) writes:
    
    Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that
- Re: (Score:2)
  
  by mysticgoat ( 582871 ) writes:
  
  the target market for grid-based high volume data-warehousing type dbs are a lot smaller than the MySQL crowd.
  
  The growth potential for that market is staggering. We've now got desktop computers with enough storage capacity to hold everything a person has written or has ever read, from first grade to grave. We'll be looking for ways to organize these huge attics sometime soon.
- - Re: (Score:2)
    
    by stoolpigeon ( 454276 ) * writes:
    
    I can't find anything to suggest it, but I suspect this group has some tricks to make updates less painful,
    
    or - they are doing it an environment where data gets in via etl (or this streams stuff) and you aren't doing updates -- you are doing bi and reporting to make management's widgets do all kinds of nice things on their dashboards.
    
    i think they are targeting the data warehouse market - not the transactional or general purpose market.
Awesome (Score:2, Interesting)

by Fyre2012 ( 762907 ) writes:

This is totally what we need.

With comodity hardware getting faster and cheaper by the minute, having a system that can handle a higher than average load with optimized software is, imho, a winner.

I'm sure everyone here can add some anecdotal evidence to how they had a heavy-hardware, database serving machine die on them because of some software bug.
This is one of the reasons I've been looking forward to ZFS. Hopefully the DB guru's will take the best of what's good about software, drop the legacy c
But does it save the children? (Score:2)

by StikyPad ( 445176 ) writes:

The promise -- a Linux-based system that handles queries 100 times faster than traditional relational database management systems... ...using the power of oxygen!
Perfect timing (Score:4, Interesting)

by defile ( 1059 ) writes: on Wednesday February 14, 2007 @05:31PM (#18016736) Homepage Journal

Loading a million random records out of a set of one hundred million records is an enormously difficult task for an RDBMS on commodity hardware (e.g. magnetic rotating disks). This is a more common task than you would think. ORM systems backed by an RDBMS, such as Ruby on Rails, Django, Hibernate, have exactly this requirement and will only demand more as these models become more mainstream. Think about what search engines have to do: find millions among billions, all to show a user a dozen.

These problems are solvable now, but there's a lot of duplication of effort going on that a smart database vendor could solve for us.

Share
twitter facebook
- Re: (Score:2)
  
  by symbolic ( 11752 ) writes:
  
  Duplication of effort isn't bad at all....without it, you'll wind up with another Microsoft.
Doesn't "stealthy" require some stealth anymore? (Score:4, Insightful)

by georgewilliamherbert ( 211790 ) writes: on Wednesday February 14, 2007 @05:33PM (#18016774)

Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time.

This is some new Network World definition of "Stealthy", apparently...

Share
twitter facebook
- Re:Doesn't "stealthy" require some stealth anymore (Score:3, Funny)
  
  by drinkypoo ( 153816 ) writes:
  
  Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time. This is some new Network World definition of "Stealthy", apparently...
  
  Network World is a trade rag. To them, anything not advertised is stealthy. Especially since they want to motivate people to think "oh no, I don't want to be stealthy, that means unknown! quick buy some advertising!"
Best of luck (Score:5, Insightful)

by 140Mandak262Jamuna ( 970587 ) writes: on Wednesday February 14, 2007 @05:40PM (#18016836) Journal

I dont want to rain in their parade. But typically whenever people start with a spec like "100 times better than what they can do", they assume they will continue to perform at current levels while these people take years to develop and mature their new technology. In the real world, the traditional methods too improve and unless they can maintain a 100x lead continually the new technology flops.
What happened to Gallium Arsenide replacing silicon? What happened to solid state memory completely repalcing magnetic disks? Technology field is littered with such fiascos.

Share
twitter facebook
- Re: (Score:2)
  
  by georgewilliamherbert ( 211790 ) writes:
  
  Sybase IQ already shows that class of speedups on lots of datasets. Proof of concept is out there...
- Re: (Score:2)
  
  by PCM2 ( 4486 ) writes:
  
  In the real world, the traditional methods too improve and unless they can maintain a 100x lead continually the new technology flops.
  
  This might be the obvious conclusion if Vertica were targeting the mass market and trying to compete directly with Oracle, SQL Server, or DB2, but they are not. TFA says Vertica is targeted at the data warehousing market, which is a very specific application area that can be better served with niche products than with the traditional general-purpose relational RDBMSs. Base
- One size doesn't fit all (Score:2, Interesting)
  
  by perfczar ( 1064296 ) writes:
  
  This is a different kind of issue, really, more like the difference between a CPU and a GPU. At the moment, a good GPU has >100x the performance of a good CPU on a certain class of computations. Column stores will clearly never replace row stores for transaction processing for obvious reasons, but (coupled with a few other architectural decisions) they do exhibit >100x the performance of row stores for the kinds of queries seen in data warehouses.
  
  Also, the two technologies are complementary. The
- Re: (Score:3, Insightful)
  
  by einhverfr ( 238914 ) writes:
  
  For certain applications (particularly BI), I think that 100x speedups are practical, but I would not expect it in general OLTP systems.
  
  Let me give you an example.
  
  Suppose you have a table with, say, 100 billion rows. You want to create a report which provides aggregated data on a very large subset of a few columns of table. With a tradition RDBMS, you have to read through every single one of the 100 billion rows to aggregate the data (indeces don't help if you are going to be searching through a sizeable
- Re: (Score:2)
  
  by Ant P. ( 974313 ) writes:
  
  "100 times better" is perfectly feasible, if your average dataset is 100 (arbritary units), your old algorithm is O(n^2) and the new one is O(n).
Patent Problems (Score:3)

by IflyRC ( 956454 ) writes: on Wednesday February 14, 2007 @05:42PM (#18016860)

Watch...they'll run into patent problems with patents held by Oracle, Sybase, and MS.

Share
twitter facebook
open source? (Score:2)

by oohshiny ( 998054 ) writes:

Where does it say that Vertica is going to be open source?

In any case, if people wonder how they get 100x speedups, it's probably related to Stonebraker's previous company called Streambase [streambase.com].
- never mind (Score:2)
  
  by oohshiny ( 998054 ) writes:
  
  There wasn't much information on the web site, but everything is in Wikipedia (look under C-Store, the BSD-licensed open source version). It really is just a column-oriented database.
Why does a company promising Linux solutions... (Score:3, Interesting)

by WindBourne ( 631190 ) writes: on Wednesday February 14, 2007 @05:50PM (#18016934) Journal

run Windows for their website? [netcraft.com]

Share
twitter facebook
- Re: (Score:3, Interesting)
  
  by Mad Merlin ( 837387 ) writes:
  
  Look again...
  $ curl -I www.vertica.com HTTP/1.1 200 OK Date: Wed, 14 Feb 2007 23:00:26 GMT Server: Apache/1.3.33 (Unix) Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 Expires: Sun, 19 Nov 1978 05:00:00 GMT Pragma: no-cache X-Powered-By: PHP/4.4.4 Set-Cookie: PHPSESSID=488de093f5b89a78277a234e1e9886a6; expires=Sat, 10 Mar 2007 02:33:46 GMT; path=/ Last-Modified: Wed, 14 Feb 2007 23:00:26 GMT Content-Type: text/html; charset=utf-8
  - - Re: (Score:2)
      
      by Bill, Shooter of Bul ( 629286 ) writes:
      
      apparently, they read slashdot.
Speculation (Score:5, Informative)

by cartman ( 18204 ) writes: on Wednesday February 14, 2007 @05:55PM (#18016998)

I noticed that Stonebraker is the company founder. Stonebraker has contributed extensively to database research over the years.
He's known for advocating the "shared-nothing" approach to parallel databases. The shared-nothing approach means that nodes in the parallel database don't attempt memory or cache synchronization, and each node has its own commodity disk array. In a shared-nothing parallel database, the data is "partitioned" across servers. So, for example, rows with id's 1-10 would be on the first server, 11-20 on the second server, etc. Executing the SQL query "select * from table where id < 1000" would send requests to multiple commodity servers and then aggregate the results. The optimizer is modified to take into account network bandwidth and latency, etc.
My guess on what they're doing: they're working on a shared-nothing parallel RDBMS with an in-memory client similar to Oracle TimesTen.
The are a few drawbacks to the shared-nothing approach: 1) the RDBMS software is more difficult to implement; 2) since the data is partitioned, any transaction that updates tuples on more than one database node requires a two-phase distributed commit, which is much more expensive; and 3) some queries are more expensive because they require transmitting large amounts of data over the network rather than a memory bus, and in rare cases that network overhead cannot be eliminated by the optimizer.
The advantage, of course, is linear scalability by adding commodity hardware. No more need for $3M+ boxes.

Share
twitter facebook
- - Re: (Score:3, Informative)
    
    by cartman ( 18204 ) writes:
    
    Gee, I don't know anyone who's been succuessfully doing this for years...
    I'm certainly not suggesting these guys are the first to implement a shared-nothing parallel RDBMS. IBM has offered DB2 parallel edition which is shared-nothing for some time now. However IBM wants a ton of money for parallel edition, and DB2 has some legacy stuff which might not be useful in a shared-nothing architecture. An open-source shared-nothing RDBMS might be compelling.
    I think the shared-nothing approach is the best one fo
I've been waiting for something like this ... (Score:3, Insightful)

by Qbertino ( 265505 ) writes: <moiraNO@SPAMmodparlor.com> on Wednesday February 14, 2007 @06:01PM (#18017058)

... for a long time.
Classic RDBMSes are crutches. A forced-upon neccesitiy we have to put up with for our app models to latch on to real world hardware and it's limitations. A historically grown mess with an overhead so huge it's insane. With a Database PL and 30+ dialects of it from back in the days when we flew to the moon using a slide-ruler as primary means of calculation.
If what they claim is true, these guys are probably finally ditching the omnipresent redundant n-fold layers user and connection management in favour of a lean system that at last does away with the distinction of filesystem and database and data access layer. Imagine a persistance layer with no SQL, no extra user management, no extra connection layer, no filesystem under it and native object suport for any PL you wish to compile in.
I tell you, finally ditching classic RDBMSes is *long* overdue, they're basically all the same ancient pile of rubble, from MySQL up to Oracle. If these guys are up to taking on this deed (or part of it) and they get finished when solid-state finally relieves our current super-slowpoking spinning metal disks on a broad scale we'll feel like being in heaven compared to the shit we still have to put up with today.
I wish these guys all the best. They appear to have the skills to do it and the authority to emphasise that todays RDBMSes and their underlying concepts are a relic of the past.
My 2 cents.

Share
twitter facebook
- Re: (Score:2)
  
  by LizardKing ( 5245 ) writes:
  
  Imagine a persistance layer with no SQL, no extra user management, no extra connection layer, no filesystem under it and native object suport for any PL you wish to compile in.
  
  I worked on just such a system, and ended up replacing it with a straightforward RDBMS. The object persistence layer serialised to disk, which offered no benefits over using an RDBMS as the backend data store (which had been in the original design oddly enough). It had to keep everything in memory - which proved impossible when th
- - Re: (Score:2)
    
    by Qbertino ( 265505 ) writes:
    
    You know, that last line of yours was really convincing. Solid arguements, I must say.
Given that... (Score:5, Informative)

by CodeShark ( 17400 ) writes: <ellsworthpc@yahPARISoo.com minus city> on Wednesday February 14, 2007 @06:26PM (#18017312) Homepage

MonetDb, [monetdb.cwi.nl] is similarly configured as a column oriented AND Open source, and appears to clean the clock of most of the major commercial and Open Source databases for huge data set queries, (see the benchmarks at axyana.com [axyana.com] for an example), where is Vertica's market advantage supposed to be?

By which I am asking that while Vertica is obviously well-researched and well funded as a start up, MonetDB is well-researched, already benchmarked and available now.. So why would I wait to invest my time, energy, and $$ in a proprietary future product rather than the time and energy, etc. to develop market leadership in my chosen corporate area in the present?

Share
twitter facebook
- Re:Given that... (Score:5, Informative)
  
  by perfczar ( 1064296 ) writes: on Wednesday February 14, 2007 @07:46PM (#18018116)
  
  Here are a few of the technical reasons one might choose Vertica over Monet; I'll not get into business issues.
  
  Vertica is designed for large amounts of data, and is optimized for disk based systems. Monet does benchmarks against TPC-H Scale Factor 5 (30 million records, an amount which would fit in main memory) running on Postgres; Vertica does TPC-H Scale factor 1000 (6 billion records) against commercial row stores tuned by people who do such work to make a living.
  
  Vertica runs on multi-node clusters, allowing the cluster to grow as the amount of data grows, while Monet doesn't scale to multiple machines.
  
  There are numerous differences in the transaction systems, update architecure, tolerance of hardware failure, and so on, that make Vertica better suited to the enterprise DW market.
  
  Note: I work for Vertica
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by CodeShark ( 17400 ) writes:
    
    Thanks for the perspective. That's what I like about /. I can put out a thought or question and mostly get good information back relatively quickly.
Comprable? (Score:2)

by Pinback ( 80041 ) writes:

I wonder how this compares to http://en.wikipedia.org/wiki/Netezza [wikipedia.org]Netezza.
Google uses this approach (Score:3, Informative)

by russryan ( 981552 ) writes: on Wednesday February 14, 2007 @07:09PM (#18017774)

See http://en.wikipedia.org/wiki/Bigtable [wikipedia.org] for a description of Google's column oriented database.

Share
twitter facebook
- Re: (Score:3, Informative)
  
  by ramakant ( 256472 ) writes:
  
  Here's a good comparison of the two approaches:
  http://glinden.blogspot.com/2006/05/c-store-and-go ogle-bigtable.html [blogspot.com]
  (per my post below, Vertica is a commercial version of MIT C-Store: http://db.lcs.mit.edu/projects/cstore/ [mit.edu] )
More Scalability (Score:2)

by Doc Ruby ( 173196 ) writes:

How about a database with the exact same query API (not just "but it's all SQL") as, say, Oracle or MS-SQL, or even Postgres, that allows any number of parallel query servers to work against a single datastore?

In other words, instead of yet another incompatible database, how about one that we could just switch to from an existing one, that is arbitrarily scalable against shared data. If you're going to get clever and act like you can solve hard problems, why not give people what we need, and not just what y
- Re: (Score:2)
  
  by PCM2 ( 4486 ) writes:
  
  How about a database with the exact same query API (not just "but it's all SQL") as, say, Oracle or MS-SQL, or even Postgres, that allows any number of parallel query servers to work against a single datastore?
  
  What would be the purpose of that? Performance gains? I/O is going to be your bottleneck there, and it sounds like it would start to clog up sooner, rather than later.
  In other words, instead of yet another incompatible database, how about one that we could just switch to from an existing one, th
  - Re: (Score:2)
    
    by Doc Ruby ( 173196 ) writes:
    
    IO is the bottleneck anyway. The scheme I mentioned reduces the bottlenecks to that single one. And it allows arbitrary scaling with minimal (if any) recoding, just by adding HW.
    
    If you're going to get snotty and dismissive, why not recognize that the needs of the market, easily/cheaply scalable databases without complex planning in application design, are more important than what this team happens to think it can do better, and don't need a vendor white paper to make clear in a few sentences?
This is a commercial version of MIT C-Store (Score:4, Informative)

by ramakant ( 256472 ) writes: on Wednesday February 14, 2007 @07:37PM (#18018052)

This looks like it will be a commercial version of the Michael Stonebraker and MIT developed C-Store column-oriented:
- Web site: http://db.lcs.mit.edu/projects/cstore/ [mit.edu]
- Wikipedia Entry: http://en.wikipedia.org/wiki/C-Store [wikipedia.org]
They distribute the source with a fairly liberal license, so this looks like something the open source community could pick up and run with.

Share
twitter facebook
An issue with column orientation (Score:2, Informative)

by jfroelich ( 1022159 ) writes:

Is that you do not scale as well to a large number of columns. To access a set of X records with 100 columns, you have 100 asynchronous I/O calls to the separate column stores. I sell an analytical software that does just this, and it is not a technical something that should just be ignored. In some regards the single file row oriented system has less I/O overhead. We have come up with some ways to reduce the file system overhead, but while it is small, it is noticeable, more so on systems not designed to h
- Should scale better (Score:2)
  
  by Jayson ( 2343 ) writes:
  
  Column-oriented DBs should scale better with more columns if the the query doesn't access all the columns (which they rarely do). The DB only needs to keep the columns in memory that are being accessed. This is far better than a row-oriented DB that needs to cycle though the entire table or numerous indexes to get a result set.
Stealthy? (Score:2, Funny)

by plasmacutter ( 901737 ) writes:

it's on the front page of slashdot.. how stealthy can it be?
Stupid question: Still SQL? (Score:3, Interesting)

by WoTG ( 610710 ) writes: on Wednesday February 14, 2007 @11:54PM (#18020050) Homepage Journal

I've never heard of column based databases prior to this article. Would I be correct in assuming that you still can work with these using regular SQL?

Share
twitter facebook
- SQL would inefficient (Score:3, Informative)
  
  by Jayson ( 2343 ) writes:
  
  One of the benefits of column oriented DBs is that tables have an ordering, and that ordering can be exploited in queries. SQL doesn't give a good way to exploit it. Column DBs do allows SQL, but they also have other native languages that people tend to use.
- Re: (Score:2)
  
  by sonofagunn ( 659927 ) writes:
  
  Yes, still SQL. Column oriented DBs are meant to optimize SQL reads where you only are using a few columns in your SQL, but the tables have many columns. This doesn't change anything about SQL.
- Re: (Score:3, Funny)
  
  by varmittang ( 849469 ) writes:
  
  V
- Re: (Score:3, Funny)
  
  by Aqua_boy17 ( 962670 ) writes:
  
  Yeah, but what does its radar signature look like?
  Probably, a flock of seagulls.
- Re: (Score:2)
  
  by misleb ( 129952 ) writes:
  
  I think in this case "stealthy" means that nobody has really heard of them before and nobody seems to care.
  
  -matthew
  - Re: (Score:2)
    
    by ShieldW0lf ( 601553 ) writes:
    
    They've got 23 million in funding... apparently someone cares...
    - Re: (Score:2)
      
      by misleb ( 129952 ) writes:
      
      But it is monopoly money!
      - Re: (Score:3, Funny)
        
        by Gospodin ( 547743 ) writes:
        
        Microsoft is backing them?
- Re: (Score:2, Funny)
  
  by eclectro ( 227083 ) writes:
  
  Yeah, but what does its radar signature look like?
  
  It's not bad, but the new startup synergistica that I'm working on is gonna be completely invisible.
- Re: (Score:2)
  
  by nuzak ( 959558 ) writes:
  
  It's not a breakthrough, it's simply a vertical database design, and it will accellerate SOME kinds of queries, and not do so well on others. It's great for the kind of data mining where you're going to vertically slice the data anyway, not so good for OLAP and decision support where you usually want the whole record at once. You replicate to one of these databases, you don't usually primarily enter data into it -- with trading data being one notable exception. Financial apps love using kx, which is blin
  - KDB on disk (Score:2)
    
    by Jayson ( 2343 ) writes:
    
    KDB actually does its best when pulling from disk. If you can cache a B-tree in memory, there isn't much as much performance increase since disk seeking is your typical enemy in these large datasets. A hash table is even good when everything is in memory, so standard indexes do well. However, when going to disk, KDB can get away with far fewer disk seeks since it is pulling contiguous regions into memory.
    
    It will be interesting to see how Stonebreaker's new DB performs, since there are a number of column DBs
  - - Re: (Score:2)
      
      by nuzak ( 959558 ) writes:
      
      Yikes, I actually meant to say OLTP, since I contrasted it with data warehousing. I just choked on my alphabet soup a little. Thanks :)
- - Re: (Score:2)
    
    by georgewilliamherbert ( 211790 ) writes:
    
    A) Is your benchmark a data warehouse type app benchmark or transactional? Column oriented is slower for transactional typically but much faster for data warehouse. I don't care how many frames per second you measure if I'm buying a LAMP web server system.
    
    B) Your benchmark data doesn't show that you've tried to run Sybase IQ or C-store column-oriented databases against the workload.
    
    Are you really sure that you want to be so sure about this, given that you may not be testing the right thing, and haven't te
  - Re: (Score:3, Insightful)
    
    by Splab ( 574204 ) writes:
    
    Uhm... wtf?
    
    Seriously, you tested MySQL vs. other databases with "out of the box" setups? MySQL isn't a real database when running MyIsam engine, you simply cannot compare that with anything else. And on top of that, try do a proper insertion in MySQL, one single transaction with a few millions of rows and see how well that does. Oh and did you ever stop to think about _why_ MySQL does perform so much faster on that test? Try doing it on a InnoDB table with standard setup, even at 600k rows it slows to a cra
    - you read 40 pages in under 4 mins - you're fast? (Score:2)
      
      by tota ( 139982 ) writes:
      
      > people like you, you have no idea about what you are comparing, just figured that setting up something out of the box will give a good insight into the speed.
      
      I guess you didn't read the first page, or the second?
      
      As stated (multiple times), the purpose of this report is to compare various aspects with "out of the box" performance, with all the caveats that it implies.
      And FYI I will be comparing MySQL InnoDB next time around.
      
      > Ohh and the 100 fold increase in speed is very much likely to happen
      
      > W
      - Re:you read 40 pages in under 4 mins - you're fast (Score:2)
        
        by Splab ( 574204 ) writes:
        
        Right, so you got a table with lets say, 5 million rows spanning perhaps 2 GB of data. Now if you want to find out how many of the rows contain a specific value, lets say foo_bar is below 50, with vertical representation you need to scan your entire dataset, that is 2 GB. With horizontal representation you only need to dig through 5 million entries, let's assume it's 32bit integers and you are down to 20MB of data. Of course you can fix a bit of this by using an index covering just that column, but on very
        
        thanks for proving my point - more examples below (Score:2)
        
        by tota ( 139982 ) writes:
        
        So under some very specific circumstances, when compared with a table that is not indexed, horizontal is going to be much faster (since that's what it it is designed to do...). Well that's no surprise!
        All I am saying is that claiming that the performance is going to be 100 times faster is not a good measurement. Every database vendor will find a scenario that suits their engine and proves unequivocally that they are the best - but they can't all be right, now can they?
        This is exactly the kind of thing t
- Re: (Score:3, Funny)
  
  by bob.appleyard ( 1030756 ) writes:
  
  You're 100 times faster than anyone else, obviously.
- Re: (Score:3, Informative)
  
  by perfczar ( 1064296 ) writes:
  
  Vertica is not open source. Not sure where the confusion came from.
  
  Note: I work for Vertica.
- Re:Sounds great but.. (Score:4, Informative)
  
  by perfczar ( 1064296 ) writes: on Wednesday February 14, 2007 @07:10PM (#18017784)
  The Vertica business model is to sell a database engine (software to store and query data). Clearly use of standard interfaces is important, otherwise nobody would be able to make use of the product (which really ends up being a component of a larger system or strategy) without going to a heap of trouble. So of course Vertica has:
  
  A JDBC driver
  
  An ODBC driver
  
  An interactive SQL client
  
  A growing list of tested integrations with other software
  
  Note: I work for Vertica
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by georgewilliamherbert ( 211790 ) writes:
  
  It takes balls to say things like that about Michael Stonebraker in the database field... ...and lack of brains or historical clue...

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Partners (Score:5, Informative)

Re:Partners (Score:5, Insightful)

Re: (Score:2)

Column oriented databases (Score:2, Interesting)

Re:Column oriented databases (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

When Will This Be Ported? (Score:4, Funny)

Re: (Score:3, Funny)

Re: (Score:2)

Everyone, we are moving to ASP now (Score:4, Funny)

You're bound to get some strange looks... (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

buzzword enabled (Score:4, Insightful)

Re:buzzword enabled (Score:5, Informative)

Re:buzzword enabled (Score:5, Funny)

Re: (Score:2)

Re:buzzword enabled (Score:5, Informative)

Re: (Score:2)

Re:buzzword enabled (Score:5, Informative)

Big claims are backed (Score:4, Informative)

welcome to 1994 (Score:2)

Re:buzzword enabled (Score:4, Insightful)

Re: (Score:2)

Column oriented? (Score:2)

Re: (Score:3, Insightful)

Re: (Score:3, Interesting)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:2)

Re: (Score:2)

Re:Column oriented? (Score:5, Informative)

Re: (Score:3, Insightful)

Re:Column oriented? (Score:5, Insightful)

Re: (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Awesome (Score:2, Interesting)

But does it save the children? (Score:2)

Perfect timing (Score:4, Interesting)

Re: (Score:2)

Doesn't "stealthy" require some stealth anymore? (Score:4, Insightful)

Re:Doesn't "stealthy" require some stealth anymore (Score:3, Funny)

Best of luck (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

One size doesn't fit all (Score:2, Interesting)

Re: (Score:3, Insightful)

Re: (Score:2)

Patent Problems (Score:3)

open source? (Score:2)

never mind (Score:2)

Why does a company promising Linux solutions... (Score:3, Interesting)

Re: (Score:3, Interesting)

Re: (Score:2)

Speculation (Score:5, Informative)

Re: (Score:3, Informative)

I've been waiting for something like this ... (Score:3, Insightful)

Re: (Score:2)

Re: (Score:2)

Given that... (Score:5, Informative)

Re:Given that... (Score:5, Informative)

Re: (Score:2)

Comprable? (Score:2)

Google uses this approach (Score:3, Informative)

Re: (Score:3, Informative)

More Scalability (Score:2)

Re: (Score:2)

Re: (Score:2)

This is a commercial version of MIT C-Store (Score:4, Informative)

An issue with column orientation (Score:2, Informative)

Should scale better (Score:2)

Stealthy? (Score:2, Funny)

Stupid question: Still SQL? (Score:3, Interesting)

SQL would inefficient (Score:3, Informative)