Are Relational Databases Obsolete?
jpkunst sends us to Computerworld for a look at Michael Stonebraker's opinion that RDBMSs "should be considered legacy technology." Computerworld adds some background and analysis to Stonebraker's comments, which appear in a new blog, The Database Column. Stonebraker co-created Ingres as a researcher at UC Berkeley in the early 1970s, and Postgres there in the 1980s. He predicts that "column stores will take over the [data] warehouse market over time, completely displacing row stores."
They're not mutually exclusive. (Score:5, Insightful)
Okay, at the risk of sounding stupid...
Since when are a column-store database and a relational database mutually exclusive concepts? I thought that both column stores and row stores (i.e. traditional databases) were just different means of storing data, and had nothing to do with whether a database was relational or not. I think the article misinterpreted what he said.
Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
Re:They're not mutually exclusive. (Score:5, Interesting)
Agreed. It definitely looks like a storage preference. Though column-based storage has definite benefits over row-based when it comes to store once, read many operations. Kinda like what you'd find in a data warehouse situation...
Also, I don't think it's news that Michael Stonebraker (a great name, by the way), co-founder and CEO of a company that (surprise!) happens to develop column store database software, thinks that column store databases are going to be the Next Big Thing. Right or wrong, his opinion can't exactly be considered unbiased...
Hrm.. You must be new here....
Yea, it's all the same. (Score:5, Insightful)
Therefore, pick your method depending on your needs. Are you constantly writing new data? Column stores are probably not for you... Your application will run better on a row store, because writing to a row store is a simple matter of appending one more record to the file, whereas writing to a column store is often a matter of writing a record to many files... obviously more costly.
On the other hand, are you dealing with a relatively static dataset, where you have far more reads than writes? Then a row store isn't the best bet, and you should try a column store. A query on a row store has to read entire rows, which means you'll often end up hitting fields you don't give a damn about while looking for the specific fields you want to return. With column stores, you can ignore any columns that aren't referenced in your query... Additionally, your data is homogeneous within a column store, so you lose the overhead of dealing with different datatypes and can choose the best data compression by field rather than by data block.
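To make the trade-off concrete, here's a toy sketch in Perl (made-up data; not any particular engine's on-disk layout):

use strict;
use warnings;
use List::Util qw(sum);

# Row store: one structure per record; appending a record is one push.
my @rows = (
    { id => 1, name => 'Alice', salary => 40_000 },
    { id => 2, name => 'Bob',   salary => 50_000 },
);
push @rows, { id => 3, name => 'Carol', salary => 60_000 };  # cheap write

# Column store: one array per field; the same insert touches every column.
my %cols = (
    id     => [ 1, 2 ],
    name   => [ 'Alice', 'Bob' ],
    salary => [ 40_000, 50_000 ],
);
push @{ $cols{id} },     3;
push @{ $cols{name} },   'Carol';
push @{ $cols{salary} }, 60_000;

# A single-field aggregate drags whole rows through memory in a row store...
my $row_total = sum map { $_->{salary} } @rows;

# ...but reads exactly one homogeneous array in a column store.
my $col_total = sum @{ $cols{salary} };
print "$row_total == $col_total\n";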
Why do people insist that one size really does fit all?
Re:Yea, it's all the same. (Score:5, Interesting)
-theGreater
Re:Yea, it's all the same. (Score:5, Interesting)
Re: (Score:3, Interesting)
Naturally, you don't want to delete that person because then you lose lots of important archival data. So y
Re: (Score:3, Informative)
The easiest way to deal with proliferating events is to create a very simple table that has a timestamp, your basic audit information (user who made the change, change the terminal was made from, etc), and the change itself.
So say Bob makes 50,000 dollars. This entry was put in the table when he was hired and contains Bob's employee re
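A minimal sketch of that audit-trail idea, assuming DBD::SQLite is available (table and column names are made up):

use strict;
use warnings;
use DBI;

# One row per change event; the current value is just the latest row.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do(q{
    CREATE TABLE salary_audit (
        changed_at  TEXT    NOT NULL,  -- timestamp of the change
        changed_by  TEXT    NOT NULL,  -- user who made the change
        terminal    TEXT,              -- where the change was made from
        employee_id INTEGER NOT NULL,  -- whose record changed
        new_salary  INTEGER NOT NULL   -- the change itself
    )
});

my $ins = $dbh->prepare(q{
    INSERT INTO salary_audit VALUES (datetime('now'), ?, ?, ?, ?)
});
$ins->execute('hr_admin', 'tty1', 42, 50_000);  # Bob is hired at 50,000
$ins->execute('hr_admin', 'tty2', 42, 55_000);  # later, a raise

my ($current) = $dbh->selectrow_array(q{
    SELECT new_salary FROM salary_audit
    WHERE employee_id = ?
    ORDER BY changed_at DESC, rowid DESC LIMIT 1
}, undef, 42);
print "current salary: $current\n";  # 55000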
Re: (Score:3, Insightful)
There are tried and true approaches to the problems you describe; several actually for most of them, depending on your needs.
If you run into these problems due to an evolutionary growth into these features, then it's time to stop, take a step back, and re-architect your schemas to handle these needs from the get-go.
There's no reason at all to resort to hacks like stored procedures and triggers. These a
Re:Yea, it's all the same. (Score:5, Funny)
Cell-based storage!!! Best of both worlds!!! Mix of both Row and Column based storage, how can we go wrong!
Just think about it, what could be better than one file for each column in each row?
And they said I couldn't have my cake and eat it too, sheesh
Re:Yea, it's all the same. (Score:5, Interesting)
You are years late. The PICK operating system/DB already does that. Back in 1985 I used the DOS-based Advanced Revelation to write GAP accounting packages. It used the ASCII 253 character to separate "columns" of data in a cell. Reading and writing were parsed automatically. Invoice information, for example, was stored in a customer's info table, not in an invoice table, and doing a query on accounts receivable produced very fast results. Symbolic dictionary definitions containing formulas allowed for easy column and row totals.
In fact KDB/K looks a lot like a PICK system that uses FORTH as the language.
Re: (Score:3, Informative)
I actually wonder if some of the current databases such as Microsoft SQL Server, etc. aren't going to actually start morphing into these older styles of databases due t
Re: (Score:3, Informative)
Re:Yea, it's all the same. (Score:5, Insightful)
I went back and read the original article. To Michael Stonebraker's credit, the Computerworld article (and the submitter) grossly misrepresent what he said.
He did not say that RDBMSes are "long in the tooth." He said that the technology underlying them hasn't changed since the 1970s, and that column stores are a better way to represent data in certain situations. In fact, the very name of his original column was "One Size Fits All - A Concept Whose Time Has Come and Gone."
Are Relational Databases Obsolete? Not at all (Score:4, Informative)
The relational concept will still exist regardless of the underlying storage methods.
Re: (Score:3, Interesting)
The modern RDBMS is good when the pipe from client to server is much smaller than the pipe from server to backing store/cache. Minimal communication for maximum results. The trade-off, of course, is that the server needs lots of resources because it's doing significant work on behalf of every client.
"Non-relational" databases still have their place today, howe
Re:Yea, it's all the same. (Score:5, Funny)
Oh, the horror! That's a heinous crime on Slashdot! Not even the editors do that!!!
Index on every column, how revolutionary! (Score:3, Insightful)
That's why row-oriented databases have indexes and perform index scans.
Perl Objects have both column and row DB advantage (Score:5, Interesting)
Instead of blessed hashes, one can use blessed scalars holding a single integer value for instances, and let class variables contain all the instance data in arrays indexed by the instance's scalar value.
This technique was originally promoted as an indirection to protect object data from direct manipulation that bypassed get/set methods. But it also allows the object to be either row- or column-oriented internally. That is, the class could store all the instance hashes in an array indexed by the scalar (row-oriented), or it could store each instance variable in a separate array indexed by the scalar value (column-oriented).
Thus the perl class can, on-the-fly, switch itself from column-oriented to row-oriented as needed while maintaining the same external interface.
Of course this is not a Perl-exclusive feature, and it can be implemented in other languages. It just happens to be particularly easy and natural to do in Perl.
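A minimal sketch of the column-oriented flavor (hypothetical class, just to show the mechanics):

use strict;
use warnings;

package Employee;

# Each attribute lives in its own class-level array; the object itself is
# a blessed scalar holding nothing but an index into those arrays.
my (@name, @salary);
my $next_id = 0;

sub new {
    my ($class, %args) = @_;
    my $id = $next_id++;
    $name[$id]   = $args{name};
    $salary[$id] = $args{salary};
    return bless \$id, $class;
}

sub name   { my $self = shift; return $name[$$self] }
sub salary { my $self = shift; return $salary[$$self] }

# Column-style aggregate: touches only the one array it needs.
sub average_salary {
    return 0 unless @salary;
    my $total = 0;
    $total += $_ for @salary;
    return $total / @salary;
}

package main;

my $bob = Employee->new(name => 'Bob', salary => 50_000);
print $bob->name, ' earns ', $bob->salary, "\n";
print 'average: ', Employee::average_salary(), "\n";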
Re:Perl Objects have both column and row DB advant (Score:3, Insightful)
Re: (Score:2)
-theGreater.
Re:They're not mutually exclusive. (Score:5, Interesting)
Re: (Score:2)
1st Q: "Can you run MDX queries against the Vertica DB?" A: "No, we don't support MDX queries."
The conversat
Re:They're not mutually exclusive. (Score:5, Funny)
Re:They're not mutually exclusive. (Score:5, Interesting)
Rule of thumb:
- you use row dbs for OLTP. They're great for writing.
- you use column dbs for data mining. They're amazing for reading aggregates (average, max, complex queries...)
The major problem with column DBs is the writing part. If you have to write one row at a time, you're screwed, because the engine has to take each column file, read it, insert into it, and store it back. If you can write in batch, the whole process isn't much more expensive. So writing a single row could take 500ms, but writing 1000 rows will take 600ms.
Once the data's in, column dbs are the way to go.
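A toy sketch of why batching amortizes the cost, if you picture one append-only file per column (roughly the layout being described; file names made up):

use strict;
use warnings;

# Each column lives in its own append-only file, so a single-row insert
# pays the open/append cost once per column. Buffering rows and flushing
# them together pays that cost once per column per *batch*.
my @columns = qw(id name salary);
my @buffer;

sub insert_row { push @buffer, shift }   # cheap: just queue the row

sub flush_batch {
    return unless @buffer;
    for my $col (@columns) {
        open my $fh, '>>', "$col.col" or die "open $col.col: $!";
        print {$fh} $_->{$col}, "\n" for @buffer;   # one append per column
        close $fh;
    }
    @buffer = ();
}

insert_row({ id => $_, name => "emp$_", salary => 40_000 + $_ }) for 1 .. 1000;
flush_batch();   # 3 file appends in total, instead of 3000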
Re: (Score:3, Interesting)
Re: (Score:2)
Okay, at the risk of sounding stupid...
Since when are a column-store database and a relational database mutually exclusive concepts?
It doesn't. The original blog is about row-oriented DBMSes vs. column-oriented DBMSes, and the author of the article (or his know-it-all editor) confused himself enough to believe somebody had abbreviated that as RDBMS, which of course means relational DBMS. That the submitter probably didn't read the Wikipedia article he linked to didn't help either.
Stonebraker's current track record (Score:5, Insightful)
Re:Stonebraker's current track record (Score:5, Informative)
Later, Stonebraker's work on Postgres (theory AND code) was about how to handle different datatypes within databases. He took an OO approach to that. That was directly used in Illustra and then went on to Informix. More importantly, Oracle used a lot of that work to create 8i, as have other DBs. IOW, he IS a leading theorist AND knows the code.
Considering that he has been on top of all the major advances within the DB world, why would you discount what the man says? As it is, you mention Gray and Mohan, who both did some good work at IBM, but have not really advanced DBs forward that much. They simply moved the relational-model DB forward (basically, they were red herrings). But Stonebraker is working across ALL the spectrums and contributes heavily to new models. His work is everywhere.
Finally, think about what he says. The column-major layout is more useful for data warehousing BECAUSE it allows data to be compressed faster and tighter (which makes sense), AND allows you to work with just the data that you need. In a row-major layout, you will end up creating and maintaining indexes to increase the speed of reads. But an index is for the most part a single column (or just a few), which basically makes it column-major. Yet this requires LOADS of CPU and space to maintain. The column-major approach simply keeps the indexes, if you will, and discards the rows. This allows for FAST operations if you are doing LOADS of reads and few changes. That is PERFECT for data warehousing.
So armed with that knowledge, exactly WHY would you discount his work and his statements?
Re:They're not mutually exclusive. (Score:5, Insightful)
An opinion is subjective, but it's not necessarily biased. A disinterested observer could have an unbiased opinion.
C'mon, the guy is biased! (Score:5, Funny)
Re:C'mon, the guy is biased! (Score:5, Funny)
Mod Article -1 (Author doesn't get it) (Score:5, Informative)
Re: (Score:2)
dual-mode db? (Score:5, Interesting)
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:3, Informative)
The FA threw me for a loop a couple of times, I honestly _did_ try to read it :) Correct me if I'm incorrect, but wouldn't having a service for column stores be (usually) not needed on most Unix-like platforms? Since this is mostly reading, I would think such efforts might be better spent on sqlite (or similar)?
If you're in a situation where you're most
Re: (Score:2)
Re: (Score:3, Funny)
Re: (Score:2)
No, but you can make views and stored procedures that do the trick; that makes it look like it, aids in programming, and can save time.
Re: (Score:2)
Actually, yes, it is a SQL 2005 thing, although there was a way to do it in SQL 2000 called data cubes, from my understanding. You end up having multiple data files, just like you would in an Oracle situation. It's easier to explain in Oracle terms, as you'd just create a tablespace for column-based tables and a tablespace for row-based tables. Then away you go, both storing files as you see fit.
I guess in SQL 2005 terms you'd be creating another database on the same server and just use server linking to get your
Re: (Score:3, Informative)
If you are doing killer aggregates (tell me the sum of the sales in every month for the last 25 years), you are going to be limited by possibly 2 things: CPU cycles and disk I/O throughput.
There are several ways of addressing these issues. Basically this means either optimizing or parallelizing. Column-oriented stores are likely to help optimize the disk I/O throughput, so you can then throw more processor effort at the problem.
You can also do what Teradata and BizgressMPP do wh
Re: (Score:3, Interesting)
RLE on the data columns is a pretty big win for column-based stores, too. If the slaves manage RLE during a replication, you could have one hell of a DB farm.
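For the curious, run-length encoding a sorted, low-cardinality column is about this simple (toy sketch, made-up data):

use strict;
use warnings;

# Collapse a column into (value, count) runs. Sorted columns with few
# distinct values (state codes, status flags) compress dramatically.
sub rle_encode {
    my @runs;
    for my $v (@_) {
        if (@runs && $runs[-1][0] eq $v) { $runs[-1][1]++ }
        else                             { push @runs, [ $v, 1 ] }
    }
    return @runs;
}

my @state = (('CA') x 4, ('NY') x 3, ('TX') x 2);
printf "%s x %d\n", @$_ for rle_encode(@state);   # 9 values -> 3 runs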
well (Score:5, Informative)
Re: (Score:2)
Wait a minute!!! Are you suggesting that the submitter actually reads the article before submitting it? Blasphemer!
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Both have their pros and cons; now stop trying to take a stupid joke and make it into your personal database soapbox.
Re: (Score:2)
Well, it stores my data and meets my performance requirements. Is there something else I need it to do, given that I already own a dish washing machine?
Re: (Score:2)
Yes. You need to take your dish washing machine out to dinner occasionally, and buy her flowers. And tell her she looks beautiful.
Rotate (Score:5, Funny)
>"column stores will take over the [data] warehouse market over time, completely displacing row stores."
Hmmmm. So if I rotate my Paradox or Excel table by 90 degrees, I have achieved database coolness? Who knew it was so easy.
Re: (Score:2)
Re:Rotate (Score:4, Informative)
Re:Rotate (Score:5, Insightful)
Re: (Score:2, Funny)
The guy... (Score:5, Interesting)
Stonebraker has been pushing the concept of column-oriented databases for quite some time now, trying to get someone, ANYONE, to listen that it's superior. While I think he has a point, I'm not sure if he really goes far enough. Our relational databases of today are heavily based on the ISAM files of yesteryear. Far too many products threw foreign keys on top of a collection of ISAMs and called it a day. Which is why we STILL have key integrity issues to this day.
It would be nice if we could take a step back and re-engineer our databases with more modern technology in mind. e.g. Instead of passing around abstract id numbers, it would be nice if we had reference objects that abstracted programmers away from the temptation of manually managing identifiers. Data storage is another area that can be improved, with Object Databases (really just fancy relational databases with their own access methods) showing how it's possible to store something more complex than integers and varchars.
The demands on our DBMSes are only going to grow. So there's something to be said for going back and reengineering things. If column-oriented databases are the answer, my opinion is that they're only PART of the answer. Take the redesign to its logical conclusion. Let's see databases that truly store any data, and enforce the integrity of their sets.
Re: (Score:2)
I read the blurb and thought "Haven't we had the same 'debate' over the same guy a bunch of times before?" The name stuck in my head as I always envision the former Notre Dame linebacker [cstv.com] and his famously low GPA turning to a career in database architecture.
Re: (Score:3, Insightful)
An interesting idea for improving database technology is to actually change the way that database data is mirrored in a disk array. Rather than writing EXACT duplicates of the data, perhaps one set could be written in row-oriented form, while the other set would be written in column-oriented form. This guarantees that the data
Differnt Solutions (Score:2)
Nope! Apparently there is a new method and thus it must be the real One Twue Way.
IMS--Hierarchical DB Still Exists (Score:5, Insightful)
Re:IMS--Hierarchical DB harder to use? (Score:5, Interesting)
From a standard 3rd-generation programming language one can read and write into flat files, and we can do close to this with a hierarchical database.
We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.
What this means is that every transaction to and from the database must go through a literally horrible re-mapping, i.e. the language data structures do not correspond to the RDBMS data structures and vice versa.
As an example: the last time I looked at PostgreSQL, I tried writing a simple row into a table where there were something like 100 columns in the row...
In the 3rd generation programming languages this was just a simple structure with 100 entries.
The data transfer from that structure generated a function call with more than 1000 parameters. This was mapped and re-mapped with each call to transfer data, even though the structure itself is static and determined at compile time.
Next: There were about 10 parameters per field (column).
1: Column name
2: Column name length
3: data type
4: data length
5: character representation
finally 10: Address where the data lives.
The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.
Noooo... It was dynamic, and the data was encoded as parameters on the stack. This means the stack must be built, torn down, and rebuilt for each call.
Next: the implementation was so bad that the program would run in test mode with only a few parameters, but it failed when the whole row was to be transferred.
I gave up on that interface.
---------------
Oracle had pre-compilers. They did the same damn thing. The code generated by the pre-compilers was just awful.
---------------
While there is much good to say about RDBMSs in general, the issue I ran into was that the interface from 3rd-generation languages took a HUGE step backward. IMHO we should have high-level language statements called DBRead() and DBWrite(). In C these should generally correspond to fread() and fwrite(). If this is too complex, then DBWriteStruct() could be implemented with suitable mapping helper functions.
Nooo...
In the old days one could read and write into a flat file at a given location with a single statement or function call depending on the language. Of course "where" to read and write became a real issue and I do fully understand the complexity of file based tree structures and so forth, especially since I wrote a lot of code to implement these algorithms.
The thing is now we have RDBMS and other solutions that give us the data organisational abilities we need - and we lose the ease of mapping these structures into a suitable structure or object in the programming language.
I for one do not think we have stepped forward very far at all.
-------------
I'll toss in a case in point made by a good buddy of mine who just happens to be one of the top geophysical programmers in this city.
One of his clients was running an application hooked to an Oracle database on a fast Sun. Run times were close to a day.
Finally they removed the Oracle interface and replaced it with a glorified flat file. They clearly built in some indexing. The result was that the run times dropped to under 20 minutes.
As my buddy says - He will NOT use any RDBMS. He can take 5 views of the data comprising 1000's of seismic lines and the user can click on any trace number, line number, well tie and so forth and in real time he can modify all views of the data on as many as 5 s
Re: (Score:3, Informative)
I think a place to start is to ask how to map language structures and RDBMS structures into a common denominator. One should never be looking at function calls with over 1000 parameters. That is just plain stupid. One should also never be dynamically mapping each and every tidbit of every field in a row on the fly at run time and especially so for each row in a table.
Quite right, which is why programmers who still have their sanity use JDBC or DBI. This part of your problem has already been solved at least twice.
Here's how I write out a customer record:
$dbh->do(
    'insert into customer (id,name,yada1,yada2,yada3) values (?,?,?,?,?)',
    undef,
    @customer{qw(id name yada1 yada2 yada3)},   # hash slice supplies the bind values in order
);
I think that's even easier than your 3rd-gen code, and I didn't have to write my own indexing code.
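And if you want the mapping built once and reused, DBI's prepare/execute split does exactly that (a sketch with made-up data and a trimmed column list, assuming DBD::SQLite):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('create table customer (id int, name text, yada1 text)');

# Prepared once: the statement and its parameter mapping are built a
# single time at run time, then reused for every row.
my $sth = $dbh->prepare('insert into customer (id,name,yada1) values (?,?,?)');

my @customers = (
    { id => 1, name => 'Alice', yada1 => 'x' },
    { id => 2, name => 'Bob',   yada1 => 'y' },
);
$sth->execute( @{$_}{qw(id name yada1)} ) for @customers;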
Re: (Score:3, Insightful)
Re:IMS--Hierarchical DB harder to use? (Score:4, Interesting)
On the contrary.
From a standard 3rd-generation programming language one can read and write into flat files, and we can do close to this with a hierarchical database.
I think there is a key distinction here: application object store vs. data management. Hierarchical DBs are far better at storing object information, but *far* worse at real data management.
We lose this with relational databases because the way the database organises data has no direct mapping to the way it might be set up in a standard programming language.
What this means is that every transaction to and from the database must go through a literally horrible re-mapping, i.e. the language data structures do not correspond to the RDBMS data structures and vice versa.
In LedgerSMB, we solved this by putting a functional interface in the db. Then we dynamically map the objects and their properties into functions and arguments. Works great :-)
As an example: the last time I looked at PostgreSQL, I tried writing a simple row into a table where there were something like 100 columns in the row...
You are either trolling or you need to fire the DB architect who designed that. There is *no way* that a 100-column table is good DB design. (OK, mathematically there is nothing that precludes it being good DB design, but I can't even imagine a scenario where this would be OK.)
In the 3rd generation programming languages this was just a simple structure with 100 entries.
Oh, you were the one who designed the 100-column table. Sorry..... Please go out and get some books on db design. You will thank me :-)
The data transfer from that structure generated a function call with more than 1000 parameters. This was mapped and re-mapped with each call to transfer data, even though the structure itself is static and determined at compile time.
IMO, your problem honestly is in the fact that you are using a monkey wrench as a ball peen hammer. It may sorta work but you are using the wrong tool for the job. If you want a simple object store use BDB or something like it. If you want a real data management solution, build your db *first.* If that is not your goal, use something other than an RDBMS.
Next: There were about 10 parameters per field (column).
... etc
1: Column name
2: Column name length
3: data type
4: data length
5: character representation
finally 10: Address where the data lives.
The thing is such a table could be set up very easily and populated with a simple loop that rolls in the required values via say a mapping function with about 10 arguments. This could be done ONCE at run time to prepare for the transfer of data and then the same table could be referenced for each call and simply an address could be sent with the transfer.
Noooo... It was dynamic, and the data was encoded as parameters on the stack. This means the stack must be built, torn down, and rebuilt for each call.
How is this an issue with RDBMS's?
Next: the implementation was so bad that the program would run in test mode with only a few parameters, but it failed when the whole row was to be transferred.
Again, this is not a PostgreSQL problem ;-)
I gave up on that interface.
From your description, that sounds like a wise choice.
While there is much good to say about RDBMSs in general, the issue I ran into was that the interface from 3rd-generation languages took a HUGE step backward. IMHO we should have high-level language statements called DBRead() and DBWrite(). In C these should generally correspond to fread() and fwrite(). If this is too complex, then DBWriteStruct() could be implemented with suitable mapping helper functions.
Again, this is an issue with the frameworks you are using. Personally, I tend to do the
Re: (Score:3, Insightful)
Re : Are Relational Databases Obsolete? (Score:5, Funny)
It's like the packet of crisps that says "Is there a 20 pound note in here!?" - the answer should always be 'No'.
Except maybe for one person.
sed -e 's/crisps/potato chips/' -e 's/pound/dollar/'
Re: (Score:3, Funny)
sed -e 's/crisps/potato chips/' -e 's/pound/dollar/' -e 's/note/bill/' -e 's/packet/bag/'
that doesn't mean they're going to become obsolete (Score:5, Insightful)
In fact, the new wave of user-generated-content websites and webapps seems to me to indicate the exact opposite - if anything, row-store databases, with their usefulness in write-heavy applications, should be becoming more and more necessary/useful on the web.
So...chalk this one up to some grandstanding on the part of a guy who wants to put more money in his pockets...
Marketing hype by FUD.. typical (Score:2, Informative)
"Column-oriented databases -- such as the one built by Stonebraker's latest start-up, Andover, Mass.-based Vertica Systems Inc. -- store data vertically in table columns rather than in successive rows. "
Marketing hype for his startup.
What a sleazeball.
No. (Score:2)
Relational databases will be around as long as humans generate relational data. Take the classic example of an invoice that may have many entries, each entry referencing an inventory item. This sort of thing is likely to exist forever, and RDBMSes model that pretty well.
As far as whether the backend is row- or column-oriented - who cares? As long as I can use the one most appropriate to my access pattern, the implementation details just don't interest me enough to get worked into a furor. Don't get me
Re: (Score:2)
We have one customer, a large contractor, who is trying a
Aha! (Score:5, Funny)
turning your head sideways.
Re: (Score:2)
Should be, but isn't, and won't. (Score:5, Interesting)
It is very frustrating, because we do have programmers on staff who create third-party plug-ins for these databases to try to provide solutions the OEM code doesn't. When you meet younger programmers, many of them are frustrated themselves at working on ancient solutions that have no hope of being upgraded, because the industries we work in are not in a rush to try anything new and shiny, but instead are happy with the status quo.
I just bid a job a few months back that would cost $150,000 to upgrade their database infrastructure, and likely save the company $300,000+ annually in added efficiency, less downtime, and a more robust reporting system. Guess what they said? "We all think it is fine the way it is." That's money thrown out the window, employees who are frustrated (without knowing why), and the company losing efficiency by not being able to compete with newer companies that are using newer technology to better their bottom line.
Ugh.
Re: (Score:3, Insightful)
I've been in the banking industry for the past 6 years, and every bank I've worked at has relied on text-only server-side applications that we connect to via various terminal emulators. The workstations are all modern, but we don't use anything more taxing than Excel and an e-mail app.
Why have none of them changed beyond a few interface bolt-ons? Well . . . one of them actually did once . . . and it wasn't pretty. Sure it was graphical and point-and-click and more "user friendly" in appearance. But the
Re: (Score:3, Informative)
You do know those aren't remotely comparable, right? FoxPro scales to more users than Access (due to tables being separated into different files), but they're otherwise on a similar level in terms of what sort of jobs they're appropriate for. MS SQL Server is a full-fledged enterprise RDBMS. It may not scale quite as far as Oracle or DB2, but it gets closer every generation, and having worked mostly in Oracle for the last year or so, I've been m
Common Business Mistake (Score:3, Insightful)
#1: Assuming what you think your customer needs is what your customer wants.
#2: Assuming they are the ones who made the mistake when you lost the job.
Simple solution. (Score:4, Funny)
Re: (Score:2)
Well, he WOULD say that (Score:2)
Object Databases (Score:4, Interesting)
Re: (Score:3, Insightful)
1. OR mappers like Hibernate have gotten to the point that they are quite good, so they make the value proposition of object databases less compelling.
2. Object databases are never going to get the speed of relational databases. This is the real dealbreaker. Suppose an object database can handle 95% of my queries with adequate performance. All well and good, but I'm totally screwed on those other 5%. On the other hand, if I was using a relational data
He may have a point (Score:3, Interesting)
But there's no way that RDBMS's are going away -- relational algebra simply solves too many data storage problems.
Are relations obsolete? (Score:5, Informative)
SenSage is earlier example of column-oriented DB (Score:2)
This will be the year (Score:2)
paradigm shift! (Score:5, Funny)
Careful (Score:2)
The Dvorak keyboard is more efficient by a factor of 10 and you don't see it taking over the keyboard layout landscape.
Just because something is "better", even in technology, doesn't mean it's going to take over.
I've also lived through the decline of mainframes...still around. The internet was going to replace faxes...I still have a fax machine.
Linux is better than Windows, columns are better than rows but I wouldn't get all a-twitter over either of them just yet. Particularly from someone selling c
rtfa before posting (Score:4, Informative)
To add some content: this is about optimal storage for SQL databases in a data warehouse context, where there are some interesting products that use something more optimal than the one-size-fits-all solutions currently available from the big RDBMS vendors. The API on top is the same (i.e. SQL and other familiar data warehouse APIs), which makes it quite easy to integrate.
Regarding the obsolescence question, one size fits all will be good enough for most for some time to come. Increasingly, people are more than happy with lightweight options that are even less efficient, on which they slap persistence layers that reduce performance even more, just because that allows them to autogenerate all the code that deals with stuffing boring data into some storage. Not having to deal with that makes it irrelevant how the database works and allows you to focus on how you work with the data, rather than worrying about tables, rows, and ACID properties. Autogenerating the code that interacts with the database allows you to do all sorts of interesting things in the generated code and the layers underneath. For example, the Hibernate (a popular persistence layer for Java) people have been integrating Apache Lucene, a popular search index product, so that you can index and search your data objects using Lucene search queries rather than SQL. It's quite a neat solution that adds real value (e.g. fully text-searchable product catalogs are dead easy with this).
Column based storage is just an optimization and not really that critical to the applications on top. If you need it, there are some specialized products currently. The author of the column is probably right about such solutions finding their way into mainstream products really soon. At the application level, you'll still be talking good old SQL to the damn thing though.
Wrong approach? (Score:4, Interesting)
I think you would have to determine the main use of the table beforehand (write-seldom or write-often), but the DB engine could use a different scheme for each table that way. I know some will claim that it can't be more efficient to split things this way, but remember that this guy is claiming 50x the speed for write-seldom operations.
As for Relational Databases... How is this exclusive to that? This is simply how the data is stored and accessed. If he is claiming 50x speed-up because he doesn't deal with the relational stuff, that's bunk. You could write a row-store database with much greater speed as well, given those parameters.
Specialized versus generalized? (Score:4, Interesting)
I know very little about DBMS systems, but I thought it has always been true that you can achieve monumental performance increases by building somewhat specialized database systems whose internals make assumptions about, and are tied to, the structure of the data being modelled. In fact, when RDBMS systems came in, one of the knocks on them was that they were far more resource-intensive than the hierarchical databases they displaced. However, the carved-in-stone assumptions of those models made them difficult and expensive to change or repurpose.
I'm sure I remember innumerable articles insisting that "relational databases don't need to be really all that much terribly slower if you know how to optimize this that and the other thing..."
In other words, as an outsider viewing from a distance, I've assumed that the increasing use of RDBMS was an indication that in the real world it turned out that it was better to be slow, flexible, and general, than fast, rigid, and specialized.
So, what is a "column store?" It sounds like it is an agile, rapid development methodology for generating fast, rigid, specialized databases?
Soon... (Score:4, Funny)
The near future. Mr. Stonebraker walks into a store.
Mr. Stonebraker: How much are these plums?
Checkout girl: Plums? They're $0.99, $1.39, $12.49, $15.99, $26.38, $13.37...
Far from it! (Score:3, Funny)
What? Not that kind of relational?
ODBMS (Score:3, Informative)
1. Object-oriented databases are designed to work well with object-oriented programming languages such as Python, Java, C#, Visual Basic
2. ODBMSs use exactly the same model as object-oriented programming languages.
3. It is also worth noting that object databases hold the record for the World's largest database (over 1000 Terabytes at Stanford Linear Accelerator Center).
4. Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers (e.g. the objects are stored in trees for fast retrieval). Dynamic indexing schemes further speed up retrieval for full-text searches.
5. Provides data persistence to applications that are not necessarily 'always on' - e.g. HTTP based stateless applications.
I think RDBMSs will be around for some time -- but they will be relegated to more structured situations and smaller data sets. ODBMSs will take over where data is changing, persistence is critical, data types are mostly large binary objects with associated meta-data, and datasets are humongous.
Right now my favorite ODBMS is the ZODB (Zope Object Data Base) [wikipedia.org] - an ODBMS tightly integrated with both Python (implemented using Python's native 'pickle' object-persistence functionality) and the Zope web application development system, which itself is built with and uses Python. You can learn more about Zope at Zope.org [zope.org].
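If you want the same trick in Perl, the rough analogue of that pickle-style persistence is the core Storable module (a toy sketch with made-up data, nothing like a full ODBMS):

use strict;
use warnings;
use Storable qw(nstore retrieve);

# Serialize a whole object graph to disk and read it back intact,
# shared and circular references included.
my $invoice = {
    number => 1001,
    lines  => [
        { item => 'widget', qty => 3, price => 9.99  },
        { item => 'gadget', qty => 1, price => 24.50 },
    ],
};
$invoice->{lines}[0]{parent} = $invoice;   # circular reference is fine

nstore($invoice, 'invoice.db');
my $copy = retrieve('invoice.db');
printf "invoice %d has %d lines\n", $copy->{number}, scalar @{ $copy->{lines} };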
We're talking about this topic at geekSessions (Score:3, Informative)
Josh Berkus from the PostgreSQL core team
Paul Querna from Apache and Bloglines (wrote his own filesystem for Bloglines)
Chad Walters from Powerset who is implementing BigTable there.
Hope to see you there!
Author makes a great point, complex subject (Score:3, Informative)
For most users this does not matter (Score:3, Insightful)
SSDs and column stores... (Score:3, Insightful)
Re: (Score:2)
The main problem is that so far nobody has really brought out something more readable for dealing with sets in a mathematical sense; you could use mathematical operators, but then things would become even less readable than SQL is.
All the approaches on the programming side I have seen (criteria objects, etc.) only make things easier in some domains; after that, you revert to plain SQL and its derivatives.
Re: (Score:2)
TRUNCATE TABLE SQL_LANGUAGE;
COMMIT;
There, feel better?