Is the One-Size-Fits-All Database Dead?
Posted by
kdawson
on Tue Jan 09, 2007 09:50 PM
from the specialized-and-optimized dept.
jlbrown writes "In a new benchmarking paper, MIT professor Mike Stonebraker and colleagues demonstrate that specialized databases can have dramatic performance advantages over traditional databases (PDF) in four areas: text processing, data warehousing, stream processing, and scientific and intelligence applications. The advantage can be a factor of 10 or higher. The paper includes some interesting 'apples to apples' performance comparisons between commercial implementations of specialized architectures and relational databases in two areas: data warehousing and stream processing." From the paper: "A single code line will succeed whenever the intended customer base is reasonably uniform in their feature and query requirements. One can easily argue this uniformity for business data processing. However, in the last quarter century, a collection of new markets with new requirements has arisen. In addition, the relentless advance of technology has a tendency to change the optimization tactics from time to time."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
"In the last quarter century..." (Score:2, Funny)
Stonebraker has a vested interest in Stream DBs (Score:2, Informative)
http://www.streambase.com/about/management.php [streambase.com]
Was there ever a one-size-fits-all database? (Score:2)
Re: (Score:2, Funny)
Re:Was there ever a one-size-fits-all anything? (Score:2)
There never has been, and probably never will be. A small embedded database will never be replaced by a fat-assed SQL database any more than Linux will ever find a place in the really bottom-end microcontroller systems.
Re:Was there ever a one-size-fits-all anything? (Score:4, Funny)
Maybe they could make rubber databases ?
(or it's a bit of a stretch)
Re: (Score:3, Funny)
"They came in 3 sizes, extra large, large and white man"
Re: (Score:3, Informative)
See the history of PostgreSQL [postgresql.org].
When the community picked up the old, dormant Postgres source code (no problem due to the BSD licensing), the first thing that was added (after some debates) was the SQL syntax, hence the name change to PostgreSQL.
Bye egghat.
Noticed how roll your own is faster? (Score:2, Interesting)
Re:Noticed how roll your own is faster? (Score:5, Interesting)
Re:Noticed how roll your own is faster? (Score:5, Informative)
Taken seriously (Score:3, Funny)
Re: (Score:3, Interesting)
Prediction... (Score:5, Insightful)
2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.
3) Someone will write a paper about how we've gone from specialized to monolithic...
4) Something else will trigger specialization... (repeat)
Dvorak if you steal this one from me I'm going to stop reading your writing... oh wait.
Re:Prediction... (Score:4, Interesting)
I agree with this prediction. Database interfaces (such as SQL) do not dictate implementation. Ideally, query languages only ask for what you want; they do not tell the computer how to do it. As long as it returns the expected results, it does not matter if the database engine uses pointers, hashes, or gerbils to get the answer. It may, however, require "hints" in the schema about what to optimize. Of course, you will sacrifice general-purpose performance to speed up a specific usage pattern. But at least they will give you the option.
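A rough sketch of that point, assuming Python's standard sqlite3 module and a made-up posts table (neither comes from this thread): the query text never changes, the rows it returns never change, and the only thing the added index (the "hint") alters is the plan the engine picks.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts (author, score) VALUES (?, ?)",
    [("alice", 5), ("bob", 2), ("alice", 3), ("carol", 4)],
)

# The query says WHAT we want, not HOW to find it.
QUERY = "SELECT id, score FROM posts WHERE author = ? ORDER BY id"

def run(conn):
    """Return (rows, plan text) for the same declarative query."""
    rows = conn.execute(QUERY, ("alice",)).fetchall()
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("alice",)).fetchall()
    # The last column of each plan row is SQLite's human-readable detail string.
    plan = " ".join(str(r[-1]) for r in plan_rows)
    return rows, plan

rows_before, plan_before = run(conn)

# The "hint": tell the engine which access pattern to optimize.
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")

rows_after, plan_after = run(conn)

print(rows_before == rows_after)  # same answer either way
print(plan_before)
print(plan_after)
```

The exact plan wording differs between SQLite versions, so treat the printed plans as something to read, not something to match on.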
It is somewhat similar to what "clustered indexes" do in some RDBMS. Clusters improve the indexing by a chosen key at the expense of other keys or certain write patterns by physically grouping the data by that *one* chosen index/key order. The other keys still work, just not as fast.
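A toy illustration of that trade-off, in Python rather than any real RDBMS, with invented rows and function names: the data is physically kept in order of one chosen key, so lookups on that key can binary-search, while lookups on any other column still work but have to look at every row. A real engine would usually have secondary indexes too, so this sketch exaggerates the gap.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (customer_id, order_date, amount).
# "Clustering" here just means the rows are physically stored sorted by customer_id.
rows = sorted([
    (3, "2007-01-05", 40),
    (1, "2007-01-09", 15),
    (2, "2007-01-02", 99),
    (1, "2007-01-03", 20),
    (3, "2007-01-09", 7),
])
cluster_keys = [r[0] for r in rows]

def by_clustered_key(customer_id):
    """Binary search: matching rows sit next to each other in storage order."""
    lo = bisect_left(cluster_keys, customer_id)
    hi = bisect_right(cluster_keys, customer_id)
    return rows[lo:hi]

def by_other_key(order_date):
    """No physical ordering to exploit, so every row gets examined."""
    return [r for r in rows if r[1] == order_date]

print(by_clustered_key(1))
print(by_other_key("2007-01-09"))
```

Inserting a row in the middle of the sorted list means shifting everything after it, which is the toy version of the "certain write patterns" cost mentioned above.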
Re: (Score:3, Interesting)
Interfaces like SQL don't dictate the implementation, but they do dictate the model. Sometimes, the model that you want is so far from the interface language, that you need to either extend or replace the interface language for the problem to be tractable.
SQL's approach has been to evolve. It isn't quite "there" for a lot of modern applications. I can foresee a day when SQL can efficiently model all the capabilities of, say, Z39.50, but we're not there now.
Re: (Score:3, Insightful)
Z39.50 is actually much, much more than mere "text searching". If you think hard about the way that you interact with a library catalogue or Google compared with how you interact with a RDBMS, you'll realise there are quite a few more differences than just "text searching".
Think about highly heterogeneous data. Libraries, for example, might index books, periodicals, audio-visual items and online resources such as journals. Google indexes web pages, Usenet news articles, PDF documents and so on. And you
Re: (Score:3, Informative)
This can be taken a stage further, with general persistence APIs. The idea is that you don't even require SQL or relational stores: you express queries in a more abstract way and let a persistence engine generate highly optimised SQL, or some other persistence process. I use the Java JDO 2.0 API like this: I can persist and
one size fits 90% (Score:5, Insightful)
But for most uses of databases - or any back-end processing - performance just isn't a factor and hasn't been for years. Enron may have needed a huge data warehouse system; "Icepick Johnny's Bail Bonds and Securities Management" does not. Amazon needs the cutting edge in customer management; "Betty's Healing Crystals Online Shop (Now With 30% More Karma!)" not so much.
For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.
Re: (Score:2)
Imagine that.... (Score:5, Insightful)
steve
(+1 Sarcastic)
Dammit (Score:5, Insightful)
The problem I've noticed is that too many applications are becoming specialized in ways that are not handled well by traditional databases. The key example of this is forum software. Truly hierarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system. Yet we keep trying to cram them into SQL databases, then get surprised when we're hit with performance problems and security issues. It's simply the wrong way to go about solving the problem.
As anyone with a compsci degree or equivalent experience can tell you, creating a custom database is not that hard. In the past it made sense to go with off-the-shelf databases because they were more flexible and robust. But now that modern technology is causing us to fight with the databases just to get the job done, the time saved from generic databases is starting to look like a wash. We might as well go back to custom databases (or database platforms like BerkeleyDB) for these specialized needs.
Re: (Score:3, Funny)
Eventually the folks working on web forums will realize that they are just recreating NNTP and move on to something else.
Re: (Score:3, Insightful)
I wasn't referring to Slashdot in particular, but rather general web forum software. Your PhpBB, vBulletins, and JForums of the world are more along the lines of what I'm referring to. After dealing with the frustrations of setting up, managing, and hacking projects like these, I've come to the conclusion that the backend datastore is the problem. The relational theories still hold true, but the SQL database implementations si
Duh (Score:5, Insightful)
Who thinks that a specialized application (or algorithm) won't beat a generalized one in just about every case?
The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly -- programmer time is expensive, and hardware is cheap.
See also: high level compiled languages versus assembly language*.
(*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.)
Re:Duh (Score:5, Informative)
I've programmed extensively in assembly. Your statement may be true up to a couple of thousand lines of code. Past that, to avoid going insane, you'll start using things like assembler macros and your own prefab libraries of general-purpose assembler functions. Once that happens, a compiler that can tirelessly do global optimizations is probably going to beat you hands down.
Re:Duh (Score:5, Insightful)
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:3, Interesting)
Only people who haven't seen recent advancements in CPU design and compiler architecture will say what you just said.
Modern compilers apply optimizations at so sophisticated a level that it would be a nightmare for a human to maintain an equally optimized solution.
As an example, modern Intel processors can process certain "simple"
Re: (Score:2, Insightful)
Humans have been writing optimized assembler for decades, the compilers are still trying to catch up. M
Re:Duh (Score:4, Insightful)
There are three quite simple things that humans can do that aren't commonly available in compilers.
First, a human gets to start with the compiler output and work from there :-) He can even compare the output of several compilers.
Second, a human can experiment and discover things accidentally. I recently compiled some trivial for loops to demonstrate that array bounds checking doesn't have a catastrophic effect on performance. With the optimizer cranked up, the loop containing a bounds check was faster than the loop with the bounds check removed. That did not inspire confidence.
Third, a human can concentrate his effort for hours or days on a single section of code that profiling revealed to be critical and test it using real data. Now, I know JIT compilers and some specialized compilers can do this stuff, but as far as I know I can't tell gcc, "Compile this object file, and make the foo function as fast as possible. Here's some data to test it with. Let me know on Friday how far you got, and don't throw away your notes, because we might need further improvements."
I hope I'm wrong about my third point (please please please) so feel free to post links proving me wrong. You'll make me dance for joy, because I do NOT have time to write assembly, but I have a nice fast machine here that is usually idle overnight.
Re: (Score:2, Insightful)
. .
KFG
I thought I was an assembler demon (Score:2)
Go figure -- I hung up my assembler badge. Still a useful skill f
Parallel databases (Score:2)
Specialized software and hardware outperform generic implementations! Film at 11!
SQL is Dead - Long Live SQL (Score:2)
But it doesn't have to be that way. SQL can be retained as an API, but different storage/query engines can be run under the hood to better fit different storage/query models for differ
This has been known for years already (Score:3, Interesting)
Re:This has been known for years already (Score:4, Insightful)
This is why you pay a good wage for your Oracle data architect & DBA -- so that you can get people who know how to do these sort of things when needed. And honestly I'm not even scratching the surface.
Consider a data warehouse for a giant telecom in South Africa (with a DBA named Billy in case you wondered). You have over a billion rows in your main fact table, but you're only interested in a few thousand of those rows. You have an index on dates, another index on geographic region, and another index on customer. Any one of those indexes will reduce the 1.1 billion rows to tens of millions of rows, but all three restrictions will reduce it to a few thousand. What if you could read three indexes, perform bitmap comparisons on the results to get only the rows that match the results of all three indexes, and then only fetch those few thousand rows from the 1.1 billion row table? Yup, that's built in and Oracle does it for you behind the scenes.
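The bitmap trick described above can be sketched in a few lines. This is a toy in Python with a made-up five-row fact table and invented helper names, not Oracle's actual implementation: each index maps a value to a bitmap with one bit per row, the three bitmaps are ANDed, and only the surviving row ids are fetched from the table.

```python
# Hypothetical fact table; the row id is simply the list position.
facts = [
    ("2007-01-08", "Gauteng", "acme"),
    ("2007-01-09", "Gauteng", "acme"),
    ("2007-01-09", "Western Cape", "acme"),
    ("2007-01-09", "Gauteng", "globex"),
    ("2007-01-09", "Gauteng", "acme"),
]

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row_id, row in enumerate(facts):
        index[row[column]] = index.get(row[column], 0) | (1 << row_id)
    return index

date_idx = build_bitmap_index(0)
region_idx = build_bitmap_index(1)
customer_idx = build_bitmap_index(2)

def matching_row_ids(date, region, customer):
    """AND the three bitmaps; only rows that pass all three restrictions keep their bit."""
    bits = date_idx.get(date, 0) & region_idx.get(region, 0) & customer_idx.get(customer, 0)
    return [i for i in range(len(facts)) if (bits >> i) & 1]

# Only now touch the "table", and only for the rows that survived the AND.
row_ids = matching_row_ids("2007-01-09", "Gauteng", "acme")
print(row_ids, [facts[i] for i in row_ids])
```

The point of the design is that the AND runs over compact bitmaps, so the big table is read only for the handful of rows that match every condition.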
Now yeah, you can build a faster single-purpose db. But you better have a god damn'd lot of dev hours allocated to the task. My bet is that you'll probably come out way ahead in cash & time to market with Oracle, a good data architect and a good DBA. Any time you want to put your money on the line, you let me know.
Please reduce lameness (Score:5, Insightful)
Can't we get used to the fact that specialized & new solutions don't magically kill existing popular solutions to a problem?
And it's not a recent phenomenon, either. I bet it goes back to when the first proto-journalistic phenomena formed in early human societies, and it haunts us to this very day...
"Letters! Spoken speech dead?"
"Bicycles! Walking on foot dead?"
"Trains! Bicycles dead?"
"Cars! Trains dead?"
"Aeroplanes! Trains maybe dead again this time?"
"Computers! Brains dead?"
"Monitors! Printing dead yet?"
"Databases! File systems dead?"
"Specialized databases! Generic databases dead?"
In a nutshell. Don't forget that a database is a very specialized form of a storage system, you can think of it as a very special sort of file system. It didn't kill file systems (as noted above), so specialized systems will thrive just as well without killing anything.
Death to Trees! (Score:3, Interesting)
Very specialized? Please explain. Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors and then you either have to duplicate information (repeat factors), or play messy games, or both. They were okay in 1984 when you only had a few hundred files. But they
Re: (Score:3, Insightful)
You know, I've seen enough RDBMS designs to know the "messiness" is not the fault of the file systems (or databases, for that matter).
Sets have more issues than you describe, and you know very well Vista had lots of set based features that were later downscaled, hidde
Isn't it just stating the obvious? (Score:5, Funny)
I've made some similar discoveries myself!
Who woulda thought that specific-use items might improve the outcome of specific situations?
Creative Commons License (Score:3, Interesting)
Re:Perl & CSV (Score:5, Funny)
It failed the "relational" part of the test. But it failed very quickly.
Re:Perl & CSV (Score:5, Funny)
Yep. On the plus side, the Perl hacker who put it together only wasted the time it took to write one line. Granted, the line was 103,954 characters long. He considered breaking it up into two lines to improve readability but ultimately rejected the notion -- anyone not capable of reading the program clearly had no business messing with it anyhow. (Quick question aside from the snark: since Perl has associative arrays can't it emulate a relational database? It was my understanding that after you've got associative arrays you can get to any other conceivable data structure... assuming you're willing to take the performance hit.)
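On the quick question: a rough sketch suggests the answer is yes, at least for the basics. Python dicts stand in here for Perl's associative arrays, and the tables, keys, and function names are all invented for the example. Each "table" is a dict keyed by primary key, a join is a lookup through the foreign key, and a selection is a filter. What is missing is everything a real engine adds on top: a query planner, indexes on non-key columns, transactions, and constraints.

```python
# Two hypothetical "tables", each a dict keyed by primary key.
users = {
    1: {"name": "alice"},
    2: {"name": "bob"},
}
comments = {
    10: {"user_id": 1, "text": "First post"},
    11: {"user_id": 2, "text": "Duh"},
    12: {"user_id": 1, "text": "I. Hate. Perl."},
}

def join_comments_to_users():
    """Roughly: SELECT u.name, c.text FROM comments c JOIN users u ON c.user_id = u.id ORDER BY c.id"""
    return [
        (users[c["user_id"]]["name"], c["text"])  # foreign key lookup is one hash probe
        for _, c in sorted(comments.items())
    ]

def comments_by(name):
    """Roughly: ... WHERE u.name = ?  (a full scan, since nothing indexes the name)"""
    return [text for who, text in join_comments_to_users() if who == name]

print(join_comments_to_users())
print(comments_by("alice"))
```

The performance hit mentioned above shows up in comments_by: without an index on the name, every joined row has to be examined.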
Re:Perl & CSV (Score:4, Interesting)
Once you have lambda you can get to any conceivable data structure. The question is, do you really want to?
sub Y (&) { my $le=shift; return &{sub {&{sub {my $f=shift; &$f($f)}}(sub {my $f=shift; &$le(sub {&{&$f($f)}(@_)})});}}}
Re:Perl & CSV (Score:5, Interesting)
sub Y (&) {
    my $le=shift;
    return &{
        sub { ## SUB_A
            &{
                sub { ## SUB_B
                    my $f=shift;
                    &$f($f)
                }
            } ##Close SUB_A's block
            (sub { ## SUB_C
                my $f=shift;
                &$le(sub { ##SUB_D
                    &{
                        &$f($f)
                    }
                    (@_)
                }## END SUB_D
            )} ##END SUB_C
            ); ##End the block enclosing SUB_C
        } ## END SUB_A
    } ## Close the return line
} ##Close sub Y
Y can have any number of parameters you want (this is sort of a "welcome to Perl, n00b, hope you enjoy your stay" bit of pain). The first line of the program assigns le to the first parameter and pops that one off the list. That & used in the next line passes the rest of the list to the function he's about to declare. So we're going to be returning the output of that function evaluated on the remaining argument list. Clear so far?
OK, moving on to SUB_A. We again use the & to pass the list of arguments through to
OK, unwrapping the arguments. There is only one argument -- a block of code encompassing SUB_C. (Wasted 15 minutes figuring that out. That's what I get for doing this in Notepad instead of an IDE that would auto-indent for me. Friends don't let friends read Perl code.)
By now, bits and pieces of this are starting to look almost easy, if no closer to actual readable computer code. We reuse the function we popped from the list of arguments earlier, and we use the same trick to get a second function off of the argument list. We then apply that function to itself, assume the result is a function, and then run that function on the rest of the argument list. Then we pop that up the call stack and we're, blissfully, done.
So, now that we understand WTF this code is doing, how do we know it's the Y combinator? Well, we've essentially got a bunch of arguments (f, x, whatever). We ended up doing LAMBDA(f, (LAMBDA(x, f (x x))) (LAMBDA(x, f (x x)))) . Which, since I took a compiler class once and have the nightmares to prove it, is the Y combinator.
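For comparison, here is the same shape transcribed into Python. This is a sketch, not the original author's code: the lambdas mirror SUB_B, SUB_C and SUB_D from the Perl above, and the factorial is just an assumed test case to show the thing recursing without any function naming itself.

```python
def Y(le):
    """Applicative-order Y combinator, mirroring the Perl structure.

    SUB_B is `lambda f: f(f)` (self-application).
    SUB_C is `lambda f: le(...)`, which hands `le` a stand-in for "myself".
    SUB_D is the inner `lambda *args: f(f)(*args)`, which delays f(f) until it
    is actually called; without that delay a strict language would loop forever.
    """
    return (lambda f: f(f))(lambda f: le(lambda *args: f(f)(*args)))

# An anonymous recursive function: `self` is supplied by Y, never by a name binding.
factorial = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(factorial(5))
```

The extra wrapping lambda is the only real difference from the textbook LAMBDA(f, ...) form quoted above; it exists because both Perl and Python evaluate arguments eagerly.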
Now you want to know the REALLY warped thing about this? I program Perl for a living (under protest!), I knew the answer going in (Googled the code), and I have an expensive theoretical CS education which includes all of the concepts trotted out here... and the Perl syntax STILL made me bloody swim through WTF was going on.
I. Hate. Perl.
And the reason I hate Perl, more than the fact that the language makes it *possible* to have monstrosities like that one-liner, is that the community which surrounds the language actively encourages them.
Re: (Score:3, Insightful)
This is from someone who's spent the last seven years with Perl and in the community. YMMV
Re: (Score:3, Informative)
Not all of us encourage this.
It's considered *clever* and a mark of great skill that you can strip out all the code that actually explains WTF your code is doing and be left with the perfectly compressed version.
They call this Perl Golf (shaving strokes off your game. Get it?)
Many of us do not consider it cle
Write-only languages (Score:5, Insightful)
As any English teacher will tell you, any language that will support great poetry and prose will also make it possible to write the most gawdawful cr*p. Perl bestows great powers, but the perl user must temper his cleverness with wisdom if he is to truly master his craft.
However in this specific case Google reveals that
was simply "borrowed" from y-combinator.pl [synthcode.com]. This is an instance of Perl being used in a self-referential manner to add a new capability (the Y combinator allows recursion of anonymous subroutines (why anyone would bother to do such an arcane thing comes back to the English teacher's remarks)). Self-referential statements are always difficult to understand because, well, they just are that way (including this one).