"Slacker DBs" vs. Old-Guard DBs
snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."
Re:Laziness Rules (Score:3, Interesting)
... and rather than learning it's just as easy to just wave it all off as obsolete.
I don't know about that. But maybe these slacker DBs are perfect for what they're doing? Glancing at those mentioned in TFA, it just looks like they're simple tools to do simple things.
Don't get me wrong. I once had the pleasure of working with an Oracle god. This dude was about to take his final Oracle exam in a series of exams and he turned my Join that took ten seconds into a Join that took less than a thousandth. I have no idea what he did to this day, but it took several lines of PL/SQL. We were dealing with tens of millions of rows that had to be processed every night.
My point is, if it's something simple to do, why all the RDBMS overhead? Many times, a simple flat file is all you need, and maybe a little more.
Re:Normalization doesn't exist to save disk space (Score:3, Interesting)
Re:moral of the story (Score:1, Interesting)
I think the question is "to what end will this change benefit anyone".
I view these new "ideas" on data storage and retrieval as a dumbing down of DBs the way higher level languages have dumbed down programming. On the one hand, it's much, much nicer to be able to whip together a working PHP app in a day than it is to have to constantly comb C code to make sure every little exception has been handled and every little bit of data checked. On the other hand, I don't feel quite the same way as I'm constantly zapping little bugs introduced by laziness in type checking or data validation.
It's all well and good that this allows people with less knowledge of the field to put together "good enough" applications, but sometimes I wonder if we're really that much more productive, or if we just shifted all our workload from building stable apps to constantly maintaining buggy ones.
You young whippersnappers don't know nothing! (Score:5, Interesting)
Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.
Network model databases will outperform relational all the time. You just don't have the same flexibility.
Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.
I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.
Re:Laziness Rules (Score:4, Interesting)
In the end, the problem is that people just want a "default tool". They don't want to think about their requirements for data consistency. The really scary bit is that while RDBMSes are the "default tool" of yesterday and slacker DBs are the "default tool" of tomorrow, neither of them is really the "problem".
The "default tool" attitude IS the problem. Unless you carefully weigh your data consistency requirements, you shouldn't be making that call at all.
I welcome the slackers and all of their new options along the spectrum of speed versus consistency. It's just that most of the people developing applications scare the shit out of me. They're so cavalier (or should I say, "agile", or maybe "pragmatic") about requirements that it's truly disturbing.
That said, if you're really interested in all of the options, I also recommend checking out memcachedb, memcacheq, and redis.
Re:Laziness Rules (Score:3, Interesting)
I'm just going on the statements he made about his own (lack of) knowledge in this video [infoq.com].
Berkeley DB is awesome (Score:5, Interesting)
I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel [citadel.org] system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).
Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.
All toys (Score:2, Interesting)
From TFA: "The problem is that JOINs are really, really slow when the data is spread out over several machines."
This is the result of a poor design, not a database flaw. If you are running a web application against multiple databases, either cluster them or store all the data for a user in one database. (i.e. hash the login_id and select the database based on the result). If someone is doing JOINs across multiple machines and doesn't have a very good reason for doing so, then nothing short of a lobotomy is going to help them.
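A minimal sketch of the "hash the login_id and select the database" idea, in Python. The host names and the shard_for helper are placeholders made up for illustration, not anything from TFA:

```python
import hashlib

# Hypothetical database hosts; the names are placeholders.
SHARDS = [
    "db0.example.com",
    "db1.example.com",
    "db2.example.com",
    "db3.example.com",
]

def shard_for(login_id, shards=SHARDS):
    """Pick the database that holds all data for this login_id."""
    # Use a stable digest: Python's built-in hash() is salted per process
    # for strings, so it would route the same user differently after a restart.
    digest = hashlib.sha256(login_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", shard_for(user))
```

With every row for a user on one machine, per-user JOINs stay local. The catch with plain modulo hashing is that changing the number of shards remaps most users, which is why consistent hashing is the usual next step.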
From TFA: "Each query can only run 5 seconds. The answer can only hold 250 items. Each item can have only 250 pairs."
Yeah, I'd say that meets the definition of a toy database alright.
From TFA: "Many of the complaints about the other toy databases revolve around how a missing feature makes it impossible to find the right data. If you want to add a bit more functionality to the database here, you can whip up many of the features locally in Python. If you want a JOIN, you can synthesize one in Python and probably customize the memory cache at the same time. This is especially useful for Web applications that let users store their data in the service. If you need to add security to restrict each user to the right data, you can code that in Python too."
The writer must be joking. Who would do this when there are better options that don't involve implementing your own database?
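For what it's worth, here is roughly what "synthesizing a JOIN in Python" could look like. This is only a sketch of an in-memory hash join over lists of dicts; the hash_join name and the sample data are invented for illustration and are not from TFA:

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner-join two lists of dicts on a shared key (in-memory hash join)."""
    # Build phase: index the right-hand rows by the join key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: look up each left-hand row and merge the matches.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # On clashing column names the left-hand value wins.
            joined.append({**match, **row})
    return joined

if __name__ == "__main__":
    users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
    orders = [
        {"user_id": 1, "item": "book"},
        {"user_id": 1, "item": "pen"},
        {"user_id": 3, "item": "lamp"},
    ]
    for row in hash_join(users, orders, "user_id"):
        print(row)
```

Note what is missing: no query planner, no indexes that survive between calls, no transactions, and both sides have to fit in memory. All of that becomes application code you maintain yourself.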
From TFA: "there's no big reason to use Ruby, Python, Java, or PHP on the server when it can all be packaged in JavaScript"
Many people who write web applications actually want to do useful things with the data they store, like generate reports, keep logs, track inventory, or run queries. This doesn't work very well when the "database" is a text file sitting on the user's hard drive.
"Schema-less" storage with MySQL (Score:2, Interesting)
How FriendFeed uses MySQL to store schema-less data [appspot.com]
Given their needs in terms of adding features, altering the schema, and building indexes, being able to make the indexes "eventually consistent" was huge. You have to remember that to keep things nice and denormalized, you need lots of tables and joins, and that MySQL (or any other FOSS RDBMS) CANNOT build indexes across tables.
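A rough sketch of the general pattern, loosely modelled on the linked write-up: opaque entity bodies in one table, plus a separate index table per queryable field that is filled in by a separate step. Assumptions are loud here: sqlite3 stands in for MySQL, JSON stands in for whatever serialization the real system uses, and every table and function name is illustrative.

```python
import json
import sqlite3
import uuid

# sqlite3 in memory is a stand-in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE index_user_id ("
    "user_id TEXT NOT NULL, entity_id TEXT NOT NULL, "
    "PRIMARY KEY (user_id, entity_id))"
)

def put_entity(fields):
    """Store a schema-less bag of fields; adding a field needs no ALTER TABLE."""
    entity_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(fields)),
    )
    return entity_id

def index_entity(entity_id):
    """Separate indexing step: until it runs, the index lags the entity table."""
    row = conn.execute(
        "SELECT body FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    fields = json.loads(row[0])
    if "user_id" in fields:
        conn.execute(
            "INSERT OR IGNORE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
            (fields["user_id"], entity_id),
        )

def find_by_user(user_id):
    """Look entities up through the index table."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i "
        "JOIN entities e ON e.id = i.entity_id WHERE i.user_id = ?",
        (user_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]

if __name__ == "__main__":
    eid = put_entity({"user_id": "u1", "title": "hello"})
    print(find_by_user("u1"))  # index not built yet, so nothing is found
    index_entity(eid)
    print(find_by_user("u1"))  # now the entity is visible through the index
```

The gap between put_entity and index_entity is the "eventually consistent" part: the entity exists right away, but index lookups only see it once the indexer has caught up.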
Music from your teenage years gets extra cred (Score:3, Interesting)
It turns out that there actually _are_ neurological reasons that music from your teenage years is extra-evocative, just as language-learning works better with young kids. Go read "This is Your Brain on Music" for more details.
A certain amount of music sensitivity appears to be hardwired into our brains, and the extra hormones after puberty increase music-remembering ability and the emotional aspects of it that younger kids don't have as much of. There's also a lot of intellectual development going on in those years, and it's easier to pick up more complex ideas from the music than you could when you were younger.
As you get older, that still happens a bit, and you'll still run into music that's new and cool which you'll enjoy years later, but now it's competing with lots of other cool music that's in your head which your teenage-years music wasn't.
What's much more annoying is when you find yourself tuning by a different radio station and wondering "What is all this noise those kids are listening to? They should turn that crap down and listen to good stuff" just like your parents said when you were a kid. Some of that's because 90% of everything is crap, and it's not the crap that you find evocative because it was around when you were a kid, and some of it's because 90% of everything on the radio is highly-packaged commercial crap, making it 99% crap instead of only 90%. And some of that's because kids always want to listen to new stuff and piss off their parents, and musicians always like to do new stuff, and if you want to bust into the Top 40 you've either got to do identical commercial crap better than anybody who's already there or else do something new. Rap was creative and interesting, but the gangstas-dissing-women motifs that dominated it were offensive. Hip-hop took that music and started doing lots of interesting things with it, though I haven't followed it. I'm finding myself playing a lot of old-timey (average hair color in our jam session == gray, leaning toward white :-), and starting to listen to jazz more (lots of deep classical stuff in there, which I haven't had the patience to listen to for a while.)
Re:who needs transactions? (Score:3, Interesting)
Oh, come on. MySQL suffers from the same thing that PHP does; that it's industry standard and easy to use.
The bigger thing that they both suffer from is having a rather poor history. The problem with people saying how bad they are is that the complaints are based on old versions. PHP5 is much better than PHP4 or PHP3, and MySQL is steadily becoming something resembling a real database (5.0 is good, in particular if you use InnoDB, 4.1 was decent, but anything below 4.1 barely qualifies as a database).
They're a niche, not a full replacement (Score:2, Interesting)
I get tired of hearing the same old discussion about whether or not the relational database is going to die. It's not. But the new breed of *specialized* databases works well for their *specialized* purposes. Big surprise. But all of them inevitably make a trade-off. Anyone who works seriously with database design knows that it's all about trade-offs.
One of the main motivations for the new breed of databases is that the standard SQL database relies on things such as foreign keys and other constraints for data consistency, but that requires the data to be directly managed by that running DBMS process. When you require data to be distributed over a network (i.e. over many separate processes), then the only way a *foreign key* can work is if the DBMS process has some sort of link over the network to the separate DBMS process and then uses that somewhat as if it were local. (Other strategies involve using external application code for consistency rather than foreign keys, etc.) Of course, the DBMS process can't use its usual local low-level optimizations behind the scenes in order to handle that query efficiently over the network, so it doesn't scale. Specialized DBMSs for distributed data focus on optimizing being distributed, while the typical SQL DBMS optimizes storage and retrieval of data as if it were local. The bottom line is that the traditional SQL database scales well vertically, but not horizontally, concerning hardware. Or rather, when you scale horizontally, you forgo a lot of its advantages. The new breed of databases trades off consistency and other assurances for the sake of "good enough" consistency and really fast retrieval of domain-specific data.
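To make the "directly managed by that running DBMS process" point concrete, here is a small sketch using Python's sqlite3 module. The accounts/transfers tables and the record_transfer helper are invented for illustration. The constraint works because both tables live in the same database, where the engine can check the parent row as part of the insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE transfers ("
    "id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES accounts(id), "
    "amount INTEGER NOT NULL)"
)

def record_transfer(account_id, amount):
    """Insert a transfer; the database itself rejects unknown accounts."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO transfers (account_id, amount) VALUES (?, ?)",
                (account_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    with conn:
        conn.execute("INSERT INTO accounts (id, owner) VALUES (1, 'ann')")
    print(record_transfer(1, 50))   # account exists
    print(record_transfer(99, 50))  # no such account, rejected by the engine
```

If accounts lived on a different machine from transfers, that check would need a network round trip or would have to move into application code, which is exactly the trade-off described above.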
But not everyone is trying to be Google or Amazon. Financial institutions such as banks can't tolerate "good enough" consistency. The biggest problem with relational databases I see nowadays is that people are ignorant about why "relational" is such a good idea, and how SQL only gets you part of the way to "relational" and that SQL's shortcomings are a different issue. The second biggest problem is that most people are used to only one or two data usage patterns, and if it "works for them", then they assume it should *always* be done that way. For example, the hordes of people who barely know Excel (i.e. not a relational database) or Access, and then like to give "expert" advice. Or a web programmer who believes that ORMs are the One True Way because they abstract away the choice of DBMS in order to keep favorite language X, even though other people's needs are the opposite: perhaps we want to abstract away the choice of programming language so that we can keep the same database, and so maybe it's a good idea if the database itself can ensure data consistency rather than relying on the ORM, etc.