



"Slacker DBs" vs. Old-Guard DBs
snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."
Re:Normalization doesn't exist to save disk space (Score:3, Informative)
Oracle doesn't have a "string" datatype.
Re:Normalization doesn't exist to save disk space (Score:3, Informative)
Ah, my apologies. Really, it should be an indexed enum (or whatever Oracle equivalent there is... it's been a while since I used it) if there's no additional data to go along with the status code... or another table if there is additional data.
distributed databases and P2P (Score:5, Informative)
The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris [onscale.de]. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm [wikipedia.org], which is basically a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system: peers can break down and cease working but not sabotage the system). Leslie Lamport [lamport.org] of Microsoft Research has done a lot of work on this. Anyone interested may take a look at his papers; very advanced stuff there.
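To make the "hash the key to pick a node" idea concrete, here is a minimal sketch of a consistent-hash ring in Python. It is not Scalaris's actual scheme; the class name `HashRing`, the node names, and the choice of SHA-1 are all assumptions made for illustration, and the consensus layer (Paxos) is left out entirely.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring: maps each key to one of a fixed set of nodes."""

    def __init__(self, nodes):
        # Place every node on the ring at the position given by its hash.
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # SHA-1 is an arbitrary choice here; any well-mixed hash would do.
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # The owner is the first node clockwise from the key's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect_right(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical node names and key.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

The point of the ring layout is that dropping one node only moves the keys that node owned; keys owned by the surviving nodes stay where they were.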
I've never understood the UNIX world's fascination (Score:5, Informative)
I've never understood the UNIX world's fascination with relational databases.
Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.
We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.
A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe type 2 records.
Records not allocated are filled with a deallocated fill pattern.
A program uses a record by performing a Write New operation. That tells the database manager to find the smallest free record in that file whose size is >= the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. The key typically includes a Record Number, a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.
To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.
Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.
We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.
I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
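The Write New and read-by-key operations described in this comment can be sketched in a few lines of Python. This is an in-memory toy, not the Unisys implementation: the class name `FreespaceFile`, the `(type, record number)` key layout, and the use of `None` as the deallocated fill pattern are assumptions for illustration, and locking and commit/rollback are omitted.

```python
class FreespaceFile:
    """In-memory toy model of a file of pre-allocated fixed-length records."""

    def __init__(self, layout):
        # layout maps record type -> (record size in bytes, number of records).
        self.sizes = {rtype: size for rtype, (size, _) in layout.items()}
        # None stands in for the "deallocated fill pattern".
        self.slots = {rtype: [None] * count for rtype, (_, count) in layout.items()}

    def write_new(self, data):
        # Try record types from smallest to largest, skipping any too small.
        for rtype in sorted(self.sizes, key=self.sizes.get):
            if self.sizes[rtype] < len(data):
                continue
            for number, slot in enumerate(self.slots[rtype], start=1):
                if slot is None:
                    self.slots[rtype][number - 1] = bytes(data)
                    return (rtype, number)  # hypothetical key layout
        raise RuntimeError("no free record large enough")

    def read(self, key):
        # The key alone locates the record; no length is needed.
        rtype, number = key
        record = self.slots[rtype][number - 1]
        if record is None:
            raise KeyError(key)
        return record


# Two 256-byte records of type 1 and one 512-byte record of type 2.
f = FreespaceFile({1: (256, 2), 2: (512, 1)})
key = f.write_new(b"weather report")
data = f.read(key)
```

Because every record is pre-allocated and addressed directly by its key, a read needs no index traversal, which is the performance argument the comment is making.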
Re:Laziness Rules (Score:5, Informative)
Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...
I feel old (Score:2, Informative)
When I saw the title I thought "I'm old-guard". Then I read the article and JOINs are a key concept to the old-guard.
My first few DB apps involved using a b-tree or ISAM library (or writing our own). Then the "new guys" started wanting to pay for a server that did JOINs. We did JOINs, just at the app layer and without the guaranteed consistency that a good relational design gives you. And getting a server that does it was expensive.
I wouldn't want to go back to pre-relational server days, but am also very thankful that I did write my own DBs from the ground up. I will probably never need to use the entire experience, but can often use bits and pieces of it, and I appreciate a good key/value store.
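For readers who have never done a JOIN "at the app layer", a rough sketch of what that looks like over two key/value stores follows. The dicts, field names, and sample rows are all made up for illustration; a real ISAM or b-tree library would replace the dict lookups.

```python
# Two key/value "files", modeled here as plain dicts (illustrative data only).
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
orders = {
    100: {"customer_id": 1, "total": 25},
    101: {"customer_id": 2, "total": 40},
    102: {"customer_id": 3, "total": 15},  # dangling reference: no customer 3
}


def join_orders_to_customers(orders, customers):
    """Hand-written inner join: one key lookup per order."""
    rows = []
    for order_id, order in sorted(orders.items()):
        customer = customers.get(order["customer_id"])
        if customer is None:
            # Nothing in the store prevented this orphan; the app must cope.
            continue
        rows.append((order_id, customer["name"], order["total"]))
    return rows


rows = join_orders_to_customers(orders, customers)
```

Order 102 shows the missing consistency guarantee: the store happily accepted a reference to a customer that does not exist, and only the application code notices.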
Re:Laziness Rules (Score:3, Informative)
first some context. i architect data warehouses for a living. i also live in a world of building fairly specialized frameworks to deal with data warehouses architected as star and snowflake schemas. i tend to spend quite a lot of time in pseudo-relational databases [wikipedia.org] that don't fully implement codd's rules [wikipedia.org].
for fun, i like to spend some time toying with couchdb, using it for loose data warehousing, extending it, and generally enjoying the application development freedom it gives me.
that said, let me respond to some of your points:
map/reduce solves a specific problem in data warehousing - column based lookups given specific rules, able to be broken down into atomics and performed in massive parallel. this allows for very cheap horizontal scaling over a large dataset.
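A single-process sketch of the map/reduce shape may help here. It is plain Python, not CouchDB's view API (CouchDB views are normally written in JavaScript), and the function names, document fields, and sample data are assumptions for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_doc(doc):
    # Emit (key, value) pairs for one document, independent of all others.
    # That independence is what lets the map step run in parallel.
    yield doc["region"], doc["amount"]


def reduce_values(values):
    # Combine all values emitted under one key into a single result.
    return reduce(lambda a, b: a + b, values, 0)


def map_reduce(docs):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_values(values) for key, values in groups.items()}


# Hypothetical documents.
docs = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
totals = map_reduce(docs)
```

Since addition is associative, the reduce step could also be applied to partial results from separate machines and then combined, which is where the cheap horizontal scaling comes from.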
this just shows ignorance. even just a cursory scan of damien's resume [209.85.173.132] says otherwise.
Re:Normalization doesn't exist to save disk space (Score:5, Informative)
People that haven't done it don't realize how easy it is to end up in that situation. Say, I write reports about people, and Robin writes reports about assets, whose owners are people, and puts a person's name in her table to make it faster. Someone gets married, their name changes, and now Robin's reports are wrong.
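The scenario above is easy to reproduce with Python's built-in `sqlite3` module. The table and column names and the sample person are invented for illustration; the denormalized table copies the owner's name, while the normalized one stores only the person's id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    -- Denormalized: the owner's name is copied into the asset row.
    CREATE TABLE assets_denorm (id INTEGER PRIMARY KEY, label TEXT, owner_name TEXT);
    -- Normalized: the asset row only references the person.
    CREATE TABLE assets_norm (id INTEGER PRIMARY KEY, label TEXT,
                              owner_id INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1, 'Pat Smith');
    INSERT INTO assets_denorm VALUES (1, 'laptop', 'Pat Smith');
    INSERT INTO assets_norm VALUES (1, 'laptop', 1);
""")

# The person's name changes in the one place that owns it.
conn.execute("UPDATE people SET name = 'Pat Jones' WHERE id = 1")

# The copied name is now stale; the joined name is current.
stale = conn.execute("SELECT owner_name FROM assets_denorm").fetchone()[0]
fresh = conn.execute(
    "SELECT p.name FROM assets_norm a JOIN people p ON p.id = a.owner_id"
).fetchone()[0]
```

After the update, `stale` still holds the old name and `fresh` holds the new one, which is exactly how Robin's reports end up wrong.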
Re:Laziness Rules (Score:2, Informative)
He started work on CouchDB in 2005. Prior to that he was a Notes grunt of little significance.
He started at MySQL in 2007.
The point holds.
Re:Berkeley DB is awesome (Score:4, Informative)
For others who are interested in Berkeley-style key-value stores, check out Tokyo Cabinet [sourceforge.net].
Re:Normalization doesn't exist to save disk space (Score:2, Informative)
Re:Laziness Rules (Score:1, Informative)
I presume he said that because SQLite doesn't actually keep track of a column's data type. So there's nothing in the database that explicitly keeps you from writing addresses and blog posts in a column titled "Date of Birth" (which in another DB would explicitly be a date type). At least, that's the only explanation I can think of.
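A quick check with Python's built-in `sqlite3` module shows the behavior being described. Strictly speaking, SQLite does record the declared column type and uses it as a "type affinity", but by default it does not reject values that fail to match. The table and column names below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The column is declared DATE, but SQLite treats that as a hint, not a rule.
conn.execute("CREATE TABLE people (name TEXT, date_of_birth DATE)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("Pat", "not a date at all"))

# typeof() reports the storage class actually used for the stored value.
value, storage = conn.execute(
    "SELECT date_of_birth, typeof(date_of_birth) FROM people"
).fetchone()
```

The insert succeeds, and `typeof()` reports that the "date" was stored as text, so any validation has to live in the application.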
Re:Normalization doesn't exist to save disk space (Score:2, Informative)