"Slacker DBs" vs. Old-Guard DBs 267

snydeq writes "Non-relational upstarts — tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model' — have grabbed attention in large part because they willfully ignore many of the rules that codify the hard lessons learned by the old database masters. Doing away with JOINs and introducing phrases like 'eventual consistency,' these 'slacker DBs' offer greater simplicity and improved means of storing data for Web apps, yet remain toys in the eyes of old guard DB admins. 'This distinction between immediate and eventual consistency is deeply philosophical and depends on how important the data happens to be,' writes InfoWorld's Peter Wayner, who let down his old-guard leanings and tested slacker DBs — Amazon SimpleDB, Apache CouchDB, Google App Engine, and Persevere — to see how they are affecting the evolution of modern IT."
  • by FlashBuster3000 ( 319616 ) on Tuesday March 24, 2009 @01:34PM (#27315597) Homepage

    FTA: "The world won't end if some snarky, anonymous comment on Slashdot disappears."
    What? Nothing more important than anonymous slashdot trolls to moderate :/

  • by TheSpoom ( 715771 ) * <slashdot AT uberm00 DOT net> on Tuesday March 24, 2009 @01:34PM (#27315603) Homepage Journal

    Is it just me or did this article go out of its way to insult people who use "traditional" RDBMSs?

    I mean, I'm well versed in SQL and data consistency et al, but I'm still more than willing to consider new technologies. What the hell?

    • I read it exactly the other way, that they were slagging on the newcomers in favor of us old fogies (PostgreSQL FTW!).

    • It did. Are you one of those fossilised old farts who insists on using a remote control as a remote control? [slashdot.org]
    • Wayner's usually a good writer, and did some good theoretical-computer-science work back in the day, but this article was too short to answer the questions he asks at the beginning, and he mostly highlighted the new shiny things from big ASPs, which is generally what Infoworld wants.

      I'm particularly disappointed that while he referred to the name and history of Berkeley DB, aka Sleepycat, aka Oracle Renamed-foo, he didn't actually talk about using it. (OTOH, Infoworld did review one version of it in 2005 [infoworld.com].)

      • Re: (Score:3, Insightful)

        by petermgreen ( 876956 )

        The thing that always puzzled me about BerkeleyDB is its incessant format breakage requiring dumps and restores.

        On a database server, at least, data upgrading can be handled centrally, but on a file-based DB, where datafiles can be scattered anywhere, a lack of a stable data format seems like a fatal flaw.

    • Cloud zealots (Google employees, Amazon EC2 developers, Wall St. analysts trying to attach their name to the "Next Big Thing", etc.) tend to have an insulting tone to their rhetoric.

    • by Sarusa ( 104047 )

      It went out of its way to be irreverent to everyone: 'The new twerps really get those codgers steamed when they talk about how all of the computers in the cluster will get around to replicating the data' is a playful slap at both sides. And just like the PS3 Fanboys/XBots, if you identify too much with one of the sides you will nod your head knowingly at one and gasp and fan yourself as you suffer an attack of the vapours of offense at the other.

  • by qoncept ( 599709 ) on Tuesday March 24, 2009 @01:35PM (#27315613) Homepage

    Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...

    You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

    And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.
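
    To make that concrete, here's a minimal sketch of the normalized version being described, using Python's sqlite3 with made-up table names (not the actual warehouse schema):

        # Hypothetical sketch: a free-text status column replaced by a
        # normalized integer foreign key into a small lookup table.
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE status (
                id   INTEGER PRIMARY KEY,
                code TEXT UNIQUE NOT NULL        -- 'OPEN', 'CLOSED', ...
            );
            CREATE TABLE ticket (
                id        INTEGER PRIMARY KEY,
                status_id INTEGER NOT NULL REFERENCES status(id)
            );
            CREATE INDEX ticket_status ON ticket(status_id);
        """)
        con.execute("INSERT INTO status (code) VALUES ('OPEN'), ('CLOSED')")
        con.execute("INSERT INTO ticket (status_id) VALUES (1)")

        # The filter now compares small indexed integers instead of
        # scanning a CHAR(50) column on every row.
        print(con.execute("""
            SELECT t.id FROM ticket t
            JOIN status s ON s.id = t.status_id
            WHERE s.code = 'OPEN'
        """).fetchall())                         # [(1,)]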

    • Couldn't you index a char just as easily as you could an int? Or are you saying their status codes are strings?

      • Re: (Score:3, Informative)

        by qoncept ( 599709 )
        "CHAR(50)"

        Oracle doesn't have a "string" datatype.
        • Re: (Score:3, Informative)

          by TheSpoom ( 715771 ) *

          Ah, my apologies. Really, it should be an indexed enum (or whatever Oracle equivalent there is... it's been a while since I used it) if there's no additional data to go along with the status code... or another table if there is additional data.

          • Re: (Score:3, Interesting)

            by qoncept ( 599709 )
            My point exactly. :) There are a lot of things our data warehouse should be, and it's not. We're working on redesigning it now, though, so we should be resolving a lot of the issues. But most people aren't just about to redesign their databases, because it's a huge deal. We have 8 different apps using the warehouse, hundreds of reports, and people hitting it we don't even know about, all of which will become obsolete. The cost to redesign is huge, and we only have the opportunity now because a project it is dependent on
    • You don't want to store the same data in multiple places.

      But if one of them is wrong, you can check the others and correct it.

      My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".

      • by mooingyak ( 720677 ) on Tuesday March 24, 2009 @02:12PM (#27316165)

        But if one of them is wrong, you can check the others and correct it.

        My boss - a lead senior senior lead developer from Android Whorehouse & Douche - several years back, when I tried explaining "why I'd missed some fields out of one of the tables".

        I was about to post something explaining to you why that's bad, and then I reread your post and the whooshing noise around me quieted down.

        • To be fair, I should have quoted the first bit. Maybe that's why I never made it to "lead senior senior lead developer".
      • by qoncept ( 599709 ) on Tuesday March 24, 2009 @02:25PM (#27316379) Homepage
        Right, and, boss, which one is right?

        People that haven't done it don't realize how easy it is to end up in that situation. Say, I write reports about people, and Robin writes reports about assets, whose owners are people, and puts a person's name in her table to make it faster. Someone gets married, their name changes, and now Robin's reports are wrong.
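
        A toy demonstration of that anomaly (made-up names, Python's sqlite3):

            # A denormalized copy of the owner's name goes stale on update;
            # a foreign-key reference stays consistent.
            import sqlite3

            con = sqlite3.connect(":memory:")
            con.executescript("""
                CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
                CREATE TABLE asset_copy (id INTEGER PRIMARY KEY, owner_name TEXT);
                CREATE TABLE asset_ref (id INTEGER PRIMARY KEY,
                                        owner_id INTEGER REFERENCES person(id));
            """)
            con.execute("INSERT INTO person VALUES (1, 'Robin Smith')")
            con.execute("INSERT INTO asset_copy VALUES (1, 'Robin Smith')")
            con.execute("INSERT INTO asset_ref VALUES (1, 1)")

            # The owner gets married; only the person row is updated.
            con.execute("UPDATE person SET name = 'Robin Jones' WHERE id = 1")

            print(con.execute("SELECT owner_name FROM asset_copy").fetchone())
            # ('Robin Smith',)  <- the denormalized copy is now silently wrong
            print(con.execute("""SELECT p.name FROM asset_ref a
                                 JOIN person p ON p.id = a.owner_id""").fetchone())
            # ('Robin Jones',)  <- the normalized reference stays correct
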
        • Re: (Score:3, Funny)

          by Hognoxious ( 631665 )

          Total failure to understand the situation. You, I mean he, didn't understand the concept of "factoring out" common information - say, the customer details on an order - from the variable per item data - product code, quantity.

          What he, er, you appear to be talking about is natural vs surrogate keys.

    • And, uh, I have the pleasure of working now with a huge data warehouse that hasn't normalized status codes, so instead of quickly searching for an integer, the queries run slow as hell scanning char fields. It's not good.

      What the hell does that have to do with schema normalization? Normalization has to do with how you architect your tables and relations. The types you use for the columns, and how you standardize their values, is an entirely different, though somewhat related, discussion.

      And as it happens,

      • He probably means standardised rather than normalised, but I'm guessing it's pretty hard doing joins where the common field has different values meaning the same thing, or same values meaning different things.
        • If that's true, then his comments have nothing to do with the quoted section about how, "Now that disk space is so cheap and many of the data models don't benefit as much from normalization, ...". Because that comment is *specifically* about schema normalization in the formal sense.

    • You don't want to store the same data in multiple places. Your query might run faster, but your data integrity is going to suck.

      I wonder if normalization is going to be less important as de-duplication [sun.com] gets integrated into more file systems.

      It's going to be a part of ZFS later this year I think. It looks like it's going to be a block level implementation within ZFS rather than file level. Since ZFS uses a copy on write model it seems fairly easy for them to implement compared to other file systems.

      Databases that work on top of existing filesystems could benefit from this whereas databases that use block level addressing (Oracle) may

      • Then all you have to do is make sure that all your duplicated data is exactly 1 disk block in length. That's got to be easier than just not storing it twice in the first place, or adding more disks.
    • A short char field isn't necessarily slower than an integer, though. Right? They could both be indexed with log(n) search time.

  • by Anonymous Coward on Tuesday March 24, 2009 @01:35PM (#27315615)

    Like the article says, "The world won't end if some snarky, anonymous comment on Slashdot disappears."

  • Laziness Rules (Score:5, Insightful)

    by ergo98 ( 9391 ) on Tuesday March 24, 2009 @01:37PM (#27315651) Homepage Journal

    Slacker DBs like CouchDB and SimpleDB have taken off for the simple reason that most developers have absolutely mediocre database knowledge or skills, and rather than learning, it's easier to just wave it all off as obsolete.

    It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built-up knowledge was simply ignored while someone invented their own; it's about as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theory.

    • Re: (Score:3, Interesting)

      ... and rather than learning it's just as easy to just wave it all off as obsolete.

      I don't know about that. But maybe these slacker DBs are perfect for what they're doing? Glancing at those mentioned in the FA, it just looks like they're simple tools to do simple things.

      Don't get me wrong. I once had the pleasure of working with an Oracle god. This dude was about to take his final Oracle exam in a series of exams and he turned my Join that took ten seconds into a Join that took less than a thousandth. I have no idea what he did to this day, but it took several lines of PL/SQL. We were

      • Re:Laziness Rules (Score:4, Interesting)

        by KagatoLNX ( 141673 ) <.kagato. .at. .souja.net.> on Tuesday March 24, 2009 @02:01PM (#27315973) Homepage

        In the end, the problem is that people just want a "default tool". They don't want to think about their requirements for data consistency. The really scary bit is that while RDBMSes are the "default tool" of yesterday and slacker DBs are the "default tool" of tomorrow, neither of them is really the "problem".

        The "default tool" attitude IS the problem. Unless you carefully weigh your data consistency requirements, you shouldn't be making that call at all.

        I welcome the slackers and all of their new options along the spectrum of speed versus consistency. It's just that most of the people developing applications scare the shit out of me. They're so cavalier (or should I say, "agile", or maybe "pragmatic") about requirements that it's truly disturbing.

        That said, if you're really interested in all of the options, I also recommend checking out memcachedb, memcacheq, and redis.

      • Re: (Score:3, Insightful)

        by phoenix321 ( 734987 ) *

        Problem is, you're re-inventing the wheel several times over in the process. Hint: "a flatfile and maybe a little more" could very well end up re-implementing all the storage technology we have today, only a few years down the road.

        At first, all you need is to store key:value pairs. That works with a flat file or with Oracle. Then you need some consistency checks, which can be modelled fast in Oracle or reasonably fast in your software. Then you need some triggers, which could be written fast in Oracle and not-so fast in y
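
        To illustrate, the whole "flatfile and maybe a little more" starting point is a few lines (a made-up sketch; everything past get/put has to be bolted on by hand):

            # Naive key:value storage on a flat file. Note what is missing:
            # no locking, no transactions, so two writers can lose updates.
            import json, os

            PATH = "store.json"                  # hypothetical data file

            def _load():
                return json.load(open(PATH)) if os.path.exists(PATH) else {}

            def put(key, value):
                data = _load()
                data[key] = value
                with open(PATH, "w") as f:
                    json.dump(data, f)

            def get(key):
                return _load().get(key)

            put("user:42", {"name": "alice"})
            print(get("user:42"))                # {'name': 'alice'}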

        • Re: (Score:3, Insightful)

          by sl0ppy ( 454532 )

          Everything else will require at least a medium rewrite at some point when you switch over to a real database. You could of course extend everything upon a glorified flatfile until your reinvented wheels strangle all your progress.

          not really. i think that you (and, unfortunately, the FA) are missing the point that the map and reduce functionality, while powerful, have one major advantage: scalability. simply put, a query can be, by definition of the map function, broken up into several discrete operation
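
          a toy illustration of that decomposition (plain python, nothing couchdb-specific):

              # Each partition computes its piece of the query locally ("map"),
              # and the small partial results merge at the end ("reduce").
              from collections import Counter
              from functools import reduce

              partitions = [               # pretend each list lives on its own node
                  [{"status": "open"}, {"status": "closed"}],
                  [{"status": "open"}, {"status": "open"}],
              ]

              def map_partition(docs):     # runs independently per node
                  return Counter(d["status"] for d in docs)

              print(reduce(lambda a, b: a + b, map(map_partition, partitions)))
              # Counter({'open': 3, 'closed': 1})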

      • Re:Laziness Rules (Score:5, Insightful)

        by Ambiguous Puzuma ( 1134017 ) on Tuesday March 24, 2009 @02:12PM (#27316161)

        If you want "a little more" than a simple flat file, perhaps SQLite [sqlite.org] is the answer? The people on the Firefox team seem to think so, for example.

        SQLite has been a pleasure to use for a small personal project involving a few Perl scripts. Granted my background is with SQL Server and Oracle, so perhaps I'm not the target audience, but I found it extremely easy to use and surprisingly efficient--and I didn't need to set up a server or anything. I didn't even need to explicitly create a database!
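
        For instance, this is roughly the whole setup (a minimal sketch; the file name is made up):

            # sqlite3.connect() creates the database file on first use;
            # there is no server to install, start, or configure.
            import sqlite3

            con = sqlite3.connect("scratch.db")  # file appears automatically
            con.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
            con.execute("INSERT INTO notes VALUES ('hello')")
            con.commit()
            print(con.execute("SELECT body FROM notes").fetchall())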

        • Re: (Score:2, Insightful)

          by Trifthen ( 40989 )

          That's what I don't quite understand about all this. It's been the case for a while now that:

          1. If you want a full RDBMS, use Oracle, or PostgreSQL, or a similar ACID + SQL92 compliant DB.
          2. If you don't really care, use MySQL.
          3. If you want ridiculous speed, and actively hate your data, use SQLite.
          4. If you have one file, or maybe two, use BerkeleyDB or similar.
          5. Flat files are fine for config.

          I'm not sure we need yet another category here. Then again, we're now seeing things surfacing like database shard [codefutures.com]

          • Re: (Score:3, Insightful)

            by tepples ( 727027 )

            If you want ridiculous speed, and actively hate your data, use SQLite.

            Care to explain why SQLite requires one to "actively hate [one's] data"?

      • For the most part, the overhead of running a real DB is usually made up over time.

        Small apps tend to grow into big ones over time. Baby databases can become a stumbling block for your application, as well as for the organization. They may want to warehouse your application data for making better business decisions and integration across apps. So your little 1 million record database will need to be integrated into a billion/trillion record database.

      • by samkass ( 174571 )

        The fundamental problem of enforced consistency with our system is that it requires hard locking, and that is at odds with distributed scalability. When you're going over a few satellite hops and you've got users collaborating deeply in real time (i.e., they're all heavy writers) with >2s latency, a traditional DB isn't going to scale very well. Even the Facebook MySQL+memcached combination is going to break down in that environment.

    • Re:Laziness Rules (Score:5, Informative)

      by metalhed77 ( 250273 ) <andrewvc@gmail . c om> on Tuesday March 24, 2009 @02:02PM (#27316003) Homepage

      Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...

      • Re: (Score:3, Interesting)

        by ergo98 ( 9391 )

        I'm just going on the statements he made about his own (lack of) knowledge in this video [infoq.com].

      • by Anonymous Coward on Tuesday March 24, 2009 @02:19PM (#27316275)

        Thanks for validating the OP comments....

      • Re:Laziness Rules (Score:4, Insightful)

        by diamondsw ( 685967 ) on Tuesday March 24, 2009 @02:25PM (#27316365)

        >Damien Katz, CouchDB's creator ... worked on Lotus Notes prior to that...

        That's not exactly a ringing endorsement.

        • After all, who wouldn't mind an application that says you need to restart your computer because the application crashed? A text pushing application at that....

      • Re: (Score:2, Informative)

        by Anonymous Coward

        Damien Katz, CouchDB's creator, worked at MySQL prior to writing CouchDB, and worked on Lotus Notes prior to that...

        He started work on CouchDB in 2005. Prior to that he was a Notes grunt of little significance.

        He started at MySQL in 2007.

        The point holds.

    • ...especially when you don't know what "better" is and you're too lazy to learn: unwillingness to learn is stupidity. Like the quote says, ignorance is curable; stupidity is terminal.

      People who use these things and think they're great and that they're doing amazing things don't realize that the time they're taking and the problems they're struggling with are long-solved trivialities. Nothing new. Nothing cool. It's like someone struggling with a bunch of complicated excel formulas to make their spreadsh

    • Re: (Score:3, Informative)

      by sl0ppy ( 454532 )

      first some context. i architect data warehouses for a living. i also live in a world of building fairly specialized frameworks to deal with data warehouses architected as star and snowflake schemas. i tend to spend quite a lot of time in pseudo-relational databases [wikipedia.org] that don't fully implement codd's rules [wikipedia.org].

      for fun, i like to spend some time toying with couchdb, using it for loose data warehousing, extending it, and generally enjoying the application development freedom it gives me.

      that said, let me respond to

    • It's no surprise that the creator of CouchDB, for instance, hadn't a clue about databases when he began his project. All of that built up knowledge just ignored while someone invented their own, and it's as rational as rolling your own encryption from scratch without the slightest clue about encryption algorithms or theories.

      It's funny how some people react by attributing ignorance to others when confronted with things they themselves don't understand.

    • Well, even something that's based off of at-the-time sound principles can end up being a mess.

      Take, for instance, a product called FileMaker. It's a product with a long software lineage; its origins were FoxPro, way back when. I don't know how it performed back then, or how it was designed, but now it's got a massive WYSIWYG themeable 'frontend' to make a custom application, and the database is not directly accessible by the designer (just logical containers). It probably can be normalized, to some degr

  • by oldhack ( 1037484 ) on Tuesday March 24, 2009 @01:37PM (#27315657)

    Either is cool with me, as long as they're cool and take care of business, you know what I'm saying?

    It's all good.

  • a base of data (Score:5, Insightful)

    by poot_rootbeer ( 188613 ) on Tuesday March 24, 2009 @01:44PM (#27315743)

    "tools that tack the letters 'db' onto a 'pile of code that breaks with the traditional relational model"'

    If "database" were intended to mean only "relational database", we wouldn't have had any need for the latter term...

    • There's more to it than that. If I make a wrapper for a text file that lets me find and delete rows that's not really a database. (It only becomes a database when I call it TextDB and package it with an AJAX API)
  • by alen ( 225700 ) on Tuesday March 24, 2009 @01:47PM (#27315781)

    the article is right that in some cases it doesn't matter if a transaction is lost. but in any case where money is involved it's a must. you can't just start a fund from your Oracle or SQL Server savings to pay for mistakes because it will kill your brand and you may lose a lot of future business. and any savings will be eaten up by the extra cost to hire people to solve all the data problems

    i've seen this. no constraints on the data that is originally put in, not enough referential integrity, and you get customers opening up a lot of trouble tickets, and you end up hiring people to clean up the data every time a mistake is found

    • i've seen this. no constraints on the data that is originally put in, not enough referential integrity, and you get customers opening up a lot of trouble tickets, and you end up hiring people to clean up the data every time a mistake is found

      Really not trying to troll here, but this isn't too far from what a lot of people are dealing with when they use MySQL, especially the MyISAM engine.

      A lot of people are using MySQL so it's just another step in the same direction.

      In some projects, RDBMSes aren't necessary. Look at what Google's been able to do with Bigtable/MapReduce. The open source equivalent seems to be Apache's HBase [apache.org] in the Hadoop project.

  • by thanasakis ( 225405 ) on Tuesday March 24, 2009 @01:52PM (#27315851)

    The problem of distributed consistency has kept researchers occupied for quite a while. For example, see project Scalaris [onscale.de]. They are using a distributed hash table to distribute data among many nodes. This should be relatively easy, at least once you have a good hashing function on your hands. But a lot of research has been done on P2P networks during the last decade, so there is quite a lot of stuff to read and take ideas from.
    The interesting part is that it can maintain consistency and support ACID properties. From the site it appears that they accomplish that by using a modified Paxos Algorithm [wikipedia.org] which basically is a way to maintain consensus among many different peers in a non-Byzantine system (this means that there are no malevolent peers in the system -- peers can break down and cease working but not sabotage the system). Leslie Lamport [lamport.org] of Microsoft Research has done a lot of work on this, anyone interested may take a look at his papers, very advanced stuff there.
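
    As a rough illustration of the DHT half of that design (generic consistent hashing in Python; Scalaris's actual implementation differs):

        # Keys hash onto a ring of nodes, so data spreads evenly and only
        # a small slice moves when a node joins or leaves.
        import bisect, hashlib

        def h(s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        class Ring:
            def __init__(self, nodes, replicas=100):
                self.points = sorted((h("%s:%d" % (n, i)), n)
                                     for n in nodes for i in range(replicas))
                self.hashes = [p for p, _ in self.points]

            def node_for(self, key):
                i = bisect.bisect(self.hashes, h(key)) % len(self.points)
                return self.points[i][1]

        ring = Ring(["node-a", "node-b", "node-c"])
        for k in ("user:1", "user:2", "user:3"):
            print(k, "->", ring.node_for(k))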

  • Seriously, any old-guard DBA will put MySQL in the toy category.
    • by dacut ( 243842 ) on Tuesday March 24, 2009 @02:18PM (#27316257)

      MySQL strives to provide RDBMS and ACID semantics, though its quality of service (QoS) may fall short. By contrast, these "slacker" databases don't even try to support RDBMS or ACID; even if they operated perfectly, they still wouldn't provide RDBMS/ACID semantics.

      I work for one of the companies in question (no, I don't speak for them). We rely heavily on a combination of these "slacker" dbs, Berkeley dbs, memcached, Oracle, flat files, and tape backups. Each fills a niche. I wish these articles would quit trying to create a false dichotomy.

  • I wrote an article about non-relational databases, and there were some interesting comments about the various tradeoffs etc.: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/ [metabrew.com]
  • by www.sorehands.com ( 142825 ) on Tuesday March 24, 2009 @02:00PM (#27315961) Homepage

    Relational DB? People forget Network Model Databases (http://en.wikipedia.org/wiki/Network_model) and flat databases.

    Network model databases will outperform relational all the time. You just don't have the same flexibility.

    Newer models are not based on the design or performance issue, but the distribution of the data. These are not invalid reasons, but the old issues still apply.

    I have had arguments with people who consider PC programming different from mainframe. The same rules apply. The difference is that many PC programmers are just sloppier. When you have cheap CPU and memory, people don't analyze and optimize as much.

    • by hey ( 83763 )

      That network model looks useful. Too bad there don't seem to be any readily available implementations to try out.

  • I've never understood the UNIX world's fascination with relational databases.

    Speaking as a programmer in mainframe online transaction environments for the past 20+ years, I've become very familiar with very fast and simple database systems like the "freespace" files we use on the Unisys mainframe platform.

    We don't need relations for real-time processing. Most programs just need a place to keep data, and a simple key to retrieve that data. Some efficiency in disk usage is nice, but the primary design factor is performance.

    A freespace file is a collection of pre-allocated fixed-length records of various sizes (e.g. 256 bytes, 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, and 8192 bytes). Each record size is assigned a type number (e.g., 1 through 6 in the above case), and a given file is created and pre-allocated with a mix of various records depending on the usage pattern for that particular file. If you know all you need is tiny records, create a file containing a few hundred or thousand type 1 and maybe type 2 records.

    Records not allocated are filled with a deallocated fill pattern.

    A program uses a record by performing a Write New operation. That tells the database manager to find a record in that file closest and >= to the size required, stick the presented buffer in the record, save it, and return a key to that record to the calling program. Typical key format is RECORDTYPE/FILENUMBER/RECORDNUMBER, where Record Number is a number from 1 ... n. If your file has 1000 Type 3 records, it'd be from 1...1000 or 0...999.

    To read a record, use a key from a previous Write New (stored away somewhere, perhaps in another file) to read that record from a file. Length is not required.

    Programs use a very simple read-and-lock mechanism when modifying existing records. If one program has a record locked, another program must wait. Not a problem with intelligent coding.

    We've used this system in airline systems for 40+ years. It works well. Sometimes an environment has robust commit and rollback/recovery features to allow for an entire series of changes to be rolled back on error, sometimes not. It doesn't seem to matter that much, especially for transient data like weather, flight schedule data, etc.

    I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)
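
    For anyone curious, here is my rough reading of the scheme as an in-memory Python sketch (record sizes and key layout as described above; an illustration only, not Unisys code):

        # Pools of pre-allocated fixed-size records, a Write New that
        # grabs the smallest free record that fits, and a
        # (type, file, record-number) key handed back to the caller.
        RECORD_SIZES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096, 6: 8192}
        FILL = b"\xde"                       # "deallocated" fill pattern

        class FreespaceFile:
            def __init__(self, file_no, counts):   # e.g. counts={1: 1000}
                self.file_no = file_no
                self.pools = {t: [bytearray(FILL * RECORD_SIZES[t])
                                  for _ in range(n)]
                              for t, n in counts.items()}
                self.free = {t: list(range(n)) for t, n in counts.items()}

            def write_new(self, data):
                # Find a free record closest to and >= the size required.
                for t in sorted(self.pools):
                    if RECORD_SIZES[t] >= len(data) and self.free[t]:
                        rec_no = self.free[t].pop(0)
                        self.pools[t][rec_no][:len(data)] = data
                        return (t, self.file_no, rec_no + 1)   # 1-based key
                raise IOError("no free record large enough")

            def read(self, key):
                # Length is not required; the full fixed-size record returns.
                t, _, rec_no = key
                return bytes(self.pools[t][rec_no - 1])

        f = FreespaceFile(127, {1: 1000})
        key = f.write_new(b"flight schedule data")
        print(key)                           # (1, 127, 1)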

    • Oops. Forgot that brackets get eaten. Typical record format is RECORDTYPE/FILENUMBER/RECORDNUMBER. The first Type 1 record for File 100 might look like 01-0127-0001 or whatever (specific binary representation in hex or octal would obviously vary depending on implementation and preference).

      In our case, it's a 36-bit word shown as 12 octal digits, probably not a popular choice with UNIX folks. :-)

      • 01-0127-0001 is the first type 1 record for file 127. 01-0100-0001 would be the first for file 100. That's what I get for doing patchwork re-editing an existing message before sending it...

      • Unix hackers are traditionally fine with octal, as long as you don't try to fit a whole digit in it, though I've generally found hex more useful. And as far as 36-bit words go, I know one local Unix hacker who has a PDP-10 in his garage. (Not sure if it's still there, and it might have been a -20 instead.) I don't think my wife's copy of "Meet Macro-10" survived our mid-90s move, and when I took a compiler course at that school, I decided to use the still-clumsy-at-the-time Amdahl mainframe Unix system at

    • How does it work for searching though? If I just have my "freespace" file and my pointers to records, does a search for some piece of user requested data have to hit every record or is there a hash somewhere for the data contained in the record? You don't mention it in your description.

      It seems that the biggest advantage of a relational DB is that the syntax for accessing it is well known: SQL. It has a human-readable interface, and while sometimes wonky to work with for complex operations, it provides the simplest cross-platform way to access data. I don't need to know which data blocks hold the data, I just ask the database for them: "SELECT slashdotid, name FROM users WHERE slashdotid < 20000" ... and I get rows of data.

      Could I just read it from a file? Yes. Would it be simpler? Maybe. But what if I have 200,001 records? Then I have to do some magic sorting in my program, and I have to manage memory for them, and disk space, etc. It is simpler to let the DB handle that mess, and I just ask for the data I need.

      It breaks up the process of programming into data storage and data manipulation/presentation. DB's for storage, my bad python for manipulation and presentation.

      --Donald

      • by dcowart ( 13321 )

        In Re: to my Re:, I like sqlite for simple DB applications, I get DB functionality with a very low overhead. Otherwise I use postgresql.

        I have used Oracle and some others before now, but those are my two current DB's (sql-engines?) of choice.

    • Re: (Score:3, Insightful)

      by LWATCDR ( 28044 )

      Okay, how do you find the data without a record number? I can see the value of the system, but it also seems very inflexible.
      I do agree that way too many programmers use MySQL as a file system, flat files, configs, and goodness knows what else.

    • I would LOVE to see a freespace database ported to Solaris, personally. We'd use it heavily. :-)

      Sounds like a great open source project so why not start working on that? If you want it badly and would use it heavily and yet you cannot be bothered to do the work of porting one, writing one, or paying someone else to do it then why bother complaining about it?

    • My guess is that part of the reason is historical - RDBMSs were coming out around the time Unix machines were, and both could be used by small departments as opposed to mainframe production shops.

      They're also an extension of the native Unix toolsets, which were flat files with tab-or-comma-separated columns of data, so anybody who learned Unix in its first couple of decades generally had the expectation that you could do ad-hoc queries and build tools to automate them, without needing to spend 6-12 months n

  • Harsh? (Score:3, Insightful)

    by Bobb Sledd ( 307434 ) on Tuesday March 24, 2009 @02:09PM (#27316115) Homepage

    I'm a DB admin, and I use things that aren't toys; but what I've heard here is kinda harsh.

    Look, it's all about "right tool for the right job." Why do you need a nuclear-powered drill that can make a tunnel from here to China, when really all you needed was a shovel?

    For most daily projects that have small amounts of data, they may be using something like Crystal Reports or Excel or SPSS that just does all the number-crunching client-side anyway. You don't always need Oracle or [favorite DB flavor] for that.

  • I feel old (Score:2, Informative)

    by a2wflc ( 705508 )

    When I saw the title I thought "I'm old-guard". Then I read the article and JOINs are a key concept to the old-guard.

    My first few DB apps involved using a b-tree or ISAM library (or writing our own). Then the "new guys" started wanting to pay for a server that did JOINs. We did JOINs, just at the app layer and without the guaranteed consistency that a good relational design gives you. And getting a server that did it was expensive.

    I wouldn't want to go back to pre-relational server days, but am also ver

    • by __aasqbs9791 ( 1402899 ) on Tuesday March 24, 2009 @02:25PM (#27316377)

      I was listening to the radio (didn't pay attention to the station it was on) one day and generally liking the music I was hearing. Then the station ID came across between songs. It was the "oldies" station. I suddenly felt like I needed a cane (or perhaps a walker). Why does that happen? And is it going to happen every 10 years or so? I don't think I can take too many more of those moments.

      • It turns out that there actually _are_ neurological reasons that music from your teenage years is extra-evocative, just as language-learning works better with young kids. Go read "This is Your Brain on Music" for more details.

        A certain amount of music sensitivity appears to be hardwired into our brains, and the extra hormones after puberty increase music-remembering ability and the emotional aspects of it that younger kids don't have as much of. There's also a lot of intellectual development going on in t

    • Not sure what platform you were using or what years (lots of things had b-trees, though ISAM tended to be on IBM machines), but Unix V7 had a join command, which worked on the canonical tab-delimited ascii flat files that most Unix tools did, and PDP-11s weren't that expensive.

      I last used it in the early 90s; I'd prototyped an application in Informix, but my department was too cheap to buy enough licensed copies for production use. You had to sort your data for the join to work, but that also meant you cou

  • by IGnatius T Foobar ( 4328 ) on Tuesday March 24, 2009 @02:18PM (#27316263) Homepage Journal
    I can't believe there hasn't been any mention of Berkeley DB [oracle.com] yet. Guess what, folks: sometimes you just don't need the features of a full relational database. Sometimes all you need is fast, robust, reliable storage of indexed key/value pairs.

    I can attest that Berkeley DB does exactly that, and does it really, really well. We use Berkeley DB for all of the data storage in the Citadel [citadel.org] system, including the mailboxes themselves. Some sites have tens of gigabytes or even hundreds of gigabytes of data, and Berkeley DB just keeps chugging along, happily and reliably doing its thing. Our biggest problem? People who point at it and say "storing email in a database is unreliable" because they know it constantly explodes when Exchange does it. Well guess what, folks: Berkeley DB ain't the Exchange database (actually, maybe Exchange wouldn't be so unreliable if they switched to Berkeley DB).

    Eschewing the full set of RDBMS features isn't slacking. It's choosing the right tool for the job.
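
    For a feel of that model, Python's stdlib dbm module exposes the same key/value API family (a stand-in here; it is not Berkeley DB itself):

        # Stdlib dbm as a stand-in for the key/value model: fast
        # get/put-by-key storage with no relational machinery at all.
        import dbm

        with dbm.open("mailstore", "c") as db:   # "c": create if missing
            db[b"mbox:alice:0001"] = b"From: bob\n\nhello"
            print(db[b"mbox:alice:0001"])        # indexed lookup by key
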
  • Non-normalized databases are fine, and might be faster, for small sites, but when things scale, the sloppy databases (or worse, sloppy frameworks like Ruby's Active Record) just cause problems.

    A scalable, normalized database means consistent data, when you have multiple applications hitting it.

    For a web forum, sure, a relational database may be the wrong tool, because all you care about is speed on new stuff, the archive can crawl, etc.

    However, what happens when your web forum adds some actual data, and the

  • For the vast majority of web applications, the "key-value pair" class of databases work fine.

    I think the real problem is that the "relational database weenies" look down on the key-value pair databases, and there are a lot of non-DB-weenies out there who like using true relational databases as nothing more than key-value pair. It degenerates into name calling, instead of getting the job done, pretty fast.

    • The problem is that the programmers think they own the data when they don't. Your app will come and go and the data will remain. People will want to query that data, report on that data, or even transfer it into other databases. Database people think beyond the current requirements of your particular app.
  • by Prototerm ( 762512 ) on Tuesday March 24, 2009 @02:45PM (#27316687)

    You may have seen in the news recently how in the last decade or so Wall Street ignored some of the hard-won regulations and guidelines developed in the wake of the Great Depression.

    We all know what happened as a result.

    The same is true when dealing with data. You don't ignore the rules completely, or follow them only when you feel like it, or when you have time. As the old joke goes, Quality is *not* Job 1.1.

    If the data isn't important enough to store correctly, then it's not important enough to be stored at all.

  • by Thaelon ( 250687 ) on Tuesday March 24, 2009 @02:45PM (#27316699)

    Databases at a very abstract level are just data structures. Choosing a relational database when you don't need that much functionality is just as wrong as choosing a flat file when you need a database.

    Knowing the ins & outs of your data structures is still a vital skill of programming.

  • by plopez ( 54068 ) on Tuesday March 24, 2009 @02:50PM (#27316755) Journal

    so you start a small project, "we just need a few hundred/thousand records, a few key value links and the occasional transaction". so you start with a slacker DB. A slacker DB far too often implies a slacker hack software d00d.

    Then it grows. Instead of educating themselves (Q: what's the difference between those who can't read and those who don't? A: nothing.) and finding a better DB solution, they thrash around trying to hack DB functions into their code.

    So they lose consistency etc. Soon they have a polluted DB that breaks all the time. Often they are proud of the heroics of the wasted effort they put into it. A good programmer knows the correct form of lazy: do not reinvent the wheel.

    • by oGMo ( 379 )

      A good programmer knows the correct form of lazy: do not reinvent the wheel.

      YES. Good lazy is "I shouldn't have to do all this work; either use someone else's, or make the computer do it for me." Bad lazy is "whine, I don't want to figure anything out, I just want to get it done." The difference is crucial; the first is willing to spend time learning to save unnecessary labor, the latter is willing to do unnecessary labor to save learning. The former is laziness, the latter is stupidity.

  • All toys (Score:2, Interesting)

    by zig43 ( 1422373 )
    Every database covered in the article is a toy.

    From TFA: "The problem is that JOINs are really, really slow when the data is spread out over several machines."

    This is the result of a poor design, not a database flaw. If you are running a web application against multiple databases, either cluster them or store all the data for a user in one database. (i.e. hash the login_id and select the database based on the result). If someone is doing JOINs across multiple machines and doesn't have a very good rea
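
    A sketch of that hashing scheme (hypothetical host names, Python):

        # Hash the login_id and pick one database from a fixed pool, so
        # every query for a user stays on a single machine and no
        # cross-machine JOIN ever runs.
        import hashlib

        DATABASES = ["db0.example.com", "db1.example.com", "db2.example.com"]

        def database_for(login_id):
            digest = hashlib.sha1(login_id.encode()).hexdigest()
            return DATABASES[int(digest, 16) % len(DATABASES)]

        print(database_for("some_user"))   # same user always maps to same DB

    The catch with plain modulo hashing is that growing the pool reshuffles nearly every user, which is one reason consistent hashing (see the ring sketch earlier in the thread) exists.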

  • Fine, cobble together some semblance of data storage using Notepad, Access, abacuses, whatever. If, heaven forbid, these "startups" ever take hold and gain any significant size, this "new model" will break, and I can't even imagine the hell it would be to merge the "new model" into a classical RDBMS.

    Sorry kids, you've bitten off more than you can chew; you should have stayed in school and actually attended a class in DB modelling. Good luck with this "eventual consistency", you'll need it.
  • FTA:

    The field was surprisingly diverse despite the fact that the offerings are so stripped down that they really don't have more than three major commands: Insert, Update, and Delete.

    There's a write-only database now?

  • The term "old-school" in this context makes me laugh. Back in the days when air was clean and sex was dirty, "relational" databases were considered a resource hog and were shunned by competent programmers. The fastest and most efficient databases were the "network" databases, but they also required the most work and the trickiest coding. Right in the middle were the "hierarchal" databases. Many programmers avoided the database problem by using a "reverse ISAM" arrangement which still used up some extra reso
