The NoSQL Ecosystem

abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'
  • Why worry? (Score:5, Funny)

    by Anonymous Coward on Tuesday November 10, 2009 @01:14AM (#30042488)
    Microsoft Access is here!
    • by MichaelSmith ( 789609 ) on Tuesday November 10, 2009 @01:14AM (#30042494) Homepage Journal

      Don't forget excel!

      • Ah yes, Excel 97 - the days when you could be in a flight simulator and legitimately tell your boss you were crunching numbers.
      • Re:Why worry? (Score:5, Interesting)

        by Anonymous Coward on Tuesday November 10, 2009 @06:53AM (#30043850)
You laugh, but the things I see done in Excel on a daily basis in production environments getting a LOT of work done are a testament to its power. It is one of the best rapid application development platforms in existence. People with no CS background are programming away in a functional style and getting shit done without even realising they are programming. It could be so much better, but it's still best of breed. And yes, I have tried, and seen others try, OO et al. Forget it. Let's not go down that worn old road.
Sadly there is plenty of production code that uses Access databases for things they just shouldn't be used for. At a previous job I actually built several production websites that used Access as the db backend because the client didn't want to use MySQL (open source is scary!) and they didn't want to pay for MSSQL...

      /Mikael

    • Re: (Score:3, Insightful)

      by Linker3000 ( 626634 )

      Oh Great! I have just migrated 5 offices from a veterinary management system based around Access 97 onto the new, MS-SQL-based one.

      How can I expect to maintain my value to the company if they stick with old, reliable systems instead of moving onto more sophisticated 'solutions' that require a shit-load of tweaking and technical guesswork to keep them running smoothly?

      • Re: (Score:3, Funny)

        by Hognoxious ( 631665 )
        Don't be so pessimistic. There's OO databases and the cloud. That should see you almost through to retirement.
MS-Access had some really great features: it could be accessed both with SQL and with a blazingly fast ISAM-style library (fast because it ran almost directly on the OS). I have yet to find anything like it on Linux. SQLite is a file-based database, but why on earth should it parse full-blown SQL at runtime, and why on earth should my program write another program in SQL at runtime just to load some data? Get serious. Parsing and building SQL is just overhead, and especially parsing SQL is no easy and light task.

      Si

      • Parsing and building SQL is just overhead, and especially parsing SQL is no easy and light task.

No optimization without quantification. Parsing is very fast, especially compared to disk IO. Are you sure the SQL is slowing your program to any appreciable (or even measurable) degree? You should be able to measure any supposed effect with a profiler.

        Nevertheless, if SQLite so offends your sensibilities, you can always use Berkeley DB [oracle.com]. It gives you a similarly powerful storage engine without the necessity (or ab
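The "measure before optimizing" advice above is easy to act on. A minimal sketch using Python's built-in sqlite3 and timeit modules (an illustration, not a benchmark; absolute numbers will vary by machine):

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [(str(i),) for i in range(1000)])

# Time 10,000 round trips of a small query -- parse, plan, and execute included.
n = 10_000
secs = timeit.timeit(
    lambda: conn.execute("SELECT val FROM t WHERE id = ?", (500,)).fetchone(),
    number=n,
)
print(f"{secs / n * 1e6:.1f} microseconds per query, parse included")
```

On typical hardware this comes out in the microsecond range per query, which is the point of the parent comment: against millisecond-scale disk IO, SQL parsing rarely dominates.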

  • bad design (Score:2, Insightful)

    So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implimentation.

    • by bennomatic ( 691188 ) on Tuesday November 10, 2009 @01:19AM (#30042518) Homepage
      I'm a terabyte sized binary blob, you insensitive clod!
    • Re:bad design (Score:5, Insightful)

      by munctional ( 1634709 ) on Tuesday November 10, 2009 @01:29AM (#30042560)

Ever heard of bloom filters? Sharding? Indexes? They are clearly not doing a table scan on 50TB of data every time you open your Facebook inbox.

      You know, there's a certain point where people need to stop and actually think about the implimentation.

Um, they do. They regularly blog about their solutions, open source them, and contribute to existing projects. They come up with amazing solutions to their large-scale problems. They're running over five million Erlang processes for their chat system!

      http://developers.facebook.com/news.php?blog=1 [facebook.com]

      http://github.com/facebook [github.com]

      Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation".
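For readers wondering what a Bloom filter actually buys you here: it is a tiny probabilistic set that can answer "definitely not present" without touching disk, at the cost of rare false positives. A minimal sketch (illustrative only; nothing here reflects Facebook's actual data structures):

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k independent bit positions from a cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("alice@example.com")
print("alice@example.com" in bf)  # True
print("bob@example.com" in bf)    # almost certainly False
```

The practical use in a mail search is to skip entire shards: if the filter says a term was never written to a shard, that shard's index need not be consulted at all.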

      • by socceroos ( 1374367 ) on Tuesday November 10, 2009 @01:49AM (#30042650)

        Ever heard of bloom filters? Sharding? Indexes?

        Don't forget flux capacitors, FTL drives and crossfading splicers.

      • Re: (Score:3, Insightful)

        by kestasjk ( 933987 ) *
        They use bloom filters for messaging? What for?
      • Re:bad design (Score:4, Interesting)

        by Ragzouken ( 943900 ) on Tuesday November 10, 2009 @03:33AM (#30043072)

        "Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation"."

        When was the last time you tried to use Facebook or Facebook chat and didn't get failed transport requests, unsent chat messages, unavailable photos, or random blank pages?

        • None of that ever happens to me, and I use facebook all the time. Maybe facebook just doesn't like you!
      • Re:bad design (Score:4, Insightful)

        by Zombywuf ( 1064778 ) on Tuesday November 10, 2009 @06:17AM (#30043718)

The problem is when people don't think about the solution and apply the cargo cult mentality. Facebook uses Eeeerlaaaang, therefore we should. Facebook wrote its own database, therefore we should. People end up writing their own database engines that do exactly the same thing as modern relational engines, with all the bugs that were fixed in the relational engines 10 years ago (5 for Microsoft). Even MS SQL will split a large GROUP BY aggregate operation (which takes 3 lines to specify) across multiple CPUs by turning it into a map-reduce problem, and it will do this all without you having to be aware of it. Oracle (and many others, though Oracle's is supposed to be the best) will maintain multiple concurrent versions of your data in order to allow multiple users to work with a snapshot that doesn't change under them while others are changing the data, and this happens transparently. You can go ahead and implement all this stuff yourself if you want, in C and sockets; call me when you're done, in 10-20 years.

The real issue I have with the NoSQL people is they're a bunch of whiny babies who haven't even taken the time to understand the problem before lashing out at the first thing they see. Just the name tells you this: they call themselves "No SQL" and then lash out at relational databases. SQL is a terrible language which really needs replacing, but it is only one possible language for querying relational databases. Relational databases represent several decades of research into how to query data in a fault-tolerant, scalable way as a standing implementation; re-implementing them is a waste of time.

        • Re: (Score:3, Informative)

          by Muad'Dave ( 255648 )

          ...they call themselves "No SQL" and then lash out at relational databases.

          Had you read the article, you would've seen that the "No" in NoSQL stands for Not Only, not No, as in none whatsoever. I welcome any and all research into better, tighter synergy between databases and object persistence.

    • Re:bad design (Score:4, Interesting)

      by JavaPunk ( 757983 ) on Tuesday November 10, 2009 @01:38AM (#30042600)
      Yes it does (look through 50TB of data), and how would you design it? It has to access all of your friends and find their postings. Robert Johnson gave an excellent talk on facebook's design two weeks ago at OOPSLA (it should be in the ACM digital library soon). He stated that there is no clear segregation of data, the (friend) network is too connected and extracting groups of friends isn't possible. Basically they have a huge mysql farm with memcached on top. Loading an inbox will hit multiple servers (maybe even a different server for each of your friends) across the farm.
      • Re: (Score:3, Insightful)

        by jcnnghm ( 538570 )

        Yes it does (look through 50TB of data), and how would you design it?

When a user posts a message, I would have the web server pass the message to a server that listens for messages being sent. That server would collect the mail, then place it as a payload package in the messaging queue when either a fixed number of mail recipients, probably around 500, or a fixed time, probably 500ms, passes, whichever comes first. When the payload reaches the front of the queue, the messaging server working on the payload would parse through all the messages building a model of
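The batching policy sketched in this comment (flush when a fixed count or a fixed timeout is reached, whichever comes first) can be illustrated in a few lines. The class and names here are hypothetical, not any real messaging server's API:

```python
import time

class MessageBatcher:
    """Flush when the batch reaches max_size items or max_age seconds,
    whichever comes first. Age is checked on each add (a real server
    would also use a timer so an idle batch still flushes)."""

    def __init__(self, flush, max_size=500, max_age=0.5):
        self.flush, self.max_size, self.max_age = flush, max_size, max_age
        self.batch, self.oldest = [], None

    def add(self, msg, now=None):
        now = time.monotonic() if now is None else now
        if not self.batch:
            self.oldest = now
        self.batch.append(msg)
        if len(self.batch) >= self.max_size or now - self.oldest >= self.max_age:
            self.flush(self.batch)
            self.batch, self.oldest = [], None

flushed = []
b = MessageBatcher(flushed.append, max_size=3, max_age=0.5)
for m in ["a", "b", "c", "d"]:
    b.add(m, now=0.0)
print(flushed)  # [['a', 'b', 'c']]
```

The trade-off is latency against per-payload overhead: a larger batch amortizes queue and network costs, while the timeout bounds how stale a message can get.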

    • by ErikTheRed ( 162431 ) on Tuesday November 10, 2009 @02:10AM (#30042712) Homepage

      So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implimentation.

      Could be worse. They could try to find something on my desk.

All electronic data is stored in a "database"... even a file system is a type of database; you just can't query it with SQL. It's just that database programs that use SQL as the front end have more tools built for them.

You have a point that the implementation has to be thought about. If you RTFA you'll see the issue is more that RDBMS implementations like Oracle rely on breaking up clusters so you can cram as much of the data into RAM as possible on to fast CPUs... in the case of something like Facebook th

  • hmm (Score:5, Insightful)

    by buddyglass ( 925859 ) on Tuesday November 10, 2009 @01:18AM (#30042514)
    With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.
    • Re:hmm (Score:5, Insightful)

      by phantomfive ( 622387 ) on Tuesday November 10, 2009 @01:38AM (#30042602) Journal
      The biggest problem is the cloud. A lot of cloud APIs don't allow full relational database access, so now it seems we are coming up with all these justifications for why we don't really need it. Notice that this blog is from a company pushing a cloud based solution.
      • Re:hmm (Score:5, Insightful)

        by MightyMartian ( 840721 ) on Tuesday November 10, 2009 @01:48AM (#30042642) Journal

That's my take as well. We have these crippled semi-databases that lack a lot of useful features that anyone who has used RDBMSs over the last few decades has gotten used to, so suddenly it becomes a justification game; "Well, SQL doesn't deliver the output we need, so here's some half-way-to-SQL tools which are really better, kinda... oh yes, and Netcraft confirms it, SQL is dying!!!!"

I have a feeling that this is part hype, part inept programmers who don't actually understand SQL or database optimization. The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

        • Re:hmm (Score:5, Interesting)

          by buchner.johannes ( 1139593 ) on Tuesday November 10, 2009 @02:45AM (#30042874) Homepage Journal

The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

          Your point is valid, but I think there is more to it. And the problems these solutions try to solve are quite old too. For example:

Ever tried to design a database, but got the requirement that you should be able to reconstruct the modification history? It boils down to not deleting (ever), and 'deleted' flag fields and other ugliness. A multi-version relational database would be nice; you actually don't need modification/delete operations in this scenario, just 'updates' that add to the previous status. CouchDB [blogspot.com] does append operations.

          In some cases you may not need a complete SQL database, just key->value relations, but have them scaling very well. http://project-voldemort.com/ [project-voldemort.com] states: "It is basically just a big, distributed, persistent, fault-tolerant hash table." Then they state that they provide horizontal scalability, which MySQL doesn't (OTOH, we should really look at Oracle for these things).

          And you can't really say MapReduce/Hadoop [apache.org] is pointless.
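The append-only, multi-version idea mentioned above (what CouchDB's append operations give you) can be sketched in a few lines. `VersionedStore` is a made-up illustration of the concept, not CouchDB's or Voldemort's API:

```python
class VersionedStore:
    """Append-only store: every write adds a version; nothing is ever
    destroyed, so the full modification history is always recoverable."""

    def __init__(self):
        self.log = []     # (key, value) pairs, in write order
        self.index = {}   # key -> list of positions in the log

    def put(self, key, value):
        self.index.setdefault(key, []).append(len(self.log))
        self.log.append((key, value))

    def get(self, key, version=-1):
        """version=-1 is the latest; 0 is the first write, and so on."""
        pos = self.index[key][version]
        return self.log[pos][1]

    def history(self, key):
        return [self.log[p][1] for p in self.index.get(key, [])]

s = VersionedStore()
s.put("status", "draft")
s.put("status", "published")
print(s.get("status"))      # 'published'
print(s.history("status"))  # ['draft', 'published']
```

A "delete" in this model is just another appended version (e.g. a tombstone value), which is exactly the 'deleted'-flag pattern the comment describes, made first-class.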

          • Re: (Score:3, Interesting)

            by phantomfive ( 622387 )

Ever tried to design a database, but got the requirement that you should be able to reconstruct the modification history? It boils down to not deleting (ever), and 'deleted' flag fields and other ugliness.

            I did it by every time I did an INSERT, DELETE, or UPDATE query, taking an exact copy of the query and dumping it into a special table in the database (along with a stack trace of where it was called from). To reconstruct I could just run those commands straight from the database, to whatever point was desired. It was simple, straightforward and efficient, although I'm sure someone else has a better idea.

            • Re: (Score:3, Interesting)

              by h4rm0ny ( 722443 )

              It was simple, straightforward and efficient, although I'm sure someone else has a better idea.

I'd love someone to post it if they do. We use the same method, and the one time we had to replay the sequence to get what we wanted, it took most of a day. Yes, that was because our last snapshot "starting point" was nearly a week old, but nonetheless... if technology has moved on and there's a better way of doing this, then I'm sure a lot of us will be interested.

              • Re: (Score:3, Informative)

                create post insert, update, and delete triggers which file the data (as well as the action, timestamp, and user) into an audit table.
                • Re: (Score:3, Insightful)

                  by mbourgon ( 186257 )

                  Mod parent up. That way you're not dealing with the statements themselves, just the data. And you can add the UserID to the Audit table - then find the most recent row for that particular person, or get the most recent row for each ID and apply that.

              • Re: (Score:3, Interesting)

Why not use the DB features? Most enterprise-y databases have PITR (point-in-time recovery) features. Although it's not designed for that sort of thing, it could be used in such a fashion.
Most DBs do the same thing you guys do, i.e., use a transaction log. The transaction log can be replayed to get to a point-in-time state. The one disadvantage is that it's all or nothing, i.e., you can't do it for specific transactions (although I'm sure some DBA will wander in and correct me on this ;)

        • Re: (Score:3, Insightful)

          by CaptainZapp ( 182233 ) *

I have a feeling that this is part hype, part inept programmers who don't actually understand SQL or database optimization. The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

Thank you very much for this comment; you put it far more eloquently than the venting I just wanted to grace this thread with. The real kicker though is

          There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.

          This statement is just so full of shit. And the real larff riot, for me at least, is when people or shops employing MySQL (for heavens sake!) make such statements.

          Ej, folks: Rigid schema design is an asset, not a liability!

          • Re:hmm (Score:4, Funny)

            by Hognoxious ( 631665 ) on Tuesday November 10, 2009 @05:29AM (#30043516) Homepage Journal

            Rigid schema design is an asset, not a liability!

            Not to people who think a free format text field is the ideal place to store the price, quantity and delivery date of an order. Why not, it's long enough for it all to fit. And it saves all that moving between fields.

        • by sohp ( 22984 )

          Thank St. Codd we are finally getting away from the relational database dogma. I've seen more companies and projects crippled or completely killed by DB architects trying to enforce the One True Schema than just about any other idiocy in IT.

      • Notice that this blog is from a company pushing a cloud based solution.

That is indeed suspicious. But if they want to sell clouds, then make an RDBMS that *does* scale across cloud nodes instead of bashing SQL. (SQL as a language doesn't define implementation; that's one of its selling points.) It may be that since there's not one out yet, they instead hype the existing non-RDBMSes that can span clouds.

        (I agree that SQL could use some improvements, such as named sub-queries instead of massive deep nesting to

        • named sub-queries

          What do you think stored [microsoft.com] functions [mysql.com] and [oracle.com] procedures [eioba.com] are?
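Stored procedures are one answer, but standard SQL also offers named subqueries directly as common table expressions (the WITH clause), which address exactly the "massive deep nesting" complaint. A quick sketch via Python's sqlite3 (modern SQLite supports WITH; support varies by engine and era):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "ann", 10), (2, "ann", 30), (3, "bob", 5)])

# 'spend' is a named subquery: defined once, referenced by name below,
# instead of being inlined as a nested SELECT.
rows = conn.execute("""
    WITH spend AS (
        SELECT customer, SUM(total) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer FROM spend WHERE total > 20
""").fetchall()
print(rows)  # [('ann',)]
```

Unlike a stored procedure, a CTE lives inside the query itself, so ad-hoc queries get the readability benefit without any server-side object to create.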

      • by Firehed ( 942385 )

        What now? The problem is that relational databases suck at scaling, and as a result we have to come up with absurd hacks like sharding to fix problems that are the fault of the storage engines (if the engine has to do that to not fall apart when dealing with large datasets, fine; but that should be entirely behind-the-scenes and transparent to the application). If these various NoSQL tools are faster than traditional databases and your data isn't particularly relational, then great! But I'd much rather see

        • Worse, sharding and other such solutions usually end up requiring the application to know way, way too much about the back end structure, how tables are split, where they are split, and so on.

          And your solution to improving the storage engine doesn't help. At some point in a RDBMS you need to do joins and so forth, and that assumes that the machine doing the join is capable of doing so AND of handling the load and the number of transactions being tossed at it. Hence we start getting into clusters and other s

        • "But I'd much rather see effort put into solving the lack of horizontal scalability associated with relational DBs"

I think I'd rather see the opposite: that non-relational DBs become the mainstream, and have SQL added for the odd occasion it is useful. Relational has some nice properties for ad-hoc querying, but for everything else they are a nuisance.

          • Re:hmm (Score:5, Insightful)

            by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Tuesday November 10, 2009 @04:04AM (#30043190)

I think I'd rather see the opposite: that non-relational DBs become the mainstream, and have SQL added for the odd occasion it is useful. Relational has some nice properties for ad-hoc querying, but for everything else they are a nuisance.

            Berkeley DB [wikipedia.org] is a very good non-relational database with multiple language bindings, several storage engines, and transaction support. It's been around for 24 years, and has seen some appreciable use.

            But that use was nothing compared to the database explosion that SQLite [sqlite.org] brought about when it was released. SQLite is almost exactly like Berkeley DB, except that it has a SQL engine on top. Almost everyone is using SQLite, and many Berkeley DB users are moving over to it.

Why? Because SQLite is relational! That constitutes some serious evidence that relational databases are more than "a nuisance".

      • There's a reason Google, Amazon, and Microsoft all designed their cloud databases without SQL -- it has a lot of features that don't scale well when your data spans a crap ton of servers. Imagine a website that does several JOIN queries for each page view -- now if you've got data spanning 50 servers, that's a hell of a lot of I/O that will be very hard to scale. When you take out these extra features, you end up not having much more than the basics -- usually just a simple insert, update, and delete with

        • Re: (Score:3, Informative)

          by QuoteMstr ( 55051 )

          I don't think you've thought clearly about the problem.

          If a JOIN is causing problems because it's causing too much non-local data access, then you're going to run into the same problem when you re-code the JOIN in the application. In fact, it might hit you worse because you won't benefit from the database's query optimizer.

          The solution is clearly to improve locality of reference. You can do that by duplicating some data, denormalizing the database, and so on. But you can do all those things just as easily w

          • These databases are all schemaless, so it's not like they could use an established RDBMS... they had to make something new. I was trying to give an example of why they wouldn't bother implementing something like JOIN in something new that they planned to scale from the beginning.

            I'd actually be pretty stoked to find a schemaless DB that uses mostly standard SQL, because I find schemaless makes a lot of sense for use beyond cloud-level scalability. Most of the time they let you store things in a more natur

            • There exists some schema for every data set. The lack of a schema is (being generous) actually a lack of imagination and ingenuity on the part of the programmer.

              you don't have to copy everything to a new table or ALTER the existing one

              How is that any better than creating a second table with the new schema, and slowly (or lazily) migrating records from the old to the new table?
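The lazy migration asked about in that last question can be sketched as follows. Both `migrate_row` and the store are hypothetical illustrations of the pattern, not any particular database's mechanism:

```python
def migrate_row(old_row):
    """Hypothetical transform from the old schema to the new one."""
    row = dict(old_row)
    row["full_name"] = row.pop("name")
    return row

class LazyMigratingStore:
    """Reads check the new table first and fall back to the old one,
    migrating each record the first time it is touched. The old table
    shrinks as traffic naturally walks the data."""

    def __init__(self, old_rows):
        self.old = dict(old_rows)   # id -> row in the old schema
        self.new = {}               # id -> row in the new schema

    def get(self, key):
        if key not in self.new and key in self.old:
            self.new[key] = migrate_row(self.old.pop(key))
        return self.new[key]

store = LazyMigratingStore({1: {"name": "Ada"}, 2: {"name": "Grace"}})
print(store.get(1))    # {'full_name': 'Ada'}
print(len(store.old))  # 1  (row 2 has not been migrated yet)
```

The appeal over a one-shot ALTER or bulk copy is that no downtime window is needed; the cost is that both schemas must be supported until the old table drains.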

      • by jilles ( 20976 )

That's just another way of saying SQL databases are a poor match for the requirements big websites face. SQL databases used at scale almost always throw characteristic features like transactions, joins, or even ACID out of the window in order to scale. Once you start doing that, SQL databases just become a really complicated way to store stuff. The one database that is really popular on big websites is MySQL, which started out its popularity as a non-transactional database. While most common features have b

    • Re:hmm (Score:5, Insightful)

      by KalvinB ( 205500 ) on Tuesday November 10, 2009 @01:48AM (#30042644) Homepage

      For the vast majority of use cases, large data sets can be made logically small with indexes or physically small with hashes.

If you're dealing with massive data you're probably not dealing with complex relationships. E-mail servers associate data with only one index: the e-mail address. Google only associates content with keywords. E-mail servers logically and physically separate email folders. Google logically and physically separates the datasets for various keywords. So by the time you hit it, it knows instantly where to look for what you want. You don't have a whole complex system of relationships between the data. It looks at the keywords, finds the predetermined results for each and combines the results.
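The "predetermined results per keyword" pattern described above is an inverted index. A minimal sketch (toy data; real search engines add ranking, compression, and sharding on top):

```python
from collections import defaultdict

# Inverted index: precompute keyword -> document ids, so a query becomes
# a couple of dictionary lookups plus a set intersection, not a scan.
docs = {
    1: "nosql databases scale horizontally",
    2: "relational databases use sql",
    3: "sql databases scale vertically",
}
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(*keywords):
    result = index[keywords[0]].copy()
    for kw in keywords[1:]:
        result &= index[kw]
    return sorted(result)

print(search("databases", "scale"))  # [1, 3]
print(search("sql", "scale"))        # [3]
```

Query cost depends on the sizes of the posting sets being intersected, not on the total corpus size, which is the parent comment's point about making large data logically small.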

    • Re: (Score:3, Interesting)

      by Prof.Phreak ( 584152 )

Depends. We've been using Netezza with ~100 TB of data, and... well... it takes seconds to search tables that are 30 TB in size. I'd imagine Teradata, Greenplum and other parallel DBs get similar performance---all while using standard SQL with all the bells and whistles you'd normally expect Oracle SQL to have (windowing functions, etc.).

    • Re:hmm (Score:5, Insightful)

      by mzito ( 5482 ) on Tuesday November 10, 2009 @02:21AM (#30042788) Homepage

      Uh, no, that is not correct. Relational DBMSes such as Oracle, Teradata, DB2, even SQL Server are all designed to scale into the multi-terabyte to petabyte range. The issue is one of a couple of things:

      - Cost - "real" relational databases are expensive. I once had a conversation with someone who worked at Google, who talked about how much infrastructure they have written/built/maintain to deal with MySQL. Many of those problems were solved in an "enterprise" DBMS 3-10 years ago. However, the cost of implementing one of those enterprise DBMS is so high that it is cheaper to build application layer intelligence on top of a stupid RDBMS than purchase something that works out of the box
      - Workload style - most of the literature around tuning DBMS is for OLTP or DSS workloads. Either small question, small response time (show me the five last things I bought from amazon.com) or big question, long response time (look through the last two years worth of shipping data and figure out where the best places to put our distribution centers would be). Many of these workloads are combos - there could be very large data sets and complex data interdependencies, with low latency requirements. It may be possible to write good SQL that does these things (in fact, I know a couple luminaries in the SQL space that will claim just that), but the community knowledge isn't there.
      - Application development - when you're building your app from scratch, you can afford to work around "quirks" (bugs) and "gaps" (fatal flaws) to get what you need. This dovetails with the other issues, but when your core business is building infrastructure, it's worth your while to deal with this. When your core business is selling insurance or widgets, or whatever, it is not.

      None of this is to say that the "nosql" movement is a bad thing, or that there's no reason for its existence, or that no one should bother looking at it. However, there is a definite trend of "this is so much better than SQL" for no good reason. SQL has scaled for years, and I know loads of companies who work with terabytes and terabytes of data on a single database without any issue.

      A far more interesting discussion is the data warehouse appliance space - partitioning SQL down to a large number of small CPUs and pushing those as close to the disk as possible.

      • Well said.

        Lack of imagination is another problem when people look at the dry subject of data storage and retrieval. Yes at some point the math has to work, but in the meantime looking outside the box and taking an RDBMS in different and new directions within the engine itself can give those mountains of scale that are the holy grail these days.

        While I was writing that paragraph I thought it might be interesting to take a structure such as an n-node b-tree and then make the leaves table references and the n

        • by chthon ( 580889 )

Prior to the relational database, there were network databases and hierarchical databases. They must be navigated to search for data. What you propose here seems to be the same, but then mapped on top of a relational database, which probably makes it slower than the previous solutions.

      • by sohp ( 22984 )

        They may have been "designed" to scale, but they were clearly never tested or even trialed under real-world conditions. Distributed replication and transactions are a complete joke for scalability. As for "Many of those problems were solved in an "enterprise" DBMS 3-10 years ago", which ones specifically? Oh, right, none of them, because "enterprise" is just marketspeak for "really expensive and complicated but we promise it'll work great if you just sign over more money".

        If I had a nickel for every organiz

I work on a very large DB2 system. Enterprise systems cost money because they work. There still seems to be this ignorant, self-absorbed counterculture which believes that big iron and the like (anything beyond "look what I can build in my basement") isn't cool, so it cannot work.

          Between radix, sparse, derived, encoded vector indexes I can pretty much serve up anything my partners want, whether they are native or foreign db2 ,jdbc or odbc connected. With the tools I have at my disposal I can analyze statements pr

        • Re: (Score:3, Informative)

          by mzito ( 5482 )

I sort of agree with you, from the perspective that there are crusaders on both sides - people who insist that traditional RDBMSes are the Only Way, and people like you who insist they've "never been trialed under real-world conditions". Both statements are clearly incorrect on their face.

However, there are a multitude of features that these systems have that are not available in NoSQL systems, or only available in such a watered-down form that it's unfair to compare the two. A list:

          - On-disk encryption
          - Com

  • by Tablizer ( 95088 ) on Tuesday November 10, 2009 @01:26AM (#30042546) Journal

    The performance claims will probably be disputed by Oracle whizzes. However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management. Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified. We have dynamic app languages, so why not dynamic RDBMS also? Let's fiddle with and stretch RDBMS before outright tossing them. Maybe also overhaul or enhance SQL. It's a bit long in the tooth.

    More at:
    http://geocities.com/tablizer/dynrelat.htm [geocities.com]
    (And you thought geocities was de
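The dynamic-relational idea sketched in this comment (tables appearing on first insert, rows as maps, constraints bolted on as the schema solidifies) can be illustrated in a few lines. This toy `DynamicDB` is purely hypothetical:

```python
class DynamicDB:
    """Sketch of a 'dynamic relational' store: tables are created
    implicitly on first insert, rows are maps, missing columns read
    as None, and constraints can be added incrementally later."""

    def __init__(self):
        self.tables = {}        # name -> list of row dicts
        self.constraints = {}   # name -> list of (column, check) pairs

    def insert(self, table, **row):
        for col, check in self.constraints.get(table, []):
            if not check(row.get(col)):
                raise ValueError(f"constraint failed on {table}.{col}")
        self.tables.setdefault(table, []).append(row)

    def select(self, table, column):
        # Non-existent columns come back as None rather than an error.
        return [r.get(column) for r in self.tables.get(table, [])]

    def add_constraint(self, table, column, check):
        self.constraints.setdefault(table, []).append((column, check))

db = DynamicDB()
db.insert("pets", name="Rex", species="dog")  # table created implicitly
db.insert("pets", name="Tweety")              # no 'species' column: fine
print(db.select("pets", "species"))           # ['dog', None]
db.add_constraint("pets", "name", lambda v: v is not None)  # tighten later
```

This mirrors the comment's progression: start schemaless for agility, then add "required"/"number"-style checks once the shape of the data settles.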

    • Re: (Score:2, Insightful)

What you are suggesting is to mimic a key-value design with something like JSON or serialized data as the value.

This would work if you never had to index on any of the values in the JSON. All your SQL queries must have their WHERE clauses running off the key.

      This is a problem that couchdb and mongodb solve.

I am not trying to paint SQL in an unflattering shade -- there would still be a lot of situations where an RDBMS design would be optimal. In fact, I am currently working on a mongodb/mysql hybrid solution

      • Re: (Score:2, Interesting)

        by Tablizer ( 95088 )

What prevents indexing a dynamic-relational DB? I said that you don't need a data-definition language, but that doesn't mean one *must* skip the DDL (for things such as indexes). Another thing to explore is auto-indexing: if many queries keep filtering by a given column, then it could automatically put an index on it.
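Auto-indexing along those lines can be sketched by counting filter usage. `AutoIndexer` is a hypothetical illustration on top of Python's sqlite3 (table and column names are interpolated directly for brevity; a real version would validate them):

```python
import sqlite3
from collections import Counter

class AutoIndexer:
    """Count WHERE-column usage and create an index once a column has
    been filtered on `threshold` times."""

    def __init__(self, conn, threshold=100):
        self.conn, self.threshold = conn, threshold
        self.hits = Counter()

    def query_eq(self, table, column, value):
        self.hits[(table, column)] += 1
        if self.hits[(table, column)] == self.threshold:
            # Hot column detected: index it for future queries.
            self.conn.execute(
                f"CREATE INDEX IF NOT EXISTS auto_{table}_{column} "
                f"ON {table} ({column})")
        return self.conn.execute(
            f"SELECT * FROM {table} WHERE {column} = ?", (value,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")
ai = AutoIndexer(conn, threshold=3)
for _ in range(3):
    ai.query_eq("t", "a", 1)
idx = conn.execute("SELECT name FROM sqlite_master WHERE type='index'").fetchall()
print(idx)  # [('auto_t_a',)]
```

The open design question, as with any self-tuning scheme, is when to *drop* such indexes again, since each one taxes every write.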

    • Let's fiddle with and stretch RDBMS before outright tossing them.

      This isn't "it ain't broke, don't fix it"
      Instead we're dealing with "I have a hammer, so every problem looks like a nail"

      The desire to "fiddle with and stretch" software instead of sinking dollars into something new is
      part of the reason we have a clusterfark of decades old technologies & hardware that won't go away.
      Sometimes you have to accept that a hammer isn't the right tool for the job.

    • by sco08y ( 615665 ) on Tuesday November 10, 2009 @08:14AM (#30044202)

      However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array).

      You described an entity attribute value model, which winds up reinventing half the DBMS, poorly. Don't worry, *everyone* does one once until they realize it's a bad idea.

      Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified.

      A "rigid" schema is preventing a ton of totally redundant code being written on the app side. All those constraints wind up in the schema because your UI designer doesn't want to consider that Mary might have 5 addresses or 6 mothers or work 7 jobs simultaneously. And your UI tester doesn't want to test an exploding combinatorial number of possibilities.

      I'd like to see, however, a decent type system, proper logical / physical separation, etc.

      Maybe also overhaul or enhance SQL. It's a bit long in the tooth.

      I'm starting from scratch. [github.com] (Currently I'm slowly retyping about 40 pages into Latex...)
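The point about a "rigid" schema saving redundant app-side code is easy to demonstrate: declare the constraint once in the DDL and every code path that touches the table is covered. A minimal sketch with SQLite (toy table, not from the article):

```python
import sqlite3

# Declaring constraints in the schema means the database, not each
# application code path, rejects bad data.  Toy example.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)
    )""")
conn.execute("INSERT INTO person VALUES ('Mary', 30)")
try:
    # No app-side validation needed: the schema catches this.
    conn.execute("INSERT INTO person VALUES (NULL, 30)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

In an EAV or schema-less design, every one of those checks has to be reimplemented (and tested) in the application, once per code path that writes the data.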

    • Re: (Score:3, Insightful)

      by pla ( 258480 )
      RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management.

      Aaaaaaaand, congratulations, you've described "fixing" the problem of schema flexibility by using an RDBMS as a non-relational flat hashed memory storage area, with at least three
  • by BBCWatcher ( 900486 ) on Tuesday November 10, 2009 @01:27AM (#30042548)

    I think I've heard of non-relational databases before. There's a particularly famous one, in fact. What could it be [ibm.com]? Let's see: first started shipping in 1969, now in its eleventh major version, JDBC and ODBC access, full XML support in and out, available with an optional paired transaction manager, extremely high performance, and holds a very large chunk of the world's financial information (among other things). It also ranks up there with Microsoft Windows as among the world's all-time highest grossing software products.

    ....You bet non-relational is still highly relevant and useful in many different roles. Different tools for different jobs and all.

    • Re: (Score:2, Informative)

      by Tablizer ( 95088 )

      IMS is very efficient for known query patterns, but not very flexible for stuff not anticipated. This is a common characteristic of non-relational databases: optimize for specific query paths at the expense of general queries (variety).

      Often IMS data is exported and re-mapped nightly or periodically to a RDBMS so that more complex queries can be performed on the adjusted copy. The down-side is that it's several hours "old".

      Note that it's also possible to optimize RDBMS for common queries using well-planned

      • by Trepidity ( 597 )

        To some extent relational databases are making a similar bet, optimizing for particular kinds of access at the expense of more general queries. There are more expressive database languages that support more general kinds of queries, including quantification and variable binding and such, like Datalog, but they're harder to make efficient (though they can also be optimized for common query paths). I see SQL as something of a middle-of-the-road choice: more general than tuple stores and some other approaches,

  • by Just Some Guy ( 3352 ) <kirk+slashdot@strauser.com> on Tuesday November 10, 2009 @01:30AM (#30042562) Homepage Journal

    I'm a huge PostgreSQL fan and took classes in formal database theory in college. I'm saying this as someone who understands and thoroughly appreciates relational databases: I'm starting to love schema-less systems. I've only been playing with CouchDB for a few weeks but can certainly see what such stores bring to the table. Specifically, a lot of the data I've stored over the years doesn't neatly map to a predefined tuple, and while one-to-one tables can go a long way toward addressing that, they're certainly not the most elegant or efficient or convenient representation of arbitrary data.

    I'm certainly not going to stop using an RDBMS for most purposes, but neither am I going to waste a lot of time trying to shoehorn an ever-changing blob into one. Each tool has its place and I'm excited to see what niche this ecosystem evolves to fill.

  • by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Tuesday November 10, 2009 @01:51AM (#30042656)

    We didn't start with relational databases. RDBMSes were responses to the seductive but unmanageable navigational databases [wikipedia.org] that preceded them. There were good reasons for moving to relational databases, and those reasons are still valid today.

    Computer Science doesn't change because we're writing in Javascript now instead of PL/1.

    • by sohp ( 22984 )

      Wrong. Every dogmatic RDBMS lapdog drags out the old "navigational database" claims, but that's just bs to cover up the fact that relational is a solution seeking a problem. Question for you: if relational is so great, why isn't the DNS system that is the backbone of the internet built on top of a relational schema? It's because the decentralized scalability we need for real-world applications can never be properly addressed by any relational implementation.

      • by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Tuesday November 10, 2009 @06:43AM (#30043804)

        Your question reminds me of the people who say, "if flight recorders are so strong, why don't we just build the whole plane out of the stuff they use to make them?" You might as well ask, "if DNS is so great, why don't we implement filesystems in terms of it?" Your post demonstrates that you haven't considered context and purpose.

        Relational databases are models. You can certainly describe DNS in terms of a relational schema. In principle, you could construct a wrapper and query it with SQL. But there's no reason to do that, because with something as simple as DNS, the full power of a relational query engine doesn't buy you much.

        Most datasets aren't that simple.

        Furthermore, DNS is an open standard that needs to be accessible in as simple a way as possible. Complicating it with relational semantics wouldn't have been worthwhile (because of DNS's relative simplicity), and would have significantly hampered DNS's interoperability.

        That is, if relational databases had existed when DNS was implemented, which they didn't.

        Furthermore, DNS is a distributed, decentralized database. You couldn't use a RDBMS (the software that realized the abstract model of a relational database) to manage it even if you wanted to. That doesn't apply to most datasets, which however large, are still managed by a single organization, and which are accessed by software under the control of that organization.

        Your comparison really makes no sense whatsoever. The vast majority of databases aren't put under the same constraints as DNS, and so can take advantage of the much greater flexibility an RDBMS affords.

        You're basically arguing that we can't have efficient engines in automobiles because a few of them might need to tow 18 ton trailers and withstand mortar rounds. It's ridiculous.

  • by johnlcallaway ( 165670 ) on Tuesday November 10, 2009 @02:19AM (#30042772)
    I was an admin on a system that spread the data across 10 database servers. Each server had a complete set of some data, like accounts, but the system was designed so that ranges of accounts stored their transaction-type data on a specific server, and each server held about the same number of accounts and transactions. As data came in, it was temporarily housed on the incoming server until a background process picked it up and moved it to the 'correct' one. This is a very simplistic view, but the reality was that it worked quite well. Occasionally, a re-balancing had to be done. But it was very scalable. The incoming data wasn't that time sensitive, so if it took a few hours to get moved, everything was still OK. When an 'online' session needed data, it knew which server to connect to to get it. Processing was done overnight on each server, then summarized and combined as needed.

    So yes .. .people have been coming up with innovative ways to solve these problems for a very long time.

    And they will continue to do so.
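The range-partitioning scheme described above boils down to a routing table: account ranges map to servers, and every lookup consults the table first. A minimal sketch (boundaries and server names are made up for illustration):

```python
import bisect

# Range-based shard routing: accounts up to each boundary live on
# the corresponding server.  Values are illustrative only.

boundaries = [10000, 20000, 30000]       # upper bounds per shard
servers = ["db1", "db2", "db3", "db4"]   # len(boundaries) + 1

def shard_for(account_id):
    # Binary search finds which range the account falls into.
    return servers[bisect.bisect_left(boundaries, account_id)]

print(shard_for(500))    # db1
print(shard_for(15000))  # db2
print(shard_for(99999))  # db4
```

Re-balancing, in this sketch, amounts to moving a boundary and migrating the rows that changed shards -- which is exactly the occasional background job the parent describes.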
  • I/O bottleneck (Score:2, Interesting)

    by Begemot ( 38841 )

    Let's not forget where the bottleneck is - the I/O. It's expensive but once you build a fast and solid storage system, correctly configure it and partition your data properly over a sufficiently large number of hard drives, RAIDs, LUNs etc., you might be able to use SQL. We run a database of 10TB on MS SQL with hundreds of millions of records with an equal rate of reads and writes and could not be happier.

    • WOW! Did you get ahold of an older Sybase version?!

      But seriously, yeah, even MS-SQL can hang in if you set it up just so and never let MS patch the thing forever after.

      • Re: (Score:3, Insightful)

        by cervo ( 626632 )
        No offense, but you probably have no idea what you are talking about. MS-SQL is a relatively solid product. SQL Server 2000 and SQL Server 2005 are pretty stable and can easily handle rather large data sets (in the TB). Of all the Microsoft products, personally Visual Studio and SQL Server are my favorites. I like PostgreSQL as well, so I'm not strictly a Microsoft fan. But an awful lot of companies are realizing that MS SQL can manage their data much cheaper than Oracle can. Of course PostgreSQL can
  • ...the traditional relational database technology that has served us well for over thirty years...

    Hmm... Before 1979, market share for RDBMS was TINY. It really didn't begin to "serve us well" until the mid 80's.

  • This again (Score:3, Interesting)

    by Twillerror ( 536681 ) on Tuesday November 10, 2009 @03:56AM (#30043158) Homepage Journal

    Wow a "object oriented" database discussion again. I've never read one of these :P I've only been doing this 15 years and I've lost count of these talks a long time ago.

    What is the difference between schema-less and schema-rigid anyway? I don't see what that has to do with performance. The real issue is uptime and transaction support. People want to add a column or index without taking the system down. That is different than dealing with PBs of data. Most table structures can easily deal with that much data.

    If you have a DB that is big you have lots of outs. Pay...get Enterprise version of whatever. Break it into many DB/tables and merge together. Archive. Archive I bet will get most people by. Does eBay really need all that bidding info for items over a few weeks old...only for analysis maybe. Move that old stale data out of the active heavily hit data tiers.

    The fact remains that MySQL should be able to scale to TBs of data. The fact that it can't is a failure of the product. All the others have been able to for a while. Why can't it? I don't know... the fact that it uses a F'in different file for each index on a table. If you don't understand how old school that is, start using Paradox. Just because it is open source doesn't mean it has to be so damn out of date. Please, for the love of god, save multiple tables/indexes in the same pre-sized file...god.

    Google has all the power to go and use something different. Google gets to cheat. Google is a collection of pretty static data. They scan the internet a lot, but imagine if every time you did a search Google had to scan every web page on the planet, index them, and then give you search results. That would be impractical for sure. So for now they just store big collections of blobs and a big fast index for searching keywords and links to pages. Impressive nonetheless, but it's not like your typical app. GMail is... funny that it is the one system they've had problems with. Even then EMAIL DOESN'T CHANGE. It's user specific, but it's still f'in static. GoogleTastic if you ask me.

    The fact is people are using RDBMS right now to solve real world problems. Some start-up is finding a way to tweak MySQL to do something cool and then posting it on a blog... then all of a sudden RDBMS is dead. RDBMS is fine, it will be fine for at least 10 years if not longer. In that time it will evolve as well so that it will be around for even longer. MySQL in 5 years will have online index addition, performance-hitless online column addition, partitioning, geo indexing, XML columns, BigASS table support, Oracle RAC-like support, and a thousand other features that some RDBMSs have today and some will not see for even longer. Then developers that spent all that cash developing custom shit will revert and post comments like this one.

    That's the way it goes in software development. The middle tier gets bigger, gets inept, custom shit comes out, it gets integrated into the middle tier shit....continue;

    Instead of pronouncing death, start talking about how dated a two-dimensional result set is. JOINs should return N-dimension result sets similar to XML with butt loads of meta data. ODBC/JDBC are dated... so update them.

    select u.login, ul.when from users u join user_logins ul as logins.login ON ul.user_id = u.user_id where u.name = 'me' should equal something like a nested XML packet instead of duplicated crap when there is more than one user_logins row.
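Today the nesting the parent asks for has to be reconstructed app-side from the duplicated join rows. A minimal sketch, using column names from the parent's hypothetical users/user_logins example:

```python
# Rebuild a nested result from a flattened join app-side.
# Column names follow the parent's hypothetical example.

flat = [  # rows as (login, when) from the flattened join
    ("me", "2009-11-09"),
    ("me", "2009-11-10"),
]

nested = {}
for login, when in flat:
    # One entry per user; logins collected underneath it.
    nested.setdefault(login, {"login": login, "logins": []})
    nested[login]["logins"].append(when)

print(list(nested.values()))
# [{'login': 'me', 'logins': ['2009-11-09', '2009-11-10']}]
```

The duplication the parent complains about is visible in `flat`: the user's login is repeated once per matching user_logins row, and the nesting step is what a richer wire protocol could do for you.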

  • I've seen OLAP systems in the 100TB range which work fantastically well on Oracle.

    Object databases could be a nice idea, but not for performance or scaling reasons. An object oriented database would be beneficial as a method to sidestep ORM. So you can, effortlessly and without any significant amount of extra work, persist the state of your objects.

    Then you can build POxOs to represent your objects and just implement a few lines of code to have them persisted.

    Not sure if anything like that already exists. I

    • How on earth is that better than using an existing ORM library? Even if you have to write your own, an ORM isn't particularly difficult to write.

      • I can only talk about nHibernate or LINQ2SQL, but in either of those cases I have to do something in the database and/or write some XML. That's duplication of work: you've already defined the properties of the object in the class, so adding anything else on top of that is a waste of time.

        With ORM you usually end up keeping three separate definitions in sync - the database table, the ORM metadata (mappings) and the object. That costs time, and time costs money.

        Rails has solved this through scaffolding and
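One way to collapse the three-definitions problem is to derive the table definition from the class, so only one definition exists. A sketch (the `create_table_sql` helper is hypothetical, not part of nHibernate or LINQ2SQL):

```python
from dataclasses import dataclass, fields

# Derive CREATE TABLE from the class so the table, mapping, and
# object are one definition.  create_table_sql is a hypothetical
# helper for illustration, not a real ORM API.

SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}

@dataclass
class User:
    user_id: int
    name: str

def create_table_sql(cls):
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}"
                     for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols})"

print(create_table_sql(User))
# CREATE TABLE user (user_id INTEGER, name TEXT)
```

This is essentially what "code-first" ORMs and Rails-style migrations automate: the class is the single source of truth and the DDL is generated.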

        • Is writing SQL table definitions instead of equivalent assertions in your language of choice really the bottleneck of your development? Switching to an entirely different database paradigm solely to avoid writing "CREATE TABLE" sounds inefficient.

          • The problem is not that it's 'instead of', it's 'as well as'. Sure, you can go all ActiveRecord, but then you lose a lot when it comes to your object design. Which you really don't want to do.

    • by sohp ( 22984 )

      Object databases could be a nice idea, but not for performance or scaling reasons.

      [citation needed]

      You're just talking out of your ass.

  • by timmarhy ( 659436 ) on Tuesday November 10, 2009 @07:33AM (#30044022)
    SQL databases, if designed properly, DO handle enormous datasets. The problem starts when you have twits designing the database and then managers attempting to use the DB for purposes it wasn't meant for.
