
Why Some Devs Can't Wait For NoSQL To Die

theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."


  • by BLToday ( 1777712 ) on Sunday March 28, 2010 @11:46AM (#31647678)
    There's a place for SQL, but there are some cases where BigTable-like stores (e.g. HyperTable) work better. Our company manages data using SQL, but when we present data to users, it's through a HyperTable implementation. SQL makes data management easier, but HyperTable uses our server resources more efficiently.
  • by Vellmont ( 569020 ) on Sunday March 28, 2010 @11:48AM (#31647692) Homepage

    So you're in surgery for 3 hours doing a kidney transplant, using the trusty medium vascular clamp that has served you for the past 20 years. You're finally done and the patient is in recovery, so you sit down to relax with the latest copy of JAMA. It has a great article about the latest developments in cardiac clamps, and you think to yourself, "Why not use a heart clamp for kidney transplants?" Brilliant. So you order some of the new clamps and use them on your next patient. The surgery goes fine, but 3 months later the patient is back in your office with a failed kidney. You open 'em up, and it's obvious the clamp exerted too much pressure on the artery, damaging it in the process. Stupid cardiac clamps! You're not a heart surgeon!

  • by pavera ( 320634 ) on Sunday March 28, 2010 @12:12PM (#31647872) Homepage Journal

    Pretty sure he meant 1M page views/day, since he compares it to Slashdot using Alexa data. Is reading comprehension really that hard? Context clues are your friend.

    I run a site using Django/Postgres; we do about 100k page views/day on a virtual machine with 512 MB of RAM and 10 GB of disk. It's not doing anything crazy like Google, but yeah, we aren't close to needing more power yet. When we do, the first thing we'll do is bump up RAM for more cache space...

  • by WrongSizeGlass ( 838941 ) on Sunday March 28, 2010 @12:18PM (#31647918)
    I think this would have been better if you'd used a car analogy ... maybe something with hose clamps?
  • by tukang ( 1209392 ) on Sunday March 28, 2010 @12:37PM (#31648070)

    OO does make code re-use a bit easier BUT that is NOT the claim that people often make. Trust me, I ask this in interviews and it is always the same answer. Apparently you can't re-use functions. No way, no how. NEXT!

    You can reuse functions, but you can't extend them, and that's where OO's reuse shines. It's very powerful to be able to lay out your code as a tree and control the reuse 'flow' at the nodes.
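A minimal Python sketch of the "extend, don't just reuse" idea above. The class names `Report` and `SalesReport` are hypothetical. The base class's `render()` is reused unchanged, while the subclass extends `header()` by calling the inherited version through `super()` instead of rewriting it.

```python
class Report:
    """Hypothetical base class: the parent node in the reuse tree."""

    def header(self) -> str:
        return "Report"

    def body(self) -> str:
        return "no data"

    def render(self) -> str:
        # Reused as-is by every subclass; subclasses override only the parts they need.
        return f"{self.header()}\n{self.body()}"


class SalesReport(Report):
    def header(self) -> str:
        # Extend the inherited behavior rather than replacing it.
        return super().header() + ": Sales"

    def body(self) -> str:
        return "total: 42"


print(SalesReport().render())  # prints "Report: Sales" then "total: 42"
```

The same effect is possible without classes (for example, by passing functions as parameters); inheritance just makes the override points explicit.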

  • by cervo ( 626632 ) on Sunday March 28, 2010 @12:43PM (#31648134) Journal
    Many of the NoSQL solutions scale better than a normal database and are available cheaply. Oracle costs a fortune, and if you want to run Oracle on a cluster, good luck. They also don't let you publish benchmarks without their permission. But most people I know who use Oracle claim it totally beats everything else (without further clarification). DB2 has a cluster edition that is also quite good; it uses a shared-nothing architecture. But none of these solutions are free. Teradata is also cited as a good parallel database. If you are a start-up and your choice is between a NoSQL solution that is almost free and $100,000+ for some commercial parallel database, which do you go with?

    But no matter what, a relational database will consume resources ensuring consistency (which is often what you want, but not 100% of the time). Amazon's Dynamo works by caring less about consistency, trading consistency for availability of the overall service. For a shopping cart that is fine, but you wouldn't want to do your credit card processing with it. Google's GFS is optimized for the file operations Google performs most. However, there was an article in the ACM not that long ago comparing MapReduce (Hadoop's implementation) against two parallel databases, and it lost. Of course, the parallel databases were not free... and Hadoop is.
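Dynamo's published design makes this tradeoff tunable: each key is replicated to N nodes, a read must hear from R of them and a write must be acknowledged by W of them, and choosing R + W > N forces every read set to overlap every write set. The Python sketch below is illustrative only (not Amazon's code); it checks that rule against a brute-force enumeration of replica subsets.

```python
from itertools import combinations


def quorum_rule(n: int, r: int, w: int) -> bool:
    """Any R replicas must share at least one node with any W replicas."""
    return r + w > n


def overlaps_by_enumeration(n: int, r: int, w: int) -> bool:
    """Check the same property directly by trying every read set and write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


# N=3, R=2, W=2: every read reaches at least one replica holding the latest write.
print(quorum_rule(3, 2, 2), overlaps_by_enumeration(3, 2, 2))  # True True
# N=3, R=1, W=1: faster and more available, but a read may miss the latest write.
print(quorum_rule(3, 1, 1), overlaps_by_enumeration(3, 1, 1))  # False False
```

Lowering R or W buys latency and availability at the cost of possibly stale reads. That is acceptable for a shopping cart but not for payments.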

    So overall I'd say the decision comes down mostly to price (as it does with most startups). If you can make do with one server, then sure, use PostgreSQL (or MySQL, although they always tried to force licensing for commercial products even though it is GPL). If you need a cluster, both have clustering solutions, but as far as I can tell they are not as good as the commercial parallel databases. If you have lots of money, then sure, go with Oracle; word of mouth says Oracle is the best for both parallel and standalone performance. DB2 was good enough for a former job: they had terabytes in the mid-1990s using about 20 servers. Now that the hardware is much better, I'm sure it scales even better. But if money is a consideration, go with an open source NoSQL solution. A lot of people now swear by Cassandra; I haven't had a chance to check it out yet.
  • by Anonymous Coward on Sunday March 28, 2010 @12:48PM (#31648182)

    Pure dynamic. It's a data mining / analysis site, so every user is viewing their own set of data, slicing and zooming randomly. Caching is completely useless for 99.9% of the pages, but we do store the results of some heavy "SELECT COUNT(*) ... GROUP BY ..." queries in memcached. We chose PostgreSQL because it can handle the complex multi-table joins, with the many indexes they require; that one thing alone would mean endless pain in a non-relational datastore.
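A rough sketch of caching only the heavy aggregate queries, under stated assumptions: Python's built-in `sqlite3` stands in for PostgreSQL, a plain dict stands in for memcached (no expiry, no eviction), and the `events` table and `category` column are hypothetical.

```python
import sqlite3

# Stand-in for memcached: a plain dict with no expiry or eviction.
cache: dict = {}


def counts_by_category(conn: sqlite3.Connection) -> list:
    """Cache-aside lookup for an expensive COUNT(*) ... GROUP BY query."""
    key = "counts_by_category"
    if key in cache:  # hit: skip the aggregate query entirely
        return cache[key]
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM events GROUP BY category ORDER BY category"
    ).fetchall()
    cache[key] = rows  # miss: run the query once and remember the result
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT)")
conn.executemany("INSERT INTO events VALUES (?)", [("a",), ("b",), ("a",)])
print(counts_by_category(conn))  # [('a', 2), ('b', 1)]
```

Because a cached result is served until the key is removed, a real deployment would need an expiry time or explicit invalidation whenever the underlying rows change.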

    If you still have any doubt, just write your code the easy way and grab Apache JMeter to benchmark your site on localhost. You'll be surprised how well even the dev server works: on an average page with ~10 queries, it takes only 50-100 ms to serve a page. At 10 pages/sec/core, extrapolated over 24 hours, that's almost a million pages per core. You can just take this, run it on an 8-12 core node, and survive any traffic surge imaginable, without a cache. Add caching and I really can't see how a blog/news site/forum/CMS could ever require NoSQL to run, except when you reach "Facebook" popularity.
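The throughput arithmetic above, spelled out as a small Python helper. It assumes each core serves one page at a time, back to back, around the clock. Real traffic is burstier, so treat the result as an upper bound on steady-state capacity rather than a prediction.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_core_per_day(ms_per_page: float) -> int:
    """Pages one core can serve in a day if requests are handled one at a time."""
    pages_per_second = 1000 / ms_per_page
    return round(pages_per_second * SECONDS_PER_DAY)


for ms in (50, 100):
    per_core = pages_per_core_per_day(ms)
    print(f"{ms} ms/page: {per_core:,} pages/core/day, "
          f"{8 * per_core:,}-{12 * per_core:,} on an 8-12 core node")
```

At 100 ms per page this gives 864,000 pages per core per day, which is the "almost a million" figure; at 50 ms per page it doubles to 1,728,000.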

    PS: We aim for these numbers for a non-cacheable page: 1 s = slow but manageable, 0.2 s = good, 0.1 s or less = perfect.

  • Re:Article summary (Score:3, Informative)

    by Phroggy ( 441 ) on Sunday March 28, 2010 @01:50PM (#31648704) Homepage

    ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first-year comp sci students.

    But if MythTV takes twice as many milliseconds to read a channel listing, it really doesn't matter. Nobody's suggesting that SQLite can replace a real database server in all cases, but performance and scalability are completely unimportant in some applications.

  • Re:Article summary (Score:3, Informative)

    by spongman ( 182339 ) on Sunday March 28, 2010 @01:56PM (#31648766)

    MSSQL's TIMESTAMP is non-standard, so if you're trying to port 'standard' SQL code from the mythical standard DBMS in the sky, you've got your work cut out for you.

  • Re:Article summary (Score:4, Informative)

    by raynet ( 51803 ) on Sunday March 28, 2010 @02:11PM (#31648902) Homepage

    This might explain some of the problems with it.

    Basically, MSSQL's timestamp ain't a timestamp.

  • Re:Article summary (Score:5, Informative)

    by Onymous Coward ( 97719 ) on Sunday March 28, 2010 @02:40PM (#31649160) Homepage

    Two orders of magnitude is not 20x, it's 100x.

    And for non-intensive applications, that's still fine.

    And SQLite isn't actually that slow anyway. It's comparable.

  • Re:Article summary (Score:3, Informative)

    by Vancorps ( 746090 ) on Sunday March 28, 2010 @02:55PM (#31649288)
    Hate to reply to my own thread, but Power Objects was released in 1995, not EOL'd. Oracle actually only recently dropped support.
  • Re:Article summary (Score:3, Informative)

    by K. S. Kyosuke ( 729550 ) on Sunday March 28, 2010 @03:26PM (#31649530)

    SQL has its problems, but it's one of the best. That's why it has left its competitors in the dust of time.

    Oh, bullshit. SQL succeeded because it came from IBM, and what comes from IBM must be good by definition... or not? If we're talking about *relational* databases, then SQL is about as good a relational query language as COBOL is a general purpose language. C.J. Date wrote The Third Manifesto for a reason.

  • by BitZtream ( 692029 ) on Sunday March 28, 2010 @04:47PM (#31650116)

    If you're worrying about the cost of an Oracle license, which DB you use is irrelevant; you simply aren't large enough to make a wrong choice.

    When you are large enough for this to matter, the cost of Oracle or the cost of a handful of DBAs is the least of your concerns.

    It blows my mind how much value Slashdot geeks put on the cost of software. You guys have absolutely no fucking clue how much a single employee costs a company beyond salary, do you? You've been spending far too much time living in the basement and drooling over free (as in no cost) software to realize that not everyone is broke like you are. Real businesses don't worry about software license costs; they are trivial in the grand scheme of things. You realize that repurchasing all the software on pretty much any worker's PC would be covered by a couple of months of their salary? Do you really not have any idea how 'cheap' Oracle is when you get to that scale?

    No, you don't. Clearly.

    Right tool for the right job is correct, and building your own or using someone else's half-assed, hacked-together pile of 'OSS' is generally not how businesses care to run. They typically want to use software from someone who has some sort of vested interest in the software not sucking ass. It's far less expensive to buy from Oracle than it is to deal with a finicky OSS developer. If you're going to hire your own in-house developer to maintain it, you've instantly spent more than you would have spent just buying some software, and you now have none of the advantages of buying it.

    Stop talking about business reality when you clearly haven't even been in that part of the real world.

  • Re:Article summary (Score:2, Informative)

    by rkit ( 538398 ) on Sunday March 28, 2010 @06:26PM (#31651048) Homepage
    SQLite is extremely slow when writing data. The reason is its implementation of transactions, with a separate journal file for each transaction. Also, it has only a very basic query optimizer. The main advantage of SQLite is that it requires no administration; it is certainly not performance.
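If per-transaction journaling is the bottleneck, the usual workaround is to group many writes into one transaction so the journal and sync cost is paid once rather than per row. A Python sketch using the standard `sqlite3` module; the table `t` is hypothetical, and the printed timings depend on the disk and filesystem, so no particular speedup is claimed here.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(path: str, n: int, batched: bool) -> float:
    """Insert n rows into table t and return the elapsed seconds."""
    # isolation_level=None puts the sqlite3 module in autocommit mode, so each
    # INSERT outside an explicit BEGIN ... COMMIT is its own transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    start = time.perf_counter()
    if batched:
        conn.execute("BEGIN")  # one transaction wraps every insert
    for i in range(n):
        conn.execute("INSERT INTO t VALUES (?)", (i,))
    if batched:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


with tempfile.TemporaryDirectory() as d:
    slow = insert_rows(os.path.join(d, "a.db"), 200, batched=False)
    fast = insert_rows(os.path.join(d, "b.db"), 200, batched=True)
    print(f"one transaction per row: {slow:.3f}s, one transaction total: {fast:.3f}s")
```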
  • Re:Article summary (Score:4, Informative)

    by batkiwi ( 137781 ) on Sunday March 28, 2010 @07:13PM (#31651380)

    Timestamp in mssql is a misnomer, it's not a timestamp at all. It's more of a binary format concurrency key.

    This doesn't excuse MS's use of the name, but once you realize that, the column becomes useful again.
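What makes such a column useful is optimistic concurrency: a client remembers the version it read and makes its UPDATE conditional on that version still being current. SQL Server maintains its `rowversion`/`timestamp` value automatically; SQLite has no equivalent type, so this Python sketch with the standard `sqlite3` module bumps a hypothetical `version` column by hand. The `account` table is likewise made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.commit()


def read_account(conn: sqlite3.Connection, account_id: int) -> tuple:
    """Return (balance, version) for one row."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def update_balance(conn: sqlite3.Connection, account_id: int,
                   new_balance: int, expected_version: int) -> bool:
    """Write only if nobody changed the row since we read expected_version."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows matched means the version was stale


balance, version = read_account(conn, 1)  # two clients both read (100, 1)
print(update_balance(conn, 1, balance + 10, version))  # True: first writer wins
print(update_balance(conn, 1, balance - 10, version))  # False: stale version rejected
print(read_account(conn, 1))  # (110, 2)
```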

  • Re:Article summary (Score:4, Informative)

    by Jaime2 ( 824950 ) on Sunday March 28, 2010 @07:24PM (#31651492)
    ... and it was never intended to be. You link to an article stating that the MSSQL timestamp isn't compliant with SQL:2003's timestamp definition. However, the first version of MSSQL released after 2003 deprecated the timestamp datatype. The MSSQL timestamp is a unique update identifier that was never supposed to be a date/time; think of it more as an update sequence number. If you want an actual timestamp, it's been there since the product was introduced, in the form of the datetime datatype.

    Saying MSSQL doesn't have a proper timestamp is like saying that Oracle doesn't have a proper VARCHAR because Oracle only has a VARCHAR2 data type.
  • Re:Article summary (Score:3, Informative)

    by einhverfr ( 238914 ) on Sunday March 28, 2010 @10:45PM (#31652724) Homepage Journal

    OK, so enlighten us with your brilliance! Share with us the ultimate answer: what should be done to differentiate a null (logically, "I don't know") from a blank string (logically, "We know there's nothing there"), and what should be done differently?

    Well, the way PostgreSQL handles it is that a NULL is stored as a NULL and treated as one (i.e. NULL || ' more text' evaluates to NULL). '' is stored as an empty string and processed as one (i.e. '' || ' more text' evaluates to ' more text').

    Really, that strikes me as the correct way to do things (that seems obvious...). Oracle OTOH is braindead in its approach of treating NULLs and empty strings as equivalent.
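The PostgreSQL behavior described above can be reproduced from Python with the standard `sqlite3` module, since SQLite likewise returns NULL when either operand of `||` is NULL. This is SQLite standing in for PostgreSQL, not a PostgreSQL session, and Oracle's treatment of '' is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL propagates through ||, while '' behaves like any other string.
null_concat, empty_concat = conn.execute(
    "SELECT NULL || ' more text', '' || ' more text'"
).fetchone()
print(repr(null_concat))   # None
print(repr(empty_concat))  # ' more text'

# The two values are also distinguishable with IS NULL (1 = true, 0 = false).
print(conn.execute("SELECT '' IS NULL, NULL IS NULL").fetchone())  # (0, 1)

# COALESCE maps NULL to '' when "unknown" should be treated as "nothing there".
print(conn.execute("SELECT COALESCE(NULL, '') || ' more text'").fetchone()[0])
```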

  • Re:Article summary (Score:4, Informative)

    by Thundersnatch ( 671481 ) on Monday March 29, 2010 @09:06AM (#31655874) Journal

    I'm fairly certain that SQL Server inherited its TIMESTAMP keyword from Sybase, and that this usage of TIMESTAMP predates the SQL-89 and SQL-92 usages of that keyword.

    In short, they can't fix it properly, because it would break a ton of existing (very critical) applications that use the existing Sybase and MSSQL semantics of TIMESTAMP. Microsoft deprecated its usage of TIMESTAMP long ago, but they can't just change it without pissing off a lot of people. Oracle is in the same boat with many of its features that "violate" the ANSI standards.

    It's sort of like bitching about IE6 not supporting CSS2 features. IE6 predated the CSS2 standards ratification. It's actually the fault of those writing the standards: they ignored widely-used software and practices. In this case, they chose to use the TIMESTAMP keyword when something like DATEWITHTIME would have been clearer and would not have collided with anybody.

    In my experience, MSSQL is actually the most ANSI-compliant of the major commercial DBs.