Databases Programming

Why Some Devs Can't Wait For NoSQL To Die 444

theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Sunday March 28, 2010 @11:47AM (#31647684)

    It's really that simple. A standard dual-socket server with the latest CPUs from Intel or AMD can handle hundreds of requests per second. If one isn't enough, just add more hardware: one month of salary can buy you another node, and a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a few months extra for a team to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.

    Slashdot runs on SQL. I run a site serving 1M pages daily (1/3 of Slashdot, according to Alexa) on a single system with 2x Xeon E5420, Django/PostgreSQL, at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL. Just stick with a standard ORM, launch your site, and start convincing customers and generating sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.

  • by Anonymous Coward on Sunday March 28, 2010 @11:50AM (#31647700)

    I think this fellow's blog entry sums this up pretty nicely - especially the last paragraph: http://blog.cleverelephant.ca/2010/03/nonosql.html [cleverelephant.ca]

  • Re:Article summary (Score:5, Interesting)

    by RedMage ( 136286 ) on Sunday March 28, 2010 @11:50AM (#31647704) Homepage

    We're using both - about five days from our "go-live", and things look good. We just use what makes sense for each part of our application.
    For us, this means PostgreSQL for the parts that must be transactional ACID, and Amazon's S3 and SimpleDB for parts that don't. In practice, for the 1.0 release, this means things like notes, user accounting, and documents are in S3 and SDB. The rest is plain ole SQL.

    Not that there wasn't a learning curve with our developers - we're a bunch of old-time enterprise type developers, so "letting go" and moving out of the traditional SQL world took a little thought and proving time. We'll use the first few months to learn more about doing architecture this way.

    We've had the language wars - let's avoid the SQL/NoSQL wars, please. I'm tired.

  • by Anonymous Coward on Sunday March 28, 2010 @11:54AM (#31647752)

    Facebook.com, the highest-traffic site on the Internet, serves more than 95% of its data out of memcached. Twitter, Wikipedia, etc are major users too. And of course, Google serves its web index out of memory.

  • Comment removed (Score:5, Interesting)

    by account_deleted ( 4530225 ) on Sunday March 28, 2010 @12:20PM (#31647932)
    Comment removed based on user account deletion
  • Re:Article summary (Score:3, Interesting)

    by nacturation ( 646836 ) * <nacturation AT gmail DOT com> on Sunday March 28, 2010 @12:27PM (#31648006) Journal

    I would also fire anyone who specifies MSSQL - with immediate effect, and no severance pay: On grounds of insubordination, incompetence and reckless endangerment.

    So it's a no-go on MSSQL for that Microsoft contract your company just got? Of course, you didn't specify the type of work your company does so this attitude comes across as being rather narrow-minded. And good luck on that no severance pay thing. "I'd fire anyone in my organization who suggested we callously disregard labor laws like that." :)

  • Re:Article summary (Score:5, Interesting)

    by ducomputergeek ( 595742 ) on Sunday March 28, 2010 @12:41PM (#31648112)

    I don't have mod points, but I've found the same thing. It's the perfect development database if you think that your program is ever going to need to support Enterprise class stuff. On the small scale, I've found that it's fast enough. Is MySQL faster? Yes, but where I've tested it's not been enough to really matter compared to the other advantages of PostgreSQL. Primarily that it's ACID compliant. What we've found is that it works well until you start getting into databases that are GB in size. But then you can easily port the datatables to DB2 or Oracle and go. Especially if you designed the rest of the software to do this from the get go.

    In production, we moved all but one of our databases from MySQL to PostgreSQL. We were having problems with InnoDB getting corrupted once every couple of months. When it was announced that Oracle was bidding on Sun, we ported over to PostgreSQL, spent a couple of weeks rewriting code, and we've not touched the Postgres database since. It hasn't been corrupted or even hiccuped once since we deployed. We run regular vacuuming and maintenance and that's it. It's been humming for well over a year and is now getting 400x the use we ever had with MySQL.

    The only thing that PostgreSQL has been lacking is HA support. There are a number of 3rd-party tools that run well (PGCluster, Slony, GridSQL), but it looks like PostgreSQL is going to support native replication, clustering, and HA with hot-standby...

  • Re:Article summary (Score:3, Interesting)

    by JamesP ( 688957 ) on Sunday March 28, 2010 @12:51PM (#31648208)

    SQL isn't the problem

    Yes, it is

    Overhead caused by structuring your data the way relational DBs need.
    Lack of flexibility
    Scalability capabilities (horizontal scaling is easier)
    Speed (see overhead)

  • Re:Article summary (Score:3, Interesting)

    by SanityInAnarchy ( 655584 ) <ninja@slaphack.com> on Sunday March 28, 2010 @12:51PM (#31648212) Journal

    SQL isn't the problem, it's a tool. Bad programmers are the problem.

    You could say the same about assembly language. You could also say the same about threads, and dismiss things like functional programming and the actor model as fads.

    I'll give you a simple example: Given a big transactional SQL database, if you want it to scale to more than a few machines, you're going to want to shard it. That's going to be a ton of manual work, figuring out what you can shard, what keys to shard it on, adjusting it later on the fly to ensure that each DB server has exactly what it can handle in terms of data and load, and so on. You might be able to write software to do this for you, but that software is going to be fairly tightly coupled to your data model and your app.
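    The routing part of that manual work can be sketched as follows. This is a minimal illustration, not anyone's production code: the shard names, the `shard_for` and `group_by_shard` helpers, and the choice of SHA-1 with a modulo are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connections.
SHARDS = ["db0", "db1", "db2", "db3"]

def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the shard key (stable across processes)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def group_by_shard(keys, shards=SHARDS):
    """Group keys by target shard so one query can be issued per shard."""
    plan = {}
    for key in keys:
        plan.setdefault(shard_for(key, shards), []).append(key)
    return plan

plan = group_by_shard(["user:1", "user:2", "user:3"])
```

    With plain modulo hashing like this, changing the number of shards remaps most keys, which is one reason the on-the-fly rebalancing mentioned above is hard.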

    It's possible I'm missing something there, and it's possible there's an easier way to do it, but it seems like every way to scale SQL has similar tradeoffs. Put a proxy in front of your DB cluster, giving the impression of a single database out of those shards? Your app is now not talking directly to the database, and certain queries won't be supported, and certain other queries will be slow or unreliable.

    The database I'm working with now is Google AppEngine. It's pretty much natively sharded, and the tradeoffs are understood up front -- you can only transact over entities in the same group, but if your app is built up front to define entity groups appropriately, Google can physically shard them for you. It's a similar advantage to using Erlang for concurrency -- you probably won't be running your Erlang app on a machine with several thousand cores, but if you've got several thousand concurrent actors, it will trivially scale to anything in between.

    Like Erlang, it's also not a magic bullet. I still use SQL in things like SQLite, because it's the best tool for the job.

  • by RAMMS+EIN ( 578166 ) on Sunday March 28, 2010 @12:55PM (#31648234) Homepage Journal

    I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.

    From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.

    If so, it seems to me that the whole NoSQL vs. RDBMS [wikipedia.org] debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.

    This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open [opengroup.org]). It's also long been known that, for example, Berkeley DB [oracle.com] can be a lot faster than an RDBMS - as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs even don't use one of these database systems, but invent their own, custom format. Git [git-scm.com] is a very successful example of this.
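    As a small illustration of that kind of non-relational store, here is a sketch using Python's standard `dbm` module, which wraps dbm-style key-value libraries. The file name, the key, and the `roundtrip` helper are made up for the example, and which backend gets used depends on the platform.

```python
import dbm
import os
import tempfile

def roundtrip(path):
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db[b"user:1"] = b"alice"
    # Reopen read-only to show that the value was persisted.
    with dbm.open(path, "r") as db:
        return db[b"user:1"]

with tempfile.TemporaryDirectory() as tmp:
    value = roundtrip(os.path.join(tmp, "example"))
```

    There is no schema, no query language, and no join here: just keys mapped to byte strings, which is exactly the trade described above.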

    To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.

  • Re:Article summary (Score:2, Interesting)

    by Anonymous Coward on Sunday March 28, 2010 @01:01PM (#31648270)

    ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.

  • by Anonymous Coward on Sunday March 28, 2010 @01:08PM (#31648326)

    I've got news for you ... all the major stock exchanges, banks, and telecoms in the world use SQL RDBMSs to track transactions that match or exceed anything Facebook and Twitter are doing. I guarantee you, without a single doubt in my mind, that Facebook and Twitter could be run on a SQL RDBMS ... by that I mean Oracle, not MySQL.

  • There are times... (Score:3, Interesting)

    by lenski ( 96498 ) on Sunday March 28, 2010 @01:08PM (#31648328)

    Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM for instance which are now resolved.) We even found a small but significant corner-case bug which upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time) that has allowed the development organization to direct our time and money to our own application development.

    But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".

    Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.

  • Re:Article summary (Score:3, Interesting)

    by TheLink ( 130905 ) on Sunday March 28, 2010 @01:34PM (#31648562) Journal
    Just curious - in what way can't MSSQL handle timestamps properly?
  • Re:Article summary (Score:5, Interesting)

    by TheLink ( 130905 ) on Sunday March 28, 2010 @01:53PM (#31648732) Journal
    The syntax might be crap, but it's far easier to get everyone to standardize on SQL to talk to DBs.

    "NoSQL" stuff is fine if your company is simple in structure - very few products/services, and it has to write most of that stuff itself anyway.

    When you have many different departments with their own different apps (in house and 3rd party), and they all want to access the same bunch of databases, SQL just becomes the "standard API or language" you use to talk to them. In contrast say you have some custom "NoSQL" DB, it's going to be harder to find stuff that talks to it (you might have to write your own connectors).

    It's just like "English", the syntax might be crap, but it's far easier to get 3rd parties and other departments to use it. In contrast if you use Lojban, despite its supposed advantages you're probably going to have to get translators (or worse - train your own translators) whenever you need to deal with outsiders who don't speak it.
  • Re:Article summary (Score:3, Interesting)

    by RedMage ( 136286 ) on Sunday March 28, 2010 @01:59PM (#31648792) Homepage

    Would it not have been less complex to use PostgreSQL for everything, or was there enough difference to be worth the complexity?

    Turns out, yes and no. We're distributed already, so it would have entailed setting up another DB anyway, and all the management infrastructure around that. AWS also seemed like a good fit for things that were essentially document-oriented and it seemed that it would be efficient for this kind of data model.

  • Yes, it does. (Score:3, Interesting)

    by gbutler69 ( 910166 ) on Sunday March 28, 2010 @02:25PM (#31649026) Homepage
    Each of those 200,000,000 transactions for WalMart is a fairly significant source of revenue, averaging on the order of $50.00 to $100.00 per transaction. The same 200,000,000 transactions for a web application would average something like $0.03 (yes, 3 cents). Now, if the cost per transaction using a traditional RDBMS is something like $0.25 (25 cents), how is that going to work for the Web case? What if the cost is $0.01? Still epic fail for the web case.
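    The arithmetic behind that comparison can be worked through as below. The figures are the hypothetical ones from the comment (using the low end of the retail range), not measured costs, and `cost_share` is a name introduced for this sketch.

```python
def cost_share(revenue_cents, cost_cents):
    """Fraction of one transaction's revenue consumed by database cost."""
    return cost_cents / revenue_cents

retail = cost_share(5000, 25)  # $50.00 revenue, $0.25 cost per transaction
web_hi = cost_share(3, 25)     # $0.03 revenue, $0.25 cost per transaction
web_lo = cost_share(3, 1)      # $0.03 revenue, $0.01 cost per transaction
```

    Under these assumptions the retail case spends 0.5% of revenue on the database, the first web case spends more than eight times its revenue, and even the one-cent case gives up a third of revenue.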
  • Re:Article summary (Score:4, Interesting)

    by Vancorps ( 746090 ) on Sunday March 28, 2010 @02:50PM (#31649240)

    Given that Oracle has a Java client and Java is supported on OS/2, how did Oracle drop OS/2? Even with 10 and 11g you can still connect from an OS/2 box, although I would say your application has some fundamental design flaws if workstations are directly connecting to a database.

    Also, some of the biggest general ledger applications deployed are running on MS SQL; that includes Great Plains and Navision.

    As for Oracle Power Objects you have the same situation, Oracle has another product that achieves the same functionality and more and it evolved into that. Much like Oracle Forms and Reports 10g has no 11g version, Oracle didn't drop support for Forms and Reports services though, they came out with a new product and have a clear and rather easy transition path provided you have a good amount of Oracle infrastructure.

    The MSSQL timestamp is a really weak argument as well, as there is nothing that forces you to use its timestamp, which we'll agree is different from what you get with Oracle, MySQL, and PostgreSQL. We get around that by converting to strings, since we work with multiple platforms. Each of them has serious strengths and, of course, serious weaknesses. I personally believe that the only product worthy of such animosity is MySQL, because the developers clearly knew nothing about databases when designing it. Naturally, they even admit that. They learned along the way and have created a flexible product, but it has all the problems that Oracle had 20 years ago and that MSSQL had 15 years ago. When you rely on your application for data integrity you will run into problems again and again and again.

    Sounds to me like you weren't happy being forced off dying platforms, given how long Oracle extended support for both it seems you were quite stubborn. EOL for Power Objects was in 1995 and support actually ended in 2000. That is one seriously long transition period.

  • Re:Article summary (Score:2, Interesting)

    by BluenoseJake ( 944685 ) on Sunday March 28, 2010 @03:13PM (#31649438)
    Have you used SQL Server? I thought not.
  • Re:Article summary (Score:5, Interesting)

    by jc42 ( 318812 ) on Sunday March 28, 2010 @03:43PM (#31649648) Homepage Journal

    ... the syntax of PROLOG, for example, seems much simpler, more powerful, and makes more sense to me.

    Yeah, wouldn't it be wonderful if instead of all the complex cruft usually needed to find the data you need in that morass, you could just write a prolog expression and let the interpreter resolve it? But when I mention this to Team Leaders, they inevitably look at me like I'm from Mars. They have no idea what prolog is or does. (And I'm actually from a planet much farther away than Mars. ;-)

    But when all is said and done, you can get familiar with most of SQL in a couple weeks.

    True, perhaps, and I did that years ago. But that doesn't deal with the major problem with SQL: In my experience, every relational database I've ever worked with was in the grips of a set of professional RDB priests, and you didn't do anything in SQL without their blessing. If they didn't approve of what you were trying to do (typically because they couldn't be bothered to listen to you), it wouldn't get done during your lifetime.

    So I've learned to cultivate them as an acolyte. I write my "prototype" to use flat files, typically small files full of name:value pairs, sometimes with the name part the file name and the value the contents, and a directory tree of multiply-linked files to classify stuff. I agree with their criticism of this, and say that I'd be happy to convert the code to use their DB when they have the time to help me get those subroutines working right. While they chew on that, I get the project working with the flat files, and get some users using it. When the priests finally face the fact that the project works without their help, they finally deign to help.
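    A flat file of name:value pairs like the ones described can be read and written in a few lines. This is only a sketch of the general idea, not the poster's code: the file name, the sample record, and the `write_record` and `read_record` helpers are invented here, and values containing newlines are not handled.

```python
import os
import tempfile

def write_record(path, record):
    """Write a dict as one 'name:value' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for name, value in record.items():
            f.write(f"{name}:{value}\n")

def read_record(path):
    """Read 'name:value' lines back into a dict, splitting on the first colon."""
    record = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            name, _, value = line.partition(":")
            record[name] = value
    return record

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "user42.rec")
    write_record(path, {"name": "alice", "url": "http://example.org/a"})
    loaded = read_record(path)
```

    Splitting on the first colon only lets values such as URLs contain colons of their own.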

    But I've never seen them actually get the SQL working to the point that it can supplant the flat files. The parts that do work are always so slow that turning on the "useDB" switch makes it too sluggish to actually use. In some cases, I can get around this by writing "pre-pass" code to extract the common data sets from the DB and write it to flat files, which the interactive software can read through quickly.

    It has long seemed to me that SQL and RDBs in general are Good Ideas. But unless we can find a way to end the stranglehold of the DB priesthood in an organization, it's all sorta hopeless for a mere "developer" to even consider jumping into the mess. It's better to just develop stuff that works, and let the DB experts handle the task of porting it to the DB. That way, we developers can keep our hands clean of all the theology, and actually develop stuff that works.

    Of course, this is all heresy to the True Believers ...

  • Re:Article summary (Score:5, Interesting)

    by BitZtream ( 692029 ) on Sunday March 28, 2010 @04:05PM (#31649838)

    Considering that by the time you 'need' Oracle, the price of Oracle is a drop in the bucket.

    The only people that ever complain about the price of Oracle are the people who will never have the need to use it because they'll never have the traffic to it to require it.

    Sorry you haven't got to play with the big boys, but in general if you spend your time worrying about how much 'software costs' your business sucks. Software costs, even for Oracle, are trivial compared to the other costs that go into it.

    An Oracle DB serving internet facing customers for instance is going to cost an order of magnitude more for bandwidth in the first year than the cost of an Oracle license to deal with it.

    But you go ahead, keep pretending you have some sort of clue and are witty by pointing out it's expensive. If you ever make it to that scale, the last thing on your mind will be the price of an Oracle license.

  • Comment removed (Score:4, Interesting)

    by account_deleted ( 4530225 ) on Sunday March 28, 2010 @05:14PM (#31650364)
    Comment removed based on user account deletion
  • Re:Article summary (Score:4, Interesting)

    by Chitlenz ( 184283 ) <chitlenz@chitleFREEBSDnz.com minus bsd> on Sunday March 28, 2010 @08:41PM (#31651966) Homepage
    Ummm FTFL?

    Timestamp equivalent
    * Eventually, MS will convert the current timestamp of a unique row number, to an actual date and time.
    * Use ROWVERSION instead of timestamp. Row version provides the same functionality and the same value as the current timestamp.


    MSSQL 2008 and above is fine, and we use timestamps almost to an atomic precision in medical imaging... eventually came right after that post ... in 2007. SQL Server Vs. Oracle/MySQL is the only fight worth wasting time on. Here's the thing about RDBMS. Not only has it been the standard for 20 years, virtually assuring their own persistence because by very nature they grow.. a LOT, but it is one of the few standards that actually has a solid foundation. You see, in this age of marketing driven products, there are still a few things out there quietly running the world. And I assure you it's not XML pages.

    my 2cents.

    --chitlenz
  • Re:Article summary (Score:3, Interesting)

    by shutdown -p now ( 807394 ) on Sunday March 28, 2010 @09:18PM (#31652182) Journal

    ... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.

    One of the following two things is missing in your post:

    1) A reference to back such a bold claim.

    2) A qualifier along the lines of "... with many concurrent writers".

  • Re:Article summary (Score:1, Interesting)

    by Anonymous Coward on Sunday March 28, 2010 @10:18PM (#31652578)

    In general, asking programmers to get things right in a language where "true = !false" is not a true statement is gonna be a difficult proposition. Hence the avoidance of prolog.

  • Re:Article summary (Score:3, Interesting)

    by Tacvek ( 948259 ) on Sunday March 28, 2010 @11:24PM (#31652880) Journal

    What I find interesting is that one of the biggest users of a BASE[0] non-relational database (a NoSQL database), namely Facebook, who uses Cassandra [1], has created an SQL style query interface named FBQL. The interface includes some rather advanced SQL features like embedded sub-queries in addition to the traditional selecting on joined tables.

    Then again, that may be due in large part to the fact that they are using a database schema that is all but identical to a normalized schema used in relational ACID databases, and simply code with the expectation that the database may be inconsistent, so always expect broken references. That is not really the optimal way to use a non-relational database, but it works.

    [0] Basically Available, Soft state, Eventual consistency. Somewhat the opposite of ACID.

    [1] This can be pretty noticeable at peak hours, when you end up seeing an inconsistent database, one in which you are friends and are not friends with another user at the same time.
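    Coding "with the expectation that the database may be inconsistent" might look roughly like this. It is a sketch built on plain dictionaries standing in for two eventually consistent tables; `resolve_friends` and the sample data are hypothetical and say nothing about how Facebook actually does it.

```python
def resolve_friends(user_id, friendships, users):
    """Return (found records, dangling ids) instead of failing on a broken reference."""
    found, dangling = [], []
    for friend_id in friendships.get(user_id, []):
        record = users.get(friend_id)
        if record is None:
            dangling.append(friend_id)  # this replica may not have the row yet
        else:
            found.append(record)
    return found, dangling

users = {1: {"name": "alice"}, 2: {"name": "bob"}}
friendships = {1: [2, 3]}  # user 3 has not been replicated to this view yet
found, dangling = resolve_friends(1, friendships, users)
```

    The caller gets usable results plus a list of references to retry or ignore, rather than an error.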

  • Re:Article summary (Score:3, Interesting)

    by Hangtime ( 19526 ) on Sunday March 28, 2010 @11:34PM (#31652922) Homepage

    In the immortal words of Joe Celko, responding to a question similar to the one you discuss, in one of the truest statements ever written:

    My SQL program is trying to compete with a flat file system.

    If you want to get data to a single user, in a fixed format, you will
    lose. The reason we have databases is not speed. Databases are for sharing
    data (concurrency control and all that jazz), and keeping data integrity
    (normal forms, constraints and all that jazz).

    You can get to the ground floor a lot faster by jumping down an empty
    elevator shaft instead of waiting for the car to arrive. However, there
    are trade-offs ...
    --CELKO--

    If data has little to no value for you, then you do not need a relational database. However, if data is of any importance to you, then you have to think beyond a flat file. Flat files and hierarchical databases have been around since the dawn of computing. Relational databases were brought about to solve the concurrency and integrity problems inherent in these models, not to make your application faster. As the quote implies, jumping down the elevator shaft is faster than taking the car, but there are trade-offs. I think the better question is: why do your database design or queries take so much time that flat files are faster when there are just a few users of the system?
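    The integrity side of that trade-off can be shown with a small sketch using Python's built-in `sqlite3` module. The table layout and the `try_insert` helper are assumptions made for the example; the point is only that the database, not the application, rejects bad rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO orders (customer_id, amount_cents) VALUES (1, 500)")
conn.commit()  # make the seed rows durable before any later rollback

def try_insert(customer_id, amount_cents):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, amount_cents) VALUES (?, ?)",
                (customer_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

    A flat file would happily store an order for a customer that does not exist, or a negative amount; here both are refused no matter which program attempts the write.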
