Why Some Devs Can't Wait For NoSQL To Die
theodp writes "Ted Dziuba can't wait for NoSQL to die. Developing your app for Google-sized scale, says Dziuba, is a waste of your time. Not to mention there is no way you will get it right. The sooner your company admits this, the sooner you can get down to some real work. If real businesses like Walmart can track all of their data in SQL databases that scale just fine, Dziuba argues, surely your company can, too."
Hardware is cheap. Developers aren't. (Score:5, Interesting)
It's really that simple. A standard dual-socket server with the latest CPUs from Intel or AMD can handle hundreds of requests per second. If one isn't enough, just add more hardware: one month of salary can buy you another node, and a year can buy you a whole cluster of rackable systems or a chassis full of blades. If it takes a few months extra for a team to solve the problem the NoSQL way, that's a few months of extra salary costs and missed sales.
Slashdot runs on SQL. I run a site serving 1M pages daily (about a third of Slashdot's traffic, according to Alexa) on a single system with 2x Xeon E5420, running Django/PostgreSQL at 10% load. Unless you attract enough attention to require scaling past 10M pages a day, you're wasting your time reinventing the wheel with NoSQL. Just stick with a standard ORM, launch your site, and start convincing customers and generating sales. You can survive a slashdotting just fine without spending so much time on those exotic tools.
Different strokes for different folks... (Score:1, Interesting)
I think this fellow's blog entry sums this up pretty nicely - especially the last paragraph: http://blog.cleverelephant.ca/2010/03/nonosql.html [cleverelephant.ca]
Re:Article summary (Score:5, Interesting)
We're using both - about five days from our "go-live", and things look good. We just use what makes sense for each part of our application.
For us, this means PostgreSQL for the parts that must be transactional (ACID), and Amazon's S3 and SimpleDB for the parts that don't need to be. In practice, for the 1.0 release, this means things like notes, user accounting, and documents are in S3 and SDB. The rest is plain ole SQL.
Not that there wasn't a learning curve with our developers - we're a bunch of old-time enterprise type developers, so "letting go" and moving out of the traditional SQL world took a little thought and proving time. We'll use the first few months to learn more about doing architecture this way.
We've had the language wars - let's avoid the SQL/NoSQL wars please. I'm tired.
Re:Can't wait it to die? (Score:3, Interesting)
Facebook.com, the highest-traffic site on the Internet, serves more than 95% of its data out of memcached. Twitter, Wikipedia, etc are major users too. And of course, Google serves its web index out of memory.
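For anyone who hasn't seen how a site ends up serving most reads from memory, here is a minimal cache-aside sketch. It is only an illustration: a plain dict stands in for memcached, and `load_from_database` is a hypothetical placeholder for a slow SQL query. It says nothing about how Facebook's code is actually written.

```python
# Cache-aside sketch: a dict stands in for memcached (assumption for illustration).
cache = {}
db_reads = 0  # counts how often we fall through to the "database"


def load_from_database(user_id):
    """Hypothetical stand-in for a slow SQL query."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user{user_id}"}


def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                     # cache hit: no database work
        return cache[key]
    row = load_from_database(user_id)    # cache miss: read the database...
    cache[key] = row                     # ...then populate the cache
    return row


get_user(42)
get_user(42)  # the second call is served from the cache
```

The database is still the source of truth here; the cache only absorbs repeated reads.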
Re:Article summary (Score:3, Interesting)
I would also fire anyone who specifies MSSQL - with immediate effect, and no severance pay: On grounds of insubordination, incompetence and reckless endangerment.
So it's a no-go on MSSQL for that Microsoft contract your company just got? Of course, you didn't specify the type of work your company does so this attitude comes across as being rather narrow-minded. And good luck on that no severance pay thing. "I'd fire anyone in my organization who suggested we callously disregard labor laws like that." :)
Re:Article summary (Score:5, Interesting)
I don't have mod points, but I've found the same thing. It's the perfect development database if you think your program will ever need to support enterprise-class workloads. On the small scale, I've found that it's fast enough. Is MySQL faster? Yes, but where I've tested, the difference hasn't been enough to really matter compared to the other advantages of PostgreSQL, primarily that it's ACID compliant. What we've found is that it works well until you start getting into databases that are GB in size. But then you can easily port the tables to DB2 or Oracle and go, especially if you designed the rest of the software to do this from the get-go.
In production, we moved all but one of our databases from MySQL to PostgreSQL. We were having problems with InnoDB corruption once every couple of months. When it was announced that Oracle was bidding on Sun, we ported over to PostgreSQL, spent a couple of weeks rewriting code, and we've not touched the Postgres database since. It hasn't been corrupted or even hiccuped once since we deployed. We run regular vacuuming and maintenance and that's it. It's been humming for well over a year and is now getting 400x the use we ever had with MySQL.
The only thing PostgreSQL has been lacking is HA support. There are a number of third-party tools that run well (PGCluster, Slony, GridSQL), but it looks like PostgreSQL is going to support native replication, clustering, and HA with hot standby...
Re:Article summary (Score:3, Interesting)
SQL isn't the problem
Yes, it is
Overhead caused by structuring your data the way relational DBs need.
Lack of flexibility
Scalability capabilities (horizontal scaling is easier)
Speed (see overhead)
Re:Article summary (Score:3, Interesting)
SQL isn't the problem, it's a tool. Bad programmers are the problem.
You could say the same about assembly language. You could also say the same about threads, and dismiss things like functional programming and the actor model as fads.
I'll give you a simple example: Given a big transactional SQL database, if you want it to scale to more than a few machines, you're going to want to shard it. That's going to be a ton of manual work, figuring out what you can shard, what keys to shard it on, adjusting it later on the fly to ensure that each DB server has exactly what it can handle in terms of data and load, and so on. You might be able to write software to do this for you, but that software is going to be fairly tightly coupled to your data model and your app.
It's possible I'm missing something there, and it's possible there's an easier way to do it, but it seems like every way to scale SQL has similar tradeoffs. Put a proxy in front of your DB cluster, giving the impression of a single database out of those shards? Your app is now not talking directly to the database, and certain queries won't be supported, and certain other queries will be slow or unreliable.
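To make the sharding point concrete, here is a rough sketch of the simplest routing scheme, hashing a shard key and taking it modulo the number of servers. The server names and function names are made up for illustration; real setups vary a lot.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical server names


def shard_for(key: str) -> str:
    """Map a shard key to a server with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]


def shards_for_query(keys):
    """A query touching several keys may have to visit several shards."""
    return sorted({shard_for(k) for k in keys})
```

Note that with plain modulo routing, adding a server changes `len(SHARDS)` and remaps most keys, which is one piece of the manual rebalancing work described above. Anything that needs rows from more than one shard (joins, cross-shard transactions) has to be handled by the app or a proxy.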
The database I'm working with now is Google AppEngine. It's pretty much natively sharded, and the tradeoffs are understood up front -- you can only transact over entities in the same group, but if your app is built up front to define entity groups appropriately, Google can physically shard them for you. It's a similar advantage to using Erlang for concurrency -- you probably won't be running your Erlang app on a machine with several thousand cores, but if you've got several thousand concurrent actors, it will trivially scale to anything in between.
Like Erlang, it's also not a magic bullet. I still use SQL in things like SQLite, because it's the best tool for the job.
I'm Still Fuzzy on NoSQL (Score:5, Interesting)
I'm still fuzzy on what NoSQL is supposed to be and what it is supposed to bring to the table.
From what I've understood, it's basically a common banner for various different databases that all share the common property of not being relational databases and not providing ACID guarantees.
If so, it seems to me that the whole NoSQL vs. RDBMS [wikipedia.org] debate is about a false dichotomy. There are some applications where a relational database is the right tool for the job, and there are some where a relational database is not the right tool for the job. In some of those latter cases, one of the NoSQL databases may be the right thing.
This is nothing new. Non-relational databases have been used on Unix for a long time, and are even a standard part of POSIX (see for example the manpage for dbm_open [opengroup.org]). It's also long been known that, for example, Berkeley DB [oracle.com] can be a lot faster than an RDBMS, as long as your application doesn't make use of all the features an RDBMS provides. Lots of programs don't even use one of these database systems, but invent their own custom format. Git [git-scm.com] is a very successful example of this.
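As a small illustration of the dbm-style key-value storage mentioned above, here is a sketch using Python's standard `dbm` module, which wraps whichever dbm-like backend is available on the system. The file name and keys are arbitrary examples.

```python
import dbm
import os
import tempfile

# Store and fetch values by key; there is no schema, no SQL, no joins.
path = os.path.join(tempfile.mkdtemp(), "example")

with dbm.open(path, "c") as db:      # "c": create the database if missing
    db[b"user:1"] = b"alice"
    db[b"user:2"] = b"bob"

with dbm.open(path, "r") as db:      # reopen read-only
    name = db[b"user:1"]
    missing = db.get(b"user:3")      # get() returns None for absent keys
```

That is the whole interface: if all you need is lookup by key, this is about as little machinery as it gets. Anything resembling a query across values is your application's problem.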
To me, it seems that what we are seeing here is loads of people who had learned to use relational databases for all their storage needs discovering that there are other ways to store data, and that one of those methods may work better than an RDBMS for a particular application. Well, yes. Does that surprise anyone? It sure doesn't surprise me. Does it mean that RDBMSes are now useless? Not at all. Does it mean you should use a non-relational storage system where this makes more sense? Of course! Now, can we please get back to work? I don't see the point of having a holy war over whether RDBMS or NoSQL is better, when common sense says that they both have their uses.
Re:Article summary (Score:2, Interesting)
... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
Re:The Article Is Right... And Wrong (Score:1, Interesting)
I've got news for you ... all the major stock exchanges, banks, and telecoms in the world use SQL RDBMSs to track transactions that match or exceed anything Facebook and Twitter are doing. I guarantee you, without a single doubt in my mind, that Facebook and Twitter could be run on a SQL RDBMS ... by that I mean Oracle, not MySQL.
There are times... (Score:3, Interesting)
Our development organization is heavily invested in PostgreSQL, finding it to be perfectly matched to almost all of our needs. It is exceptionally reliable, and is very (but not perfectly) manageable. (We've had issues in the past with mis-timed auto-VACUUM, for instance, which are now resolved.) We even found a small but significant corner-case bug which, upon being reported, received immediate attention from the developers, resulting in a resolution in under 72 hours. I believe our use of this particular tool has saved us significant resources (dollars, developer time), which has allowed the development organization to direct our time and money to our own application development.
But we're finding that even PostgreSQL has limits, mostly with respect to the large and growing datasets our application uses for large scale real time control. We could transition to a really expensive SQL solution, but we are at least considering the choices that may be a better fit for these particular subsystems than PostgreSQL or any other SQL solution. Just a few weeks ago, we started seeing a good comment in teh interWebs... "NoSQL" should mean "not only SQL".
Not a rejection of a powerful toolkit that holds a central role in our organization, but rather a recognition that we would be remiss in our responsibilities if we didn't pay attention to the choices that could simplify our lives as developers.
Re:Article summary (Score:5, Interesting)
"NoSQL" stuff is fine if your company is simple in structure - very few products/services, and it has to write most of that stuff itself anyway.
When you have many different departments with their own different apps (in house and 3rd party), and they all want to access the same bunch of databases, SQL just becomes the "standard API or language" you use to talk to them. In contrast say you have some custom "NoSQL" DB, it's going to be harder to find stuff that talks to it (you might have to write your own connectors).
It's just like "English", the syntax might be crap, but it's far easier to get 3rd parties and other departments to use it. In contrast if you use Lojban, despite its supposed advantages you're probably going to have to get translators (or worse - train your own translators) whenever you need to deal with outsiders who don't speak it.
Re:Article summary (Score:3, Interesting)
Would it not have been less complex to use PostgreSQL for everything, or was there enough difference to be worth the complexity?
Turns out, yes and no. We're distributed already, so it would have entailed setting up another DB anyway, and all the management infrastructure around that. AWS also seemed like a good fit for things that were essentially document-oriented and it seemed that it would be efficient for this kind of data model.
Yes, it does. (Score:3, Interesting)
Re:Article summary (Score:4, Interesting)
Given that Oracle has a Java client and Java is supported on OS/2, how did Oracle drop OS/2? Even with 10g and 11g you can still connect from an OS/2 box, although I would say your application has some fundamental design flaws if workstations are directly connecting to a database.
Also, some of the biggest general ledger applications deployed are running on MS SQL, including Great Plains and Navision.
As for Oracle Power Objects you have the same situation, Oracle has another product that achieves the same functionality and more and it evolved into that. Much like Oracle Forms and Reports 10g has no 11g version, Oracle didn't drop support for Forms and Reports services though, they came out with a new product and have a clear and rather easy transition path provided you have a good amount of Oracle infrastructure.
The MSSQL timestamp is a really weak argument as well, as there is nothing that forces you to use its timestamp, which we'll agree is different from what you get with Oracle, MySQL, and PostgreSQL. We get around that by converting to strings, since we work with multiple platforms. Each of them has serious strengths and, of course, serious weaknesses. I personally believe that the only product worthy of such animosity is MySQL, because the developers clearly knew nothing about databases in its design. Naturally they even admit that. They learned along the way and have created a flexible product, but it has all the problems that Oracle had 20 years ago and that MSSQL had 15 years ago. When you rely on your application for data integrity you will run into problems again and again and again.
Sounds to me like you weren't happy being forced off dying platforms; given how long Oracle extended support for both, it seems you were quite stubborn. EOL for Power Objects was in 1995 and support actually ended in 2000. That is one seriously long transition period.
Re:Article summary (Score:5, Interesting)
... the syntax of PROLOG, for example, seems much simpler, more powerful, and makes more sense to me.
Yeah, wouldn't it be wonderful if instead of all the complex cruft usually needed to find the data you need in that morass, you could just write a prolog expression and let the interpreter resolve it? But when I mention this to Team Leaders, they inevitably look at me like I'm from Mars. They have no idea what prolog is or does. (And I'm actually from a planet much farther away than Mars. ;-)
But when all is said and done, you can get familiar with most of SQL in a couple weeks.
True, perhaps, and I did that years ago. But that doesn't deal with the major problem with SQL: In my experience, every relational database I've ever worked with was in the grips of a set of professional RDB priests, and you didn't do anything in SQL without their blessing. If they didn't approve of what you were trying to do (typically because they couldn't be bothered to listen to you), it wouldn't get done during your lifetime.
So I've learned to cultivate them as an acolyte. I write my "prototype" to use flat files, typically small files full of name:value pairs, sometimes with the name part the file name and the value the contents, and a directory tree of multiply-linked files to classify stuff. I agree with their criticism of this, and say that I'd be happy to convert the code to use their DB when they have the time to help me get those subroutines working right. While they chew on that, I get the project working with the flat files, and get some users using it. When the priests finally face the fact that the project works without their help, they finally deign to help.
But I've never seen them actually get the SQL working to the point that it can supplant the flat files. The parts that do work are always so slow that turning on the "useDB" switch makes it too sluggish to actually use. In some cases, I can get around this by writing "pre-pass" code to extract the common data sets from the DB and write it to flat files, which the interactive software can read through quickly.
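For the curious, the "pre-pass" trick looks roughly like the sketch below. This is an illustration under assumptions, not the poster's actual code: an in-memory SQLite database stands in for the shared DB, and the table, column, and file names are invented.

```python
import os
import sqlite3
import tempfile

# In-memory SQLite stands in for the shared database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO settings VALUES (?, ?)",
    [("color", "blue"), ("size", "large")],
)

# Pre-pass: dump the commonly used rows to a flat file of name:value lines.
flat_path = os.path.join(tempfile.mkdtemp(), "settings.txt")
with open(flat_path, "w", encoding="utf-8") as f:
    query = "SELECT name, value FROM settings ORDER BY name"
    for name, value in conn.execute(query):
        f.write(f"{name}:{value}\n")
conn.close()


def read_flat(path):
    """Interactive code reads the flat file instead of querying the DB."""
    with open(path, encoding="utf-8") as f:
        return dict(line.rstrip("\n").split(":", 1) for line in f)


settings = read_flat(flat_path)
```

The obvious tradeoff is staleness: the flat file is only as fresh as the last pre-pass run, and nothing stops two writers from clobbering it.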
It has long seemed to me that SQL and RDBs in general are Good Ideas. But unless we can find a way to end the stranglehold of the DB priesthood in an organization, it's all sorta hopeless for a mere "developer" to even consider jumping into the mess. It's better to just develop stuff that works, and let the DB experts handle the task of porting it to the DB. That way, we developers can keep our hands clean of all the theology, and actually develop stuff that works.
Of course, this is all heresy to the True Believers ...
Re:Article summary (Score:5, Interesting)
Considering that by the time you 'need' Oracle, the price of Oracle is a drop in the bucket.
The only people that ever complain about the price of Oracle are the people who will never have the need to use it because they'll never have the traffic to it to require it.
Sorry you haven't got to play with the big boys, but in general if you spend your time worrying about how much 'software costs' your business sucks. Software costs, even for Oracle, are trivial compared to the other costs that go into it.
An Oracle DB serving internet facing customers for instance is going to cost an order of magnitude more for bandwidth in the first year than the cost of an Oracle license to deal with it.
But you go ahead, keep pretending you have some sort of clue and are witty by pointing out it's expensive. If you ever make it to that scale, the last thing on your mind will be the price of an Oracle license.
Re:Article summary (Score:4, Interesting)
Timestamp equivalent:
* Eventually, MS will convert the current timestamp, which is a unique row number, to an actual date and time.
* Use ROWVERSION instead of timestamp. ROWVERSION provides the same functionality and the same value as the current timestamp.
MSSQL 2008 and above is fine, and we use timestamps to almost atomic precision in medical imaging... the "eventually" came right after that post.
my 2cents.
--chitlenz
Re:Article summary (Score:3, Interesting)
... were it not for the fact that SQLite is at least two orders of magnitude slower than any other database, including ones written by first year comp sci students.
One of the following two things is missing in your post:
1) A reference to back such a bold claim.
2) A qualifier along the lines of "... with many concurrent writers".
Re:Article summary (Score:1, Interesting)
In general, asking programmers to get things right in a language where "true = !false" is not a true statement is gonna be a difficult proposition. Hence the avoidance of prolog.
Re:Article summary (Score:3, Interesting)
What I find interesting is that one of the biggest users of a BASE[0] non-relational database (a NoSQL database), namely Facebook, who uses Cassandra [1], has created an SQL style query interface named FBQL. The interface includes some rather advanced SQL features like embedded sub-queries in addition to the traditional selecting on joined tables.
Then again, that may be due in large part to the fact that they are using a database schema that is all but identical to a normalized schema used in relational ACID databases, and simply code with the expectation that the database may be inconsistent, so always expect broken references. That is not really the optimal way to use a non-relational database, but it works.
[0] Basically Available, Soft state, Eventually consistent. Somewhat the opposite of ACID.
[1] This can be pretty noticeable at peak hours, when you end up seeing an inconsistent database, one in which you are friends and are not friends with another user at the same time.
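Coding "with the expectation of broken references" can be as simple as the sketch below. Dicts stand in for the non-relational store and the data layout is invented for illustration; this is not Facebook's schema or code.

```python
# Dicts stand in for a non-relational store (assumption for illustration).
users = {
    1: {"name": "alice", "friend_ids": [2, 3]},
    2: {"name": "bob", "friend_ids": [1]},
    # user 3 is missing: the reference from alice is dangling
}


def friend_names(user_id):
    """Resolve friend references, skipping any that do not exist (yet)."""
    names = []
    for fid in users.get(user_id, {}).get("friend_ids", []):
        friend = users.get(fid)
        if friend is None:      # broken reference: tolerate it, don't crash
            continue
        names.append(friend["name"])
    return names
```

In a relational schema a foreign key would have rejected the dangling reference at write time; here every reader has to cope with it instead.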
Re:Article summary (Score:3, Interesting)
In the immortal words of Joe Celko, responding to a question similar to the one you raise, and one of the truest statements ever written:
My SQL program is trying to compete with a flat file system.
If you want to get data to a single user, in a fixed format, you will
lose. The reason we have databases is not speed. Databases are for sharing
data (concurrency control and all that jazz), and keeping data integrity
(normal forms, constraints and all that jazz).
You can get to the ground floor a lot faster by jumping down an empty ...
elevator shaft instead of waiting for the car to arrive. However, there
are trade-offs
--CELKO--
If data has little to no value for you, then you do not need a relational database. However, if data is of any importance to you, then you have to think beyond a flat file. Flat files and hierarchical databases have been around since the dawn of computing. Relational databases were brought about to solve the concurrency and integrity problems inherent in these models, not to make your application faster. As the quote implies, jumping down the elevator shaft is faster than taking the car, but there are trade-offs. I think the better question is: why do your database design or queries take so much time that flat files are faster when there are just a few users of the system?
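A small sketch of the "constraints and all that jazz" part, using Python's built-in `sqlite3` module. The table and the amounts are invented for illustration. The point is that the database itself refuses to store a bad state and undoes the half-finished work, which a flat file will not do for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'"
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'"
        )
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the overdraft

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, both balances are back where they started: the credit to bob was rolled back along with the rejected debit from alice.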