PostgreSQL Outperforms MongoDB In New Round of Tests
New submitter RaDag writes: PostgreSQL outperformed MongoDB, the leading document database and NoSQL-only solution provider, on larger workloads than in earlier benchmarks. Performance tests conducted by EnterpriseDB, which released the framework for public scrutiny on GitHub, showed PostgreSQL outperforming MongoDB at selecting, loading and inserting complex document data in key workloads involving 50 million records. This gives developers the freedom to combine structured and unstructured data in a single database with ACID compliance and relational capabilities.
It doesn't matter (Score:5, Funny)
Because Postgres isn't web-scale. I want web-scale.
Re:It doesn't matter (Score:4, Funny)
"Web-scale" is "big enough to hold a Wordpress database"?
Re:It doesn't matter (Score:5, Informative)
the linux kernel doesn't even have to load it into RAM, it goes from disk to network directly.
Oh really? So your network card is a bus master and can initiate transfers from other peripherals without the data ever going through RAM?
I assure you, the Linux kernel still loads the file into RAM. RAM is fast compared to SATA or Ethernet, which makes it an excellent staging ground for such transfers. But you obviously don't need to load the entire file before you start sending it out; there are tricks that let the kernel handle it by tracking the DMA status of the Ethernet card and using memory-mapped files.
Re: (Score:2)
Probably Windows uses a Windows equivalent of sendfile() to send the file over SMB.
Without sendfile() you'll be context-switching between kernel and userspace, and probably copying data between them as well.
With sendfile() you have an open socket and you tell the kernel to send a file over that socket. No more copying of data and no context switches.
That is probably why it uses very little CPU.
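To make that concrete, here's a minimal sketch of the zero-copy path on Linux; the descriptor names are hypothetical and error handling is trimmed:

/* Serve a file over an already-connected socket with one syscall. */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t serve_file(int client_sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    fstat(fd, &st);

    /* The kernel feeds pages straight from the page cache to the
       socket buffer; userspace never copies the bytes itself. */
    off_t off = 0;
    ssize_t sent = sendfile(client_sock, fd, &off, st.st_size);

    close(fd);
    return sent;
}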
Re: (Score:2)
Sendfile() (and mmap()) are, indeed, the proper interfaces for sending an entire file out. Unfortunately, they aren't particularly well used — the speed of RAM on modern computers usually makes any wins from funky APIs not worth the effort. It does not help that various OSes each have a completely different call named "sendfile"...
Even Varnish — which mos
Re: (Score:1)
PostgreSQL does not lock unless you are about to write data. It uses snapshots, which is just as fast as reading a table without locks.
Still goes into RAM (Score:3)
Even when using sendfile() in linux, the disk does a DMA transfer into RAM, and the NIC does a DMA transfer out of RAM.
While the CPU is not involved in copying the data, it still goes into RAM.
Re:It doesn't matter (Score:5, Informative)
Just adding to what the others have stated: RAM access is several orders of magnitude faster than a hard-disk seek, roughly a hundred thousand times by latency. You won't notice a file going through RAM before being sent to the network interface. Doing all of the kludgework needed to bypass RAM (if that's even possible!) would yield a negligible gain.
Re: (Score:1)
I hope you don't do this stuff for a living.
Re:It doesn't matter (Score:4, Informative)
I was about to get wooshed, and post something about how "web-scale" isn't any kind of meaningful standard, but then I thought better of it, looked it up, and found it's MongoDB's tagline.
I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless.
Re:It doesn't matter (Score:5, Funny)
I'm afraid you were still semi-wooshed. I was actually making a reference to this. [youtube.com]
Re: (Score:2)
This is my favourite bit:
Re:It doesn't matter (Score:5, Funny)
I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless.
I hadn't heard of TinySQL, so I just Googled it. From http://sourceforge.net/project... [sourceforge.net]
> tinySQL is a SQL engine written in Java.
Is the name meant to be ironic or something?
Re: It doesn't matter (Score:2)
I did pretty much the same. Looked it up and suddenly the word "oxymoron" came to mind.
Re:It doesn't matter (Score:4, Funny)
It's a reference to how much data it can hold. The rest of the FS is taken up by the swap file.
Re:It doesn't matter (Score:5, Funny)
"I can only think of one database that isn't "webscale", and that's TinySQL, which I still use for personal web projects regardless."
You've just made all those MS Access developers cry.....
Re: (Score:3)
This comment just made me shudder with the painful memories of writing web applications in Cold Fusion that used ADO to talk to MS Access files as the DB backend in like 1998. Ugh.
Re: (Score:3)
This comment just made me shudder with the painful memories of writing web applications in Cold Fusion that used ADO to talk to MS Access files as the DB backend in like 1998. Ugh.
You think that's bad? I was doing this in 2008.
(First task was to get them the fuck out of access and onto an actual SQL server.)
Re: (Score:1)
Statement no one is going to argue with to justify myself:
Access isn't a database, and access users aren't developers.
Re: (Score:2)
I'm an Excel user, you insensitive clod!
You forgot SQLite (Score:2)
I really, really hope nobody is using SQLite for a production web database, but sadly I know somebody probably is.
Re: (Score:2)
I was considering it, actually — for when the web-server and the DB must be on the same host for budgetary reasons.
What's wrong with it?
Re: (Score:2)
But let's not change the subject — nor answer a question with a question...
Re: (Score:3)
The performance degrades (or at least doesn't scale well) once you have multiple processes accessing the same database, as you would on a web server; SQLite takes a whole-database lock for writes. It's a great tool, don't get me wrong, and I can definitely see the use case for a test environment. But even on a single-server system, you're better off with an actual database process.
Re: (Score:2)
It reminds me of mSQL (miniSQL), which was single-threaded, so one long request would make all the others wait. On a related note, MS SQL Server's login handling was single-threaded back in 2000, although the server could handle parallel requests, I think. I assume it has been fixed by now. You could deny all other users the ability to log in just by telnetting to the MS SQL port and sitting there; put that in a loop that reconnects when the connection times out, and almost nobody could log in to the server. ;-)
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
It seems the real use for SQLite (besides teaching) is for cases where you simply don't have multiple processes accessing the same database, or if they do occasionally, performance isn't a big concern. One big example I can think of is storing configuration variables for applications. That's not something you want to have a full-blown database like PostgreSQL running for, but it can be handy to use SQLite so you have more power than you'd get with flat files.
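As a minimal sketch of that config-store idea in SQLite's SQL dialect (table and key names are made up):

CREATE TABLE IF NOT EXISTS app_config (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
);
-- Set or overwrite a setting:
INSERT OR REPLACE INTO app_config (key, value) VALUES ('log_level', 'debug');
-- Read it back:
SELECT value FROM app_config WHERE key = 'log_level';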
Re: (Score:2)
Oh, exactly. I used SQLite for a game database - RPGs have a lot of stats and such, and SQLite was a million times faster than the hand-rolled CSV parser I was using. And I love how focused it is on reliability and correctness and standards compliance. It's just not built for certain things - it will work as a web database, and one of the frameworks I use even ships with it as a testing option, but it's not a good pick for production use.
Re:It doesn't matter (Score:5, Informative)
Ah, memories. That had us rolling on the floor at my office at the time.
For those who missed it, or want to relive it: http://www.youtube.com/watch?v... [youtube.com]
Re: (Score:2)
Yea, that was great, but Redis did actually turn into a useful tool compared to Memcached.
Re: (Score:2)
Obligatory reference [youtube.com]. Makes me laugh every time.
Re: (Score:2)
/Oblg. Mongo DB Is Web Scale [youtube.com] :-)
Re: (Score:2)
Shard your data between servers and have your middle layer connect to the one that holds the user's data.
Which becomes difficult when a user in one shard and a user in another shard want to perform a transaction together.
Re: (Score:1)
NoSQL does not allow relationships between records anyway; the JOIN goes out the window, and you have to build that relationship yourself in code. With sharding, you have more options, but the underlying problem is the same. You can still JOIN within the same shard, which, for most schemas, is just fine. For example, a company account will have all of its employees within the same shard, so employee JOINs work. Client accounts do not relate to one another, so there is no need to JOIN between them, unless
Re: (Score:2)
Which becomes difficult when a user in one shard and a user in another shard want to perform a transaction together.
Sounds like miscegenation to me.
Re: (Score:2)
I'd say just to be safe, you should be using 3- or 4-phase commit.
Re:It doesn't matter (Score:4, Funny)
I'd say just to be safe, you should be using 3- or 4-phase commit.
Oh yeah? Well my database goes up to 11-phase commit.
Re: (Score:2)
How do you make that atomic?
Who cares (Score:1)
Is it web-scale?
"Small" amount of data (Score:5, Interesting)
I am confused. If they are testing the performance of ACID and BASE database systems, why did they use a data load that easily fits on a single computer? The data size for both databases was under 150 GB, which can sit on a single hard drive, let alone a single server. Why would a BASE database have any edge over an ACID one for a data set that does not require distribution across multiple servers?
It is still interesting to see how much faster a more established DBMS is than a relative newcomer at smaller loads, but I feel this comparison is a bit lacking.
Re: (Score:2, Interesting)
Also, the point of scaling databases horizontally isn't just being able to distribute large amounts of data, but to distribute large amounts of queries.
I think a lot of people here like to point and laugh at NoSQL and particularly Mongo because they fail to consider the aspect of distributing QUERIES, which is where NoSQL shines compared to *SQL.
Yes, you can scale SQL horizontally, but anyone who doesn't have a deep understanding of the internals of their *SQL engine of choice and has tried to scale them horizontally is in for a rough time.
Re: (Score:3, Interesting)
So, set up replication. Yeah, it's a little more work with Postgres than with Mongo, but it's not hard by any means nowadays, and you get more flexibility, more performance, and queries that can actually do things. Seems like a good deal to me.
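For anyone wondering what "set up replication" involved on 9.x-era Postgres, a rough sketch (hostnames and the replicator role are hypothetical; check the docs for your version):

# On the primary, in postgresql.conf:
wal_level = hot_standby
max_wal_senders = 3
# ...plus a replication entry in pg_hba.conf, e.g.:
#   host  replication  replicator  standby.example.com  md5

# On the standby, in recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'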
Re: (Score:3)
Actually, the queries in NoSQL document databases are frequently more useful. For example, the atomic FindOneAndModify() operation, which can match on any part of the document--including array values. You can have data like { PhoneNumber: [5559992332, 5551112234, 5552201212] } and run FindOneAndModify({PhoneNumber: 5551112234}, { $pull: { PhoneNumber: 5551112234 } }) to delete that specific element from the array.
Re: (Score:1)
And you can't do that in Postgres? It's basically just this, no (PhoneNumber being an array column)?
UPDATE tablename SET PhoneNumber = array_remove(PhoneNumber, 5551112234) WHERE 5551112234 = ANY(PhoneNumber);
Re: (Score:3)
Or even, with the intarray extension (plain integer arrays only):
UPDATE tablename SET PhoneNumber = PhoneNumber - 5551112234;
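For completeness, a self-contained sketch of the table both snippets assume (names hypothetical; bigint[] because ten-digit phone numbers overflow a plain integer, which also means the intarray shortcut above wouldn't apply to them):

CREATE TABLE contacts (
    id          serial PRIMARY KEY,
    PhoneNumber bigint[]
);

UPDATE contacts
SET    PhoneNumber = array_remove(PhoneNumber, 5551112234)
WHERE  5551112234 = ANY (PhoneNumber);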
Re:"Small" amount of data (Score:4, Insightful)
Re: (Score:2)
Please design a test (Score:2)
And then let it be reviewed.
Re:"Small" amount of data (Score:5, Informative)
Have you used a version of PostgreSQL that is not 10 years old? The vacuum process performs the necessary cleanup asynchronously from your transaction, so that you get higher concurrency and scalability. The modern autovacuum does not have locking problems.
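If a particular table still falls behind, you can tune autovacuum per table rather than locking anything yourself; a sketch with a hypothetical table name:

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor  = 0.05,  -- vacuum after ~5% dead rows
    autovacuum_analyze_scale_factor = 0.02
);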
Re:"Small" amount of data (Score:4, Informative)
Locking up tables for over 30 minutes when they haven't even been updated
There is no vacuuming on tables that have no updates.
Re: (Score:1)
Last time this was true was more than 10y ago...
Re:"Small" amount of data (Score:4, Informative)
Except MongoDB can't do any better!
"Important notes on compacting:
"
"This operation blocks all other database activity when running and should be
"used only when downtime for your database is acceptable. If you are running
"a replica set, you can perform compaction on secondaries in order to avoid
"blocking the primary and use failover to make the primary a secondary before
"compacting it.
Source: http://blog.mongolab.com/2014/01/managing-disk-space-in-mongodb/
The solution in both cases is to mirror your data. I don't know if this particular problem has been solved theoretically, but in any event neither supports a compacting vacuum without some write locking. And both support the same workaround.
NoSQL databases aren't magic. Magic would be asynchronous multi-master replication that maintains relational integrity, with no strings attached. No product can do this. Every product has a story, but they either tweak the problem or spin their solution. That's the state of the art at the moment; we just aren't there yet.
If you read the fine print, rather than worshipping with the cargo cultists, you'd know this. Like transactional memory (e.g. lockless data structures), the database field is just filled with misinformation and unrealistic expectations. Developers are clueless, mostly because no matter what they use, it's going to be fast enough for them, so they're never forced to face their assumptions.
Re: (Score:2)
Interestingly, you can get close to "asynchronous multi-master replication while maintaining relational integrity" using NetIQ eDirectory. I know a number of huge applications that use it this way. It's not truly ACID, but the multi-master replication is unmatched.
Re:The stress-testing wasn't needed (Score:5, Insightful)
I've worked extensively on both kinds of systems over the past decade. Under a particular workload that is exactly what an RDBMS is designed for, an RDBMS has the best performance? Wow, who would have bet on that one?
Then again, I've had workloads (my go-to example is writing several billion records in a matter of hours for statistical analysis, with live intermediate results) where a NoSQL solution had the best performance.
NoSQL isn't some rebellion against traditional databases. Engineering isn't a contest. Rather, NoSQL, column stores, distributed warehousing, or any other term you'd like to throw out all just point to additional options for how to manage your data. Pick the right choice for your project, and use it. Don't worry about "web-scale" or "ACID compliance" talking points unless your project needs them. For the past few decades, we've been forced into the assumption that data must be perfectly normalized, arranged in tables, and queried as relations. For some projects, massaging the data into that form will damage your performance far more than your database engine ever will, so a different engine makes a better choice.
Stop listening to hype, deserved or not, and use the right tool for the job.
Re:The stress-testing wasn't needed (Score:4, Funny)
"Engineering isn't a contest."
But...but...I have a hammer. I KNOW how to use a hammer. A hammer is the best! FUCK YOU SCREW! TAKE THAT SCREW! Job complete!
Re: (Score:3)
On Slashdot, it's all about people with deeply held irrational opinions who make unsupportable claims. Additionally, they express themselves in rants and slander, and rational discourse is considered a sign of weakness. If you want to find people who think, you're in the wrong place.
Re: (Score:2)
Then again, I've had workloads (my go-to example is writing several billion records in a matter of hours for statistical analysis, with live intermediate results) where a NoSQL solution had the best performance.
Was that even transactional? That almost sounds like a use case for Kx or similar stuff.
Re: (Score:2)
Was that even transactional?
Nope.
The live results were restricted to simple queries (mostly for internal reporting), but the workload to be performed ultimately was more straightforward map-reduce algorithms. Everything (live and final) was statistics, so missing a few records due to lacking ACID wasn't ever a real concern.
That almost sounds like a use case for Kx or similar stuff.
It's been a while, but I believe Kx was considered during the initial research phase. Ultimately Hadoop and HBase fit our needs better.
I dipped my toe in MongoDB (Score:5, Interesting)
But the real problem with MongoDB was that nearly everything, while appearing simple, required a Google search to figure out how to do it. The mark of a very well designed API is that you soon start guessing the commands, and your guesses are really close or right on. But with MongoDB I found that nothing really made sense; only after carefully crafted "debate team" arguments could any unusual aspect of MongoDB defend itself. Whereas redis is the opposite: it just works. Or take even simpler systems like Memcache, which couldn't be simpler; when you read the API for either of those, they just make sense. There is no layer upon layer upon layer of complexity. Data goes in, and data comes out.
In fact, redis is a good example of ease of use mixed with advanced capabilities. The basic commands are things like GET, APPEND and SAVE, while the more advanced ones are more esoteric, such as PEXPIREAT, which sets a key's expiry as a millisecond timestamp. So you can happily use redis like a simple-minded fool and it is wonderful, or you can dig in deeper and only mildly shake your head at some of the command names. But with MongoDB it is just a pain in the ass from the first moment you have even vaguely complicated data.
But back to PostgreSQL. The JSON-related features are mildly complex but appear to solve the most common problems. Also, using PostgreSQL dissolves the entire relational-vs-NoSQL debate: you can just do both without giving it a second thought. And I for one can certainly say that I have data that demands NoSQL and other data that demands relational, all in the same project. Oddly enough, though, the technique I actually use is MariaDB for the relational data and redis for everything else. This is ideal for me, as the relational data is very simple and won't need to scale much, whereas the redis stuff needs to run at rocket speed and will be the first to scale out to many machines.
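To make the "both in one database" point concrete, a minimal sketch assuming 9.4-era jsonb (table and field names are invented):

CREATE TABLE events (
    id      serial  PRIMARY KEY,
    user_id integer NOT NULL,  -- ordinary relational column
    payload jsonb   NOT NULL   -- free-form document
);

CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Containment query, answered via the GIN index:
SELECT id, payload->>'type'
FROM events
WHERE payload @> '{"type": "click"}';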
But as for MongoDB, it has been deleted from all machines, development and deployment, and will never be revisited, regardless of this week's propaganda.
what are the MEAN hipster coders going to do now? (Score:1)
MEAN (MongoDB, Express, Angular.js, Node.js)
Re: (Score:3, Insightful)
Anything but read a computer science textbook
Re: (Score:1)
Launch more instances?
The tipping point (Score:4, Informative)
As soon as you need more than one machine to host your database (which usually happens around 1,000 active users on your website at any given time, depending on your application), consider switching away from an SQL database.
The tipping point (Score:1)
The same applies for either disconnected operation (Couch) or multi-data-center deployment for HA.
Re: (Score:3)
Have you seen Postgres-XC? It's pretty much built for this purpose.
http://postgresxc.wikia.com/wi... [wikia.com]
Re: (Score:2)
No idea. Hadn't seen XL before.
I saw that EnterpriseDB offered to support XC though. http://www.enterprisedb.com/se... [enterprisedb.com]
Re: (Score:2)
Although, on further examination it looks like it is structured almost EXACTLY the same.
Re:The tipping point (Score:4, Informative)
Postgres-XL's main focus is OLAP workloads. Buzzword: Big Data.
They both borrow code from each other. At some time in the future it's entirely possible that the two will merge.
Re: (Score:3)
If you have a single machine, then Oracle is the best-performing database, followed by Postgres. When you need more than 4 dedicated servers hosting a database, then Mongo can handle about 180% of the volume that Oracle can, about 220% of the volume of Postgres, and about 110% of the volume of Cassandra.
This, this, a million times this. A recent employer needed to be able to sustain 250,000 inserts per second. Not 24/7, mind you, but at random prolonged intervals throughout the day. The "PostgreSQL is fast" chart shows it handling 10,600 bulk-load operations per second, or 1,700 individual inserts per second. That would be about 1/150th of the insert load we needed to handle.
I'm a huge fan of PostgreSQL - when it's appropriate. If you need strong relational and consistency guarantees, there's nothing I'd rather use.
Re: (Score:1)
The thing is, excluding minor variations in ACID implementations and guarantees, you are limited by your storage hardware regardless of what technology you use. Cassandra was able to outperform PostgreSQL because you compared many servers running Cassandra against one server running PostgreSQL. Cassandra makes data access across many servers easy (once you get used to its specialized API), but you could have done the same with multiple servers each running their own PostgreSQL instance by sharding your data among them.
Re: (Score:2)
you are limited by your storage hardware regardless of what technology you use.
Well, right, but I think we set our expectations too low in some cases. For example, the data item {"key": "foo", "value": "bar"} serializes to 30 bytes of JSON. With a few bytes for record separators, a hard drive with a 100 MB/s write speed should be capable of recording about 3,000,000 items per second (100 MB/s divided by ~33 bytes per record). There's a lot more overhead than that, of course! But in the document we're discussing, PostgreSQL was averaging about 1,700 inserts per second, which is roughly 1,800 times slower than that hypothetical maximum.
The best one. Ah. (Score:2, Interesting)
"MongoDB, the leading document database and NoSQL-only solution provider,"
According to who?
What happened to all the rest of them, like CouchBase or Riak?
I will admit bias, though. I like my db's eventually consistent.
No news! (Score:3)
Re: (Score:2)
Actually, they didn't. The benchmarking was done because the JSONB feature in Pg is brand new and they wanted to see how it stood up to the competition (being much, much slower would be a sign that something could be improved).
Being faster on a single node was a surprise.
...Only Pawn in game of life (Score:1)
Candygram for Mongo!
Not surprising... (Score:5, Interesting)
... because of the way MongoDB actually stores records and parses them. It is more or less a simple tree or linked list, and hence doing almost anything involves descending branches to the leaves. This is horrendously inefficient in many contexts, while still being perfectly lovely in others. Just doing a match, though, can involve a non-polynomial-time search. Maybe they've improved this since I was trying to use Mongo to drive modelling, but I doubt it, as that would have involved substantially changing the way the data is actually stored and dereferenced. I had to cheat substantially in order to get anything like decent performance, and any of the SQLs outperformed it handily.
Note well that it was strictly a scaling issue. For small trees and DBs, it probably works well enough. For large DBs with millions of records and substantial structure, it is like molasses. Only worse.
rgb
Postgres the best NoSQL DB (Score:2)
Of course there are
Re: (Score:1)
It is not FUD. They add a compatibility layer for a certain proprietary database vendor who charges an arm and a leg for functionality that PostgreSQL gives you for free.
Re: (Score:1)
Which proprietary database vendor would that be?
Re: (Score:1)
OK, then I am confused about why they call themselves "The Postgres Database Company". Do you mean a compatibility layer for Oracle? Oracle has their own NoSQL database, which is not Mongo.
Re:Replication anyone? (Score:5, Funny)
Can PostgreSQL do replication? Not really.
That's news to me. I guess the data on our read servers just magically appears, and, what's more, magically happens to be exactly the data we need there.
Re: (Score:2)
Any sufficiently advanced technology is indistinguishable from magic.
- Arthur C. Clarke.
I guess MongoDB's replication just isn't advanced enough.
Re: (Score:2)
I see what you did there...
Re: (Score:3)
Can MongoDB do master-master replication? Oh, it can't, and really only CouchDB does in the NoSQL space? Oh, that's too bad. Of course, most of us don't NEED M-M replication, as it introduces serious consistency issues (oh, I wrote the client record to server A and then queried server B on the next page load and it didn't exist yet -> Null Exception #AWESOME!) and is only useful for backup/reporting/import/export scenarios. The rest of us who actually want to GET WORK DONE will probably continue with what already works.