PostgreSQL vs. MySQL comparison
prostoalex writes "Ever find yourself wondering which open source database is the best tool for the job? Well, wonder no more, and let your tax dollars do the work in the form of Fermi National Accelerator Laboratory publishing this unbiased review of MySQL vs. PostgreSQL. After reading it, however, it seems that MySQL ranks the same or better on most counts." My poor sleepy eyes misread the date of posting on here; caveat that this is more than 15 months old.
Foreign Keys (Score:2, Insightful)
From TFA:
Foreign keys are nice, I have to say; I implement them in MySQL anyway, in spite of the fact that they're ignored for MyISAM tables.
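For anyone who hasn't been bitten by this yet, a minimal sketch (table names invented) of what "ignored" means: MyISAM parses the FOREIGN KEY clause and silently throws it away.

    -- MyISAM accepts this DDL without complaint...
    CREATE TABLE orders (
        id        INT NOT NULL PRIMARY KEY,
        client_id INT NOT NULL,
        FOREIGN KEY (client_id) REFERENCES clients (id)
    ) ENGINE=MyISAM;

    -- ...but SHOW CREATE TABLE reveals the constraint is gone, and this
    -- orphan row goes in without so much as a warning:
    INSERT INTO orders (id, client_id) VALUES (1, 99999);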
Re:Foreign Keys (Score:5, Insightful)
I want to see more databases - Firebird, Derby (Score:2, Insightful)
The most important factor to me in any comparison is the licensing agreement. I prefer a very open agreement; the MySQL license requires you to release the source code of your product in some cases, or else purchase a license from them.
Comment removed (Score:3, Insightful)
I'd rather (Score:3, Insightful)
Re:MySQL is ridiculously easy to configure (Score:5, Insightful)
Doesn't mean a thing and this is why ...... (Score:3, Insightful)
What about clustering? (Score:2, Insightful)
MySQL speed will really depend on the storage engine you use (MyISAM and InnoDB do not perform the same!). PostgreSQL performance is pretty much consistent across platforms.
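For the record, the engine is picked per table at CREATE time, so "MySQL performance" depends on which of these you actually benchmark (schema invented for illustration):

    -- Fast reads, table-level locks, no transactions:
    CREATE TABLE page_hits (
        url  VARCHAR(255) NOT NULL,
        hits INT NOT NULL DEFAULT 0
    ) ENGINE=MyISAM;

    -- Transactions, row-level locking, real foreign key enforcement:
    CREATE TABLE accounts (
        id      INT NOT NULL PRIMARY KEY,
        balance DECIMAL(10,2) NOT NULL
    ) ENGINE=InnoDB;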
On the HA side, PostgreSQL has maybe fewer options: Slony-I (http://gborg.postgresql.org/project/slony1/ [postgresql.org]) for master/slave, or Sequoia (http://sequoia.continuent.org/ [continuent.org]) for multi-master.
MySQL offers MySQL replication (http://dev.mysql.com/doc/refman/5.0/en/replication.html [mysql.com]) for master/slave, MySQL Cluster (http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster.html [mysql.com]) for those who want to switch to a new storage engine (NDB), or Sequoia (http://sequoia.continuent.org/) for multi-master with transparent failover.
Re:Foreign Keys (Score:3, Insightful)
It is the same thinking that probably made the retards at MySQL AB make a datatype that accepts 30th February as a date. (At least it did, a few years ago.) Why EVEN include a datetime datatype if it isn't capable of the SIMPLEST validations ever?
Yes, I'm fuming. Those MySQL retards have made a generation of programmers think they can do SQL when they manage to put crap into MySQL. Gahhh, I hope their puny webapps haunt them sometime.
(I was once searching for a simple web-based forum and tested phpnuke. It had the following gem to display the 5 most recent articles in the database:
1. "SELECT * FROM ARTICLES ORDER BY ID DESC"
2. Retrieve all articles from the database
3. Then a for loop printing out the 5 first entries.
They basically transferred all the data in the articles table every time, just to iterate over the first 5 rows. Gahhhhhh)
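For comparison, the sane version is a one-liner; LIMIT is supported by both MySQL and PostgreSQL:

    -- Fetch only the 5 newest articles instead of the whole table:
    SELECT * FROM articles ORDER BY id DESC LIMIT 5;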
Re:stability (Score:3, Insightful)
Re:Foreign Keys (Score:5, Insightful)
Bingo!
It never ceases to amaze me when the MySQL crowd argues that "you don't really need all that pesky integrity stuff, it just slows down the database."
Guess what, guys: you're dead wrong!
Any DBA worth his salary will enforce data integrity on the lowest possible level, which means constraints (however implemented) on the object level.
Sure, you can let your coders in Bengaluru ensure that the primary key is unique instead of just applying a unique index, and the same goes for referential constraints between tables. You can implement them in the application just fine, until somebody overlooks some minor detail in the code and you're royally fucked!
Again: foreign keys and triggers are not "niceties". They are essential in implementing an industrial-strength database, period!
Re:MySQL is ridiculously easy to configure (Score:3, Insightful)
Re:Foreign Keys (Score:2, Insightful)
Re:Foreign Keys (Score:3, Insightful)
Of course, MySQL is effectively two products... (Score:3, Insightful)
MySQL/MyISAM is the one with the massive legacy code base, the one that your open-source blogging software uses and probably the one that your web host supports. It beautifully hits the "sweet spot" for data-driven web sites with infrequent and simple updates, where trading integrity for "read only" performance is sensible. It does not even purport to compete with PostgreSQL on features - but it does offer fulltext searches.
MySQL/InnoDB is the one that offers transactions, foreign keys etc. (ISTR it doesn't do fulltext indexes, though) - this is the "version" that bears comparison with PostgreSQL. I wonder how its user base compares?
(OK - you can mix InnoDB and MyISAM tables in a single database, but you can't use InnoDB if your web host hasn't installed it - heck, one provider I use is still on MySQL V3.23)
Flamewars have tended to pit PostgreSQL against a mythical database with the performance of MyISAM and the features of InnoDB...
As for the GUI software, the MySQL GUI Admin/query browser stuff is shinier than PgAdmin3 - but the MacOS version of the former is a complete crashfest! Neither of them steps up to the plate of providing a FOSS equivalent of (the good bits of) MS Access.
Re:MySQL is ridiculously easy to configure (Score:3, Insightful)
The problem is that you apparently missed his point entirely.
Re:Not similar to my experience (Score:3, Insightful)
It seems to me that if you step back from the details, there is a fundamental difference in style between the two systems that could be summarized thus:
Postgres: emphasizes completeness, correctness, and conformance.
MySQL: emphasizes immediate practicality.
One style is not intrinsically better than the other. Given time their results may begin to converge, which I think is starting to happen. However, I am not surprised that many people are starting to give Postgres a second look after having dismissed it several years ago. The Postgres strategy is a long-term one. Early adopters of Postgres were a minority with a particular interest in the relational model, for whom conformance was a relatively high priority. Pragmatists who wanted to cherry-pick a few of the model's most important advantages were drawn to MySQL.
Re:Foreign Keys (Score:3, Insightful)
Re:Foreign Keys (Score:5, Insightful)
Actually it shouldn't (in this context). Typically, one database will have several client applications attached to it. If data consistency is not checked at DB level, then every one of those applications has to re-implement the same checks, and a bug in any one of them corrupts the data for all of them.
Re:Foreign Keys (Score:4, Insightful)
Emphasis mine. In other words, you don't have to pay the $200 if your project is itself compliant with the GPL or a similar license scheme.
"Comply with the GPL or pay us $200 to legally use our code or libraries" is not the same as saying "You have to pay us $200 if you plan to sell software you made using our code or libraries."
=Smidge=
Re:Foreign Keys (Score:5, Insightful)
The database's function is to provide a RELIABLE storage for your data. Part of the whole reliability thing is making sure crap can't get in, because once it's there everything goes to heck.
For instance, let's take a shopping cart. Can an order be for a negative quantity? If your app doesn't work that way (it could, using a negative amount for returns, for example), and you still allow it in the DB, then all your reporting goes to heck, as SELECT SUM... now returns the wrong thing.
A proper database is set up in such a way that every piece of data in it makes sense. This means, for instance, not having things like orders hanging around in the void without being linked to some client. This is something easily ensured by foreign keys. Otherwise you have an utter mess - the total of the orders in the database doesn't match the sum of the orders of all clients!
If you put your checks in the database, you have a guarantee that when somebody else codes another frontend to it (say, you had a website and now are making a special version for PDAs), if the application does the wrong thing, the database simply won't let it happen. This may cost a bit of speed, but I assure you that peace of mind, your sanity and your ASS (if you have a boss and he's got any sense, he's not going to like it at ALL if it turns out that reports don't match reality, and that reality can't even be easily extracted) are far, far more valuable.
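To make that concrete, a minimal PostgreSQL-flavoured sketch (schema invented for illustration; on MySQL you'd need the table-level FOREIGN KEY form on InnoDB, and CHECK was parsed but ignored there at the time):

    CREATE TABLE clients (
        id   INT PRIMARY KEY,
        name TEXT NOT NULL
    );

    CREATE TABLE orders (
        id        INT PRIMARY KEY,
        client_id INT NOT NULL REFERENCES clients (id), -- no orders floating in the void
        quantity  INT NOT NULL CHECK (quantity > 0)     -- no negative quantities
    );

    INSERT INTO clients VALUES (1, 'ACME');

    -- Both of these now fail in the database, no matter which frontend sent them:
    INSERT INTO orders VALUES (1, 99999, 5);  -- rejected: unknown client
    INSERT INTO orders VALUES (2, 1, -3);     -- rejected: negative quantity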
Disappointing start (Score:3, Insightful)
There were a couple of comparisons a couple of years ago. It was my understanding that PostgreSQL did better with large data sets in a P vs. M match. In another test - MySQL, PostgreSQL, DB2, Oracle, and SQL Server getting hammered with connections - if I remember, Microsoft's offering started to crap out along a power curve at maybe just 200(?) hits, while the others degraded pretty equally along a straight line.
My client/server experience started with some Oracle classes and managing a department server. I must say I am _much_ more comfortable with PostgreSQL and find MySQL a little alien no matter how popular it is. Just my 2 cents.
Unbiased? (Score:3, Insightful)
From the comparison table:
These statements convey the same information, but the fact that the author has presented them in different lights suggests to me a premeditated bias in favour of MySQL.
Re:Foreign Keys (Score:4, Insightful)
Re:Foreign Keys (Score:3, Insightful)
Re:Foreign Keys (Score:3, Insightful)
I also happen to believe I am better qualified than anybody else to select which indexes I want in a particular database I'm designing, whose purpose, update frequency and join frequency none of you know. It's a good rule of thumb; it's a lousy requirement.
Oh, and I'm perfectly aware that MySQL can power cool stuff - I have used it a ton myself (mostly as an inherited database too expensive to replace). That doesn't mean that it doesn't suck compared to PostgreSQL (in my experience), and IMO it is popular mostly because of being insecure by default (thus easy to install), being incompatible with the rest in subtly icky ways ("embrace and extend"), and due to semi-falsified benchmarks a long time ago (MySQL AB published only the benchmarks where they were best, varying which benchmarks they displayed by which database they were comparing against, giving the impression that they were "as good or better" in all performance areas).
Eivind.
Re:more recent benchmarks (Score:4, Insightful)
As the article shows, every time they double the number of cores, Postgres gains 75% in performance - like any good application should. At 4 cores, it is already twice as fast as MySQL under reasonable concurrency; I'd like to see this test on an 8-core server - my guess is MySQL wouldn't be much faster than it is now and Postgres would perform at least 3 times better than MySQL.
Oh, and Postgres doesn't think 0000-00-00 is a valid date, which is nice too.
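You can check the date behaviour yourself in one line on each side (old MySQL without strict mode assumed; t and d are made-up names):

    -- PostgreSQL rejects the nonsense outright:
    SELECT DATE '0000-00-00';  -- ERROR: date/time field value out of range

    -- Old non-strict MySQL accepted an impossible date with at most a warning:
    -- INSERT INTO t (d) VALUES ('2005-02-30');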
Re:Foreign Keys (Score:3, Insightful)
The maintenance costs are usually lower, and the reason for that is *when* you catch the error. If the database enforces some simple constraints, you catch the error *before* it goes into your database, and you know exactly which application tried to insert bad data. It's the best kind of error report a programmer can see.
If you do everything at the application level, any kind of bug can result in bad data being inserted. When a later part of your app finds the bad data, you have no idea where it came from. Good luck tracking that down. The part that inserted the bad data might not have even been written by you, maybe it can't be replaced by you, and maybe you don't even have the source to it. Maybe it's due to a security flaw.
Not only that, but some constraints are nearly impossible to enforce from the application without going way overboard on locking. UNIQUE constraints are a good example.
The "duplication" argument is just not true. The constraints offer your application an API in a way: it shows the app developer the nature of the data they can get from the database, and the nature of the information you can add to your database. In an application, almost all non-static functions do some sanity checking on their inputs, and by your definition that would be "duplication". However, it might be almost impossible for the caller to know it would cause an error, and there are so many callers that, if you don't do sanity checking, it would be impossible to trace the error backwards. It's exactly the same with a database, unless your application is so simple that tracing backwards is possible (i.e. only a couple of points can modify data). How would you like it if API calls didn't return errors? After all, it's just duplicating code, since you should check the data first anyway. And comments are the ultimate in "duplication"; I'm sure you don't write any of those, right?
Errors from constraint violations are a part of normal operating procedure for databases; a database error does not necessarily indicate an application bug.
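The UNIQUE point deserves a concrete sketch (table and column names invented): the constraint is one line of DDL, while enforcing the same thing from the application needs a lock around every check-then-insert:

    CREATE TABLE users (
        id    INT PRIMARY KEY,
        email VARCHAR(255) NOT NULL UNIQUE  -- enforced atomically by the database
    );

    -- Two concurrent sessions can both run this; exactly one succeeds,
    -- and the other gets a unique-violation error it can handle cleanly:
    INSERT INTO users (id, email) VALUES (1, 'bob@example.com');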
Re:Foreign Keys (Score:4, Insightful)
Correct. That extra layer of checks will probably actually slow things down a bit.
But foreign keys aren't about performance. They're about data integrity, which I would hope every database administrator or developer is more concerned with anyway. It doesn't matter how many requests/second your DBMS can handle if the data is fuxxored.
Your app should be checking itself anyway.
Yes, it should be catching "foreign key constraint violation" exceptions thrown by the DB interface and handling them appropriately. I hope that's what you meant.
Re:stability (Score:3, Insightful)
I've got nothing against PostgreSQL -- just never used it. I'm sure it's a fine piece of software, but please don't spread falsehoods about MySQL just because people don't know how to configure it. That would be like me claiming PostgreSQL sucks because I couldn't get it working easily. It's all about knowing what you're doing in any case.
Specifically, most of the errors you're seeing are because they've got it configured to use more memory than their 32-bit architecture supports. It's fairly easy to misconfigure it so that in a high-traffic situation the MySQL process will use over 2GB and then the OS shuts it down. The options are to go 64-bit, to configure it to use less memory (trading away some performance), or to limit the connections -- just like with Apache's MaxClients option.
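Rough back-of-the-envelope with real my.cnf variable names (the numbers here are illustrative): the global buffers plus the per-connection buffers times max_connections have to fit inside what a 32-bit process can address.

    -- Approximate peak memory use:
    --   key_buffer_size + max_connections * (sort_buffer_size + read_buffer_size + ...)
    SHOW VARIABLES LIKE 'key_buffer_size';
    SHOW VARIABLES LIKE 'max_connections';

    -- Cap the connections at runtime, much like Apache's MaxClients:
    SET GLOBAL max_connections = 200;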
Cheers.
Re:Foreign Keys (Score:5, Insightful)
When are you non-database types going to stop saying "your app should be checking itself anyway"?
This is an insanely inefficient method of execution. It's also highly presumptive.
Inefficient: if you are going to insert a record, you first have to check that it's not already there. Then, if it is there, you have to change your INSERT to an UPDATE. This is dumb. Some databases do an INSERT OR UPDATE, but if they don't, it's faster to do an INSERT, handle the failure, then UPDATE. Alternatively: UPDATE, then INSERT on ZERO ROWS CHANGED. Either way you run fewer than 2 queries on average. The "your app should check" method guarantees two SQL statements are executed every single time.
Dumb: say you check whether a record exists. You get a "NO" answer. While you are preparing and executing your next INSERT, some other process or thread inserts that same record into the database. Now you have an error and you still don't know what to do. In short, you're in a pretty bad way.
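To put syntax to that INSERT OR UPDATE point, MySQL has had an atomic form since 4.1 that closes the race in a single statement (hypothetical hit-counter table; url needs a primary or unique key for it to kick in):

    CREATE TABLE hit_counter (
        url  VARCHAR(255) NOT NULL PRIMARY KEY,
        hits INT NOT NULL
    );

    -- One statement instead of check-then-insert: no race window.
    INSERT INTO hit_counter (url, hits)
    VALUES ('/index.html', 1)
    ON DUPLICATE KEY UPDATE hits = hits + 1;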
Presumptive: in all my years of living I've never seen any company happy with the application interface being the only way into the data. Especially with a database on the back end. The business types, Marketing in particular, love to screw with database information to try and identify trends, patterns, and correlations between customer behaviour, product representation, and sales metrics. It is presumptive to assume that the application can safely contain all of the business logic and that no one will ever come in the back end and change something -- thereby breaking all your business rules.
The other consideration is that business logic contained in the database is going to run a heck of a lot faster there than anything you can dream up in your application, unless the application is written in C. Databases are generally written in C/C++. Applications are generally written in Java, Perl, Python or Ruby. None of these can compete with C. Add to that the fact that databases have been designed for years to do only one thing -- manage data. Do you seriously think you can outperform a decade of database optimization in a Ruby script?
If you are going to base an application on data, it would be useful to know how to capitalize on the features of a database rather than trying to reimplement them. At the very least, you are less likely to introduce bugs.
Re:Foreign Keys (Score:5, Insightful)
Additionally, databases can generally do this faster than the application code. I can say this because databases are written in C and have been optimized and debugged for years. Applications are (relatively) rarely written in C and have not been debugged for years when released.
This is something that actually really pisses me off about Ruby, Rails, and ActiveRecord. ActiveRecord is an insane violation of everything that a database has been built to do. It breaks consistency, violates keys, ignores so many rules... And it beats the crap out of a database doing what the database was designed to do and can handle much faster.
This is regardless of the flame wars of Postgres vs MySQL.
Re:Foreign Keys (Score:3, Insightful)
Personally, I tend to steer towards procs that are complex internally with a simple external signature rather than using triggers. I find triggers are a real pain in the ass when you're trying to figure out how to optimize a slow running query.
When I develop, I usually put all my data access functionality into stored procedures, deny access to all tables and views, then selectively grant access to the stored procs.
Makes securing your data a lot easier, prevents most sql injection attacks, avoids the whole "magic quotes" mess, makes centrally managing your data a lot easier, and keeps the code-jockeys from screwing things up when they're in a rush.
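A minimal PostgreSQL-flavoured sketch of that lockdown (table, role and function names all invented):

    CREATE TABLE users (id INT PRIMARY KEY, name TEXT NOT NULL);
    CREATE ROLE webapp LOGIN;

    -- The application role can't touch the table directly...
    REVOKE ALL ON users FROM webapp;

    -- ...it can only call a vetted function that runs with its owner's rights:
    CREATE FUNCTION get_user(INT) RETURNS SETOF users AS $$
        SELECT * FROM users WHERE id = $1;
    $$ LANGUAGE sql SECURITY DEFINER;

    GRANT EXECUTE ON FUNCTION get_user(INT) TO webapp;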
It's also a big advantage when you're changing your schema. You don't even need to touch the codebase in a lot of cases.
On top of all this, it's more efficient. You send a lot less data back and forth across the wire, which most people don't think of until things start to bog down and it's time to move your db off the webserver and onto its own box on the network. And most dbs support some level or another of precompilation, which saves even more resources.
If you can save a trip across the wire to the db by doing data validation in code - checking that the email address has an @ symbol and all that jazz - well, that's good. But if you need to hit the db to do that validation, as you'd have to when you're enforcing integrity in the middle tier, you just wasted network resources. You shouldn't have bothered.
Even if you don't need all these sorts of benefits right now, there's still value in doing things the right way. Aside from building good working habits in yourself, you're building something that has a value external to the application.
A well designed database generally has value that goes beyond the application that prompted its initial design. In the absence of the middle and client tiers, it can still be utilized to generate projections and answer questions, and it's trivial to slap a new UI onto it. This is generally not true for dbs that are tightly bound to the web tier.
To throw my 2c into the Postgres vs MySQL debate, there is one thing that stands out between the two that has nothing to do with the technology.
MySQL's developers have demonstrated time and again through their history that they have no problem selling the ignorant a bunch of bullshit to spur adoption of their product. They do not concede that their product is unsuitable for some niches because of its limitations; instead, they knowingly advise new users to use poor development techniques even as they struggle to fix those limitations in their product.
I do not trust them not to lie to me, and I would not stake my reputation on their products for that reason. That's something that is most likely never going to change regardless of what any "feature set" charts say.
Re:Foreign Keys (Score:4, Insightful)
None of this applies when somebody logs in with psql/enterprise manager/whatever and updates something in the database by hand. You can have all the OO and libraries you want, but it doesn't help if the new application doesn't use it. Yesterday we had code in VB6, today we have it in C#. Application is completely different. Guaranteeing that all the VB code will be exactly translated to C# is very, very hard.
On the other hand, the database remains the same, and the constraints it carries don't care which language, methodology or whatever is being used. VB, C#, Perl and PHP are all automatically held to the same constraints.
And what's the problem with that? Use stored procedures and triggers then. Seriously, in a database of any size, forget about any attempts at compatibility with other databases. It only works on very, very trivial applications.
Just take postgres and mysql. PostgreSQL loves big transactions. The overhead for a transaction is high, but it's perfectly happy with large, long running transactions, and the more the better. PostgreSQL will be slow if you have a transaction per statement.
Databases like MySQL, on the other hand, want tiny transactions, because the locks are really problematic. Leave a transaction uncommitted, and things will quickly grind to a halt. On PostgreSQL, the worst problem will be the lack of vacuum, which will gradually slow things down but doesn't cause immediate problems.
If you design it for MySQL, without a redesign it'll suck on Postgres, and vice versa. If you try to make it for both, it'll be suboptimal on both.
Re:Foreign Keys (Score:3, Insightful)
No it shouldn't -- the purpose of the database is to enforce integrity whether the client apps care about it or not. Now, a good client will gracefully handle the errors that the database throws back at it, but the database is supposed to take care of the checks in the first place.
TFA reminds me of the anti-transaction FUD in old MySQL docs, which suddenly disappeared as soon as MySQL got a transactional backend. But hey, its system tables are still MyISAM, so you'd best be careful with those admin apps.
Re:Foreign Keys (Score:3, Insightful)
So you do it in multiple transactions then. Transaction 1 inserts the order data, transaction 2 processes the payment, transaction 3 updates statistics, with each saving a note somewhere of how far it got.
This way you have both things: consistency, and the possibility of a partially complete (but cut off at a well-determined point) operation.
That's easy: in my application I just log all of the request headers and POST data. That's step 1, and it will always succeed unless the database is down, as it's nice and simple.
Step 2 is processing it, in one transaction. If it fails, I can retry the operation.
Why? In a well designed database, things don't just vanish. What happens is that the database returns an error to the user, who then knows the data wasn't saved at all, instead of it being left in some half-saved state the user may not be able to recover from. The user can then retry, knowing that unless the database says it's good, nothing gets written, and so there won't be 20 half-written records in the database from the previous attempts.
Re:Foreign Keys (Score:2, Insightful)