PostgreSQL vs. MySQL comparison

prostoalex writes "Ever find yourself wondering which open source database is the best tool for the job? Well, wonder no more, and let your tax dollars do the work: Fermi National Accelerator Laboratory has published this unbiased review of MySQL vs. PostgreSQL. After reading it, however, it seems that MySQL ranks the same or better on most counts." My poor sleepy eyes misread the date of posting here; caveat that this is more than 15 months old.
  • Unbiased? (Score:2, Interesting)

    by UncleH ( 8863 ) on Monday December 18, 2006 @09:34AM (#17284974)
    Just take a look at the description per item. I couldn't possibly call this unbiased in any way.
  • Re:Foreign Keys (Score:2, Interesting)

    by Tony Hoyle ( 11698 ) <tmh@nodomain.org> on Monday December 18, 2006 @09:59AM (#17285210) Homepage
    Foreign keys don't speed anything up; they just add an extra layer of checks on your database. Your app should be doing those checks itself anyway.

    It's the subselects that get me - without them you have to jump through a lot of hoops. The sentence quoted basically translates as 'as long as you are only storing your CD collection and not doing anything serious with a database, then use mysql'.
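
    For the record, here is the kind of subselect that early MySQL lacked, and the hoop-jumping it forces (table names are hypothetical):

        -- Find customers who have never placed an order:
        SELECT name
        FROM customers
        WHERE id NOT IN (SELECT customer_id FROM orders);

        -- Without subqueries you're stuck emulating this with an
        -- outer join and a NULL test:
        SELECT c.name
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        WHERE o.customer_id IS NULL;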

    TFA also fails to mention that mysql cannot be used in commercial development without paying $200 per client - which makes it more expensive than most other solutions (except maybe oracle, and even they have cheap licenses for some uses).
  • LAMP (Score:1, Interesting)

    by Anonymous Coward on Monday December 18, 2006 @10:13AM (#17285354)
    Out of LAMP, the only component that doesn't suck goats is Apache. Linux is a total mess (2.6 is unstable to the point of being useless), MySQL can barely be called a database (absolutely key functionality missing, error reporting extremely weak) and PHP is a security joke (unless you add in Suhosin).

    The low entry bar to PHP coding and the fact that most of the bargain-bin webhosts offer a poorly configured MySQL install as part of a $1/month plan only serves to perpetuate this horrid combination of software. What is worse is the way in which it has seeped into corporate applications. I hate you all.
  • by punker ( 320575 ) on Monday December 18, 2006 @10:17AM (#17285390)
    These are almost the same comparisons we've been hearing for years:
    1) PostgreSQL is more full-featured than MySQL.
    2) MySQL is faster in a read-mostly environment.
    That matches what the anecdotal arguments have said all along.

    In my job, we moved from mysql to postgres several years ago (around PG 7.0). At the time, we needed to make the move for performance reasons. We are in a read-write system, and MySQL's locking was killing us (this was before InnoDB was well established). The features are better too, as our developers were used to having data integrity features, server side programming, and all of the SQL92 constructs available. We also learned a bit about PG performance, which I'll share.

    1) Run EXPLAIN ANALYZE on everything. PostgreSQL is touchier about query performance than MySQL was, so this just needs to be a habit if you're using PG. (You really should do performance analysis no matter which DB you use; it's just good practice.) The biggest gain will come from making sure you're using index scans rather than sequential scans.
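
    A minimal sketch of that habit (the table and index names are hypothetical):

        -- Hypothetical table: orders(id, customer_id, created_at)
        EXPLAIN ANALYZE
        SELECT * FROM orders WHERE customer_id = 42;

        -- If the plan shows "Seq Scan on orders", an index usually fixes it:
        CREATE INDEX orders_customer_id_idx ON orders (customer_id);

        -- Re-run the EXPLAIN ANALYZE and look for
        -- "Index Scan using orders_customer_id_idx on orders".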

    2) Use persistent connections. Everyone likes to point out PG's process-per-connection model vs. MySQL's threading, and PG's connection handling is slow, there's no doubt about it. But there's an easy answer: limit how often you connect. If you can keep a connection pool and just reuse those connections, you'll avoid that big hit.

    3) Full vacuum and reindex regularly. We've found the docs to be a bit off on this: they indicate you should run these only occasionally. If you're in a read-write system, a regular full vacuum is very important, and it really doesn't take that long if you do it regularly. We've also had trouble with indexes becoming unbalanced (we see 50-90% tuple turnover daily). This has gotten better, but it doesn't hurt to let your maintenance scripts keep things ideal for you. So we run a full vacuum and reindex of our tables nightly through cron.
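
    A sketch of such a nightly job (database and table names are hypothetical; note that on current PostgreSQL, plain VACUUM plus autovacuum is usually preferred over routine VACUUM FULL):

        -- nightly_maintenance.sql, run from cron with something like:
        --   psql -d mydb -f nightly_maintenance.sql
        VACUUM FULL orders;     -- reclaim space held by dead tuples
        REINDEX TABLE orders;   -- rebuild bloated/unbalanced indexes
        ANALYZE orders;         -- refresh planner statistics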

    4) Get your shared memory right. PG's shared_buffers is probably the most important config parameter; it controls how much of your DB is memory resident vs. disk resident. Avoiding disk hits is a big deal for any DB, so get this right. If you can fit your whole DB in memory, do it. If not, make sure your primary tables will fit. The more you use the shared memory, and the less you have to page data in and out, the better your overall performance will be.
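
    For concreteness (the size shown is arbitrary; tune it to your RAM and working set):

        -- Check the current setting from psql:
        SHOW shared_buffers;

        -- The value itself is set in postgresql.conf, e.g.:
        --   shared_buffers = 2GB    (hypothetical; a restart is required)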

    Most DB systems seem to be read-mostly, so I can understand the performance comparisons focusing on that. In our read-write system though, the locking was the biggest issue and it tilted the performance comparison toward PG.
  • by darekana ( 205478 ) on Monday December 18, 2006 @10:42AM (#17285728) Homepage
    http://tweakers.net/reviews/657 [tweakers.net]

    They compare PostgreSQL 8.2 vs MySQL 4.1.20 and MySQL 5.1.20a.
  • Re:stability (Score:3, Interesting)

    by scribblej ( 195445 ) on Monday December 18, 2006 @11:16AM (#17286180)
    This is what blows my mind, too. I've been using postgresql 7.x (yeah, I'm a Debian user) for years and I have never, ever seen it crash, disconnect, dump core, or leak memory. Ever.

    I've never seen MySQL do those things, either, to be fair, but I don't use it as much, and I *did* have it destroy a bunch of my data once when a machine was rebooted without being properly shut down.

    The claim that Postgresql is unstable, even in the older versions, seems to indicate some kind of separation from reality. I didn't see anything in the article about their testing methods. Given some of the other complaints, I wonder if they tested them on *WINDOWS*, despite the lousy Windows support and the devs' outright recommendation not to try it.

    Seriously. Anyone else with me on that? Even outdated as this review is, it STILL doesn't make sense... unless you think maybe they did it on Windows. Then it makes more sense. Postgresql on Windows used to suck, and for good reason. It wasn't supported, either, and for good reason.

  • by Hornsby ( 63501 ) on Monday December 18, 2006 @11:25AM (#17286280) Homepage
    I had to make a decision recently between MySQL and PostgreSQL for a database composed of many tables with more than 50,000,000 rows each. While going through the decision-making process, I loaded a sample table with 50,000,000 rows to do some benchmarks. The first thing I had to do to run my tests was index the table. I started with MySQL using an InnoDB table type, with both database servers reasonably tuned to the hardware they were running on.

    I ran the create index with MySQL and detached my screen session. I came back several hours later to find MySQL doing something along the lines of INDEX via REPAIR SORT. After some reading, I learned that this takes an order of magnitude longer than building an index the "normal" way, and is caused by the index becoming corrupted during creation. So I restarted the process several times and hit the same problem every time. This is clean data, mind you, already exported from an existing SQL server. I duplicated my install on a second server and had the same problems.

    Very annoyed with MySQL, I gave Postgres a try. It worked the first time, in less than 25 minutes, without issue. Since then I've been using it on 250,000,000-row datasets without issue. It's always reliable, and as long as you remember to use CURSORS for huge SELECT statements, it's painless to work with.
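
    A minimal sketch of the cursor pattern mentioned above (table name hypothetical):

        -- Cursors must be used inside a transaction in PostgreSQL:
        BEGIN;
        DECLARE big_scan CURSOR FOR
            SELECT id, payload FROM events;  -- hypothetical huge table

        FETCH 10000 FROM big_scan;  -- process rows in batches instead of
        FETCH 10000 FROM big_scan;  -- materializing the whole result set

        CLOSE big_scan;
        COMMIT;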
  • Re:Foreign Keys (Score:3, Interesting)

    by Trifthen ( 40989 ) on Monday December 18, 2006 @11:27AM (#17286300) Homepage
    I think the concern here was that even 15 months ago, Postgres was at least in the 8.0 tree, a vast improvement over the 7.4 tree which itself offered not insignificant advances over 7.2 and 7.3. The linked table of pros/cons for each database is dated February 2005, and browsing the postgres archives informs me that 8.0.1 was released by that point. If they waited two months, 8.0.2 would have been available. As it stands, the 8.0 tree was deprecated back in November of 2005, and the future now sits squarely on 8.2 after 8.1 experienced a short life of roughly a year.

    The truth of the matter is that postgres is a rapidly moving target. I remember waiting five years for mysql 5.0 to finally see the light of day, and during that time postgres experienced a truly ridiculous flurry of revisions and tweaks, sometimes requiring alterations to its core storage format. That they do all this while providing full ACID compliance is a godsend.

    I don't begrudge MySQL their success, but the roots of that database were simplicity and speed, like where SQLite is now. It's also an evolving product, but I think it's lost that original focus in the mad rush of (friendly) competition all products eventually engage in.
  • Re:stability (Score:3, Interesting)

    by Blimey85 ( 609949 ) on Monday December 18, 2006 @11:38AM (#17286458)
    Just because you see that error doesn't mean that MySQL has "died". If the number of people hammering your site exceeds the number of connections you have allowed, that message will come up. Granted, by that point the server is probably crawling, but maybe not. You may be doing a dump from a particular table, causing it to remain locked while the dump is in progress. All queries to that table will stack up waiting for it to be unlocked, and that can max out your connections, which would in turn cause that particular error message. Not the only cause, of course, but the main one I've come across.
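
    For reference, a hedged sketch of checking and raising that limit from the MySQL client (the value is arbitrary; SET GLOBAL requires the SUPER privilege):

        SHOW VARIABLES LIKE 'max_connections';
        SET GLOBAL max_connections = 500;  -- applies to new connections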
  • Re:stability (Score:3, Interesting)

    by rycamor ( 194164 ) on Monday December 18, 2006 @12:06PM (#17286898)
    It flies in the face of my experience, too. Even with the older 7.x series, if I accidentally ran a stupid query (for example, a join on multiple tables with a forgotten WHERE clause, resulting in a ridiculous multiplication of returned rows), PostgreSQL would literally run that query for 24 hours without dying, until I finally killed the query or restarted the server.
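
    The kind of accident being described (hypothetical tables):

        -- Old-style join syntax with the WHERE clause forgotten:
        SELECT * FROM customers, orders, line_items;
        -- With 10,000 rows in each table, this asks for the full
        -- Cartesian product: 10,000 * 10,000 * 10,000
        -- = 1,000,000,000,000 result rows.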

    Come to think of it, I've never had PostgreSQL die on me, not once. Even when my server crashed and rebooted, PostgreSQL would just happily pick up where it left off and get back to work.

    Meanwhile, on more than one Linux server I had the pleasure to maintain, MySQL was notorious for randomly dying.

    Something tells me that the person who wrote this report simply read the MySQL mailing lists and compiled his report from the discussions, without any real testing, much less considering that a feature checklist is meaningless unless you also evaluate how WELL each DBMS implements said feature.
  • Re:Foreign Keys (Score:4, Interesting)

    by anothy ( 83176 ) on Monday December 18, 2006 @12:36PM (#17287434) Homepage
    good clarification. but still: does this really hold water? i mean, unless they're using a modified GPL or whatever (is the GPL open-source?), they don't really get to dictate the terms. "derivative work", for example, isn't something they get to define. if i'm shipping a closed-source app that simply requires MySQL to also be installed, and nothing is statically linked to their binaries, where's the "derivative work" come in?
  • Re:Foreign Keys (Score:2, Interesting)

    by thedave ( 79572 ) on Monday December 18, 2006 @01:44PM (#17288676)
    Don't get too excited about accurate calendaring. Our calendar has only been in use for a few hundred years.

    Very few databases (except special-purpose ones) handle ancient dates correctly.

    AD alone is heavily revised; BC is all over the map.

    This bug report for DEC VMS is amazing in its analysis:

    D I G I T A L

    SPR ANSWER FORM

    SPR NO. 11-60903

    SOFTWARE:
        SYSTEM VERSION:  VAX/VMS V3.2
        PRODUCT VERSION: VAX/VMS V3.2
        COMPONENT:       Run-Time Library

    PROBLEM:

    The LIB$DAY Run-Time Library service "incorrectly" assumes the year
    2000 is a leap year.

    RESPONSE:

    Thank you for your forward-looking SPR.

    Various system services, such as SYS$ASCTIM assume that the year 2000
    will be a leap year. Although one can never be sure of what will
    happen at some future time, there is strong historical precedent for
    presuming that the present Gregorian calendar will still be in effect
    by the year 2000. Since we also hope that VMS will still be around by
    then, we have chosen to adhere to these precedents.

    The purpose of a calendar is to reckon time in advance, to show how
    many days have to elapse until a certain event takes place in the
    future, such as the harvest or the release of VMS V4. The earliest
    calendars, naturally, were crude and tended to be based upon the
    seasons or the lunar cycle.

    The calendar of the Assyrians, for example, was based upon the phases
    of the moon. They knew that a lunation (the time from one full moon
    to the next) was 29 1/2 days long, so their lunar year had a duration
    of 354 days. This fell short of the solar year by about 11 days.
    (The exact time for the solar year is approximately 365 days, 5 hours,
    48 minutes, and 46 seconds.) After 3 years, such a lunar calendar
    would be off by a whole month, so the Assyrians added an extra month
    from time to time to keep their calendar in synchronization with the
    seasons.

    The best approximation that was possible in antiquity was a 19-year
    period, with 7 of these 19 years having 13 months (leap months). This
    scheme was adopted as the basis for the religious calendar used by the
    Jews. (The Arabs also used this calendar until Mohammed forbade
    shifting from 12 months to 13 months.)

    When Rome emerged as a world power, the difficulties of making a
    calendar were well known, but the Romans complicated their lives
    because of their superstition that even numbers were unlucky. Hence
    their months were 29 or 31 days long, with the exception of February,
    which had 28 days. Every second year, the Roman calendar included an
    extra month called Mercedonius of 22 or 23 days to keep up with the
    solar year.

    Even this algorithm was very poor, so that in 45 BC, Caesar, advised
    by the astronomer Sosigenes, ordered a sweeping reform. By imperial
    decree, one year was made 445 days long to bring the calendar back in
    step with the seasons. The new calendar, similar to the one we now
    use was called the Julian calendar (named after Julius Caesar). Its
    months were 30 or 31 days in length and every fourth year was made a
    leap year (having 366 days). Caesar also decreed that the year would
    start with the first of January, not the vernal equinox in late March.

    Caesar's year was 11 1/2 minutes short of the calculations recommended
    by Sosigenes and eventually the date of the vernal equinox began to
    drift. Roger Bacon became alarmed and sent a note to Pope Clement IV,
    who apparently was not impressed. Pope Sixtus IV later became
    convinced that another reform was needed ...

  • by Angvaw ( 992553 ) on Monday December 18, 2006 @02:21PM (#17289266)
    I've used Oracle for about 3 years and Postgres for about 1. Both are good databases. I may write what you're asking for eventually, but don't hold your breath. :) Here are a few thoughts off the top of my head that might help.

    In a way-oversimplified nutshell, Postgres seems sorta-kinda modelled after Oracle. But many times I look into a feature that I've used on Oracle and find it doesn't exist in Postgres or it's not up to my expectations. The Postgres feature might be a little half-assed. One example is the multi-version concurrency control. If you update a row in Oracle, the undo tablespace contains a snapshot of the old version. All other concurrent users are directed to the undo tablespace until the transaction completes. They get the old version of the uncommitted row from the undo tablespace. Postgres on the other hand makes a new row in the table for your update, and the old one is marked as a "dead tuple" whose data can be retrieved by other transactions.

    Might not seem like a major difference, but you'll have to "vacuum" your Postgres database periodically to clean up the dead tuples. Otherwise, table scans have to scan over all the dead tuples in addition to the "real" data. Some DBAs simply run a vacuum job every night, or use the auto-vacuum daemon. It's not always so simple, however. Imagine you have to update data every morning and get it done ASAP. Well, now that you've updated your 500,000 rows, you have to vacuum the table to keep queries running at a decent speed. When I was still a Postgres newbie, I would just vacuum whenever I remembered. One time I forgot about it for a few days; when I finally ran the vacuum, it took 17 hours to complete (granted, it was a one-disk server). It's just another thing to worry about.

    Once I looked for a replication solution for Postgres. Some were incomplete. Slony-I - the most popular Postgres replication solution - seemed like the way to go. But upon delving into it, I found it imposed limitations because it sort of "corrupts" the Postgres data dictionary. I believe this corruption broke triggers on all replicated tables (don't quote me on that). Also Slony-I does not support multi-master replication at all.

    Sometimes the Postgres feature is just as great. Rarely is it actually better, except in terms of user-friendliness: the psql command line tool blows sqlplus out of the water, IMO. (Wow, paginated results, tab completion!)

    Oracle has a wealth of OLAP/business intelligence features that are lacking in Postgres. There is a flavor of Postgres called Bizgres that is a bit better in this area, though still lacking in comparison. Bizgres is more cutting edge and doesn't seem to have nearly as strong a community as Postgres. Go to http://www.bizgres.com/bb/ [bizgres.com] and you'll see there are very few posts.

    Oracle has an expert named Tom - asktom.oracle.com - who is a genius and answers all my questions. Postgres has an expert, also named Tom, who is a genius and answers all my questions on the newsgroup.

    When would I use one over the other? Personally I would use Oracle if there were no budgetary constraints, but if Postgres definitely met all my requirements, I'd go with that. (Yes yes, I am Captain Obvious.) If you need Oracle's enterprise-level features then Postgres is out of the question. There is however EnterpriseDB, a commercial, Postgres-based enterprise-level database. Several months ago Sony "snubbed" Oracle and went with EnterpriseDB for its MMORPGs. I have not looked into EnterpriseDB myself so I can't comment further.

    Well I'm rambling and not getting work done. Hope that helps just a little bit.
  • by alienmole ( 15522 ) on Monday December 18, 2006 @02:29PM (#17289414)
    I suspect you're thinking of relatively simple cases where you have a single application working against a database. Which of course is the standard MySQL scenario.

    Where foreign keys and the other referential integrity features really shine is in true enterprise scenarios, when you may have hundreds or thousands of applications, written in multiple languages, working against the same shared database(s).

    In that scenario, the only viable way to duplicate the functionality of foreign keys at the application level is to have a middle layer which all other applications are required to go through. Realistically, that middle layer has to be implemented as a server, serving requests for object/record creation, update and delete over the network. Implementing it as a library to be linked into applications doesn't work well, because there are multiple applications accessing the database, and integrity enforcement needs to be centrally coordinated.

    Implementing a middleware data server for an application isn't all that difficult, but integrating it into applications can be. Most application development environments know how to talk to databases, but don't automatically know how to talk to your application-specific, language-independent, data server. So now you're writing a client library for each app dev platform used in the enterprise, and dealing with things like integrating your custom interface with data-bound controls in the user interface. BTW, this is where people start resorting to e.g. SOAP, and projects start going off the rails (no pun intended, Ruby fans).

    Luckily, as it turns out, there are already standardized, widely-available, well-supported systems that implement a centralized data serving service which enforces referential integrity. They're called databases. And foreign keys are an essential part of the service they provide.
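
    A minimal sketch of what that built-in service looks like in SQL (the tables are hypothetical):

        CREATE TABLE customers (
            id   integer PRIMARY KEY,
            name text NOT NULL
        );

        CREATE TABLE orders (
            id          integer PRIMARY KEY,
            customer_id integer NOT NULL REFERENCES customers (id)
        );

        -- Fails with a foreign-key violation no matter which application,
        -- written in whatever language, attempts it:
        INSERT INTO orders VALUES (1, 999);  -- no customer 999 exists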
  • Re:Foreign Keys (Score:2, Interesting)

    by Branko ( 806086 ) on Monday December 18, 2006 @03:07PM (#17289958)
    You are correct about everything except the most important part. I'm not sure you're wrong, but I question it: the claim that it's typical to have multiple applications attached to the same db.

    Good point. A lot of "small" Web applications are indeed "single application - single database". In addition, many are heavily read-oriented with very few writes, and they manipulate data that is not very critical. In an environment like this, "application level" data consistency can be a quick-and-dirty solution that actually works quite well in practice.

    However, I'm under the impression that many people learn these quick-and-dirty solutions first and never quite take the time to understand why the "slow and clean" solutions were invented in the first place.

    In addition, I would argue that most development effort goes into "large" Web applications (they are fewer, but require significantly more development). And the Web is by no means the only area where databases find application.

  • Re:Foreign Keys (Score:3, Interesting)

    by Haeleth ( 414428 ) on Monday December 18, 2006 @03:59PM (#17290708) Journal
    i mean, unless they're using a modified GPL or whatever (is the GPL open-source?), they don't really get to dictate the terms. "derivative work", for example, isn't something they get to define.
    MySQL AB's interpretation of the terms matches the FSF's interpretation, and has also been supported by courts in various parts of the world. It holds water. Nobody's found a loophole yet, and believe me, there are a hell of a lot of people who have been looking.

    if i'm shipping a closed-source app that simply requires MySQL be also installed, and not something that's statically linked to their binaries, where's the "derivative work" come in?
    The FSF's interpretation of their own license holds that both static and dynamic linking create a derivative-work relationship. I don't know what their legal argument is, but the advice they issue is that if a binary does not work unless a specific shared library is available, it uses that library and is therefore required to respect that library's license. Note that even a dynamically linked executable generally has specific references to parts of the shared library compiled into it, and on some platforms is even statically linked to an import library, so there is some logic in this position.

    Now, if what you had in mind was an application that wasn't linked to MySQL libraries at all, but interacted with MySQL by running the mysql program with specific command-line options and parsing the output, then you would be correct: that does not create a derivative-work relationship, and even RMS himself is of the opinion that a program that works that way would not infringe on the GPL and could be licensed however you please. Of course, there would probably be something of a performance hit.
  • Re:Foreign Keys (Score:3, Interesting)

    by cyberon22 ( 456844 ) on Monday December 18, 2006 @05:09PM (#17291776)
    This is a really interesting post. What are your thoughts on a program that could hot-swap between backend database systems, optionally choosing to interact with MySQL or Postgresql or SQLite or whatever backend storage system was required? If the choice of database backend were left to the user, would this still be considered a derivative work for sales purposes?

    Not looking for legal advice, just curious because you have some interesting thoughts on things.
