
First "Real" Benchmark for PostgreSQL 275

Posted by ScuttleMonkey
from the waiting-for-the-opportune-moment dept.
anticlimate writes "A new benchmark published on SPEC shows PostgreSQL's performance approaching Oracle's and on par with or surpassing MySQL's (though the test hardware for the other DB systems differs somewhat). The test was put together by PostgreSQL core developers working at Sun. They are certainly not unbiased, but according to Josh Berkus's blog this is the first 'real' benchmark for PostgreSQL. The main difference from earlier benchmarks (and anecdotes) seems to be the tuning of PostgreSQL."
  • by Anonymous Coward on Monday July 09, 2007 @03:26PM (#19805367)
    and real DBAs use Informix...

    like BSD... no, it's not dead yet.
  • Bad firehose! (Score:5, Informative)

    by greg1104 (461138) <gsmith@gregsmith.com> on Monday July 09, 2007 @03:27PM (#19805377) Homepage
    Why this emaciated post made it while mine didn't I'll never know...here's how I submitted this story:
     
    The current version of PostgreSQL now has its first real benchmark [ittoolbox.com], a SPECjAppServer2004 [spec.org] submission [spec.org] from Sun Microsystems. The results required substantial tuning [sun.com] of many performance-related PostgreSQL parameters, some of which are set to extremely low values in the default configuration — a known issue that contributes to why many untuned PostgreSQL installations appear sluggish compared to its rivals. The speed result is close but slightly faster than an earlier Sun submission using MySQL 5 [spec.org] (with enough hardware differences to make a direct comparison of those results unfair), and comes close to keeping up with Oracle on similarly priced hardware — but with a large software savings. Having a published result on the level playing field of an industry-standard benchmark like SPECjAppServer2004, with documentation on all the tuning required to reach that performance level, should make PostgreSQL an easier sell to corporate customers who are wary of adopting open-source applications for their critical databases.
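    The "extremely low default values" point above is about settings like the buffer cache and checkpoint spacing. A hypothetical illustration of the kind of postgresql.conf changes involved for an 8.2-era server with plenty of RAM (these are not the actual values from the SPEC submission; see Sun's tuning notes for those):

```ini
# Illustrative postgresql.conf tuning for PostgreSQL 8.2 (example values only)
shared_buffers = 1GB            # default is 32MB -- far too small for a big server
work_mem = 32MB                 # per-sort/hash memory; default is 1MB
effective_cache_size = 12GB     # tells the planner how much OS cache to assume
checkpoint_segments = 64        # default of 3 forces very frequent checkpoints
wal_buffers = 1MB               # default is 64kB
```

    The pattern is the same in each case: the shipped defaults are sized so the server starts on tiny machines, and a production box needs every one of them raised.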
  • by Kalriath (849904) on Monday July 09, 2007 @03:32PM (#19805427)
    DB2? It's only usable on large mainframes (big iron, so to speak) from what I understand. Generally speaking, only the REALLY large shops would use it, so I wouldn't be surprised that you'd never seen it - neither have I. We're a pretty big organisation where I work, and we have a mix of Oracle, MSSQL, and Sybase servers.
  • by KillerCow (213458) on Monday July 09, 2007 @03:34PM (#19805467)
    I think that somebody sent the wrong link and (surprise!) the editors didn't even follow it to check.

    Here's a more useful one: All SPEC jAppServer2004 Results Published by SPEC [spec.org]

    The benchmarks aren't standardized enough for any useful comparison. The hardware and configurations vary in almost every one.
  • by morgan_greywolf (835522) on Monday July 09, 2007 @03:43PM (#19805571) Homepage Journal

    DB2? It's only useable on large mainframes (big iron, so to speak) from what I understand.


    Um, no. DB2 [ibm.com] these days runs on most major UNIX variants (HP-UX, Solaris, AIX, IRIX, etc.), Linux and Windows. It's used quite often, in fact. Most Enovia/VPM installations use DB2 backends, for instance. Modern versions use XML along with regular relational database stores and are very, very up-to-date technology-wise. Very scalable.

  • by Pap22 (1054324) on Monday July 09, 2007 @03:44PM (#19805585)

    This publication shows that a properly tuned PostgreSQL is not only as fast or faster than MySQL, but almost as fast as Oracle (since the hardware platforms are different, it's hard to compare directly). This is something we've been saying for the last 2 years, and now we can prove it.
    Postgresql 8.2 on UltraSPARC T1 [spec.org]
    MySQL 5 on AMD Opteron 285 [spec.org]

    The UltraSPARC has 8 cores on 1 chip and 16GB of memory.
    The Opteron has 4 cores and 8GB of memory.

    The UltraSPARC should smoke it every time.

  • by Anonymous Coward on Monday July 09, 2007 @03:45PM (#19805587)
    Intelligent people use SQLite (or Berkeley DB) for apps that don't require a heavyweight DBMS.

    Databases are even being used in many situations that would be better addressed using flat files.
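    The lightweight-DB point can be sketched with Python's bundled sqlite3 module: an embedded database with no server process at all (table and data are made up for illustration):

```python
import sqlite3

# An embedded database: no server process, just a file (or ":memory:").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dvds (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO dvds VALUES (?, ?)",
                 [("Brazil", 1985), ("Alien", 1979)])
rows = conn.execute("SELECT title FROM dvds ORDER BY year").fetchall()
print(rows)  # [('Alien',), ('Brazil',)]
```

    For a DVD catalog or a single-user app, this is the whole deployment: one import, one file.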
  • by jhines (82154) <john@jhines.org> on Monday July 09, 2007 @03:45PM (#19805593) Homepage
    If you read the details, while being Sun machines, they are Opteron based, so yeah they compare.
  • by pavera (320634) on Monday July 09, 2007 @04:25PM (#19806065) Homepage Journal
    I've been using both MySQL and PostgreSQL for 7 years now... I've never felt a need for "developer support". Or if that phrase means "documentation + message boards" I've never felt a lack from postgresql. I also really don't get the PostgreSQL is hard to use argument. In every linux distro I've used in the last 7 years (redhat, centos, suse, fedora core, ubuntu) they are exactly the same... i.e., apt-get, yum, whatever: install postgresql-server/mysql-server. Then $startupscriptdir/postgresql start....

    Do people really get that hung up on using postgres as the initial user instead of root?!? That is the ONLY difference that I can see.

    Now I can agree that maybe postgresql doesn't really "target" an audience, but that is also because they are a true open source project. They don't have a commercial version. I really don't think Linus sits around saying "Ok, we need to add this feature so that more fortune 500's will adopt linux". Or "we need to add feature xyz so this will appeal to small businesses". MySQL has a "target" audience because they are selling something. If you aren't selling anything, by definition you don't have a target.
  • by Anonymous Coward on Monday July 09, 2007 @04:25PM (#19806081)
    The JEE side used Opterons, the DB side used UltraSPARC: Sun Fire T2000 Server, 1 chip 8 cores 1.2GHz UltraSPARC T1.
  • by Anonymous Coward on Monday July 09, 2007 @05:12PM (#19806599)
    Berkeley is extremely fast for simple key/value pairs. It's often embedded and has bindings to every major scripting language.

    SQLite is a small and fast database that works from local files. Again it's often embedded and there are bindings for all major scripting hosts. You could run into concurrency issues using it under heavy load.

    Flat files are simple and fast, you have the concurrency issue and need to lock the file for writes.

    XML is typically flat files but sleepycat and others have XML databases. XML is mainly useful for interchange where several apps need access to the same data. Leaner formats like YAML and JSON are gaining ground for serialized data.

    Then you get the full blown RDBMS which are first choice for large scale apps with multiple clients. Multi user ERP, accounts packages, busy web forums - it's the right choice for these apps. Using MySQL or Postgres to catalog your personal DVD collection or for a personal blog/forum with only a handful of users is worse than overkill - it's ridiculous.
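    The Berkeley-DB-style key/value pattern described above can be sketched with Python's stdlib dbm module, which exposes the same get/put flavor of interface (path and keys here are made up for illustration):

```python
import dbm
import os
import tempfile

# Simple persistent key/value store, Berkeley-DB style: no SQL, no server.
path = os.path.join(tempfile.mkdtemp(), "kvdemo")
with dbm.open(path, "c") as db:       # "c" = create the file if missing
    db[b"user:42"] = b"alice"         # keys and values are raw bytes
    db[b"user:43"] = b"bob"
    value = db[b"user:42"]
print(value)  # b'alice'
```

    This is the niche the comment describes: when all you need is fast lookups by key, the whole RDBMS layer (parser, planner, transactions across tables) is overhead you can skip.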
  • by jedidiah (1196) on Monday July 09, 2007 @05:19PM (#19806683) Homepage
    If you do that to even a "fast and robust" RDBMS server you are bound to be bludgeoned by the DBA.
  • by Per Wigren (5315) on Monday July 09, 2007 @05:33PM (#19806843) Homepage

    What am I missing?
    You are missing that recent versions don't need that anymore as it will auto-vacuum for you incrementally in the background. You should still run a "vacuum full analyze" after schema changes though but you usually don't change the schema live on heavily loaded databases anyway...
  • by ctr2sprt (574731) on Monday July 09, 2007 @06:40PM (#19807435)

    You can do way better than that with PostgreSQL, at least, and I suspect with MySQL as well. I wrote a benchmark similar to yours, but a good bit more complex. I had two tables, one of which was seeded and another which was populated by the benchmark. The benchmark table had six columns (int, timestamp, 4x bigint), a primary key (int + timestamp), four check constraints (on the bigints), a foreign key constraint (int, to the seeded table), and two indexes (one int, one timestamp). I would do a commit every 75k rows, with 24 such commits per iteration and 30 passes per benchmark run, so 54 million rows total. I also used a thread pool, and there are two reasons for that. First, some amount of parallelism improves DB performance. Second, it more accurately simulated our predicted usage patterns of the database. We ran my benchmark against PSQL and IBM DB2.

    The results were interesting (at least, I thought they were). First, PSQL can only handle about 10 threads doing work at once. Past about 10 threads, the DB completely falls apart. DB2, however, could handle more busy threads than Linux could, with a very gradual (and linear) degradation in performance past about 25 threads. I stopped testing at 100 threads. Second, PSQL's inserts per second (IPS) rate was cut in half by the end of the benchmark. DB2 followed a similar trend until about 5 million rows, at which point IPS went up to where it started and stayed there without moving. Third, DB2 was I/O-bound, whereas PSQL was CPU-bound. I suspect it's why DB2 was able to handle an order of magnitude greater concurrency: more threads just meant the CPUs had something to do while waiting on the disks. However, it does mean that PSQL might do better with faster CPUs, whereas DB2 would not (it'd just be able to handle more threads).

    And the numbers: DB2 averaged 1100 IPS, PSQL 600. Note that for the first million rows or so PSQL was faster: it just eventually dropped down to ~400 IPS after ten million rows or so, killing the average. Of course, since this table would never have fewer than 54M rows - actually, it would typically have 160M - the IPS I got at the end was the one that mattered. Also, this was on a pretty weak server, at least for this kind of workload. With more (and faster) cores, more memory, and more spindles, I'm pretty sure you could increase those numbers by 50% or more. With tuning, perhaps that much again.
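    The skeleton of that benchmark, stripped to a single-threaded sqlite3 sketch so it runs anywhere (the schema is simplified and the batch sizes scaled far down from the poster's 75k-row commits; this illustrates only the measurement pattern, not his setup):

```python
import sqlite3
import time

# Scaled-down analogue of the insert benchmark described above:
# batched inserts with periodic commits, measuring inserts per second.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE samples (
    id INTEGER, ts TEXT, a INTEGER, b INTEGER,
    PRIMARY KEY (id, ts),
    CHECK (a >= 0), CHECK (b >= 0))""")

BATCH, BATCHES = 1000, 10            # the original committed every 75k rows
start = time.perf_counter()
n = 0
for _ in range(BATCHES):
    for _ in range(BATCH):
        conn.execute("INSERT INTO samples VALUES (?, '2007-07-09', 1, 2)", (n,))
        n += 1
    conn.commit()                    # one commit per batch, as in the benchmark
elapsed = time.perf_counter() - start
print(f"{n} rows in {elapsed:.2f}s ({n / elapsed:.0f} inserts/sec)")
```

    Watching how that inserts/sec figure decays as the table and its indexes grow is exactly the "IPS cut in half by the end" effect the poster measured.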

  • by megalomaniacs4u (199468) on Monday July 09, 2007 @06:57PM (#19807581)
    As the DBA for a Postgres DB where we do that for a website that searches over several million rows (with daily updates of up to 25% of the DB) in under 10 seconds: I can say that a similar query is ugly as hell, but it only took a couple of days of tweaking to get the average case under 3 seconds (certain parameters max out at 20 seconds).

    Certainly postgres plays nice with self joins and natural joins as the query turned out faster than trying an iterative stored proc so we just access a view from the proc.
  • by greg1104 (461138) <gsmith@gregsmith.com> on Monday July 09, 2007 @06:59PM (#19807599) Homepage
    Josh Berkus, the person whose blog was referenced here, is one of the PostgreSQL core developers, and he's currently employed by Sun to work on projects like this--which includes contributions back to the PostgreSQL development efforts. He's been doing a lot of work scaling performance upwards to larger capacity servers than the database has traditionally been deployed on.
  • by Anonymous Coward on Monday July 09, 2007 @07:01PM (#19807629)
    Heh. I should have mentioned my testing was on a 800 MHz Athlon desktop I use as a spare "server" (wife won't let me put up a rack for real servers yet :(

    It was also on a server loaded down by apache2/samba/X/mysql/psql and whatever else I've installed to play with over the past 2 years. So while not realistic as far as a production environment, I thought I'd share my experience since I thought postgres would have handled at least close to mysql. Never imagined the performance would be 5 times slower though. I'll shoot you an email for your config files though, since now I'm curious :)

    Also, to the person who suggested I use a flat text file, I only inserted with no indexes for speed, I built the indexes again afterwards. Furthermore, I was already importing from a flat text file, and I wanted to do better than a linear search and also the DBs compress the data which is helpful in keeping disk space usage down (mysql did roughly 7 GB text to 3.6 GB MYD file).
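    The load-then-index trick mentioned above (insert into a bare table, build the indexes afterwards in one pass) looks like this in a sqlite3 sketch (the table and data are made up for illustration; the same pattern applies to MySQL and PostgreSQL bulk loads):

```python
import sqlite3

# Bulk-load pattern: no index during the inserts, one bulk index build after.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w TEXT)")           # deliberately no index yet
conn.executemany("INSERT INTO words VALUES (?)",
                 [("w%06d" % i,) for i in range(10000)])
conn.commit()
conn.execute("CREATE INDEX words_w ON words (w)")     # single sorted index build
hit = conn.execute("SELECT COUNT(*) FROM words WHERE w = 'w004242'").fetchone()[0]
print(hit)  # 1
```

    Building the index once over sorted data is much cheaper than maintaining it incrementally across millions of inserts, which is why bulk importers do it this way.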
  • by Anonymous Coward on Monday July 09, 2007 @07:06PM (#19807671)
    Google is your friend...

    http://www.varlena.com/GeneralBits/Tidbits/perf.html [varlena.com]

    http://www.revsys.com/writings/postgresql-performance.html [revsys.com]

    There is also good material in chapter 4 (Performance) of "PostgreSQL" by Douglas and Douglas (Sams Publishing Developer's Library)
  • by swusr (689597) on Monday July 09, 2007 @08:01PM (#19808115)

    Real close, I agree.. Postgres had a 40GB cache, Oracle used 21GB

    PostgreSQL setup: 8 cores, 16GB RAM [spec.org].

    Oracle setup: 64 cores, 512GB RAM [spec.org].

  • Re:Bias (Score:3, Informative)

    by pavera (320634) on Monday July 09, 2007 @09:08PM (#19808635) Homepage Journal
    The great (sarcasm) thing to me is they have added all these features (views, stored procs, triggers) in the last year or so, unfortunately if you want to use any of those features there are huge caveats when using their main selling point of "easy" clustering. Triggers, Views and Stored Procs all break replication under pretty normal use cases. So, if you want to use them you can't use the built in clustering.

    I don't know if the same holds true for PostgreSQL, but I would be extremely surprised, since Postgres has had all of those features for years and anyone designing a replication scheme would take them into account. MySQL, in contrast, had replication first, and now the designers of the additional features have to take the replication scheme into account (which is much harder, I'd imagine: views, triggers and stored procs all have some semblance of a standard, and at the very least a well-defined feature set, and sometimes you can't bend a feature set to fit into an already existing replication scheme).

    Also, even if you own the server, converting an existing MyISAM DB to InnoDB can be a huge chore. I've done this for at least 3 customers, and every time it is a much more painful process than you'd think. Mostly because existing data in MyISAM almost always has referential problems after any real world usage (like its been live for more than a month). Someone has deleted a customer, and now you have orders that are orphaned, someone ran a bad update that set the FK wrong in the orders table, and MySQL happily obliged, and now you've got orphaned data... I mean to me the most BASIC functionality of an Rdbms is that it enforces the relationships you give it. I don't even count MySQL w/MyISAM as a database. Might as well just use a bunch of flat comma separated files for tables.
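    The orphaned-rows problem described above is easy to demonstrate with sqlite3, where (much like MyISAM) foreign-key enforcement happens to be off unless you switch it on (the customers/orders schema is made up for illustration):

```python
import sqlite3

def orphan_demo(enforce):
    conn = sqlite3.connect(":memory:")
    if enforce:
        conn.execute("PRAGMA foreign_keys = ON")   # off by default, like MyISAM
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute("""CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id))""")
    conn.execute("INSERT INTO customers VALUES (1)")
    conn.execute("INSERT INTO orders VALUES (100, 1)")
    try:
        conn.execute("DELETE FROM customers WHERE id = 1")  # would orphan the order
        return "orphaned"
    except sqlite3.IntegrityError:
        return "blocked"

print(orphan_demo(False), orphan_demo(True))  # orphaned blocked
```

    With enforcement off, the delete silently succeeds and the order now points at a customer that no longer exists; with enforcement on, the bad delete is rejected at the source, which is the "most BASIC functionality" the comment is asking for.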

  • Re:Which MySQL? (Score:4, Informative)

    by jadavis (473492) on Monday July 09, 2007 @10:10PM (#19809097)
    MySQL is modular [ from grandparent post ]
    The point IS pick'n'mix as you put it. [ from parent ]

    I think this is a huge misconception, and completely backwards. MySQL allows you to change storage engines, but the behavior at the semantic level changes. That's the antithesis of modularity: the semantic behavior should remain constant, while the performance changes. If you change both, that's not modularity, that's a new configuration that breaks client applications.

    MySQL is configureware, like PHP. Everything you get is a trade. Want full text indexes (MyISAM)? You have to give up transactions (InnoDB). But the marketing material always just says "Yup, we have full text indexes (MyISAM), SQL compliance (strict mode=on), transactions (InnoDB), large number of apps (strict mode=off)". Of course many of the features are mutually exclusive. When postgresql says it has a feature, it's really there.

    Just need some in-memory lookups? Memory table engine. Clustered data? NDB.

    Seems like the memory engine and NDB are the same thing: http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-overview.html [mysql.com]. That means that your "cluster" is not much of a database, and is totally unacceptable for many applications that require a database. Power off == no more data.

    Now compare with PostgreSQL. PostgreSQL has one "storage-engine", but there are many access methods to that storage engine. There's Btree, GiST, and GIN. On top of the access methods, there are also a lot of plans. There's sequential scan, index scan (good for lookups or some sorting needs), bitmap index scan (good for AND/OR with other bitmap index scans, and always orders the I/O in file order), hash join, hash aggregate, merge join, nested loop, group aggregate, etc.

    Look at all those algorithms for accessing the single storage mechanism. It's amazing. MySQL doesn't have all those, and even if it did have the algorithms, it couldn't use them. How would MySQL know when to use a hash join and when to use a merge join? PostgreSQL collects statistics, calculates expected costs, and chooses the best plan based on the given query (called a cost-based planner). MySQL uses a rule-based planner (does it have an index? ok, let's use it then). To PostgreSQL, these two queries are different:

      (1) SELECT * FROM parents WHERE number_of_children=2;
      (2) SELECT * FROM parents WHERE number_of_children=20;

    The stats collector would know that #1 will match many more records than #2. It will probably choose a sequential scan for #1, since it needs to read every page anyway. It will probably choose an index scan for #2. Now, if that's in a subselect, postgresql will know that #1 will return a lot of records, and maybe choose a different join algorithm than if it were #2. Again, PostgreSQL chooses these plans for you based on cost -- no special configuration required.
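    In PostgreSQL you would watch this with EXPLAIN after running ANALYZE. The same experiment can be sketched with sqlite3's EXPLAIN QUERY PLAN (SQLite's planner is far simpler than PostgreSQL's cost-based one, so treat the plan text as illustrative only; the table and data distribution are made up):

```python
import sqlite3

# A skewed column: 4950 parents with 2 children, 50 with 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (name TEXT, number_of_children INTEGER)")
conn.executemany("INSERT INTO parents VALUES (?, ?)",
                 [("p%d" % i, 2 if i % 100 else 20) for i in range(5000)])
conn.execute("CREATE INDEX nc_idx ON parents (number_of_children)")
conn.execute("ANALYZE")          # gather the statistics the planner costs from
for n in (2, 20):
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM parents WHERE number_of_children=?",
        (n,)).fetchall()
    count = conn.execute(
        "SELECT COUNT(*) FROM parents WHERE number_of_children=?",
        (n,)).fetchone()[0]
    print(n, count, plan[-1][-1])  # value, matching rows, chosen plan
```

    The point of the comment stands regardless of which plan gets printed: a cost-based planner consults collected statistics to decide between a sequential scan and an index scan, with no per-query configuration from the application.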

    If you want modular, it's being able to replace a full text GiST index with a full text GIN index and the application never knows the difference except performance. And by the way, the DDL in postgresql is transactional, so you don't even have to restart the application even if you do some major restructuring of the table (like replacing a table with an updatable view).
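    The transactional-DDL point is easy to see in miniature: schema changes roll back just like data changes. A sqlite3 sketch (PostgreSQL behaves the same way; sqlite3 is shown only because it needs no server, and Python 3.6+ no longer auto-commits before DDL):

```python
import sqlite3

# Transactional DDL: a CREATE TABLE inside a rolled-back transaction vanishes.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manage txns by hand
conn.execute("BEGIN")
conn.execute("CREATE TABLE will_vanish (x INTEGER)")
conn.execute("ROLLBACK")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # the rolled-back table is gone
```

    This is what lets a Postgres migration wrap a table restructuring in one transaction: either every step lands, or none do, and running applications never see a half-changed schema.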
  • by Anonymous Coward on Monday July 09, 2007 @11:51PM (#19809753)
    SQLite has before insert triggers. Why doesn't MSSQL?
  • by Anonymous Coward on Tuesday July 10, 2007 @04:49AM (#19810957)
    This is the best post I have read in ages. I am old school. I like DBs, and when doing requirements gathering I am already thinking in terms of tables and relational schemas. In a really well-designed relational schema it is possible to do very complex operations without ever resorting to PL/SQL/Transact-SQL etc., simply using plain SQL (insert/update etc.). I hate cursors/recordsets. As soon as I see these constructs in the DB I generally find a VB/Java programmer messing where he is not wanted.

    PS: Oracle is not Fortran - PL/SQL is derived from Ada and therefore shares much of the Pascal/Modula feel.
