Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!


Forgot your password?

PostgreSQL 9.2 Out with Greatly Improved Scalability 146

The PostgreSQL project announced the release of PostgreSQL 9.2 today. The headliner: "With the addition of linear scalability to 64 cores, index-only scans and reductions in CPU power consumption, PostgreSQL 9.2 has significantly improved scalability and developer flexibility for the most demanding workloads. ... Up to 350,000 read queries per second (more than 4X faster) ... Index-only scans for data warehousing queries (2–20X faster) ... Up to 14,000 data writes per second (5X faster)" Additionally, there's now a JSON type (including the ability to retrieve row results in JSON directly from the database) ala the XML type (although lacking a broad set of utility functions). Minor, but probably a welcome relief to those who need them, 9.2 adds range restricted types. For the gory details, see the what's new page, or the full release notes.
This discussion has been archived. No new comments can be posted.

PostgreSQL 9.2 Out with Greatly Improved Scalability

Comments Filter:
  • Re:LOL (Score:1, Informative)

    by Anonymous Coward on Monday September 10, 2012 @07:30PM (#41294403)

    Because we love to bash our keyboards into so much plastic scrap whenever we come across one of its many standards-defiant idiosyncracies?

  • by Anonymous Coward on Monday September 10, 2012 @08:31PM (#41294845)

    9.3. Seriously.


  • by Anonymous Coward on Monday September 10, 2012 @08:57PM (#41295063)


  • by dragonk ( 140807 ) on Monday September 10, 2012 @09:17PM (#41295241) Homepage

    I just posted this to the blog, but I will repeat it here --

    There is a very good reason we OS vendors do not ship with SysV default limits high enough to run a serious PostgreSQL database. There is very little software that uses SysV in any serious way other than PostgreSQL and there is a fixed overhead to increasing those limits. You end up wasting RAM for all the users who do not need the limits to be that high. That said, you are late to the party here, vendors have finally decided that the fixed overheads are low enough relative to modern RAM sizes that the defaults can be raised quite high, DragonFly BSD has shipped with greatly increased limits for a year or so and I believe FreeBSD also.

    There is a serious problem with this patch on BSD kernels. All of the BSD sysv implementations have a shm_use_phys optimization which forces the kernel to wire up memory pages used to back SysV segments. This increases performance by not requiring the allocation of pv entries for these pages and also reduces memory pressure. Most serious users of PostgreSQL on BSD platforms use this well-documented optimization. After switching to 9.3, large and well optimized Pg installations that previously ran well in memory will be forced into swap because of the pv entry overhead.

  • Re:Postgres-Curious (Score:5, Informative)

    by Art3x ( 973401 ) on Monday September 10, 2012 @11:14PM (#41295959)

    PostgreSQL replication is new (revision 9.1) so there may be little out there (Yes, there was replication, but with additional software, like Slony).

    I'm in the weird position of having used PostgreSQL mainly --- for seven years, writing dozens of applications --- but never MySQL. I've also used --- out of necessity only --- Microsoft SQL, Oracle, and Ingres, and PostgreSQL is much better. Just from a programming point of view, the syntax is, in my mind, simpler yet more powerful --- more ANSI-SQL-compliant, too, I've heard.

    Anyway, the point is, I've never used anything I like more. I adore PostgreSQL. It's so powerful. So many useful datatypes, functions, syntax. Not to mention it's ACIDity.

    To your question, though --- are there any good books to help a MySQLite move to PostgreSQL? Not that I've come across. But then again, I haven't found any good PostgreSQL books --- or even, for that matter, very well-written SQL books, period. They all are stupefyingly boring --- but I got what I could out of them.

    Actually, PostgreSQL's documentation is not that bad. In particular, try sections I, II, V, VI, and III, in that order. Skip anything that bores you at first. You can always come back. Honestly, there can't be that much of a learning curve for you, coming from MySQL.

  • by rycamor ( 194164 ) on Tuesday September 11, 2012 @01:04AM (#41296541)

    Generally there is very little in the sense of logical data manipulation capabilities in which Oracle exceeds PostgreSQL (usually the opposite, actually). The main advantage Oracle has is in the extreme high end of scalability and replication, and that benefit is offset by massive complexity in setup and configuration. Even there, PostgreSQL is closing fast these days, with built-in streaming replication, table partitioning, and all sorts of high-end goodies.

    I do all sorts of PostgreSQL consulting, and you would be surprised at the number of large companies and government organizations considering migration from Oracle to PostgreSQL.

    And if you *really* need PostgreSQL to go into high gear, just pay for the commercial Postgres Plus Advanced Server from EnterpriseDB and you will get a few heavy-duty add-ons, including an Oracle compatiblity layer.

    Also, IMHO one of the really cool things about PostgreSQL is the number of very geeky tools it puts at your disposal, such as a rich library of datatypes and additional features, along with the ability to create your own user-defined datatypes.

  • Re:Postgres-Curious (Score:5, Informative)

    by rycamor ( 194164 ) on Tuesday September 11, 2012 @01:17AM (#41296595)

    Unfortunately, I haven't found a really good guide of the type you are looking for. I can give you my experiences, going from MySQL to PostgreSQL, back to MySQL to support it at a large company, and then back to PostgreSQL. Generally, these days there is really *nothing* that I can find about MySQL that can't be done better in PostgreSQL. I mean it. At least for awhile MySQL could boast of native replication, but Postgres biw has that and it is arguably much more robust than MySQL's solution (had the misfortune to support MySQL replication for 2 years). Ditto with full-text indexing, and just about any other MySQL feature.

    Main differences:

    1. PostgreSQL is much more "correct" in how it handles data and has very little (essentially no) unpredictable or showstoppingly odd behavior of the sort you find in MySQL all the time. Your main problem in migrating an app to PostgreSQL will be all those corner cases that MySQL just "accepts" when it really shouldn't, such as entering '0000-00-00' into a date field, or allowing every month to have days 0-31. In other words, PostgreSQL forces you to be a lot more careful with your data. Annoying, perhaps, if you are developing a non-mission-critical system like a web CMS or some such, but absolutely a lifesaver if you deal with data where large numbers of dollars and cents (or lives) depend on correct handling.

    MySQL has provided for a fair amount of cleanup for those who enable ANSI standard behavior, but it is still nowhere close to PostgreSQL's level of data integrity enforcement.

    2. MySQL has different table types, each of which support different features. For example, you cannot have full-text indexing in InnoDB (transactional) tables. PostgreSQL has complete internal consistency in this regard.

    3. MySQL has an almost entirely useless error log. PostgreSQL's can be ratcheted up to an excruciating level of detail, depending on what you want to troubleshoot. Ditto with error messages themselves.

    4. MANY MANY more choices in datatypes and functions to manipulate them. Definitely a higher learning curve, but worth it for expressive capability.

    5. Don't get me started on performance. Yes, if you have a few flat tables, MySQL will be faster. Once you start doing anything complicated, you are in for a world of pain. Did you know that MySQL re-compiles every stored procedure in a database on every new connection? PHP websites with per-page-load connections can really suffer.

    6. Don't get the idea that PostgreSQL is more complex to work with. If you want simple, you can stick with the simple parts, but if you want to delve into complex database designs and methodologies, PostgreSQL pretty much opens up the world to you.

    - Glad to be back in the PostgreSQL world...

  • Re:Range data types (Score:4, Informative)

    by Kergan ( 780543 ) on Tuesday September 11, 2012 @04:19AM (#41297227)

    Oh, it's simple enough to do with two separate fields and a check constraint. That's how you'd do it i other DB engines, in fact.

    Ensuring there are no overlaps is an entirely different story, however: queries against those two fields cannot make any reasonable use of an index. The ranged type, by contrast, allows you to query the data using a nearest neighbour search and a GiST index.

    Think of a GiST index as indexing the smallest boxes that enclose your shapes of interest. When queried, the DB scans for boxes that overlap your box of interest, and discards rows that don't match the data's actual shape.

  • by gazbo ( 517111 ) on Tuesday September 11, 2012 @05:44AM (#41297497)
    Each client connected to the DB has its own child process - the shared memory is a buffer that is shared across postgresql child PIDs with the same parent. That's why the proposed patch would work using an anonymous shared memory segment - because the memory is only passed to children of the same process.
  • by serviscope_minor ( 664417 ) on Tuesday September 11, 2012 @05:55AM (#41297531) Journal

    and you would be surprised at the number of large companies and government organizations considering migration from Oracle to PostgreSQL.

    Not really.

    I've had no experience with the database end of things, but I've been on the receiving end of some other Oracle "products" at two places I've been. Once you've been Oracled, there is a strong incentive never to go anywhere near them again, no matter how they look on paper.

    When it comes for utter distain and hatred for their customers, Oracle make Sony look like rank ametures.

    As far as Oracle are concerned, the customer is a fool whose sole purpose is to be screwed over for as much cash as possible.

  • Re:Postgres-Curious (Score:3, Informative)

    by fuzzytv ( 2108482 ) on Tuesday September 11, 2012 @06:41AM (#41297673)

    Well, recommending a PL/SQL book as a source for learning SQL is a bit silly IMHO. Moreover, I find the books from Oracle rather bad - there are better sources to learn PL/SQL (e.g. the one from Feuerstein is a much better book).

    And in fact there's a great book about administering PostgreSQL from Hannu Krosing - it's called "PostgreSQL 9 Admin Cookbook" [http://www.packtpub.com/postgresql-9-admin-cookbook/book]. It's a great set of recipes for admins for common tasks, not an exhaustive documentation (that's what http://www.postgresql.org/docs/9.1/interactive/index.html [postgresql.org] is for), but if you want to learn how real pros admin the database, this is the right choice. And yes, I'd recommend it to newbies coming from MySQL.

    It might seem that the PostgreSQL community considered MySQL to be a toy database in the past, but it definitely was not a generally shared view. And this definitely changed recently - there's no reason not to join the community mailing lists / IRC channel and start a post with "I'm using a MySQL right now and I don't understand why PostgreSQL does SOMETHING."

  • by gazbo ( 517111 ) on Tuesday September 11, 2012 @08:52AM (#41298353)
    Well...arguably. This is the exact same argument as Apache vs Nginx, where Apache spawns a child process per client, whereas Nginx has a limited number of worker processes that handle a queue of requests as they become free. Nginx definitely has an advantage in terms of RAM when servicing thousands of (truly) simultaneous requests.

    While Postgresql does use the Apache model, there is middleware available (google 'pgpool' for an example) that amongst other things will queue requests so they can be serviced by a limited number of children. Of course this only matters if there are an awful lot of simultaneous queries (without the corresponding amount of server RAM).

    However; your claim about threads per CPU is oversimplified, and especially wrong with a DB server where processes will most likely be IO bound. With 1 core, for example, there is nothing wrong with having 5 processes parsing and planning a query for a few microseconds, while the 6th is monopolising IO actually retrieving query results. Or the reverse - having 1 CPU-bound process occasionally being interrupted to service 5 IO bound processes, which would negligibly impact the CPU-bound query, while hugely improving latency on the IO bound queries.

  • by m.dillon ( 147925 ) on Tuesday September 11, 2012 @04:43PM (#41305173) Homepage

    I don't think this is true any more. Threads are light weight... that's the whole point. They all share the same pmap (same hardware page table). Switching overhead is very low compared to switching between processes.

    The primary benefit of the thread is to allow synchronous operations to be synchronous and not force the programmer to use async operations. Secondarily, people often don't realize that async operations can actually be MORE COSTLY, because it generally means that some other thread, typically a kernel thread, is involved. Async operations do not reduce thread switches, they actually can increase thread switches, particularly when the data in question is already present in system caches and wouldn't block the I/O operation anyway.

    There is no real need to match the number of threads to the number of cpus when the threads are used to support a synchronous programming abstraction. There's no benefit from doing so. For scalability purposes you don't want to create millions of threads (of course), but several hundred or even a thousand just isn't that big a deal.

    In DragonFly (and in most modern unix's) the overhead of a thread is sizeof(struct lwp) = 576 bytes of kernel space, +16K kernel stack, +16K user stack. Everything else is shared. So a thousand threads has maybe ~40MB or so of overhead on a machine that is likely to have 16GB of ram or more. There is absolutely no reason to try to reduce the thread count to the number of cpu cores.


    There are two reasons for using lock memory for a database cache. The biggest and most important is that the database will be accessing the memory while holding locks and the last thing you want to have happen is for a thread to stall on a VM fault paging something in from swap. This is also why a database wants to manage its own cache and NOT mmap() files shared... because it is difficult, even with mincore(), to work out whether the memory accesses will stall or not. You just don't want to be holding locks during these sorts of stalls, it messes up performance across the board on a SMP system.

    Anonymous memory mmap()'s can be mlock()'d, but as I already said, on BSD systems you have the pv_entry overhead which matters a hell of a lot when 60+ forked database server processes are all trying to map a huge amount of shared memory.

    Having a huge cache IS important. It's the primary mechanism by which a database, including postgres, is able to perform well. Not just to fit the hot dataset but also to manage what might stall and what might not stall.

    In terms of being I/O bound, which was another comment someone made here... that is only true in some cases. You will not necessarily be I/O bound even if your hot data exceeds available main memory if you happen to have a SSD (or several) between memory and the hard drive array. Command overhead to a SSD clocks in at around 18uS (verses 4-8mS for a random disk access). SSD caching layers change the equation completely. So now instead of being I/O bound at your ram limit, you have to go all the way past your SSD storage limit before you truly become I/O bound. A small server example of this would be a machine w/16G of ram and a 256G SSD. Whereas without the SSD you can become I/O bound once your hot set exceeds 16G, with the SSD you have to exceed 256G before you truly become I/O bound. SSDs can essentially be thought of as another layer of cache.


With all the fancy scientists in the world, why can't they just once build a nuclear balm?