Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
Databases Open Source

PostgreSQL 9.1 Released 148

With his first posted submission, sega_sai writes "The new version of open source database PostgreSQL was released. This version provides many important and interesting features such as synchronous replication, serializable snapshot isolation, support of per-column collations, K-nearest neighbor indexing, foreign data wrappers, support of SELinux permission controls and many others. The complete list of changes is available here"
This discussion has been archived. No new comments can be posted.

PostgreSQL 9.1 Released

Comments Filter:
  • vs Oracle? (Score:5, Interesting)

    by pak9rabid ( 1011935 ) on Monday September 12, 2011 @02:19PM (#37379684)
    So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)? Anybody here tried making the switch?
    • by iamvego ( 785090 )

      Well it's always been able to do basic things like LIMIT, multi-row insert statements and correctly treats empty strings as not being null... which Oracle can't do for some dumb reason.

      • That dumb reason is backwards compatibility for their customers due to Oracle being older than the SQL standard.

        • by h4rr4r ( 612664 )

          So why not make the older behavior optional, rather than continuing this broken functionality?

          • Under the SQL standard a NULL string is different from a zero length string (""). It's subtle but it's a difference.

            Oracle does not differentiate between the two. A zero-length string behaves identically to a NULL. If a customer wrote an application where ("") equated to a NULL then fixing Oracle to differentiate between the two would break those applications.

            If they make it an optional feature Oracle needs to make it an opt-out feature in order to maintain backward compatibility while still allowing their

            • Re:vs Oracle? (Score:5, Insightful)

              by h4rr4r ( 612664 ) on Monday September 12, 2011 @03:01PM (#37380178)

              It is easier, it is still wrong. It should be opt-in to prevent people from using it for new installations.

              It is not a subtle fucking difference. It is a huge big honking difference. Either you don't know the information which is null or you know it to be "".

              • It's subtle in that people using Oracle coming from a standards compliant database usually have this one bite them in the ass big time without any idea that it was coming. I hate this behavior, and the us of NVs all over the damn place. Outside of cost, it is one of my biggest beefs moving to Oracle. It significantly increased the complexity of moving a large application from MySQL to Oracle. Every single query had to be re-written looking for places where this would get you. It is just plain dumb beha
                • It's also part of the Oracle lock-in. Try to move off Oracle, discover queries that vary depending on null behavior (such as OUTER JOIN) don't do quite the same thing anymore. Auditing for queries that misbehave due to Oracle's sloppiness here is a giant headache when moving to PostgreSQL from it.

              • by vux984 ( 928602 )

                It is not a subtle fucking difference. It is a huge big honking difference. Either you don't know the information which is null or you know it to be "".

                I'm curious, while in theory a good database can differentiate... how do you realistically capture the information in a user interface?

                You've got some form: Please enter your name:
                firstname :
                middlename :
                last-name :

                If someone leaves middlename blank and submits the form, do you think it is known to be "", or is it NULL? If you assume NULL and someone doesn't

                • by hplus ( 1310833 )
                  Imagine if your form did not always include a blank for middle name, though one was added later. People who filled out the form prior to the addition would have a NULL middle name, while those who filled it out later would have it set, if only to "" in some cases.
                  • by vux984 ( 928602 )

                    You didn't really answer my question, but rather side stepped it and gave an example where a column could have both.

                    I've certainly been there, but then why not simply:

                    UPDATE oldTable SET newField= ''

                    Rather than have the old records with NULLs?

                    If its a non-required field, that is blank most of the time anyway... what value is there is letting it stay for old records and then dealing with the possibility it might be null all over the place in one's code?

                • What is the actual right / normal course of action?

                  Allowing the user to specify "empty" vs. "nonexistent" is almost always the wrong answer. In general, data for human contact is either provided or nonexistent, not empty. This can be name components, address components, phone numbers, etc.

                  It's hard to set a generalized rule for handling of other inputs, but typically the data model will dictate whether a given "string" datum is optional (NULLable), or whether it has content requirements (empty allowed

            • by Bengie ( 1121981 )

              It's about as different as a null pointer vs a pointer pointing to an empty string object.

          • Because then it would be a bit easier to port the applications to other databases?

            Plus this particular "feature" is used on so many places of the current code base (and that's a huge amount of PL/SQL code) that it's almost impossible to fix. Plus it's actually a bit funnier, because the exact behaviour depends whether you use CHAR or VARCHAR2 and if you're in SQL or PL/SQL. And it's not the only funny feature in Oracle.

            Sometimes I have nightmares about the reasons that led the developers to implement it thi

        • by amorsen ( 7485 )

          Backwards compatibility does not prevent Oracle from supporting FETCH FIRST. See Limiting result sets [arvin.dk]. Row value constructors wouldn't be a problem to handle either, especially since Oracle already has the functionality just with a silly syntax.

        • by Nutria ( 679911 )

          And it doesn't have scalar data types. What fscked up RDBMS doesn't have *integers*?

      • With regards to LIMIT, Oracle does have a couple of equivalents:
        1. WHERE rownum < end

        2. select * from (select S1.*, ROWNUM rn FROM ( real query ) S1 WHERE ROWNUM <= end) WHERE rn >= start

        Obviously, the second syntax is a bit painful, but it works, and it has the important behavior that it doesn't calculate any of the rows that aren't returned (as Postgres says it does for LIMIT...OFFSET in 8.1, see the docs [postgresql.org]). This is important when you're selecting items 4990-5000 of a 10,000 row result set.

        I've us

      • by Bengie ( 1121981 )

        You complain about minor things. I have to import data from Oracle servers and for some reason, Oracle loves to dump out rows with different column counts.

        I have to deal with a lot of customers with a lot of different versions of Oracle, and they all seems to have had CSV files that I could not important because the header says 14 columns with their names, but some rows have 5 columns and some rows have 20 columns. Pulled those numbers out of my ass, but that's what happens.. A LOT.. F'n annoying as hell.

        Ev

        • I doubt that's an Oracle issue, my guess is they're using a custom-developed tool to export the data and it's buggy. I'm dealing with a lots of data exported from Oracle (CSV, columnar, ...) and I've never had this problem. External tables actually made exporting even easier.

          So while I'm a PostgreSQL fan, let's not blame Oracle for the mistakes of others.

          • by Bengie ( 1121981 )

            "my guess is they're using a custom-developed tool to export the data and it's buggy"

            Good point. We'd have to see what's common among the customers, but they always eventually fix it so we never dug further than sending back examples problem and telling them to fix it.

        • let me get this straight you're saying that:

          select col1,col2,col3
          from table;

          won't always return 3 columns? What are you smoking?

        • I've done plenty of exports of data from oracle databases and have never seen this. Are you sure that you're using CSV correctly?
          • by Bengie ( 1121981 )

            We don't do it ourselves, we tell our customers what we need and this is what happens.. Often. We have about 30 different customers who use Oracle, and nearly all of them have had this issues more than once at some point. A select few have never had this issue.

            • The only way I can see this happening is if the CSV doesn't have text fields enclosed with quotes where the text can include a comma or alternatively if there is a newline/carriage return in the middle of test fields.

              I just can't see how you could write a sql query that would return different numbers of columns per row unless you tried really hard. I bet you wouldn't be able to find anyone else having the same problem with oracle who isn't a customer of yours.
    • I expect that most wouldn't because they have already spend all the money on an Oracle License.
      However my experience is most shops (Including Oracle Shops) Don't use nearly close to all those cool features they provide.

      They just want a SELECT * FROM TABLE WHERE VALUE='TEXT'

    • Re:vs Oracle? (Score:5, Informative)

      by discord5 ( 798235 ) on Monday September 12, 2011 @02:50PM (#37380046)

      So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)?

      I can't speak for Oracle, but if you're interested in spatial stuff you should have a look at PostGIS [refractions.net]. We've recently been using it to store tons (magnitude of several million) of points and polygons, and we're very happy with it. We've got about hundred simultaneous users connecting to the WFS in peak hours, and it bears the load pretty well if you properly index your tables. I can't speak much for updates, since our database updates in bursts (we import new data every X weeks). I can't go too much into detail about the type of data other than that it's polygons, points, and mostly distance calculations and intersections.

      We briefly looked at Oracle Spatial for a while, looked at the pricetag and the project budget and made the decision to try the PostgreSQL+PostGIS combination and see how far it'd get us. We were pleasantly surprised. I had some experience with PostgreSQL before in the 7.X releases in a previous lifetime but in the end wasn't all that pleased with it, especially on busy servers. Nowadays, I'm running 9.0 and I'm pretty much content about it. Replication wise we've got a PITR setup up and running which is more than enough for our purposes. It's pretty well documented, but be sure to test everything, etc etc etc... It doesn't quite hold your hand when you're setting it up, so double check everything.

      I'm sure that there will be people on here that have more extensive experience with PostgreSQL (and Oracle) to fill you in on the juicy details, but in general I'm pretty pleased with it so far. It scratches my particular itch, and does so without all too much headaches.

      • Thank you! Your response was very helpful. I've been eye-balling PostGIS and was wondering: 1.) is it something somebody would bring up if it wasn't specifically mentioned in a question regarding spatial, and b.) how it performs. Based on your response, it sounds like it's pretty usable.
        • by Anonymous Coward

          We've been using it at my org for a lot of spatial data as well, though we're still stuck on MS SQL for production.

          One of the big advantages it appears to have over other spatial databases is that there are a lot of spatial apps out there that will nicely integrate to PGSQL, like QGIS, Grass, MapServer. There's plenty of tool suites as well, like FWTools (for windows) or GDAL/OGR/GEOS for *nix.

          In contrast, though MS has had spatial support for a couple of years now it's treated like third rate citizen - no

        • I don't think it would be mentioned - actually it's a separate package built on top of PostgreSQL (thanks to the ability to write custom data types etc).

      • ++ on this comment. My experience is similar. Postgres and PostGIS are very reliable, very fast and scale well.. If you set them up right. There isn't as extensive of a commercial support network for them as Oracle (duh) but there are commercial options and the online free communities are amazingly open, supportive and helpful. The GiST indexes which enable (I believe) a lot of spatial operations to occur in a timely manner are really impressive and learning how they work itself is a nice little CS continui

    • That depends on how many millions you have to spare on extensions.

    • Re:vs Oracle? (Score:5, Interesting)

      by greg1104 ( 461138 ) <gsmith@gregsmith.com> on Monday September 12, 2011 @03:28PM (#37380480) Homepage

      Lots of companies are converting from Oracle Spatial to PostgreSQL plus PostGIS because it's faster and has better compliance to GIS standards. The text of the talk isn't available, but the FAA Airports GIS and PostgreSQL [postgresqlconference.org] presentation was a typical report I was in the audience for. The FAA's first conversion happened very quickly: just export their data in a standard format, import into PostgreSQL, and tweak some queries. The result worked so much better that they've standardized on PostgreSQL for spatial applications at the FAA now. Internal projects needing a spatial database have to justify why they want the budget for Oracle Spatial, and it's default deny unless you have a really good reason.

      The addition of synchronous replication to 9.1 has made it a pretty even match for Oracle's Data Guard now. The main bonus is that you can control the integrity level you want at the transaction level. So you can have a database with a mix of important data (only considered safe when on two nodes) and fast, best attempt eventual consistency data, all in one place. Nothing else can replace Oracle at the top end while still having a lean enough mode to be competitive with NoSQL database [pgcon.org] when integrity isn't the top priority.

      We convert Oracle installs to PostgreSQL all the time at my day job [2ndquadrant.com]. The main obstacles I keep seeing that don't have simple solutions are 1) using a lot of PL/SQL, 2) differences in query handling, such as OUTER JOIN behavior or reliance on optimizer hints, and 3) can't limit the resources used by individual users easily in PostgreSQL yet. I actually have a design outline for how to solve (3)...would only cost a fraction of a typical Oracle license to sponsor that feature. EnterpriseDB's version of Oracle comes with PL/SQL compatibility, but only in a commercial product that lags behind the open-source releases--and buying from them just switches which vendor you're locked into.

      • Good post. Errata -- I think you meant to write "EnterpriseDB's version of [[Postgres]] comes with PL/SQL compatibility, but only in a commercial product that lags behind the open-source releases--and buying from them just switches which vendor you're locked into."

        • That's a lot like the PL/SQL-compatible variant of Firebird, called Fyracle [janus-software.com], which is still based on Firebird 1.5, not the current 2.1 or 2.5 series.

      • The release notes describe some OUTER JOIN changes that may help your (2).

        E.1.3.1.1. Performance

        Allow FULL OUTER JOIN to be implemented as a hash join, and allow either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed (Tom Lane)

        Previously FULL OUTER JOIN could only be implemented as a merge join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the nullable side of the join. These changes provide additional query optimization possibilities.

    • by shish ( 588640 )
      As a demonstration of postgres' spatial abilities, check out OpenStreetMap [openstreetmap.org] and the various sub-projects
    • by 0xA ( 71424 )
      Using PostGIS and replication here, works very nicely. Out biggest problem was that be have billions of rows that can't be locked for a huge high traffic mobile app. Using a round robin load balanced pool of postgres servers in hot spare mode as read only DBs solved a lot of issues.
    • The streaming replication is generally equal to Oracle DataGuard (physical standby). The hot_standby actually gives you about the same as Active DataGuard, i.e. the ability to run read-only queries on the standby for free (you have to pay for that with Oracle). With Oracle you'll get a management console to handle all this, with PostgreSQL you have to set it up manually (5-minute task), but there are several tools that may help (e.g. repmgr).

      Spatial ... although it's not a built-in feature, there's a PostGI

  • by Anonymous Coward

    If you wish to use their replication implementation to increase performance, you will probably have to look elsewhere, I'm afraid. In the event that your primary server fails, you are required to promote one of the existing slaves to be the new primary server and all other slaves will require a fresh data dump from the new master. Maybe in another year (when 9.2 gets released) it will be ready for the masses.

    • It's not trivial to figure out, but we've been deploying PostgreSQL 9.0 without the problem you describe (must do a fresh dump from the master) for a while now. The repmgr [2ndquadrant.com] software we've released takes care of all the promotion trivia. Worst-case, unusual situations can require you use a tool like rsync to make an out of date standby node into a copy of the new master. That's not the expected case though.

    • by Lennie ( 16154 )

      A possible alternative is to use pgpool II, from the webpage:

      - Automated failover. If one of the two PostgreSQL goes down, pgpool-II will automatically let remaining node take over and keep on providing database service to applications.
      - Query dispatching and load balancing. In streaming replication mode, applications need to carefully chose queries to be sent to standby node. Pgpool-II checks the query and automatically chose primary or standby node in sending queries. So applications need not to worry abo

    • That is not true. If you promote the slave that's ahead of all the other slaves, then the other slaves can just reconnect to the new master. Tools like repmgr can handle this for you.

      And no one actually says you have to do a completely fresh base backup. Ever heard about rsync?

  • not excited (Score:4, Interesting)

    by roman_mir ( 125474 ) on Monday September 12, 2011 @02:47PM (#37380004) Homepage Journal

    I am not excited about any of these changes unfortunately, they are somewhat specialized, though having synchronous replication and serializable transaction isolation sounds more useful than other stuff.

    But there are real things that are missing. Most obvious is distributing of one SQL request into parallel processes or threads to speed up query execution on multi-core systems (which are all multi-core today). The other is the entire issue of attempting to calculate execution time and failing in various cases in the planner, like the really sad cases of completely mis-handling of the mergejoin estimates, which then forces people to set enable_mergejoin to false unfortunately, it's a sledgehammer approach, but otherwise things that can execute in a few milliseconds can take tens of seconds and even minutes instead.

    There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)

    Otherwise it's a good database, it already provides tons of features. The one weird thing that I find though, is that for replication or hot stand by or just for creating a dynamic backup, the segments that are written to the disk are always of fixed size.

    You can modify the size, which is 16MB by default, but you can only modify the size when you configure the source code before compiling it: configure --with-wal-segsize=1 - this configures the segments to 1MB, which allows the second drive to last that much longer if all you are doing is using a second drive to keep dynamic backup (and that asynchronous backup method, by the way, the problem that they are solving with "synchronous replication", it's that you either have these segments fill up, and then the segment is written to disk, or you wait until time expires for segment to be written to disk if you set checkpoint_timeout). I imagine treating fixed sized segments is easier than generating segments that are of exact size equal to amount of data that was produced in a time period, but it's a waste of disk though.

    The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).

    • The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).

      The big reason you don't find that and it would be a tremendous breakthrough, is that it is currently believed to be actually impossible to get that. Have a look at the CAP Theorem. http://en.wikipedia.org/wiki/CAP_theorem [wikipedia.org]

      • by vlm ( 69642 )

        The big reason you don't find that and it would be a tremendous breakthrough, is that it is currently believed to be actually impossible to get that. Have a look at the CAP Theorem. http://en.wikipedia.org/wiki/CAP_theorem [wikipedia.org]

        Most CAP arguments seem to rely on some combination of not understanding the concepts behind token ring networks, not understanding distributed hash tables, not tolerating latency, and/or trying to scale to a very small number (like 2) instead of a large prime number of majority voting servers. Or it assumes ENIAC level MTBF rates of individual voting devices and vote counters, etc.

        Don't get me wrong, CAP is a good theoretical argument, and educational to think about, it just doesn't apply to many real wor

        • Re:not excited (Score:5, Interesting)

          by fuzzytv ( 2108482 ) on Monday September 12, 2011 @05:24PM (#37381544)

          The reliability probably improved since ENIAC, but the the question still is "when it is going to fail" and not if it is going to fail. Because it is going to fail - it may be a drive, CPU, PSU, a network switch, an AC unit, the whole AWS data center ... something is going to fail.

          The beauty of CAP theorem as I see it that it says "You can't get all three at the same time, face it." If you don't need the strong consistency (and with most apps you don't), then ditch it and it'll be much easier and cheaper to built and scale the system. I'd say once you realize this inner beauty, it clears your mind - something like a Zen of distributed computing.

      • The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disk

        This exists for Postgres in the form of Yale's HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html [yale.edu] http://radar.oreilly.com/2009/07/hadoopdb-an-open-source-parallel-database.html [oreilly.com]

        HadoopDB is comprised of Postgres on each node (database layer), Hadoop/MapReduce as a communication layer that coordinates the multiple nodes each running Postgres, and Hive as the translation layer. The result is a shared-nothing parallel database, that business analysts can interact with using a SQL-like language.

    • Re: (Score:3, Informative)

      by Anonymous Coward

      I am not excited about any of these changes unfortunately, they are somewhat specialized, though having synchronous replication and serializable transaction isolation sounds more useful than other stuff.

      Synchronous replication for many is a must have. In many cases this single feature was preventing adoption of PostgreSQL as many applications require synchronous replication support.

      Per column collation is another feature which is commonly required in complex applications; especially if they are multilingual capable.

      Extensions support is very important and will benefit anyone who uses any contrib or third party extension for PostgreSQL. Unless your database work starts and stop at the most basic of features

      • people love to lie and troll about.

        - more people saying that I am trolling, yet again. Yet again, incorrectly, I don't troll ever. [postgresql.org]

    • by vlm ( 69642 )

      The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).

      You need a middle-ware machine that understands enough SQL to send the correct request to the most optimized box, and a fallback slushware box to handle anything you didn't figure out how to manually optimize on the speedy indexed boxes.

      I realize this is a postgresql article but at one time I had a herd of little mysql boxes (data was replicated outside mysql) and each had different custom indexes set up that matched certain queries. So depending on which set of disk drive filling indexes fit the best, my

      • I address all of these shortcomings that I am writing about here within the application.

        Of-course it's much simpler for me to do from application perspective, because I know what the business logic is, so I break SQLs into pieces that can run in parallel, then I execute them in multiple connections against the database (thread or process per connection), and then I merge data as it comes back. This speeds up execution dramatically, not even close to what a single serial SQL can do.

        As to adding more machine

    • PostgreSQL changes are driven by what people want badly enough that they are willing to invest resources--development manpower, time, etc.--into that change. So your suggestion that performance is more important than these features can't be true. If performance were really the driver for most people, there would be more performance changes instead of all these features contributed. Extending PostgreSQL and getting the code committed is a lot of work to do right. There's very few people developing featur

      • Hey, I am just writing on my experiences, obviously people don't add stuff without reason.

        As to WAL segments - I am not using this feature for streaming replication, just to have an immediate backup to a separate disk, no hot stand by, nothing like that. Imagine a bunch of stores, each one has a small server for management. Data gets transfered to the central servers but also gets backed up to a separate drive that's only used for backup, nothing else. No hot stand by processes waiting for the main process

    • by Lennie ( 16154 )

      I'm kind of surprised that you mention parallel processing. In most tests I've seen PostgreSQL does add working with more cores than MySQL does.

      • As I said [slashdot.org] - a single SQL is not broken into smaller independent pieces to be executed by separate processors/threads. Separate connections/sql requests run in separate processes, but one sql request is not turned into many parallel executions.

        • by Lennie ( 16154 )

          ahh, ok.

          And sorry for the terrible english above. I most have been really tired. :-(

    • Not excited either. There is no uniqueness handling other than fail, as far as I can tell. Almost any other database has an "on conflict", "on duplicate key", or implements the MERGE command of the SQL standard (a totally illegible dragon of a statement, but still). A simple command like an "insert or ignore" or an "insert or replace" should not require server-side programming. How can agile developers work with such a database?
      • While Firebird supports update or insert [firebirdsql.org], who else does? Yes, the merge syntax is common, but that's a lot more annoying to use for day-to-day single-row updates in an OLTP environment. Also, what does "agile" have to do with it? Or did you not mean it in the project-management sense?

  • But I run Debian Stable on all my servers.

    Insensitive clod!

    • by vlm ( 69642 )

      But I run Debian Stable on all my servers.

      Insensitive clod!

      Keep an eye on backports.debian.org

    • That will be ready any day now [debian.org]. Packages for Ubuntu are already available [launchpad.net]. Debian packages for pre-release versions of PostgreSQL 9.1 have been around for some time already.

    • so do I, and for production server I only run packages, but we love Debian and have on our workstations/laptops too, so let's boogie!!

      lsb_release -a
      LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch
      Distributor ID: Debian
      Description: Debian GNU/Linux 6.0.2 (squeeze)
      Release: 6.0.2
      Codename: squeeze

      sudo apt-get install libreadline6-dev zlib1g-dev python-dev libperl-dev
      tar xvfz postgresql-9.1.0.tar.gz
      cd

Little known fact about Middle Earth: The Hobbits had a very sophisticated computer network! It was a Tolkien Ring...

Working...