PostgreSQL v7.2 Final Release

vvizard writes "After almost a full year of development since PostgreSQL v7.1 was released, the PostgreSQL Global Development Group is proud to announce the availability of their latest development milestone ... PostgreSQL v7.2, another step forward for the project."
  • by thing12 ( 45050 ) on Thursday February 07, 2002 @12:17AM (#2965581) Homepage
    This is a huge step forward in making PostgreSQL ready for deployment in the enterprise. Eliminating the locking vacuum in favor of a separate statistics gathering process is clearly the best part of this release.

    The only major hurdle left is replication built into the server.

    • by thing12 ( 45050 ) on Thursday February 07, 2002 @12:22AM (#2965610) Homepage
      Eliminating the locking vacuum...

      I should probably clarify that - the full locking vacuum was separated out into two parts: one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did by reordering data blocks to shrink the size on disk.... The bonus is that where before you might run vacuum once a week or so, because of the impact a full lock on each table has on a production system, now you can run it much more frequently, as all it consumes is cpu time. Shrinking size on disk is nice, but it's the statistics that help the query planner turn SQL into faster queries.

      • by GooberToo ( 74388 ) on Thursday February 07, 2002 @12:42AM (#2965673)
        Always remember that table statistics are used as approximate best guesses for inputs to the query optimizer. It is not uncommon to see some types of queries actually run slower after table statistics have been updated. I've seen this on Oracle, Sybase, and SQL Server, and I doubt this issue is unique to those RDBMSs, since the conceptual implementations and underlying algorithms tend to all be more or less the same.
        • That is definitely a risk. It is often possible to harm overall system performance by upgrading an RDBMS that includes optimizer improvements. Any change to an optimizer will change execution plans. Hopefully most of them get better, but a few get worse, often dramatically worse. Finding the ones that get worse and tuning them is an important activity. Bad SQL plans are often the biggest-impact tuning target, so it is very important to understand what will happen to your specific application before you make changes that affect how your SQL statements are executed.

          This is one area in which Oracle shows its power over the open source databases. (It's also a big opportunity, because Oracle can be improved on.) Oracle can actually tap into continuous statistics gathering at a per-SQL level by using Oracle's v$sqlarea dictionary view. If you need high-powered scrutiny on a particular activity, you can trace the session to logs and see the row statistics at every step of the execution plan. Oracle has more optimizer hints, and has a facility to "pin" an execution plan so that it won't be reevaluated if optimizer behavior changes. Oracle is working toward server-side SQL tuning, where you can ID bad SQL and "intercept" it at runtime by adding hints on the server side. That will be an absolutely huge feature, since often SQL hits your system that you can't directly control but can predict.
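
          To make that concrete, here is a hedged sketch of the kind of per-statement digging v$sqlarea allows (the columns are standard entries in Oracle's data dictionary; the query itself is illustrative, not a canned tuning script):

          -- find the cached statements doing the most logical I/O per execution
          SELECT sql_text, executions, disk_reads, buffer_gets
          FROM v$sqlarea
          WHERE executions > 0
          ORDER BY buffer_gets / executions DESC;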
          • First, I'd like to say thanks for the information on Oracle. That's a very interesting route that I'd not considered before. Very interesting indeed.

            As for the optimizer issues associated with updated statistics, I guess I would like to expand on this slightly. Most people don't realize that many optimizers have hard and arbitrary thresholds for determining the best possible algorithmic data path. This is why it is very important to understand not only your data distribution but the nature of the delta of your data distribution. Also key is an understanding of how multiple indexes will affect the optimizer's output (especially if multiple index types are supported by the RDBMS in question). A common cause of negative query performance is a slight change in distribution statistics that hits an arbitrary threshold, causing another data path to be recommended by the optimizer. Thus the significance of Oracle being able to pin query plans can be profound.

            Because of these issues, several other DBAs and I do not advocate blindly updating statistics simply for the sake of doing so. First of all, if your data distribution tends not to vary, this can result in nothing more than a waste of time. Furthermore, if your data distribution tends to vary only slightly, you'd better understand these trends, as it's very possible that timely, updated statistics may work against you depending on where you were within your trend's cycle.

            While I knew about the pinned query plans on Oracle, I didn't know about the planned server tuning hints. Abilities (assuming I understood you correctly) like this can go a long way toward addressing the issues I've raised about query optimizers.

            In short, know your database AND your data!
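
            To see for yourself whether a statistics refresh changed a plan in PostgreSQL, capturing the plan before and after is enough. A minimal sketch (the table and column are invented for illustration):

            EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- plan under the old statistics
            ANALYZE orders;                                       -- refresh the planner statistics
            EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- did the plan or cost estimate change?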
      • by nconway ( 86640 ) on Thursday February 07, 2002 @12:55AM (#2965704)
        I should probably clarify that - the full locking vacuum was separated out into two parts: one which analyzes statistics and doesn't lock the tables, and another which does what the old vacuum did by reordering data blocks to shrink the size on disk....


        This is incorrect (I believe it describes the situation as of 7.1: the VACUUM ANALYZE command only needed to lock the table exclusively while VACUUMing; only a read lock was needed for the ANALYZE phase).



        In 7.2, the ANALYZE command can now be used separately, as you say. However, there are other (more important) improvements: ANALYZE only takes a look at a statistical sampling of the rows in the table. This means that collecting statistics on even enormous tables is very fast. Furthermore, VACUUM has been made "lazy" by default: this means that it doesn't attempt to reclaim space as aggressively as before, but it no longer requires an exclusive lock on the database (instead, it cooperates with other DB clients). The old behavior is available as "VACUUM FULL", and it is suggested whenever you need to reclaim a lot of disk space (e.g. you delete hundreds of thousands of rows of data and need the space back).

        you might run vacuum once a week or so


        It was (and is) suggested that you run VACUUM once per day.

        it's the statistics that help the query planner turn SQL into faster queries.


        As far as I know, you only really need to update your planner stats when you change the statistical distribution of your data. Of course, running ANALYZE's reasonably often won't do any harm, and is a relatively cheap operation (performance-wise).
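
        To put the 7.2 behavior in command form, here is a sketch of the maintenance options discussed above (not an official recipe):

        VACUUM;          -- lazy: reclaims space cooperatively, no exclusive lock
        ANALYZE;         -- samples rows and refreshes planner statistics; cheap even on large tables
        VACUUM ANALYZE;  -- both in one pass
        VACUUM FULL;     -- old-style: exclusive lock, shrinks the on-disk files; for bulk deletes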
        • by thing12 ( 45050 ) on Thursday February 07, 2002 @01:17AM (#2965779) Homepage
          I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.

          The 7.1 vacuum analyze required table locks. Doesn't matter which phase of it required locks - it required exclusive locks because it vacuumed. By breaking that into separate commands, the need for downtime is reduced drastically (down to the example which you point out - deleting thousands of rows at a time).

          I know that you're recommended to run vacuum once per day, but I found that on a large database running on a fast server a daily vacuum took nearly 30 minutes to complete... that's 30 minutes of sequentially locked tables. Can't afford to do that every day - moving it to once a week may have degraded performance but it reduced the downtime window from 30 minutes per day to 1 hour per week.

          I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.

          • by nconway ( 86640 ) on Thursday February 07, 2002 @01:30AM (#2965808)
            I'm not disagreeing with anything you said, in fact you all but reiterated everything I said.


            No I didn't, read my post again.

            The 7.1 vacuum analyze required table locks.


            PostgreSQL has lots of different types of locks of varying granularities. Saying "table locks" doesn't mean a whole lot.

            Doesn't matter which phase of it required locks


            It does though -- in 7.1, splitting vacuum and analyze internally reduced the time that an exclusive lock needs to be held.

            By breaking that into separate commands, the need for downtime is reduced drastically


            This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.

            I'm just happy that I don't have to bring a production server to its knees once a week (or for that matter once a day) just to do some table maintenance.


            On that, we agree ;-)
            • No I didn't, read my post again.

              I'll say that again: you reiterated everything I was thinking and thought I had implied in my post... though, true, not everything I actually said.

              This is where you're wrong. The reduction in downtime has nothing to do with allowing ANALYZE to be executed separately. It is entirely the result of the new vacuum code (which is "lazy", unlike a VACUUM FULL -- which does a 7.1-style VACUUM). In 7.2, running VACUUM (with or without ANALYZE) is fast, and doesn't require an exclusive lock -- so your database can continue serving clients while a VACUUM is executing. Whether you choose to run ANALYZE at the same time or separately is really irrelevant.

              Yes, I misspoke there and left out lots of detail - when I run a vacuum I always run it with analyze, since there's no way to analyze in 7.1 without vacuuming (at least that I'm aware of). Why would I want to run vacuum followed by vacuum analyze, only to have the tables exclusively locked twice? I want to do both - clean up unused rows and update stats - so I always run vacuum analyze. What's worse is that the 7.1 vacuum can deadlock with other processes because it uses exclusive locks.

              And yes, I know that in 7.1.x they changed the locking behavior of vacuum analyze so it did not hold an exclusive lock during the portion where it was analyzing the table. But that doesn't change the fact that it did hold an exclusive lock while vacuuming - which it had to do, because the commands were joined together.

              Effectively there were TWO good things that happened. Analyze was broken into a separate command - that alone would have been an improvement since it would have allowed for vacuums to be done distinctly from analyzes. Vacuum becoming lazy is a second good thing - and yes a very good thing it is. The short of all this is that now you can run vacuums and analyzes separately all day long if you want.

              I'm not trying to get the last word ;-) really! I think we both can agree that 7.2 is a big leap forward.

  • It's nice to see that they took a year, actually did a great deal of work making it better, and it's gone from 7.1 to 7.2. I always thought it was kinda lame when someone changes 3 characters in output and jumps from 1.3 to 3.0. Damn I need sleep.
  • These guys are here to kick ass and chew bubble gum -- but they're all out of gum.

    On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.

    Next step, master-master replication and clustering. Only wish I knew enough about either topic to help out. I'm doing lots of research on it though, maybe someday.
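
    (For the curious, the arithmetic behind that figure: 4,000,000,000 transactions spread over the 31,536,000 seconds in a year works out to just under 127 per second - hence the 126. Using the true 2^32 limit of 4,294,967,296 gives about 136 per second. Either way, a busy OLTP system could plausibly get there.)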
    • These guys are here to kick ass and chew bubble gum -- but they're all out of gum.

      "They Live". What a film.

      "We've got one who can see!"

      (Kind of on topic actually, people are starting to get through the hype associated with some things and adopt stuff that works, like PostgreSQL).

    • Re:Replication (Score:3, Informative)

      by Smoking ( 24594 )
      I've recently set up a master-master replication environment on Oracle 9i, and I did some research to check if it was possible with postgres.

      In fact there are many solutions available (check techdocs.postgresql.org [postgresql.org] for a list...)

      The most advanced guys on the subject seem to be at the Swiss Federal Institute of Technology (ETH) in Zürich. Here [inf.ethz.ch] is a list of their publications.
      They seem to have developed a replication scheme (Postgres-R) where they get better-than-linear performance improvement when they add new masters... Quite impressive.

      Quentin
    • On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.

      Jeez. I had no idea. If our MySQL server had this limit:

      Uptime: 10277498 Threads: 7 Questions: 3678405287 Slow queries: 1584 Opens: 2299222 Flush tables: 1 Open tables: 256 Queries per second avg: 357.909

      I'd be receiving a phone call in about a month from my client wondering why the site stopped working. ;)

      And just to make this look less like flamebait: I use and appreciate both MySQL and PostgreSQL. Do get a life, kids.

      • On a serious note, I greatly welcome the 4 billion transaction limit being raised. Sure, it's 126 per second over the period of a year -- but it's one step closer to being the database for a major system.
        Jeez. I had no idea. If our MySQL server had this limit:
        Uptime: 10277498 Threads: 7 Questions: 3678405287 Slow queries: 1584 Opens: 2299222 Flush tables: 1 Open tables: 256 Queries per second avg: 357.909
        I'd be receiving a phone call in about a month from my client wondering why the site stopped working. ;)

        As others have pointed out, there's a difference between "Queries" and "Transactions". Now, people have often facetiously accused MySQL people of not knowing what a Transaction is, but here we have some solid proof. :-)

        Seriously, the previous PostgreSQL limitation probably did have some serious ramifications for bigger installations, so it's a good thing it's history now. When I bring PostgreSQL up in polite public these days, Oracle jockeys now feel the need to critique it more seriously than the "don't make me laugh" attitude of even a couple of years ago. Not because PostgreSQL is that likely to sink a real high-end RDBMS, but it will be an increasingly plausible contender for users who don't really need or use all of the cool (but expensive) features of Oracle or DB2.

  • highlights... (Score:5, Informative)

    by bob@dB.org ( 89920 ) <bob@db.org> on Thursday February 07, 2002 @12:24AM (#2965618) Homepage
    from http://www.us.postgresql.org/news.html [postgresql.org]

    Highlights of this release are as follows:

    • VACUUM: Vacuuming no longer locks tables, thus allowing normal user access during the vacuum. A new "VACUUM FULL" command does old-style vacuum by locking the table and shrinking the on-disk copy of the table.
    • Transactions: There is no longer a problem with installations that exceed four billion transactions.
    • OIDs: OIDs are now optional. Users can now create tables without OIDs for cases where OID usage is excessive.
    • Optimizer: The system now computes histogram column statistics during "ANALYZE", allowing much better optimizer choices.
    • Security: A new MD5 encryption option allows more secure storage and transfer of passwords. A new Unix-domain socket authentication option is available on Linux and BSD systems.
    • Statistics: Administrators can use the new table access statistics module to get fine-grained information about table and index usage.
    • Internationalization: Program and library messages can now be displayed in several languages.

    .. with many many more bug fixes, enhancements and performance related changes ...
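
    As a small illustration of the OID change, the new table option is a one-liner (a sketch; the table is invented):

    CREATE TABLE log_entries (
        id      serial,
        message text
    ) WITHOUT OIDS;  -- skip the per-row OID on tables with huge row counts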

    • Re:highlights... (Score:5, Informative)

      by Zeut ( 24694 ) on Thursday February 07, 2002 @12:43AM (#2965675) Homepage
      One issue that is not mentioned in the release highlights is the marked improvement that is now available for SMP boxes. In some cases throughput has been increased by more than a factor of 2.
      • Someone please MOD this up!

        Thanks for the info. I didn't know that.

        What a karma whore... :P~
      • We run PostgreSQL on a dual-processor Linux box to feed our OpenBSD web servers. We got a HUGE speed gain from the OpenBSD -> Linux change (even when we ran it on a slower machine while testing it), and any SMP gains will be helpful.

        When we did OpenBSD we had to be VERY careful not to do more queries than necessary (including some complicated joins and then having PHP parse the results). With Linux as the database server, I feel that I can throw hardware at it (including moving to Solaris if need be) and optimize the queries a bit less to abstract the programming.

        SMP improvement is important, as the next step up for us is a Quad-Xeon processor, then Sun Hardware. (PostgreSQL seemed to run best on Linux and Solaris from the old website)...

        It's such a shame that they never figured out the PostgreSQL support model. I would have happily paid for some support, but it always seemed easier to get the OpenBSD port or the Redhat RPM than pay for their CDs. They never included much beyond installation support. I knew how to install it; having some support (not the mailing list) for some of my optimization questions would have saved days and been worth a support contract.

        Alex
    • There is also a new version of PostgreSQL AutoDoc v0.31 [www.zort.ca] available that works with the new PostgreSQL.
  • by Trepalium ( 109107 ) on Thursday February 07, 2002 @12:28AM (#2965633)
    I really wish there was an embeddable version of PostgreSQL... It's a very good database, but it's sometimes a real pain to write a program that ties together a SQL database with anything else, unless it's a local-use only program. I know MySQL added this feature in 4.x (but their transaction support is too new, IMO).
    • What is it you're having problems with?

      I've used ecpg (ESQL/C, similar to Oracle's Pro*C) since I started using postgresql years ago, and have had no problems at all accessing the data from C or C++ code.

      If you're using java then this is even more of a no-brainer - just use the JDBC binding.
    • I only have one real gripe about PostgreSQL - I hate the upgrade path.

      Having to dump the database to disk and re-import is a bad thing IMO. Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)

      A separate program to preserve LOBs I can rationalize (it's a lot of generally unneeded overhead since few people use LOBs).

      It would really be nice if someone would write some wrapper programs to check for foreign keys and LOBs, then wrap the pg_dumpall & pg_dump commands with the appropriate options into one set of programs.
      • I'm a postgres developer and I really have no idea what you mean here!

        Postgres always keeps its integrity constraints, including when you dump and restore. It's done this, as far as I am aware, since at least 7.0.

        LOBs are no longer a problem, since 7.1 supported unlimited row length with binary or ascii data - just use 'bytea' or 'text' fields...

        Chris
        Having to add a switch to keep integrity constraints is a very odd thing for a database (shouldn't the default be to *keep* integrity constraints?)

        If you are certain that there are no constraint violations, because you've just exported from one database, then disabling them makes importing into a new database much faster. It's common practice; you just re-enable them before you make the new database available for transactions.
      • It sure would be nice to be able to do major hackery with the data formats, and have an easy in-place upgrade; unfortunately, this comes with two big costs:
        • It takes a lot of work to do this, which will seem pointless to many when "dump/re-import" is already there and works;
        • More importantly, debugging this, across umpteen platforms, possible variations on how folks compiled it, and moves between minor version numbers that may differ more or less than you'd expect is really a lot of work.

        The unavailability of a "Ext2-to-ReiserFS" translator, or Ext2-to-JFS or Ext2-to-XFS occurs for much the same reason, albeit with the further challenge that the "upgrade" would somewhat more resemble a "MySQL-format-to-PostgreSQL-format" conversion :-).

    • My only feature request would be replication facilities.
  • Guys, let's keep the signal/noise ratio high for once?
  • Will the human genome project be upgrading to this?

    I like it when people use linux on non-x86 architectures.
  • I *just* finished my FreeBSD 4.5 upgrade on my database server last weekend. It was a big effort, as I've got multiple databases with tables ranging from a few rows to 7 million, with referential integrity, averaging at least 1 insert every ten seconds, every day.

    Now I've gotta do it again!
  • by Sivar ( 316343 ) <charlesnburns[ AT ]gmail DOT com> on Thursday February 07, 2002 @01:01AM (#2965722)
    PostgreSQL is an ACID compliant database. MySQL is not (unless that has changed recently--if so please let me know).
    ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a 'keyword' that business professionals generally look for when evaluating databases. Frankly, non-ACID databases aren't taken very seriously, even if they are used by the likes of Yahoo and Slashdot (like MySQL is).
    Here is a quick description of what it means to be ACID [webtechniques.com] compliant:
    1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.

    2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.

    3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.

    4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for committed transactions whose changes had not yet been written to the data files, and applies them.


    It is difficult to trust mission critical data to a database that does not guarantee that a transaction will either complete or leave the data untouched (short of a bug, of course), thus such compliance -- even when it is more political than technical -- is very important.
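
    To make the atomicity example above concrete, here is a minimal sketch of the bank transfer as a single transaction (table and column names invented):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- debit one account
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit the other
    COMMIT;  -- either both updates become visible, or (on any failure) neither does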
    • by Anonymous Coward
      All true. In fact, MySQL is neither relational nor a DBMS by any stretch of the imagination, if you really understand what these terms mean.

      People who try to argue otherwise (including the developers of MySQL) need to go back to school to relearn their math and CS fundamentals. There's a bit of discussion of this at http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci788645,00.html?FromTaxonomy=%2Fpr%2F284872
    • Here's an example of why an ACID database is useful that hits close to all of our hearts - Slashdot moderation:

      You may have noticed that if several people try to whack a troll at the same time, they all expend one moderator point, even if only a fraction of those points were required to push that troll down into the dreaded depths of -1.

      If an ACID compliant database were used, and the two steps of whacking the troll and deducting the moderator points were placed in the same transaction (with a check constraint on the score of the posts to prevent them from dropping below -1), then the later moderators who tried to whack the troll would not have their points deducted, as the transaction would roll back when the constraint on the score of the post was exceeded.

      Alas, mysql is not ACID compliant, and so this senseless waste of moderator points continues to this day...
      • Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)

        You can use transactions or locking with MySQL to keep consistency on this particular issue as well; however, under either database, the use of a transaction to record a moderation is fairly frivolous and probably more of a waste of CPU time than a moderation point -- which is why slash doesn't implement it.

        For clarification, you can't use transactions on MyISAM tables [yet] but it's not like transactions in MySQL don't exist at all. Just use BDB, InnoDB, or Gemini table types and you get transaction support.

        Anyway, I don't care which DB you prefer or use or promote or whatever. I have run into programming problems with both Postgres and MySQL alike.

        So, even though you are quite incorrect (which you probably already knew), at least you got me to respond to your troll. :)

        Let the whacking commence!

        ~GoRK
        • by Pathwalker ( 103 ) <hotgrits@yourpants.net> on Thursday February 07, 2002 @03:11AM (#2965995) Homepage Journal
          You Said:
          Actually, this is not the fault of the database. If you mod a post that is already +5 up or mod a -1 post down you will lose a mod point and the score will go unchanged (the moderation total values will increase though)

          I Reply:
          I have a hard time believing that this behavior started out as a feature. I find it much more likely that it was initially a bug. This bug, being found useful, was then elevated to the status of a "feature".
          You are correct that it is not the fault of the database, but transaction and constraint support at the database level would have made it easy to prevent this problem from ever cropping up in the first place.

          You Said:
          the use of a transaction to record a moderation is fairly frivilous and probably more of a waste of CPU time than moderation point

          I Reply:
          For a system which receives as much activity as Slashdot, and with a constant stream of friendly trolls looking for any crack in the system that they can use to share the sight of their favorite gaping asshole with the unwilling members of the rest of the population, if I were coding it, I would insist on inserting checks of basic constraints at different levels of the system. The database layer is your last line of defense against abuse of the system.

          Secondly, these double checks are useful for finding errors in other levels of the system. Remember the problems that used to crop up from time to time with comments being moderated to -2 or to 6? If the value of the moderations was constrained in the database, not only would users not see this problem, but an error log generated (for the admins only) when a transaction is rolled back where it is not expected would have helped isolate the fault very quickly.

          The database level checks would also help against rogue activity of people in positions of (limited) trust. Worried about an editor editing one of their accounts to give themselves a huge number of modpoints? Cap the level in the database at 5; it would make it impossible for this nefarious subterfuge to take place.

          As for the speed issue; if you are willing to sacrifice verification of correct operation for a small increase in speed, you have severely underspecified your hardware requirements.


          Finally, I would like to include a small SQL fragment, showing some of the checks that I would feel are absolutely necessary for a web based discussion system that people are trying to subvert:

          --First we create a table for a couple of users
          CREATE TABLE "users" (
          "uid" integer serial,
          "mod_points" integer default 0,
          "name" text not null,
          CONSTRAINT "user_mod_const" CHECK (((mod_points > -1) AND (mod_points < 6)))
          );

          --now a table for some posts
          CREATE TABLE "posts" (
          "date" timestamp with time zone DEFAULT 'now()',
          "pid" serial,
          "parent" int4
          "uid" int4,
          "mod" integer DEFAULT 1,
          "body" text not null,
          "section" integer,
          CONSTRAINT "mod_const" CHECK (((mod > -2) AND (mod < 6))),
          CONSTRAINT "user_key" FOREIGN KEY (uid) REFERENCES users(uid)
          on delete cascade
          on update cascade
          );
          -- the constraint to ensure parent is equal to zero, or another pid in the posts table is left to the reader.

          --And now for a function to access them. (Remember - direct SQL is icky; run things through functions to ensure a consistent interface.)
          --A function body runs inside the calling transaction, so no explicit begin/commit is allowed here;
          --if the mod_const or user_mod_const check fails, the whole transaction rolls back.
          CREATE FUNCTION "mod_down" (integer,integer) RETURNS integer AS '
          update users set mod_points = mod_points - 1 where uid = $1;
          update posts set mod = mod - 1 where pid = $2;
          select mod from posts where pid = $2;
          ' LANGUAGE 'sql';


          As you can see, this nicely serves as a check to ensure the restrictions I mentioned above. With it being so trivial to add the checks, I can't see any reason to not take this extra step to eliminate nasty surprises.
      • MySQL doesn't have begin/commit, but I believe individual updates are atomic. You could do an update like this:

        update posts set mod=mod-1 where pid=($2) and mod > -1; -- strictly greater, so a post already at -1 stays put

        Then look at the row count and only deduct a moderator point if there's at least one row updated.

        I agree that begin/commit is important; it's just that you can do without it in this instance so long as an individual update is atomic.

    • I find it a bit odd to claim non-ACID databases aren't taken seriously while in the same sentence mentioning that Yahoo uses a non-ACID database system. Obviously somebody there took it seriously enough to roll out...
      • Re:while that's true (Score:2, Interesting)

        by Sivar ( 316343 )
        Fair enough.

        However, Yahoo is not a typical business. As the Yahoo article [mysql.com] says, "Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs..."
        Homebrew flat files will be whooped by any decent DB server, especially a blazing fast one like MySQL, any day of the week. Yahoo also tends to embrace open source and new technologies more readily. Most of their servers, for example, run FreeBSD and have since the beginning.
        This is a slightly bad example considering that FreeBSD is unquestionably a product that you can trust your mission critical data to, but it illustrates to a degree the type of company that Yahoo is.

        I, myself, would trust MySQL with anything that didn't matter to someone else. In the end--it's politics, and "nobody ever got fired for using an ACID compliant database." ;-)
    • MySQL is ACID compliant if you use the InnoDB table type. InnoDB comes with the most recent version of MySQL. It supports transactions and rollback, row level locking, and referential integrity. Slashdot uses InnoDB.
    • MySQL is most definitely capable of supporting ACID functionality.

      One of the nice features of MySQL is the capability of having pluggable persistence managers. An example of that is the default, MyISAM, which you are correct in saying does not support ACID. But with the release of MySQL-Max [mysql.com], which happened awhileago(tm), and MySQL v4 out of the box, support for 3 additional backends was added: BerkeleyDB [sleepycat.com], Gemini, and InnoDB [innodb.com], all of which have complete ACID support. InnoDB also supports row-level locking and even an initial implementation of foreign keys.

      InnoDB is in use here at Slashdot, as well as at a good number of other sites [innodb.com] demanding high transaction throughput with full ACID support.

      With the addition of foreign key and stored procedure functionality, both of which are on the slate for the 4.x series, the reasons not to use MySQL are lessening every day.

      Side note: Yeah, I know Gemini is the red-haired stepchild of the MySQL world. It's still a decent table manager.
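
      For reference, picking a persistence manager is done per table (a sketch; the table is invented, and note that later MySQL versions renamed TYPE= to ENGINE=):

      CREATE TABLE payments (
          id     INT NOT NULL PRIMARY KEY,
          amount DECIMAL(10,2)
      ) TYPE=InnoDB;                      -- transactional, row-level locking

      ALTER TABLE old_table TYPE=InnoDB;  -- convert an existing MyISAM table in place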

      • Yes, you can move to InnoDB tables, but your performance goes South quickly, thus taking away one of the few reasons to use MySQL.

        One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages :-( ). Anyway, I, in my simple logic, thought

        1. The message tables are constantly subject to INSERT, SELECT, and DELETE queries.

        2. MyISAM tables require a full lock for every INSERT or DELETE

        3. InnoDB tables have row-level locking

        4. Thus, I will install MySQL-Max, and use InnoDB tables for the messaging data, thus relieving PHP of the continuous connect...wait...reconnect scenario

        Well, I read every note about configuring MySQL for performance, and I made the change to InnoDB, and promptly every single one of the 30 or 40 MySQL processes began taking up at least 60 MB RAM! The system slowed to a standstill, and I had to roll back to regular MyISAM tables 'real quick'. (the users on the system take about 10 seconds to start pestering the admin with trouble reports)

        I know the real answer is to use shared memory or some such, but anyway, the performance benchmarks always show PostgreSQL ruling when you have mixed INSERTS and DELETES.
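
        For anyone repeating the experiment, the usual knobs are the InnoDB settings in my.cnf. A sketch (the option names are real InnoDB parameters; the sizes are purely illustrative): note that the buffer pool is allocated once and shared, and older Linux ps implementations tend to charge shared memory to every thread, which can make per-process usage look far worse than it is.

        [mysqld]
        innodb_buffer_pool_size = 64M         # shared cache, allocated once, not per connection
        innodb_additional_mem_pool_size = 8M  # internal data dictionary structures
        innodb_log_buffer_size = 4M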
        • One of my clients had a Linux Apache/PHP/MySQL box that was being heavily pounded by 250+ web-based instant messaging users. (I know... using a database to handle instant messages :-( ).

          What's the real problem using a DB for instant messaging? What do you propose as an alternative?
        • Sounds like your tuning was out of whack. You're not very specific about what you tuned (heck, if everyone was specific what fun would Slashdot be?) but I'd bet you had 2 wires crossed somewhere. We're running a system with what would appear to be a larger number of inserts and retrievals, for stock quotes, and have had completely different results from your experience. We're very happy with the performance, and find that in a mixed insert/select situation InnoDB is quite a bit faster than MyISAM.

          After your bad experience did you send an email to Mr. InnoDB, Heikki Tuuri [innodb.com]? He's quite attentive and I'm sure would like to hear what happened with your application.

          Not to mention when you're playing with databases or other vaguely unknown programmatic quantities, best not to test them in production. Perhaps if you had run the system in a test environment the lessened pressure would've enabled you to be better equipped to resolve whatever situation was occurring.
    • It occurs to me that an anti-MySQL rant is not terribly ontopic. I won't say "Karma Whoring", but that might be the POV of the moderators that gave you three Overrateds and a Troll. Or maybe the Slashdot editors are just peeved at your implied criticism of their design choices...

      Anyway, you do have a good point and you've neatly summarized one requirement of a "real" database. I would just caution not to base all comparisons between two DBMSs on ACID support -- especially if one of them is MySQL. After all, there are a lot of specific things that distinguish a serious DBMS from a toy.

      Not that MySQL has a lot of them. The only one that comes to mind is the SQL interpreter. A lot more is missing, and a lot of it is stuff most programmers will miss a lot more than transactional integrity.

      The biggest omission, to my mind, is support for relational queries.

      "BZZZT!" I hear you saying. "What do you mean MySQL doesn't support relational queries? You can do a join with select/where! And..."

      Well, you can specify a join in MySQL. (Just as you can if your "database" is a simple collection of text files [eu.org].) But joins aren't very useful if you can't do nontrivial things with them. And you can't, because MySQL takes a terrible performance hit if your query involves more than one index. That's why Slashdot now numbers posts sequentially [slashdot.org]. Discouraging the first-posters was merely a side effect.

  • by murphj ( 321112 ) on Thursday February 07, 2002 @01:01AM (#2965723) Homepage
    I didn't notice anything about online backups, point-in-time recovery, or standby databases. Is any of this possible on PostgreSQL yet? How about clustering/parallel server? These seem like important features for becoming an Oracle/SQL Server replacement.
    • by pthisis ( 27352 ) on Thursday February 07, 2002 @02:18AM (#2965903) Homepage Journal
      I didn't notice anything about online backups, point-in-time recovery, or standby database

      Online backups: yes, for quite some time

      point-in-time recovery: postgres uses WAL undo/redo logging, but I'm not sure what the state of rollback tools is at the moment.

      Standby database: Assuming you mean Master/Slave replication, this is one of the major features planned for 7.3; 7.x has added a lot of the infrastructure needed for replication, and by 7.4 they hope to have multimaster replication (ie a fully distributed database).
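
      On the online-backup point: pg_dump reads from a single consistent snapshot, so it is safe to run against a live database. A minimal sketch (database names invented):

      pg_dump mydb > mydb-backup.sql   # consistent snapshot, no downtime for clients
      createdb mydb_restored           # later: restore into a fresh database
      psql -f mydb-backup.sql mydb_restored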


      Sumner
      • Point in time recovery is not yet available AFAIK. This is one critical feature that distinguishes commercial databases from Postgresql.


        My company is using Postgresql in a traditional OLTP context, and we would very much like to have the point in time recovery. At some point it could make the difference between us continuing in the open source mode or switching to (ugh) Oracle!


        I do understand that PIT recovery is being worked on. I wish I had time to help!

    • MySQL has master-slave replication. In fact, you can make trees of replication (good for high transaction volumes with massive redundancy) or even daisy-chain replication into a ring (giving you master-....-master).

      It's too bad that it takes about 2 years to update the public perception when a product transitions from toy to tool.
  • by jadavis ( 473492 ) on Thursday February 07, 2002 @01:11AM (#2965763)
    I have been a great fan of PostgreSQL since the 6.5.3 days. It has really come a long way, and it is very nice to see recognition on slashdot. I didn't notice any of the previous releases on slashdot, and I wasn't aware the editors paid it too much attention.

    I think, above everything else, I appreciate the development process of PostgreSQL. I have read the mailing lists for a long time now, and I have learned a lot from the comments of the developers. The "hackers" list sometimes gets well above my head, but most of the issues are very interesting to read, and are applicable in many areas of logic. I commend all the developers for their hard work as well as their great talent.

    Jeff Davis
  • I've been using mysql for close to three years now; maybe around 6 months back I tried out postgresql, and I switched a site or two over to using it. I did some lame, inaccurate benchmarking of a "typical page": pgsql was twice as slow at the time using v7.1, untweaked, but mysql was also untweaked.

    Anyways, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, ok, there's like how many different ways to authenticate a user: plain text password, crypted password, now md5, ident, local ident, kerberos, etc etc. Seriously, what's the "preferred" way to add a normal, non-superuser account that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible? (yes I know you have to give it the -i option for network access)

    I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no tcp/ip. You have to add all these weird rules to a config file to enable these mysterious "accounts with passwords". The documentation inside the actual config files kills me, btw.

    I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide. I know, rtfm, and figure it out - trust me I have - but there's a boatload of postgresql docs, split over 10 different manuals it seems.

    Also, it would be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc. pg_ctl doesn't exactly cut it. I wrote one myself that's basically a wrapper for pg_ctl, but it's a major hack; I can clean it up and post it if anyone's interested.

    Overall a good db. It definitely takes way longer than mysql to set up and understand, but the added features make it worthwhile. Even if you don't use the triggers, out-of-the-box transactions, and subquery support right away, you'll be glad when you do finally want to use them to help you out with data integrity. Sorry, I'm rambling, I have no real point; this is more of a "this is my experience with this thing" post. Maybe it will be of some use to someone, and hopefully I'll get an intelligent post or two setting me straight. :)
    • by Moosbert ( 33122 ) on Thursday February 07, 2002 @01:37AM (#2965823)
      Anyways, let me tell you, pgsql's user permissions still make my head swim; it's a nightmare. I mean, ok, there's like how many different ways to authenticate a user: plain text password, crypted password, now md5, ident, local ident, kerberos, etc etc.


      Options are sometimes considered to be a good thing.


      Seriously, what's the "preferred" way to add a normal, non-superuser account that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible?


      Add something like this to your pg_hba.conf:


      local sameuser trust

      host sameuser 127.0.0.1 255.255.255.255 trust


      I guess another kind of oddity about pgsql is that out of the box, it only does ident-type local socket authentication, no tcp/ip.


      We like the default setup to be reasonably secure.


      I've looked forever, but I've yet to find a "mysql to postgresql" quick start guide.


      try here [postgresql.org]


      Also, it would be darn nice to include a start/stop script that reads only config files and can be linked from /etc/rc2.d/ etc.


      It's in contrib/start-scripts. Or you might as well download the RPMs.

        Seriously, what's the "preferred" way to add a normal, non-superuser account that only has select, insert, update, and delete access to a given database and can connect from the local machine and remotely? Is this even possible?

        Add something like this to your pg_hba.conf:

        local sameuser trust
        host sameuser 127.0.0.1 255.255.255.255 trust

        That's not authentication! "trust" just allows logins, period. Try "psql -U postgres" as anyone on that machine. You'll instantly be logged in as the superuser.

        Something like this works fairly well on Postgresql 7.1:

        host all 127.0.0.1 255.255.255.255 ident sameuser
        host all 0.0.0.0 0.0.0.0 password

        Then enable TCP/IP connections ("tcpip_socket = true" in postgresql.conf)

        Very important: make sure your ident server is trustworthy. Many ident servers have an option to allow a user to fake identification. Turn it off.

        Also, the config I posted there will let any user connect to any database. That's the simplest, but not the most secure. The "sameuser" in the database field won't be enough to let the superuser connect to databases. You might add a separate line for that with an ident map containing only postgres (the file would have only the words "postgres postgres" in it, on one line). And then "all" in the database field with that map. I.e., "host all 127.0.0.1 255.255.255.255 ident postgres"

        For remote connections, just make sure they have a password in the database:

        create user slamb with password '12345';
        alter user bob with password 'newpassword';

        There's no authentication method here specified for UNIX domain sockets, so they just don't work. You'll need to set the PGHOST="localhost" environment variable for stuff to authenticate correctly. I did this because pgsql 7.1 did not support ident on UNIX domain sockets. pgsql 7.2 now does, on certain platforms. (Just replace "host <db> <ip> <netmask> ident <map>" with "local <db> ident <map>")

        pgsql 7.2 adds pam support. If your UNIX and PostgreSQL usernames correspond, it should work.

        pgsql 7.2 also adds support for encrypted passwords. There's an option for storing passwords encrypted in the database and an option for challenge-based encryption. I think these methods are incompatible - good challenge-based encryption requires the password be stored in plaintext on the server.

        There has been Kerberos auth for some time. I'm trying to switch over to this now, as I'm setting up Kerberos on my network. It's a more complicated system to set up correctly, though. Get something else working first.

        Official docs are here [postgresql.org]

  • by flacco ( 324089 ) on Thursday February 07, 2002 @01:59AM (#2965862)
    I'm sure I'm not the only slash-dotter who was on the verge of exceeding the 4 billion transaction limit on their pgsql-based Anime fan fiction submission website.
  • Congratulations to the pgsql developers on the new release. It looks like the only showstopper now for heavy duty use is lack of replication. I know that there are developers working on this. Does anyone know what sort of progress is being made?
  • PostgreSQL Books (Score:3, Informative)

    by LarryRiedel ( 141315 ) on Thursday February 07, 2002 @03:35AM (#2966005)

    There are a few decent books about PostgreSQL out there now. It is so much nicer than a few years ago.

    Practical PostgreSQL. [amazon.com] I think this one just came out as a bound book. I just got it a couple days ago and it is pretty good. It is also online. [commandprompt.com]

    Postgresql : Developer's Handbook. [amazon.com] I (as a developer) like this one best of all that are out now.

    PostgreSQL Essential Reference. [amazon.com] This one is pretty good, but I would not say it is essential. :-)

    Beginning Databases with PostgreSQL. [amazon.com] This is one of those Wrox books which is about 10000 pages, including 80% of what I want to know and 2000% of what I don't.

    There are others, but I think they are weaker. I was disappointed with the one just called PostgreSQL. [amazon.com]

    • PostgreSQL also comes with some great documentation (probably the best I've yet seen with any software I've used; the only contender I can think of offhand might be the Devpac assembler on the Atari ST many years ago). This is also of course online: e.g., the 7.1 version is up at http://www.ca.postgresql.org/users-lounge/docs/7.1/postgres/ , or interactively at http://www.postgresql.org/idocs/

    So how do we encourage other projects to develop such documentation, which I assume is very time consuming and difficult, as well as being done for very little reward? (How often do we see posts appreciating new software features compared to those acknowledging the associated documentation?)
  • We have a live system running on postgres 7.1. Now what about upgrading?

    When should we do it? Should we wait a couple of weeks for any possible patch/point release, or is this one granite-rock-solid?

    Why should we do it? The current DB is working just fine. Are there any compelling new features?

    How do we do it? Are the database files binary compatible? Do we need to export and reimport data? Are the RPMs of postgres 7.2 out?
    • There are some potential issues - the formatting and behaviour of some outputs and functions have been changed slightly.

      Your applications might break and need modification.

      So test it out first - non live.

      Even if there are no known issues you should test it out first - you might unluckily be the first one to find out, you know :).
    • I've been using a 7.2 beta on a production website for a while now. I know it's not generally considered kosher to run a beta of a new release on a production website, but it isn't exactly a high-traffic site, and it's been running without a hitch since I started it up a month ago. I'd estimate it does a few thousand transactions a day or so. (Not true begin-commit-rollback transactions, just updates/inserts and selects.)

      A few good features...

      - vacuum doesn't need to lock things up all the time, which is definitely nice.

      - the speed increase is noticeable, especially on a dual P3 SMP.

      - there have been a few changes to a number of the datatypes, like the increased resolution on timestamps (now recorded with milliseconds) and changes to the char and varchar types (they now reject strings longer than their limits -- i.e. a varchar(30) raises an error on 31 characters instead of silently cutting off the excess; see the sketch below).

      - you can supply passwords via md5 hashes rather than plaintext or using crypt, which is definitely good, especially for connections made over the wire. You can also supply the passwords via UNIX domain sockets now, which for some reason you couldn't before. (You had to start postmaster with -i and connect via an INET port for that sort of authentication. Ew.)
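
      To illustrate the varchar change (a sketch; the exact error wording may differ):

      CREATE TABLE t (name varchar(5));
      INSERT INTO t VALUES ('abcdef');
      -- 7.1: quietly stored 'abcde'
      -- 7.2: ERROR: value too long for type character varying(5)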

      The databases are not binary compatible, so upgrading from 7.1 to 7.2 will require a pg_dump-from-7.1, initdb, restore-into-7.2 dance. Life is rough.
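
      Spelled out, that dance looks roughly like this (a sketch; paths and exact steps vary by platform and packaging):

      pg_dumpall > backup.sql                # with the 7.1 binaries, against the running server
      pg_ctl stop                            # shut down 7.1
      mv ~postgres/data ~postgres/data.7.1   # keep the old cluster around, just in case
      initdb -D ~postgres/data               # fresh 7.2 cluster (7.2 binaries installed first)
      pg_ctl start
      psql -f backup.sql template1           # reload users, databases, and data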

      I've been working with it quite a bit and I haven't had any real problems throughout beta testing and the RCs. The "production" site will probably be upgraded later today, or tomorrow.

      I have no idea about RPMs, since I always compile from source, but check out freshrpms or rpmfind.net or something.

      J
  • by pointwood ( 14018 ) <jramskov@ g m a i l . com> on Thursday February 07, 2002 @05:13AM (#2966250) Homepage

    Yes, I know it should be avoided, but I'll ask anyway since sometimes you have no other choice :(

    I'm currently using MySQL on a Win2k server and it actually runs pretty okay and is very easy to install. What about PostgreSQL? Last I looked at it, it was a lot more difficult to get running on Windows. Has that changed, or is PostgreSQL still more or less *nix only?

    • If what you're using works, why would you want to change it? And yes, pgsql works under windows using Cygwin.
      • by sc00ch ( 254070 )
        Maybe he wants to learn about pgsql and alternatives to mysql? That would be healthy!

        I myself, with no experience of cygwin, found it a pain in the ass to get postgres working on windows. There was no quick and easy howto on the postgres site at the time, and I'm still unsure if a quick step-by-step guide exists today.

        I'm sure this puts a lot of people off from initially trying out postgres. Mysql makes it a lot easier to check out on windows: simply run the installation program and it's up and running.
    • I haven't tested it under load, but I've installed and successfully tested PostgreSQL 7.1 under Cygwin [redhat.com] on Windows 2000.
  • One feature I find really useful (I do lots of web programming) in MySQL that I couldn't find an equivalent of in Postgres is FULLTEXT indexes... The only thing I could find was a hack in the contrib repository...

    Is there any plan to add this or some equivalent? Is it already there and I just missed it? (it's one of the reasons I haven't done Postgres versions of my open source releases)
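
    For reference, the MySQL feature in question looks roughly like this (a sketch; FULLTEXT indexes require MyISAM tables):

    CREATE TABLE articles (
        body TEXT,
        FULLTEXT (body)
    ) TYPE=MyISAM;

    SELECT * FROM articles
    WHERE MATCH (body) AGAINST ('postgresql fulltext');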

  • Wish List (Score:3, Interesting)

    by bwt ( 68845 ) on Thursday February 07, 2002 @01:10PM (#2968293)
    Here's my Postgres wish list:

    1. Point in time recovery
    2. Reconstruct SQL from write ahead log
    3. Function based indexes with SQL rewrite
    4. Materialized views with SQL rewrite
    5. Analogue to Oracle's v$sqlarea
    6. Wait statistics
    7. Tablespaces
    8. Inline views (from clause subselects)
    9. Parallel query capability
    10. Partitioned tables
    11. Bitmap indexes
    12. IO monitoring (read/write per object)
    13. Dynamic sort and hash area allocation
    14. Detailed SQL tracing (rows per plan step)
    15. Multiplexed WAL writes
    16. SQL optimizer hints
  • Things on my wishlist/irklist are:

    Alter table improvements. Changing schemas now with triggers etc is a nightmare. How are others doing this?

    I can't figure out how to avoid "" around "every" "single" "identifier", which is annoying.
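
    (A sketch of why: unquoted identifiers are folded to lower case, so quotes are only needed when objects were created with mixed-case names in the first place.)

    CREATE TABLE "UserAccounts" (id int);  -- mixed case preserved: quotes required forever after
    CREATE TABLE user_accounts (id int);   -- all lower case: quotes never needed
    SELECT * FROM USER_ACCOUNTS;           -- folded to user_accounts, works unquoted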

    Continued increased speed and performance. We can do database-backed sessions in MySQL damn fast, and it has been tricky to get postgresql to perform as well.

    Reclaim space without forcing table locks. The fact is, I'm always amazed folks can even lock their tables for an hour. How do you run 24x7 systems with that? The new VAC is a great step.

    Improve the website. It is ugly and hard to navigate. Just copy php.net/mysql.com or some other reasonable developer portal.
