Please create an account to participate in the Slashdot moderation system

 



Forgot your password?
typodupeerror
×
Databases Software Upgrades

PostgreSQL 9.5 Does UPSERT Right (thenewstack.io) 105

joabj writes: For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDMS have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported. Well, PostgreSQL 9.5, now generally available, finally offers a version of UPSERT and users may be glad the dev team took their time with it. Implementations of UPSERT on other database systems were "handled very badly," sometimes leading to unexpected error messages Momjian said. Turns out it is very difficult to implement on multi-user systems. "What is nice about our implementation is that it never generates an unexpected error. You can have multiple people doing this, and there is very little performance impact," Momjian said. Because it can work on multiple tables at once, it can even be used to merge one table into another.
This discussion has been archived. No new comments can be posted.

PostgreSQL 9.5 Does UPSERT Right

Comments Filter:
  • by Anonymous Coward

    Finally. That's all.

    • by Anonymous Coward on Monday January 11, 2016 @01:53AM (#51276309)

      PostgeSQL is one of those rare projects that delays a bit but makes sure that they plan their codebase ahead, and Get It Right The First Time.

      Nobody should ever whine about delays in projects that value correctness over first to market, especially in an open source project.

      • Re: Fucking finally. (Score:5, Informative)

        by Rei ( 128717 ) on Monday January 11, 2016 @04:40AM (#51276591) Homepage

        Postgres is leading edge in some things, like (ridiculously useful) table inheritance. Their implementation could have been slightly better (the most common complaint being that indices don't inherit, you need to re-add them on all descendants), but I'm very glad that they were early adopters on this one. They even pulled off multiple inheritance well.

    • Are you upsert that they took so long?

  • upserted, this morning.

    • by haruchai ( 17472 )

      Love the .sig, Oliver Reed's last role.
      R.I.P you crazy drunken Brit tough guy.

    • by KGIII ( 973947 )

      Heh, you're a good one to ask. I am not, and should not be confused for, a DB admin - in fact, I hate it. Oh, I've had to fight with them before and I suck at it. I've gone on about the "wizard" who did the job for us. I am forever grateful for his skills, to the point where is peculiarities did not bother.

      At any rate... Am I reading this summary properly? Is this summary saying that, prior to now, you could not update data that was in the database?

      For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDMS have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported.

      Err... You couldn't update an entry if a previous version a

      • Re:I'm somewhat (Score:5, Informative)

        by Lord Crc ( 151920 ) on Monday January 11, 2016 @05:34AM (#51276703)

        Is this summary saying that, prior to now, you could not update data that was in the database?

        No. In some cases you want to use a table like a key-value storage. If the key does not exist you need to insert a row with that key. Otherwise you'll want to keep only one row with that key, and just update the value.

        One can always do a "if key exists then update else insert", but the problem is that this is not atomic, because the "key exists" check is a separate statement from the insert or update statement. This can lead to issues if you have multiple connections accessing the same keys at the same time.

        The UPSERT allows you to do this as one atomic operation.

      • Re:I'm somewhat (Score:4, Informative)

        by vikingpower ( 768921 ) on Monday January 11, 2016 @05:41AM (#51276711) Homepage Journal

        Lord Crc answered correctly. UPSERT = (INSERT iff not exists... else UPDATE ). The article is correct in the sense that other databases, especially open source one, do not always handle this one correctly. It was not part of "traditional" SQL and is rather new. Object-oriented and graph databases do not have any problems with such operators, as they're explicitly written to deal with this use case. For relational databases like PostgreSQL this is a harder one to get right. Whether PostgreSQL now really got it right, can only be proven by protracted use "out in the wild".

      • The difference was that you had to know in advance whether it existed and use a different command in each case.

        I have to say that I've rarely found that to be a problem, but it's always nice to have options.

        • Agree. It's something seen in the "natural" evolution of (computer) languages all the time. When the language ages gracefully, such "finer" options become part of it.

        • by KGIII ( 973947 )

          Ah ha! Thanks. I get it now. Well, I think... Basically, this allows more an if/or? If it is X (and should be Y) then change it to Y and if it is Y already then leave it as Y. But in one more command without actually having to use a longer statement to get the same results?

          That makes sense. I can join, add, merge, and stuff like that. I can even (sort of) do it in C, PHP, and probably bang it out in Perl. However, I hate it. I know it may sound odd but, for whatever reason - and I hold a PhD in Applied Math

          • It's more like "If it's X, change it to Y. If it doesn't exist at all, create it with a Y."

            Without this, trying to modify a record that doesn't already exist will cause an error. //to do: dig at MySQL goes here.

            • by KGIII ( 973947 )

              Cool. Thanks again. I should find some sort of database system to play with and see how it goes. Maybe redo an SMF install onto PostgreSQL and then see what I can break/tweak/learn. 'Snot like I'll be breaking the whole internet, just a small piece and it'll be wiped clean in a day or two, after I'm done playing. I'd not want to leave my mess open for others to exploit and then abuse. I guess I could do it locally. All of my hardware down here, at this place, was out of date - so I ordered a few new boxes,

  • by hrumph ( 4411339 ) on Monday January 11, 2016 @03:04AM (#51276409)
    This is great. I've been using PosgreSQL for a while now. It's one of those pieces of software that just does what it's told and doesn't let you down. While I'm saying this there are credible rumours to the effect that the Oracle merge operation is broken. Read the comments to the most upvoted answer at this stack exchange question [stackoverflow.com]. The final comment is:
    Not reliably. I ended up with retry loops in the client code. :( – Randy Magruder Aug 27 '15 at 16:05
    This makes me think that Bruce Momijan may have been thinking about Oracle's implementation of merge when he said that other implementations were handled very badly.
    • by Anonymous Coward

      PostgreSQL is maintained by full-time employees. None of the "devs" are doing favors for anyone, they're doing their job, while the owners try to keep the project relevant.

    • Merge is not really the same as upsert. Merge (using the dual psuedo table) is a potential solution to the problem as are a number of options using pl/sql. I think the real truth here is that oracle haven't seen the need to implement an upsert command. Its not difficult to code and would be easy to build a solution without the need of a special command.
      • by Anonymous Coward on Monday January 11, 2016 @06:57AM (#51276847)

        It is not that trivial, if you have to take into account race conditions and roll-back without using a global lock which would kill the performance. There are actually quite a few research papers just about this problem.

  • What would be the equivalent long-hand in "traditional" SQL?

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Something like this :

      IF EXISTS (SELECT id FROM [table] WHERE id = :id)
      THEN
            UPDATE TABLE [table] .......... WHERE id = :id
      ELSE
            INSERT INTO [table] (.......) VALUES (...........)
      END IF;

      • Oops forgot to log in ^^
      • In postgres, this can be wrapped in a RULE so it's transparent when inserting.

      • by hrumph ( 4411339 )
        Your answer is naive and wrong. See this Stack Overflow page [stackoverflow.com]. The solutions on this page work but none of them have the elegance of the newly available INSERT ... ON DUPLICATE UPDATE method.
        • by halivar ( 535827 )

          The GGP asked for "traditional" SQL, for which the GP offered the correct answer. You offered another vendor-specific solution, not standard ANSI. The GP was in no way wrong. If he was doing retry loops, sure; that would be wrong.

          • In terms of "traditional" SQL, the GGP's answer is wrong because it fails to be atomic. The whole thing needs to be wrapped in a transaction.

            • by halivar ( 535827 )

              If someone has to be told to put a mult-statement SQL write operation in a transaction...

              I kind of think that transactions go without saying, even for UPSERT (according to the Wiki page [postgresql.org], UPSERT will "guarantee insert-or-update 'atomicity' for the simple cases", but leaves me questioning what a "simple case" is).

              • You still aren't getting it. Transactions like you describe are not equivalent and have Race Condition issues that need to be handled by the client code (and that usually isn't handled cleanly) whereas UPSERT makes the transaction truly atomic, not just on a per transaction basis, but even when multiple threads attempt to handle the same kind of transaction simultaneously.
          • Bullshit. This [slashdot.org] is the correct answer. Until you understand that there is no equivalent to UPSERT, you don't understand UPSERT.
        • by halivar ( 535827 )

          For that matter, I give him a gold star with bonus internets because he tested EXISTS instead of COUNT > 0.

        • by mark-t ( 151149 )
          INSERT ... ON DUPLICATE UPDATE will only trigger the update on what is otherwise an attempt to insert a record that will violate unique index or primary key constraints, while the above poster's solution will update any and all records that match the query, which is what UPSERT does when things match the query.
          • by hrumph ( 4411339 )
            A transaction commit happens after the logic of the transaction is processed, and two or more parallell threads can start off with the same DB snapshot when processing their transactions. Supposing that one thread commits first, then the logic that the other one used in its processing will be invalid upon its turn to commit. I don't see how an UPSERT implementation would fix this. The solution arrived at by PostgreSQL is basically perfect.
            • by mark-t ( 151149 )
              It will know that the processing was invalid because the second one tried to modify records would discover that another process had locked them. The update would fail, and it would be up to the user to either reissue or modify the query.
              • by hrumph ( 4411339 )
                This doesn't sound any better than the already available retry loop.....
                • by mark-t ( 151149 )
                  Any automated loop would be a Bad Thing(tm). The correct thing to on any failure is to report the failure, not to simply try again... because there is no way for the computer to know that trying again is even what the person would want to do in the event of such a condition.
              • by Tablizer ( 95088 )

                So far it looks like there is no full equivalent. By not being based on sequential steps, the "UPSERT" has advantages. But the jury is still out...

    • by Anonymous Coward

      The MERGE command. Although PostgreSQL claimed SQL compliance, this command was not understood. Mind you, the MERGE command in standard SQL is so overly complicated that all legibility is lost. This is probably why a lot of other databases have chosen other syntaxes that work better.

    • by Anonymous Coward

      "Traditional" ANSI SQL:2003 has had MERGE for more than a decade, which is a more verbose but significantly more powerful version of UPSERT since you can describe multiple potential actions based on whether the target record is found or based on comparisons between the source and target data. MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.

      It's sad that not only is PostgreSQL over a decade behind but they implemented the sloppy non-ANSI standard version.

      • Re:I'm curious (Score:4, Informative)

        by halivar ( 535827 ) <bfelger@gmai l . com> on Monday January 11, 2016 @11:51AM (#51278455)

        MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.

        In PostgresSQL, Oracle, and SQL Server, MERGE is not atomic. Also, it is not UPSERT. You can use MERGE to accomplish the same end goal, but they are not synonymous and they do not work the same. Also, the syntax is idiosyncratic, and most extant implementations are problematic enough that MERGE is best avoided.

    • "What would be the equivalent long-hand in "traditional" SQL?"

      It doesn't exist. That's the point. This isn't implemented to save typing. It is true atomic insert or update.

    • by Ed Avis ( 5917 )
      There is no exact equivalent in traditional SQL. As others have pointed out, you can check exists and then insert, but that introduces a race condition; wrapping it in an explicit transaction might help depending on the locking model, but might still introduce failures that have to be introduced by client code.

      That said, if you can make some assumptions about what else is writing to the table then you can get fairly close. One technique I often use is like this:

      -- Insert the row if none with the same P

  • Oracle has included the 'merge' command for several iterations now, which does the same thing. It is a rich command - those interested can check the documentation at https://docs.oracle.com/database/121/SQLRF/statements_9016.htm#SQLRF01606

news: gotcha

Working...