PostgreSQL 9.5 Does UPSERT Right (thenewstack.io) 105

Posted by timothy on Monday January 11, 2016 @02:21AM from the oh-that's-upsert dept.

joabj writes: For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDBMSes have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported. Well, PostgreSQL 9.5, now generally available, finally offers a version of UPSERT, and users may be glad the dev team took their time with it. Implementations of UPSERT on other database systems were "handled very badly," sometimes leading to unexpected error messages, Momjian said. Turns out it is very difficult to implement on multi-user systems. "What is nice about our implementation is that it never generates an unexpected error. You can have multiple people doing this, and there is very little performance impact," Momjian said. Because it can work on multiple tables at once, it can even be used to merge one table into another.
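
The table-merge use mentioned at the end of the summary can be sketched as follows. This is only an illustration under stated assumptions: the `inventory` and `staging` tables and the `merge_staging` helper are hypothetical, and the code runs against Python's bundled SQLite (3.24 or later), which adopted PostgreSQL's `ON CONFLICT` syntax, so that it works without a server. The `WHERE true` is there only because SQLite needs it to parse `INSERT ... SELECT ... ON CONFLICT` unambiguously.

```python
import sqlite3


def merge_staging(conn):
    """Fold every row of `staging` into `inventory` with one statement."""
    conn.execute(
        """
        INSERT INTO inventory (sku, qty)
        SELECT sku, qty FROM staging WHERE true
        ON CONFLICT (sku) DO UPDATE SET qty = excluded.qty
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.execute("CREATE TABLE staging (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("bolt", 10), ("nut", 5)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("nut", 7), ("washer", 3)])

merge_staging(conn)

# 'nut' existed and is updated, 'washer' is new and is inserted, 'bolt' is untouched.
rows = conn.execute("SELECT sku, qty FROM inventory ORDER BY sku").fetchall()
print(rows)
```

Rows whose key already exists in the target take the update branch and the rest are inserted; `excluded` names the row that the insert proposed.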

This discussion has been archived. No new comments can be posted.

- Re: Too late (Score:5, Funny)
  
  by Anonymous Coward writes: on Monday January 11, 2016 @02:51AM (#51276301)
  
  Oh. How disappointing. I'll let the team know to remove the code that does this then seeing as you already have a solution.
  
- Re: (Score:3)
  
  by TheRaven64 ( 641858 ) writes:
  
  I hope that you're doing the check then insert inside a transaction.
  - - Re: (Score:2)
      
      by Behrooz Amoozad ( 2831361 ) writes:
      
      This.
      Try to solve a complex one in code without user interaction, and getting a deadlock is only the good part.
    - Re:Too late (Score:4, Informative)
      
      by MagicMerlin ( 576324 ) writes: on Monday January 11, 2016 @10:40AM (#51277433)
      
      That's how to do it. It has to be managed in a loop, either on the transaction side or in the application. Most programmers find this loop remarkably difficult to get right. For example, if a trigger added to the inserted table after the fact causes an error in some other table, most loop-based upsert code I've seen will spin in an infinite loop.
      
Fucking finally. (Score:1)

by Anonymous Coward writes:

Finally. That's all.
- Re: Fucking finally. (Score:5, Insightful)
  
  by Anonymous Coward writes: on Monday January 11, 2016 @02:53AM (#51276309)
  
  PostgreSQL is one of those rare projects that delays a bit but makes sure to plan its codebase ahead and Get It Right The First Time.
  Nobody should ever whine about delays in projects that value correctness over first to market, especially in an open source project.
  
  - Re: Fucking finally. (Score:5, Informative)
    
    by Rei ( 128717 ) writes: on Monday January 11, 2016 @05:40AM (#51276591) Homepage
    
    Postgres is leading edge in some things, like (ridiculously useful) table inheritance. Their implementation could have been slightly better (the most common complaint being that indices don't inherit, you need to re-add them on all descendants), but I'm very glad that they were early adopters on this one. They even pulled off multiple inheritance well.
    
    - - Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        I'm not sure what you mean. Could you clarify with an example of how it is vs. what you want?
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        He's probably referring to schemaless data records.
        
        Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        But python has them. Three different types (xml, hstore, and json). And I don't see how that would be something special to inheritance.
        
        Re: (Score:2)
        
        by Rei ( 128717 ) writes:
        
        úff... that should read "postgres has them" :P Been using python too much lately...
        
        Re: (Score:2)
        
        by Qzukk ( 229616 ) writes:
        
        He's talking about table inheritance: CREATE TABLE foo (...) INHERITS (parenttable);.
        If parenttable has "id integer, name varchar" then foo will automatically have "id integer, name varchar" just like inheritance in programming languages. Unlike most programming languages, you cannot redefine these fields at all (for instance, change id to a UUID datatype or remove it). Doing that would break postgresql's version of polymorphism, where SELECT * FROM parenttable; will also return the id and name columns (o
        
        Re: Fucking finally. (Score:2)
        
        by Rei ( 128717 ) writes:
        
        If parenttable has "id integer, name varchar" then foo will automatically have "id integer, name varchar"
        
        ... which is the whole point of inheritance. Am I to understand that he wants inheritance, except without any inheritance?
        just like inheritance in programming languages. Unlike most programming languages, you cannot redefine these fields at all (for instance, change id to a UUID datatype or remove it).
        In what programming languages can you do that? That doesn't even make sense, you couldn't convert a
- Re: (Score:2)
  
  by U2xhc2hkb3QgU3Vja3M ( 4212163 ) writes:
  
  Are you upsert that they took so long?
- Re: (Score:2)
  
  by 93 Escort Wagon ( 326346 ) writes:
  
  Well, it's postgres - do you expect them to just say "sorry we were behind on implementing this feature"?
  - Re: (Score:3)
    
    by phantomfive ( 622387 ) writes:
    
    but that doesn't mean I want his bundt cake recipe
    Does he have a good one?
  - Re: (Score:2)
    
    by Zero__Kelvin ( 151819 ) writes:
    
    "Well, it's postgres - do you expect them to just say "sorry we were behind on implementing this feature"?"
    Of course not. That would be a ludicrous thing for anyone to say. You don't seem to understand FOSS. Unless they announced that said feature would be implemented by a certain date and missed their deadline, they aren't "behind"; they are on schedule.
I'm somewhat (Score:2)

by vikingpower ( 768921 ) writes:

upserted, this morning.
- Re: (Score:2)
  
  by haruchai ( 17472 ) writes:
  
  Love the .sig, Oliver Reed's last role.
  R.I.P you crazy drunken Brit tough guy.
- Re: (Score:1)
  
  by KGIII ( 973947 ) writes:
  
  Heh, you're a good one to ask. I am not, and should not be confused for, a DB admin - in fact, I hate it. Oh, I've had to fight with them before and I suck at it. I've gone on about the "wizard" who did the job for us. I am forever grateful for his skills, to the point where his peculiarities did not bother me.
  At any rate... Am I reading this summary properly? Is this summary saying that, prior to now, you could not update data that was in the database?
  For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDBMSes have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported.
  Err... You couldn't update an entry if a previous version a
  - Re:I'm somewhat (Score:5, Informative)
    
    by Lord Crc ( 151920 ) writes: on Monday January 11, 2016 @06:34AM (#51276703)
    
    Is this summary saying that, prior to now, you could not update data that was in the database?
    No. In some cases you want to use a table like a key-value storage. If the key does not exist you need to insert a row with that key. Otherwise you'll want to keep only one row with that key, and just update the value.
    One can always do a "if key exists then update else insert", but the problem is that this is not atomic, because the "key exists" check is a separate statement from the insert or update statement. This can lead to issues if you have multiple connections accessing the same keys at the same time.
    The UPSERT allows you to do this as one atomic operation.
    
    - - Re: (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        I sincerely want to congratulate you for learning about a subject rather than speaking from a position of false authority. It seems like just the other day you were rambling on, passing yourself off as an expert on a subject you didn't understand. This is a new KGill I'm seeing! Kudos to you!
    - - Re:I'm somewhat (Score:5, Informative)
        
        by Lord Crc ( 151920 ) writes: on Monday January 11, 2016 @10:32AM (#51277371)
        
        Most databases allow you to do transactions (BEGIN TRAN/COMMIT TRAN) that force it to be atomic.
        Sure, but in that case you might get errors when you try to commit if another connection has changed the value in the meantime, and you'll have to retry the whole thing. That's what UPSERT avoids.
        
        
        Re: (Score:2)
        
        by ADRA ( 37398 ) writes:
        
        Not to forget the abhorrent performance loss of adding at least a round trip per row. If you're running a typical poorly performing CRUD app, that gets multiplied for every item in a batch of insert/update's that you'd like to process. Assuming the data was guaranteed to be identical, it was faster to delete all / insert all vs. the alternative which would be manually verifying each row's existence sequentially. This certainly speeds up a lot of natural key table interactions.
        
        Re: (Score:1)
        
        by Ed Avis ( 5917 ) writes:
        
        Many RDBMSes allow more than one statement per batch - so you can execute two or more statements in a single round trip. For example you could do 'if exists (blah) ...', assuming your dialect of SQL supports it, in just one round trip, or even separate 'insert where not exists... update...' or whatever technique you want to use. (I am not saying that these techniques are a foolproof alternative to merge or upsert, they are not, but that is for another discussion.) If you have multiple rows, you can still
    - - Re: (Score:2)
        
        by Lord Crc ( 151920 ) writes:
        
        Ever heard of SELECT .. FOR UPDATE?
        How do you prevent two connections from inserting the same key at the same time using that?
  - Re:I'm somewhat (Score:4, Informative)
    
    by vikingpower ( 768921 ) writes: on Monday January 11, 2016 @06:41AM (#51276711) Homepage Journal
    
    Lord Crc answered correctly. UPSERT = (INSERT iff not exists... else UPDATE). The article is correct in the sense that other databases, especially open source ones, do not always handle this correctly. It was not part of "traditional" SQL and is rather new. Object-oriented and graph databases do not have any problems with such operators, as they're explicitly written to deal with this use case. For relational databases like PostgreSQL it is harder to get right. Whether PostgreSQL has now really got it right can only be proven by protracted use "out in the wild".
    
  - Re: (Score:2)
    
    by Hognoxious ( 631665 ) writes:
    
    The difference was that you had to know in advance whether it existed and use a different command in each case.
    I have to say that I've rarely found that to be a problem, but it's always nice to have options.
    - Re: (Score:2)
      
      by vikingpower ( 768921 ) writes:
      
      Agree. It's something seen in the "natural" evolution of (computer) languages all the time. When the language ages gracefully, such "finer" options become part of it.
    - Re: (Score:1)
      
      by KGIII ( 973947 ) writes:
      
      Ah ha! Thanks. I get it now. Well, I think... Basically, this allows more of an if/or? If it is X (and should be Y) then change it to Y, and if it is Y already then leave it as Y. But in one command, without actually having to use a longer statement to get the same results?
      That makes sense. I can join, add, merge, and stuff like that. I can even (sort of) do it in C, PHP, and probably bang it out in Perl. However, I hate it. I know it may sound odd but, for whatever reason - and I hold a PhD in Applied Math
      - Re: (Score:2)
        
        by Hognoxious ( 631665 ) writes:
        
        It's more like "If it's X, change it to Y. If it doesn't exist at all, create it with a Y."
        Without this, trying to modify a record that doesn't already exist will cause an error. //to do: dig at MySQL goes here.
        
        Re: (Score:1)
        
        by KGIII ( 973947 ) writes:
        
        Cool. Thanks again. I should find some sort of database system to play with and see how it goes. Maybe redo an SMF install onto PostgreSQL and then see what I can break/tweak/learn. 'Snot like I'll be breaking the whole internet, just a small piece and it'll be wiped clean in a day or two, after I'm done playing. I'd not want to leave my mess open for others to exploit and then abuse. I guess I could do it locally. All of my hardware down here, at this place, was out of date - so I ordered a few new boxes,
- Re: (Score:2)
  
  by NoNonAlphaCharsHere ( 2201864 ) writes:
  
  Nah. If you want to get upskirted by a database, you're gonna have to go with Squeel Server.
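
Lord Crc's point earlier in this thread, that "if key exists then update else insert" is not atomic while a single UPSERT statement is, can be replayed in a small sketch. Everything here is an assumption made for illustration: the `kv` table and `key_exists` helper are hypothetical, the two "writers" are simulated on one connection by interleaving their steps by hand, and Python's bundled SQLite (3.24 or later) stands in for PostgreSQL because it accepts the same `ON CONFLICT` syntax. It shows the failure mode of a stale check; it does not prove anything about concurrent behaviour in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def key_exists(key):
    """The separate 'does the key exist?' check that opens the race window."""
    row = conn.execute("SELECT 1 FROM kv WHERE key = ?", (key,)).fetchone()
    return row is not None


# Check-then-act: both writers run the check before either one writes.
a_saw_key = key_exists("color")  # writer A: key is missing
b_saw_key = key_exists("color")  # writer B: key is still missing
conn.execute("INSERT INTO kv VALUES ('color', 'red')")  # writer A inserts
try:
    # Writer B acts on its stale check and collides with A's row.
    conn.execute("INSERT INTO kv VALUES ('color', 'blue')")
    stale_insert_failed = False
except sqlite3.IntegrityError:
    stale_insert_failed = True

# Single-statement upsert: no separate check, so no window between check and write.
UPSERT = """
    INSERT INTO kv (key, value) VALUES (?, ?)
    ON CONFLICT (key) DO UPDATE SET value = excluded.value
"""
conn.execute(UPSERT, ("color", "green"))  # key exists: takes the update branch
conn.execute(UPSERT, ("size", "large"))   # key is new: takes the insert branch

rows = conn.execute("SELECT key, value FROM kv ORDER BY key").fetchall()
print(stale_insert_failed, rows)
```

The second plain INSERT fails with a unique-key violation because its check was out of date, whereas both UPSERT calls succeed regardless of whether the key was already present.
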
Thanks a lot PostgreSQL devels (Score:5, Interesting)

by hrumph ( 4411339 ) writes: on Monday January 11, 2016 @04:04AM (#51276409)

This is great. I've been using PostgreSQL for a while now. It's one of those pieces of software that just does what it's told and doesn't let you down. While I'm saying this, there are credible rumours to the effect that the Oracle merge operation is broken. Read the comments to the most upvoted answer at this stack exchange question [stackoverflow.com]. The final comment is:
Not reliably. I ended up with retry loops in the client code. :( – Randy Magruder Aug 27 '15 at 16:05
This makes me think that Bruce Momjian may have been thinking about Oracle's implementation of merge when he said that other implementations were handled very badly.
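
For readers who have not seen one, the kind of client-side retry loop mentioned in that comment might look roughly like the sketch below. It is an assumption-laden illustration, not anyone's production code: the `kv` table and `upsert_with_retry` function are hypothetical, and Python's bundled SQLite stands in for a server database. The attempt limit is a deliberate design choice, so that an error which retrying cannot fix (here a NOT NULL violation) ends in a reported failure instead of an endless loop. On PostgreSQL, a failed INSERT would also abort the enclosing transaction unless it is wrapped in a savepoint, which this sketch does not model.

```python
import sqlite3


def upsert_with_retry(conn, key, value, max_attempts=3):
    """UPDATE first, INSERT if nothing matched, retry a bounded number of times."""
    for _ in range(max_attempts):
        cur = conn.execute("UPDATE kv SET value = ? WHERE key = ?", (value, key))
        if cur.rowcount > 0:
            return "updated"
        try:
            conn.execute("INSERT INTO kv (key, value) VALUES (?, ?)", (key, value))
            return "inserted"
        except sqlite3.IntegrityError:
            # Possibly another writer inserted the key between our UPDATE and
            # INSERT; go around again so the next UPDATE can find that row.
            continue
    raise RuntimeError(f"gave up after {max_attempts} attempts for key {key!r}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

first = upsert_with_retry(conn, "color", "red")    # no row yet
second = upsert_with_retry(conn, "color", "blue")  # row exists now

# A constraint error that retrying cannot fix: the bound stops the loop.
try:
    upsert_with_retry(conn, "size", None)
    gave_up = False
except RuntimeError:
    gave_up = True

print(first, second, gave_up)
```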

- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  PostgreSQL is maintained by full-time employees. None of the "devs" are doing favors for anyone, they're doing their job, while the owners try to keep the project relevant.
- Re: Thanks a lot PostgreSQL devels (Score:2)
  
  by BigZee ( 769371 ) writes:
  
  Merge is not really the same as upsert. Merge (using the dual pseudo table) is a potential solution to the problem, as are a number of options using PL/SQL. I think the real truth here is that Oracle hasn't seen the need to implement an upsert command. It's not difficult to code, and it would be easy to build a solution without the need for a special command.
  - Re: Thanks a lot PostgreSQL devels (Score:4, Informative)
    
    by Anonymous Coward writes: on Monday January 11, 2016 @07:57AM (#51276847)
    
    It is not that trivial, if you have to take into account race conditions and roll-back without using a global lock which would kill the performance. There are actually quite a few research papers just about this problem.
    
I'm curious (Score:1)

by Tablizer ( 95088 ) writes:

What would be the equivalent long-hand in "traditional" SQL?
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Something like this :
  IF EXISTS (SELECT id FROM [table] WHERE id = :id)
  THEN
  UPDATE [table] SET .......... WHERE id = :id
  ELSE
  INSERT INTO [table] (.......) VALUES (...........)
  END IF;
  - Re: (Score:1)
    
    by tijgertje ( 4289605 ) writes:
    
    Oops forgot to log in ^^
  - Re: (Score:2)
    
    by KiloByte ( 825081 ) writes:
    
    In postgres, this can be wrapped in a RULE so it's transparent when inserting.
  - Re: (Score:2)
    
    by hrumph ( 4411339 ) writes:
    
    Your answer is naive and wrong. See this Stack Overflow page [stackoverflow.com]. The solutions on this page work, but none of them have the elegance of the newly available INSERT ... ON CONFLICT DO UPDATE method.
    - Re: (Score:3)
      
      by halivar ( 535827 ) writes:
      
      The GGP asked for "traditional" SQL, for which the GP offered the correct answer. You offered another vendor-specific solution, not standard ANSI. The GP was in no way wrong. If he was doing retry loops, sure; that would be wrong.
      - Re: (Score:3)
        
        by mrchaotica ( 681592 ) * writes:
        
        In terms of "traditional" SQL, the GGP's answer is wrong because it fails to be atomic. The whole thing needs to be wrapped in a transaction.
        
        Re: (Score:2)
        
        by halivar ( 535827 ) writes:
        
        If someone has to be told to put a multi-statement SQL write operation in a transaction...
        I kind of think that transactions go without saying, even for UPSERT (according to the Wiki page [postgresql.org], UPSERT will "guarantee insert-or-update 'atomicity' for the simple cases", but leaves me questioning what a "simple case" is).
        
        Re: (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        You still aren't getting it. Transactions like you describe are not equivalent and have Race Condition issues that need to be handled by the client code (and that usually isn't handled cleanly) whereas UPSERT makes the transaction truly atomic, not just on a per transaction basis, but even when multiple threads attempt to handle the same kind of transaction simultaneously.
      - Re: (Score:2)
        
        by Zero__Kelvin ( 151819 ) writes:
        
        Bullshit. This [slashdot.org] is the correct answer. Until you understand that there is no equivalent to UPSERT, you don't understand UPSERT.
    - Re: (Score:2)
      
      by halivar ( 535827 ) writes:
      
      For that matter, I give him a gold star with bonus internets because he tested EXISTS instead of COUNT > 0.
      - Re: (Score:1)
        
        by tijgertje ( 4289605 ) writes:
        
        Thanks, although it is she, not he :)
    - Re: (Score:2)
      
      by mark-t ( 151149 ) writes:
      
      INSERT ... ON CONFLICT DO UPDATE will only trigger the update on what is otherwise an attempt to insert a record that would violate unique index or primary key constraints, while the above poster's solution will update any and all records that match the query, which is what UPSERT does when things match the query.
      - Re: (Score:1)
        
        by hrumph ( 4411339 ) writes:
        
        A transaction commit happens after the logic of the transaction is processed, and two or more parallel threads can start off with the same DB snapshot when processing their transactions. Supposing that one thread commits first, then the logic that the other one used in its processing will be invalid upon its turn to commit. I don't see how an UPSERT implementation would fix this. The solution arrived at by PostgreSQL is basically perfect.
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        It will know that the processing was invalid because the second one, on trying to modify records, would discover that another process had locked them. The update would fail, and it would be up to the user to either reissue or modify the query.
        
        Re: (Score:1)
        
        by hrumph ( 4411339 ) writes:
        
        This doesn't sound any better than the already available retry loop.....
        
        Re: (Score:2)
        
        by mark-t ( 151149 ) writes:
        
        Any automated loop would be a Bad Thing(tm). The correct thing to do on any failure is to report the failure, not to simply try again... because there is no way for the computer to know that trying again is even what the person would want to do in the event of such a condition.
        
        Re: (Score:1)
        
        by Tablizer ( 95088 ) writes:
        
        So far it looks like there is no full equivalent. By not being based on sequential steps, the "UPSERT" has advantages. But the jury is still out...
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  The MERGE command. Although PostgreSQL claimed SQL compliance, this command was not understood. Mind you, the MERGE command in standard SQL is so overly complicated that all legibility is lost. This is probably why a lot of other databases have chosen other syntaxes that work better.
- - Re: (Score:2)
    
    by mark-t ( 151149 ) writes:
    
    The biggest problem that I could foresee with the approach that you've suggested, particularly for very large tables where updates are being done very frequently, is that you could rapidly run out of storage.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  "Traditional" ANSI SQL:2003 has had MERGE for more than a decade, which is a more verbose but significantly more powerful version of UPSERT since you can describe multiple potential actions based on whether the target record is found or based on comparisons between the source and target data. MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.
  It's sad that not only is PostgreSQL over a decade behind but they implemented the sloppy non-ANSI standard version.
  - Re:I'm curious (Score:4, Informative)
    
    by halivar ( 535827 ) writes: <bfelger@@@gmail...com> on Monday January 11, 2016 @12:51PM (#51278455)
    
    MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.
    In PostgreSQL, Oracle, and SQL Server, MERGE is not atomic. Also, it is not UPSERT. You can use MERGE to accomplish the same end goal, but they are not synonymous and they do not work the same. Also, the syntax is idiosyncratic, and most extant implementations are problematic enough that MERGE is best avoided.
    
    - - Re: (Score:2)
        
        by halivar ( 535827 ) writes:
        
        MERGE is absolutely atomic. It either entirely fails or entirely succeeds.
        That's not what atomicity means.
        
        Re: (Score:1)
        
        by Dunavant ( 1078123 ) writes:
        
        MERGE is absolutely atomic. It either entirely fails or entirely succeeds.
        That's not what atomicity means.
        Actually it's exactly what atomicity means in this context of a database transaction. In an atomic transaction, a series of database operations either all occur, or nothing occurs. You may be confusing it with an 'atomic operation' in programming. https://en.wikipedia.org/wiki/... [wikipedia.org]
- Re: (Score:3)
  
  by Zero__Kelvin ( 151819 ) writes:
  
  "What would be the equivalent long-hand in "traditional" SQL?"
  It doesn't exist. That's the point. This isn't implemented to save typing. It is true atomic insert or update.
- Re: (Score:1)
  
  by Ed Avis ( 5917 ) writes:
  
  There is no exact equivalent in traditional SQL. As others have pointed out, you can check exists and then insert, but that introduces a race condition; wrapping it in an explicit transaction might help depending on the locking model, but might still introduce failures that have to be handled by client code.
  That said, if you can make some assumptions about what else is writing to the table then you can get fairly close. One technique I often use is like this:
  -- Insert the row if none with the same P
Oracle was there first (Score:2)

by Ora*DBA ( 101576 ) writes:

Oracle has included the 'merge' command for several iterations now, which does the same thing. It is a rich command - those interested can check the documentation at https://docs.oracle.com/database/121/SQLRF/statements_9016.htm#SQLRF01606
- Re: (Score:1)
  
  by davester666 ( 731373 ) writes:
  
  Dup
- Re: (Score:2)
  
  by Zero__Kelvin ( 151819 ) writes:
  
  Actually David Bowie was known for leveraging technology in his musical career. Here is the first article that pops up on Google [theguardian.com].
- Re: (Score:3)
  
  by Qzukk ( 229616 ) writes:
  
  What's broken with the transaction isolation model? The only thing I see [postgresql.org] is that they don't do the non-transaction "read uncommitted" transaction that lets you see records that other transactions have not committed.
