PostgreSQL 9.5 Does UPSERT Right (thenewstack.io) 105
joabj writes: For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDMS have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported. Well, PostgreSQL 9.5, now generally available, finally offers a version of UPSERT and users may be glad the dev team took their time with it. Implementations of UPSERT on other database systems were "handled very badly," sometimes leading to unexpected error messages Momjian said. Turns out it is very difficult to implement on multi-user systems. "What is nice about our implementation is that it never generates an unexpected error. You can have multiple people doing this, and there is very little performance impact," Momjian said. Because it can work on multiple tables at once, it can even be used to merge one table into another.
Re: Too late (Score:5, Funny)
Oh. How disappointing. I'll let the team know to remove the code that does this then seeing as you already have a solution.
Re: (Score:3)
Re: (Score:2)
try to solve a complex one in code without user interaction and having a deadlock is only the good part.
Re:Too late (Score:4, Informative)
Fucking finally. (Score:1)
Finally. That's all.
Re: Fucking finally. (Score:5, Insightful)
PostgeSQL is one of those rare projects that delays a bit but makes sure that they plan their codebase ahead, and Get It Right The First Time.
Nobody should ever whine about delays in projects that value correctness over first to market, especially in an open source project.
Re: Fucking finally. (Score:5, Informative)
Postgres is leading edge in some things, like (ridiculously useful) table inheritance. Their implementation could have been slightly better (the most common complaint being that indices don't inherit, you need to re-add them on all descendants), but I'm very glad that they were early adopters on this one. They even pulled off multiple inheritance well.
Re: (Score:2)
I'm not sure what you mean. Could you clarify with an example of how it is vs. what you want?
Re: (Score:2)
Re: (Score:2)
But python has them. Three different types (xml, hstore, and json). And I don't see how that would be something special to inheritance.
Re: (Score:2)
úff... that should read "postgres has them" :P Been using python too much lately...
Re: (Score:2)
He's talking about table inheritance: CREATE TABLE foo (...) INHERITS (parenttable);.
If parenttable has "id integer, name varchar" then foo will automatically have "id integer, name varchar" just like inheritance in programming languages. Unlike most programming languages, you cannot redefine these fields at all (for instance, change id to a UUID datatype or remove it). Doing that would break postgresql's version of polymorphism, where SELECT * FROM parenttable; will also return the id and name columns (o
Re: Fucking finally. (Score:2)
In what programming languages can you do that? That doesn't even make sense, you couldn't convert a
Re: (Score:2)
Are you upsert that they took so long?
Re: (Score:2)
Well, it's postgres - do you expect them to just say "sorry we were behind on implementing this feature"?
Re: (Score:3)
but that doesn't mean I want his bundt cake recipe
Does he have a good one?
Re: (Score:2)
Of course not. That would be a ludicrous thing for anyone to say. You don't seem to understand FOSS. Unless they announced that said feature would be implemented by a certain date and missed their dealine, then they aren't "behind", they are on schedule.
I'm somewhat (Score:2)
upserted, this morning.
Re: (Score:2)
Love the .sig, Oliver Reed's last role.
R.I.P you crazy drunken Brit tough guy.
Re: (Score:1)
Heh, you're a good one to ask. I am not, and should not be confused for, a DB admin - in fact, I hate it. Oh, I've had to fight with them before and I suck at it. I've gone on about the "wizard" who did the job for us. I am forever grateful for his skills, to the point where is peculiarities did not bother.
At any rate... Am I reading this summary properly? Is this summary saying that, prior to now, you could not update data that was in the database?
For years, PostgreSQL users would ask when their favorite open source database system would get the UPSERT operator, which can either insert an entry or update it if a previous version already existed. Other RDMS have long offered this feature. Bruce Momjian, one of the chief contributors to PostgreSQL, admits to being embarrassed that it wasn't supported.
Err... You couldn't update an entry if a previous version a
Re:I'm somewhat (Score:5, Informative)
Is this summary saying that, prior to now, you could not update data that was in the database?
No. In some cases you want to use a table like a key-value storage. If the key does not exist you need to insert a row with that key. Otherwise you'll want to keep only one row with that key, and just update the value.
One can always do a "if key exists then update else insert", but the problem is that this is not atomic, because the "key exists" check is a separate statement from the insert or update statement. This can lead to issues if you have multiple connections accessing the same keys at the same time.
The UPSERT allows you to do this as one atomic operation.
Re: (Score:2)
Re:I'm somewhat (Score:5, Informative)
Most databases allow you to do transactions (BEGIN TRAN/COMMIT TRAN) that force it to be atomic.
Sure, but in that case you might get errors when you try to commit if another connection has changed the value in the meantime, and you'll have to retry the whole thing. That's what UPSERT avoids.
Re: (Score:2)
Not to forget the abhorrent performance loss of adding at least a round trip per row. If you're running a typical poorly performing CRUD app, that gets multiplied for every item in a batch of insert/update's that you'd like to process. Assuming the data was guaranteed to be identical, it was faster to delete all / insert all vs. the alternative which would be manually verifying each row's existence sequentially. This certainly speeds up a lot of natural key table interactions.
Re: (Score:1)
Re: (Score:2)
Ever heard of SELECT .. FOR UPDATE?
How do you prevent two connections from inserting the same key at the same time using that?
Re:I'm somewhat (Score:4, Informative)
Lord Crc answered correctly. UPSERT = (INSERT iff not exists... else UPDATE ). The article is correct in the sense that other databases, especially open source one, do not always handle this one correctly. It was not part of "traditional" SQL and is rather new. Object-oriented and graph databases do not have any problems with such operators, as they're explicitly written to deal with this use case. For relational databases like PostgreSQL this is a harder one to get right. Whether PostgreSQL now really got it right, can only be proven by protracted use "out in the wild".
Re: (Score:2)
The difference was that you had to know in advance whether it existed and use a different command in each case.
I have to say that I've rarely found that to be a problem, but it's always nice to have options.
Re: (Score:2)
Agree. It's something seen in the "natural" evolution of (computer) languages all the time. When the language ages gracefully, such "finer" options become part of it.
Re: (Score:1)
Ah ha! Thanks. I get it now. Well, I think... Basically, this allows more an if/or? If it is X (and should be Y) then change it to Y and if it is Y already then leave it as Y. But in one more command without actually having to use a longer statement to get the same results?
That makes sense. I can join, add, merge, and stuff like that. I can even (sort of) do it in C, PHP, and probably bang it out in Perl. However, I hate it. I know it may sound odd but, for whatever reason - and I hold a PhD in Applied Math
Re: (Score:2)
It's more like "If it's X, change it to Y. If it doesn't exist at all, create it with a Y."
Without this, trying to modify a record that doesn't already exist will cause an error. //to do: dig at MySQL goes here.
Re: (Score:1)
Cool. Thanks again. I should find some sort of database system to play with and see how it goes. Maybe redo an SMF install onto PostgreSQL and then see what I can break/tweak/learn. 'Snot like I'll be breaking the whole internet, just a small piece and it'll be wiped clean in a day or two, after I'm done playing. I'd not want to leave my mess open for others to exploit and then abuse. I guess I could do it locally. All of my hardware down here, at this place, was out of date - so I ordered a few new boxes,
Re: (Score:2)
Thanks a lot PostgreSQL devels (Score:5, Interesting)
Not reliably. I ended up with retry loops in the client code.
This makes me think that Bruce Momijan may have been thinking about Oracle's implementation of merge when he said that other implementations were handled very badly.
Re: (Score:1)
PostgreSQL is maintained by full-time employees. None of the "devs" are doing favors for anyone, they're doing their job, while the owners try to keep the project relevant.
Re: Thanks a lot PostgreSQL devels (Score:2)
Re: Thanks a lot PostgreSQL devels (Score:4, Informative)
It is not that trivial, if you have to take into account race conditions and roll-back without using a global lock which would kill the performance. There are actually quite a few research papers just about this problem.
I'm curious (Score:1)
What would be the equivalent long-hand in "traditional" SQL?
Re: (Score:2, Informative)
Something like this :
IF EXISTS (SELECT id FROM [table] WHERE id = :id) .......... WHERE id = :id
THEN
UPDATE TABLE [table]
ELSE
INSERT INTO [table] (.......) VALUES (...........)
END IF;
Re: (Score:1)
Re: (Score:2)
In postgres, this can be wrapped in a RULE so it's transparent when inserting.
Re: (Score:2)
Re: (Score:3)
The GGP asked for "traditional" SQL, for which the GP offered the correct answer. You offered another vendor-specific solution, not standard ANSI. The GP was in no way wrong. If he was doing retry loops, sure; that would be wrong.
Re: (Score:3)
In terms of "traditional" SQL, the GGP's answer is wrong because it fails to be atomic. The whole thing needs to be wrapped in a transaction.
Re: (Score:2)
If someone has to be told to put a mult-statement SQL write operation in a transaction...
I kind of think that transactions go without saying, even for UPSERT (according to the Wiki page [postgresql.org], UPSERT will "guarantee insert-or-update 'atomicity' for the simple cases", but leaves me questioning what a "simple case" is).
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
For that matter, I give him a gold star with bonus internets because he tested EXISTS instead of COUNT > 0.
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
So far it looks like there is no full equivalent. By not being based on sequential steps, the "UPSERT" has advantages. But the jury is still out...
Re: (Score:1)
The MERGE command. Although PostgreSQL claimed SQL compliance, this command was not understood. Mind you, the MERGE command in standard SQL is so overly complicated that all legibility is lost. This is probably why a lot of other databases have chosen other syntaxes that work better.
Re: (Score:2)
Re: (Score:1)
"Traditional" ANSI SQL:2003 has had MERGE for more than a decade, which is a more verbose but significantly more powerful version of UPSERT since you can describe multiple potential actions based on whether the target record is found or based on comparisons between the source and target data. MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.
It's sad that not only is PostgreSQL over a decade behind but they implemented the sloppy non-ANSI standard version.
Re:I'm curious (Score:4, Informative)
MERGE can combine multiple INSERT, UPDATE and DELETE operations into a single atomic operation.
In PostgresSQL, Oracle, and SQL Server, MERGE is not atomic. Also, it is not UPSERT. You can use MERGE to accomplish the same end goal, but they are not synonymous and they do not work the same. Also, the syntax is idiosyncratic, and most extant implementations are problematic enough that MERGE is best avoided.
Re: (Score:2)
MERGE is absolutely atomic. It either entirely fails or entirely succeeds.
That's not what atomicity means.
Re: (Score:1)
MERGE is absolutely atomic. It either entirely fails or entirely succeeds.
That's not what atomicity means.
Actually it's exactly what atomicity means in this context of a database transaction. In an atomic transaction, a series of database operations either all occur, or nothing occurs. You may be confusing it with an 'atomic operation' in programming. https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:3)
It doesn't exist. That's the point. This isn't implemented to save typing. It is true atomic insert or update.
Re: (Score:1)
That said, if you can make some assumptions about what else is writing to the table then you can get fairly close. One technique I often use is like this:
Oracle was there first (Score:2)
Oracle has included the 'merge' command for several iterations now, which does the same thing. It is a rich command - those interested can check the documentation at https://docs.oracle.com/database/121/SQLRF/statements_9016.htm#SQLRF01606
Re: (Score:1)
Dup
Re: (Score:2)
Re: (Score:3)
What's broken with the transaction isolation model? The only thing I see [postgresql.org] is that they don't do the non-transaction "read uncommitted" transaction that lets you see records that other transactions have not committed.