Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I used postgres for the first time last week, and was wondering, "WHERE IS UPSERT?" Ended up going with a "UPDATE table SET field='C', field2='Z' WHERE id=3; INSERT INTO table (id, field, field2) SELECT 3, 'C', 'Z' WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);" (as seen on http://stackoverflow.com/questions/1109061/insert-on-duplica... - although, not the chosen answer). Anyone come up with better or their own solutions?


I built the pgscript version of the some upsert example and use it as a stored function. I'm no pgsql pro, so sorry if this is total shit. Works for what I needed. This is controlling a list of access rules to different services for doling out access on the controller level globally, or down to an individual action inside that controller. If I rewrote this on newer PG, I'd store the permissions text as a JSON datatype.

  CREATE OR REPLACE FUNCTION merge_privileges(key integer, data_controller text, data_permissions text)
  RETURNS void AS
  $BODY$
  BEGIN
      LOOP
          -- first try to update the key
          UPDATE privileges SET controller = data_controller, permissions = data_permissions WHERE user_id = key   AND controller = data_controller;
          IF found THEN
              RETURN;
          END IF;
          -- not there, so try to insert the key
          -- if someone else inserts the same key concurrently,
          -- we could get a unique-key failure
          BEGIN
              INSERT INTO privileges(user_id, controller, permissions) VALUES (key, data_controller,     data_permissions);
              RETURN;
          EXCEPTION WHEN unique_violation THEN
              -- do nothing, and loop to try the UPDATE again
          END;
      END LOOP;
  END;
  $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;


I prefer to use CTE in postgres to do upserts, this is a non complete example, but you get the general idea:

  WITH upsert AS (
    UPDATE some_table 
    SET attrib = 'foo'
    WHERE id = 123
    RETURNING *
  ) INSERT INTO some_table ( id , attrib )
    SELECT 123 as id, 'foo' as attrib
    WHERE 123 NOT IN (SELECT id FROM upsert );


This does not handle concurrent inserts though unless you rerun the code on unique violation.


Correct, but if I write the upsert in that way - I actually understand what I'm doing... :)

If this is standard upsert query in your app, you should turn it into a procedure and parameterize, catch exceptions anyway.

Or you could mess around with transaction isolation.


Nope, that's exactly what I use when I really need it, generalized to simple table/PK schemes.[1]

At first I wanted to try and do it within PL/pgSQL (e.g., from your SO link), but I couldn't figure out how to generalize it like I could with my the UPDATE ... INSERT WHERE. (I haven't really studied PL/pgSQL too hard yet.)

[1] - https://github.com/BurntSushi/nfldb/blob/master/nfldb/db.py#...


Do you mind if I ask what your use case is that demands an upsert?


I wanted to update a record if it existed, otherwise use insert. So if a name is in the database, update the times I've seen it. If the name does not exist, insert. I was keeping track of the number of times a name is seen.


Without knowing much about your actual application (i.e. treat the following with utmost skepticism!), my instinct would be to separate the fact of observing a name from the fact of the observation count. I would keep a log table and a summary table like this:

  CREATE TABLE namelog (
    name TEXT
  );

  CREATE TABLE namesummary (
    name TEXT,
    seen int
  );
Then each time you see a name, log it:

  INSERT INTO namelog VALUES (%s)
Then on a regular basis, run a housekeeping/summary process like this:

    CREATE TEMPORARY TABLE prevsummary AS
        SELECT * FROM namesummary;

    TRUNCATE namesummary;

    INSERT INTO namesummary
        SELECT coalesce(a.name, b.name) as name,
               coalesce(a.seen,0) + coalesce(b.seen, 0) as seen
        FROM 
            prevsummary a 
            FULL OUTER JOIN
            (SELECT name, count(*) as seen
            FROM namelog
            GROUP BY name) b
            ON a.name=b.name
    ;

    DELETE FROM namelog;
This of course is grossly space-inefficient, but making the logging process INSERT-only should enable a great degree of concurrency - and you can tune the summary process to run as often as necessary to keep memory/disk usage within your available resources.

If concurrency isn't an issue, then your initial solution is probably fine.


Very nice approach, I didn't think of trying to log names with only INSERT. I see how it handles concurrency. My solution is likely to fail if my database gets bombarded...My app was supposed to be a simple one I made for a few friends and also for me to learn a few things in node / postgres.

For more detail, here is my initial table: CREATE TABLE names(name_id serial, name text, count integer)

I was planning to access the name count a good numbers of times, so I went with a simple SELECT from names. I don't necessarily need "on demand" counts so having the summary process run 5 or 10 mins would be a better!

Thanks for your insight!


I quite like the idea of treating the database as append-only immutable data and aggregate views as optimisations.


Please note that warning in the manual that TRUNCATE is not MVCC safe. You might want to use a normal "DELETE FROM namesummary" instead.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: