Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wanted to update a record if it existed, otherwise use insert. So if a name is in the database, update the times I've seen it. If the name does not exist, insert. I was keeping track of the number of times a name is seen.


Without knowing much about your actual application (i.e. treat the following with utmost skepticism!), my instinct would be to separate the fact of observing a name from the fact of the observation count. I would keep a log table and a summary table like this:

  CREATE TABLE namelog (
    name TEXT
  );

  CREATE TABLE namesummary (
    name TEXT,
    seen int
  );
Then each time you see a name, log it:

  INSERT INTO namelog VALUES (%s)
Then on a regular basis, run a housekeeping/summary process like this:

    CREATE TEMPORARY TABLE prevsummary AS
        SELECT * FROM namesummary;

    TRUNCATE namesummary;

    INSERT INTO namesummary
        SELECT coalesce(a.name, b.name) as name,
               coalesce(a.seen,0) + coalesce(b.seen, 0) as seen
        FROM 
            prevsummary a 
            FULL OUTER JOIN
            (SELECT name, count(*) as seen
            FROM namelog
            GROUP BY name) b
            ON a.name=b.name
    ;

    DELETE FROM namelog;
This of course is grossly space-inefficient, but making the logging process INSERT-only should enable a great degree of concurrency - and you can tune the summary process to run as often as necessary to keep memory/disk usage within your available resources.

If concurrency isn't an issue, then your initial solution is probably fine.


Very nice approach, I didn't think of trying to log names with only INSERT. I see how it handles concurrency. My solution is likely to fail if my database gets bombarded...My app was supposed to be a simple one I made for a few friends and also for me to learn a few things in node / postgres.

For more detail, here is my initial table: CREATE TABLE names(name_id serial, name text, count integer)

I was planning to access the name count a good numbers of times, so I went with a simple SELECT from names. I don't necessarily need "on demand" counts so having the summary process run 5 or 10 mins would be a better!

Thanks for your insight!


I quite like the idea of treating the database as append-only immutable data and aggregate views as optimisations.


Please note that warning in the manual that TRUNCATE is not MVCC safe. You might want to use a normal "DELETE FROM namesummary" instead.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: