Long as hell post with a clickbait title documenting how the author found out (1) one can insert multiple rows with a single INSERT statement; (2) the RETURNING clause. Spend your time elsewhere if you’re already aware; maybe pick up a SQL textbook if you aren’t...
This is not an uncharitable read btw. Actual quotes:
> The first thing I learned was how to insert multiple rows with a single INSERT statement: ...
> Scouring the docs I discovered the RETURNING clause of an INSERT statement.
I enjoyed trying to guess what the "one word" he was going to discover was. I was mostly right!
That said, I was unaware of the big discovery in the post: when you do INSERT ... ON CONFLICT DO NOTHING ... RETURNING and pass multiple rows to insert, the RETURNING clause returns only the values from the rows that were actually inserted. IMO this is the "obvious" behavior, but it is still worthwhile to know it works!
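For reference, a minimal sketch of that statement shape (table and column names are invented, and it assumes `messages` has a primary key on `id`):

```sql
-- Only rows that were actually inserted come back; rows skipped by
-- ON CONFLICT DO NOTHING produce no RETURNING output.
INSERT INTO messages (id, body)
VALUES (1, 'first'), (2, 'second'), (3, 'third')
ON CONFLICT (id) DO NOTHING
RETURNING id;
```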
> IMO this is the "obvious" behavior, but it is still worthwhile to know it works!
I think this is why people love Postgres so much. 99 times out of 100, the "obvious" behaviour turns out to be the behaviour that Postgres actually implements.
Well, it's not just the obvious behaviour; it seems like the only useful behaviour, just as the default is returning the number of affected rows (which for an insert would be the number of inserts). You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
> You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
In simple CRUD cases, yes. In many other scenarios you might not. If your insert is driven by a SELECT statement, with joins and WHERE clauses, you can easily insert too much data, or no data at all.
The RETURNING semantics produce the content of the resulting row after insertion. For the rows that have been skipped, this could be ill-defined. The content is not just the row data that was supplied, but potentially server-generated content from sequences, row-level triggers, etc.
> You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
I often use "insert into mytable(...) select ... from ... where ...", and I know I only want one row inserted.
In such cases, checking that exactly one row was inserted is a very nice safeguard against a buggy query or code, or against the freak case where someone messed with the DB.
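As a concrete sketch of that safeguard (table and column names are made up; in application code you would check the driver's reported row count instead):

```sql
DO $$
DECLARE
    n integer;
BEGIN
    INSERT INTO audit_log (account_id, amount)
    SELECT a.id, t.amount
    FROM accounts a
    JOIN transfers t ON t.account_id = a.id
    WHERE t.id = 123;

    -- Guard against a buggy query inserting zero rows or several rows.
    GET DIAGNOSTICS n = ROW_COUNT;
    IF n <> 1 THEN
        RAISE EXCEPTION 'expected exactly 1 row inserted, got %', n;
    END IF;
END $$;
```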
I've been looking for a way to bulk insert in Postgres, but with the ability to ignore any error for any single row (including FK/unique constraint/check constraint/post-insert trigger/whatever) and, after it finishes, get back the list of rows that failed to insert. I haven't found any way to do this yet. Does anyone have any ideas?
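One workaround, not suggested in the thread and slower than a true bulk insert: loop over the rows in PL/pgSQL and trap errors per row, so each failure only rolls back its own sub-transaction. The table names here (`staging`, `target`, `failed_rows`) are hypothetical:

```sql
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT id, payload FROM staging LOOP
        BEGIN
            INSERT INTO target (id, payload) VALUES (r.id, r.payload);
        EXCEPTION WHEN OTHERS THEN
            -- the BEGIN/EXCEPTION block acts as a savepoint, so only this
            -- row's insert is rolled back; record the failure instead
            INSERT INTO failed_rows (id, payload) VALUES (r.id, r.payload);
        END;
    END LOOP;
END $$;
```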
I didn't know about RETURNING and enjoyed the post. It's a much nicer way to learn than via a textbook. A blog post doesn't have to be novel to have value.
That said, I'm with you on the clickbait title. That really grinds my gears.
If you never had a formal introduction to SQL, you can probably get a lot of value from a run-of-the-mill SQL textbook by scanning the table of contents and skimming the chapters/sections on topics new to you. (For the record, I learned SQL myself through fumbling. My first schema designs didn’t even have relations.)
The problem with this blog post isn’t lack of novelty; rather, with the clickbait title, it piqued my interest but I ended up learning nothing, other than being reminded of my own terrible designs and queries back when I was a noob.
All that is well and good, but RETURNING is a Postgres feature, not part of the SQL standard. I don't think a standard SQL textbook (especially one targeting something like MySQL/SQLite) would have taught the author about it.
I was quite surprised when I found out that RETURNING is actually a Postgres feature, and not standard. It just felt like such an obvious and extremely useful thing to have.
Agreed. I would expect to learn about more edge cases here on HN. These are basic PG features you can see when looking at the INSERT definition in the docs.
I second what you said about reading a textbook. I read through the entire PG documentation and a PG book in my second year as an engineer, and 7 years later it's paid off big time. It's one of those things where if you do it early, you get compounding benefits over your career.
Once you have a basic understanding of SQL, I found it really informative to go closer to the source, e.g. C. J. Date and Hugh Darwen's "A Guide to the SQL Standard", or perhaps other books by them.
There's a lot of incomplete and misunderstood information floating around in blog tutorials or potboiler textbooks.
I've personally experienced cases where making a slight change to a query in a non-obvious way resulted in a 10x or more speedup.
My personal favorite: I was generating a query that pulled a dozen different fields out of a large JSONb column. Naturally, you would think Postgres would read the JSONb field once, then pull out the individual fields from it. Instead, Postgres was reading the JSONb column once per field. I figured out this was the case because the number of blocks read reported by EXPLAIN (ANALYZE, BUFFERS) went up proportionally to the number of fields I extracted from the JSONb column. Reading the JSONb column was especially expensive because Postgres needed to deTOAST[0] it.
The obvious fix is to write a subquery to read the entire JSONb column and then have the outer query extract the individual fields, but that doesn't work! Postgres will inline the subquery, basically undoing your attempt to prevent the unnecessary accesses. In the end, the solution wound up being to add OFFSET 0 to the end of the subquery. That doesn't change the semantics of the query, but it does prevent Postgres from inlining the JSONb column access.
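A rough sketch of the OFFSET 0 workaround described above (table and column names are invented):

```sql
SELECT s.j ->> 'field_a' AS field_a,
       s.j ->> 'field_b' AS field_b,
       s.j ->> 'field_c' AS field_c
FROM (
    SELECT payload AS j          -- deTOAST the JSONb column once here
    FROM events
    WHERE id = 42
    OFFSET 0                     -- stops Postgres from inlining the subquery
) s;
```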
Something you'll discover when you do enough query optimization is that postgres' query planner isn't clause-order invariant. i.e. a AND b won't necessarily give you the same query plan as b AND a.
My first reaction was that this was awful, but the more I thought about it, the more I was grateful that postgres (accidentally) gave me this knob to play with. Optimization of complex queries is a very tricky thing, and most engines won't do a good job of it 100% of the time. What I realized would have been a worse situation is if postgres sometimes picked a bad plan and there was little I could do to avoid it without reforming the query (a pain when you're generating the query with a query compiler already). Reordering the clauses effectively allowed me to ask the planner to roll the dice again.
Following this, I may or may not have gone on to build a cache for the system that timed queries and kept notes of "good" clause orders for common queries, resorting to random ones otherwise....
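For illustration, this is the kind of experiment the reordering amounts to, using an invented table; comparing the two EXPLAIN outputs is how you would see the planner choosing different plans for logically identical queries:

```sql
EXPLAIN SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
```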
A variation of this is that the query planner may produce a better plan if you add redundant constraints to the query that are logically implied by the constraints already there.
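A hypothetical example of that (not from the thread): the extra predicate below is implied by the join condition plus the filter on `c.id`, but it can open up an index on `orders(customer_id)` that the planner might not otherwise consider:

```sql
SELECT o.*
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42
  AND o.customer_id = 42;  -- logically redundant, but can improve the plan
```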
Personally, it feels like a bit of a fragile mess trying to trick a sometimes-clever-sometimes-dumb optimiser into doing what you want by subtle indirect hacks - because the interface doesn't give you a way to directly override bad automated optimiser decisions.
Seriously, I actually considered the fragility of it to be a positive, precisely because it wasn't a hard "optimizer decision" that I was forcing. If you force something like that, you've got to take full responsibility for it: the planner will lose all intelligence over that decision and no longer do the clever thing query planners do, namely take the current distribution of the database's data into account to make better decisions. When you upgrade the database and the query planner gets smarter (or dumber), you've got to re-evaluate whether that's still the right decision. I could imagine an app with a bunch of out-of-date hard-forced decisions being crippled performance-wise.
A look-aside of automatically deduced hints with a validity of less than a week seemed like a much lighter touch.
Totally get what you're saying. Ideally there would be a way for the user to supply additional information or constraints to the optimisation process, which would be used to influence the results without turning the optimisation process off completely. Although any scheme for doing that could, as you say, produce poor results if the distribution of data changes over time or the DB planner code is changed.
I used to work on a non-database decision support tool that incorporated a custom optimiser, which was used to spit out crude engineering designs for a particular kind of construction problem. The optimisation problem was difficult and the implementation to solve it was not state of the art: there were a few preprocessing stages that locked in some early decisions using heuristics -- which helped massively reduce the search space -- then a global optimisation approach was run on the remaining subproblem. The result was that the overall algorithm would locally optimise after perhaps locking in a bad early decision. Once the software was delivered to the client and in use by a small team of users, I discovered that the users had figured out that by running the software repeatedly with very small adjustments to input parameters (adjustments that should not obviously matter), they could bump the optimiser into outputting wildly different designs. It was a little bit like repeatedly pulling the handle on a one-armed bandit until it eventually gave you a decent output.
It was clever of our users to figure this out, but the overall UI/UX was appalling: they had to click and re-run a somewhat slow batch process until it produced a reasonable result. It would have been much better to give the users an interface where they could directly override or constrain parts of the engineering design problem in an ergonomic way, and then let the optimisation algorithm loose to make the remainder of the decisions.
Not to hand, but it's not amazingly hard to reproduce. You just have to build quite a complex query (many joins in our case) and play around with the clauses, watching the EXPLAIN output.
A CTE probably would have fixed it, but we weren't able to use them. We were using Citus which at the time didn't support CTEs.
Using a CTE to do this would also be suboptimal, since it materializes the entire result of the subquery in memory, as opposed to OFFSET 0, which only materializes one row at a time.
Here are a couple of things you can do in SQL Server to easily attain that in your garbage code:
1. Remove table variables, use temp tables.
2. Remove user-defined functions from your predicate; if the function's job is to return the same fixed subset, insert that subset into a temp table instead.
The first one can be done by effectively changing an @ to a # and mildly changing the create statements (from DECLARE to CREATE). Doing just this change, I took a 22+ hour query (didn't want to wait any longer) down to a <1 minute query.
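A minimal T-SQL sketch of the @-to-# change being described (table and column names are invented):

```sql
-- Before: a table variable (no statistics, often a poor plan)
DECLARE @ids TABLE (id int PRIMARY KEY);

-- After: a temp table (gets statistics, usually a much better plan)
CREATE TABLE #ids (id int PRIMARY KEY);
INSERT INTO #ids (id)
SELECT id FROM dbo.SomeBigTable WHERE Flagged = 1;
```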
This sounds like a bug in postgres (or a low-hanging-fruit optimization); adding the hacky offset 0 to bypass the inlining may help work around the bug now but may hinder future optimizations. My experience teaches me to avoid tricking an optimizer down a path. Write the query in the most easy-to-understand way to avoid massive future pain.
I would also avoid json/xml 'support', it's a recipe for unintended consequences.
I ran into that this week too! Actually, it was a filter that contained a Json path select and it appears that it’s evaluating the path for every row check (maybe because the path lookups can be more dynamic?).
To the author: if the timestamp is only a timestamp, it will not guarantee uniqueness. It may be sufficient for your use case, but unless there are other constraints on your system design, you may lose data (since rows with the same timestamp will be treated as duplicates and dropped).
If the timestamps are just wall-clock timestamps, you should be looking at the SQL TIMESTAMP type (without time zone), as putting a primary key on text fields is not optimal. Not 100% sure on PostgreSQL, but I would look at moving timestamp to TIMESTAMP and group to VARCHAR.
The author has a conference talk about CRDTs and the timestamp technique he uses. Really useful if anyone else wonders what the problem with wall-clock timestamps is.
EDIT: btw, out of all the comments on the timestamp here, it seems like yours is the only one not complaining about the format, and it even points out that the article hasn't yet gone into depth on the actual timestamp he uses.
"Inserting in a loop is a bad idea" would have been a more real title.
The moment your insertions can be done in a loop, _batch them_. Batch them, batch them, batch them. I can't stress this enough. Batch them.
Do keep in mind that your DBMS of choice can have a maximum number of bound variables (SQLite's default maximum is 999, if I remember correctly; it can quickly become an issue). If you fear this might happen, split into multiple batches: batch those batches.
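As a sketch of what batching looks like at the SQL level (placeholder syntax as used by node-postgres; table and columns are invented):

```sql
-- One round trip per chunk of rows instead of one per row.
-- Split into several statements once you approach the driver's
-- bound-variable limit.
INSERT INTO messages (ts, group_id, payload) VALUES
    ($1, $2, $3),
    ($4, $5, $6),
    ($7, $8, $9);
```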
> timestamp TEXT
> PRIMARY KEY(timestamp, ...)
Dear god no. Make it an actual `timestamp` type, or an int if you can't.
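A hedged sketch of what that change might look like (columns other than the quoted ones are invented, and other comments in the thread question whether a timestamp belongs in the key at all):

```sql
CREATE TABLE messages (
    ts       timestamptz NOT NULL,  -- a real timestamp type instead of TEXT
    group_id text        NOT NULL,
    payload  jsonb,
    PRIMARY KEY (ts, group_id)
);
```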
I'm 100% with you on everything you said. It's too bad I've seen many JavaScript devs do queries in loops, queries that could easily be batched. It's interesting to me what causes this low level of relational database understanding.
I regularly encounter code running off performance cliffs like this in the Rails application I'm working on, and it's incredibly difficult to climb the application back out of them...
Yes, it's a problem in Rails too. My current project is a rewrite (in Node) of a big Rails monolith that is full of synchronous HTTP requests and SQL queries in loops. The sad thing is the senior Node developers just port the code, seeing no problems with the aforementioned behaviour.
This (or some of it) seems like “shot in the dark” engineering where the author finds an error condition and instead of doing the work of finding root causes simply twiddles some values until things work.
For example, a 21MB upload with a 72MB INSERT statement (100k messages inserted) fails, but a 5MB upload with a 30MB INSERT statement (40k messages inserted) works, so the author just limits uploads to the smaller values and calls it a day. But clearly the new limits are still near the edge of the performance envelope, and without knowing the root cause of the error, how do they know it won't resurface under different conditions (more load on the server, less RAM allocated to the DB, a congested network, a fuller disk)? How do they know a config change from an upgrade, or from a fix to another problem, won't lead to another failure?
It saddens me to know in my gut that a lot of what passes for engineering happens this way. This is not proper debugging.
Training and culture are probably the biggest failures here, I think. Most devs are only equipped to add many print statements to things in terms of debugging, and most companies/dev leads/managers are very comfortable allowing arbitrary half-checked solutions to bugs or performance issues instead of actually understanding the problem.
> and most companies/dev leads/managers are very comfortable allowing arbitrary half-checked solutions
Not really true - they just don't have _any_ grasp on the problem aside from "Yeah I thought you were working on that last week, are you stuck on it? Why is it taking so long? - Oh so you do have a fix that works, cool so how soon can that be merged?"
Almost all of these types of situations lead back to miscommunication or someone in the discussion just not caring.
What is the purpose of storing the messages in the Merkle trie that is stored in the database along with each individual message? It seems to me that the trie here is the bottleneck, not Postgres. And am I reading it correctly that the entire trie is stored as a single blob and has to be modified and written back in its entirety? The whole architecture seems suspect, and its performance is likely to be terrible.
And yes multiple insert statements are better than individual ones and yes getting the IDs back is good. Different RDBMSs will handle this differently but batching means higher throughput at the cost of some latency which is often a worthwhile tradeoff.
You might unlock another 9x and fix that crash by using COPY to load your messages instead of generating arbitrarily large INSERT statements, and maybe a little more still by using appropriate data types instead of text for identifiers and dates.
It is much faster than INSERT indeed and I’m using it in places where I need to insert a lot of data into Postgres. Can you get RETURNING from COPY, though?
You can't RETURNING from a copy, but you can `CREATE TEMP TABLE foo ON COMMIT DROP`, `COPY` into foo, then `INSERT INTO bar RETURNING` from the temp table.
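A rough sketch of that pattern (table and column names are invented; `COPY ... FROM STDIN` here stands in for however your client streams the data):

```sql
BEGIN;

CREATE TEMP TABLE staging (LIKE messages INCLUDING DEFAULTS) ON COMMIT DROP;

COPY staging (ts, group_id, payload) FROM STDIN;  -- bulk-load into the temp table

INSERT INTO messages (ts, group_id, payload)
SELECT ts, group_id, payload
FROM staging
ON CONFLICT DO NOTHING
RETURNING *;

COMMIT;
```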
I don't believe you can... However, one of the pieces of code I'm quite proud of used a combination of C# reflection and Postgres built-in types to convert an arbitrary list of objects into a Postgres binary representation and then insert that in bulk. In our situation we were able to "reserve" primary keys by incrementing the sequence directly by the number of objects we were inserting (a separate statement beforehand) and assign the IDs before inserting, so whilst you can't get the IDs from using "returning", you can work around it. Hope that helps someone :-)
The usage of a timestamp (stored as a TEXT column!) as a primary key sets off every single alarm bell I have in my head. This is a shockingly poor decision, and a terrible way to enforce semantic uniqueness.
There are both kinds. A trie is a prefix tree, which has benefits for space efficiency when you want to store values with a lot of common prefixes. Ethereum uses a mix of a Patricia Trie with Merkle Tree to get a combination of both: https://medium.com/codechain/modified-merkle-patricia-trie-h...
OTOH, the wikipedia article they link to seems to imply a regular Merkle tree. I guess it's not possible to really tell without knowing which Merkle tree/trie library they use.
The correct way to do this, I believe, is to `CREATE TEMP TABLE to_insert ON COMMIT DROP`, then `COPY` into `to_insert` and `INSERT INTO target SELECT * FROM to_insert RETURNING ...`. This gives you all the perf benefit of `COPY` while still giving you the ability to use `RETURNING`.
Copy is great! I do a few hundred million row DB reload in a script. No real time pressure, but I did wonder, couldn't this be faster? COPY is really much nicer, I'm pretty sure I'm down to essentially disk read / network / disk write limits vs processing limits.
The COPY API in Postgres avoids a lot of the query parsing and other stuff that takes up CPU time. I have used it in production and have seen great results.
There are cases where the COPY API is not sufficient. For instance, it cannot do the ON CONFLICT DO NOTHING thing and instead fails the whole call. In that case we fall back to INSERT ... ON CONFLICT DO NOTHING; since in our application we do not expect too many duplicates, this works well for us.
There are nicer ways to do batch insert in postgres.
With `jsonb_to_recordset` there is only one parameter, which is an array, and it gets queried as a table. That has the advantage of avoiding different queries for different inputs (it can be prepared only once, it's harder to make mistakes...). Also it doesn't have the overhead of creating a huge string.
Another option is COPY. Using node-pg-copy-streams (https://github.com/brianc/node-pg-copy-streams) you can insert/query as much as you want from/to Node streams without a memory footprint. I've used it to pipe millions of records straight to an HTTP response (to download a CSV).
For the jsonb approach:
`INSERT INTO some_table (a, b) SELECT x.a, x.b FROM json_to_recordset('[{"a":1,"b":"foo"},{"a":"2","b":"bar"}]') AS x(a int, b text);`
That one is super flexible and it can handle thousands of records. Where the array literal is in the example, you would put $1 and pass the JSON as a parameter.
With streams it's trickier; you probably need to create a temporary table to pipe everything into and then another query to insert where you want. You also need to deal with async errors, which is a pain. But you have zero memory load. I've used it to insert/select millions of records.
```
var fs = require('fs');
var { Pool } = require('pg');
var copyFrom = require('pg-copy-streams').from;

var pool = new Pool();

pool.connect(function(err, client, done) {
    if (err) throw err;
    // Stream a TSV file into Postgres via COPY ... FROM STDIN
    var stream = client.query(copyFrom('COPY my_table FROM STDIN'));
    var fileStream = fs.createReadStream('some_file.tsv');
    fileStream.on('error', done);
    stream.on('error', done);
    stream.on('finish', done); // older pg-copy-streams versions emitted 'end' here
    fileStream.pipe(stream);
});
```
The number of times people I work with would greatly benefit from reading the table of contents of our DBMS manual (...me included) is actually staggering.
Read the fine manual of your database. Then read it again. Chances are you’ll be very surprised.
Full disclaimer - Timescale person here - but your use case seems like an example of what a time-series database is for, and given that TimescaleDB[0] is implemented as an extension to Postgres, it's a real obvious fit.
It's designed around the idea of time-series data. It should give you better insert and query performance since it's geared towards that workload. Aside from that, you could even reap the storage benefits using native compression that it supports.
The data is apparently for bookkeeping of some kind. Not really a time series in the usual sense. In fact relying on timestamps for uniqueness in an OLTP situation is optimistic bordering on problematic.
It's astonishing how poorly the author understands his tools, both relational databases in general and Postgres specifically. There are some "discoveries" here (both ones he made and others he needs to be told about, like how bad an idea it is to use a timestamp as a primary key) that would be in the first few chapters of a Postgres book.
> A much bigger problem is #2. We need the result of the insert to know if a row was inserted or not.
You can write an ON INSERT trigger and do the second insert there, or wrap the first insert in a CTE and then do the second insert selecting from the first one.
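A sketch of the CTE variant (table and column names are invented, assuming a primary key on `id`):

```sql
WITH inserted AS (
    INSERT INTO messages (id, body)
    VALUES (1, 'first'), (2, 'second')
    ON CONFLICT (id) DO NOTHING
    RETURNING id
)
INSERT INTO message_log (message_id)
SELECT id FROM inserted;
```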
What do you mean by “it’s implementation dependent”? RETURNING is a Postgres-specific clause, so of course its order is implementation dependent. There is only one Postgres, as far as I know.
They do wrap it in a transaction, they just omit the code for it:
> This is mostly the real code, the only difference is we also rollback the transaction on failure. It's extremely important that this happens in a transaction and both the messages and merkle trie are updated atomically.
I know Ethereum uses Merkle trees to store state, but this app already has Postgres, and I imagine it would be better architecturally not to write a big blob data structure back and forth. WITH RECURSIVE queries and the right table layout could make this entire thing way simpler to operate.