Long as hell post with a clickbait title documenting how the author found out (1) one can insert multiple rows with a single INSERT statement; (2) the RETURNING clause. Spend your time elsewhere if you’re already aware; maybe pick up a SQL textbook if you aren’t...
This is not an uncharitable read btw. Actual quotes:
> The first thing I learned was how to insert multiple rows with a single INSERT statement: ...
> Scouring the docs I discovered the RETURNING clause of an INSERT statement.
I enjoyed trying to guess what the "one word" he was going to discover was. I was mostly right!
That said, I was unaware of the big discovery in the post: when you do INSERT ... ON CONFLICT DO NOTHING ... RETURNING and pass multiple rows to insert, the RETURNING clause returns only the values from the rows that were actually inserted. IMO this is the "obvious" behavior, but it is still worthwhile to know it works!
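For reference, a minimal sketch of that statement shape (table and column names are invented, and it assumes `messages` has a primary key on `id`):

```sql
-- Only rows that were actually inserted come back; rows skipped by
-- ON CONFLICT DO NOTHING produce no RETURNING output.
INSERT INTO messages (id, body)
VALUES (1, 'first'), (2, 'second'), (3, 'third')
ON CONFLICT (id) DO NOTHING
RETURNING id;
```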
> IMO this is the "obvious" behavior, but it is still worthwhile to know it works!
I think this is why people love Postgres so much. 99 times out of 100, the "obvious" behaviour turns out to be the behaviour that Postgres actually implements.
Well, it's not just the obvious behaviour; it seems like the only useful behaviour, just as the default is returning the number of affected rows (which for an insert would be the number of inserts). You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
> You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
In simple CRUD cases, yes. In many other scenarios you might not. If your insert is driven by a SELECT statement, with joins and WHERE clauses, you can easily insert too much data, or no data at all.
The RETURNING semantics produce the content of the resulting row after insertion. For the rows that have been skipped, this could be ill-defined. The content is not just the row data that was supplied, but potentially server-generated content from sequences, row-level triggers, etc.
> You know ahead of time how many items you're trying to insert, it would be completely pointless to return that.
I often use "insert into mytable(...) select ... from ... where ...", and I know I only want one row inserted.
In such cases, checking that exactly one row was inserted is a very nice safeguard against a buggy query or code, or against the freak case where someone messed with the DB.
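As a concrete sketch of that safeguard (table and column names are made up; in application code you would check the driver's reported row count instead):

```sql
DO $$
DECLARE
    n integer;
BEGIN
    INSERT INTO audit_log (account_id, amount)
    SELECT a.id, t.amount
    FROM accounts a
    JOIN transfers t ON t.account_id = a.id
    WHERE t.id = 123;

    -- Guard against a buggy query inserting zero rows or several rows.
    GET DIAGNOSTICS n = ROW_COUNT;
    IF n <> 1 THEN
        RAISE EXCEPTION 'expected exactly 1 row inserted, got %', n;
    END IF;
END $$;
```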
I've been looking for a way to bulk insert in Postgres, but with the ability to ignore any error for any single row (including FK/unique constraint/check constraint/post-insert trigger/whatever) and, after it finishes, get back the list of rows that failed to insert. I haven't found any way to do this yet. Does anyone have any ideas?
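One workaround, not suggested in the thread and slower than a true bulk insert: loop over the rows in PL/pgSQL and trap errors per row, so each failure only rolls back its own sub-transaction. The table names here (`staging`, `target`, `failed_rows`) are hypothetical:

```sql
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT id, payload FROM staging LOOP
        BEGIN
            INSERT INTO target (id, payload) VALUES (r.id, r.payload);
        EXCEPTION WHEN OTHERS THEN
            -- the BEGIN/EXCEPTION block acts as a savepoint, so only this
            -- row's insert is rolled back; record the failure instead
            INSERT INTO failed_rows (id, payload) VALUES (r.id, r.payload);
        END;
    END LOOP;
END $$;
```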
I didn't know about RETURNING and enjoyed the post. It's a much nicer way to learn than via a textbook. A blog post doesn't have to be novel to have value.
That said, I'm with you on the clickbait title. That really grinds my gears.
If you never had a formal introduction to SQL, you can probably get a lot of value from a run-of-the-mill SQL textbook by scanning the table of contents and skimming the chapters/sections on topics new to you. (For the record, I learned SQL myself through fumbling. My first schema designs didn’t even have relations.)
The problem with this blog post isn’t lack of novelty; rather, with the clickbait title, it piqued my interest but I ended up learning nothing, other than being reminded of my own terrible designs and queries back when I was a noob.
All that is well and good, but RETURNING is a Postgres feature, not part of the SQL standard. I don't think a standard SQL textbook (especially one targeting something like MySQL/SQLite) would have taught the author about it.
I was quite surprised when I found out that RETURNING is actually a Postgres feature, and not standard. It just felt like such an obvious and extremely useful thing to have.
Agreed. I would expect to learn about more edge cases here on HN. These are basic PG features you can see when looking at the INSERT definition in the docs.
I second what you said about reading a textbook. I read through the entire PG documentation and a PG book in my second year as an engineer, and 7 years later it's paid off big time. It's one of those things where if you do it early, you get compounding benefits over your career.
Once you have a basic understanding of SQL, I found it really informative to go closer to the source, e.g. C. J. Date and Hugh Darwen's "A Guide to the SQL Standard", or perhaps other books by them.
There's a lot of incomplete and misunderstood information floating around in blog tutorials or potboiler textbooks.
I've personally experienced cases where making a slight change to a query in a non-obvious way resulted in a 10x or more speedup.
My personal favorite: I was generating a query that pulled a dozen different fields out of a large JSONb column. Naturally, you would think Postgres would read the JSONb field once, then pull out the individual fields from it. Instead, Postgres was reading the JSONb column once per field. I figured out this was the case because the number of blocks read reported by EXPLAIN (ANALYZE, BUFFERS) went up proportionally to the number of fields I extracted from the JSONb column. Reading the JSONb column was especially expensive because Postgres needed to deTOAST[0] it.
The obvious fix is to write a subquery to read the entire JSONb column and then have the outer query extract the individual fields, but that doesn't work! Postgres will inline the subquery, basically undoing your attempt to prevent the unnecessary accesses. In the end, the solution wound up being to add OFFSET 0 to the end of the subquery. That doesn't change the semantics of the query, but it does prevent Postgres from inlining the JSONb column access.
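A rough sketch of the OFFSET 0 workaround described above (table and column names are invented):

```sql
SELECT s.j ->> 'field_a' AS field_a,
       s.j ->> 'field_b' AS field_b,
       s.j ->> 'field_c' AS field_c
FROM (
    SELECT payload AS j          -- deTOAST the JSONb column once here
    FROM events
    WHERE id = 42
    OFFSET 0                     -- stops Postgres from inlining the subquery
) s;
```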
Something you'll discover when you do enough query optimization is that postgres' query planner isn't clause-order invariant. i.e. a AND b won't necessarily give you the same query plan as b AND a.
My first reaction was that this was awful, but the more I thought about it, the more I was grateful that postgres (accidentally) gave me this knob to play with. Optimization of complex queries is a very tricky thing, and most engines won't do a good job of it 100% of the time. What I realized would have been a worse situation is if postgres sometimes picked a bad plan and there was little I could do to avoid it without reforming the query (a pain when you're generating the query with a query compiler already). Reordering the clauses effectively allowed me to ask the planner to roll the dice again.
Following this, I may or may not have gone on to build a cache for the system that timed queries and kept notes of "good" clause orders for common queries, resorting to random ones otherwise....
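For illustration, this is the kind of experiment the reordering amounts to, using an invented table; comparing the two EXPLAIN outputs is how you would see the planner choosing different plans for logically identical queries:

```sql
EXPLAIN SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';
```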
A variation of this is that the query planner may produce a better plan if you add redundant constraints to the query that are logically implied by the constraints already there.
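A hypothetical example of that (not from the thread): the extra predicate below is implied by the join condition plus the filter on `c.id`, but it can open up an index on `orders(customer_id)` that the planner might not otherwise consider:

```sql
SELECT o.*
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42
  AND o.customer_id = 42;  -- logically redundant, but can improve the plan
```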
Personally, it feels like a bit of a fragile mess trying to trick a sometimes-clever-sometimes-dumb optimiser into doing what you want by subtle indirect hacks - because the interface doesn't give you a way to directly override bad automated optimiser decisions.
Seriously, I actually considered the fragility of it to be a positive, precisely because it wasn't a hard "optimizer decision" that I was forcing. If you force something like that, you've got to take full responsibility for it: the planner will lose all intelligence over that decision and no longer do the clever thing query planners do, namely take the current distribution of the database's data into account to make better decisions. When you upgrade the database and the query planner gets smarter (or dumber), you've got to re-evaluate whether that's still the right decision. I could imagine an app with a bunch of out-of-date hard-forced decisions being crippled performance-wise.
A look-aside of automatically deduced hints with a validity of less than a week seemed like a much lighter touch.
Totally get what you're saying. Ideally there would be a way for the user to supply additional information or constraints to the optimisation process, which would be used to influence the results without turning the optimisation process off completely. Although any scheme for doing that could, as you say, produce poor results if the distribution of data changes over time or the DB planner code is changed.
I used to work on a non-database decision support tool that incorporated a custom optimiser, which was used to spit out crude engineering designs for a particular kind of construction problem. The optimisation problem was difficult and the implementation to solve it was not state of the art: there were a few preprocessing stages that locked in some early decisions using heuristics -- which helped massively reduce the search space -- then a global optimisation approach was run on the remaining subproblem. The result was that the overall algorithm would locally optimise after perhaps locking in a bad early decision. Once the software was delivered to the client and in use by a small team of users, I discovered that the users had figured out that by running the software repeatedly with very small adjustments to input parameters (adjustments that should not obviously matter), they could bump the optimiser into outputting wildly different designs. It was a little bit like repeatedly pulling the handle on a one-armed bandit until it eventually gave you a decent output.
It was clever of our users to figure this out, but the overall UI/UX was appalling: they had to click and re-run a somewhat slow batch process until it produced a reasonable result. It would have been much better to give the users an interface where they could directly override or constrain parts of the engineering design problem in an ergonomic way, and then let the optimisation algorithm loose to make the remainder of the decisions.
Not to hand, but it's not amazingly hard to reproduce. You just have to build quite a complex query (many joins in our case) and play around with the clauses, watching the EXPLAIN output.
A CTE probably would have fixed it, but we weren't able to use them. We were using Citus which at the time didn't support CTEs.
Using a CTE to do this would also be suboptimal, since it materializes the entire result of the subquery in memory, as opposed to OFFSET 0, which only materializes one row at a time.
Here are a couple of things you can do in SQL Server to easily attain that in your garbage code:
1. Remove table variables, use temp tables.
2. Remove user-defined functions from your predicate; if the function's job is to return the same fixed subset, insert that subset into a temp table instead.
The first one can be done by effectively changing an @ to a # and mildly changing the create statements (from DECLARE to CREATE). Doing just this change, I took a 22+ hour query (didn't want to wait any longer) down to a <1 minute query.
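A minimal T-SQL sketch of the @-to-# change being described (table and column names are invented):

```sql
-- Before: a table variable (no statistics, often a poor plan)
DECLARE @ids TABLE (id int PRIMARY KEY);

-- After: a temp table (gets statistics, usually a much better plan)
CREATE TABLE #ids (id int PRIMARY KEY);
INSERT INTO #ids (id)
SELECT id FROM dbo.SomeBigTable WHERE Flagged = 1;
```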
This sounds like a bug in postgres (or a low-hanging-fruit optimization); adding the hacky offset 0 to bypass the inlining may help work around the bug now but may hinder future optimizations. My experience teaches me to avoid tricking an optimizer down a path. Write the query in the most easy-to-understand way to avoid massive future pain.
I would also avoid json/xml 'support', it's a recipe for unintended consequences.
I ran into that this week too! Actually, it was a filter that contained a Json path select and it appears that it’s evaluating the path for every row check (maybe because the path lookups can be more dynamic?).
To the author: if the timestamp is only a timestamp, it will not guarantee uniqueness. It may be sufficient for your use case, but unless there are other constraints on your system design, you may lose data (since rows with the same timestamp will be treated as duplicates and dropped).
If the timestamps are just wall-clock timestamps, you should be looking at the SQL TIMESTAMP type (without time zone), as putting a primary key on text fields is not optimal. Not 100% sure on PostgreSQL, but I would look at moving timestamp to TIMESTAMP and group to VARCHAR.
The author has a conference talk about CRDTs and the timestamp technique he uses. Really useful if anyone else wonders what the problem with wall-clock timestamps is.
EDIT: btw, out of all the comments on the timestamp here, it seems like yours is the only one not complaining about the format, and it even points out that the article hasn't yet gone into depth on the actual timestamp he uses.
"Inserting in a loop is a bad idea" would have been a more real title.
The moment your insertions can be done in a loop, _batch them_. Batch them, batch them, batch them. I can't stress this enough. Batch them.
Do keep in mind that your DBMS of choice can have a maximum number of bound variables (SQLite's default maximum is 999, if I remember correctly; it can quickly become an issue). If you fear this might happen, split into multiple batches: batch those batches.
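As a sketch of what batching looks like at the SQL level (placeholder syntax as used by node-postgres; table and columns are invented):

```sql
-- One round trip per chunk of rows instead of one per row.
-- Split into several statements once you approach the driver's
-- bound-variable limit.
INSERT INTO messages (ts, group_id, payload) VALUES
    ($1, $2, $3),
    ($4, $5, $6),
    ($7, $8, $9);
```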
> timestamp TEXT
> PRIMARY KEY(timestamp, ...)
Dear god no. Make it an actual `timestamp` type, or an int if you can't.
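A hedged sketch of what that change might look like (columns other than the quoted ones are invented, and other comments in the thread question whether a timestamp belongs in the key at all):

```sql
CREATE TABLE messages (
    ts       timestamptz NOT NULL,  -- a real timestamp type instead of TEXT
    group_id text        NOT NULL,
    payload  jsonb,
    PRIMARY KEY (ts, group_id)
);
```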
I'm 100% with you on everything you said. It's too bad I've seen many JavaScript devs do queries in loops, queries that could easily be batched. It's interesting to me what causes this low level of relational database understanding.
I regularly encounter code running off performance cliffs like this in the Rails application I'm working on, and it's incredibly difficult to climb the application back out of them...
Yes, it's a problem in Rails too. My current project is a rewrite (in Node) of a big Rails monolith that is full of synchronous HTTP requests and SQL queries in loops. The sad thing is the senior Node developers just port the code, seeing no problems with the aforementioned behaviour.
This (or some of it) seems like “shot in the dark” engineering where the author finds an error condition and instead of doing the work of finding root causes simply twiddles some values until things work.
For example, a 21MB upload with a 72MB INSERT statement (100k messages inserted) fails, but a 5MB upload with a 30MB INSERT statement (40k messages inserted) works, so the author just limits uploads to the smaller values and calls it a day. But clearly the new limits are still near the edge of the performance envelope, and without knowing the root cause of the error, how do they know it won't resurface under different conditions (more load on the server, less RAM allocated to the DB, a congested network, a fuller disk)? How do they know a config change from an upgrade, or from a fix to another problem, won't lead to another failure?
It saddens me to know in my gut that a lot of what passes for engineering happens this way. This is not proper debugging.
Training and culture are probably the biggest failures here, I think. Most devs are only equipped to add many print statements to things in terms of debugging, and most companies/dev leads/managers are very comfortable allowing arbitrary half-checked solutions to bugs or performance issues instead of actually understanding the problem.
> and most companies/dev leads/managers are very comfortable allowing arbitrary half-checked solutions
Not really true - they just don't have _any_ grasp on the problem aside from "Yeah I thought you were working on that last week, are you stuck on it? Why is it taking so long? - Oh so you do have a fix that works, cool so how soon can that be merged?"
Almost all of these types of situations lead back to miscommunication or someone in the discussion just not caring.
What is the purpose of storing the messages in the Merkle trie that is stored in the database along with each individual message? It seems to me that the trie here is the bottleneck, not Postgres. And am I reading it correctly that the entire trie is stored as a single blob and has to be modified and written back in its entirety? The whole architecture seems suspect, and its performance is likely to be terrible.
And yes multiple insert statements are better than individual ones and yes getting the IDs back is good. Different RDBMSs will handle this differently but batching means higher throughput at the cost of some latency which is often a worthwhile tradeoff.
You might unlock another 9x and fix that crash by using COPY to load your messages instead of generating arbitrarily large INSERT statements, and maybe a little more still by using appropriate data types instead of text for identifiers and dates.
It is much faster than INSERT indeed and I’m using it in places where I need to insert a lot of data into Postgres. Can you get RETURNING from COPY, though?
You can't RETURNING from a copy, but you can `CREATE TEMP TABLE foo ON COMMIT DROP`, `COPY` into foo, then `INSERT INTO bar RETURNING` from the temp table.
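A rough sketch of that pattern (table and column names are invented; `COPY ... FROM STDIN` here stands in for however your client streams the data):

```sql
BEGIN;

CREATE TEMP TABLE staging (LIKE messages INCLUDING DEFAULTS) ON COMMIT DROP;

COPY staging (ts, group_id, payload) FROM STDIN;  -- bulk-load into the temp table

INSERT INTO messages (ts, group_id, payload)
SELECT ts, group_id, payload
FROM staging
ON CONFLICT DO NOTHING
RETURNING *;

COMMIT;
```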
I don't believe you can... However, one of the pieces of code I'm quite proud of used a combination of C# reflection and Postgres built-in types to convert an arbitrary list of objects into a Postgres binary representation and then insert that in bulk. In our situation we were able to "reserve" primary keys by incrementing the sequence directly by the number of objects we were inserting (a separate statement beforehand) and assign the IDs before inserting, so whilst you can't get the IDs from using "returning", you can work around it. Hope that helps someone :-)
The usage of a timestamp (stored as a TEXT column!) as a primary key sets off every single alarm bell I have in my head. This is a shockingly poor decision, and a terrible way to enforce semantic uniqueness.
There are both kinds. A trie is a prefix tree, which has benefits for space efficiency when you want to store values with a lot of common prefixes. Ethereum uses a mix of a Patricia Trie with Merkle Tree to get a combination of both: https://medium.com/codechain/modified-merkle-patricia-trie-h...
OTOH, the wikipedia article they link to seems to imply a regular Merkle tree. I guess it's not possible to really tell without knowing which Merkle tree/trie library they use.
The correct way to do this, I believe, is to `CREATE TEMP TABLE to_insert ON COMMIT DROP`, then `COPY` into `to_insert` and `INSERT INTO target SELECT * FROM to_insert RETURNING ...`. This gives you all the perf benefit of `COPY` while still giving you the ability to use `RETURNING`.
Copy is great! I do a few hundred million row DB reload in a script. No real time pressure, but I did wonder, couldn't this be faster? COPY is really much nicer, I'm pretty sure I'm down to essentially disk read / network / disk write limits vs processing limits.
The COPY API in Postgres avoids a lot of the query parsing and other stuff that takes up CPU time. I have used it in production and have seen great results.
There are cases where the COPY API is not sufficient. For instance, it cannot do the ON CONFLICT DO NOTHING thing and instead fails the whole call. In that case we fall back to INSERT ... ON CONFLICT DO NOTHING; since in our application we do not expect too many duplicates, this works well for us.
There are nicer ways to do batch insert in postgres.
With `jsonb_to_recordset` there is only one parameter, which is an array, and it gets queried as a table. That has the advantage of avoiding different queries for different inputs (it can be prepared only once, it's harder to make mistakes...). Also it doesn't have the overhead of creating a huge string.
Another option is COPY. Using node-pg-copy-streams (https://github.com/brianc/node-pg-copy-streams) you can insert/query as much as you want from/to Node streams without a memory footprint. I've used it to pipe millions of records straight to an HTTP response (to download a CSV).
For the jsonb approach:
`INSERT INTO some_table (a, b) SELECT x.a, x.b FROM json_to_recordset('[{"a":1,"b":"foo"},{"a":"2","b":"bar"}]') AS x(a int, b text);`
That one is super flexible and it can handle thousands of records. Where the array literal is in the example, you would put $1 and pass the JSON as a parameter.
With streams it's trickier; you probably need to create a temporary table to pipe everything into and then another query to insert where you want. You also need to deal with async errors, which is a pain. But you have zero memory load. I've used it to insert/select millions of records.
```
var fs = require('fs');
var { Pool } = require('pg');
var copyFrom = require('pg-copy-streams').from;

var pool = new Pool();

pool.connect(function(err, client, done) {
    if (err) throw err;
    // Stream a TSV file into Postgres via COPY ... FROM STDIN
    var stream = client.query(copyFrom('COPY my_table FROM STDIN'));
    var fileStream = fs.createReadStream('some_file.tsv');
    fileStream.on('error', done);
    stream.on('error', done);
    stream.on('finish', done); // older pg-copy-streams versions emitted 'end' here
    fileStream.pipe(stream);
});
```
The number of times people I work with would greatly benefit from reading the table of contents of our DBMS manual (...me included) is actually staggering.
Read the fine manual of your database. Then read it again. Chances are you’ll be very surprised.
Full disclaimer - Timescale person here - but your use case seems like an example of what a time-series database is for, and given that TimescaleDB[0] is implemented as an extension to Postgres, it's a real obvious fit.
It's designed around the idea of time-series data. It should give you better insert and query performance since it's geared towards that workload. Aside from that, you could even reap the storage benefits using native compression that it supports.
The data is apparently for bookkeeping of some kind. Not really a time series in the usual sense. In fact relying on timestamps for uniqueness in an OLTP situation is optimistic bordering on problematic.
It's astonishing how poorly the author understands his tools, both relational databases in general and Postgres specifically. There are some "discoveries" here (both ones he made and others he needs to be told about, like how bad an idea it is to use a timestamp as a primary key) that would be in the first few chapters of a Postgres book.
> A much bigger problem is #2. We need the result of the insert to know if a row was inserted or not.
You can write an ON INSERT trigger and do the second insert there, or wrap the first insert in a CTE and then do the second insert selecting from the first one.
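A sketch of the CTE variant (table and column names are invented, assuming a primary key on `id`):

```sql
WITH inserted AS (
    INSERT INTO messages (id, body)
    VALUES (1, 'first'), (2, 'second')
    ON CONFLICT (id) DO NOTHING
    RETURNING id
)
INSERT INTO message_log (message_id)
SELECT id FROM inserted;
```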
What do you mean by “it’s implementation dependent”? RETURNING is a Postgres-specific clause, so of course its order is implementation dependent. There is only one Postgres, as far as I know.
They do wrap it in a transaction, they just omit the code for it:
> This is mostly the real code, the only difference is we also rollback the transaction on failure. It's extremely important that this happens in a transaction and both the messages and merkle trie are updated atomically.
I know Ethereum uses Merkle trees to store state, but this app already has Postgres, and I imagine it would be better architecturally not to write a big blob data structure back and forth. WITH RECURSIVE queries and the right table layout could make this entire thing way simpler to operate.