I had a Postgres database where the main index (160 GB) was larger than the entire equivalent ClickHouse database (60 GB). And between the partitioning and the natural keys, the primary key index in ClickHouse was about 20 KB per partition × ~1k partitions.
Now, it wasn't a good schema to start with, and there was maybe a factor of 3 or 4 in size that could have been squeezed out, but ClickHouse was a factor of 20 smaller on disk for what we were doing.
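Roughly the shape of table I'm talking about, as a loose sketch (table and column names are made up, and I'm going through the clickhouse-driver Python client here):

    # Minimal sketch of the table shape I mean -- names are invented, and this
    # assumes the clickhouse-driver Python client. The point is the sparse
    # primary index: one entry per granule (8192 rows by default), so it stays
    # tiny per partition even with a natural key as the ORDER BY.
    from clickhouse_driver import Client

    client = Client(host='localhost')
    client.execute('''
        CREATE TABLE IF NOT EXISTS events (
            event_date Date,
            device_id  String,
            metric     Float64
        )
        ENGINE = MergeTree
        PARTITION BY toYYYYMM(event_date)  -- coarse partitions; pick to taste
        ORDER BY (device_id, event_date)   -- natural key, no surrogate id
    ''')

Since the primary key only indexes one row per granule, even ~1k partitions at ~20 KB each add up to a few tens of MB total.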
At least in my experience, that's about when regular DBMSes kinda start to suck for ad-hoc queries. You can push them a bit further for non-analytical use cases if you're really careful and have prepared indexes that assist every query you make, but that's rarely a luxury you have in OLAP-land.
Probably, but ClickHouse has been zero-maintenance for me, plus my dataset is growing at 100–200 GB/month. ClickHouse's automatic compression makes me worry a lot less about disk space.
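If you're curious what the compression actually buys, here's a rough sketch of how to check compressed vs. uncompressed bytes per table (connection details are placeholders; this assumes the clickhouse-driver Python client):

    # Compare on-disk (compressed) size against raw (uncompressed) size
    # per table, using ClickHouse's system.parts table.
    from clickhouse_driver import Client

    client = Client(host='localhost')
    rows = client.execute('''
        SELECT
            table,
            formatReadableSize(sum(data_compressed_bytes))   AS on_disk,
            formatReadableSize(sum(data_uncompressed_bytes)) AS raw,
            round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
        FROM system.parts
        WHERE active
        GROUP BY table
        ORDER BY sum(data_compressed_bytes) DESC
    ''')
    for table, on_disk, raw, ratio in rows:
        print(table, on_disk, raw, ratio)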
For write-heavy workloads I find Postgres to be a dog, tbh. I use it everywhere but am anxious to try new tools.
For truly big data (terabytes per month) we rely on BigQuery. For smaller data that is more OLTP and write-heavy we use Postgres… but I think there is room in the middle.
Yes, but you're starting to get to the size where you need some real PG expertise to keep the wheels on. If your data keeps growing, CH will just work out of the box for a lot longer.