PostgreSQL and ZFS is a marriage made in heaven. Block aligns to disk behaviour,...

smilliken · on April 3, 2020

FWIW, I've often called out my decision to use PostgreSQL and ZFS (on Linux) as my worst.

Memory usage was extremely inefficient. ARC had to be set to half of what it should've been because it varied so dramatically, so half the system memory was wasted. ZFS would occasionally exhaust system memory causing a block on allocations for 10+ minutes almost daily, no ssh or postgresql connections could be opened in the meantime. Sometimes a zfs kernel process would get stuck in a bad state and require restarting the server. Many days were wasted testing different raid schemes to work around dramatic space inefficiencies (2-8x IIRC) with the wrong combinations of using disk block sizes, zfs record sizes, postgresql block sizes, and zfs record compression. Because zfs records are atomic and larger than disk pages, writes have to be prefaced with reads for the other disk pages, adding lots of latency and random IOPS load. Bunch of other issues, I could go on.

I've since switched back to ext4 and hardware RAID. Median and average latency dropped an order of magnitude. 99th percentile latency dropped 2 orders of magnitude.

These databases are at high load. If they had low load, and I wasn't expecting to grow into high load, I'd consider ZFS since it does have a bunch of nice features.

ggm · on April 3, 2020

I'm on a Read-mostly DB. Also, I have SSD backed ARC and its recent ZFS with a lot of memory (512+GB IIRC)

smilliken · on April 5, 2020

Same here, latest ZFS (on Linux), 512GB RAM, SSD backed, but write heavy.

yjftsjthsd-h · on April 3, 2020

And compression! I've gotten compression ratios above 3x on real production databases, which is the difference between "we need more disks by next month" and "this hardware will last the foreseeable future". Not to mention that it improves performance when lz4 can decompress faster than the disks can read.

od0 · on April 4, 2020

I'm getting almost 4x on a 10TB production database! A little surprising the first time you see all the z_wr_iss processes flying :)

pritambarhate · on April 3, 2020

Can you please give us more info about your workload? Is it very high throughput DB? Especially for Inserts and Updates? I searched on ZFS and PostgreSQL performance a couple of years ago on the Internet and popular opinion seems to be databases with Copy on Write type of file systems don't give the best performance.

riku_iki · on April 3, 2020

I am not deep expert, but my impression is that Postgres MVCC is kinda Copy on Write approach already (no in place updates, but you write new page for each update), so it should be perfect marriage..

ggm · on April 3, 2020

I'm on a read-mostly DB. few clients, but constant loads. More inserts than updates.