I was comparing it with Parquet, which is much more complex but has features, like row groups and pages, that let you access the data in less than O(n) time.
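For illustration, a minimal sketch of that kind of selective read (assuming pyarrow; the file name "events.parquet" and its "ts" column are hypothetical):

    import pyarrow.parquet as pq

    # Opening the file only reads the footer metadata, not the data.
    pf = pq.ParquetFile("events.parquet")

    # Pull one column out of one row group; the rest of the file is
    # never touched, so the read is far less than a full O(n) scan.
    table = pf.read_row_group(0, columns=["ts"])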



You mentioned NLJSON and CSV, which would require reading all columns from disk.
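By contrast (a minimal sketch; "events.csv" and its "ts" column are hypothetical), extracting even a single column from a CSV forces a full scan:

    import csv

    # Every byte of the file is read and every row fully parsed,
    # even though only one column is kept -- O(n) in the file size.
    with open("events.csv", newline="") as f:
        ts = [row["ts"] for row in csv.DictReader(f)]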


Yes, but you would usually have to read at least two columns anyway. What are the datasets that are too large to be ingested completely, but too small for a proper columnar format?

If ZSV is meant to occupy the gap between CSV/NLJSON (smaller datasets) and Parquet/DuckDB (larger datasets), this niche is actually really small, if not nonexistent.


Yes, it's unclear to me what the advantage is over Parquet with compression. And there are enough file formats flying around already.



