I wonder who will win in the long term between Brotli and zstd. http://facebook.github.io/zstd/



There are some design decisions in Brotli I just don't quite understand [1][2][3], like what's going on with its dictionary [2]. One of the Brotli authors is active in this thread, so perhaps they can talk about this.

Zstandard is pretty solid, but it lacks deployment in general-purpose web browsers. Firefox and Edge have followed Google's lead and have added, or are about to add, support for Brotli. Both Brotli and Zstandard see use behind the scenes, on the wire in custom protocols and the like.

As for widespread use on files sitting on disk, on perhaps average people's computers, I think we're quite a few years away from replacing containers and compressors that have been around for a long time and are still being used because of compatibility and a lack of pressure to switch to a non-backwards-compatible alternative [4].

[1] https://news.ycombinator.com/item?id=12010313 [2] https://news.ycombinator.com/item?id=12003131 [3] https://news.ycombinator.com/item?id=12400379 [4] https://news.ycombinator.com/item?id=13171374


> https://news.ycombinator.com/item?id=12003131

This is some sort of misunderstanding. If one replaces the static dictionary with zeros, one can easily benchmark Brotli without the static dictionary. Actually benchmarking it teaches two things:

1) With short (~50 kB) documents there is about a 7% saving because of the static dictionary. Even without it, there is still a 14% win over gzip.

2) There is no compression density advantage for long documents (1+ MB).

Brotli's savings come to a large degree from algorithmic improvements, not from the static dictionary.
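
A rough way to reproduce the gzip comparison (not the zeroed-dictionary build, which requires patching Brotli's source) is a sketch like this, using the third-party Python brotli bindings; the input file is an illustrative stand-in for a short web document:

    # Compare Brotli and gzip density on a short, web-like document.
    # Illustrative only; exact percentages depend on the input.
    import gzip

    import brotli  # pip install brotli

    data = open("page.html", "rb").read()  # ~50 kB HTML, for example

    gz = gzip.compress(data, compresslevel=9)
    br = brotli.compress(data, quality=11)

    print(f"original:   {len(data)} bytes")
    print(f"gzip -9:    {len(gz)} bytes")
    print(f"brotli q11: {len(br)} bytes "
          f"({100 * (1 - len(br) / len(gz)):.1f}% smaller than gzip)")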

> https://news.ycombinator.com/item?id=12010313

The transformations make the dictionary a bit more efficient without increasing its size. Of the ~7% savings that the dictionary brings, about 1.5 percentage points (~20% of the benefit) come from the transformations. However, the dictionary is 120 kB and the transformations less than 1 kB, so the transformations are more cost-efficient than the basic form of the dictionary.

> https://news.ycombinator.com/item?id=12400379

Brotli's dictionary was generated with a process that leads to the largest gain in entropy, i.e., every term and its ordering were chosen for the smallest size, considering how many bits it would have cost to express those terms using other features of Brotli. Even if the result looks disgusting or is difficult to understand, the process that generated it was quite delicate.

The same goes for the transforms, but there it was mostly the ordering that we iterated on, generating candidate transforms with a large variety of tools.


ZSTD.

It is superior to Brotli in most categories (decompression speed, compression ratio, and compression speed). The real issue with Brotli is the second-order context modeling (compression levels >8), which costs you ~50% of your compression speed for less than a ~1% gain in ratio [1].
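
Easy to check for yourself; a minimal timing sketch with the Python brotli bindings (the sample file is an assumption, pick anything representative):

    # Time Brotli just below and at the top of the quality range to see
    # the speed/ratio trade-off of the higher-effort modes.
    import time

    import brotli  # pip install brotli

    data = open("sample.tar", "rb").read()  # any representative input

    for q in (8, 9, 10, 11):
        t0 = time.perf_counter()
        out = brotli.compress(data, quality=q)
        dt = time.perf_counter() - t0
        print(f"quality {q:2}: {len(out):>9} bytes in {dt:.3f}s")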

I've spoken to the author about this on Twitter. They're planning to expand Brotli's dictionary features and context modeling in future versions.

Overall it isn't a bad algorithm. Brotli and ZSTD are head and shoulders above LZMA/LZMA2/XZ, pulling off comparable compression ratios in half to a quarter of the time [1]. They make gzip and bzip2 look outdated (and frankly, it's about time).

ZSTD really just needs a way to package dictionaries WITH archives.

[1] Based on personal benchmarks run while building a tar clone that supports zstd/brotli files: https://github.com/valarauca/car


What use case do you have in mind for packaging dictionaries with archives? There is an ongoing discussion about a jump table format that could contain dictionary locations [1].

[1] https://github.com/facebook/zstd/issues/395
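
For what it's worth, the per-chunk dictionary flow looks roughly like this with the python-zstandard bindings (paths, sizes, and levels are illustrative; training wants many small samples):

    # Train a zstd dictionary on many small samples, compress each
    # sample against it, and keep the dictionary bytes for packaging.
    import glob

    import zstandard as zstd  # pip install zstandard

    samples = [open(p, "rb").read() for p in glob.glob("chunks/*")]

    dict_data = zstd.train_dictionary(16 * 1024, samples)

    cctx = zstd.ZstdCompressor(level=19, dict_data=dict_data)
    frames = [cctx.compress(s) for s in samples]

    # The dictionary has to travel with the frames -- this is the
    # packaging gap the jump-table discussion could close.
    dict_bytes = dict_data.as_bytes()

    dctx = zstd.ZstdDecompressor(dict_data=dict_data)
    assert dctx.decompress(frames[0]) == samples[0]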


For large files (>1 GiB), a dictionary + archive is often smaller than the archive on its own.


How are you compressing the data?

I would expect a dictionary to be useful if the data is broken into chunks, and each chunk is compressed individually.

If the data is compressed as one frame, I would be very interested in an example where the dictionary helps.
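
Something like this sketch would show the difference (python-zstandard bindings; the file name and chunk size are made up):

    # Compare one whole-file frame against per-chunk frames that share
    # a trained dictionary. A dictionary should only pay off in the
    # chunked case; in a single frame earlier bytes already act as
    # context for later ones.
    import zstandard as zstd  # pip install zstandard

    data = open("big.bin", "rb").read()
    size = 64 * 1024
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    whole = zstd.ZstdCompressor(level=19).compress(data)

    d = zstd.train_dictionary(110 * 1024, chunks)
    cctx = zstd.ZstdCompressor(level=19, dict_data=d)
    chunked = sum(len(cctx.compress(c)) for c in chunks)

    print(f"single frame:   {len(whole)} bytes")
    print(f"chunked + dict: {chunked + len(d.as_bytes())} bytes (incl. dict)")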


In my benchmarks Brotli compresses more densely and typically compresses faster to a given density, but decompresses slower.

I benchmark with internet-like loads, not with 50-1000 MB compression research corpora.


When I last ran the numbers a few months ago [1], for the same time spent in the compressor, zstd almost always produced a smaller output than Brotli.

1. https://code.ivysaur.me/compression-performance-test/


For now at least, Brotli is the winner. It's already in the browsers.



