If this story is true the paper is not solid. Claims in the abstract and claim 3... | Hacker News

light_hue_1 on July 17, 2023 | on: Bad numbers in the “gzip beats BERT” paper?

If this story is true, the paper is not solid.

The claims in the abstract and claim 3 in the paper, as well as much of the publicity around the paper, are just wrong.

It takes gzip from being great out of domain to being middling at best. It goes from something really interesting to a "meh" model. The main part that was intellectually interesting is how robust gzip is out of domain; if that's gone, there isn't much here.

If I were the reviewer for this paper, this would take the paper from an accept to a "submit to a workshop".

Also, kNN methods are slow: O(n^2).

ivirshup on July 17, 2023

kNN methods are broadly not O(n^2)[1], especially in practice where approximate methods are used.

[1]: https://en.wikipedia.org/wiki/Nearest_neighbor_search

huac on July 17, 2023

How would you build an index over the gzip-encoded data? It seems quite different from building indices over vector embeddings.
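
To make the indexing question concrete, here is a minimal sketch of brute-force kNN over a compression-based distance. It is an illustration and not the paper's code. It assumes the distance is the normalized compression distance (NCD) as it is commonly defined, and that the two texts are joined with a space. The names `clen`, `ncd`, and `knn_predict` are hypothetical. The distance is computed by compressing each (query, training item) pair together, so there is no fixed per-item vector to put in a standard vector index.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance (assumed definition, see lead-in)."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # joining with a space is an assumption
    return (cab - min(ca, cb)) / max(ca, cb)


def knn_predict(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Brute-force kNN: one joint compression per training item, per query."""
    ranked = sorted(train, key=lambda item: ncd(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    train = [
        ("the cat sat on the mat and the cat purred loudly all day", "animals"),
        ("stock markets fell as interest rates rose sharply today", "finance"),
    ]
    print(knn_predict("the dog sat on the mat and barked loudly", train, k=1))
```

Each query costs one compression per training item, so classifying m queries against n training items takes m * n compressions in this brute-force form.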
