Hacker News new | past | comments | ask | show | jobs | submit login

I wonder what the probability that a files is not compressible is.

Even more, I wonder if the likelihood that a file is compressible increases, decreases, or is constant with relation to file size.

My assumption would be that 50% of files are not compressible at a given size, and that compressibility doesn't vary with file size.




> My assumption would be that 50% of files are not compressible at a given size, and that compressibility doesn't vary with file size.

This would mean that say, for a 2 MB random file there's 75% chance that at least one of its halves is compressible but only 50% chance that the file as a whole is compressible. Not impossible but suspicious if you ask me.


Ah, but let me go through the possibilities.

Neither side compressed: okay the file is the original length, let's move on

Both sides compressed: okay you split it in half and decompress each side, you found the lucky one in four two-bit savings

One side compressed: uh oh, you don't know which half compressed. if you want to decompress you're going to need extra info to know where the first half ends.

You could add an extra bit to store which side compressed but then you lose the compression. You could decide to only accept situations where side A compressed, but then you're back to only half the files compressing. etc. There is no way to take advantage of splitting random data into variable-sized chunks because storing the boundaries needs just as many bits.


No, it would mean for a 2mb random file, there's a 50% chance that at least one of its halves is compressible.


Do you also state that there's only 50% chance of having at least one tails when throwing two (edit:unbiased) coins?


The bigger the file size, the larger the chance that it has been generated in a pseudorandom way, meaning it is in theory compressible. On the flipside, in a larger file finding patterns from that pseudorandomity could take longer.





Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: