"The beautiful part of using PNG file to store parameters is that all optimization techniques available to PNG is now available to us. The generated PNG file can be further reduced with pngcrush -brute. In fact, 10% data file size reduced with pngcrush."
I thought this was a cool hack for reducing the memory size of the model. Is this an established technique for storing parameters? Are there known better ways to compress the model?
Compression works by making assumptions about the structure of the data; you can't compress random data, for example. In the case of images, it usually takes advantage of the fact that nearby pixels are very similar. Obviously that isn't true for NN parameters.
Yes, it is pretty noisy. My assumption is that PNG is pretty good at encoding 2D patterns too, and there are some patterns: https://github.com/liuliu/klaus/blob/master/ccv/ccvResources... This is a big hack, though. I am pretty sure there are better dictionary-based compression methods for the last step, but pngcrush already gives me a nice small gain :)
It's literally indistinguishable from random static: http://silverspaceship.com/static/shot_2.png My guess is that it may be making some small gains by learning something like "pixels are 20% more likely to be black than white."
Dictionary-based compression wouldn't work either, unless you think the NN is more likely than chance to produce sequences of bits that are exactly identical.
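Both claims above are easy to check with a toy experiment using only the standard library. The skewed byte distribution below is a made-up stand-in for a "some pixel values are slightly more common" model, and the gain it yields is indeed small:

```python
import random
import zlib

random.seed(0)
N = 1 << 16

# Uniform random bytes: DEFLATE can't do anything with these.
noise = bytes(random.randrange(256) for _ in range(N))

# Mildly skewed distribution: low byte values are 4x as likely as the rest.
# Entropy is about 7.9 bits/byte, so only a marginal gain is possible.
freq = [4 if v < 16 else 1 for v in range(256)]
skewed = bytes(random.choices(range(256), weights=freq, k=N))

print(len(zlib.compress(noise, 9)), len(zlib.compress(skewed, 9)))
```

The random stream comes out essentially uncompressed (slightly larger, in fact), while the skewed stream shrinks by only a percent or two.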
There may be regularities in deep NNs. For example, many learned features are essentially identical, just rotated or translated in different ways. This is because NNs aren't inherently invariant to rotations or translations, so they have to learn a separate feature for every possible variation. Another possible regularity is that the weights tend to cluster on specific regions of the input image and ignore the rest.
But this depends entirely on how the weights are projected into an image. The image provided looks like random static. The standard method is to create a separate image for every feature, where each pixel is the weight connecting that feature to the corresponding input pixel. Like this: http://fastml.com/images/deep-learning-made-easy/layer2visua...
Standard image compression software should be able to work with that. But this won't work in higher layers, because an image only has 3 or 4 color channels.
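A minimal sketch of that projection, assuming a flat weight vector per feature and a square input (the 28x28 width is my assumption, as for MNIST):

```python
def feature_to_image(weights, width=28):
    """Reshape one hidden unit's input weights into rows of a 2-D grid,
    so each grid cell is the weight from the corresponding input pixel."""
    return [weights[r * width:(r + 1) * width]
            for r in range(len(weights) // width)]

# One such image per feature in the layer; hypothetical all-zero layer here.
images = [feature_to_image(w) for w in [[0.0] * 784, [0.0] * 784]]
```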
In fact, since the first layer in that example is just Gabor filters, you could write a program to generate the Gabor filter units for that layer procedurally, and get massive compression. I don't know if that's true for all NNs, though.
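The idea is that a filter the network stores as size*size weights can be regenerated from a handful of parameters. A sketch of the standard Gabor formula (the parameter defaults here are illustrative guesses, not fitted to any real network):

```python
import math

def gabor_filter(size=11, theta=0.0, sigma=2.5, lam=5.0, gamma=0.5, psi=0.0):
    """Generate one Gabor filter analytically: a Gaussian envelope times a
    cosine carrier, oriented at angle theta with wavelength lam."""
    half = size // 2
    out = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            row.append(math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
                       * math.cos(2 * math.pi * xr / lam + psi))
        out.append(row)
    return out

# A first-layer "bank" then costs a few floats per filter instead of size*size weights:
bank = [gabor_filter(theta=k * math.pi / 8) for k in range(8)]
```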
CNN layers' parameters are not interesting for compression; they are very small compared to the massive fully connected layers.
In retrospect, the fully connected layers' regularity probably just reflects that some neurons are more dead than others.
You probably always want a dictionary-based (LZ-like) compression method as the last step, to see how much more you can squeeze out. Before that, quantization, residual quantization, and some clever transformations can probably go much further in compressing the fully connected layer parameters. I haven't explored any of these fancier methods yet.
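A sketch of that pipeline under simple assumptions: uniform 4-bit quantizers for both stages, zlib standing in for the LZ step. All function names here are mine, not the author's.

```python
import zlib

def quantize_uniform(ws, levels=16):
    """Coarse uniform quantizer: map floats onto `levels` integer codes."""
    lo, hi = min(ws), max(ws)
    step = (hi - lo) / (levels - 1) or 1.0  # avoid /0 for constant input
    return [round((w - lo) / step) for w in ws], lo, step

def residual_quantize(ws, levels=16):
    """Two-stage (residual) quantization: quantize, then quantize the leftover error."""
    c1, lo1, s1 = quantize_uniform(ws, levels)
    resid = [w - (lo1 + c * s1) for w, c in zip(ws, c1)]
    c2, lo2, s2 = quantize_uniform(resid, levels)
    return c1, c2, (lo1, s1, lo2, s2)

def pack_and_deflate(c1, c2):
    """Pack the two 4-bit code streams into bytes, then run an LZ-style
    coder (zlib here) as the final squeeze step."""
    packed = bytes((a << 4) | b for a, b in zip(c1, c2))
    return zlib.compress(packed, 9)
```

Decoding reverses the steps: each weight is reconstructed as lo1 + c1*s1 + lo2 + c2*s2, so the residual stage halves the quantization error for one extra code stream.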
Just finished running xz / gzip at their best settings on my quantized fully connected layer output, to check whether pngcrush's gain is an illusion. For the last two fully connected layers, pngcrush, xz, and gzip all produce files of about the same size, which suggests a very limited compression ratio. On the first fully connected layer, pngcrush (8.2M) is marginally better than xz (8.6M). So it's a reasonable choice for the last step if you don't want a library dependency on xz ;)
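That result is plausible because PNG's IDAT and gzip both use DEFLATE, while xz uses LZMA, and on near-memoryless data all of them are limited by the same entropy floor. The comparison can be reproduced with stdlib codecs on synthetic stand-in data (not the author's actual layer):

```python
import lzma
import random
import zlib

random.seed(42)
# Stand-in for quantized weights: bytes drawn uniformly from 64 levels,
# so about 6 bits of entropy per byte and little structure beyond that.
data = bytes(random.randrange(64) for _ in range(1 << 16))

deflated = zlib.compress(data, 9)    # the codec inside gzip and PNG's IDAT
xzed = lzma.compress(data, preset=9)  # the codec inside xz

print(len(data), len(deflated), len(xzed))
```

Both land near the ~6/8 entropy bound, so neither has a decisive edge on data like this; pngcrush's extra margin comes from its scanline filters, not a better entropy coder.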
Also, you can still compare neurons to see if any of them are very similar. Another thing that might work is pruning away many of the connections: set all of the connections with very small weights to zero, or encourage zero weights with something like weight decay. As a bonus, this might even improve generalization.
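A sketch of the magnitude-pruning idea on synthetic weights (the threshold and the data are assumptions). After pruning, the long runs of zero bytes are exactly what an LZ coder eats up:

```python
import random
import struct
import zlib

def prune(ws, threshold=0.05):
    """Magnitude pruning: zero out connections with near-zero weights."""
    return [0.0 if abs(w) < threshold else w for w in ws]

def serialize(ws):
    """Flat little-endian float32 blob, as a stand-in for a stored layer."""
    return b"".join(struct.pack("<f", w) for w in ws)

random.seed(7)
# Synthetic layer: half the weights are tiny, as dead-ish connections would be.
ws = [random.uniform(-1, 1) * (0.04 if i % 2 else 1.0) for i in range(4000)]

raw_size = len(zlib.compress(serialize(ws), 9))
pruned_size = len(zlib.compress(serialize(prune(ws)), 9))
print(raw_size, pruned_size)
```

The unpruned floats are nearly incompressible, while the pruned version shrinks substantially; a real implementation would store a sparse index + values format instead of relying on zlib to find the zeros.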
Yeah, that makes sense. He does say he achieved a 10% size reduction on the models.
"Obviously that isn't true for NN parameters." I have little experience with NN's but I wonder if that's true for deep NN's? Is there any gradual, image-like change in parameters in deep learning models?
I said above that my guess is it may be making some small gains by learning "pixels are 20% more likely to be black than white" or something like that. Which fits with the image he provided.
When I was playing with convolutional neural networks a while ago, I found the following tutorial to be a good introduction to the topic:
http://deeplearning.net/tutorial/lenet.html
If I don't feel like uploading my data to someone else's server? If I don't want to pay for a cloud service when the phone I already bought can do the job?