Creating a modifiable gzipped disk image (rwmj.wordpress.com)
42 points by pabs3 on Dec 4, 2022 | 13 comments



Assuming your decompressor is as capable as gzip(1) is, you don't need to write your own compressor or rely on block alignment, because:

• pigz (https://github.com/madler/pigz) has -0 for no-compression compression;

• you can concat multiple gzip files together.

  $ echo foo | gzip > foobar.gz
  $ echo bar | pigz -0 >> foobar.gz
  $ echo baz | gzip >> foobar.gz
  $ hd foobar.gz  | grep -C1 -w bar
  00000010  a8 65 32 7e 04 00 00 00  1f 8b 08 00 70 bf 8c 63  |.e2~........p..c|
  00000020  00 03 01 04 00 fb ff 62  61 72 0a e9 b3 a2 04 04  |.......bar......|
  00000030  00 00 00 1f 8b 08 00 00  00 00 00 00 03 4b 4a ac  |.............KJ.|
  $ gzip -d < foobar.gz 
  foo
  bar
  baz
This also solves the CRC problem, because pigz computes the checksum for you.


Really interesting - I didn't know you could concatenate gzip files, and indeed this would have worked better.

Another thing I found out while researching this earlier is the pigz -i / --independent option. It lets you create a seekable gzip file, albeit one which cannot easily be indexed except by brute-force searching the whole file for the block boundaries (so it's not as useful as the seekable xz we use here: https://libguestfs.org/nbdkit-xz-filter.1.html)
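
For example, something like this produces that kind of file (the file name and 512 KiB block size are just placeholders):

  $ pigz -i -b 512 -k disk.img      # writes disk.img.gz with independently compressed blocks
  $ gzip -t disk.img.gz && echo ok  # still a single, ordinary gzip stream
  ok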


Updating the checksum is possible by calculating the CRC of the modified bytes, at their proper byte offset within the stream, before and after the modification, and then applying:

checksum ^= crc_of_removed_bytes

checksum ^= crc_of_added_bytes


Thanks! I thought there would be a way to do something like this.

To be honest, the tool as posted is just a hack to demonstrate that it is possible. I have an idea for a somewhat more refined tool which would let you mark small stretches of data (not whole partitions) for modification, and provide additional tooling to do the mods and update the checksums.

On top of that there is implementing the whole thing again for xz & zstd, which is another large chunk of work.


This could be extremely helpful for initrds, say for live distributions: you could then add content to, say, /lib/modules without having to play the unmkinitramfs and multiple-cpio dance like https://askubuntu.com/questions/1435396/load-drivers-to-init...


Initramfs archives have a little-known secret: you can simply concatenate them even when they are compressed and the kernel will parse them all. So appending stuff doesn't need unmkinitramfs (which is indeed painful due to needing to keep permissions correct etc.)
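
For example, appending extra files without unpacking the original (paths here are placeholders, and if you add kernel modules you may also need to refresh modules.dep inside the archive, depending on how they get loaded):

  $ mkdir -p extra/lib/modules
  $ cp -r /path/to/extra/modules extra/lib/modules/
  $ (cd extra && find . | cpio -o -H newc | gzip) > extra.cpio.gz
  $ cat initrd.img extra.cpio.gz > initrd.new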


Dusty has something similar to this in mind. Specifically he would like to support multiple clouds (hence a variety of drivers and guest agents) using a single main image.


It might be easier to do this with the qcow2 disk image format. Compression is enabled individually for each cluster (see bit 62 of the L2 table entry). That way an image can have mixed uncompressed and compressed clusters without requiring a CRC update after modification.

The data layout within the file can be queried with "qemu-img map file.qcow2" so you know the offsets where uncompressed data can be overwritten (if you want to patch it from the host instead of from inside a virtual machine).
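
For example (file names are placeholders; as far as I know a later rewrite of a compressed cluster just allocates a fresh uncompressed one):

  $ qemu-img convert -O qcow2 -c disk.raw disk.qcow2   # write all clusters compressed
  $ qemu-img map --output=json disk.qcow2              # host offsets of each extent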


Prefacing this by saying it is indeed a good approach, I just want to point out some potential issues:

- Clouds don't universally support qcow2 (or even worse, use their own implementation which isn't feature-complete)

- Probably a per-cluster compressed qcow2 isn't ever going to be as small as a qcow2.xz file.

[Edit: deleted a bogus third point here about making these files, no special tools are needed to create or modify compressed qcow2 files]


There are file systems, like zfs, that support transparent block level compression. This means that you can just copy a normal disk image file to the zfs partition and zfs will handle everything for you. The image can be mounted in r/w mode the way you normally would, because as far as you're concerned the image file is a normal (uncompressed) image file. Why not use that? Sounds way easier and safer.
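
Roughly (the pool/dataset name and compression setting are just examples):

  $ zfs create -o compression=lz4 tank/images
  $ cp disk.img /tank/images/
  $ zfs get compressratio tank/images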

But oh well, the author works at Red Hat so I'm sure they have their reasons :p


zfs isn't really an option, sadly, because of its license. Even if it were, we would still have to deal with what existing cloud providers support for disk image formats.


I would try to use zstd for compression before exploring other options. It is much faster than gzip.


zstd still doesn't support seekable compressed disk images, the main advantage that xz has: https://github.com/facebook/zstd/issues/395#issuecomment-535...



