Really interesting - I didn't know you could concatenate gzip files, and indeed this would have worked better.
Another thing I found while researching this earlier is pigz's -i / --independent option. It lets you create a seekable gzip file, albeit one that cannot easily be indexed except by brute-force searching the whole file for the block boundaries (so it's not as useful as the seekable xz we use here: https://libguestfs.org/nbdkit-xz-filter.1.html).
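A rough Python illustration of why independent blocks make gzip seekable but hard to index. It is an assumption that this approximates `pigz -i`; it is not pigz's code. Compressing each chunk and then issuing a zlib full flush (`Z_FULL_FLUSH`) byte-aligns the output and resets the deflate history, so raw inflation can restart at those offsets. Recording the offsets while compressing gives an index for free. Without one, the hypothetical `find_restart_points` below shows the brute-force alternative: try every `00 00 ff ff` sequence (the empty stored block a full flush emits) as a candidate and keep the ones that inflate cleanly.

```python
import zlib

MARKER = b"\x00\x00\xff\xff"  # LEN/NLEN of the empty stored block a full flush emits

def compress_independent(chunks, level=6):
    """Gzip-compress chunks with a full flush after each one.

    Returns (gz_bytes, offsets); offsets[i] is where chunk i's deflate data
    starts. offsets[0] is 0, i.e. the gzip header itself.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31: gzip wrapper
    out = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(out))
        out += comp.compress(chunk)
        # Byte-align the output and forget the history window, so a decoder
        # can start here without any earlier compressed data.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # final block plus the CRC-32/ISIZE trailer
    return bytes(out), offsets

def read_from(gz, offset):
    """Decompress everything from a restart point to the end of the member."""
    if offset == 0:
        return zlib.decompress(gz, 31)  # start of file: parse the gzip header
    return zlib.decompressobj(-15).decompress(gz[offset:])  # raw deflate

def find_restart_points(gz):
    """Brute force: test every marker position as a possible restart point."""
    found = []
    pos = gz.find(MARKER)
    while pos != -1:
        start = pos + len(MARKER)
        d = zlib.decompressobj(-15)
        try:
            d.decompress(gz[start:])
            # A genuine restart point inflates cleanly through the final
            # block, leaving exactly the 8-byte gzip trailer unconsumed.
            if d.eof and len(d.unused_data) == 8:
                found.append(start)
        except zlib.error:
            pass
        pos = gz.find(MARKER, pos + 1)
    return found
```

The scan is quadratic in the worst case, since every candidate is inflated to the end of the file, which is the "not easily indexed" problem in practice.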
Thanks! I thought there would be a way to do something like this.
To be honest, the tool as posted is just a hack to demonstrate that it is possible. I have an idea for a somewhat more refined tool which would let you mark small stretches of data (not whole partitions) for modification, and would provide additional tooling to make the modifications and update the checksums.
Then there's implementing the whole thing again for xz and zstd, which is another large chunk of work.
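One possible approach to the "update the checksums" step, sketched in Python as an assumption rather than what the tool does. A gzip member ends with the CRC-32 of the uncompressed data and its length (ISIZE), both little-endian (RFC 1952). Because CRC-32 is affine over GF(2), an in-place patch of equal length can update that CRC from the old value plus the old and new bytes alone, without rereading the rest of the image. The function names here are hypothetical.

```python
import zlib

def _crc32_with_zero_tail(prefix, n_zeros, chunk=1 << 20):
    """CRC-32 of prefix followed by n_zeros zero bytes, fed in chunks."""
    crc = zlib.crc32(prefix)
    zeros = memoryview(bytes(min(chunk, n_zeros)))
    while n_zeros:
        step = min(n_zeros, len(zeros))
        crc = zlib.crc32(zeros[:step], crc)
        n_zeros -= step
    return crc

def crc32_after_patch(old_crc, total_len, offset, old_bytes, new_bytes):
    """CRC-32 of a stream after overwriting old_bytes at offset with new_bytes.

    For equal-length streams a and b:
        crc(b) == crc(a) ^ crc(a ^ b) ^ crc(zeros of the same length)
    a ^ b is zero outside the patch, and leading zeros drop out of that
    expression, so only the patch and the number of bytes after it matter.
    """
    if len(old_bytes) != len(new_bytes):
        raise ValueError("in-place patch must keep the length unchanged")
    tail = total_len - offset - len(old_bytes)  # bytes after the patch
    delta = bytes(x ^ y for x, y in zip(old_bytes, new_bytes))
    linear = (_crc32_with_zero_tail(delta, tail)
              ^ _crc32_with_zero_tail(b"", len(delta) + tail))
    return old_crc ^ linear
```

The cost is still proportional to the data after the patch (it is CRC'd as zeros), but nothing has to be decompressed or read back from disk.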
This could be extremely helpful for initrd, say for live distributions: you could then add content to, say, /lib/modules without having to do the unmkinitramfs-and-multiple-cpio dance like https://askubuntu.com/questions/1435396/load-drivers-to-init...
Initramfs archives have a little-known secret: you can simply concatenate them, even when they are compressed, and the kernel will unpack them all. So appending stuff doesn't need unmkinitramfs (which is indeed painful because you have to keep permissions correct, etc.).
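A sketch of building such an appendable fragment in Python without external tools. It writes the "newc" cpio format (magic `070701` followed by 13 eight-digit hex fields) that the kernel's initramfs unpacker reads, then gzips it. The paths and contents are hypothetical; entries are owned by root with mtime 0, and parent directories must either already exist in the original initrd or be listed explicitly, as done here.

```python
import gzip
import stat

def _newc_record(name, data, mode, ino, nlink=1):
    """Encode one entry in the "newc" cpio format used by initramfs."""
    name_bytes = name.encode() + b"\0"
    fields = [
        ino, mode,
        0, 0,             # uid, gid: root
        nlink,
        0,                # mtime
        len(data),        # filesize
        0, 0, 0, 0,       # devmajor, devminor, rdevmajor, rdevminor
        len(name_bytes),  # namesize, including the trailing NUL
        0,                # check: always 0 for "070701" archives
    ]
    rec = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    rec += b"\0" * (-len(rec) % 4)   # header + name padded to 4 bytes
    rec += data
    rec += b"\0" * (-len(rec) % 4)   # file data padded to 4 bytes
    return rec

def initramfs_fragment(dirs, files):
    """Return a gzip-compressed cpio archive that can be appended to an initrd."""
    out = bytearray()
    ino = 1
    for d in dirs:  # directories first, so files can be created inside them
        out += _newc_record(d, b"", stat.S_IFDIR | 0o755, ino, nlink=2)
        ino += 1
    for name, data in files.items():
        out += _newc_record(name, data, stat.S_IFREG | 0o644, ino)
        ino += 1
    out += _newc_record("TRAILER!!!", b"", 0, 0)  # end-of-archive marker
    return gzip.compress(bytes(out))
```

Appending is then just `cat initrd.img extra.gz > new-initrd.img` (or opening the initrd in `"ab"` mode and writing the fragment), assuming the kernel was built with gzip support for initramfs.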
Dusty has something similar to this in mind. Specifically, he would like to support multiple clouds (hence a variety of drivers and guest agents) using a single main image.
It might be easier to do this with the qcow2 disk image format. Compression is enabled individually for each cluster (see bit 62 of the L2 table entry). That way an image can have mixed uncompressed and compressed clusters without requiring a CRC update after modification.
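To make the per-cluster flag concrete, here is a Python sketch that decodes qcow2 L2 table entries, based on my reading of the qcow2 specification in the QEMU docs. It assumes standard (not extended) L2 entries and the default `cluster_bits` of 16, and it does not locate the L1/L2 tables or check feature bits. `decode_l2_entry` and `L2Entry` are hypothetical names.

```python
import struct
from typing import NamedTuple

COMPRESSED = 1 << 62  # bit 62: 1 = compressed cluster, 0 = standard cluster
# Bit 63 (COPIED, refcount exactly one) is ignored in this sketch.

class L2Entry(NamedTuple):
    compressed: bool
    host_offset: int     # byte offset of the data in the qcow2 file (0 = unallocated)
    extra_sectors: int   # compressed only: additional 512-byte sectors used
    reads_as_zero: bool  # standard only: bit 0, cluster reads as zeros

def decode_l2_entry(entry, cluster_bits=16):
    """Decode one standard (non-extended) qcow2 L2 table entry."""
    if entry & COMPRESSED:
        x = 62 - (cluster_bits - 8)  # split between offset and size fields
        return L2Entry(True,
                       entry & ((1 << x) - 1),
                       (entry >> x) & ((1 << (62 - x)) - 1),
                       False)
    # Standard cluster: bits 9-55 hold the cluster-aligned host offset.
    return L2Entry(False, entry & 0x00FFFFFFFFFFFE00, 0, bool(entry & 1))

def decode_l2_table(raw, cluster_bits=16):
    """L2 tables are arrays of big-endian 64-bit entries."""
    return [decode_l2_entry(e, cluster_bits) for (e,) in struct.iter_unpack(">Q", raw)]
```

Only entries with `compressed == False` point at raw guest data that can be overwritten in place.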
The data layout within the file can be queried with "qemu-img map file.qcow2", so you know the offsets where uncompressed data can be overwritten (if you want to patch it from the host instead of from inside a virtual machine).
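A hedged Python sketch of turning that map into a host file offset, assuming the JSON form `qemu-img map --output=json file.qcow2` and its `start`, `length`, `depth`, `data` and `offset` fields as I understand them. Newer QEMU versions may also report `compressed`, so the sketch treats it as optional, and it assumes `offset` is omitted when there is no directly writable host location. `guest_to_host` is a hypothetical helper.

```python
import json

def guest_to_host(map_json, guest_offset, length):
    """Host file offset backing [guest_offset, guest_offset + length), or None."""
    for ext in json.loads(map_json):
        start, end = ext["start"], ext["start"] + ext["length"]
        if start <= guest_offset and guest_offset + length <= end:
            if ext.get("depth", 0) != 0:
                return None  # data lives in a backing image, not this file
            if not ext.get("data") or ext.get("compressed") or "offset" not in ext:
                return None  # zero, unallocated, or compressed: no raw bytes to patch
            return ext["offset"] + (guest_offset - start)
    return None  # range spans several extents: look each part up separately
```

The JSON string would come from something like `subprocess.run(["qemu-img", "map", "--output=json", "file.qcow2"], capture_output=True, text=True, check=True).stdout`.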
There are file systems, like zfs, that support transparent block-level compression. This means you can just copy a normal disk image file to a zfs partition and zfs will handle everything for you. The image can be mounted read-write the way you normally would, because as far as you're concerned it is a normal (uncompressed) image file. Why not use that? Sounds way easier and safer.
But oh well, the author works at Red Hat so I'm sure they have their reasons :p
zfs isn't really an option, sadly, because of its license. Even if it were, we'd still have to deal with whatever disk image formats existing cloud providers support.
• pigz (https://github.com/madler/pigz) has -0 for no-compression compression;
• you can concat multiple gzip files together.
This also solves the CRC problem, because pigz computes the checksum for you.
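A small Python sketch of that layout, assuming the `gzip` module's level 0 behaves like `pigz -0` (both emit stored, uncompressed deflate blocks). The part layout and contents are hypothetical. Each part becomes its own gzip member with its own CRC-32, so changing the level-0 part means regenerating just that member and concatenating again.

```python
import gzip

def build_image(parts):
    """Gzip each (data, level) part as its own member and concatenate them.

    Level 0 stores the data uncompressed, so that member stays easy to find
    and cheap to rewrite.
    """
    members = [gzip.compress(data, compresslevel=level) for data, level in parts]
    return b"".join(members), members

def replace_member(members, index, new_data, level=0):
    """Recompress only one member; gzip.compress writes a fresh CRC-32 trailer."""
    members = list(members)
    members[index] = gzip.compress(new_data, compresslevel=level)
    return b"".join(members)
```

Any gzip reader that follows RFC 1952 decompresses the concatenation as one stream, which is what makes the split invisible to consumers.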