PCM-DVD interleaves channels in a specific way. For bit depths higher than 16, each sample is split so that its lowest byte is stored after the high bytes of all channels. A 3-byte metadata header also has to be prefixed to each packet. All of these steps could have been done in a bitstream filter, but since the codec_id is different, they have to be done at the encoding stage.
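
As a rough illustration of the layout described above, here is a minimal sketch in C. It is not the encoder code itself. The function names, the `group_len` parameter, and the assumption that each 24-bit sample sits in the top 24 bits of an `int32_t` are all hypothetical. The 3-byte header is treated as opaque bytes supplied by the caller, and the grouping rules for a given channel count are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack one group of 24-bit samples.
 * Assumption: each sample occupies the top 24 bits of an int32_t.
 * The two high bytes of every sample are written first (big-endian),
 * followed by the lowest byte of every sample. */
size_t pcm_dvd_sketch_group(uint8_t *dst, const int32_t *src, size_t group_len)
{
    size_t pos = 0;

    for (size_t i = 0; i < group_len; i++) {
        uint32_t v = (uint32_t)src[i];   /* avoid shifting a negative value */
        dst[pos++] = (uint8_t)(v >> 24);
        dst[pos++] = (uint8_t)(v >> 16);
    }
    for (size_t i = 0; i < group_len; i++) {
        uint32_t v = (uint32_t)src[i];
        dst[pos++] = (uint8_t)(v >> 8);  /* lowest byte of the 24-bit sample */
    }
    return pos;                          /* 3 * group_len bytes written */
}

/* Hypothetical helper: prefix the 3-byte header, then pack every group.
 * dst must hold at least 3 + 3 * n_groups * group_len bytes. */
size_t pcm_dvd_sketch_packet(uint8_t *dst, const uint8_t header[3],
                             const int32_t *src, size_t n_groups,
                             size_t group_len)
{
    size_t pos = 3;

    assert(group_len > 0);
    memcpy(dst, header, 3);              /* opaque metadata bytes */
    for (size_t g = 0; g < n_groups; g++)
        pos += pcm_dvd_sketch_group(dst + pos, src + g * group_len, group_len);
    return pos;
}
```

For example, with a header of `AA BB CC` and one group holding the two samples `0x12345600` and `-0x01234600`, the sketch writes the nine bytes `AA BB CC 12 34 FE DC 56 BA`: the header, the high bytes of both samples, then both low bytes.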