I had no idea AV1 or any other codec provided for this. Pretty interesting!
What would undermine it is imperfect removal of real noise or grain from the source material: Then you'd have some remnants of the original grain, plus the synthesized grain... an incorrect result. It seems that this scheme requires perfectly clean input images.
I hope this idea is taken further, and it becomes the norm among codecs or wrappers to provide room for an author-defined post-processing shader to be applied after decompression.
It really doesn't. The whole reason this technique works is that our brains generally process effects like noise and grain in terms of macro properties; even if none of the pixels are anywhere near the right value, it will still look "right" as long as the noise distribution is close-ish.
It doesn't require perfect removal of the original film grain. A mishmash composed of remnants of the original grain combined with new synthetic grain can still look correct to our brains.
According to this paper, they're going to a lot of effort to reproduce the original grain very accurately. If they combine that with some half-assed residual grain from the original, why go to all that effort to faithfully reproduce it? Based on your assertion, it wasn't really necessary to begin with.
Various audio codecs have the same thing. They will synthesize background noise. The amount of background noise and the rough frequency spectrum are encoded in the stream.
If you removed the noise, it would sound a bit weird. For telephone calls in particular, the background noise lets you know that the phone call is connected. If you filter it out, people think that the call was dropped.
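To make the decoder side concrete, here's a minimal Python sketch of comfort-noise synthesis in the spirit of RFC 3389 CNG: the stream carries only a noise level and a coarse spectral shape, and the receiver regenerates statistically similar noise. The function name and parameters are my own illustration, with LPC coefficients standing in for whatever spectral description a given codec actually encodes:

    import numpy as np
    from scipy.signal import lfilter

    def synth_comfort_noise(n_samples, level_db, lpc_coeffs, rng=None):
        # Decoder-side sketch: the bitstream carries only a noise level and
        # a coarse spectral shape (here, LPC coefficients). Excite an
        # all-pole filter with white noise to regenerate similar noise.
        rng = rng if rng is not None else np.random.default_rng()
        excitation = rng.standard_normal(n_samples)       # white noise
        a = np.concatenate(([1.0], np.asarray(lpc_coeffs, dtype=float)))
        shaped = lfilter([1.0], a, excitation)            # spectral envelope
        shaped /= np.sqrt(np.mean(shaped ** 2)) + 1e-12   # normalize RMS
        return shaped * 10.0 ** (level_db / 20.0)         # apply coded level

For example, synth_comfort_noise(8000, -60.0, [-0.8]) would give one second of quiet, low-pass-tinted noise at an 8 kHz sample rate; no actual audio samples ever cross the wire.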
I can imagine getting a large amount of flaming for this idea, but wouldn't the logical extreme just be to basically define a container for a JS canvas with embeddable resources? In a more restricted form, of course, to be sure that it will always render the same and that all compliant computers will have enough power to render it.
You could do things like 480p training videos with ultra-crisp text, and snow/fire/etc effects.
You'd probably have to prevent it from having network access or else you'd just be asking for the videos to rot over time, but you could give them an embedding API or even access to a few small bits of data, so you could have a YouTube video that changes with seasons and location.
If you were strict about the no network access thing you could give access to all sensors, since the data would never leave the device.
YouTube's already got some limited interactivity features, DVDs had them, why not all videos? It would mostly be a novelty, but you could also do really cool stuff like use Bluetooth beacons to skip around to certain parts for an interactive tour, although you might have to enforce downloading the whole thing before playing if you wanted 100% no privacy leaks.
That's an interesting idea. I reckon the complexity added by rendering vector graphics (text in particular; font rendering is notoriously difficult) outweighs the bandwidth savings, but it still seems like an area ripe for exploration. Canvas-like APIs might be too complicated to encode/decode efficiently, yet I suspect something closer to a WebGL/OpenGL fragment shader would be much more manageable (though likely much worse for interactivity, unfortunately). While that wouldn't quite mesh with the idea of vector-like text you proposed (without a heavy library or two thrown in), I suspect the engineering put into the existing graphics pipeline would make it a more feasible approach to augmented video. The stuff on Shadertoy.com and Inigo Quilez's work shows the capabilities of fragment-shader-based graphics, yet I suspect all of the magic would be in the details of the file format and encoding strategy. If anyone pokes around or explores a video/shader hybrid format, let me know and post it on HN; I bet a bunch of people would be interested.
> ... but wouldn't the logical extreme just be to basically define a container for a JS canvas with embeddable resources?
Here's my attempt at creating a grainy video effect in the browser using a media stream (from the device's camera - the page should ask for permission first). Very much an MVP; I'm already working on making the dithering algorithm more efficient. https://codepen.io/kaliedarik/pen/OJOaOZz
> YouTube's already got some limited interactivity features, DVDs had them, why not all videos?
I don't know about videos, but again this is doable in the browser using the canvas. For instance, this example: https://scrawl-v8.rikweb.org.uk/demo/canvas-027.html - the swans, and both geese, are clickable (to their respective Wikipedia pages). The navigation around each goose is a clickable box, while the navigation on the swans uses a chroma-key effect to limit the clickable area.
The biggest problem with that idea is that it wouldn’t be possible to have hw decoders as efficient as the current ones.
But there is definitely a trend towards more customizable decoders, with a larger set of primitive operations (DCT, motion estimation/compensation, convolution filters) and parameters. The codec space is slowly moving towards a model where decoders are VMs highly optimized for specific tasks.
> It seems that this scheme requires perfectly clean input images.
It doesn't: the base encode comes out super clean/stable (from my limited tests). The problem is glacial encoding speed, which makes this thing not so interesting for home use. Anyway, the final result with synthetic grain is wonderful for the bitrate used (I was testing 2-4 Mbps for HD).
The synthesized noise only covers what the denoiser removed, so it all adds back up to the input (assuming the noise that was left in makes it through the encoder).
From the flow graph, the film grain estimation is done on the difference between the de-noised video and the original. So it only picks up the difference that the denoiser creates.
> and estimating the film grain parameters from flat regions of the difference between the noisy and de-noised versions of the video sequence.
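A minimal Python sketch of that estimation step, under the simplifying assumptions of a single grayscale frame and a single noise-strength number (the real AV1 model fits an autoregressive grain pattern with per-intensity scaling; the function name and thresholds here are my own illustration):

    import numpy as np

    def estimate_grain_sigma(noisy, denoised, patch=16, flat_quantile=0.1):
        # Residual = everything the denoiser removed (ideally just grain).
        noisy = noisy.astype(np.float32)
        denoised = denoised.astype(np.float32)
        diff = noisy - denoised
        h, w = denoised.shape
        sigmas, flatness = [], []
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                base = denoised[y:y + patch, x:x + patch]
                flatness.append(base.std())   # low std == flat region
                sigmas.append(diff[y:y + patch, x:x + patch].std())
        sigmas = np.asarray(sigmas)
        flatness = np.asarray(flatness)
        # Keep only the flattest patches, where the residual is almost
        # pure grain rather than detail chewed off edges and textures.
        cutoff = np.quantile(flatness, flat_quantile)
        return float(sigmas[flatness <= cutoff].mean())

Restricting to flat regions is the key point of the quoted sentence: on textured areas, the noisy-minus-denoised difference contains lost detail as well as grain, which would bias the estimate upward.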
I'm not sure I like the idea of a film's photography potentially looking different based on the decoder used. I would much rather dedicate additional bits to preserving the noise that was present in the original film stock or added during post-production.
That goal might already have been somewhat lost years ago. The video that almost everyone after the editors watches has been re-encoded at least once (more likely twice) with one or more different codecs that definitely change the character of the video from that of the source. The same source content will already look noticeably different depending on which service or provider you're using, since they each have their own opinionated video encoding and distribution pipelines. If you're talking about archival storage in the sense of preserving masters, then I definitely agree, and film grain removal should be disabled in the encoder so the decoder-side synthesis won't happen. Thankfully, that's very easy to do (for example, the libaom-av1 encoder in ffmpeg has a denoise-noise-level parameter; set it to zero to disable those scary parts).
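Something along these lines should do it (illustrative command; flag availability depends on how your ffmpeg build's libaom support was compiled):

    ffmpeg -i master.mov -c:v libaom-av1 -crf 30 -b:v 0 -denoise-noise-level 0 out.mkv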
No one is archiving with AV1 and no one ever will. It's all film-out, DPX sequences, and ProRes 4444 for the vast majority of things I've encountered as a post professional.
Funnily, I also don't like these techniques but that's because I'd rather eliminate the noise myself and have a noiseless image when possible. Grain, to me, is simply annoying. Adding it and preserving it has little value.
In subjective tests, normal people often report grain (real or reconstructed) as a defect, so you're probably in the majority of viewers. Creators and codec nerds value it more than the audience does.
So these grain removal techniques are in many ways a political compromise between people who just want the smaller video, and the people who want the grain.
I'd be fascinated to see how codecs with and without this feature diverge in terms of encoding changes. Presumably there were some big gains just sitting there, but the minority that preferred grain stopped them from being exploited.
Much of the drama around encoder quality has come down to this dichotomy. Choosing to lose the grain/detail was attacked as plastic or fake by the people who care about it, while people who didn't could point to better test scores. Which then caused more arguments about the validity of the measures.
I am a film person (director, camera guy, work in post production). Grain is an aesthetic choice and has an effect on the perception of an image.
Sure, for the typical Hollywood blockbuster it has little value and can break the immersion, but for everything else grain can make you look at a picture, rather than into it. It can be the difference between looking through a window and looking at a painting, for example. Additionally, grain can emphasize the passing of time. A shot of a still life without grain is something different from the very same shot with grain.
Today it is just another tool in the aesthetic toolbox. And someone somewhere decides when to have it and when not to.
Nowadays, I agree, grain is more of an aesthetic choice for modern films. And, in fact, I generally don't enable de-noising when it's fairly clear that's what's happening.
However, for a lot of film shot pre-2010, grain is an artifact that isn't there by choice or for artistic effect (other than perhaps the film 300).
It just so happens that most of my personal media fits in that box of being pre-2010 which is why I generally denoise.
Something like 30 Rock or The Office, for example, doesn't have film grain because of some artistic choice.
Interestingly, though, "That '70s Show" does in a few cases, even though it's somewhat noisy by default. That's a tougher call.
Even before 2010, directors and their cinematographers put a lot of thought into what film stock to use in order to build the aesthetic they want for their film. Grain characteristics are one of the most prominent distinctions among different stocks.
Again, it depends on what's being shot. Like I said, 300 is a good pre-2010 example of a film stock/grain specifically chosen for its artistic value.
However, for a bunch of film and especially tv shows, the choice had FAR less to do with aesthetics and was more related to cost. Big budget films certainly could pick any sort of equipment/stock they wanted. Lower budget productions didn't necessarily have that luxury.
I'm not saying it didn't happen, rather that the choice in stock was more often than not "cheap while being as clear as we can afford".
Again, you see this with modern film where grain is almost never added (except for specific scenes trying to give more of a dated effect). That, to me, says cinematographers aren't generally trying to pick their stock to add grain. Some do, but that's the exception and not the rule.
Exactly this, there have long been plugins for edit suites that let you specify the grain and color process of specific film stocks to get the look you want.
People today miss the point that things like grain can be an intentional directorial choice and not an artifact to be removed.
In that case, this is still probably a good feature for you, since you can produce an encode that just doesn't signal any synthetic noise and you'll get a de-noised picture.
Not really, because these encoders are (generally speaking) making pretty large compromises with their noise filters in order to be timely. I want to spend the extra time doing motion compensated denoising.
These noise filters will sometimes be temporal, and will rarely be motion-compensated (because that's computationally expensive), so as a result they can't get as good a result as I can.
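To make the distinction concrete, here's a toy Python sketch of what I mean by motion-compensated temporal denoising: block-match each region of the current frame against the previous frame, then blend the matched block in. Matching before averaging is what avoids the ghosting a naive temporal filter produces on moving content. Everything here (function name, block/search sizes, single reference frame) is illustrative; a real implementation would use sub-pixel motion and many more frames:

    import numpy as np

    def mc_denoise(prev, cur, block=16, search=4, blend=0.5):
        # For each block in `cur`, search a small window in `prev` for the
        # best match (minimum SAD), then average the matched block in.
        prev = prev.astype(np.float32)
        cur_f = cur.astype(np.float32)
        out = cur_f.copy()
        h, w = cur_f.shape
        for by in range(0, h - block + 1, block):
            for bx in range(0, w - block + 1, block):
                ref = cur_f[by:by + block, bx:bx + block]
                best, best_sad = ref, np.inf
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y, x = by + dy, bx + dx
                        if 0 <= y <= h - block and 0 <= x <= w - block:
                            cand = prev[y:y + block, x:x + block]
                            sad = float(np.abs(cand - ref).sum())
                            if sad < best_sad:
                                best_sad, best = sad, cand
                out[by:by + block, bx:bx + block] = (
                    blend * ref + (1 - blend) * best)
        return out.astype(cur.dtype)

Even this toy version is far slower than the spatial or crude temporal filters encoders use inline, which is exactly why I'd rather do it myself offline.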
Storage is, but additional bandwidth is seldom spent on quality. It's more often spent on cramming in more bullshit, AKA another home-shopping channel.