Here's an anecdote. My friend's apartment was burglarized many years ago, and when we looked at the security footage we could clearly see the thieves taking everything, but their faces were impossible to recognize due to the low resolution. Ever since then I've kept thinking about a video codec that would store the whole video in low-res but detect faces and encode those regions in ultra high-res. I hope research like this can lead to better security.
Super resolution can only do so much: basically a 2x or 4x improvement at best, especially on features more complex than text. Also, as soon as you compress the video stream you lose a lot of the information these algorithms would actually rely on.
If a high-res stream is available, it's much better to use it directly. A basic face detection algorithm that regularly saves snapshots of detected faces would go a long way, and it's really simple to implement; see the sketch below.
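To illustrate how little is needed, here's a minimal sketch assuming OpenCV and its bundled Haar cascade (just one of many possible detectors; the one-second throttle and file naming are arbitrary choices):

```python
# Minimal sketch: save high-res face crops whenever a face appears in the stream.
import time
import cv2

cap = cv2.VideoCapture(0)  # the high-res camera stream
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

last_saved = 0.0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Throttle snapshots to roughly one per second to keep storage small.
    if len(faces) > 0 and time.time() - last_saved > 1.0:
        for i, (x, y, w, h) in enumerate(faces):
            cv2.imwrite(f"face_{int(time.time())}_{i}.png", frame[y:y+h, x:x+w])
        last_saved = time.time()
```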
Don't they do something like that to boost the resolution of space telescopes? Also I seem to remember reading something about processing a stream of images from an earthbound telescope to cancel out atmospheric distortion.
Space telescopes don't do super-resolution of the type used for video, but they do something a little similar. They use a technique called aperture synthesis, which combines signals from a collection of instruments so that together they have the same angular resolution as a single much larger virtual instrument.
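As a rough back-of-the-envelope illustration (the wavelength, dish size, and baseline below are assumed values on the order of a large radio interferometer, not any particular telescope), the synthesized resolution scales with the longest baseline rather than the dish diameter:

```python
# Toy comparison: diffraction limit of a single dish vs. the resolution of a
# synthesized aperture whose size is set by the longest baseline in the array.
import math

wavelength = 0.21        # metres (21 cm line, just an example)
dish_diameter = 25.0     # metres, a single instrument
max_baseline = 36_000.0  # metres, separation of the farthest pair in the array

single = 1.22 * wavelength / dish_diameter   # radians, ~lambda / D
synth = wavelength / max_baseline            # radians, ~lambda / B_max

arcsec = 180 / math.pi * 3600
print(f"single dish : {single * arcsec:9.1f} arcsec")
print(f"synthesized : {synth * arcsec:9.2f} arcsec")
```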
Many video conferencing/calling applications already do this, allocating more bits to regions recognized as faces when encoding. You can test it in FaceTime, for example, by showing your face versus obscuring enough of it that face detection fails.
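I don't know what any given app does internally, but the general idea can be sketched as a per-macroblock quantization offset map built from detected face boxes (purely conceptual, not a real encoder API; the 16-pixel blocks and the offset values are made-up assumptions):

```python
# Conceptual sketch: negative QP offsets make the encoder spend more bits
# on face blocks, positive offsets let the background degrade.
import numpy as np

def qp_offset_map(frame_h, frame_w, face_boxes, block=16,
                  face_offset=-6, background_offset=4):
    """Per-macroblock QP offset grid; negative means higher quality."""
    rows, cols = frame_h // block, frame_w // block
    offsets = np.full((rows, cols), background_offset, dtype=np.int8)
    for (x, y, w, h) in face_boxes:  # pixel-space bounding boxes
        r0, r1 = y // block, (y + h) // block + 1
        c0, c1 = x // block, (x + w) // block + 1
        offsets[r0:r1, c0:c1] = face_offset
    return offsets

# Example: one detected face in a 720p frame -> a 45x80 grid of offsets.
print(qp_offset_map(720, 1280, [(500, 200, 220, 260)]).shape)
```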
The dynamic gaze example really convinced me that eye tracking will be necessary for immersive VR. If you can achieve a 1+ order of magnitude improvement in rendering performance with no noticeable loss in quality... it would be very difficult to leave that on the table.
Not necessarily. The "lower framerate" outside our fovea isn't perceived as a stuttering sequence of frames but as a blurry, smooth flow. Simply rendering the periphery at a lower framerate would still be noticeable.
Unless you could engineer a display technology that reproduces that.
Dropping certain pixels is a peculiar way of reducing input quality. Why was that method chosen?
For 3D rendering I guess that's a kind of DLSS, but the paper focuses on video compression.
For video streams that doesn't seem to make sense. Video codecs are not pixel-based but block/frequency-based, so you can't save any bandwidth by dropping individual pixels. Raw pixels don't compress well, especially sparse, weakly correlated samples like these, so I wouldn't be surprised if sending just the reduced input for this algorithm cost more than sending a full video stream (see the toy comparison below). And existing video codecs can already vary quality within the frame very effectively by varying block sizes and quantization.
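To make the comparison concrete, here's a toy example (nothing like a real codec; the block contents, the 3-bytes-per-sample assumption, and the quantization step are all made up): the sparse route has to pay for positions as well as values, while the transform route keeps only a handful of quantized coefficients that then entropy-code well.

```python
# Toy illustration: 10% random samples of an 8x8 block vs. a coarsely
# quantized 2-D DCT of the same block.
import numpy as np
from scipy.fft import dctn

x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 128 + 60 * np.sin(x / 3) + 40 * np.cos(y / 4)  # a smooth test block

# Sparse route: each surviving sample needs a position plus a value.
n_sparse = int(0.10 * block.size)            # ~6 samples
sparse_cost = n_sparse * 3                   # assume ~3 bytes each (x, y, value)

# Transform route: quantize the DCT and count the surviving coefficients.
q = 32
coeffs = np.round(dctn(block, norm="ortho") / q)
nonzero = np.count_nonzero(coeffs)           # these still get entropy-coded

print(f"sparse samples to store  : {n_sparse} (~{sparse_cost} bytes)")
print(f"nonzero DCT coefficients : {nonzero}")
```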
"Compression" is probably just poorly chosen wording. This has more to do with reducing the number of required samples in applications like eye-tracking VR, where you can choose to render a dense image for the part that the user is looking at, while reducing detail in peripheral vision. Current implementations use some fractional resolution(s) for the periphery and blend pixels using more traditional methods, which results in blurryness and/or aliasing artifacts.
Those are some pretty incredible results. For any single frame I found it hard to spot a significant quality loss between the DeepFovea frame and the reference (while looking at the foveal target and trying to compare peripheral quality, obviously), but in motion there was a lot of interframe noise / aliasing / jitter.
While I'm sure they'll improve on those issues, I'm currently wondering what kind of peripheral visual trade-offs I'd make; if I had a demo in front of me, I'd bet I'd prefer running at higher foveal settings / fidelity with peripheral artifacts to running at lower overall settings / fidelity to avoid them.
Had this idea at least 10 years ago. Have many ideas, that said...
Fovea-oriented compression can be useful for optimized bandwidth usage in video conferencing, too.
One could even implement auto-reframing of the video feed when several participants are in the same room, without needing a mechanically moving camera. Or something like liquid rescaling (seam carving) to still get a glimpse of the rest of the full frame.
Perhaps those ideas have since been patented, or even developed?
I haven't read the paper yet, but this is a silly demo. "Turn 10% of pixels black" is not a good baseline; they should use nearest-neighbor interpolation (or something similar) to fill the holes in the "sparse" video for a fair comparison (see the sketch below). Also, you can clearly see in the HD video that it's temporally unstable ("shimmering"), which is the same problem NVIDIA has had with DLSS forever; they need to build in temporal smoothing or users will hate it.
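For what it's worth, that fairer baseline is only a few lines; this sketch assumes SciPy and is just nearest-neighbour hole filling, not anything the paper itself compares against:

```python
# Fill the dropped pixels with the value of the nearest surviving pixel.
import numpy as np
from scipy.ndimage import distance_transform_edt

def nearest_neighbour_fill(sparse_frame, known_mask):
    # For every position, get the indices of the closest known pixel.
    iy, ix = distance_transform_edt(~known_mask, return_distances=False,
                                    return_indices=True)
    return sparse_frame[iy, ix]

# Example: keep a random 10% of a frame and reconstruct the rest.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (1080, 1920), dtype=np.uint8)
mask = rng.random(frame.shape) < 0.10
filled = nearest_neighbour_fill(np.where(mask, frame, 0), mask)
```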
Can this be used to replace Photoshop's Content-aware fill as well? Or does it require some sparse sampling of the whole area that needs to be reconstructed?
OTOH, government surveillance...