Note that the encoders are slower largely because AV1 isn't hardware-accelerated yet.
We can reasonably assume that will change, as basically every major player in the CPU/GPU spaces[1] has signed on to back this. Even Apple is a member.
[1] the exceptions are mostly downstream ARM vendors - notably, Qualcomm
You've linked competing hardware encoding APIs which are not x264/x265.
x264/x265 are software encoding libraries for the H.264/H.265 video formats and don't really use hardware acceleration beyond SIMD instructions on CPUs. x264 is known for its great output quality while still being fast on x86 CPUs.
x264 and x265 refer to the specific software projects of those names, right? Most chips nowadays can hardware-accelerate H.264 and H.265, but they don't use x264/x265 (or do they? Not sure...). I think the confusion is that the terms x264 and H.264 are not interchangeable.
Thanks, I removed the reference to WebM in that case. MP4 and MKV are very common container formats, so being supported in WebM, MP4, and MKV doesn't seem like a differentiator.
>WebM only recently added AV1 support to its container format
I was unable to find any evidence of this anywhere I looked. For instance this webpage: https://www.webmproject.org/docs/container/ shows only VP8 and VP9 as acceptable video codecs.
Indeed you are right, my mistake! While ffmpeg allows writing AV1 streams to WebM containers, it's not officially supported by the spec. It'd be interesting to know whether Firefox and Chrome support playback of these WebM AV1 streams, though.
>Approximately 50% higher compression than x264 [1] and 30% higher than H.265 (HEVC)
What is the relative improvement between codec generations (e.g. H.264 vs. H.265) compared to implementations of the same codec standard (e.g. early reference implementation of H.264 vs. current x264 with well-chosen parameters)?
I recently did some encoding but settled on VP9 over AV1, because VP9 has wider support and ffmpeg considers its AV1 encoder "experimental." VP9 is supported out of the box on practically everything modern that isn't Apple. AV1 is on track to replace it, but depending on the support you're looking for, that might be five years out.
I downloaded Blade Runner 2049 in AV1 (search for `av1` in qBittorrent); tiny size and it looked fantastic! I can't wait until more releases are made with this codec. YouTube will save a ton of money, and people on networks like Ting will be able to stream content and not pay ridiculous amounts of money.
The assembly is not safe but it is used with safe wrappers. Likewise, there are some custom data structures (specifically for tiling) that are internally implemented with unsafe code (just like libstd's data structures). Fortunately the assembly bits have well defined inputs and outputs so they are testable on their own. I would love for comparably fast SIMD to be written entirely in safe Rust, but that is not possible yet.
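To make that concrete, here's a minimal sketch of the pattern (illustrative only, not rav1e's actual code - the function names are made up): an unsafe inner kernel that relies on pointer/length preconditions, exposed only through a safe wrapper that enforces them.

    // Unsafe inner kernel: caller must guarantee `dst` and `src` are valid
    // for `len` elements and do not overlap. In a real encoder this would be
    // a hand-written SIMD/assembly routine behind the same contract.
    unsafe fn add_offset_raw(dst: *mut u8, src: *const u8, len: usize, offset: u8) {
        for i in 0..len {
            // SAFETY: caller guarantees both pointers are valid for `len` elements.
            unsafe { *dst.add(i) = (*src.add(i)).wrapping_add(offset) };
        }
    }

    // Safe wrapper: the slice types prove the pointers are valid, and the
    // length check turns a would-be buffer overrun into a clean panic.
    pub fn add_offset(dst: &mut [u8], src: &[u8], offset: u8) {
        assert_eq!(dst.len(), src.len(), "mismatched block sizes");
        unsafe { add_offset_raw(dst.as_mut_ptr(), src.as_ptr(), src.len(), offset) }
    }

    fn main() {
        let src = vec![1u8, 2, 3, 4];
        let mut dst = vec![0u8; 4];
        add_offset(&mut dst, &src, 10);
        assert_eq!(dst, [11, 12, 13, 14]);
    }

The hand-written assembly kernels sit behind the same kind of boundary, which is also what makes them testable on their own with well-defined inputs and outputs.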
I don't know if it's the case with rav1e, but in many video codecs most of the assembly is just a flat, unrolled circuit with zero or minimal control flow and minimal address calculation -- essentially just a static circuit that you might otherwise implement as a fixed block of logic in an ASIC. As a result it isn't too much of an unsafety vector.
This is also often true for cryptographic software, particularly fixed-parameter optimized code.
The performance gain usually comes from some mixture of good scheduling or register allocation that the compiler fails at and the use of instructions that the compiler won't (reliably) emit.
For these kinds of straight-line code, I haven't found mechanically validating it to be significantly more complicated than verifying C code... and as C code goes, these functions tend to be the lowest-risk type.
So, without any direct experience, I'd expect there to be more risk in rav1e from the unsafe-rust than the asm.
That said, the 'attack surface' of an encoder is already pretty small. I'd personally expect rav1e's gain from Rust's safety to be less about security and more about not wasting time on blind alleys caused by memory corruption in the codebase. I've seen more than one poor design decision made in a multimedia codec which was ultimately due to a bug that a better language might have prevented.
I've not checked the actual numbers, but I think in terms of line count that assembly may be the same functions written multiple times for different (sub-)architectures -- different generations of Arm and Intel chips that have different vector instructions.
Yes, there are currently 4 copies of the assembly - for AVX2, SSSE3, 64-bit NEON, and 32-bit NEON. There will likely be more as future architectures get added.
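Roughly how that per-architecture selection tends to look on the Rust side (an illustrative sketch with made-up names, not rav1e's actual dispatch code), using std's runtime feature detection on x86-64 and a portable fallback elsewhere:

    // Portable reference: sum of absolute differences between two blocks.
    fn sad_rust(a: &[u8], b: &[u8]) -> u64 {
        a.iter()
            .zip(b)
            .map(|(&x, &y)| (i64::from(x) - i64::from(y)).unsigned_abs())
            .sum()
    }

    // Dispatcher: pick the best kernel the CPU supports at runtime.
    fn sad(a: &[u8], b: &[u8]) -> u64 {
        #[cfg(target_arch = "x86_64")]
        {
            if std::arch::is_x86_feature_detected!("avx2") {
                // Real code would call the AVX2 assembly kernel here;
                // this sketch just reuses the portable version.
                return sad_rust(a, b);
            }
            if std::arch::is_x86_feature_detected!("ssse3") {
                // Real code would call the SSSE3 assembly kernel here.
                return sad_rust(a, b);
            }
        }
        // Portable fallback (and the only path on non-x86 builds;
        // Arm builds would select their NEON kernels similarly).
        sad_rust(a, b)
    }

    fn main() {
        let a = [10u8, 20, 30];
        let b = [12u8, 18, 30];
        assert_eq!(sad(&a, &b), 4);
    }

The portable Rust version also doubles as a reference that the SIMD paths can be checked against.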
Confirming this claim depends on how many third-party crates, how many unsafe{} blocks, and how much assembly are being used. Compared with the other AV1 encoders (not written in Rust), the claim may or may not be true, but it would be easier to defend if they said "the fastest and safest AV1 encoder written in Rust".
Most Rust projects make these sorts of safety claims to attract companies and devs and to promote the project at meetups or tech conferences. Maybe they should audit this encoder to see if it lives up to these claims.
Rust can compile to many different targets. The assembly is for x86 platforms, but this project also supports arm64 and wasm targets. If security is of utmost concern, I would actually suggest running the WASM build in a sandboxed JIT like lucet.
What blows my mind is that there's a Rust encoder, but no Rust decoder. There are orders of magnitude more people who will use a decoder than an encoder. One would think that the safe decoder would have come first.
Decoders are notoriously harder to support than encoders due to the various internal formats and encoding/decoding options. Until it supports all the different formats (a lot of effort), it will only be able to play back a few videos.
By focusing on an encoder (at least initially) you can just support a subset of features that work well for you. Then focus on making a decoder that can at least play back videos from your encoder.
More people will find it useful to have an encoder that works all the time, rather than a decoder that only works on a subset of videos.
Despite the tagline, I don't think it's ever actually been the fastest AV1 encoder; it's quite possible it's always been the safest, even with the asm in place and even more so when using only Rust.
The logical structure of your statement is: Are five apples and two oranges really more food than two oranges? You have apples, but you still only have two oranges.
I just hope very much that Apple does not take too much time with this. After all, they joined the Alliance for Open Media in early 2018.
It would be nice to finally have a codec again that works on every (modern) platform after such a long time. The split support between HEVC and VP9 always doubles the effort needed to distribute content effectively.
GP is comparing it to IE6 in terms of being behind other browsers in implementing features, not in terms of publishers only checking that their pages render correctly in that browser.
Chromium basically ships with a usable, “de-Googled” browser in-tree, doesn’t it? WebKit has what you might call a demo. Look up MiniBrowser to see what I mean. It’s a web view and three buttons, basically.
We'll find out when the next Apple TV is announced shortly. If AV1 isn't there people should worry. Apple was using HEVC in the iPhone 6, so they aren't scared to be early on this stuff.
Whether or not Apple TV has AV1 out of the box won't matter (although it would be nice).
If Netflix or YouTube want AV1 on Apple TV then they'll just implement it in their applications. Netflix has already implemented AV1 as an option in their Android application:
We really need a focus on embedded GPU AV1 as well. It's amazing how fast pure GPU transcoding is. H.265 is well supported in modern GPUs and can transcode at dozens of times the speed. But AV1, for some reason, isn't getting the same treatment, it seems.
GPUs are generally terrible for video encoding. I think you're confusing the GPU with a dedicated "hardware encoder" ASIC (which may be located on the same die as other silicon components such as a CPU or a GPU).
An off-the-shelf GPU can encode / decode thousands of channels of video and audio in real-time.
There are some cloud providers that offer this as a service, allowing you to have the end-points of, e.g., video calls to use different audio and video codecs.
This is quite useful, e.g., when some people join the call only using audio via a cell phone in a different country using a different audio standard (or a land line, etc.). Or when somebody joins the video call from laptop tethering from a phone on a train. Or for switching video codecs depending on whether somebody is sharing their desktop or using a webcam to record their face.
The client can pick the codecs that are the best fit for the current situation (content, bandwidth, latency, etc.), and a cloud server transcodes the video from everyone else in the meeting into that client's format.
And that point is wrong, since there are some cloud providers using normal GPU cores for exactly this.
NVEnc supports only a handful of codecs, and there's no real-time audio, nothing like that.
All of this is implemented in CUDA, using normal CUDA implementations of audio and video codecs, and running on normal GPUs using normal GPU cores. In real time. Supporting thousands of audio and video channels concurrently.
Also, even for NVEnc and NVDec themselves, in some of the GPUs they do not use any specialized hardware and use normal GPU cores instead (e.g. see the older GM20x GPUs).
None of the ones I know are. They are all proprietary.
I'm not sure they are for sale either (probably for the right price), since they sell these "as a service", which pays better. I also don't think these are on sale for "small" customers.
“Compression is understanding”. As our understanding of the problem domain increases, so too will our ability to compress it. Video has three key things: the still images that approximate the input, the relationship of one image to the prior and the next, and a human being who is sensitive to some kinds of distortion and not others, depending on context :) To have effective and good compression, you have to understand all three.
Note: while I work at Netflix, I do not work on anything related to video encoding, these are just my semi-informed understandings :)
I would also add computational complexity. We're increasingly willing to throw more transistors at video decoding and encoding due to their reduced cost and increased performance, so the algorithms can become more complicated, perform larger searches, etc.
If you think about it from a power perspective, the power cost of a bit of data transmitted to a mobile device is pretty large. I would expect to see video compression formats increase in complexity until the power cost of compressing away one more bit approaches the cost of transmitting one more bit over the network.
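One way to make that break-even point precise (my notation, just a sketch of the argument): let c be encoder complexity, R(c) the resulting bitrate, E_enc(c) the energy spent encoding, and e_tx the energy per transmitted bit. Minimizing total energy,

    \min_{c}\; E_{\text{enc}}(c) + e_{\text{tx}}\, R(c)
    \quad\Longrightarrow\quad
    \frac{dE_{\text{enc}}}{dc} = -\, e_{\text{tx}}\, \frac{dR}{dc},

i.e. it stops paying to make the encoder work harder exactly when the marginal energy of squeezing out one more bit matches the energy of just transmitting it.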
Well with the compression vs transmission/decompression ratio of video for stuff in the Netflix catalog, I think the inflection point is different than you suggest (not sure about YouTube.)
"Understanding" is an interesting way of putting it. Traditionally "compression" is defined in terms of entropy and for lossy compression (eg. audio/video) entropy is defined in terms of human perception. I would generalize your statement to "relationship of the past and future of the video stream"
The main enabler of more advanced compression strategies is the higher performance available to the encoder. Decoders are essentially deterministic state machines. Encoders, however, have to search a large space of transformations to find the ones that capture the entropy of a given situation.
In the development of AV1 they called these transformations "tools", and the research encoder experimented with lots of them; only the most profitable ones made it into the standard.
The compression can thus still evolve even with the AV1 standard frozen, just like how x264 and LAME have gotten so much better over their lifetime.
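As a toy version of that search (purely illustrative - not how any real encoder is structured): for each block the encoder tries its candidate tools, estimates the distortion and the bits needed to signal each one, and keeps whichever minimizes the Lagrangian cost D + lambda * R.

    // Toy rate-distortion search (illustrative; names and numbers are made up).
    struct Candidate {
        name: &'static str,
        distortion: f64, // e.g. sum of squared error vs. the source block
        bits: f64,       // estimated bits to signal this mode + residual
    }

    // Keep the candidate with the lowest Lagrangian cost D + lambda * R.
    fn pick_mode(candidates: &[Candidate], lambda: f64) -> &Candidate {
        candidates
            .iter()
            .min_by(|a, b| {
                let cost_a = a.distortion + lambda * a.bits;
                let cost_b = b.distortion + lambda * b.bits;
                cost_a.partial_cmp(&cost_b).unwrap()
            })
            .expect("at least one candidate mode")
    }

    fn main() {
        // A cheap-but-blurry mode vs. an expensive-but-sharp one.
        let modes = [
            Candidate { name: "cheap prediction", distortion: 900.0, bits: 20.0 },
            Candidate { name: "detailed prediction + residual", distortion: 120.0, bits: 260.0 },
        ];
        // A higher lambda (tighter bitrate budget) favors the cheaper mode.
        println!("low bitrate:  {}", pick_mode(&modes, 10.0).name);
        println!("high bitrate: {}", pick_mode(&modes, 0.5).name);
    }

Much of the work in a production encoder goes into pruning this search, since trying every tool on every block exhaustively is prohibitively slow.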
What do you see when a current-generation codec is heavily bitrate-constrained? The first thing you'll notice is the ugly blocks with non-matching colors at the edges. Such a primitive expression of a lack of bitrate shows quite clearly that these codecs have very little understanding.
With a codec of the (probably far) future, where both encoder and decoder have good "understanding", you wouldn't see any blocks, reduced color depth, smudged details, or the like. If some detail is missing because of constrained bitrate, the decoder will just fill it in intelligently.
It can go crazy far - when constrained on bitrate, the encoder can just send a sort of movie script - "Man stands on the beach and looks into the distance" - and the decoder will "understand" it and render it in awesome detail, including adding non-mentioned splashing waves, the man's thoughtful expression, and perhaps a sunset (the decoder effectively becomes a little automated film director who treats missing information in the movie script as artistic license). It will most probably look very different from the original movie, but it will be believable - if you didn't see the original, you won't really have a reason to suspect it's fake.
With more bitrate available you can add more detail to the movie script - the man is black and old, he wears this and that - which the decoder would take into account when reconstructing the video. At some point you're always constrained by the bitrate, but an "understanding decoder" can always go deeper and fill in more realistic details when needed.
Exactly. What is the information that you need to preserve? When you think of it this way, the notion of lossy compression becomes much more understandable from a human+computer perspective. Another way to view the problem of compression is as a quest to lower the Kolmogorov complexity of the program that needs to run to generate the output that “represents” the original work “faithfully”.
It is true that for lossless compression there are constraints on the size of the input and the size of the compressed artifact+decoder, but I believe for lossy compression, the fourth element is the perceptive capabilities (or preferences) of human minds.
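Put semi-formally (my notation, just a sketch): lossy compression is roughly looking for the shortest bitstream/program p whose decoded output is perceptually close enough to the original x,

    \min_{p} \lvert p \rvert \quad \text{subject to} \quad d_{\text{perc}}\big(x,\ \mathrm{dec}(p)\big) \le \varepsilon,

and that fourth element is precisely the choice of d_perc, the human-perception-weighted distortion measure.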
The amount of unused exploitable redundancy in typical video content is still massive; we just need better ways of finding it.
A lot of the constant improvement is just that the amount of compute that is reasonable to use keeps growing, and so new codecs that are balanced around ever-slower algorithms keep getting developed. I'm not saying that there are no new inventions -- plenty of useful tricks keep being found -- but most of the difference between AV1 and what came before is just that the developers thought it was okay to spend more on search.
Some algorithms required too much compute power relative to what was available in mass market devices. As more transistors are available, more exhaustive searches in the content data for redundancy can be done.
With AV1 encoders I have to ask myself every time whether I should read the 1 as 'i' or as 'l'. Dav1d was still easy to guess, but here I really have a hard time.
Apart from that, I am really glad to see how Rust is slowly making an appearance in such fundamental multimedia libraries.
Great link (and I love Vimeo). The blog goes hyperbolic though: "In other words, if you happen to be in a part of the world where you might not have the best internet connection, AV1 will still provide you with an impeccable viewing experience."
No, AV1 is evolutionary. You'll "just" get better quality for the same amount of bits (just like going from MPEG-2 -> H.264 -> H.265, etc.). The license of AV1, however, is revolutionary.
I would perhaps wait until rav1e 0.4, because there is a bug in the FFmpeg wrapper (of which I am the author) right now that breaks timestamps. If you wish to build with rav1e master, you can apply this to fix it: https://github.com/dwbuiten/FFmpeg/commit/ac4cf898c1960d72bc... (removing the configure check)
It hasn't made it upstream yet since I can't bump the minimum required version in FFmpeg git until rav1e tags 0.4.
https://www.researchgate.net/publication/340351958_MSU_Video... According to that (the "5. ENCODING SPEED" section), SVT-AV1 is a bit faster, but since they are both VERY slow it doesn't really matter. That was released in March, so recent updates might have changed things.
Just because the source has unsafe blocks doesn't mean it is unsafe either. At the very least, the amount of work needed to check these blocks for memory issues is a lot less than checking any other implementation.
SVT-AV1 is written in C (and lots of assembly). libaom-av1 is written in C (and lots of assembly). libgav1 (which is a decoder) is written in C++ (and lots of assembly). So yes, lots of C/C++ in open-source AV1 encoders/decoders.
* Open & Royalty Free (no licensing required to use)
* Backed by Mozilla, Google, Microsoft, Cisco, etc. [1]
* Approximately 50% higher compression than x264 [1] and 30% higher than H.265 (HEVC) [2]
* Supported in current versions of Chrome, Firefox, Opera, and Edge [3]
* Encoding is slower than HEVC, so not typically used for live streaming [2]
[1]: https://en.wikipedia.org/wiki/AV1
[2]: https://www.theoplayer.com/blog/av1-hevc-comparative-look-vi...
[3]: https://caniuse.com/#feat=av1