Rav1e, an AV1 encoder written in Rust and assembly (github.com/xiph)
203 points by mindfreeze on June 13, 2020 | 123 comments



For others like me who are curious and haven't heard of AV1, some main features:

* Open & Royalty Free (no licensing required to use)

* Backed by Mozilla, Google, Microsoft, Cisco, etc. [1]

* Approximately 50% better compression efficiency than H.264 (x264) [1] and 30% better than H.265 (HEVC) [2]

* Supported in current versions of Chrome, Firefox, Opera, and Edge [3]

* Encoders are slower than HEVC's, so AV1 is not typically used for live streaming [2]

[1]: https://en.wikipedia.org/wiki/AV1

[2]: https://www.theoplayer.com/blog/av1-hevc-comparative-look-vi...

[3]: https://caniuse.com/#feat=av1


Note that the encoders are slower largely because AV1 isn't hardware-accelerated yet.

We can reasonably assume that will change, as basically every major player in the CPU/GPU spaces[1] has signed on to back this. Even Apple is a member.

[1] the exceptions are mostly downstream ARM vendors - notably, Qualcomm


> Note that the encoders are slower largely because it's not hardware-accelerated yet.

This is a very misleading statement.

AV1 encoders are slow because they're slow, not because they're not hardware accelerated.

x264 and x265 are not hardware accelerated, either.


> x264 and x265 are not hardware accelerated, either.

This is incredibly false.

https://en.wikipedia.org/wiki/Intel_Quick_Sync_Video

https://en.wikipedia.org/wiki/Nvidia_NVENC

https://en.wikipedia.org/wiki/Video_Coding_Engine (this has been widely available for 8 years)


You've linked competing hardware encoding APIs which are not x264/x265.

x264/x265 are software encoding libraries for the H.264/H.265 video formats and don't really use hardware acceleration beyond SIMD instructions on CPUs. x264 is known for its great output quality while still being fast on x86 CPUs.


I think kasabali was talking about x264 (https://www.videolan.org/developers/x264.html) and x265 (http://x265.org/) specifically rather than the H.264 and H.265 video formats in general.


x264 and x265 refer to the specific set of software programs of the same name, right? Most chips nowadays can hardware accel H.264 and H.265, but they don't use x264/x265 (or do they? Not sure...). I think the confusion is that the terms x264 and H.264 are not interchangeable


> x264 and x265 are not hardware accelerated, either.

Huh, TIL.

Though I would also sort of want to know where x264 and x265 were a year in, tbf.


Still no support in any mobile browser. 35% isn't a lot of coverage. And of course Safari is a holdout -- they haven't even adopted VP9 yet.


> And of course Safari is a holdout -- they haven't even adopted VP9 yet.

Given Apple is an AOM governing member it's pretty likely Apple will eventually support AV1 (and AVIF).

VP9 is simply not on their roadmap at all, it's not a "yet" thing.


and they never will - Apple is not interested in the vpX codecs. AV1 is a different story.


As far as I know, WebM only recently added AV1 support to its container format. Before that, it was already supported in MP4 and Matroska.


Thanks, I removed the reference to WebM in that case. MP4 and MKV are very common container formats, so being supported in WebM, MP4, and MKV doesn't seem like a differentiator.


With how versatile the MKV container is, it seems odd that it isn't the de facto container out there.


Not to mention WebM is (was?) literally a subset of Matroska.


Afaik WebM is just a royalty-free subset of MKV, maybe with some exotic features omitted.


>WebM only recently added AV1 support to its container format

I was unable to find any evidence of this anywhere I looked. For instance this webpage: https://www.webmproject.org/docs/container/ shows only VP8 and VP9 as acceptable video codecs.


Indeed you are right, my mistake! While ffmpeg allows writing AV1 streams to WebM containers, it's not officially supported by the spec. It'd be interesting to know whether Firefox and Chrome support playback of these WebM AV1 streams though.


>ffmpeg allows writing AV1 streams to WebM containers

Any idea why they flipped the switch on that? Seems irresponsible when .mkv will do in all circumstances.


AFAIK even HEVC isn't typically used for live streaming, at least not on the Web (YouTube, Twitch, etc.).

Are there any examples of major services currently using HEVC?


Apple Facetime. Netflix 4K on select TVs.


Most commercial 4K content, e.g. pretty much all 4K television broadcasts. YouTube/Twitch don't like it due to the royalties.


>Approximately 50% higher compression than x264 [1] and 30% higher than H.265 (HEVC)

What is the relative improvement between codec generations (e.g. H.264 vs. H.265) compared to implementations of the same codec standard (e.g. early reference implementation of H.264 vs. current x264 with well-chosen parameters)?


I recently did some encoding, but settled on VP9 over AV1 because VP9 has wider support and ffmpeg considers its AV1 encoder "experimental." VP9 is supported out-of-the-box on practically everything modern that isn't Apple. AV1 is on track to replace it, but depending on the support you're looking for, that might be five years out.
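
For reference, the two invocations I was choosing between look roughly like this (just a sketch: file names and quality values are made up, and you need an ffmpeg build with libvpx and libaom):

    # VP9: constant-quality mode, plays basically everywhere except Apple
    ffmpeg -i input.mp4 -c:v libvpx-vp9 -crf 30 -b:v 0 -c:a libopus output.webm

    # AV1 via libaom: needs -strict experimental on builds that still flag the encoder
    ffmpeg -i input.mp4 -c:v libaom-av1 -crf 30 -b:v 0 -strict experimental output.mkv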


I downloaded Blade Runner 2047 in AV1 (search for `av1` in qBittorrent), tiny size and it looked fantastic! I can't wait until more releases are made with this codec. YouTube will save a ton of money and people on networks like Ting will be able to stream content and not pay ridiculous amounts of money.


You obviously meant 2049 (unless there's a much more compressible pre-montage version of the movie 2 commits before the final version, that is :) )


"The fastest and safest AV1 encoder" but it's also 62.1% assembly? Is the assembly as safe as rust? I'm confused


The assembly is not safe but it is used with safe wrappers. Likewise, there are some custom data structures (specifically for tiling) that are internally implemented with unsafe code (just like libstd's data structures). Fortunately the assembly bits have well defined inputs and outputs so they are testable on their own. I would love for comparably fast SIMD to be written entirely in safe Rust, but that is not possible yet.
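
For the curious, the pattern looks roughly like this (a minimal sketch, not rav1e's actual code; the extern function name and its contract are invented for illustration):

    // Hypothetical hand-written AVX2 routine, assembled separately and linked in.
    // Contract (by convention, not checked by the compiler): reads `len` bytes
    // from `a` and `b`, writes `len` bytes to `out`.
    extern "C" {
        fn add_u8_avx2(a: *const u8, b: *const u8, out: *mut u8, len: usize);
    }

    /// Safe wrapper: the length checks here are what make the call sound,
    /// provided the assembly honors its documented contract.
    pub fn add_u8(a: &[u8], b: &[u8], out: &mut [u8]) {
        assert!(a.len() == b.len() && b.len() == out.len());
        unsafe { add_u8_avx2(a.as_ptr(), b.as_ptr(), out.as_mut_ptr(), a.len()) }
    }

The well-defined inputs and outputs mentioned above are exactly what let such a function be fuzzed or checked against a plain Rust reference implementation.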


I don't know if it's the case with rav1e, but in many video codecs most of the assembly is just a flat unrolled circuit with zero or minimal control flow and minimal address calculation -- essentially just a static circuit that you might otherwise implement as a fixed block of logic in an ASIC. As a result they aren't too much of an unsafety vector.

This is also often true for cryptographic software, particularly fixed-parameter optimized code.

The performance gain usually comes from some mixture of good scheduling or register allocation that the compiler fails at and the use of instructions that the compiler won't (reliably) emit.

For these kinds of straight-line codes, I haven't found mechanically validating them to be significantly more complicated than verifying C code... and as C code these functions tend to be the lowest risk type.

So, without any direct experience, I'd expect there to be more risk in rav1e from the unsafe-rust than the asm.

That said, the 'attack surface' of an encoder is already pretty small. I'd personally expect rav1e's gain from Rust's safety to be less about security and more about avoiding wasted time on blind alleys caused by memory corruption in the codebase. I've seen more than one poor design decision made in a multimedia codec which was ultimately due to a bug that a better language might have prevented.


I've not checked the actual numbers, but I think in terms of line count, that assembly may be the same functions implemented multiple times for different (sub-)architectures: different generations of Arm and Intel chips that have different vector instructions.


Yes, there are currently 4 copies of the assembly - for AVX2, SSSE3, 64-bit NEON, and 32-bit NEON. There will likely be more as future architectures get added.
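
As a rough illustration of why the copies exist and how one gets picked (not rav1e's actual code; rav1e's fast paths are hand-written asm, this sketch just shows the runtime-detection pattern in Rust):

    /// Portable fallback: sum of absolute differences over an 8x8 block.
    fn sad_8x8_rust(a: &[u8; 64], b: &[u8; 64]) -> u32 {
        a.iter().zip(b).map(|(&x, &y)| (x as i32 - y as i32).abs() as u32).sum()
    }

    /// Same function compiled with AVX2 enabled; in rav1e this slot is filled
    /// by the hand-written assembly version instead.
    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn sad_8x8_avx2(a: &[u8; 64], b: &[u8; 64]) -> u32 {
        sad_8x8_rust(a, b)
    }

    /// Runtime dispatch: pick the widest implementation the CPU supports.
    pub fn sad_8x8(a: &[u8; 64], b: &[u8; 64]) -> u32 {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return unsafe { sad_8x8_avx2(a, b) };
            }
        }
        sad_8x8_rust(a, b)
    }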


Whether that claim holds depends on how many third-party crates, unsafe{} blocks, and how much assembly are being used. Compared with the alternative AV1 encoders (not written in Rust), the claim may or may not be true; it would be easier to defend if it said "The fastest and safest AV1 encoder in Rust".

Most Rust projects make these sorts of safety claims to attract companies and devs and to promote the project at meetups or tech conferences. Maybe they should audit this encoder to see if it lives up to these claims.


Rust can compile to many different targets. The assembly is for x86 platforms, but this project also supports arm64 and wasm targets. If security is of utmost concern, I would actually suggest running the WASM build in a sandboxed JIT like lucet.


What blows my mind is that there's a Rust encoder, but no Rust decoder. There are orders of magnitude more people who will use a decoder than an encoder. One would think that the safe decoder would have come first.


Decoders are notoriously harder to support than encoders due to the various internal formats and encoding/decoding options. Until it supports all the different formats (a lot of effort), it will only be able to play back a few videos.

By focusing on an encoder (at least initially) you can just support a subset of features that work well for you. Then focus on making a decoder that can at least play back videos from your encoder.

More people will find it useful to have an encoder that works all the time, rather than a decoder that only works on a subset of videos.


Most people will use the decoder in their browser, while the encoding environment will be much more diverse.


I've added assembly to the title for now. If someone can suggest a better (more accurate and neutral) title, we can change it again.


The assembly appears to be optional.


Is it still the safest when it's the fastest, or do you have to pick?


Despite the tagline, I don't think it's ever actually been the fastest AV1 encoder. It's quite possible it's always been the safest, even with the asm in place, and even more so when using only Rust.


Is Rust really safer? It's memory safe, but you can still have all the other fuck ups and vulnerabilities that other apps have.


How many times are we going to have this discussion?

> ~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues

https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...

There are similar reports from other organizations.


The logical structure of your statement is: Are five apples and two oranges really more food than two oranges? You have apples, but you still only have two oranges.


> safer

For any reasonable definition of "safer", yes. 100% safe? no.



I see that Safari does not support this (yet).

Might it be possible to compile this to wasm and decode it that way on the client?


Yes, and in fact you can see a demo of dav1d compiled to wasm here:

http://jabberfr.org/tmp/ogv.js/

Note this works fine in desktop and mobile Safari.


I just hope very much that Apple does not take too much time with this. After all, they joined the Alliance for Open Media in early 2018.

It would be nice to finally have a codec again that works with every (modern) platform after such a long time. The split of support for HEVC and VP9 always doubles the effort to distribute content effectively.


Is Safari becoming the new Internet Explorer?



GP is comparing it to IE6 in terms of being behind other browsers in implementing features, not in terms of publishers only checking that their pages render correctly in that browser.


IE was both the browser that was first with features and had lots of features the others didn't have, and the browser that lagged on new features.


It has been for a long time... It's open source even, but they seem to ignore patches.


(To be pedantic: Safari is closed source, the WebKit that ships with Safari is mostly open source.)


How different is this than Google Chrome being closed vs Chromium being open? Does Safari have more substantial closed-source components?


Chromium basically ships with a usable, “de-Googled” browser in tree doesn’t it? WebKit has what you might call a demo. Look up MiniBrowser to see what I mean. It’s a web view and three buttons, basically.


What about the Webkit Nightly builds though? Those basically seem to be Safari.


...oh, huh, did something change in the past few years? I just checked, and the nightly builds do not download a full, usable app.

I could have sworn they used to though. You’d get an app called “Webkit”, but it functioned just like Safari for all intents and purposes.


Yes, Safari has more substantial closed source components. WebKit is comparable to Blink, not Chromium.


Primarily because of mobile. You can download a nightly build of Safari but you can’t install one on your phone.


Apple is a member of the development group, but they won't adopt it unless there's satisfactory hardware support.


That’s.. kinda on them isn’t it?


We'll find out when the next Apple TV is announced shortly. If AV1 isn't there people should worry. Apple was using HEVC in the iPhone 6, so they aren't scared to be early on this stuff.


Whether or not Apple TV has AV1 out of the box won't matter (although it would be nice).

If Netflix or YouTube want AV1 on Apple TV then they'll just implement it in their applications. Netflix has already implemented AV1 as an option in their Android application:

https://netflixtechblog.com/netflix-now-streaming-av1-on-and...


As far as I've seen, Mediatek has the only shipping smartphone SOC that supports AV1 hardware decoding.

It's still a bit early to get upset about other vendors, when Google's own Pixel line doesn't have hardware decode yet.


> I see that Safari does not support this (yet).

It also doesn't support VP9, so don't hold your breath.

> Might it be possible to compile this to wasm and decode it that way on the client?

Maybe, but I doubt the performance would be tolerable.


We really need to focus on embedded GPU AV1 as well. It's amazing how fast pure GPU transcoding is. H.265 is supported well in modern GPUs and can transcode at dozens of times the speed. But AV1 for some reason isn't getting the same treatment, it seems.


GPUs are generally terrible for video encoding. I think you're confusing the GPU with a dedicated "hardware encoder" ASIC (which may be located on the same die as other silicon components such as a CPU or a GPU).


An off-the-shelf GPU can encode / decode thousands of channels of video and audio in real-time.

There are some cloud providers that offer this as a service, allowing the endpoints of, e.g., video calls to use different audio and video codecs.

This is quite useful, e.g., when some people join the call only using audio via a cell phone in a different country using a different audio standard (or a land line, etc.). Or when somebody joins the video call from laptop tethering from a phone on a train. Or for switching video codecs depending on whether somebody is sharing their desktop or using a webcam to record their face.

The client can pick the codecs that are the best fit for the current situation (content, bandwidth, latency, etc.), and a cloud server transcodes the video from everyone else in the meeting to that client's format.


The point the parent poster was trying to make was that the GPU cores themselves are not a great match for video codec work.

However most consumer/server GPUs include hardware IP blocks specifically for doing codec work.


And that point is wrong, since there are some cloud providers using normal GPU cores for exactly this.

The codecs that NVEnc supports are just a small bunch; there is no real-time audio, nothing of the sort.

All of this is implemented in CUDA, using normal CUDA implementations of audio and video codecs, and running on normal GPUs using normal GPU cores. In real time. Supporting thousands of audio and video channels concurrently.

Also, even for NVEnc and NVDec themselves, in some of the GPUs they do not use any specialized hardware and use normal GPU cores instead (e.g. see the older GM20x GPUs).


Are any of these transcoding libraries open source?


None of the ones I know are. They are all proprietary.

I'm not sure they are for sale either (probably for the right price), since they sell these "as a service", which pays better. I also don't think these are on sale for "small" customers.


Can you point to any encoder that only uses CUDA?


Most of these products do, since NVEnc doesn't really support many formats (AV1, VP9, Speex, ...).



How is there constantly so much compression ratio improvement in video codecs? New algorithms improve so much upon the old.


“Compression is understanding”. As our understanding of the problem domain increases, so too will our ability to compress it. Video has three key things: the still images that approximate the input, the relationship of one image to the prior and next, and a human being who is sensitive to some kinds of distortion and not others, depending on context :) To have effective and good compression, you have to understand all three.

Note: while I work at Netflix, I do not work on anything related to video encoding, these are just my semi-informed understandings :)


I would also add computational complexity. We're increasingly willing to throw more transistors at video decoding and encoding due to their reduced cost and increased performance, so the algorithms can become more complicated, perform larger searches, etc. If you think about it from a power perspective, the power cost of a bit of data transmitted to a mobile device is pretty large. I would expect to see video compression formats increase in complexity until the power cost of compressing away one more bit approaches the cost of transmitting one more bit over the network.


Well, with the compression vs. transmission/decompression ratio of video for stuff in the Netflix catalog, I think the inflection point is different than you suggest (not sure about YouTube).


You're right. I was thinking about the decoding side (where total power for displaying the video should be minimized) but was unclear above.


"Understanding" is an interesting way of putting it. Traditionally "compression" is defined in terms of entropy and for lossy compression (eg. audio/video) entropy is defined in terms of human perception. I would generalize your statement to "relationship of the past and future of the video stream"

The main enabler of more advanced compression strategies is the higher performance available to the encoder. Decoders are essentially deterministic state machines. Encoders however have to search a large space of transformations to find ones that captures the entropy for a given situation.

In the development of AV1 they called these transformation "tools" and the research encoder experimented with lots of them and only the most profitable ones made it into the standard.

The compression can thus still evolve even with the AV1 standard frozen, just like how x264 and LAME have gotten so much better over their lifetime.


> Compression is understanding

That's quite thought provoking.

What do you see when the current generation codec is heavily bitrate-constrained? The first thing you'll notice is the ugly blocks with non-matching colors at the edges. Such a primitive expression of a lack of bitrate shows quite clearly that these codecs have very little understanding.

With a codec of the (probably far) future where both encoder and decoder have good "understanding", you wouldn't see any blocks, lower color depth, smudged details or the like. If some detail is missing because of constrained bitrate, the decoder will just fill it in intelligently.

It can go crazy far - when constrained with bit rate, the encoder can just send sort of a movie script - "Man stands on the beach and looks into the distance" and the decoder will "understand" it and render it in awesome detail including adding non-mentioned splashing waves, man's thinking expression and perhaps sunset (decoder effectively becomes little automated film director who takes missing information in movie script as artistic license). It will most probably look very different from the original movie, but it will be believable - if you didn't see the original then you won't really have a reason to suspect it's fake.

With more bitrate available you can add more detail to the movie script - the man is black and old, he wears this and that which the decoder would take into account when reconstructing the video. At some point you're always constrained on the bitrate, but "understanding decoder" can always go deeper and fill in more realistic details when needed.


Exactly. What is the information that you need to preserve? When you think of this, the notion of lossy compression becomes much more understandable from a human+computer perspective. Another way to view the problem of compression is as a quest to lower the Kolmogorov complexity of the program that needs to run to generate the output that “represents” the original work “faithfully”.

It is true that for lossless compression there are constraints on the size of the input and the size of the compressed artifact+decoder, but I believe for lossy compression, the fourth element is the perceptive capabilities (or preferences) of human minds.


The amount of unused exploitable redundancy in typical video content is still massive; we just need better ways of finding it.

A lot of the constant improvement is just that the amount of compute that is reasonable to use keeps growing, and so new codecs that are balanced around ever slower algorithms keep getting developed. I'm not saying that there are no new inventions -- plenty of useful tricks keep being found, but most of the difference between AV1 and what came before is just that the developers thought it was okay to spend more on search.


Some algorithms required too much compute power relative to what was available in mass market devices. As more transistors are available, more exhaustive searches in the content data for redundancy can be done.


With AV1 encoders I have to ask myself every time whether I should read the 1 as 'i' or as 'l'. Dav1d was still easy to guess, but here I really have a hard time.

Apart from that, I am really glad to see how Rust is slowly making an appearance in such fundamental multimedia libraries.


A V ONE

AVI is already a container so it would just be confusing.


I think he means "ravie" or "ravle".



Any major projects using this yet?


Vimeo: Vimeo's Staff Picks are encoded using rav1e, and FFmpeg has support for it.



Great link (and I love Vimeo). The blog goes hyperbolic though: "In other words, if you happen to be in a part of the world where you might not have the best internet connection, AV1 will still provide you with an impeccable viewing experience."

No, AV1 is evolutionary. You'll "just" get better quality for the same amount of bits (just like going from MPEG-2 -> H.264 -> H.265, etc.). The license of AV1, however, is revolutionary.


Interesting that they do it only on the staff pick videos. I presume this is a way of incrementally rolling out support for the codec.


Yes.


Is there a way to use this in ffmpeg yet, or is it just a standalone application?


FFmpeg master has had it for a while now; if you grab the latest nightly, you can use it. https://ffmpeg.org/ffmpeg-codecs.html#librav1e
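
Something like this, for example (a sketch; the option names come from the librav1e wrapper documentation linked above, and the values are just illustrative):

    # Encode with rav1e through FFmpeg's librav1e wrapper
    ffmpeg -i input.mp4 -c:v librav1e -qp 80 -speed 6 -tiles 8 output.mkv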


I would perhaps wait until rav1e 0.4, because there is a bug in the FFmpeg wrapper (of which I am the author) right now that breaks timestamps. If you wish to build with rav1e master, you can apply this to fix it: https://github.com/dwbuiten/FFmpeg/commit/ac4cf898c1960d72bc... (removing the configure check)

It hasn't made it upstream yet since I can't bump the minimum required version in FFmpeg git until rav1e tags 0.4.



The developers say it's the fastest, but does anybody have a link to benchmarks?


Is it worth using as a default encoder for home movie libraries? Or would it spend the next decade encoding everything?


Anyone know how the "fastest" claim stacks up against SVT-AV1?


https://www.researchgate.net/publication/340351958_MSU_Video... According to that (section 5, Encoding Speed), SVT-AV1 is a bit faster, but since they are both VERY slow it doesn't really matter. That was released in March, so recent updates might have changed things.


Thanks.


Is it multithreaded???


Yes, by default it is. You can use `--threads` to tweak it.

We are using Rayon for multithreading, and also tiling to boost throughput.
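
e.g. something like this (the values are just an illustration; check `rav1e --help` for the exact options in your build):

    # More tiles gives Rayon's worker threads independent pieces of each frame
    rav1e input.y4m --output output.ivf --threads 8 --tiles 8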


Wonderful. Thanks!


You can't make a competitive video encoder that isn't.


I see you haven't tried encoding AV1 with FFmpeg lately.


> The fastest and safest AV1 encoder.

Is it faster than dav1d?


dav1d is a decoder while rav1e is an encoder

rav1e = rav1e is an AV1 Encoder

dav1d = dav1d is an AV1 Decoder


Coders can't come up with a good name to save their lives.


Eav1e was right there as well :(


What's your proposal?


Since I am a fellow coder I cannot in good conscience suggest another terrible name.


How about “encodeav1” for the cli tool, “libav1encoder-rs” for the library crate.


"The fastest and safest AV1 encoder." quick search on unsafe: https://github.com/xiph/rav1e/search?q=unsafe&unscoped_q=uns...

https://github.com/xiph/rav1e/issues/2378

https://github.com/xiph/rav1e/issues/2310

Does not seem safe at all, definitely not the "safest" encoder out there.


The rust standard library is full of uses of unsafe. Does that mean it is not a safe library?

No one uses “unsafe” in C (because it doesn’t exist). Does that mean C libraries are safe?

Is there some safer encoder you have in mind?


Just because the source has unsafe blocks doesn't mean it is unsafe either. At least the amount of work to check these blocks for memory issues is a lot less than checking any other implementation.


Aren't the only other encoders written in C?


SVT-AV1 is written in C (and lots of assembly). libaom-av1 is written in C (and lots of assembly). libgav1 (which is decoder) is written in C++ (and lots of assembly). So yes, lots of C/C++ in open source AV1 encoders/decoders.


So it sounds to me like it might be the safest encoder... except for the fact that it is 60% assembly. Maybe they can improve that.



