Nom, a byte oriented, streaming, zero copy, parser combinators library in Rust [pdf]

geal · on May 26, 2015

Hi all, Nom's author here.

I wrote this paper in January 2015, so some things changed in the meantime:

- the benchmarks have been optimized a bit, and someone contributed a cereal parser that beats nom ( https://github.com/Geal/nom_benchmarks ). It's alright, I just wanted to estimate where it stands

- there is better error management: https://github.com/Geal/nom/wiki/Error-management

- the error management types give powerful debugging features: http://dev.unhandledexpression.com/slides/langsec-2015/img/c... (test code: https://github.com/Geal/nom_colored_hexdump )

- It is now easy to embed in C ( https://github.com/Geal/nom_in_c ). I plan to make API compatible versions of some C libs

- there are more example parsers now: https://github.com/Geal/nom/issues/14 feel free to pick one up!

Here are the slides for the conference at the IEEE Langsec workshop: http://dev.unhandledexpression.com/slides/langsec-2015/

Langsec is one of the most interesting approaches to follow in security, and the ideas presented this year were amazing (I'm especially fond of the heap exploitation algebra). I highly recommend watching the videos once they are available.

Writing parsers with nom is easy and fun, please give it a try ;) https://github.com/Geal/nom

dansimon · on May 26, 2015

Great work! I've been working on a Rust parser combinator library myself ( https://github.com/DanSimon/peruse ), though I've been focusing more on parsing syntax trees than binary formats, and it's really cool to see the different approaches you've taken. I had been planning on working on stream parsers next, but I may just defer to nom since it basically looks like what I had wanted to do.

geal · on May 26, 2015

I really like your approach with trait objects. This is something I would have liked in nom when I started it, but macros proved more useful for my experiments (fewer types to rewrite when I change something).

For the streaming part, it is still a work in progress. I need a better way to represent input-enabled state machines, otherwise we'll end up with streaming switch-based state machines, and they are a security nightmare.

pjmlp · on May 26, 2015

> Langsec is one of the most interesting approaches to follow in security, and the ideas presented this year were amazing (I'm especially fond of the heap exploitation algebra). I highly recommend watching the videos once they are available.

Many thanks for the heads up!

kmike84 · on May 26, 2015

Nice!

Any plans for fancier parsing algorithms (GLL? see e.g. http://www.cs.uwm.edu/~dspiewak/papers/generalized-parser-co...)

geal · on May 26, 2015

yes, that's the plan! I had to take care of the basic tools and the plumbing first, but I'll add better algorithms.

nickpsecurity · on May 26, 2015

Getting parsing done correctly and efficiently is a requirement that pops up over and over. More work in the field is always good. Interesting paper, although I haven't heard of the parsers. I usually hear about yacc, GOLD parsing system, and so on. Like to see comparisons with most common ones for probable use cases in terms of productivity, safety, and performance.

Far as next project, Leroy's people at INRIA did excellent work in verified, LR parsers [1]. I'm not sure that anyone is building on it at the moment. Implementing that in or integrating with Rust might make for one heck of a parsing system. So long as correspondence was proven, it would be the safest one in a systems language.

[1] http://gallium.inria.fr/~fpottier/publis/jourdan-leroy-potti...

geal · on May 26, 2015

Parser combinators are a common approach in functional languages. Basically, instead of generating the whole parser from a grammar, you assemble a lot of small functions into other functions. The resulting code often resembles the grammar very closely, and all of the intermediate parsers are very easy to test.
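
To make the idea concrete, here is a minimal hand-written sketch of small parsers assembled into a bigger one. The names `PResult`, `tag`, `number` and `pair` are made up for this illustration and are not nom's API (nom builds its parsers with macros); the sketch only assumes that a parser takes a byte slice and returns the remaining input along with a value.

```rust
// Hand-rolled illustration: these names are hypothetical, NOT nom's API.
// A parser returns the remaining input plus the parsed value, or None.
type PResult<'a, T> = Option<(&'a [u8], T)>;

/// Small parser: matches a fixed tag at the start of the input.
fn tag<'a>(expected: &'static [u8], input: &'a [u8]) -> PResult<'a, &'a [u8]> {
    if input.starts_with(expected) {
        Some((&input[expected.len()..], &input[..expected.len()]))
    } else {
        None
    }
}

/// Small parser: one or more ASCII digits, converted to a u32.
fn number<'a>(input: &'a [u8]) -> PResult<'a, u32> {
    let len = input.iter().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    let text = std::str::from_utf8(&input[..len]).ok()?;
    let value = text.parse::<u32>().ok()?;
    Some((&input[len..], value))
}

/// Assembled parser for the grammar rule: pair = number "," number
fn pair<'a>(input: &'a [u8]) -> PResult<'a, (u32, u32)> {
    let (rest, left) = number(input)?;
    let (rest, _) = tag(b",", rest)?;
    let (rest, right) = number(rest)?;
    Some((rest, (left, right)))
}
```

The body of `pair` reads like its grammar rule, and `tag` and `number` can each be tested on their own.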

Proving with Coq the soundness of a parser compared to its grammar is a cool approach! The biggest problem one has when writing parsers in "safe" systems like parser generators or parser combinators lies in the gaps between the input language intended by the designer, the input language described by the grammar and the input language described by the code. Anything that can reduce those gaps is welcome.

I should try at some point to use Coq or some SMT-based system to hunt for ambiguity in formats. That would make for interesting research ;)

nickpsecurity · on May 26, 2015

That makes sense. The combinator approach might be verified using a form of design by contract or VCG's if it's simple functions. I'll look into them when I take the deep dive into functional programming.

alfiedotwtf · on May 25, 2015

As OP says "benchmarks, they have to be considered with skepticism", but with Nom implementing the "most naive recursive parsing", even the UN-optimised benchmarks look impressive

flippant · on May 25, 2015

https://github.com/Geal/nom

https://github.com/Geal/nom_benchmarks

eli_gottlieb · on May 26, 2015

Ok, if Rust adds higher-kinded types, I'm giving it a try. This is very interesting, being able to write almost Haskell-style code without laziness or garbage-collection.

steveklabnik · on May 26, 2015

We designed the standard library to accommodate them, but working on an actual proposal was put off till post 1.0. It's a very highly desired feature.

stefanix · on May 26, 2015

Nom is such a good name for a parser. Will ver 2.0 be called NomNom?

geal · on May 26, 2015

That's a fun numbering scheme, but it might be cumbersome in the long run, right?

It's named that way because a nom parser takes a byte of data ;)

DanWaterworth · on May 26, 2015

    Nom
    NomOm
    NomNom
    NomOmOm
    NomOmNom
    ...

buster · on May 26, 2015

I am wondering how it compares to rust-peg[1]. I am currently using it and the syntax is very nice. At first look, Nom seems more verbose to use. Is it much faster than rust-peg, though?

[1] https://github.com/kevinmehall/rust-peg

geal · on May 26, 2015

I do not know if it is faster, I should add it to the benchmarks :)

The biggest difference here is that rust-peg uses a syntax extension to parse the grammar. This is something I would like to do at some point, because it can make the code nicer. Right now, macros are good, because you can use nom directly with Rust 1.0, no need for feature gates.

buster · on May 26, 2015

That'd be nice. I am currently contemplating whether to move or not. Nom does run on Rust stable, right? Although I find rust-peg super easy to use, I don't like that it only works on nightly. A difference in performance would be another reason to switch over.. or not.. :)

Anyway, thanks for this great project!

geal · on May 26, 2015

Nom works on stable, but there are nice things you can do if you use the nightly, like matching on error codes with slice or box patterns ( https://github.com/Geal/nom/wiki/Error-management#with-slice... ) or use no_std to embed the parser as a C library ( https://github.com/Geal/nom_in_c ).

steveklabnik · on May 26, 2015

You shouldn't need no_std specifically to embed in C anymore.

geal · on May 26, 2015

That's nice to know, thanks!

steveklabnik · on May 26, 2015

Any time. Basically, once the runtime was removed[1], this became true.

1: Of course, all programming languages have runtimes, but once we got down to C or C++ levels of runtime, that is.

amelius · on May 26, 2015

What class of grammars does this support? How are ambiguities handled and reported?

geal · on May 26, 2015

Parser combinators are recursive descent parsers, so they do not manage ambiguities at compile time. At runtime, the first parsing path that works will win.
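
A small sketch of what "the first path that works wins" means in practice. The `alt` function and the keyword parsers below are hypothetical names written for this illustration, not nom's API; the point is only that the order of alternatives decides the result and nothing reports the overlap.

```rust
// Hand-rolled illustration: these names are hypothetical, NOT nom's API.
type PResult<'a, T> = Option<(&'a [u8], T)>;

#[derive(Debug, PartialEq)]
enum Keyword {
    In,
    Int,
}

fn keyword_in<'a>(input: &'a [u8]) -> PResult<'a, Keyword> {
    if input.starts_with(b"in") {
        Some((&input[2..], Keyword::In))
    } else {
        None
    }
}

fn keyword_int<'a>(input: &'a [u8]) -> PResult<'a, Keyword> {
    if input.starts_with(b"int") {
        Some((&input[3..], Keyword::Int))
    } else {
        None
    }
}

/// Ordered choice: try `first`, and fall back to `second` only if it fails.
fn alt<'a, T>(
    first: impl Fn(&'a [u8]) -> PResult<'a, T>,
    second: impl Fn(&'a [u8]) -> PResult<'a, T>,
    input: &'a [u8],
) -> PResult<'a, T> {
    first(input).or_else(|| second(input))
}
```

On the input `int x`, `alt(keyword_in, keyword_int, ...)` returns `Keyword::In` and leaves `t x` unparsed, while swapping the two alternatives returns `Keyword::Int`. Both orderings compile without any warning, so the author has to put the longer alternative first.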

Nom can easily handle regular, context-free and context-sensitive grammars (a lot of binary formats are a bit context-sensitive, so that was a requirement). I suspect it could do recursively enumerable too.
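
A typical example of that context sensitivity in binary formats is a length-prefixed field, where a value parsed earlier decides how much input the next parser consumes. Below is a minimal hand-written sketch; `be_u16`, `take` and `length_prefixed` are names chosen for this illustration (not taken from nom), and the 2-byte big-endian length is an assumed format.

```rust
// Hand-rolled illustration: these names are hypothetical, NOT nom's API.
type PResult<'a, T> = Option<(&'a [u8], T)>;

/// Reads a big-endian u16 from the start of the input.
fn be_u16<'a>(input: &'a [u8]) -> PResult<'a, u16> {
    if input.len() < 2 {
        return None;
    }
    Some((&input[2..], u16::from_be_bytes([input[0], input[1]])))
}

/// Takes exactly `count` bytes, or fails if the input is too short.
fn take<'a>(count: usize, input: &'a [u8]) -> PResult<'a, &'a [u8]> {
    if input.len() < count {
        None
    } else {
        Some((&input[count..], &input[..count]))
    }
}

/// Assumed format: a 2-byte big-endian length followed by that many bytes.
/// The length parsed first controls what the second parser accepts.
fn length_prefixed<'a>(input: &'a [u8]) -> PResult<'a, &'a [u8]> {
    let (rest, len) = be_u16(input)?;
    take(len as usize, rest)
}
```

Since a combinator is just a function, passing a previously parsed value to the next parser needs no special machinery, which is harder to express in a pure context-free grammar.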

Animats · on May 26, 2015

How hard is it to translate something written for "pyparsing" to Nom? The approach seems reasonably similar, but are there limitations in Nom that aren't in "pyparsing"?

geal · on May 26, 2015

I have never used pyparsing, but the approach seems similar. Right now, the only real limitation is that I still have a few combinators to implement. Otherwise, anything complex you might want can be done by writing a simple function.

Apanatshka · on May 26, 2015

Hmm. Someone beat me to it. Interesting though, very cool.