Show HN: a lexer, parser, interpreter and web runtime (np, a 5-weekend project)

Udo · on March 17, 2013

I made this primarily because I wanted to learn more about writing an interpreter and I also wanted to experiment a little with the decisions that go into language design - so it turned out not like a "normal" C-like, but hopefully still recognizable. To my surprise, making decisions about how scope should work was actually the hardest part.

I can only recommend that every hacker does this at least once in their lifetime, it's been a real eye-opener for me in respect to problems I don't usually think too much about.

That said, feedback or questions are very welcome :)

gngeal · on March 17, 2013

"To my surprise, making decisions about how scope should work was actually the hardest part."

Perhaps a thorough reading of SICP would enlighten you as to how scope should be handled if you don't want to run into problems with it.

Udo · on March 17, 2013

I know SICP, it's great. However, arguments from authority only get one so far. I think it's definitely worth experimenting with different paradigms. And I'm not done yet. However, there is something to be said for not taking "the standard way of doing things" at face value all the time, even if that means arriving at the exact same solution in the end. To my defense though, different languages behave in different ways in this respect, so at least I'm not violating the entire collective wisdom of computing (just a part of it).

fabriceleal · on March 18, 2013

> I can only recommend that every hacker does this at least once in their lifetime

I've been doing this for a while now and I love interpreter implementation.

The way I got started with this was through http://nathansuniversity.com/, which teaches you how to implement interpreted languages using Javascript (it uses http://pegjs.majda.cz/ for generating the parser). From that I already played in C (lex/yacc for the lexer/parser), C# (irony - http://irony.codeplex.com) and currently I'm enjoying F# (flex/fyacc).

I already made a simple programming language from scratch (https://github.com/fabriceleal/Multitasks.git) and I'm currently implementing a subset of scheme (https://github.com/fabriceleal/Pang.git), which I have great proud to announce that has a working call-cc.

I'm only hoping to have some time to polish them before "officially" showing them in HN or reddit.

peterkelly · on March 18, 2013

"I wanted to explore what it feels like to write a parser, a lexer, and an interpreter without using a framework or some other ready-made solution"

This is an excellent exercise for any programmer who wants to really understand what's going on behind the scenes. I've always believed it's useful to understand at least one layer below what you normally work with.

Some people will criticise you for "why create another programming language when we already have X?" but ignore them. As long as you explain that it's an experiment (as you have) you should be fine. And it's possible that by starting something completely from the ground up you'll discover a different take on things that might be better than what exists already.

Just be prepared to do a complete rewrite or three once you've got your initial prototype working ;)

"and I didn't really consult any literature either"

I really recommend that you do.

The single best resource on this topic is the "Dragon Book" (officially "Compilers: Principles, Techniques, and Tools" by Aho, Lam, Sethi, and Ullman) - the latest edition is from 2006, though the older one should be fine for what you want too. I wrote my first compiler before learning the "proper" way of doing things and it was a complete mess. It was only after studying compiler construction at university that I learnt about things like formal grammars, syntax trees, etc. The dragon book will take you all the way through to expert level, and is written in such a way that you can implement different parts of your compiler as you go through the book.

One caveat though is it's huge (close to 1,000 pages). I spent a couple of months full time working through it (coding as I went along) during a long break from work a couple of years ago, and only got about half-way through in that time. But you only really need the first few chapters to get the basics; the later parts go on to a lot of advanced code optimisation techniques that you only really need if you're trying to write a high-performance production compiler or get into compiler research.

SICP, as someone else mentioned, is also very good, though it's really more about the overarching theoretical concepts and doesn't go into the same amount of depth as the dragon book. I think reading both is a good idea.

Just be warned, this stuff is highly addictive, and you can spend years studying it and still end up with lots of gaps in your knowledge ;)

Udo · on March 18, 2013

Thanks for the encouragement! Doing this was pretty exciting.

> and I didn't really consult any literature either

In hindsight I shouldn't have written that, because it practically forces people to ignore the project itself and begin suggesting emergency reading material to the illiterate goof who wrote this thing (a.k.a. me). I want to clarify: it's not that I have no clue how anything works, I just didn't study up especially for this project.

There is a continuum ranging from absolutely no prior knowledge, to some intellectual contamination, up to just aping established methods because you couldn't imagine doing things any other way. I was probably somewhere around the middle when I started this and it's not a bad place to be. In fact, I'd go so far as to recommend this as a good perspective for anyone who wants to try something similar. Learning "how it's done" beforehand not only defeats the point of the exercise, it also results in a lot of projects that are strikingly monocultural.

Now that I have a better understanding of the problem space, I imagine I will read more on these topics. I'm excited to find out how others have approached some of the problems I struggled with. So thanks for suggesting the Dragon Book, which I haven't read yet (as opposed to SICP on which I already commented)!

stiff · on March 18, 2013

The single best resource on this topic is the "Dragon Book"

Your advice is great in general, but I wish people would back up statements like this with some context, for example, what specifically have you compared it with? Inspired by Steve Yegge blogposts I once decided to write a toy compiler for educational purposes and I started with reading the Dragon Book, in fact I eventually read it almost cover to cover and implemented many algorithms from it, it is definitely full of valuable information. But it also is a very roundabout way of learning how to write an actual compiler, it is more a theoretical reference work than anything else. There are several hundreds of pages devoted to parsing, but some of the more modern techniques are not covered, runtime is treated much more briefly and many practical issues are not discussed at all. There are some nice modern textbooks that are more to the point:

http://www.amazon.com/Modern-Compiler-Design-D-Grune/dp/0471... http://www.amazon.com/Engineering-Compiler-Second-Keith-Coop...

ANTLR is a great tool and many practical issues I solved with the help of this book about it:

http://www.amazon.com/gp/product/1934356999/

Finally, source code of original AWK is a great example codebase to learn about real-world parsing, building and traversing parse trees etc.:

https://github.com/danfuzz/one-true-awk

It is an interpreter and not a compiler but once you know how to do the things just mentioned converting it to a naive compiler isn't that hard if you are not interested in fancy optimization like the OP said.

peterkelly · on March 18, 2013

s/The single best resource/Out of the resources I've read, the one I've personally found most useful/

;)

I should also probably add that I only ever read the dragon book years after I first started learning about compilers through more gentle means such as the 3rd-year compiler construction course I studied, and a bunch of other reading and experimentation. It's probably better to start off with something simpler. I haven't read either of the two books you referenced; they may indeed be more appropriate for a beginner.

While I haven't used ANTLR, I've used bison & flex, as well as Stratego (which operates on a much higher level and is actually very nice).

I guess it depends on what aspect of compilers you're most interested in and how much depth you want to go into. The dragon book will teach you all about how to write your own lexer + parser generator (which I found quite fascinating), but you don't need to know about NFAs/DFA construction etc. if you just want to create your own language implementation, given the existence of many good lexer/parser generators to which you can just pass in a formal grammar.

Another book I also found very useful was "The Implementation of Functional Programming Languages" by Simon Peyton Jones (http://research.microsoft.com/en-us/um/people/simonpj/papers...), though it's mostly of use only if you're specifically interested in functional programming (which is what I've mainly focused on my research; this book is much more specialised than some of the others).

stiff · on March 18, 2013

The dragon book will teach you all about how to write your own lexer + parser generator (which I found quite fascinating), but you don't need to know about NFAs/DFA construction etc.

One of the most fun projects I made inspired by the Dragon Book was a small grep implementation that was building up an NFA from a regular expression parsed by recursive descent and than simulating the NFA using an algorithm with two stacks described in the book :)

It's just that it takes a lot of time to get something practical running starting from just the Dragon Book, and it can be discouraging given how much effort it takes to read it. Overall I can second everything you say, though.

rohitarondekar · on March 18, 2013

Great advice. I'd also recommend the Compilers course on Coursera. Although it is already underway, you can take the self paced version. https://www.coursera.org/course/compilers

mkdir · on March 17, 2013

This is an inspiring project. It would have taken me far more than five weekends to create this, so you can feel smug knowing you're about ten times as efficient as a random HN member!

I have a couple of criticisms. First of all, strange things happen when a semicolon is omitted from the end of a function definition. In the following example, there is no output. What's going on?

    square = { n |
      n * n
    }
    println(square(5));

Second, according to the rules for omitting parentheses, it seems like this should output "25". Instead, it outputs "Function5":

    square = { n |
      n * n
    };
    println square(5);

It seems like it's passing "square" and 5 as separate parameters to println, which is unlikely to be the intended behavior in most cases.

Lastly, is recursion possible?

    factorial = { n |
      if (n < 2) { 1 }
      { n * factorial(n - 1) }
    };
    println(factorial(5));

That code generates the following error:

    Error: function identifier expected ('Exp() :3:5' found) at line 3 char 5

Anyway, I don't mean to nitpick! Do you plan on developing this further?

Udo · on March 17, 2013

First: thanks for even trying it out!

Yeah, if you don't leave off the final semicolon, it doesn't return the last value. In that case you would have to use a normal "return" statement. It's definitely a bug, but I left it in there for some time because I thought it was weirdly interesting. I'm going to get rid of that bug though in the next version.

> Second, according to the rules for omitting parentheses

Ah, I'm doing a bad job with the tutorial then. Parentheses don't work like they do in C-like languages. It's more like Lisp in that regard. Your last statement prints "square"="Function" because you're not invoking the function but getting the function pointer. The correct way would be (again, somewhat Lisp-like):

    square = { n |
      n * n
    };
    println (square 5);

Again, this would also be the solution to your last example.

I should probably make a general syntax paragraph as part of the tutorial, especially for people coming from C-likes where invokation goes function(a) instead of (function a).

Edit: there's a section on the site now to explain this, I hope this will make it easier.

mkdir · on March 17, 2013

Actually—I think you did explain properly. I must confess that I jumped into the sandbox before reading the documentation, and then I only checked the documentation when ran into behavior I didn't expect.

I do think the semicolon behavior is a bit confusing. The documentation explains that expressions aren't auto-returned if they're followed by a semicolon, but I don't think that quite covers the behavior I pointed out. I might be missing something else from the documentation, though...

Also, I found one more major issue. Your documentation uses the numeric literal "3.1415" in the explanation for named parameters. That really needs to be "3.1416". :P

Udo · on March 17, 2013

> I do think the semicolon behavior is a bit confusing.

I agree.

Regarding the named params, the example is:

  f 
    3.1415 
    #name:"I'm using a named param!";

where 3.1415 is an unnamed parameter, and the 3rd line contains the named part. I'm probably showing too many things at once here.

Also, there are definitely a few bugs still to iron out!

mkdir · on March 17, 2013

Regarding "3.1415", I was simply passive-aggressively pointing out that if you're referencing pi, you should probably round up to "3.1416". The example itself made sense!

Udo · on March 17, 2013

Ah, I'm so dense! Sure, there is also "math.pi" as a built-in for that. (And I fixed this grave rounding error)

I also discovered a scope problem with the recursion example you gave. Turns out, there is (of course) a major bug in there, thanks for discovering that. It's about the visibility of the "factorial" symbol itself, so the workaround is for now:

  factorial = { n |
    factorial = outer.factorial;
    if(n == 0) {
      1
    } {
      n * (factorial (n - 1))
    } 
  };
  println (factorial 5);

At least, until I restart the service (which I don't want to do while everyone is potentially using the tutorial).

S4M · on March 17, 2013

I also am super impressed that it took the OP only 5 week ends to do this. For me it would probably take 1 year if I don't give up on the way...

Udo · on March 17, 2013

To be fair, I'm showing you something extremely half-baked here and I thought about the project a lot before starting to code anything.

endlessvoid94 · on March 17, 2013

Keep up the good work!

I recently started working on a bytecoded javascript interpreter for the same reason - fun. I've found the biggest criticism people have for me is "why are you doing this?" You can definitely draw a line between the typical responses: either "oh, that's cool!" and "that's stupid". It's been interesting.

mgallivan · on March 17, 2013

I've seen this response to others working on new languages. How do people expect languages to improve without hobbyists honing their skills? A lot of _good_ languages start out in one person's head so please keep at it!

Udo · on March 17, 2013

Thanks!

Looking forward to your Show HN, then :)

niggler · on March 17, 2013

Failing in chrome:

    java.lang.ArrayIndexOutOfBoundsException: 1
	at np.LibRequest.parseCookies(LibRequest.java:37)
	at np.LibRequest.<init>(LibRequest.java:26)
	at np.Interpreter.initRootContext(Interpreter.java:94)
	at np.Interpreter.run(Interpreter.java:115)
	at np.Interpreter.load(Interpreter.java:142)
	at npfcgi.handleRequest(npfcgi.java:60)
	at npfcgi.main(npfcgi.java:107)

Udo · on March 17, 2013

Of course I managed to cause a fatal error just after posting the link here... ;)

Anyway, the tutorial demo server should work now.

iskander · on March 17, 2013

Looks cool but the sandboxed interpreter fails to run in my instance of Chrome (v25 on Mac OS X):

>Error: java.lang.NullPointerException np.LibRuntime.b_assign(LibRuntime.java:214) sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:616) np.LibRuntime.execute(LibRuntime.java:104) np.CoreBuiltin.execute(CoreBuiltin.java:85) np.ClastExp.invoke(ClastExp.java:32) np.ClastExp.run(ClastExp.java:45) np.ClastExp.invoke(ClastExp.java:21) np.ClastExp.run(ClastExp.java:45) np.ClastModule.run(ClastModule.java:29) np.Interpreter.run(Interpreter.java:120)

etc...

Udo · on March 17, 2013

The tutorial examples seem to work. Could you post the code you're executing?

Thanks for letting me know!

iskander · on March 18, 2013

I don't remember exactly what I typed in, something along the lines of:

    x = 10;
    y = 20;
    println (x ^ (y ^ y));

Udo · on March 18, 2013

Huh, the ^ operator isn't defined yet, but that shouldn't have caused it. Damn, I would have liked to find this out, because the end user should never see exceptions of course. Anyway, thanks for telling me, I hope it'll crop up again in the future!

huhtenberg · on March 18, 2013

  oki-doki = 42;

will cause an exception too.

akkartik · on March 17, 2013

I'm curious: why did you choose the lisp-like (f x) syntax for function calls?

Udo · on March 17, 2013

The short answer is: I wanted to experiment and do something different. I thought it was worth looking into the fact that we've become so accustomed to the f(x) form, when more logically the function should form a more obvious unit with its parameters.

However, I have to admit that it still takes some getting-used-to even for me. Maybe there is something about f(x) that is inherently more readable then (f x), but I'll reserve my judgement for a week weeks.

goldfeld · on March 18, 2013

I have been coding CoffeeScript for a while and slowly got rid of it's parentheses down to doing `do someFunction`. When I learned about Clojure and got excited to learn it (but haven't yet), I also gradually shifted my aesthetics to using (someFunction param1, param2) in CoffeScript every time that was needed for making it unambiguous (e.g. chained function calls), and now I consider someFunction(param) a most hideous invocation. But (fn x y) is even nicer. I have noticed you allow for optional commas when invoking multiple parameters.

Overall I love your syntax decisions, maybe I'm partial for liking lispy-ness and lambda syntax, but still it's refreshing to see someone experimenting beyond all the C-ness that pervades us. I feel like there's a dash of Rust and a sprinkle of Haskell too. Nice work! And if you keep development I might actually consider using this for a project or two in the future.

Udo · on March 18, 2013

> I have noticed you allow for optional commas when invoking multiple parameters.

Yes, but they're thrown away by the lexer. I'm actually still on the fence about some of these details.

> And if you keep development I might actually consider using this for a project or two in the future.

That's great to hear, thanks! There is still a TON of bugfixing and fleshing-out of the runtime library to do, but when it's ready I'll be doing a project with it as well. Gotta see how it all holds up under pressure!

stormbrew · on March 17, 2013

I don't think I've seen it argued before that (f x) is a more obvious form than f(x). Usually the argument for (f x) is premised on a lisp-based idea of homoiconicity (code/data parity).

The reason f(x) seems more obvious in meaning, to me at least, is that f is a thing and (x) is a message or application I'm giving it.

pash · on March 18, 2013

The traditional syntax really falls down when your functions are curried and you want to do partial application. In that context, as in Haskell, the f x_1 ... x_n syntax is as natural as it gets.

akkartik · on March 17, 2013

I'll be curious to hear more over the coming weeks. Email in profile.

(I have a dog in this fight, but I'll avoid saying more right now and wait for you to settle on a position :)

andrewflnr · on March 18, 2013

I've only skimmed it so far, but the docs seem excellent for a mere 5-weekend project. That is perhaps the thing that impresses me most.

Udo · on March 18, 2013

I basically always had the doc/html editor open alongside the IDE. It helped to document things as I went along. I'm not 100% sure I completely succeeded in keeping the two in sync, however. Since I always planned to Show HN, that motivated me to make it as nice as possible. There is also a programming buddy who had a semi-watchful eye on it.

mgallivan · on March 17, 2013

Could you post any helpful resources you used? Also, what was your experience with language design before going into this project?

Udo · on March 17, 2013

> Could you post any helpful resources you used?

Not a lot initially. I did have a look at the LLVM tutorial, which is absolutely great, but I eventually closed that tab because I found it more exciting to think about these problems by myself (instead of devoting resources to always figuring out "why did they do it this way?"). It probably took a lot longer, but it was also a better learning experience. There is nothing fundamentally magic about this, one simply has to go ahead an do it.

It's a mixed bag, because I can feel there are a lot of inefficiencies in my code that exist either because I personally find it more readable or simply because I couldn't quite grasp the alternative at the time.

> Also, what was your experience with language design before going into this project?

None, that's a big reason why I wanted to do this. I did make a tool for pen&paper roleplaying before that could execute standard dice codes (http://rolz.org/) but that's nothing like this project. Over the years I made several domain-specific "configuration" languages for some projects, but again, this was different.

It did help a lot to break down the process into distinct steps that have almost no overlap. A lexer to convert a string of tokens, a parser to build a tree, an interpreter to execute that tree in place (not the most efficient thing to do but it was a lot of fun), and a minimal runtime library on top of it. For the runtime functions I did cheat a little, however, because there are a lot of Apache Commons function wrappers in there.

There were some decisions of probably questionable value. For example, when I decided that the language would have no commas and would use whitespace as a generic token separator. It's been surprising sometimes to deal with the consequences of those early decisions.

One of the things that surprised me was that despite the total absence of optimization, np doesn't execute abysmally slow. When I was banging out the code I was convinced it would degrade to C64-like performance levels ;)

I'm actually planning to add more functionality, as I've hinted at in the tutorial, up to the point where someone could conceivably do a web project with it. That someone will probably be me ;)

kunil · on March 18, 2013

I am also working a programming language. I want programmers to be able to dynamically edit program code in run time. For example replacing operators in equations, conditions or injecting code into labels etc. Currently I have a working parser and semi working interpreter. But I can't find enough time to continue

kamaal · on March 18, 2013

If you did this in your spare time or even other wise. 5 weeks is amazingly short time for a project of this size.

Good work done!

Udo · on March 18, 2013

Thanks kamaal! It probably looks larger than it is, there is still a huge amount of stuff missing that people would expect from a real programming language (and also, a lot of bugs still need to be fixed).

LolWolf · on March 17, 2013

Really digging the 5-week project; great job, mate, looking forward to seeing more!

Udo · on March 17, 2013

Thanks, it's great to finally show something, even if it's not really of any practical value!