I made this primarily because I wanted to learn more about writing an interpreter and I also wanted to experiment a little with the decisions that go into language design - so it turned out not like a "normal" C-like, but hopefully still recognizable. To my surprise, making decisions about how scope should work was actually the hardest part.
I can only recommend that every hacker does this at least once in their lifetime, it's been a real eye-opener for me in respect to problems I don't usually think too much about.
That said, feedback or questions are very welcome :)
I know SICP, it's great. However, arguments from authority only get one so far. I think it's definitely worth experimenting with different paradigms. And I'm not done yet. However, there is something to be said for not taking "the standard way of doing things" at face value all the time, even if that means arriving at the exact same solution in the end. To my defense though, different languages behave in different ways in this respect, so at least I'm not violating the entire collective wisdom of computing (just a part of it).
> I can only recommend that every hacker does this at least once in their lifetime
I've been doing this for a while now and I love interpreter implementation.
The way I got started with this was through http://nathansuniversity.com/, which teaches you how to implement interpreted languages using Javascript (it uses http://pegjs.majda.cz/ for generating the parser). From that I already played in C (lex/yacc for the lexer/parser), C# (irony - http://irony.codeplex.com) and currently I'm enjoying F# (flex/fyacc).
"I wanted to explore what it feels like to write a parser, a lexer, and an interpreter without using a framework or some other ready-made solution"
This is an excellent exercise for any programmer who wants to really understand what's going on behind the scenes. I've always believed it's useful to understand at least one layer below what you normally work with.
Some people will criticise you for "why create another programming language when we already have X?" but ignore them. As long as you explain that it's an experiment (as you have) you should be fine. And it's possible that by starting something completely from the ground up you'll discover a different take on things that might be better than what exists already.
Just be prepared to do a complete rewrite or three once you've got your initial prototype working ;)
"and I didn't really consult any literature either"
I really recommend that you do.
The single best resource on this topic is the "Dragon Book" (officially "Compilers: Principles, Techniques, and Tools" by Aho, Lam, Sethi, and Ullman) - the latest edition is from 2006, though the older one should be fine for what you want too. I wrote my first compiler before learning the "proper" way of doing things and it was a complete mess. It was only after studying compiler construction at university that I learnt about things like formal grammars, syntax trees, etc. The dragon book will take you all the way through to expert level, and is written in such a way that you can implement different parts of your compiler as you go through the book.
One caveat though is it's huge (close to 1,000 pages). I spent a couple of months full time working through it (coding as I went along) during a long break from work a couple of years ago, and only got about half-way through in that time. But you only really need the first few chapters to get the basics; the later parts go on to a lot of advanced code optimisation techniques that you only really need if you're trying to write a high-performance production compiler or get into compiler research.
SICP, as someone else mentioned, is also very good, though it's really more about the overarching theoretical concepts and doesn't go into the same amount of depth as the dragon book. I think reading both is a good idea.
Just be warned, this stuff is highly addictive, and you can spend years studying it and still end up with lots of gaps in your knowledge ;)
Thanks for the encouragement! Doing this was pretty exciting.
> and I didn't really consult any literature either
In hindsight I shouldn't have written that, because it practically forces people to ignore the project itself and begin suggesting emergency reading material to the illiterate goof who wrote this thing (a.k.a. me). I want to clarify: it's not that I have no clue how anything works, I just didn't study up especially for this project.
There is a continuum ranging from absolutely no prior knowledge, to some intellectual contamination, up to just aping established methods because you couldn't imagine doing things any other way. I was probably somewhere around the middle when I started this and it's not a bad place to be. In fact, I'd go so far as to recommend this as a good perspective for anyone who wants to try something similar. Learning "how it's done" beforehand not only defeats the point of the exercise, it also results in a lot of projects that are strikingly monocultural.
Now that I have a better understanding of the problem space, I imagine I will read more on these topics. I'm excited to find out how others have approached some of the problems I struggled with. So thanks for suggesting the Dragon Book, which I haven't read yet (as opposed to SICP on which I already commented)!
The single best resource on this topic is the "Dragon Book"
Your advice is great in general, but I wish people would back up statements like this with some context, for example, what specifically have you compared it with? Inspired by Steve Yegge blogposts I once decided to write a toy compiler for educational purposes and I started with reading the Dragon Book, in fact I eventually read it almost cover to cover and implemented many algorithms from it, it is definitely full of valuable information. But it also is a very roundabout way of learning how to write an actual compiler, it is more a theoretical reference work than anything else. There are several hundreds of pages devoted to parsing, but some of the more modern techniques are not covered, runtime is treated much more briefly and many practical issues are not discussed at all. There are some nice modern textbooks that are more to the point:
It is an interpreter and not a compiler but once you know how to do the things just mentioned converting it to a naive compiler isn't that hard if you are not interested in fancy optimization like the OP said.
s/The single best resource/Out of the resources I've read, the one I've personally found most useful/
;)
I should also probably add that I only ever read the dragon book years after I first started learning about compilers through more gentle means such as the 3rd-year compiler construction course I studied, and a bunch of other reading and experimentation. It's probably better to start off with something simpler. I haven't read either of the two books you referenced; they may indeed be more appropriate for a beginner.
While I haven't used ANTLR, I've used bison & flex, as well as Stratego (which operates on a much higher level and is actually very nice).
I guess it depends on what aspect of compilers you're most interested in and how much depth you want to go into. The dragon book will teach you all about how to write your own lexer + parser generator (which I found quite fascinating), but you don't need to know about NFAs/DFA construction etc. if you just want to create your own language implementation, given the existence of many good lexer/parser generators to which you can just pass in a formal grammar.
Another book I also found very useful was "The Implementation of Functional Programming Languages" by Simon Peyton Jones (http://research.microsoft.com/en-us/um/people/simonpj/papers...), though it's mostly of use only if you're specifically interested in functional programming (which is what I've mainly focused on my research; this book is much more specialised than some of the others).
The dragon book will teach you all about how to write your own lexer + parser generator (which I found quite fascinating), but you don't need to know about NFAs/DFA construction etc.
One of the most fun projects I made inspired by the Dragon Book was a small grep implementation that was building up an NFA from a regular expression parsed by recursive descent and than simulating the NFA using an algorithm with two stacks described in the book :)
It's just that it takes a lot of time to get something practical running starting from just the Dragon Book, and it can be discouraging given how much effort it takes to read it. Overall I can second everything you say, though.
Great advice. I'd also recommend the Compilers course on Coursera. Although it is already underway, you can take the self paced version. https://www.coursera.org/course/compilers
This is an inspiring project. It would have taken me far more than five weekends to create this, so you can feel smug knowing you're about ten times as efficient as a random HN member!
I have a couple of criticisms. First of all, strange things happen when a semicolon is omitted from the end of a function definition. In the following example, there is no output. What's going on?
square = { n |
n * n
}
println(square(5));
Second, according to the rules for omitting parentheses, it seems like this should output "25". Instead, it outputs "Function5":
square = { n |
n * n
};
println square(5);
It seems like it's passing "square" and 5 as separate parameters to println, which is unlikely to be the intended behavior in most cases.
Lastly, is recursion possible?
factorial = { n |
if (n < 2) { 1 }
{ n * factorial(n - 1) }
};
println(factorial(5));
That code generates the following error:
Error: function identifier expected ('Exp() :3:5' found) at line 3 char 5
Anyway, I don't mean to nitpick! Do you plan on developing this further?
Yeah, if you don't leave off the final semicolon, it doesn't return the last value. In that case you would have to use a normal "return" statement. It's definitely a bug, but I left it in there for some time because I thought it was weirdly interesting. I'm going to get rid of that bug though in the next version.
> Second, according to the rules for omitting parentheses
Ah, I'm doing a bad job with the tutorial then. Parentheses don't work like they do in C-like languages. It's more like Lisp in that regard. Your last statement prints "square"="Function" because you're not invoking the function but getting the function pointer. The correct way would be (again, somewhat Lisp-like):
square = { n |
n * n
};
println (square 5);
Again, this would also be the solution to your last example.
I should probably make a general syntax paragraph as part of the tutorial, especially for people coming from C-likes where invokation goes function(a) instead of (function a).
Edit: there's a section on the site now to explain this, I hope this will make it easier.
Actually—I think you did explain properly. I must confess that I jumped into the sandbox before reading the documentation, and then I only checked the documentation when ran into behavior I didn't expect.
I do think the semicolon behavior is a bit confusing. The documentation explains that expressions aren't auto-returned if they're followed by a semicolon, but I don't think that quite covers the behavior I pointed out. I might be missing something else from the documentation, though...
Also, I found one more major issue. Your documentation uses the numeric literal "3.1415" in the explanation for named parameters. That really needs to be "3.1416". :P
Regarding "3.1415", I was simply passive-aggressively pointing out that if you're referencing pi, you should probably round up to "3.1416". The example itself made sense!
Ah, I'm so dense! Sure, there is also "math.pi" as a built-in for that. (And I fixed this grave rounding error)
I also discovered a scope problem with the recursion example you gave. Turns out, there is (of course) a major bug in there, thanks for discovering that. It's about the visibility of the "factorial" symbol itself, so the workaround is for now:
I recently started working on a bytecoded javascript interpreter for the same reason - fun. I've found the biggest criticism people have for me is "why are you doing this?" You can definitely draw a line between the typical responses: either "oh, that's cool!" and "that's stupid". It's been interesting.
I've seen this response to others working on new languages. How do people expect languages to improve without hobbyists honing their skills? A lot of _good_ languages start out in one person's head so please keep at it!
java.lang.ArrayIndexOutOfBoundsException: 1
at np.LibRequest.parseCookies(LibRequest.java:37)
at np.LibRequest.<init>(LibRequest.java:26)
at np.Interpreter.initRootContext(Interpreter.java:94)
at np.Interpreter.run(Interpreter.java:115)
at np.Interpreter.load(Interpreter.java:142)
at npfcgi.handleRequest(npfcgi.java:60)
at npfcgi.main(npfcgi.java:107)
Huh, the ^ operator isn't defined yet, but that shouldn't have caused it. Damn, I would have liked to find this out, because the end user should never see exceptions of course. Anyway, thanks for telling me, I hope it'll crop up again in the future!
The short answer is: I wanted to experiment and do something different. I thought it was worth looking into the fact that we've become so accustomed to the f(x) form, when more logically the function should form a more obvious unit with its parameters.
However, I have to admit that it still takes some getting-used-to even for me. Maybe there is something about f(x) that is inherently more readable then (f x), but I'll reserve my judgement for a week weeks.
I have been coding CoffeeScript for a while and slowly got rid of it's parentheses down to doing `do someFunction`. When I learned about Clojure and got excited to learn it (but haven't yet), I also gradually shifted my aesthetics to using (someFunction param1, param2) in CoffeScript every time that was needed for making it unambiguous (e.g. chained function calls), and now I consider someFunction(param) a most hideous invocation. But (fn x y) is even nicer. I have noticed you allow for optional commas when invoking multiple parameters.
Overall I love your syntax decisions, maybe I'm partial for liking lispy-ness and lambda syntax, but still it's refreshing to see someone experimenting beyond all the C-ness that pervades us. I feel like there's a dash of Rust and a sprinkle of Haskell too. Nice work! And if you keep development I might actually consider using this for a project or two in the future.
> I have noticed you allow for optional commas when invoking multiple parameters.
Yes, but they're thrown away by the lexer. I'm actually still on the fence about some of these details.
> And if you keep development I might actually consider using this for a project or two in the future.
That's great to hear, thanks! There is still a TON of bugfixing and fleshing-out of the runtime library to do, but when it's ready I'll be doing a project with it as well. Gotta see how it all holds up under pressure!
I don't think I've seen it argued before that (f x) is a more obvious form than f(x). Usually the argument for (f x) is premised on a lisp-based idea of homoiconicity (code/data parity).
The reason f(x) seems more obvious in meaning, to me at least, is that f is a thing and (x) is a message or application I'm giving it.
The traditional syntax really falls down when your functions are curried and you want to do partial application. In that context, as in Haskell, the f x_1 ... x_n syntax is as natural as it gets.
I basically always had the doc/html editor open alongside the IDE. It helped to document things as I went along. I'm not 100% sure I completely succeeded in keeping the two in sync, however. Since I always planned to Show HN, that motivated me to make it as nice as possible. There is also a programming buddy who had a semi-watchful eye on it.
Not a lot initially. I did have a look at the LLVM tutorial, which is absolutely great, but I eventually closed that tab because I found it more exciting to think about these problems by myself (instead of devoting resources to always figuring out "why did they do it this way?"). It probably took a lot longer, but it was also a better learning experience. There is nothing fundamentally magic about this, one simply has to go ahead an do it.
It's a mixed bag, because I can feel there are a lot of inefficiencies in my code that exist either because I personally find it more readable or simply because I couldn't quite grasp the alternative at the time.
> Also, what was your experience with language design before going into this project?
None, that's a big reason why I wanted to do this. I did make a tool for pen&paper roleplaying before that could execute standard dice codes (http://rolz.org/) but that's nothing like this project. Over the years I made several domain-specific "configuration" languages for some projects, but again, this was different.
It did help a lot to break down the process into distinct steps that have almost no overlap. A lexer to convert a string of tokens, a parser to build a tree, an interpreter to execute that tree in place (not the most efficient thing to do but it was a lot of fun), and a minimal runtime library on top of it. For the runtime functions I did cheat a little, however, because there are a lot of Apache Commons function wrappers in there.
There were some decisions of probably questionable value. For example, when I decided that the language would have no commas and would use whitespace as a generic token separator. It's been surprising sometimes to deal with the consequences of those early decisions.
One of the things that surprised me was that despite the total absence of optimization, np doesn't execute abysmally slow. When I was banging out the code I was convinced it would degrade to C64-like performance levels ;)
I'm actually planning to add more functionality, as I've hinted at in the tutorial, up to the point where someone could conceivably do a web project with it. That someone will probably be me ;)
I am also working a programming language. I want programmers to be able to dynamically edit program code in run time. For example replacing operators in equations, conditions or injecting code into labels etc. Currently I have a working parser and semi working interpreter. But I can't find enough time to continue
Thanks kamaal! It probably looks larger than it is, there is still a huge amount of stuff missing that people would expect from a real programming language (and also, a lot of bugs still need to be fixed).
I can only recommend that every hacker does this at least once in their lifetime, it's been a real eye-opener for me in respect to problems I don't usually think too much about.
That said, feedback or questions are very welcome :)