I don't know why it uses a garbage collector when simple reference counting would suffice. Also, I hate libraries with global state/variables like this one...
I am not sure I grok the code, but I think something along the following lines would work:
- a 'recovery' list of (object pointer, old refcount) pairs
- whenever you allocate something, add (object pointer, 0) to the list
- when you update a reference count, check whether the object is in the list; if it is not, add it, thus remembering the original refcount
- whenever the code decreases a reference count to zero, add the object to a 'to be deleted' list (this could possibly be a linked list chained through the 'refcount' fields; the old refcount will be in the recovery list)
- when the parser reaches toplevel without a longjmp, clear the recovery list and delete any objects in the 'to be deleted' list
- when a longjmp occurs, walk the recovery list and reset the reference counts; for objects whose old refcount is zero, delete the object
Elegant? Not really, but with proper macros/functions it should not be much less elegant than plain reference counting; a rough sketch follows below.
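A minimal sketch of that bookkeeping in C, assuming a simple obj struct with an embedded refcount field (all names here are hypothetical, not the library's actual API):

#include <stdlib.h>

typedef struct obj obj;
struct obj { int refcount; /* ... payload ... */ };

typedef struct entry { obj *o; int old_refcount; struct entry *next; } entry;

static entry *recovery = NULL;  /* (object pointer, old refcount) pairs */
static entry *doomed   = NULL;  /* objects whose refcount reached zero  */

/* Record the original refcount the first time an object is touched. */
static void remember(obj *o, int old) {
    for (entry *e = recovery; e; e = e->next)
        if (e->o == o) return;                 /* already recorded */
    entry *e = malloc(sizeof *e);
    e->o = o; e->old_refcount = old; e->next = recovery; recovery = e;
}

obj *rc_new(void) {
    obj *o = malloc(sizeof *o);
    o->refcount = 1;
    remember(o, 0);                            /* allocation: record (ptr, 0) */
    return o;
}

void rc_incref(obj *o) { remember(o, o->refcount); o->refcount++; }

void rc_decref(obj *o) {
    remember(o, o->refcount);
    if (--o->refcount == 0) {                  /* defer the actual free */
        entry *e = malloc(sizeof *e);
        e->o = o; e->next = doomed; doomed = e;
    }
}

/* Parser reached toplevel without a longjmp: commit everything. */
void rc_commit(void) {
    while (doomed)   { entry *e = doomed;   doomed   = e->next; free(e->o); free(e); }
    while (recovery) { entry *e = recovery; recovery = e->next; free(e); }
}

/* A longjmp fired: restore old refcounts, free fresh allocations. */
void rc_rollback(void) {
    while (doomed)   { entry *e = doomed;   doomed   = e->next; free(e); }
    while (recovery) {
        entry *e = recovery; recovery = e->next;
        if (e->old_refcount == 0) free(e->o);  /* allocated during this parse */
        else e->o->refcount = e->old_refcount;
        free(e);
    }
}

A real version would probably chain the 'to be deleted' list through the refcount fields as suggested above; the separate heap-allocated entries just keep the sketch readable.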
That's pretty close to Deutsch-Bobrow deferred reference counting. If you decide to implement it, you may want to read their paper about the technique.
I don't like C libraries that use longjmp for error handling. It's like killing a fly with a bazooka. There are some cases when longjmp is useful of course, but if you want proper exceptions, use a language that supports them.
I appreciate the OP's efforts; sharing code is certainly a good thing, and one of the benefits of sharing is getting a lot of feedback. Criticizing implementation details is important for spotting, discussing, and hopefully fixing potential problems.
But the only actionable advice you can take away from this criticism is "rewrite it in C++". I don't think that's useful in any way, and I don't think it's advice worth giving.
Right - "Don't use longjmp, do X in C instead" would have been a useful comment (particularly with example code). "Don't use longjmp, use C++ instead" is much less useful.
I agree that my comment was a bit harsh, but that's not what I said. I said that if he wanted exceptions he shouldn't have used C.
As for the alternatives, well, return values never killed anyone, did they? You can use gotos to simplify the control flow within functions (see the sketch below). Other than that, there really isn't much to explain.
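For what it's worth, a hedged sketch of that conventional style, error codes plus a single goto-based cleanup path (the function name and buffer size are made up for illustration):

#include <stdio.h>
#include <stdlib.h>

int parse_file(const char *path) {
    int rc = -1;                    /* pessimistic default        */
    char *buf = NULL;
    FILE *f = fopen(path, "r");
    if (!f) goto done;

    buf = malloc(4096);
    if (!buf) goto close_file;

    size_t n = fread(buf, 1, 4096, f);
    if (n == 0) goto free_buf;      /* empty/unreadable: bail out */

    /* ... run the parser over buf[0..n) ... */
    rc = 0;                         /* success                    */

free_buf:
    free(buf);
close_file:
    fclose(f);
done:
    return rc;                      /* caller checks the code     */
}

Each failure jumps to a label that releases exactly what has been acquired so far, and the caller just checks the return value.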
I don't like longjmp because I never expect them in C. I never think "hey, the control flow might jump to some point 20 stack frames above at any moment when I enter this library". And then I use mutexes. Cue the drama. You can't even protect yourself from the stack unwinding with handler-case or try ... catch/except.
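Concretely, the mutex worry looks like this; library_parse and on_error are hypothetical stand-ins for any longjmp-based library:

#include <pthread.h>
#include <setjmp.h>

extern jmp_buf on_error;           /* set by an error handler far up the stack */
void library_parse(const char *s); /* may longjmp(on_error, 1) on any error    */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void use_library(void) {
    pthread_mutex_lock(&lock);
    library_parse("...");          /* a longjmp here skips straight out... */
    pthread_mutex_unlock(&lock);   /* ...and this unlock never happens     */
}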
This. The headline was a parser combinator in C. The author is working with what he's got. Can't argue with the hangups that come with it, unless he makes some claim that his language choice is somehow superior to another, which I don't see happening here.
If I release code, having it criticized is a benefit. I want to learn from others, hear what they think of the design decisions, what they would have done differently, and what they think I did stupidly.
If you criticize my work, you're doing me a favor.
Basically, the general idea of parser combinators is that you want to be able to express your parsing grammar in the sort of high-level description that people actually talk in, not in the low-level description that you get when you are constantly looking at the element string[i] in some switch statement in some function. Here is a simple example:
parseAddress = let
    hexStr2Int = Prelude.read . ("0x" ++)
  in do
    start <- many1 hexDigit
    char '-'
    end <- many1 hexDigit
    return $ Address (hexStr2Int start) (hexStr2Int end)
In actual words: "here is the function for converting a hex string to an integer; now define the variable `start` as the parse of one or more hex digits, then there is a hyphen, then the variable `end` is another parse of one or more hex digits. Convert `start` and `end` into numbers and put them into an Address object."
What's probably the most interesting thing to speak of here is simply the fact that it's written in C, rather than in a "normal" functional language. You'll see in the latter presentation, for example, that there is a sort of "circuit wiring" approach which you can do with Lisp functions, essentially using function calls a(x) to describe a directed graph edge from x to a.
I knew that you could pass pointers to functions in C, but I was not aware that it was sophisticated enough to build a parser combinator library, so I might brush up on my rusty C skills to see if I can understand the details of the implementation here.
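As a taste of what that looks like, here is a hedged sketch of combinators built from plain C function pointers; this is not the linked library's API, just the general shape:

#include <ctype.h>
#include <stdlib.h>

typedef struct parser parser;
typedef int (*parse_fn)(parser *p, const char *in, int pos);
struct parser { parse_fn run; parser *a, *b; int (*pred)(int); };

/* Primitive: consume one char satisfying pred; return new pos or -1. */
static int run_char(parser *p, const char *in, int pos) {
    return (in[pos] && p->pred((unsigned char)in[pos])) ? pos + 1 : -1;
}

/* Combinator: run a, then run b on whatever input remains. */
static int run_seq(parser *p, const char *in, int pos) {
    int mid = p->a->run(p->a, in, pos);
    return mid < 0 ? -1 : p->b->run(p->b, in, mid);
}

static parser *mk(parse_fn f, parser *a, parser *b, int (*pred)(int)) {
    parser *p = malloc(sizeof *p);
    p->run = f; p->a = a; p->b = b; p->pred = pred;
    return p;
}

parser *satisfy(int (*pred)(int)) { return mk(run_char, NULL, NULL, pred); }
parser *seq(parser *a, parser *b) { return mk(run_seq, a, b, NULL); }

/* Usage: seq(satisfy(isdigit), satisfy(isalpha)) matches "1a";
 * running p->run(p, input, 0) returns the end position or -1.  */

Each constructor call wires another node into a graph of closures (struct plus function pointer), which is exactly the "circuit wiring" idea, minus the syntactic convenience.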
parseAddress =
  let hexStr2Int = Prelude.read . ("0x" ++)
  in do start <- many1 hexDigit
        char '-'
        end <- many1 hexDigit
        return $ Address (hexStr2Int start) (hexStr2Int end)
Or in applicative style (using attoparsec), which I like a lot more than the monadic style because you can read it quite literally from left to right:
It's not bad, but the "0x" is part of the conversion into an integer, not part of the parsed address. That is, the more-complicated-looking combinator converts the string "03-c0" to `Address 3 192`, while it looks like yours converts the string "0x03-0xc0" to `Address "0x03" "0xc0"`. (I'm not totally sure about that; I'm still very new to Haskell. Thanks for the heads-up on attoparsec, though.)
The (* >) operator [1] (the space in the middle is just to circumvent HN's rather dumb comment formatting) sequences its two arguments, ignoring the one on the left. So
string "0x" *> takeWhile1 hexadecimal
parses a literal string "0x", then one or more hexadecimal digits, and the result is just the hexadecimal digits.
The major parser libraries that I could find don't allow adding syntax at runtime or adding operators to the expression hierarchy at runtime. Many (though not all) of them also make it difficult to do custom error reporting. These were my major motivations.
Ah right, interesting use-case. I usually think of combinator parsing's advantages versus table-driven parsing (e.g. LALR) being primarily its expressiveness (and amenability to a functional-programming style), but it has a nice runtime-modifiable aspect as well.
I wonder how hard it'd be to build a runtime-modifiable table-driven parser, though. Assuming table compilation is fast enough, the most straightforward approach might be to just link in the grammar-compilation code so it's available at runtime, and rebuild the table each time a modification is made. I don't think you'd want to be doing complex incremental surgery on a table-driven parser, but you might not need to.
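A hedged sketch of that architecture (every type and function below is a hypothetical placeholder): keep the grammar and its compiled table side by side, and recompile the table wholesale after each mutation:

typedef struct grammar grammar;
typedef struct table   table;

void   grammar_add_rule(grammar *g, const char *rule);
table *grammar_compile(const grammar *g);   /* the linked-in table compiler */
void   table_free(table *t);

struct runtime_parser {
    grammar *g;
    table   *t;    /* always kept in sync with g */
};

/* Mutate the grammar, then rebuild the whole table; no incremental surgery. */
void parser_add_rule(struct runtime_parser *p, const char *rule) {
    grammar_add_rule(p->g, rule);
    table_free(p->t);
    p->t = grammar_compile(p->g);
}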
Not C... but are you familiar with Ometa (and its variations: Ometa/JS, Ometa#, PyMeta, etc.)? http://tinlizzie.org/ometa/
It's a Turing-complete parser specification language with an interpreter that is very close to being a packrat parser, and it embraces adding and extending syntax (in a more general way than your library does, if I understand correctly).
Implementing the same idea in C should be possible. I have no idea why you want to add syntax at runtime, but the Ometa model might fit your plans better.
Nice, but it would be better to have something simpler that doesn't rely on the various libraries and features that are typically forbidden in high-performance environments.
Incidentally, almost every memory leak I deal with comes from not being allowed to allocate my own memory and having to fiddle around with GC and refcount rubbish.
I have written whole non-trivial games with no detectable leaks.
Could you provide some more information about "features that are typically forbidden in high-performance environments"?
You probably want a parsing expression grammar with a packrat parser if you really want speed anyway.
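For reference, the core of a packrat parser is just memoized recursive descent. A hedged sketch in C, with fixed-size tables and hypothetical names:

#include <string.h>

#define NRULES 32
#define MAXLEN 4096

typedef int (*rule_fn)(const char *in, int pos);

/* memo[r][i]: 0 = never tried, -1 = rule r fails at i, else end position + 1 */
static int memo[NRULES][MAXLEN + 1];

/* Apply rule r at pos, computing it at most once per (rule, position) pair. */
static int apply(int r, rule_fn f, const char *in, int pos) {
    if (memo[r][pos] == 0) {
        int end = f(in, pos);                  /* ordinary recursive descent */
        memo[r][pos] = (end < 0) ? -1 : end + 1;
    }
    return (memo[r][pos] == -1) ? -1 : memo[r][pos] - 1;
}

void packrat_reset(void) { memset(memo, 0, sizeof memo); }

This guarantees linear time on a PEG at the cost of O(rules x input length) memory; handling left recursion takes extra machinery (as in the Ometa work mentioned above).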
At the moment I am not smart enough to see how to implement combinators in C in an elegant way without GC of some form. But I'll think about it. Maybe something will occur to me.