fn is_token(c: u8) -> bool {
    // roughly follows the order of ascii chars: "\"(),/:;<=>?@[\\]{} \t"
    c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
    c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
    c != b'{' && c != b'}'
}
I haven't done much with Rust, but why does it say “LOOK && AT && ALL && THOSE && BRANCHES!”?
I don't see branches, just the evaluation of a complex bool expression.
If there were lots of "ifs" and "elses" I would see branches.
Edit: yes, I know 9 is less than 32 and would've been caught by the condition before it; I was just "mentally compiling" the expression directly without thinking much about optimising it.
A better name for the short-circuiting && operator is "andthen". Since "false && X" is false for all X, && does not evaluate its right operand if the left operand is false; this translates into branches in the assembly code, as if you had written nested "if" statements.
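To make the "andthen" reading concrete, here is a minimal sketch. The helper `probe` and the two wrappers are hypothetical names invented for illustration; `probe` counts how often it is called so the skipped evaluation becomes observable.

```rust
use std::cell::Cell;

/// Hypothetical helper: records that it was evaluated, then returns `value`.
fn probe(calls: &Cell<u32>, value: bool) -> bool {
    calls.set(calls.get() + 1);
    value
}

/// Two operands joined with the short-circuiting `&&`.
fn with_and(calls: &Cell<u32>, a: bool, b: bool) -> bool {
    probe(calls, a) && probe(calls, b)
}

/// The same logic spelled out as an explicit branch.
fn with_if(calls: &Cell<u32>, a: bool, b: bool) -> bool {
    if probe(calls, a) {
        probe(calls, b)
    } else {
        // The right operand is never evaluated on this path.
        false
    }
}
```

When `a` is false, both versions call `probe` exactly once; when `a` is true, both call it twice. That is the sense in which a chain of `&&` is semantically a chain of nested `if`s, whatever the optimizer later makes of it.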
I went down a similar route in a performance-sensitive lexer (started with naive code, spent some time fiddling with different ways to express it) and eventually used a lexer generator, which uses a combination of lookup tables and tests.
Depending on whether the sets line up, this can be done in 2(!) instructions totaling 4 bytes, taking around 25 clock cycles. That's nowhere near as fast as the bit-table lookup approach, but it's possibly the smallest.
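For comparison, the bit-table lookup mentioned here could be sketched roughly as follows. This is one possible shape and not necessarily what anyone in the thread used; `is_token_branchy`, `build_table`, `TOKEN_BITS` and `is_token_table` are names made up for this example. The table is derived from the original predicate at compile time, so the two cannot drift apart.

```rust
/// The predicate from the top of the thread, kept as the reference definition.
const fn is_token_branchy(c: u8) -> bool {
    c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
    c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
    c != b'{' && c != b'}'
}

/// Build a 128-bit set: bit `i` is set iff byte `i` is a token character.
const fn build_table() -> u128 {
    let mut bits: u128 = 0;
    let mut i: u8 = 0;
    while i < 128 {
        if is_token_branchy(i) {
            bits |= 1u128 << (i as u32);
        }
        i += 1;
    }
    bits
}

/// Evaluated once at compile time.
const TOKEN_BITS: u128 = build_table();

/// Table-based test: one range check plus one shift-and-mask.
fn is_token_table(c: u8) -> bool {
    // `c & 127` keeps the shift amount in range for bytes >= 128; the
    // range check then rejects those bytes. `&` on bools does not short-circuit.
    (c < 128) & (((TOKEN_BITS >> (c & 127)) & 1) == 1)
}
```

Whether this beats the `&&` chain in practice depends on the target and the compiler version, so it is worth benchmarking before committing to it.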
This strikes me as the kind of peephole optimization that LLVM would do anyway (similar to people manually converting `x / 2` to `x >> 1`, my nemesis). But maybe it's esoteric enough for them to not bother?
I know the (non-bitwise) boolean operators have shortcutting behavior, but if evaluating the second clause has no side effects, couldn't the compiler just cheat by evaluating both clauses unconditionally and then combining the two boolean results using the equivalent bitwise operation, thus avoiding a branch?
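At the source level, Rust lets you request exactly this: `&` on `bool` evaluates both operands and does not short-circuit. Below is a sketch of the predicate rewritten that way, with the original kept alongside for comparison; `is_token_eager` is a name invented here. Whether the optimizer would make the same transformation on its own for the `&&` version depends on the compiler and its version, so treat this as an illustration and not as a claim about the generated assembly.

```rust
/// The short-circuiting predicate from the top of the thread, for reference.
fn is_token(c: u8) -> bool {
    c < 128 && c > 32 && c != b'\t' && c != b'"' && c != b'(' && c != b')' &&
    c != b',' && c != b'/' && !(c > 57 && c < 65) && !(c > 90 && c < 94) &&
    c != b'{' && c != b'}'
}

/// The same tests joined with the non-short-circuiting `&` on `bool`.
/// Every comparison is free of side effects, so evaluating all of them
/// unconditionally cannot change the result.
fn is_token_eager(c: u8) -> bool {
    // Parentheses are required: in Rust, `&` binds tighter than comparisons.
    (c < 128) & (c > 32) & (c != b'\t') & (c != b'"') & (c != b'(') & (c != b')')
        & (c != b',') & (c != b'/') & !((c > 57) & (c < 65)) & !((c > 90) & (c < 94))
        & (c != b'{') & (c != b'}')
}
```

The two functions agree on every byte value, which is easy to confirm by looping over `0..=255`.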