An AST being generated as an intermediate step is mentioned in the article, at least in passing in section 2.4.
The reason bytecode is generally faster (not just for SQL, but in most interpreted languages, e.g. Python) is that walking the AST is relatively expensive and isn't cache-friendly: the nodes are scattered across the heap. Bytecode is laid out contiguously in memory, and you can make the fetch/dispatch loop pretty tight relative to indirect function calls on AST node objects.
Another advantage to that is that it avoids the branching of the while loop in the interpreter that iterates over the AST, providing better instruction pipelining since all of the executed code sits next to each other in memory.
The downside -- especially for dynamic languages like JavaScript -- is that you need to keep all of the type checks and fast paths in the generated code, resulting in larger code blocks. With more type analysis you could group fast-path instructions together (e.g. within a while or for loop), but that analysis takes time, which is why a JIT engine typically uses multiple tiers -- generate slower machine code first, then recompile the fast-path blocks for code that is long-running.
> Another advantage to that is that it avoids the branching of the while loop in the interpreter that iterates over the AST
Huh? A bytecode interpreter still ends up branching based on the decoded value of the bytecodes. The VM bytecodes are not directly executed by the CPU (at least not usually; Jazelle and some older P-code machines being rare exceptions).
The parent mentioned avoiding the dispatch of a virtual function call used to evaluate the AST/bytecode, so I was assuming a minimal JIT instead of running the resulting bytecode in an interpretive loop.
But yes, if you are interpreting the intermediate VM bytecode you will still have a switch or dispatch branch when evaluating each instruction.