I was surprised to receive a reply from you, the author. Thank you :)
Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.
I am amazed that the author (D. Richard Hipp) made the effort to find and respond to a tweet that was (1) not directed/"@tted" at him and (2) not written in his native language of English (the original tweet is in Japanese[1]).
Simple, I'm on Mastodon, Substack, Threads and Bluesky.
And they are all barely breathing.
The community density is low (because of the split), discovery sucks (because of decentralization and no algorithm to suggest content), the culture is homogeneous (I mostly encounter people from one political side), and the quality of what gets published is limited.
You can't start new communities from old users. Communities are built by the young, they are the ones with the creativity, motivation and free time to do it.
That's why Twitter, built mostly by young people 20 years ago, still has that inertia and is more interesting and active.
And that's why TikTok and Roblox are full of life. I'm sure the kids are also elsewhere, where we are not looking, creating the communities of tomorrow.
I’d argue with that bit about creativity and youth.
In sports, for instance, if you are talented and 15 there are pipelines into established and pro sports that make sense to follow. If you are 30 and missed your chance for that you can still be a pioneer in an emerging “extreme” sport.
A lot of young people seem to think creativity is 99% asking for permission and 1% “just do it”, it takes some experience before people realize it is really the other way around.
I've never seen more people use it, as in being actually active on it, and it pops up everywhere due to the community-note memes. But yeah, I really need to get around to creating a Mastodon account since some very good posters moved there too.
I'm amazed people like yourself care so much about it to write a post like this. Doesn't your brain have something better to think about beyond artificial outrage?
Like most large social media sites, pre-Musk Twitter had some combination of algorithms and bureaucracy that act as a chaos monkey to shadowban and suspend accounts apparently at random, often with an inability to comprehend satire or under incoherent or undisclosed pretexts. The site's users and staff were predominantly left-leaning, so right-leaning accounts were disproportionately the victims of these false positives because of human bias in which posts are flagged and how they're subsequently moderated.
Musk bought the site under a claim of doing something about the bias after Twitter suspended the account of a right-leaning satire site. Many of the existing users were displeased by this, because they liked having a site whose moderation system was biased in their favor, so some of them left. Meanwhile new right-leaning accounts were created as you might expect. This affected the ideological balance of the site, so even though it's still somewhat left-leaning it's less than before. And now when the drunk chaos monkey suspends an account on the left, people are finding new satisfaction in their ability to blame Musk in particular.
Musk has personally bragged about banning a lot of left-leaning accounts and unbanning right-leaning. What is your evidence that the site still leans left?
Bans in either direction are only a small percentage of accounts. People with large followings have a large disincentive to leave (have to start over), and new users haven't had long to build followings, so most large accounts are the same ones they were before he bought the company.
Musk is not actually a right-wing figure -- he smokes pot, makes electric cars, isn't religious, etc. But he isn't a left-wing figure either, which confuses people who can't contemplate that the same person could simultaneously e.g. support gay marriage and think peculiar pronouns are silly.
The result is that he's more inclined to ban accounts he doesn't like, but what he doesn't like isn't inherently associated with any particular party. And if you look at the "left-leaning" accounts he's suspended, it's the likes of Aaron Rupar and Taylor Lorenz, who... well, here it is:
Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
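To make "high-level" concrete, here is a small illustrative C program (my own sketch, not from the SQLite docs) that dumps the compiled VDBE program for a query by running EXPLAIN through the C API. The table and query are made up, and the opcode set is explicitly not a stable interface:

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, "CREATE TABLE t(a INTEGER, b TEXT);", 0, 0, 0);

    /* EXPLAIN returns the compiled program one opcode per row
       (addr, opcode, p1..p5, comment); only addr and opcode are printed here.
       Expect high-level ops along the lines of OpenRead, Rewind, Column,
       ResultRow, Next - exact names vary between releases. */
    sqlite3_prepare_v2(db, "EXPLAIN SELECT b FROM t WHERE a = 42;", -1, &stmt, 0);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("%3d  %s\n", sqlite3_column_int(stmt, 0),
               (const char *)sqlite3_column_text(stmt, 1));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}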
Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc) in which it appears. While I don't doubt your analysis, it could be the case that the compiled C code in question is full of branches that can be folded away when specialized to the context of the query, such as the structure of the rows and the types and values of columns.
Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit highly from specializing each bytecode given static and dynamic context, and shortcutting dataflow through local variables.
There's usually a cost to shuttling data between bytecodes too. When two are fused together the second can lift the data from wherever the first wanted to leave it, as opposed to routing through a fixed location. Might be what you mean by shortcutting dataflow?
Also doing control flow in bytecode is usually slower than doing it in the native code.
I wonder if the context in which the instructions occur is sufficiently finite in sqlite for ahead-of-time specialisation of the bytecode to be better. That is, the program you're operating on isn't known until JIT time, but the bytecode implementations are. SQL should correspond to an unusually specific set of operations relative to a general-purpose language implementation.
The compiler will notice some values are constant when working with the bytecode. It can know ahead of time which arguments correspond to folding branches within the bytecode instructions and specialise correspondingly. If that works, you've emitted a sequence of calls into opcodes which are less branchy than they would otherwise be, at which point the opcode implementations start to look like basic blocks and a template JIT to machine code beckons.
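As a toy illustration of the branch-folding idea (not SQLite's actual code, just made-up names): a generic comparison opcode has to branch on operand types every time, while a variant specialized for a query where both sides are known to be integers does not:

#include <stdio.h>

typedef struct { int is_int; union { long long i; double r; } v; } Value;

/* generic opcode handler: must branch on operand types at run time */
static int eq_generic(const Value *a, const Value *b) {
    if (a->is_int && b->is_int) return a->v.i == b->v.i;
    double x = a->is_int ? (double)a->v.i : a->v.r;
    double y = b->is_int ? (double)b->v.i : b->v.r;
    return x == y;
}

/* specialized handler: the query context already proved both sides are ints,
   so the type branches have been folded away */
static int eq_int(const Value *a, const Value *b) {
    return a->v.i == b->v.i;
}

int main(void) {
    Value a = { 1, { .i = 42 } }, b = { 1, { .i = 42 } };
    printf("%d %d\n", eq_generic(&a, &b), eq_int(&a, &b));
    return 0;
}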
> Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records
Emphasis added. This is because of SQLite3's varint encoding method for numbers. Performance-wise it was probably a mistake, though it's a form of compression, which might have paid off in terms of space.
(I seem to remember seeing something, possibly by you, about this before.)
I wonder if it would be possible to replace the varint encoding... Yes, it'd be a compatibility break in that older SQLite3s couldn't open newer DBs.
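For reference, the record varint format is roughly this (a sketch based on the file-format documentation, not SQLite's actual code); the per-byte test is the branching being paid for on every decoded field:

#include <stdio.h>

/* Up to nine bytes, big-endian: seven payload bits per byte, high bit set
   means "more bytes follow", and a ninth byte (if reached) contributes all
   eight of its bits. */
static int get_varint(const unsigned char *p, unsigned long long *out) {
    unsigned long long v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 7) | (p[i] & 0x7f);
        if ((p[i] & 0x80) == 0) { *out = v; return i + 1; }   /* 1..8 bytes */
    }
    *out = (v << 8) | p[8];                                   /* 9th byte */
    return 9;
}

int main(void) {
    unsigned char enc[] = { 0x81, 0x00 };   /* 0x81 0x00 encodes 128 */
    unsigned long long v;
    int n = get_varint(enc, &v);
    printf("%llu (%d bytes)\n", v, n);      /* prints: 128 (2 bytes) */
    return 0;
}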
I've never used it, but I did read a big thread about SQL vs RPG on a Russian-speaking forum, with a ton of examples from a person who works with the IBM i platform at a big bank. The basic operations look like SQLite bytecodes: open a table, move the cursor after the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.
Well, having programmed RPG professionally, there is no question that a python programmer would clock any RPG programmer. RPG is a terrible way to program.
It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).
Computing the expressions should dominate execution time, right?
Then, to compare against the best possible case, we can write a custom C program that uses sqlite internals to perform the same task (full scan of the table, extract values from row) and does not use the bytecode VM and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float or whatever).
Then comparing the two implementations will tell us how much faster sqlite can be if it had a "perfect" JIT.
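A rough sketch of that comparison, assuming a build where the generate_series table-valued function is available (it ships as an extension and is built into the CLI); the expression here is made up, and timing each half is left to the reader:

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *st;
    sqlite3_open(":memory:", &db);

    /* bytecode-VM side: the expression is evaluated per row by the VDBE */
    sqlite3_prepare_v2(db,
        "SELECT sum((value*3 + 7) % 11 * (value % 13)) "
        "FROM generate_series(1, 100000000);", -1, &st, 0);
    if (sqlite3_step(st) == SQLITE_ROW)
        printf("sql: %lld\n", (long long)sqlite3_column_int64(st, 0));
    sqlite3_finalize(st);
    sqlite3_close(db);

    /* "perfect JIT" baseline: the same expression in a plain C loop */
    long long acc = 0;
    for (long long v = 1; v <= 100000000LL; v++)
        acc += (v*3 + 7) % 11 * (v % 13);
    printf("c:   %lld\n", acc);
    return 0;
}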
> A key point to keep in mind is that SQLite bytecodes tend to be very high-level
That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)
A while back was musing if it was possible to come up with something resembling the instruction set for a CPU for an abstract relational-engine. Is this basically what SQLite is doing with bytecodes?
I think yes, essentially. Bytecode running on a software interpreter is the same thing as machine code running on a silicon interpreter. Probably slower, probably a different ISA, but very much the same idea.
The OP suggests the sqlite bytecode also has arithmetic & control flow in it which you might want to exclude in other settings. There's a parallel with cisc vs risc instruction sets here, and to calls into a compiler runtime vs expanding instructions into larger numbers of more primitive ones in place.
@rkrzr there's a circular definition in your "no" - if "CPU" means "an interpreter that executes instructions simpler than SQL" then this is indeed not an instruction set. If "CPU" means "interpreter of a given ISA" then it could be. The virtual machine in the sqlite codebase is the definition of the CPU for this instruction set. Unless you want to define CPU as exclusive of software implementations of interpreters.
> Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements
> compiling all the way down to machine code might provide a performance boost 3% or less
This logic doesn't seem sound. The fact that the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on e.g. interpreting SQL.
Unfortunately you can’t do performance analysis this way but I think the overall point that it’s a fraction probably still stands as you’d expect the I/O work to dominate.
The reason you can't do the analysis the way you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount versus how long they themselves take, it's easy to get a >3% boost by getting rid of code paths that look like there's only 3% of room.
What the author also failed to mention is that they've gone on many optimization hunts eking out less than a percent here or there to get a cumulative boost, so 3% isn't anything to sneeze at.
That being said, the broader point which is valid is that the maintenance isn’t worth the performance profile that SQLite targets not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely in the interpreted side for now.
Edit: Forgot to mention that performance tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the byte code than you would just executing. That’s why all the big JS engines do a tiered approach where each tier optimizes a bit more.
4. lots of small building blocks of static machine code precompiled/shipped with DB software binary, later iterated & looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM as it doesn't compile/link/rearrange the existing binary building blocks, just jumps & loops through the existing ones in the required order)
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure in order to determine which bits of precompiled code to execute in which order and with which arguments. In a bytecode interpreter the data structure is a sequential array, in a plain interpreter it's something more complicated like an AST.
Is this just the same as "interpret the query plan" or am I missing something?
It interprets where it needs to jump and then jumps to the precompiled binary machine code locations that are executed natively on the CPU, no interpretation going on (I guess it's conceptually something like instruction traces inside the CPU trace cache, but done at a higher level). These precompiled building blocks are shipped in separate libraries, one library for each CPU architecture on a platform (SSE4.2, AVX, AVX2, AVX-512 on AMD64, and other SIMD architectures on the other platforms that Oracle supports). Oracle dynamically loads the correct library matching the CPU's capabilities during startup and starts jumping there as needed.
So this is not bytecode interpretation, it's native machine code executed directly on CPU whenever the SQL exec engine decides to jump there. This is different from, say Apache Impala that compiles new LLVM machine code for each query execution for the query's scanning tight loops.
Edit: I guess the reason I see this as not just regular bytecode interpretation (I don't actually know that much about it in general) is that these building blocks can include various loops (and can do their thing on entire vectors of data), so it looks like they can push quite a bit of complexity into the machine code sections before returning back to the normal interpreted AST/opcode land.
It's called a template JIT. You get to remove the interpreter control flow overhead but tend to end up with a lot of register shuffling at the boundaries. Simpler than doing things properly, usually faster than a bytecode interpreter.
You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.
This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.
I'm probably wrong, but I always thought that for platforms like .NET and the JVM, AOT delivers the same performance as JIT in most cases, since a lot of information is available before runtime, unlike JS, where the VM always needs to be ready to ditch optimized bytecode and go back to the whiteboard.
As a rule of thumb: AOT has shorter cold start time, but similar overall performance to JIT.
In fact, JIT can perform slightly better once the program reaches "steady state" because it can target the exact processor running and also perform the dynamic profile-guided optimization, which may or may not yield meaningful improvements in real life.
Nick Chapsas did some benchmarking of .NET AOT vs JIT, if you are interested:
There are a few concerns here that make it confusing for the general public to reason about AOT vs JIT performance.
Different JIT compilers have different levels of sophistication, being able to dynamically profile code at runtime or otherwise, and the range of applied optimizations of this type may also vary significantly.
Then, AOT imposes different restrictions in the form of what types of modularity the language/runtime wants to support (Swift ABI vs no AOT ABI in .NET NativeAOT or GraalVM Native Image), what kinds of operations that modify the type system are allowed, if any, and how the binary is produced.
In more trivial implementations, where the JIT does not perform any dynamic recompilation or profile-based optimizations, the difference might come down to simply pre-JITing the code, with the same quality of codegen. Or the JIT may be sophisticated but the AOT mode might still be a plain pre-JIT, which makes dynamic profiling optimizations off-limits (which was historically the case for .NET's ReadyToRun and partially NGEN; IIRC the JVM situation was similar up until GraalVM Native Image).
In more advanced implementations, there are many concerns which might be at odds with each other: JIT compilers have to maintain high throughput in order to be productive but may be able to instrument intermediate compilations to gather profile data, AOT compilers, in the absence of static profile data (the general case), have to make a lot more assumptions about the code, but can reason about compiled code statically, assuming such compilation comes with "frozen world" type of packaging (rather than delegating dynamic parts to emulation with interpreter).
And then there are smaller details - the JIT may never be able to emit pure direct calls for user code, where a jump is performed to an immediate encoded in the machine code, because the JIT may have to retain the ability to patch and backpatch the call sites should it need to de/reoptimize. Instead, function addresses may be stored in memory, and calls are emitted as a dereference of a function pointer from a static address followed by a jump to it.
JIT may also be able to embed the values of static readonly fields as JIT constants in codegen, which .NET does aggressively, but is unable to pre-initialize such values by interpreting static initialization at compile time in such an aggressive way that AOT can (constexpr style).
So in general, a lot of it comes down to offering a different performance profile. A lot of the beliefs in AOT performance stem from the fact lower-level languages rely on it, and the compilers offering it are very advanced (GCC and Clang mainly), which can expend very long compilation times on hundreds of optimization passes JIT compilers cannot. But otherwise, JIT compilers can and do compete, just a lot of modern day advanced implementations are held back by the code that they are used for, in particular in Java land where OpenJDK is really, really good, but happens to be hampered by being targeted by Java and JVM bytecode abstraction, which is not as much of a limitation in C# that can trade blows with C++ and Rust quite confidently the moment the abstraction types match (when all use templates/struct generics for example).
More on AOT and JIT optimizations (in the case of .NET):
You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
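A minimal sketch of that direct-interpretation style (illustrative only): each evaluation step chases child pointers and branches on the node kind, which is where the cache and branch-prediction costs come from:

#include <stdio.h>

typedef enum { N_CONST, N_ADD, N_MUL } Kind;
typedef struct Node { Kind kind; double value; struct Node *lhs, *rhs; } Node;

/* walk the tree directly: one pointer chase + one branch per node */
static double eval(const Node *n) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_ADD:   return eval(n->lhs) + eval(n->rhs);
    default:      return eval(n->lhs) * eval(n->rhs);   /* N_MUL */
    }
}

int main(void) {
    Node two = {N_CONST, 2, 0, 0}, three = {N_CONST, 3, 0, 0}, four = {N_CONST, 4, 0, 0};
    Node mul = {N_MUL, 0, &three, &four};
    Node add = {N_ADD, 0, &two, &mul};
    printf("%g\n", eval(&add));   /* 2 + 3*4 = 14 */
    return 0;
}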
For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.
The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.
I wonder: in actual business scenarios, isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?
There are situations where the analyst enters SQL into the computer interactively. But in those cases the overhead of compiling does not really matter since this is infrequently done, and there is only a single user asking for the operation of running the SQL.
> I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?
You would hope, but no that's not reality. In reality most of the people "writing SQL" in a business scenario don't even know SQL, they're using an ORM like Hibernate, SQLAlchemy, or similar which is dynamically generating queries that likely subtly change every time the application on top changes.
I didn't mean to imply that compilation never makes sense, I'm only pointing out that it isn't always the fastest way of executing a query
In situations where you have a limited number of queries and these are used repeatedly, some kind of cache for the compiled code will be able to amortise the cost of compilation.
I certainly agree that a situation where an analyst enters SQL manually is a weird niche and not common. However most applications can dynamically construct queries on behalf of users and so getting a good hit rate on your cache of precompiled queries isn't a certainty.
As far as I know, databases often do query caching, so they don't have to recompile the query all the time.
On the other hand, query plan may depend on the specific parameters - the engine may produce a different plan for a query for users with deleted=true (index scan) and for deleted=false (99% of users are not deleted, not worth it to use index).
APL interpreters are tree-walking, but they're backed by C procedures. Then GCC optimizes quite well and you get excellent performance! With stuff like vector instructions.
Getting on par with GCC/Clang with your own JIT is pretty hefty.
optimizing runtimes (usually bytecode) can beat statically compiled code, because they can profile the code over time, like the JVM.
... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.
I believe Intel CPUs actually sort of are #2. All your x86 is converted to microcode ops, sometimes optimized on the fly (I remember Intel discussing "micro ops fusion" a decade ago I think).
The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it, as far as I know. Hence, no tree-of-objects database engine provides the level of detail in their "EXPLAIN" output that SQLite provides.
I believe Microsoft SQL Server uses an object tree internally, and yet its query plan output is a table:
It can also execute a query incrementally. In fact, certain features rely on this behavior. “Watch live data” (in SSMS) for extended events is actually an infinite table-valued function. It’s not exactly rocket science to do in a database, you just need to model data sources and operators as iterators.
Same with PostgreSQL. You can choose between text output (one table row per line), XML or JSON. But none of them are ordered according to a strict execution order like bytecode would be. To me that makes perfect sense though, because it represents what can be parallelized in the plan. The bytecode representation in SQLite seems to have the inherent limitation that it is single-threaded.
I love how the example SQL query uses abbreviated columns that sound like runes:
select
-- count(*)
a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
-- *
from BSEG as a
inner join ACDOCA as b
on b.rclnt = a.mandt
and b.rbukrs = a.bukrs
and b.gjahr = a.gjahr
and b.belnr = a.belnr
where a.mandt = '715'
and b.rldnr = '0L'
and b.docln = '000001'
and a.buzei = '001'
and a.gjahr = '2018'
--and a.gjahr = '2017'
--and a.gjahr = '2019'
--order by a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
limit 200;
Just realised this is probably from German too, "jahr" being year.
You get XML for estimated execution plans if you use SET SHOWPLAN_XML ON. You get the actual execution plan XML along with your results if you use SET STATISTICS XML. You can see cached execution plans in the sys.dm_exec_query_plan dynamic management view.
The estimated/actual execution plan feature in SQL Server Management Studio was changed to use XML in SQL Server 2005. The SQL Server team are only adding new information to the XML representation, not the text representation.
The name "estimated" execution plan is a bit misleading. If you executed the statement(s) with the current indexes and statistics, that's how it would be executed. I think the "estimated" part of the name is because the number of rows returned from each node, and the execution time of each node, is an estimate.
Assuming that the indexes and statistics don't change, you should expect the "actual" execution plan to have the same shape. It will just be annotated with the number of rows that were actually retrieved as well as the estimated numbers. SQL Server doesn't re-plan if it turns out the number of rows was greater (or fewer) than expected.
As a SQLite and SQL Server user, I find that I would like something more detailed from SQLite than EXPLAIN QUERY PLAN (which just tells you the join order), but less detailed than the bytecode. Particularly I want to know why it chose that join order or to use that index.
I haven't used SQL Server much since the 2008 days, but it used to happen quite often that the estimated and actual execution plans differed a lot. I think the estimated part also includes how many rows get returned from a subquery, and that informs which join algorithm is used.
I second your longing for something more detailed than explain query plan in Sqlite though.
In fact I complained about it here and some sqlite developer pointed me to these resources:
I don’t doubt the author, but what is it that makes rendering a tree of objects to a table a difficult problem? Is that not what browsers do when they render a table element?
"A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. Any any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it"
To further elaborate on this important point.
There is an 'impedance mismatch' (a conceptual difficulty in mapping between the two logical models) between the tree abstract data type and the table abstract data type. Specifically, there are four key differences between the simple table data structure and the more complex tree data structure that make mapping between them a non-trivial operation.
Hierarchical: A table has a flat representation; a tree has a hierarchical representation.
Order: The order of the rows in a table typically does not matter (they may have a unique rowid). The order of branches (nodes) and leaves in a tree is important, and the ordering itself is an encoding of valuable information.
Semi-structured: A table has a fixed structure (rows by columns). A tree has a flexible structure - an arbitrary combination of branches (internal nodes) and leaves (terminal nodes). Semi-structured data has a structure that may not necessarily be known in advance; the tree has an irregular and variable formation and may have branches with missing or supplementary nodes.
Meta-data: The information describing the meaning of the data in a table is typically stored separately from the table - consequently a schema is often mandatory. A schema is optional for a tree abstract data type.
As an aside, I have been visiting Hacker News almost daily since 2010. This is my first comment here. I want to say thank you to the community for the many valuable insights over the years.
What I want to do is represent my tree-structure as a table in a relational database and then be able to efficiently get the tree structure back by transforming the table-representation back into a tree.
Further I would like to do that in plain standard SQL. This must be a common problem, any documented solutions out there?
Great question - you have touched on the key difference between a labeling scheme and an encoding scheme for tree data structures.
As mentioned previously, the tree is an abstract data type, that is to say, a conceptual model that defines the nodes in a tree, and their relationships.
To be able to evaluate an expression that processes a tree, one needs a labeling scheme. The purpose of a labeling scheme is to assign unique labels to each node in the tree; these labels must facilitate node ordering, and often (but not always) a labeling scheme will permit the reconstruction of the tree structure.
However, no labeling scheme captures the node type, names or the content stored at the nodes. For that we need an encoding scheme. An encoding scheme is constructed upon a labeling scheme and augments it with the information necessary to fully represent the tree in a table-like data structure. In answer to your question, it also permits the full transformation from the table representation to the original tree structure.
Thus, it sounds like what you are looking for is an encoding scheme.
There are many different labeling schemes out there for tree-structured data, and virtually all of them can be augmented with additional information to construct a complete encoding scheme. Concerning documented solutions - I have not been active in this space for a number of years, so off the bat I don't have a recommended documented solution to point you to.
But to help, I will put a link to my PhD thesis [1], which gives a more in-depth understanding of labeling schemes and encoding schemes for tree-structured data, with an example of a simple implementation of an encoding scheme enabling the full transformation from the table representation to the original tree structure (pages 5-9), and a survey of the advantages and disadvantages of existing labeling schemes concerning their usefulness as part of an encoding scheme you could use in your solution (see chapter 2).
Caveat 1: My thesis was written in the context of updating dynamic (XML) trees but it addresses the transformation between tree and table data structures.
Caveat 2: The thesis was written 11 years ago, but every now and then I have kept in touch with the latest developments in the area, and to my knowledge, there have been no major developments since.
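For a concrete starting point, here is a hypothetical, minimal encoding scheme of the kind described above (my own sketch, not from the thesis): a parent label plus a sibling position preserves structure and order, with the node name and content carried alongside, and a standard WITH RECURSIVE query, run here through the SQLite C API, rebuilds the tree in pre-order. Real labeling schemes from the literature are considerably cleverer about updates and ordering.

#include <stdio.h>
#include <sqlite3.h>

static const char *kSetup =
    "CREATE TABLE node("
    "  id       INTEGER PRIMARY KEY,"
    "  parent   INTEGER REFERENCES node(id),"   /* NULL for the root    */
    "  position INTEGER NOT NULL,"              /* order among siblings */
    "  name     TEXT,"                          /* node type / tag      */
    "  content  TEXT);"                         /* leaf payload, if any */
    "INSERT INTO node VALUES"
    "  (1, NULL, 0, 'book',    NULL),"
    "  (2, 1,    0, 'title',   'SQL and Trees'),"
    "  (3, 1,    1, 'chapter', NULL),"
    "  (4, 3,    0, 'para',    'hello');";

static const char *kWalk =
    "WITH RECURSIVE walk(id, name, depth, path) AS ("
    "  SELECT id, name, 0, printf('%08d', position) FROM node WHERE parent IS NULL"
    "  UNION ALL"
    "  SELECT n.id, n.name, w.depth + 1, w.path || printf('%08d', n.position)"
    "    FROM node AS n JOIN walk AS w ON n.parent = w.id)"
    "SELECT depth, name FROM walk ORDER BY path;";   /* pre-order, siblings in order */

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *st;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, kSetup, 0, 0, 0);
    sqlite3_prepare_v2(db, kWalk, -1, &st, 0);
    while (sqlite3_step(st) == SQLITE_ROW)
        printf("%*s%s\n", sqlite3_column_int(st, 0) * 2, "",
               (const char *)sqlite3_column_text(st, 1));
    sqlite3_finalize(st);
    sqlite3_close(db);
    return 0;
}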
I'm wondering if one could write this bytecode directly (or with a higher level imperative language) instead of SQL. Often, the programmer knows exactly which index lookups need to happen in a loop, while it seems like a burden to express that in SQL. This might also be an opportunity to create a different type safe dsl for database access.
I was advocating for it for decades, but everyone dismisses it with “you don’t know better than the RDBMS”. That’s almost a religion. Despite the fact that most of the tables most people have are never that big (except for a few OLTP systems that you ought to leave to specific SQL Server wizards anyway) and you usually do have an idea of how it should work. SQL is a cool solution for a set of problems that has a very non-zero xor with the set of problems we actually have. And most of our complex queries are trivially expressible as imperative loops. Index lookup and complex plan building are the key features of an RDBMS; everything else it steals from you and forces you to use an arcane, inconvenient language that in practice isn’t even a standard. Make plan building and indexing core ops and leave everything else to some sort of general-purpose bytecode; everyone will be happier and will create dozens of DSLs for all their needs.
> I was advocating for it for decades, but everyone dismisses it with “you don’t know better than rdbms”. That’s almost a religion.
Which is simply not true. Query optimization is done heuristically, for the simple reason that you usually need to run the query to get the information required for "perfect" optimization. If the RDBMS really knew better, it wouldn't offer query hints.
I'm wondering the opposite - could RDBMS know better than programmer how to structure the program? The other day I had this realization, that if I squint, quite a lot of code I see and write could be seen as database tables, prepared statements, and specialized indexes. In particular, every time I use an associative table (or several) to speed up access to data along particular dimension (like X/Y coordinates in the world of a videogame), that's equivalent to making an index in SQL.
>SQL is a cool solution for a set of problems that has a very non-zero xor with a set of problems that we actually have.
My usual problems are:
1) Running queries manually, where SQL is much more friendly than writing bytecode by hand.
2) Running queries using a clueless ORM that has no idea how to optimize them, so leaving this job to the database makes sense.
I believe the actually rare problem is "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help. But that is a very unusual usecase.
Maybe there are shops that use databases very differently from me, but it makes sense that SQL is optimized for the common use case.
And since db internals are very different, it's hard to make a single bytecode standard that makes sense to use across different databases. You can probably write something specialized for sqlite or postgres, but since nobody did, probably it's not as useful for other people too.
> "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help
Optimization isn’t the goal here; this quote misrepresents the idea. In the spirit of dismissal through all these years, btw, some people just don’t get it and think it’s for better performance. It is, but for the better performance of the developer, not that of the database. The goal is to have a set of languages above that bytecode that would help with the usual programming bookkeeping and with expressing queries in an algorithmic way, because that’s how you designed it. SQL lacks DX ergonomics; it’s just a language from the clueless epoch.
I was wondering the same thing. And in particular if a new query language that avoided many of the pitfalls of SQL could compile down to that bytecode and avoid having to deal with SQL as an intermediate representation. Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
I think we already kind of have that; one can prepare a query/statement and then use it multiple times. Regex engines also do that, except that in the case of SQLite one can bind different parameter values.
Programmers worried about SQL lexing/parsing times can compile their most-used queries once and for all at program startup. Plus, one can sometimes break a query with annoyingly variable parts into smaller queries glued together with SQLite API calls.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
That's exactly how those "prepared statements" work in SQLite - you get a handle to a piece of bytecode, and you can call it with different parameters.
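For illustration, this is roughly what that looks like with the SQLite C API (table and data made up): the SQL text is parsed and compiled to bytecode once, then bound and re-run with different parameters.

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db,
        "CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO users VALUES (1,'ada'),(2,'bob'),(3,'cyd');", 0, 0, 0);

    /* SQL text -> bytecode, exactly once */
    sqlite3_prepare_v2(db, "SELECT name FROM users WHERE id = ?1;", -1, &stmt, 0);

    for (int id = 1; id <= 3; id++) {
        sqlite3_bind_int(stmt, 1, id);               /* new parameter value */
        while (sqlite3_step(stmt) == SQLITE_ROW)     /* run the bytecode    */
            printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));
        sqlite3_reset(stmt);                         /* rewind, keep the bytecode */
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}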
You can do an automatic upgrade thing, where you recognise old formats and upgrade them on the fly. Possibly with a window on it where at sufficient age people need to run their elderly code through multiple upgrade cycles.
Or you can declare it an unstable interface and it's on the user to deal with it changing over time.
Or you can just leave it accessible without excessive work, but not document it, and then it's very obviously on the external tool to deal with the impedance matching.
Something akin to this is available[1] in DuckDB, itself started as a "shell" over SQLite. Using Python, you can compose your SQL plans as you like.
I recall a previous discussion about the SQLite bytecode and potential alternatives, where the main idea was that, if SQLite had gone the other way, you could have a much more general engine, where you could achieve something like DuckDB itself just as a set of extensions.
Reading the draft, it doesn't seem that extensibility of SQLite was a main factor in deciding. Maybe this trade-off also covers extensions somehow, which would be nice to see in the final document.
1) I think a simple new kind of query hint could go a long way: Declare a table as being "infinitely large" from the perspective of the query planner.
So, if a query caused a full table scan or similar against the table, it's an error, regardless of the actual content.
A lot of "typical" backend code would then simply require you to make the right indices you need to support your queries, and the query planner would be forced to use those indices in all situations.
2) You can already sort of do what you say; that "higher level imperative language" is available simply by breaking your query up into smaller chunks.
At least in MS-SQL (which is what I know), simply do
declare @tmp1 as table (...)
insert into @tmp1 ...
declare @tmp2 as table (...)
insert into @tmp2 ... from @tmp1 join ...
and so on; if you make each of those steps small enough you pretty much have the imperative language.
3) That said, I think SQL is a wonderful idea, I am also super-productive in implementing business logic in it; but it is a horrible language; the learning curve is so much higher than it needs to be and so many silly pitfalls. So fully support pushes for better languages.
Definitely yes. Databases and compilers have a lot in common. This bytecode is exactly equivalent to a compiler IR. It's called out as such in the OP where the decoupling between front end and backend is noted. Therefore you can construct the bytecode from outside of the database and feed it to the backend (may require patching sqlite to make the entry points accessible).
By analogy, this is really easy in LLVM. You can write the IR you want in a text file and feed it back into the infrastructure. This is used a lot in testing and debugging LLVM. It's much harder in GCC for reasons that never made much sense to me.
There's probably a parser for the format of the EXPLAIN text dump in sqlite somewhere intended for modifying the debug output then feeding it back into the engine. Maybe not documented, but I expect it works fine in practice.
You should watch the Asianometry video on the birth of SQL, very interesting; and in fact a functional approach, based on relational algebra and tuple relational calculus, was originally what the father of the concept intended for interacting with the db. Other IBM engineers formulated the SQL language on top of that.
The precursors to sql rdbms, and the war over competing concepts, were also quite thought provoking.
I think most people associate bytecode VMs / interpreters with general-purpose programming languages, but it's a surprisingly useful concept in other contexts.
Sometimes bytecode VMs appear in unexpected places! A few that I'm aware of:
- eBPF, an extension mechanism in the Linux kernel
- DWARF expression language, with interpreters in debuggers like GDB and LLDB
- The RAR file format includes a bytecode encoding for custom data transformation
Bytecode is great for tiny domain languages and I use them in many projects.
Ultima IV used one to animate sprites on its title screen map. For the next version of XU4 I implemented three bytecode interpreters to script the entire title sequence. There's a high-level presentation interpreter, a GPU rendering interpreter, and one for the Ultima IV bytecode.
The original TeX fonts stored their metrics in TFM (short for TeX font metrics) files, which contain a bytecode program for calculating ligatures and kerning between characters. I learned about that when I tried reading the files myself.
From what I can tell, modern fonts using OpenType just have tables to accomplish something similar now, in the form of the GSUB and GPOS tables?
Oh yes, thank you for the link to that! Looks like that is an instruction set for representing the font glyphs themselves? I was talking about the instruction set in TFM, which is for representing meta information, like ligatures and kerning between glyphs, and not the actual glyphs. The glyphs for the original TeX fonts are described using Metafont, which is an interpreted language.
Glyphs are generally defined as pure outlines (the "glyf" table [1]), and the instruction set is an optional system for things like grid fitting. Ligatures, kerning etc. are normal tables.
That’s awesome, I didn’t know SQLite had a bytecode.
Based on my experience writing interpreters, I know that bytecode is faster to interpret.
(Source: I wrote JavaScriptCore's interpreter and its optimizing JITs. I've also worked on other language implementations. I've seen the AST vs bytecode thing play out more than once.)
Maybe I'm missing something, but the bytecode approach seems really obviously better, just from a memory usage and locality point of view. Scanning an array of bytes or words is obviously going to be faster than traversing a tree of pointers.
So I'd be surprised to find a serious language implementation using a pure AST in its interpreter, without at the very least flattening it to an array!
Edit to add: of course there are some cases where you might actually want the AST approach, as the Sqlite doc is careful to point out. But not very many.
It’s fine to use an AST interpreter if you pair it with really great JITs.
That has its own problems though - using an AST as a shared source of truth for your JIT compilers is cumbersome for implementing stuff like OSR. But from a perf standpoint, it’s probably fine.
Also - say you want to optimize just for startup time, like if the code you’re running has minimal looping or reexecution. Then AST is better because you skip a step before first execution.
SQLite's design docs were the first time I had seen a database use a virtual machine instead of walking a tree. I later noticed VMs in libraries, embedded DSLs, and other applications outside of large general-purpose programming languages. That really drove home for me that VMs could be anywhere and were often a useful step in handling a user's expressions.
Stack-based VMs, like SQLite's (I think), ARE trees. A stack based VM's bytecode (without DUP and POP) is just the post-order depth-first traversal of the corresponding expression tree. With DUP you have a connected acyclic DAG. With POP you have an acyclic DAG with one or more components. With loops you have a full graph.
When looked at this way, a VM makes the most sense actually because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking something like the following and printing an expression tree like the one below
LITERAL 2
LITERAL 3
LITERAL 4
MUL
PLUS
now, write something to produce on a terminal, the following:
Plus
  Mul
    3
    4
  2
You should be able to do this in a simple iteration through the ops. Try it! Then try it with the variations above.
Now, try writing ops to manipulate and rewrite it. It's actually really a fun way to represent trees.
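For anyone who wants to try the exercise, here is one possible sketch in C (names made up). Note that it prints each operator's operands top-of-stack first, so the leaves come out 4, 3, 2 rather than 3, 4, 2 as in the sample above; printing them in source order takes a bit more bookkeeping, which is part of the fun.

#include <stdio.h>

enum Op { LIT, MUL, PLUS };
struct Ins { enum Op op; int arg; };

static const struct Ins prog[] = { {LIT,2}, {LIT,3}, {LIT,4}, {MUL,0}, {PLUS,0} };

/* Print the subtree whose last instruction is prog[i], indented by depth;
   return the index of the instruction just before that whole subtree. */
static int dump(int i, int depth) {
    printf("%*s", depth * 2, "");
    if (prog[i].op == LIT) { printf("%d\n", prog[i].arg); return i - 1; }
    puts(prog[i].op == MUL ? "Mul" : "Plus");
    i = dump(i - 1, depth + 1);   /* operand on top of the stack */
    i = dump(i,     depth + 1);   /* the operand under it        */
    return i;
}

int main(void) {
    dump((int)(sizeof prog / sizeof prog[0]) - 1, 0);
    return 0;
}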
This was the question I was going to ask! I've recently been diving into Lua 5.x internals and have been extremely impressed with Lua's register-based bytecode implementation. Lua has a stable byte code interface between patch releases (5.4.0 -> 5.4.1, etc) but not between major/minor revisions (5.3 -> 5.4). SQLite on the other hand does not consider this a public interface at all!
> "Remember: The VDBE opcodes are not part of the interface definition for SQLite. The number of opcodes and their names and meanings change from one release of SQLite to the next."
https://www.sqlite.org/opcode.html#the_opcodes
For anyone interested in understanding how a register-based VM operates I highly recommend:
Not a public one, because they want to have the freedom to change how it works. But if you type in `EXPLAIN CREATE TABLE x(y);` or `EXPLAIN SELECT 3;` or whatever into `sqlite3`, you can see it. And it's reasonably straightforward to dig into the source code (SQLite is very well-written C) to hook that up yourself if you really wanted to.
> Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
WITH RECURSIVE can also generally be implemented tree-style, you just loop a bunch in one of the iterators.
There are some databases, like Umbra, which can have DAG query plans for the benefit of more efficient groupjoin (GROUP BY pushed down into a join). IIRC some of the unnesting patterns can also benefit from non-tree plans.
VMs really can be. They have a long history in code portability. In the old days it really wasn't uncommon to use an approach centered around some interpretable byte code running in a vm, where the reusable vm was all that needed porting to different architectures and operating systems. This all happened well before Java.
It was really big in gaming, Zork, Sierra games, LucasArts games, and even a few more "action" games like Flashback were all designed around VMs to make porting somewhat sane. Running the byte code is usually not the performance bottleneck in these cases but drawing stuff to the screen is, and the VM had to handle that.
It used an almost memory-safe systems language, ESPOL, zero Assembly; all CPU capabilities were exposed via intrinsics, it had one of the first recorded uses of unsafe code blocks, there was tagging and capabilities support, and the CPUs were microcoded. All of this in 1961, a decade before C came to be.
ESPOL was quickly replaced by NEWP, although there is very little data on when exactly that happened, probably a couple of years later.
Nowadays it is still sold by Unisys under the guise of being a mainframe system for those that value security above all, as ClearPath MCP, and you can get NEWP manual.
the b5000 was one of the first non-virtual stack machines, but its instruction set isn't any more of a virtual machine than the 8088's or the pdp-10's. there were a number of interpreted stack languages in the 50s, though not nearly as many as there would be later
When one does digital archeology is it quite common to see Assembly referred to as bytecode, when the CPUs are actually interpreters written in microcode.
Another example, all the Xerox PARC workstations, which loaded the respective interpreter (Smalltalk, Interlisp, Mesa, Mesa/Cedar) into the CPU as first step during the boot process.
According to 5s on Google, the x86 has always been microcoded. I guess the argument is that “machine language” is the public API of the CPU, and “assembly language” is the higher level script used to, mostly, directly create machine language.
Is Intel microcode able to be changed by anyone? I understand that even CPUs get field updates nowadays.
Microcode updates are encrypted, and only Intel has the key. There are exploits to extract the key, but otherwise you're pretty much locked out.
I don't know a whole lot about microarchitecture, but from my understanding, it's not possible to modify microcode's actual uOps in software, it's the translation from assembly to microcode that's being patched in updates.
i have never seen assembly, or the code generated from it, referred to as 'bytecode' unless there was no physical hardware or microcode that could interpret it — except in the case of smalltalk, whose bytecode was, as you say, interpreted by microcode on the alto and the d-machines. but possibly your archeology has delved into cultures mine has not? i'd be very interested in reading more, if you happen to have any links handy
Wozniak found that 16-bit arithmetic was much easier to perform on a 16-bit machine. So he wrote SWEET16, a VM with its own bytecode, to work with 16-bit pointers in Apple's Integer BASIC.
Similarly, the TI-99/4 series of computers had a cumbersome way of accessing memory, because most of the RAM in the machine was exclusively controlled by the video chip and could only be accessed by poking a few registers. So the designers implemented a bytecode VM in ROM called GPL (Graphics Programming Language) that would make accesses to this video RAM seem like straightforward RAM accesses. TI encouraged the use of GPL to develop games and other applications for the system. Unfortunately, the fact that it was interpreted by the CPU, plus the fact that the CPU could only access video-chip memory during the horizontal and vertical retrace intervals, made GPL execution very slow. And since the machine's in-ROM BASIC interpreter was written in GPL, you now know why BASIC programs on the TI-99/4 and /4A crept along at the most leisurely of paces.
So VMs were definitely a thing used to make certain kinds of tradeoffs, even way back in the 8-bit days when systems had mere kilobytes of memory. Who knows how many might be lurking in a modern system!
VMs are a persistently underrated tool by people who think the conversation begins and ends at VirtualBox. They are a phenomenal way to constrain the surface area of what one's own code has to target, and that's always a plus.
Doesn't MS Word internally run a Forth-like VM? I remember reading an article by someone who decompiled an early MS-DOS version of Word only to discover that there was a VM inside.
"Originally code-named “EP” (for “Electronic Paper”), Multiplan was written to use a very clever p-code compiler system created by Richard Brodie. This p-code system, which was also used later by Microsoft Word, allowed Microsoft to target Multiplan to a huge number of different 8-bit and 16-bit computer systems. (Charles Simonyi once estimated that Multiplan was available for 100 platforms.)"
"Microsoft has introduced a code compression technology in its C/C++ Development System for Windows version 7.0 (C/C++ 7.0) called p-code (short for packed code) that provides programmers with a flexible and easy-to-implement solution for minimizing an application's memory requirements. In most cases, p-code can reduce the size of an executable file by about 40 percent. For example, the Windows Project Manager version 1.0 (resource files not included) shrinks from 932K (C/C ++ 7.0 with size optimizations turned on) to 556K when p-code is employed."
"Until now, p-code has been a proprietary technology developed by the applications group at Microsoft and used on a variety of internal projects. The retail releases of Microsoft Excel, Word, PowerPoint�, and other applications employ this technology to provide extensive breadth of functionality without consuming inordinate amounts of memory."
Or indeed any time complexity needs to be layered. A VM design doesn't even have to have an explicit bytecode compilation stage - you can write an interpreter that runs as the instructions are issued.
MonetDB is another database that uses a VM [1], using an instruction set called the MonetDB Assembly Language (MAL). I believe it pioneered the technique [2]. The VM is basically designed to express the query as vectorized functions on columns.
> Tree-Of-Objects → The input SQL is translated in a tree of objects that represent the processing to be done. The SQL is executed by walking this tree. This is the technique used by MySQL and PostgreSQL.
Note that this only holds true for fairly recent MySQL (MySQL 8.x, not including the oldest 8.0.x releases). 5.7 and older, and by extension MariaDB, instead models pretty much everything as a large fixed-function recursive function that calls itself for each new table in the join, plus some function pointers for handling of GROUP BY and such.
TBH, when it comes to query execution (A JOIN B JOIN C GROUP BY …), I think the difference between SQLite's bytecode and a tree of iterators (the classical Volcano executor) is fairly small in practice; they are quite interchangeable in terms of what they can do, and similar when it comes to performance. The difference between bytecode and tree structure is much larger when it comes to evaluation of individual expressions (a + b * cos(c)), especially since that involves much more of a type system; that is more obviously in the “bytecode is the way to go” camp to me.
Yes, postgres for example. It maps pretty close to what you see from `explain` where the ops are pretty high level, reducing interpreter overhead. JIT's big gain is speeding up reading values out of row data
I recently implemented my own expression evaluator in java for in-memory data frames, and once you think about doing that deeply, you very quickly understand the need for bytecode. If you directly evaluate the expression using a tree representation, you basically have to do a whole lot of branching (either via switch statements or polymorphism) for every single line of useful operation. Yes, the branch predictor kicks in and it means that your code wouldn’t be as slow as if it didn’t, but it is still measurably slower than if you converted the expression into bytecode once and just ran that on all rows instead.
Bytecode will only impact packing, so it's more efficient RAM-, cache- and CPU-wise. But I don't understand how it would help with branching? You still have to make the same decisions? As in, the bytecode executor still needs to do different things based on the opcode; it's not in hardware.
Consider evaluating an expression like "col1 + col2 == col3"
A tree based evaluator has to do something like -
for (int i = 0; i < df.size(); i++) {
// switch on expression type (column, binary operator, unary operator etc)
// switch on expression operator type (add, subtract, equals etc)
// switch on column type (int, long, float etc)
// actual evaluation
}
whereas a bytecode based evaluator is able to run something equivalent to -
for (int i = 0; i < df.size(); i++) {
result[i] = (df.getLong(column1Index, i) + df.getLong(column2Index, i)) == df.getLong(column3Index, i);
}
You still have to branch on opcode selection which you're omitting in your translation. Branching on "expression type" and column type in your first example is also unnecessary. Bytecodes have different opcodes to operate on different types and different arities, so in both cases you have only one unpredictable branch for each operation/opcode.
The main benefit of bytecodes is then cache friendliness, by removing all of those unpredictable loads and stores to obtain the expression details (opcode, arguments, etc.). All of those are inlined into the bytecode array which is already in cache.
It is unnecessary, only branching on expression operator type is strictly necessary. You're trying to abstract too much in your AST which yields an inefficient interpreter. I can invent an inefficient bytecode too, but that wouldn't be a correct evaluation of how fast bytecode interpretation could be when done right.
There is threaded bytecode as well, which uses direct jumping vs a switch for dispatch. This can improve branch prediction, though it is a debated topic and may not offer much improvement for modern processors.
Do you have perhaps some links/references on that?
I once tried benchmarking it by writing a tiny VM interpreter and a corresponding threaded one with direct jumps in Zig (which can force-inline a call, so I could do efficient direct jumps), and I found, to my surprise, that the naive while-switch loop was faster, even though the resulting assembly of the second approach seemed right.
I wasn’t sure if I saw it only due to my tiny language and dumb example program, or if it’s something deeper. E.g. the JVM does use direct threaded code for their interpreter.
The jump target is compiled into the bytecode, so rather than return to the big switch statement, it jumps straight to the next opcode's implementation. The process is called "direct threading". These days a decent switch-based interpreter should fit in cache, so I'm not sure direct threading is much of a win anymore.
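A minimal illustration of the computed-goto style, using GCC/Clang's "labels as values" extension. Strictly speaking this is token threading rather than full direct threading (the code array holds opcode numbers rather than the label addresses themselves), but it shows the switch being replaced by indirect jumps at the end of each handler:

#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

int main(void) {
    /* a tiny program: push 2, push 3, add, print, halt */
    int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    /* label table indexed by opcode; order must match the enum above */
    static void *labels[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
    int stack[16], sp = 0, pc = 0;

    #define DISPATCH() goto *labels[code[pc++]]
    DISPATCH();

op_push:  stack[sp++] = code[pc++];        DISPATCH();
op_add:   sp--; stack[sp-1] += stack[sp];  DISPATCH();
op_print: printf("%d\n", stack[sp-1]);     DISPATCH();
op_halt:  return 0;
}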
Running bytecode is much lower latency than compiling into native code. If you're not bottlenecked by running the bytecode (as opposed to memory or disk speed), you don't really have to JIT it any further into native code.
Which is why JavaScript engines (and JIT compilers for other languages) are typically designed to:
1. start interpreting the code once it has been parsed, and start jitting a function being called;
2. generate naive bytecode for the function that generates native code equivalents of the run actions of the AST (including some fast-path code for simple cases such as adding two 32-bit integers, and falling back to function calls to perform the add on more complex types);
3. generate more optimal code for sequences of instructions in the background, such as entire for/while loops, then patch in calls to those fast-path versions when ready.
That way you can start running the code immediately after parsing it, and can switch to the faster versions when they are ready if the code takes longer to run in the slower version.
Depends how much compilation you want to do. For example converting threaded code into native direct jumps need not be more expensive than plain bytecode. Of course the gains are also limited.
Yeah, but nobody is seriously considering that unless maybe for huge prepared statements. The argument is usually bytecode vs parser and associated data structures.
I don't have personal experience with it, but I've read that in practice it's not worth the effort—at least not yet. Apparently there are some issues with it and it barely speeds up queries (except perhaps certain ones?). I imagine this could be in large part because LLVM is not really a good fit for JIT compilation where you want to spend very little time on the compilation itself.
Yeah, my experience of the pg JIT is mostly negative: it’s quite slow, and it has a hard time estimating the cost of interpretation vs compilation + compiled execution, so more often than not it’s actively detrimental. It might fare better if you make heavy use of a limited number of prepared statements.
I was surprised the text didn’t mention one major difference between the byte code approach vs AST: coupling.
When your database engine runs in-process, there is no possibility of the server and the client library having diverging versions. But this is common with traditional databases.
Once you bake in the execution steps („how to execute“) instead of describing the query via AST („what to execute“), an important part of the execution logic now lives in the driver.
I suspect this complicates version upgrades and bugfixes, because the database is less self-contained.
Not an issue for sqlite, potentially disastrous for mysql.
Do clients typically communicate with the server in some AST representation instead of, well, SQL? I'd be surprised if that much parsing/planning happens on the client.
Converting a SELECT to a PREPARE does not really require parsing the complete query—or even if it did, some small concessions for this could be implemented in the line protocol to enable the server to build the prepared statement from the client's query.
> an important part of the execution logic now lives in the driver
> potentially disastrous for mysql
This is completely incorrect. The MySQL client side isn't responsible for execution steps and this isn't part of its wire protocol at all.
With prepared statements, the server parses the statement and gives the client back a handle, which is basically just a unique identifier that the client can use to invoke the prepared statement.
Perhaps my understanding is off, but I am pretty sure parsing and translating SQL into bytecode still involves an AST.
Just that query processing itself is done from the bytecode (produced presumably from an AST or something similar) rather than directly from the AST itself.
If I'm right, I can't really see how this performs better, unless you're excluding the parsing step from benchmarks.
An AST being generated as an intermediate step is mentioned in the article, at least in passing in section 2.4.
The reason bytecode is generally faster (not just for SQL, but in most interpreted languages you may use (Python, etc)) is that walking the AST is relatively expensive and doesn't treat caches nicely. Bytecode is generally located next to each other in memory, and you can make the bytecode fetch/dispatch pretty tight, relative to indirect function calls on an object in an AST.
Another advantage is that it avoids the branching of the while loop in the interpreter that iterates over the AST, providing better instruction pipelining by having all the run code next to each other.
The downside -- especially for dynamic languages like JavaScript -- is that you need to keep all of the type checks and fast-paths in the code, resulting in larger code blocks. With more type analysis you could group fast-path instructions together (e.g. within a while or for loop) but that takes time, which is typically why a JIT engine uses multiple passes -- generate the slower machine code first, then improve the fast-path blocks for code that is long running.
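As a rough sketch of that fast-path-plus-fallback shape (a hypothetical tagging scheme in C, not any particular engine's): the ADD handler keeps an inline check for two small integers and only calls out to a generic helper for everything else.

    #include <stdint.h>

    /* Hypothetical tagged value: low bit set means "small integer". */
    typedef uintptr_t Value;
    #define IS_SMALL_INT(v)   ((v) & 1)
    #define SMALL_INT_VAL(v)  ((intptr_t)(v) >> 1)
    #define MAKE_SMALL_INT(i) ((Value)(((intptr_t)(i) << 1) | 1))

    /* Slow-path stub; a real engine would handle doubles, strings, objects. */
    static Value generic_add(Value a, Value b) { (void)a; (void)b; return MAKE_SMALL_INT(0); }

    /* The inline fast path an OP_ADD handler would keep. */
    static Value op_add(Value a, Value b) {
        if (IS_SMALL_INT(a) && IS_SMALL_INT(b))
            return MAKE_SMALL_INT(SMALL_INT_VAL(a) + SMALL_INT_VAL(b));
        return generic_add(a, b);             /* fall back to a function call */
    }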
> Another advantage to that is that it avoids the branching of the while loop in the interpreter that iterates over the AST
Huh? A bytecode interpreter still ends up branching based on the decoded value of the byte codes. The VM bytecodes are not directly executed by the CPU (at least not usually, Jazelle and some older P-code stuff being rare exceptions).
The parent mentioned avoiding the dispatch of a virtual function call used to evaluate the AST/bytecode, so I was assuming it was a minimal JIT instead of running the resulting bytecode in an interpretive loop.
But yes, if you are interpreting the intermediate VM bytecode you will still have a switch or dispatch branch when evaluating each instruction.
You don't need one; Lua is another example where no AST is ever generated. In some sense the resulting bytecode closely corresponds to the AST that would have been generated, though.
Have you used any parser generator like yacc/bison? They have "actions", which are arbitrary pieces of code that will be executed when some grammar production is detected. For example, `expr ::= expr mulop expr { some code }` will execute `some code` when a multiplicative expression is detected, where intermediate results from the two `expr`s and `mulop` are available to that code. This concept of actions generally applies to all sorts of parsers, not just generated parsers.
Those actions would typically allocate and build (partial) ASTs, but you can do anything with them. You can for example directly evaluate the subexpression if your grammar is simple enough. Likewise bytecodes can be generated on the fly; the only concern here is a forward reference (a jump whose target has not been emitted yet), which has to be patched after the whole expression block gets generated, but otherwise you don't have to build any tree-like structure here. (Most practical parsers only need a stack to function.)
Consider that if the bytecode is just a flattened representation, then instead of generating and traversing the tree, you can just traverse what would have been turned into the tree in the same traversal order directly.
E.g. imagine you're in a leaf production for some simple grammar for an operator that can only take integers, w/all error handling ignored:
parse_fooexpr() {
    left = parse_int()        // parse the left operand ...
    emit_int(left)            // ... and emit its code immediately
    expect_token(FOO_OP)
    right = parse_int()
    emit_int(right)
    emit_foo_op()             // operands already emitted, so the operator goes last
}
Whether emit_int outputs a "push <someint>" VM instruction to a bytecode buffer, or constructs an IntNode and pushes it to an argument stack, and whether "emit_foo_op" emits a "foo_op" bytecode instruction or pops the argument stack and constructs a FooOpNode(...argument stack values) AST node is an implementation detail in terms of the parser.
Wirth's compilers usually didn't use an AST, and his book on compiler construction contains a thorough walkthrough on generating code as you parse, if you want to see it explained in a lot more detail and w/tricks to generate better code than the most naive approaches will.
The difficulty with this approach comes when you want to apply optimisations, or where it's inconvenient to generate the code in parse order because the language tries to be smart about what feels better to write, but you can get pretty far with this method.
Another way to think about this is to imagine you are working in a lazy-by-default language like Haskell. The code might seem to build up an AST and then evaluate it, but (disregarding parse errors) the AST is never materialized in memory all at once: the tree might only have one level, with its children represented by thunks that continue to parse more of the source code. (If you are unfamiliar with Haskell laziness, imagine that the so-called tree has children that are functions producing the next level of the tree.) Of course you will need a carefully designed bytecode in order to generate it from an AST whose children are not yet known.
The second half of Crafting Interpreters [0] does this—rather than building an AST, the bytecode interpreter emits bytecode directly from the parser. Essentially, the process of creating the AST is already very similar to the tree traversal you'd do to unpack it (recursive descent parsing is a traversal of the tree, just in the call stack rather than over a real data structure), so if you don't need the AST you can skip it and just start outputting bytecode as if you were already walking the tree.
Chapters 2 and 3, on expression parsing, should give you the idea. Instead of building the AST, you build code. So you're cutting out the middleman (the AST).
It's as if they generated an AST and then immediately converted it to bytecode, all at once.
The key restriction is that you must be able to generate the bytecode in a single pass over the source code. For example, the compilation of a function at the top of the file must not depend on declarations further down the file.
Depending on the form the dependencies take, you can handle that too. E.g. if it's "just" about knowing addresses, then building up a list of relocation records where you need to patch in addresses as they become known works just fine. Of course, the more work like that you have to do, the less you save over just building an AST anyway.
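A minimal sketch of that relocation/backpatching idea in C (all names here are hypothetical): a forward jump is emitted with a placeholder operand and the hole's offset is remembered; once the target becomes known, the hole is patched.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint8_t code[1024];
        size_t  len;
    } Emitter;

    static size_t emit_byte(Emitter *e, uint8_t b) {
        e->code[e->len] = b;
        return e->len++;
    }

    /* Emit a jump whose target is not known yet; return the hole's offset. */
    static size_t emit_jump_placeholder(Emitter *e, uint8_t jump_op) {
        emit_byte(e, jump_op);
        return emit_byte(e, 0xFF);            /* placeholder operand */
    }

    /* Later, when the target is known, fill the hole in. */
    static void patch_jump(Emitter *e, size_t hole, size_t target) {
        e->code[hole] = (uint8_t)target;      /* one-byte addresses, for brevity */
    }

In a while loop, for example, you'd record the hole for the branch-if-false before compiling the body, then call patch_jump with the current offset once the body and the back jump have been emitted.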
Parse trees and ASTs are slightly different. A parse tree can be implicit: a recursive descent parser can explore the parse tree without there being an explicit parse tree data structure.
How you go from source to target code without an AST is that the syntax-directed translation step in your implementation (that which would build the AST) doesn't bother with that and just builds the output code instead. The extra traversal is skipped; replaced by the original parser's traversal of the raw syntax.
E.g. pseudo-Yacc rules for compiling the while loop in a C-like notation.
while_loop : WHILE '(' expr ')' statement
{
let back = get_label();
let fwd = get_label();
let expr = $3; // code for expr recursively generated
let stmt = $5; // code for statement, recursively generated
$$ = make_code(stmt.reg(),
`$back:\n`
`${expr.code()}\n`
`BF ${expr.reg()}, $fwd\n` // branch if false
`${stmt.code()}\n`
`JMP $back\n`
`$fwd:\n`)
}
;
Every code fragment coming out of a rule has .code() and .reg(): the generated code, which is just a string, and the output register where it leaves its value. Such representational details are decided by the compiler writer.
The while statement produces no value, so we just borrow the statement's .reg() as a dummy in the call to make_code; our rule is that every code fragment has an output register, whether it produces a value or not.
When the LALR(1) parser reduces this while loop rule to the while_loop grammar symbol, the expr and statements have already been processed; so the rule action has ready access to the code objects for them. We just synthesize a new code object. We grab a pair of labels that we need for the forward and back jump.
I'm assuming we have a vaguely JS-like programming language being used for the grammar rule actions, in which we have template strings with interpolation, and adjacent strings get merged into one. The bytecode assembly is line oriented, so we have \n breaks.
One possible kind of expression is a simple integer constant, INTEGER:
expr : INTEGER
{
let reg = allocate_reg();
let val = $1;
$$ = make_code(reg,
`LOAD $reg, #$val\n`)
}
;
One possible statement is an empty statement, denoted by empty curly braces, which contributes no code.
So then when we have while (1) { } we might get R1 allocated in the expr rule, like this:
LOAD R1, #1\n ; output register is R1
then in the while loop, things get put together like this:
L0:\n ; back label
LOAD R1, #1\n ; code for expr
BF R1, L1\n ; branch if false to L1
; no code came from empty statement
JMP L0\n ; back branch
L1:\n ; fwd label
"The bytecode generated by SQLite is usually smaller than the corresponding AST coming out of the parser. During initial processing of SQL text (during the call to sqlite3_prepare() and similar) both the AST and the bytecode exist in memory at the same time and so more memory is used then. But that is a transient state. The AST is quickly discarded and its memory recycled [...]"
I asked on the mailing list years ago if there was an AST that I could generate from my general purpose language, and the answer was no. It's SQL and then it's bytecode.
> This is more difficult to achieve in a tree-of-objects design. When the prepared statement is a tree-of-objects, execution is normally accomplished by walking the tree. To pause the statement in the middle of a computation means unwinding the stack back up to the caller, all the while saving enough state to resume evaluation where it last left off. This is not impossible to do, but it is sufficiently difficult that I have never seen it actually done.
I don't think it's too difficult. You can avoid stack unwinding by just not using the native stack in the first place and using an explicit stack. This is one step towards a bytecode VM though, as you're virtualizing the stack. Definitely easier in bytecode though as it already makes all of that state explicit.
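A minimal sketch of that in C, assuming a toy expression tree: each node gets an explicit frame holding its phase and partial result, so the evaluator can stop after any number of steps and resume later without touching the native stack.

    typedef enum { LIT, OP_ADD, OP_MUL } Kind;
    typedef struct Node { Kind kind; long value; struct Node *lhs, *rhs; } Node;

    /* One explicit frame per tree node currently being evaluated. */
    typedef struct { const Node *node; int phase; long lhs; } Frame;

    typedef struct {
        Frame stack[64];
        int   top;       /* index of the current frame, -1 when finished */
        long  result;    /* value produced by the most recently popped frame */
    } Eval;

    void eval_start(Eval *e, const Node *root) {
        e->top = 0;
        e->stack[0] = (Frame){ root, 0, 0 };
    }

    /* Run at most max_steps small steps; returns 1 once evaluation is done. */
    int eval_step(Eval *e, int max_steps) {
        while (max_steps-- > 0 && e->top >= 0) {
            Frame *f = &e->stack[e->top];
            if (f->node->kind == LIT) {
                e->result = f->node->value;
                e->top--;                                  /* pop; value is in result */
            } else if (f->phase == 0) {
                f->phase = 1;                              /* descend into lhs */
                e->stack[e->top + 1] = (Frame){ f->node->lhs, 0, 0 };
                e->top++;
            } else if (f->phase == 1) {
                f->lhs = e->result;                        /* lhs done, descend into rhs */
                f->phase = 2;
                e->stack[e->top + 1] = (Frame){ f->node->rhs, 0, 0 };
                e->top++;
            } else {
                e->result = f->node->kind == OP_ADD ? f->lhs + e->result
                                                    : f->lhs * e->result;
                e->top--;
            }
        }
        return e->top < 0;
    }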
For the article overall, isn't it a bit trivial to see that a bytecode interpreter must be faster and more compact? Write out the expression tree to an array as a post-order traversal and this is basically an executable bytecode. It will be more compact because you no longer need hidden malloc/object headers and pointers for subexpressions (they're preceding indices!), and it will be faster because it's basically all in cache, so no cache misses from pointer chasing.
I can't imagine a situation in which the tree would ever be faster for execution, in general. At best, the overhead might be just too small to care about. We use trees in compilers because we need to perform incremental rewrites. This is trivial with trees but can get expensive when expanding and contracting arrays. Execution doesn't need to perform rewrites though, it needs linear, cache-friendly access patterns.
However, the article does make a good point that the query plan may be tuned on the fly during execution, e.g. the small rewrites I mentioned above occurring while executing. I'm not fully convinced this couldn't be done in bytecode though.
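For what it's worth, that flattening is just a post-order walk that appends to an array; here's a sketch in C (toy node types, not SQLite's). The flat array can then be executed left to right with a small value stack and no pointer chasing.

    #include <stddef.h>

    /* Toy expression tree and a flattened, post-order instruction array. */
    typedef enum { E_LIT, E_ADD, E_MUL } ExprKind;
    typedef struct Expr { ExprKind kind; long value; struct Expr *lhs, *rhs; } Expr;
    typedef struct { ExprKind op; long value; } Insn;

    /* Append the subtree rooted at n to out[] in post-order (operands first,
       then the operator); returns the next free slot. */
    static size_t flatten(const Expr *n, Insn *out, size_t pos) {
        if (n->kind != E_LIT) {
            pos = flatten(n->lhs, out, pos);
            pos = flatten(n->rhs, out, pos);
        }
        out[pos++] = (Insn){ n->kind, n->value };
        return pos;
    }

    /* Execute the flat array with a small value stack. */
    static long run(const Insn *code, size_t n) {
        long stack[64];
        size_t sp = 0;
        for (size_t i = 0; i < n; i++) {
            if (code[i].op == E_LIT)      stack[sp++] = code[i].value;
            else if (code[i].op == E_ADD) { sp--; stack[sp - 1] += stack[sp]; }
            else                          { sp--; stack[sp - 1] *= stack[sp]; }
        }
        return stack[0];
    }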
I think the article is just trying to be scrupulously fair by listing all possible disadvantages as well as advantages. But I agree with you, the bytecode approach seems obviously better.
I just wonder why hackers can't write homoiconic (if that's the word) languages that take array tokens and produce results like so. I often wonder: what if sqlite had also been designed to take its statements like ('select '* 'from 'table)? It would also have been better for third-party libraries implementing their ORMs.
Since all of this (bytecode, AST) is done by almost every programming language I've ever explored (Python, Ruby, Lua, ...), it would have been awesome if they were easily parseable, I guess.
Most ORM implementations are very error-prone because of those design decisions (especially for SQL databases).
What I find interesting is that this approach means the query plan is chosen when "compiling" the statement, not when running it. However, there are still some cases where SQLite will recompile the statement when running, such as when the schema has changed.
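As a concrete reminder of where that boundary sits in the C API (the table and column names below are made up): the plan is fixed by sqlite3_prepare_v2(), and sqlite3_step() just runs the resulting bytecode; with the _v2 interface, a schema change noticed at step time causes the statement to be re-prepared behind the scenes.

    #include <sqlite3.h>
    #include <stdio.h>

    /* Prepare once (plan chosen here), then run the bytecode with step(). */
    int print_names(sqlite3 *db) {
        sqlite3_stmt *stmt;
        int rc = sqlite3_prepare_v2(db,
                 "SELECT name FROM users WHERE age > ?1", -1, &stmt, NULL);
        if (rc != SQLITE_OK) return rc;

        sqlite3_bind_int(stmt, 1, 30);        /* binding does not re-plan */
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));

        return sqlite3_finalize(stmt);
    }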
“Cross-platform Code → SQLite runs on any platform with an 8-bit byte, two's complement 32-bit and 64-bit integers, and a C compiler. It is actively tested on all currently popular CPUs and operating systems. The extreme portability of the SQLite code and file format will help it remain viable on future platforms.”
I think this is unlikely for SQLite (other comments cover why it probably wouldn't happen even if it were likely to benefit), but I happened to have the copy and patch paper open in a tab, so I'll take the opportunity to share it here.
https://twitter.com/gorilla0513/status/1784756577465200740
I was surprised to receive a reply from you, the author. Thank you :) Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
https://twitter.com/DRichardHipp/status/1784783482788413491
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.