I was surprised to receive a reply from you, the author. Thank you :)
Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.
I am amazed that the author (D. Richard Hipp) made the effort to find and respond to a tweet that was (1) not directed/"@tted" at him and (2) not written in his native language of English (the original tweet is in Japanese[1]).
Simple, I'm on Mastodon, Substack, Threads and Bluesky.
And they are all barely breathing.
The community density is low (because of the split), discovery sucks (because of decentralization and no algorithm to suggest content), the culture is homogeneous (I mostly encounter people from one political side), and the quality of what gets published is limited.
You can't start new communities from old users. Communities are built by the young, they are the ones with the creativity, motivation and free time to do it.
That's why Twitter, built mostly by young people 20 years ago, still has that inertia and is more interesting and active.
And that's why TikTok and Roblox are full of life. I'm sure the kids are also elsewhere, where we are not looking, creating the communities of tomorrow.
I’d argue with that bit about creativity and youth.
In sports, for instance, if you are talented and 15 there are pipelines into established and pro sports that make sense to follow. If you are 30 and missed your chance for that you can still be a pioneer in an emerging “extreme” sport.
A lot of young people seem to think creativity is 99% asking for permission and 1% “just do it”, it takes some experience before people realize it is really the other way around.
I've never seen more people use it, as in being actually active on it, and it pops up everywhere due to the community-note memes. But yeah, I really need to get around to creating a Mastodon account since some very good posters moved there too.
I'm amazed people like yourself care so much about it to write a post like this. Doesn't your brain have something better to think about beyond artificial outrage?
Like most large social media sites, pre-Musk Twitter had some combination of algorithms and bureaucracy that act as a chaos monkey to shadowban and suspend accounts apparently at random, often with an inability to comprehend satire or under incoherent or undisclosed pretexts. The site's users and staff were predominantly left-leaning, so right-leaning accounts were disproportionately the victims of these false positives because of human bias in which posts are flagged and how they're subsequently moderated.
Musk bought the site under a claim of doing something about the bias after Twitter suspended the account of a right-leaning satire site. Many of the existing users were displeased by this, because they liked having a site whose moderation system was biased in their favor, so some of them left. Meanwhile new right-leaning accounts were created as you might expect. This affected the ideological balance of the site, so even though it's still somewhat left-leaning it's less than before. And now when the drunk chaos monkey suspends an account on the left, people are finding new satisfaction in their ability to blame Musk in particular.
Musk has personally bragged about banning a lot of left-leaning accounts and unbanning right-leaning. What is your evidence that the site still leans left?
Bans in either direction are only a small percentage of accounts. People with large followings have a large disincentive to leave (have to start over), and new users haven't had long to build followings, so most large accounts are the same ones they were before he bought the company.
Musk is not actually a right-wing figure -- he smokes pot, makes electric cars, isn't religious, etc. But he isn't a left-wing figure either, which confuses people who can't contemplate that the same person could simultaneously e.g. support gay marriage and think peculiar pronouns are silly.
The result is that he's more inclined to ban accounts he doesn't like, but what he doesn't like isn't inherently associated with any particular party. And if you look at the "left-leaning" accounts he's suspended, it's the likes of Aaron Rupar and Taylor Lorenz, who... well, here it is:
Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
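To make "high-level" concrete, here is a small illustrative C program (my own sketch, not from the SQLite docs) that dumps the compiled VDBE program for a query by running EXPLAIN through the C API. The table and query are made up, and the opcode set is explicitly not a stable interface:

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, "CREATE TABLE t(a INTEGER, b TEXT);", 0, 0, 0);

    /* EXPLAIN returns the compiled program one opcode per row
       (addr, opcode, p1..p5, comment); only addr and opcode are printed here.
       Expect high-level ops along the lines of OpenRead, Rewind, Column,
       ResultRow, Next - exact names vary between releases. */
    sqlite3_prepare_v2(db, "EXPLAIN SELECT b FROM t WHERE a = 42;", -1, &stmt, 0);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("%3d  %s\n", sqlite3_column_int(stmt, 0),
               (const char *)sqlite3_column_text(stmt, 1));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}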
Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc) in which it appears. While I don't doubt your analysis, it could be the case that the compiled C code in question is full of branches that can be folded away when specialized to the context of the query, such as the structure of the rows and the types and values of columns.
Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit highly from specializing each bytecode given static and dynamic context, and shortcutting dataflow through local variables.
There's usually a cost to shuttling data between bytecodes too. When two are fused together the second can lift the data from wherever the first wanted to leave it, as opposed to routing through a fixed location. Might be what you mean by shortcutting dataflow?
Also doing control flow in bytecode is usually slower than doing it in the native code.
I wonder if the context in which the instructions occur is sufficiently finite in sqlite for ahead-of-time specialisation of the bytecode to be better. That is, the program you're operating on isn't known until JIT time, but the bytecode implementations are. SQL should correspond to an unusually specific set of operations relative to a general-purpose language implementation.
The compiler will notice some values are constant when working with the bytecode. It can know ahead of time which arguments correspond to folding branches within the bytecode instructions and specialise correspondingly. If that works, you've emitted a sequence of calls into opcodes which are less branchy than they would otherwise be, at which point the opcode implementations start to look like basic blocks and a template JIT to machine code beckons.
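As a toy illustration of the branch-folding idea (not SQLite's actual code, just made-up names): a generic comparison opcode has to branch on operand types every time, while a variant specialized for a query where both sides are known to be integers does not:

#include <stdio.h>

typedef struct { int is_int; union { long long i; double r; } v; } Value;

/* generic opcode handler: must branch on operand types at run time */
static int eq_generic(const Value *a, const Value *b) {
    if (a->is_int && b->is_int) return a->v.i == b->v.i;
    double x = a->is_int ? (double)a->v.i : a->v.r;
    double y = b->is_int ? (double)b->v.i : b->v.r;
    return x == y;
}

/* specialized handler: the query context already proved both sides are ints,
   so the type branches have been folded away */
static int eq_int(const Value *a, const Value *b) {
    return a->v.i == b->v.i;
}

int main(void) {
    Value a = { 1, { .i = 42 } }, b = { 1, { .i = 42 } };
    printf("%d %d\n", eq_generic(&a, &b), eq_int(&a, &b));
    return 0;
}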
> Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records
Emphasis added. This is because of SQLite3's varint encoding method for numbers. Performance-wise it was probably a mistake, though it's a form of compression, which might have paid off in terms of space.
(I seem to remember seeing something, possibly by you, about this before.)
I wonder if it would be possible to replace the varint encoding... Yes, it'd be a compatibility break in that older SQLite3s couldn't open newer DBs.
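For reference, the record varint format is roughly this (a sketch based on the file-format documentation, not SQLite's actual code); the per-byte test is the branching being paid for on every decoded field:

#include <stdio.h>

/* Up to nine bytes, big-endian: seven payload bits per byte, high bit set
   means "more bytes follow", and a ninth byte (if reached) contributes all
   eight of its bits. */
static int get_varint(const unsigned char *p, unsigned long long *out) {
    unsigned long long v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 7) | (p[i] & 0x7f);
        if ((p[i] & 0x80) == 0) { *out = v; return i + 1; }   /* 1..8 bytes */
    }
    *out = (v << 8) | p[8];                                   /* 9th byte */
    return 9;
}

int main(void) {
    unsigned char enc[] = { 0x81, 0x00 };   /* 0x81 0x00 encodes 128 */
    unsigned long long v;
    int n = get_varint(enc, &v);
    printf("%llu (%d bytes)\n", v, n);      /* prints: 128 (2 bytes) */
    return 0;
}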
I've never used it, but I did read a big thread about SQL vs RPG on a Russian-speaking forum, with a ton of examples from a person who works with the IBM i platform at a big bank. The basic operations look like SQLite bytecodes: open a table, move the cursor after the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.
Well, having programmed RPG professionally, there is no question that a python programmer would clock any RPG programmer. RPG is a terrible way to program.
It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).
Computing the expressions should dominate execution time, right?
Then, to compare against the best possible case, we can write a custom C program that uses sqlite internals to perform the same task (full scan of the table, extract values from row) and does not use the bytecode VM and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float or whatever).
Then comparing the two implementations will tell us how much faster sqlite can be if it had a "perfect" JIT.
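A rough sketch of that comparison, assuming a build where the generate_series table-valued function is available (it ships as an extension and is built into the CLI); the expression here is made up, and timing each half is left to the reader:

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *st;
    sqlite3_open(":memory:", &db);

    /* bytecode-VM side: the expression is evaluated per row by the VDBE */
    sqlite3_prepare_v2(db,
        "SELECT sum((value*3 + 7) % 11 * (value % 13)) "
        "FROM generate_series(1, 100000000);", -1, &st, 0);
    if (sqlite3_step(st) == SQLITE_ROW)
        printf("sql: %lld\n", (long long)sqlite3_column_int64(st, 0));
    sqlite3_finalize(st);
    sqlite3_close(db);

    /* "perfect JIT" baseline: the same expression in a plain C loop */
    long long acc = 0;
    for (long long v = 1; v <= 100000000LL; v++)
        acc += (v*3 + 7) % 11 * (v % 13);
    printf("c:   %lld\n", acc);
    return 0;
}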
> A key point to keep in mind is that SQLite bytecodes tend to be very high-level
That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)
A while back was musing if it was possible to come up with something resembling the instruction set for a CPU for an abstract relational-engine. Is this basically what SQLite is doing with bytecodes?
I think yes, essentially. Bytecode running on a software interpreter is the same thing as machine code running on a silicon interpreter. Probably slower, probably a different ISA, but very much the same idea.
The OP suggests the sqlite bytecode also has arithmetic & control flow in it which you might want to exclude in other settings. There's a parallel with cisc vs risc instruction sets here, and to calls into a compiler runtime vs expanding instructions into larger numbers of more primitive ones in place.
@rkrzr there's a circular definition in your "no" - if "CPU" means "an interpreter that executes instructions simpler than SQL" then this is indeed not an instruction set. If "CPU" means "interpreter of a given ISA" then it could be. The virtual machine in the sqlite codebase is the definition of the CPU for this instruction set. Unless you want to define CPU as exclusive of software implementations of interpreters.
> Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements
> compiling all the way down to machine code might provide a performance boost 3% or less
This logic doesn't seem sound. The fact that the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on e.g. interpreting SQL.
Unfortunately you can’t do performance analysis this way but I think the overall point that it’s a fraction probably still stands as you’d expect the I/O work to dominate.
The reason you can't do the analysis the way you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount versus how long they themselves take, it's easy to get a >3% boost by getting rid of code paths that look like there's only 3% of room.
What the author also failed to mention is that they've gone on many optimization hunts eking out less than a percent here or there to get a cumulative boost, so 3% isn't anything to sneeze at.
That being said, the broader point which is valid is that the maintenance isn’t worth the performance profile that SQLite targets not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely in the interpreted side for now.
Edit: Forgot to mention that performance tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the byte code than you would just executing. That’s why all the big JS engines do a tiered approach where each tier optimizes a bit more.
4. lots of small building blocks of static machine code precompiled/shipped with DB software binary, later iterated & looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM as it doesn't compile/link/rearrange the existing binary building blocks, just jumps & loops through the existing ones in the required order)
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure in order to determine which bits of precompiled code to execute in which order and with which arguments. In a bytecode interpreter the data structure is a sequential array, in a plain interpreter it's something more complicated like an AST.
Is this just the same as "interpret the query plan" or am I missing something?
It interprets where it needs to jump and then jumps to the precompiled binary machine code locations that are executed natively on the CPU, no interpretation going on (I guess it's conceptually something like instruction traces inside the CPU trace cache, but done at a higher level). These precompiled building blocks are shipped in separate libraries, one library for each CPU architecture on a platform (SSE4.2, AVX, AVX2, AVX-512 on AMD64, and other SIMD architectures on the other platforms that Oracle supports). Oracle dynamically loads the correct library matching the CPU's capabilities during startup and starts jumping there as needed.
So this is not bytecode interpretation, it's native machine code executed directly on CPU whenever the SQL exec engine decides to jump there. This is different from, say Apache Impala that compiles new LLVM machine code for each query execution for the query's scanning tight loops.
Edit: I guess the reason I see this as not just regular bytecode interpretation (I don't actually know that much about it in general) is that these building blocks can include various loops (and can do their thing on entire vectors of data), so it looks like they can push quite a bit of complexity into the machine code sections before returning back to the normal interpreted AST/opcode land.
It's called a template JIT. You get to remove the interpreter control flow overhead but tend to end up with a lot of register shuffling at the boundaries. Simpler than doing things properly, usually faster than a bytecode interpreter.
You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.
This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.
I'm probably wrong, but I always thought that for platforms like .NET and the JVM, AOT delivers the same performance as JIT in most cases, since a lot of information is available before runtime, unlike JS, where the VM always needs to be ready to ditch optimized bytecode and go back to the whiteboard.
As a rule of thumb: AOT has shorter cold start time, but similar overall performance to JIT.
In fact, JIT can perform slightly better once the program reaches "steady state" because it can target the exact processor running and also perform the dynamic profile-guided optimization, which may or may not yield meaningful improvements in real life.
Nick Chapsas did some benchmarking of .NET AOT vs JIT, if you are interested:
There are a few concerns here that make it confusing for the general public to reason about AOT vs JIT performance.
Different JIT compilers have different levels of sophistication, being able to dynamically profile code at runtime or otherwise, and the range of applied optimizations of this type may also vary significantly.
Then, AOT imposes different restrictions in the form of what types of modularity the language/runtime wants to support (Swift ABI vs no AOT ABI in .NET NativeAOT or GraalVM Native Image), what kinds of operations that modify the type system are allowed, if any, and how the binary is produced.
In more trivial implementations, where the JIT does not perform any dynamic recompilation or profile-based optimizations, the difference might come down to simply pre-JITing the code, with the same quality of codegen. Or the JIT may be sophisticated but the AOT mode might still be a plain pre-JIT, which makes dynamic profiling optimizations off-limits (which was historically the case for .NET's ReadyToRun and partially NGEN; IIRC the JVM situation was similar up until GraalVM Native Image).
In more advanced implementations, there are many concerns which might be at odds with each other: JIT compilers have to maintain high throughput in order to be productive but may be able to instrument intermediate compilations to gather profile data, AOT compilers, in the absence of static profile data (the general case), have to make a lot more assumptions about the code, but can reason about compiled code statically, assuming such compilation comes with "frozen world" type of packaging (rather than delegating dynamic parts to emulation with interpreter).
And then there are smaller details - the JIT may never be able to emit pure direct calls for user code, where a jump is performed to an immediate encoded in the machine code, because the JIT may have to retain the ability to patch and backpatch the call sites should it need to de/reoptimize. Instead, function addresses may be stored in memory, and calls are emitted as a dereference of a function pointer from a static address followed by a jump to it.
JIT may also be able to embed the values of static readonly fields as JIT constants in codegen, which .NET does aggressively, but is unable to pre-initialize such values by interpreting static initialization at compile time in such an aggressive way that AOT can (constexpr style).
So in general, a lot of it comes down to offering a different performance profile. A lot of the beliefs in AOT performance stem from the fact lower-level languages rely on it, and the compilers offering it are very advanced (GCC and Clang mainly), which can expend very long compilation times on hundreds of optimization passes JIT compilers cannot. But otherwise, JIT compilers can and do compete, just a lot of modern day advanced implementations are held back by the code that they are used for, in particular in Java land where OpenJDK is really, really good, but happens to be hampered by being targeted by Java and JVM bytecode abstraction, which is not as much of a limitation in C# that can trade blows with C++ and Rust quite confidently the moment the abstraction types match (when all use templates/struct generics for example).
More on AOT and JIT optimizations (in the case of .NET):
You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
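A minimal sketch of that direct-interpretation style (illustrative only): each evaluation step chases child pointers and branches on the node kind, which is where the cache and branch-prediction costs come from:

#include <stdio.h>

typedef enum { N_CONST, N_ADD, N_MUL } Kind;
typedef struct Node { Kind kind; double value; struct Node *lhs, *rhs; } Node;

/* walk the tree directly: one pointer chase + one branch per node */
static double eval(const Node *n) {
    switch (n->kind) {
    case N_CONST: return n->value;
    case N_ADD:   return eval(n->lhs) + eval(n->rhs);
    default:      return eval(n->lhs) * eval(n->rhs);   /* N_MUL */
    }
}

int main(void) {
    Node two = {N_CONST, 2, 0, 0}, three = {N_CONST, 3, 0, 0}, four = {N_CONST, 4, 0, 0};
    Node mul = {N_MUL, 0, &three, &four};
    Node add = {N_ADD, 0, &two, &mul};
    printf("%g\n", eval(&add));   /* 2 + 3*4 = 14 */
    return 0;
}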
For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.
The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.
I wonder: in actual business scenarios, isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?
There are situations where the analyst enters SQL into the computer interactively. But in those cases the overhead of compiling does not really matter since this is infrequently done, and there is only a single user asking for the operation of running the SQL.
> I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?
You would hope, but no that's not reality. In reality most of the people "writing SQL" in a business scenario don't even know SQL, they're using an ORM like Hibernate, SQLAlchemy, or similar which is dynamically generating queries that likely subtly change every time the application on top changes.
I didn't mean to imply that compilation never makes sense, I'm only pointing out that it isn't always the fastest way of executing a query
In situations where you have a limited number of queries and these are used repeatedly, some kind of cache for the compiled code will be able to amortise the cost of compilation.
I certainly agree that a situation where an analyst enters SQL manually is a weird niche and not common. However most applications can dynamically construct queries on behalf of users and so getting a good hit rate on your cache of precompiled queries isn't a certainty.
As far as I know, databases often do query caching, so they don't have to recompile the query all the time.
On the other hand, query plan may depend on the specific parameters - the engine may produce a different plan for a query for users with deleted=true (index scan) and for deleted=false (99% of users are not deleted, not worth it to use index).
APL interpreters are tree-walking, but they're backed by C procedures. Then GCC optimizes quite well and you get excellent performance! With stuff like vector instructions.
Getting on par with GCC/Clang with your own JIT is pretty hefty.
optimizing runtimes (usually bytecode) can beat statically compiled code, because they can profile the code over time, like the JVM.
... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.
I believe Intel CPUs actually sort of are #2. All your x86 is converted to microcode ops, sometimes optimized on the fly (I remember Intel discussing "micro ops fusion" a decade ago I think).
The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it, as far as I know. Hence, no tree-of-objects database engine provides the level of detail in their "EXPLAIN" output that SQLite provides.
I believe Microsoft SQL Server uses an object tree internally, and yet its query plan output is a table:
It can also execute a query incrementally. In fact, certain features rely on this behavior. “Watch live data” (in SSMS) for extended events is actually an infinite table-valued function. It’s not exactly rocket science to do in a database, you just need to model data sources and operators as iterators.
Same with PostgreSQL. You can choose between text output (one table row per line), XML or JSON. But none of them are ordered according to a strict execution order like bytecode would be. To me that makes perfect sense though, because it represents what can be parallelized in the plan. The bytecode representation in SQLite seems to have the inherent limitation that it is single-threaded.
I love how the example SQL query uses abbreviated columns that sound like runes:
select
-- count(*)
a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
-- *
from BSEG as a
inner join ACDOCA as b
on b.rclnt = a.mandt
and b.rbukrs = a.bukrs
and b.gjahr = a.gjahr
and b.belnr = a.belnr
where a.mandt = '715'
and b.rldnr = '0L'
and b.docln = '000001'
and a.buzei = '001'
and a.gjahr = '2018'
--and a.gjahr = '2017'
--and a.gjahr = '2019'
--order by a.mandt, b.rldnr, a.bukrs, a.gjahr, a.belnr, b.docln, a.buzei
limit 200;
Just realised this is probably from German too, "jahr" being year.
You get XML for estimated execution plans if you use SET SHOWPLAN_XML ON. You get the actual execution plan XML along with your results if you use SET STATISTICS XML. You can see cached execution plans in the sys.dm_exec_query_plan dynamic management view.
The estimated/actual execution plan feature in SQL Server Management Studio was changed to use XML in SQL Server 2005. The SQL Server team are only adding new information to the XML representation, not the text representation.
The name "estimated" execution plan is a bit misleading. If you executed the statement(s) with the current indexes and statistics, that's how it would be executed. I think the "estimated" part of the name is because the number of rows returned from each node, and the execution time of each node, is an estimate.
Assuming that the indexes and statistics don't change, you should expect the "actual" execution plan to have the same shape. It will just be annotated with the number of rows that were actually retrieved as well as the estimated numbers. SQL Server doesn't re-plan if it turns out the number of rows was greater (or fewer) than expected.
As a SQLite and SQL Server user, I find that I would like something more detailed from SQLite than EXPLAIN QUERY PLAN (which just tells you the join order), but less detailed than the bytecode. Particularly I want to know why it chose that join order or to use that index.
I haven't used SQL Server much since the 2008 days, but it used to happen quite often that the estimated and actual execution plans differed a lot. I think the estimated part also includes how many rows get returned from a subquery, and that informs which join algorithm is used.
I second your longing for something more detailed than explain query plan in Sqlite though.
In fact I complained about it here and some sqlite developer pointed me to these resources:
I don’t doubt the author, but what is it that makes rendering a tree of objects to a table a difficult problem? Is that not what browsers do when they render a table element?
"A tree-of-objects representation is more difficult to publish in a human-readable form. The objects that comprise the tree tend to all be very different, and thus it is tricky to come up with a consistent and simple table representation with which to display the objects. Any any such table representation that you do come up with would almost certainly have more than six columns, probably many more. The problem of rendering a tree-of-objects as a table is sufficiently difficult that nobody does it"
To further elaborate on this important point.
There is an 'impedance mismatch' (a conceptual difficulty in mapping between the two logical models) between the tree abstract data type and the table abstract data type. Specifically, there are four key differences between the simple table data structure and the more complex tree data structure that make mapping between them a non-trivial operation.
Hierarchical: A table has a flat representation; a tree has a hierarchical representation.
Order: The order of the rows in a table typically does not matter (they may have a unique rowid). The order of branches (nodes) and leaves in a tree is important, and the ordering itself is an encoding of valuable information.
Semi-structured: A table has a fixed structure (rows by columns). A tree has a flexible structure - an arbitrary combination of branches (internal nodes) and leaves (terminal nodes). Semi-structured data has a structure that may not necessarily be known in advance; the tree has an irregular and variable formation and may have branches with missing or supplementary nodes.
Meta-data: The information describing the meaning of the data in a table is typically stored separately from the table - consequently a schema is often mandatory. A schema is optional for a tree abstract data type.
As an aside, I have been visiting Hacker News almost daily since 2010. This is my first comment here. I want to say thank you to the community for the many valuable insights over the years.
What I want to do is represent my tree-structure as a table in a relational database and then be able to efficiently get the tree structure back by transforming the table-representation back into a tree.
Further I would like to do that in plain standard SQL. This must be a common problem, any documented solutions out there?
Great question - you have touched on the key difference between a labeling scheme and an encoding scheme for tree data structures.
As mentioned previously, the tree is an abstract data type, that is to say, a conceptual model that defines the nodes in a tree, and their relationships.
To be able to evaluate an expression that processes a tree, one needs a labeling scheme. The purpose of a labeling scheme is to assign unique labels to each node in the tree; these labels must facilitate node ordering, and often (but not always) a labeling scheme will permit the reconstruction of the tree structure.
However, no labeling scheme captures the node type, names or the content stored at the nodes. For that we need an encoding scheme. An encoding scheme is constructed upon a labeling scheme and augments it with the information necessary to fully represent the tree in a table-like data structure. In answer to your question, it also permits the full transformation from the table representation to the original tree structure.
Thus, it sounds like what you are looking for is an encoding scheme.
There are many different labeling schemes out there for tree-structured data, and virtually all of them can be augmented with additional information to construct a complete encoding scheme. Concerning documented solutions - I have not been active in this space for a number of years, so off the bat I don't have a recommended documented solution to point you to.
But to help, I will put a link to my PhD thesis [1], which gives a more in-depth understanding of labeling schemes and encoding schemes for tree-structured data, with an example of a simple implementation of an encoding scheme enabling the full transformation from the table representation to the original tree structure (pages 5-9), and a survey of the advantages and disadvantages of existing labeling schemes concerning their usefulness as part of an encoding scheme you could use in your solution (see chapter 2).
Caveat 1: My thesis was written in the context of updating dynamic (XML) trees but it addresses the transformation between tree and table data structures.
Caveat 2: The thesis was written 11 years ago, but every now and then I have kept in touch with the latest developments in the area, and to my knowledge, there have been no major developments since.
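For a concrete starting point, here is a hypothetical, minimal encoding scheme of the kind described above (my own sketch, not from the thesis): a parent label plus a sibling position preserves structure and order, with the node name and content carried alongside, and a standard WITH RECURSIVE query, run here through the SQLite C API, rebuilds the tree in pre-order. Real labeling schemes from the literature are considerably cleverer about updates and ordering.

#include <stdio.h>
#include <sqlite3.h>

static const char *kSetup =
    "CREATE TABLE node("
    "  id       INTEGER PRIMARY KEY,"
    "  parent   INTEGER REFERENCES node(id),"   /* NULL for the root    */
    "  position INTEGER NOT NULL,"              /* order among siblings */
    "  name     TEXT,"                          /* node type / tag      */
    "  content  TEXT);"                         /* leaf payload, if any */
    "INSERT INTO node VALUES"
    "  (1, NULL, 0, 'book',    NULL),"
    "  (2, 1,    0, 'title',   'SQL and Trees'),"
    "  (3, 1,    1, 'chapter', NULL),"
    "  (4, 3,    0, 'para',    'hello');";

static const char *kWalk =
    "WITH RECURSIVE walk(id, name, depth, path) AS ("
    "  SELECT id, name, 0, printf('%08d', position) FROM node WHERE parent IS NULL"
    "  UNION ALL"
    "  SELECT n.id, n.name, w.depth + 1, w.path || printf('%08d', n.position)"
    "    FROM node AS n JOIN walk AS w ON n.parent = w.id)"
    "SELECT depth, name FROM walk ORDER BY path;";   /* pre-order, siblings in order */

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *st;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db, kSetup, 0, 0, 0);
    sqlite3_prepare_v2(db, kWalk, -1, &st, 0);
    while (sqlite3_step(st) == SQLITE_ROW)
        printf("%*s%s\n", sqlite3_column_int(st, 0) * 2, "",
               (const char *)sqlite3_column_text(st, 1));
    sqlite3_finalize(st);
    sqlite3_close(db);
    return 0;
}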
I'm wondering if one could write this bytecode directly (or with a higher level imperative language) instead of SQL. Often, the programmer knows exactly which index lookups need to happen in a loop, while it seems like a burden to express that in SQL. This might also be an opportunity to create a different type safe dsl for database access.
I was advocating for it for decades, but everyone dismisses it with “you don’t know better than the RDBMS”. That’s almost a religion. Despite the fact that most of the tables most people have are never that big (except for a few OLTP systems that you ought to leave to specific SQL Server wizards anyway) and you usually do have an idea of how it should work. SQL is a cool solution for a set of problems that has a very non-zero xor with the set of problems we actually have. And most of our complex queries are trivially expressible as imperative loops. Index lookup and complex plan building are the key features of an RDBMS; everything else it steals from you and forces you to use an arcane, inconvenient language that in practice isn’t even a standard. Make plan building and indexing core ops and leave everything else to some sort of general-purpose bytecode; everyone will be happier and will create dozens of DSLs for all their needs.
> I was advocating for it for decades, but everyone dismisses it with “you don’t know better than rdbms”. That’s almost a religion.
Which is simply not true. Query optimization is done heuristically, for the simple reason that you usually need to run the query to get the information required for "perfect" optimization. If the RDBMS really knew better, it wouldn't offer query hints.
I'm wondering the opposite - could RDBMS know better than programmer how to structure the program? The other day I had this realization, that if I squint, quite a lot of code I see and write could be seen as database tables, prepared statements, and specialized indexes. In particular, every time I use an associative table (or several) to speed up access to data along particular dimension (like X/Y coordinates in the world of a videogame), that's equivalent to making an index in SQL.
>SQL is a cool solution for a set of problems that has a very non-zero xor with a set of problems that we actually have.
My usual problems are:
1) Running queries manually, where SQL is much more friendly than writing bytecode by hand.
2) Running queries using a clueless ORM that has no idea how to optimize them, so leaving this job to the database makes sense.
I believe the actually rare problem is "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help. But that is a very unusual usecase.
Maybe there are shops that use databases very differently from me, but it makes sense that SQL is optimized for the common use case.
And since db internals are very different, it's hard to make a single bytecode standard that makes sense to use across different databases. You can probably write something specialized for sqlite or postgres, but since nobody did, probably it's not as useful for other people too.
> "writing hyper-optimized complex queries tailored to the current state of the database", where bytecode would help
Optimization isn’t the goal here; this quote misrepresents the idea. In the spirit of dismissal through all these years, btw, some people just don’t get it and think it’s for better performance. It is, but for the better performance of the developer, not that of the database. The goal is to have a set of languages above that bytecode that would help with the usual programming bookkeeping and with expressing queries in an algorithmic way, because that’s how you designed it. SQL lacks DX ergonomics; it’s just a language from the clueless epoch.
I was wondering the same thing. And in particular if a new query language that avoided many of the pitfalls of SQL could compile down to that bytecode and avoid having to deal with SQL as an intermediate representation. Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
I think we already kind of have that; one can prepare a query/statement and then use it multiple times. Regex engines also do that, except that in the case of SQLite one can bind different parameter values.
Programmers worried about SQL lexing/parsing times can compile their most-used queries once and for all at program startup. Plus, one can sometimes break a query with annoyingly variable parts into smaller queries glued together with SQLite API calls.
> Also, if you can compile to bytecode ahead of time, then that could save the time needed to parse the text of a sql query to bytecode at runtime.
That's exactly how those "prepared statements" work in SQLite - you get a handle to a piece of bytecode, and you can call it with different parameters.
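For illustration, this is roughly what that looks like with the SQLite C API (table and data made up): the SQL text is parsed and compiled to bytecode once, then bound and re-run with different parameters.

#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    sqlite3_open(":memory:", &db);
    sqlite3_exec(db,
        "CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO users VALUES (1,'ada'),(2,'bob'),(3,'cyd');", 0, 0, 0);

    /* SQL text -> bytecode, exactly once */
    sqlite3_prepare_v2(db, "SELECT name FROM users WHERE id = ?1;", -1, &stmt, 0);

    for (int id = 1; id <= 3; id++) {
        sqlite3_bind_int(stmt, 1, id);               /* new parameter value */
        while (sqlite3_step(stmt) == SQLITE_ROW)     /* run the bytecode    */
            printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));
        sqlite3_reset(stmt);                         /* rewind, keep the bytecode */
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}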
You can do an automatic upgrade thing, where you recognise old formats and upgrade them on the fly. Possibly with a window on it where at sufficient age people need to run their elderly code through multiple upgrade cycles.
Or you can declare it an unstable interface and it's on the user to deal with it changing over time.
Or you can just leave it accessible without excessive work, but not document it, and then it's very obviously on the external tool to deal with the impedance matching.
Something akin to this is available[1] in DuckDB, itself started as a "shell" over SQLite. Using Python, you can compose your SQL plans as you like.
I recall a previous discussion about the SQLite bytecode and potential alternatives, where the main idea was that, if SQLite had gone the other way, you could have a much more general engine, where you could achieve something like DuckDB itself just as a set of extensions.
Reading the draft, it doesn't seem that extensibility of SQLite was a main factor in deciding. Maybe this trade-off also covers extensions somehow, which would be nice to see in the final document.
1) I think a simple new kind of query hint could go a long way: Declare a table as being "infinitely large" from the perspective of the query planner.
So, if a query caused a full table scan or similar against the table, it's an error, regardless of the actual content.
A lot of "typical" backend code would then simply require you to make the right indices you need to support your queries, and the query planner would be forced to use those indices in all situations.
2) You can already sort of do what you say; that "higher level imperative language" is available simply by breaking your query up into smaller chunks.
At least in MS-SQL (which is what I know), simply do
declare @tmp1 as table (...)
insert into @tmp1 ...
declare @tmp2 as table (...)
insert into @tmp2 ... from @tmp1 join ...
and so on; if you make each of those steps small enough you pretty much have the imperative language.
3) That said, I think SQL is a wonderful idea, I am also super-productive in implementing business logic in it; but it is a horrible language; the learning curve is so much higher than it needs to be and so many silly pitfalls. So fully support pushes for better languages.
Definitely yes. Databases and compilers have a lot in common. This bytecode is exactly equivalent to a compiler IR. It's called out as such in the OP where the decoupling between front end and backend is noted. Therefore you can construct the bytecode from outside of the database and feed it to the backend (may require patching sqlite to make the entry points accessible).
By analogy, this is really easy in LLVM. You can write the IR you want in a text file and feed it back into the infrastructure. This is used a lot in testing and debugging LLVM. It's much harder in GCC for reasons that never made much sense to me.
There's probably a parser for the format of the EXPLAIN text dump in sqlite somewhere intended for modifying the debug output then feeding it back into the engine. Maybe not documented, but I expect it works fine in practice.
You should watch the Asianometry video on the birth of SQL, very interesting; and in fact a functional approach, based on relational algebra and tuple relational calculus, was originally what the father of the concept intended for interacting with the db. Other IBM engineers formulated the SQL language on top of that.
The precursors to sql rdbms, and the war over competing concepts, were also quite thought provoking.
I think most people associate bytecode VMs / interpreters with general-purpose programming languages, but it's a surprisingly useful concept in other contexts.
Sometimes bytecode VMs appear in unexpected places! A few that I'm aware of:
- eBPF, an extension mechanism in the Linux kernel
- DWARF expression language, with interpreters in debuggers like GDB and LLDB
- The RAR file format includes a bytecode encoding for custom data transformation
Bytecode is great for tiny domain languages and I use them in many projects.
Ultima IV used one to animate sprites on its title screen map. For the next version of XU4 I implemented three bytecode interpreters to script the entire title sequence. There's a high-level presentation interpreter, a GPU rendering interpreter, and one for the Ultima IV bytecode.
The original TeX fonts stored their metrics in TFM (short for TeX font metrics) files, which contain a bytecode program for calculating ligatures and kerning between characters. I learned about that when I tried reading the files myself.
From what I can tell, modern fonts using OpenType just have tables to accomplish something similar now, in the form of the GSUB and GPOS tables?
Oh yes, thank you for the link to that! Looks like that is an instruction set for representing the font glyphs themselves? I was talking about the instruction set in TFM, which is for representing meta information, like ligatures and kerning between glyphs, and not the actual glyphs. The glyphs for the original TeX fonts are described using Metafont, which is an interpreted language.
Glyphs are generally defined as pure outlines (the "glyf" table [1]), and the instruction set is an optional system for things like grid fitting. Ligatures, kerning etc. are normal tables.
That’s awesome, I didn’t know SQLite had a bytecode.
Based on my experience writing interpreters, I know that bytecode is faster to interpret.
(Source: I wrote JavaScriptCore's interpreter and its optimizing JITs. I've also worked on other language implementations. I've seen the AST vs bytecode thing play out more than once.)
Maybe I'm missing something, but the bytecode approach seems really obviously better, just from a memory usage and locality point of view. Scanning an array of bytes or words is obviously going to be faster than traversing a tree of pointers.
So I'd be surprised to find a serious language implementation using a pure AST in its interpreter, without at the very least flattening it to an array!
Edit to add: of course there are some cases where you might actually want the AST approach, as the Sqlite doc is careful to point out. But not very many.
It’s fine to use an AST interpreter if you pair it with really great JITs.
That has its own problems though - using an AST as a shared source of truth for your JIT compilers is cumbersome for implementing stuff like OSR. But from a perf standpoint, it’s probably fine.
Also - say you want to optimize just for startup time, like if the code you’re running has minimal looping or reexecution. Then AST is better because you skip a step before first execution.
SQLite's design docs were the first time I had seen a database use a virtual machine instead of walking a tree. I later noticed VMs in libraries, embedded DSLs, and other applications outside of large general-purpose programming languages. That really drove home for me that VMs could be anywhere and were often a useful step in handling a user's expressions.
Stack-based VMs, like SQLite's (I think), ARE trees. A stack based VM's bytecode (without DUP and POP) is just the post-order depth-first traversal of the corresponding expression tree. With DUP you have a connected acyclic DAG. With POP you have an acyclic DAG with one or more components. With loops you have a full graph.
When looked at this way, a VM makes the most sense actually because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking something like the following and printing an expression tree like the one below
LITERAL 2
LITERAL 3
LITERAL 4
MUL
PLUS
now, write something to produce on a terminal, the following:
Plus
  Mul
    3
    4
  2
You should be able to do this in a simple iteration through the ops. Try it! Then try it with the variations above.
Now, try writing ops to manipulate and rewrite it. It's actually really a fun way to represent trees.
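For anyone who wants to try the exercise, here is one possible sketch in C (names made up). Note that it prints each operator's operands top-of-stack first, so the leaves come out 4, 3, 2 rather than 3, 4, 2 as in the sample above; printing them in source order takes a bit more bookkeeping, which is part of the fun.

#include <stdio.h>

enum Op { LIT, MUL, PLUS };
struct Ins { enum Op op; int arg; };

static const struct Ins prog[] = { {LIT,2}, {LIT,3}, {LIT,4}, {MUL,0}, {PLUS,0} };

/* Print the subtree whose last instruction is prog[i], indented by depth;
   return the index of the instruction just before that whole subtree. */
static int dump(int i, int depth) {
    printf("%*s", depth * 2, "");
    if (prog[i].op == LIT) { printf("%d\n", prog[i].arg); return i - 1; }
    puts(prog[i].op == MUL ? "Mul" : "Plus");
    i = dump(i - 1, depth + 1);   /* operand on top of the stack */
    i = dump(i,     depth + 1);   /* the operand under it        */
    return i;
}

int main(void) {
    dump((int)(sizeof prog / sizeof prog[0]) - 1, 0);
    return 0;
}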
This was the question I was going to ask! I've recently been diving into Lua 5.x internals and have been extremely impressed with Lua's register-based bytecode implementation. Lua has a stable byte code interface between patch releases (5.4.0 -> 5.4.1, etc) but not between major/minor revisions (5.3 -> 5.4). SQLite on the other hand does not consider this a public interface at all!
> "Remember: The VDBE opcodes are not part of the interface definition for SQLite. The number of opcodes and their names and meanings change from one release of SQLite to the next."
https://www.sqlite.org/opcode.html#the_opcodes
For anyone interested in understanding how a register-based VM operates I highly recommend:
Not a public one, because they want to have the freedom to change how it works. But if you type in `EXPLAIN CREATE TABLE x(y);` or `EXPLAIN SELECT 3;` or whatever into `sqlite3`, you can see it. And it's reasonably straightforward to dig into the source code (SQLite is very well-written C) to hook that up yourself if you really wanted to.
> Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
WITH RECURSIVE can also generally be implemented tree-style, you just loop a bunch in one of the iterators.
There are some databases, like Umbra, which can have DAG query plans for the benefit of more efficient groupjoin (GROUP BY pushed down into a join). IIRC some of the unnesting patterns can also benefit from non-tree plans.
VMs really can be. They have a long history in code portability. In the old days it really wasn't uncommon to use an approach centered around some interpretable byte code running in a vm, where the reusable vm was all that needed porting to different architectures and operating systems. This all happened well before Java.
It was really big in gaming, Zork, Sierra games, LucasArts games, and even a few more "action" games like Flashback were all designed around VMs to make porting somewhat sane. Running the byte code is usually not the performance bottleneck in these cases but drawing stuff to the screen is, and the VM had to handle that.
It used an almost memory-safe systems language, ESPOL, zero Assembly; all CPU capabilities were exposed via intrinsics, it had one of the first recorded uses of unsafe code blocks, there was tagging and capabilities support, and the CPUs were microcoded. All of this in 1961, a decade before C came to be.
ESPOL was quickly replaced by NEWP, although there is very little data on when exactly that happened, probably a couple of years later.
Nowadays it is still sold by Unisys under the guise of being a mainframe system for those that value security above all, as ClearPath MCP, and you can get NEWP manual.
the b5000 was one of the first non-virtual stack machines, but its instruction set isn't any more of a virtual machine than the 8088's or the pdp-10's. there were a number of interpreted stack languages in the 50s, though not nearly as many as there would be later
When one does digital archeology is it quite common to see Assembly referred to as bytecode, when the CPUs are actually interpreters written in microcode.
Another example, all the Xerox PARC workstations, which loaded the respective interpreter (Smalltalk, Interlisp, Mesa, Mesa/Cedar) into the CPU as first step during the boot process.
According to 5s on Google, the x86 has always been microcoded. I guess the argument is that “machine language” is the public API of the CPU, and “assembly language” is the higher level script used to, mostly, directly create machine language.
Is Intel microcode able to be changed by anyone? I understand that even CPUs get field updates nowadays.
Microcode updates are encrypted, and only Intel has the key. There are exploits to extract the key, but otherwise you're pretty much locked out.
I don't know a whole lot about microarchitecture, but from my understanding, it's not possible to modify microcode's actual uOps in software, it's the translation from assembly to microcode that's being patched in updates.
i have never seen assembly, or the code generated from it, referred to as 'bytecode' unless there was no physical hardware or microcode that could interpret it — except in the case of smalltalk, whose bytecode was, as you say, interpreted by microcode on the alto and the d-machines. but possibly your archeology has delved into cultures mine has not? i'd be very interested in reading more, if you happen to have any links handy
Wozniak found that 16-bit arithmetic was much easier to perform on a 16-bit machine. So he wrote SWEET16, a VM with its own bytecode, to work with 16-bit pointers in Apple's Integer BASIC.
Similarly, the TI-99/4 series of computers had a cumbersome way of accessing memory, because most of the RAM in the machine was exclusively controlled by the video chip and could only be accessed by poking a few registers. So the designers implemented a bytecode VM in ROM called GPL (Graphics Programming Language) that would make accesses to this video RAM seem like straightforward RAM accesses. TI encouraged the use of GPL to develop games and other applications for the system. Unfortunately, the fact that it was interpreted by the CPU, plus the fact that the CPU could only access video-chip memory during the horizontal and vertical retrace intervals, made GPL execution very slow. And since the machine's in-ROM BASIC interpreter was written in GPL, you now know why BASIC programs on the TI-99/4 and /4A crept along at the most leisurely of paces.
So VMs were definitely a thing used to make certain kinds of tradeoffs, even way back in the 8-bit days when systems had mere kilobytes of memory. Who knows how many might be lurking in a modern system!
VMs are a persistently underrated tool by people who think the conversation begins and ends at VirtualBox. They are a phenomenal way to constrain the surface area of what one's own code has to target, and that's always a plus.
Doesn't MS Word internally run a Forth-like VM? I remember reading an article by someone who decompiled an early MS-DOS version of Word only to discover that there was a VM inside.
"Originally code-named “EP” (for “Electronic Paper”), Multiplan was written to use a very clever p-code compiler system created by Richard Brodie. This p-code system, which was also used later by Microsoft Word, allowed Microsoft to target Multiplan to a huge number of different 8-bit and 16-bit computer systems. (Charles Simonyi once estimated that Multiplan was available for 100 platforms.)"
"Microsoft has introduced a code compression technology in its C/C++ Development System for Windows version 7.0 (C/C++ 7.0) called p-code (short for packed code) that provides programmers with a flexible and easy-to-implement solution for minimizing an application's memory requirements. In most cases, p-code can reduce the size of an executable file by about 40 percent. For example, the Windows Project Manager version 1.0 (resource files not included) shrinks from 932K (C/C ++ 7.0 with size optimizations turned on) to 556K when p-code is employed."
"Until now, p-code has been a proprietary technology developed by the applications group at Microsoft and used on a variety of internal projects. The retail releases of Microsoft Excel, Word, PowerPoint�, and other applications employ this technology to provide extensive breadth of functionality without consuming inordinate amounts of memory."
Or indeed any time complexity needs to be layered. A VM design doesn't even have to have an explicit bytecode compilation stage - you can write an interpreter that runs as the instructions are issued.
MonetDB is another database that uses a VM [1], using an instruction set called the MonetDB Assembly Language (MAL). I believe it pioneered the technique [2]. The VM is basically designed to express the query as vectorized functions on columns.
> Tree-Of-Objects → The input SQL is translated in a tree of objects that represent the processing to be done. The SQL is executed by walking this tree. This is the technique used by MySQL and PostgreSQL.
Note that this only holds true for fairly recent MySQL (MySQL 8.x, not including the oldest 8.0.x releases). 5.7 and older, and by extension MariaDB, instead models pretty much everything as a large fixed-function recursive function that calls itself for each new table in the join, plus some function pointers for handling of GROUP BY and such.
TBH, when it comes to query execution (A JOIN B JOIN C GROUP BY …), I think the difference between SQLite's bytecode and a tree of iterators (the classical Volcano executor) is fairly small in practice; they are quite interchangeable in terms of what they can do, and similar when it comes to performance. The difference between bytecode and tree structure is much larger when it comes to evaluation of individual expressions (a + b * cos(c)), especially since that involves much more of a type system; that is more obviously in the “bytecode is the way to go” camp to me.
Yes, postgres for example. It maps pretty close to what you see from `explain` where the ops are pretty high level, reducing interpreter overhead. JIT's big gain is speeding up reading values out of row data
I recently implemented my own expression evaluator in java for in-memory data frames, and once you think about doing that deeply, you very quickly understand the need for bytecode. If you directly evaluate the expression using a tree representation, you basically have to do a whole lot of branching (either via switch statements or polymorphism) for every single line of useful operation. Yes, the branch predictor kicks in and it means that your code wouldn’t be as slow as if it didn’t, but it is still measurably slower than if you converted the expression into bytecode once and just ran that on all rows instead.
Bytecode will only impact packing, so it's more efficient RAM-, cache- and CPU-wise. But I don't understand how it would help with branching? You still have to make the same decisions? As in, the bytecode executor still needs to do different things based on the opcode; it's not in hardware.
Consider evaluating an expression like "col1 + col2 == col3"
A tree based evaluator has to do something like -
for (int i = 0; i < df.size(); i++) {
// switch on expression type (column, binary operator, unary operator etc)
// switch on expression operator type (add, subtract, equals etc)
// switch on column type (int, long, float etc)
// actual evaluation
}
whereas a bytecode based evaluator is able to run something equivalent to -
for (int i = 0; i < df.size(); i++) {
result[i] = (df.getLong(column1Index, i) + df.getLong(column2Index, i)) == df.getLong(column3Index, i);
}
You still have to branch on opcode selection which you're omitting in your translation. Branching on "expression type" and column type in your first example is also unnecessary. Bytecodes have different opcodes to operate on different types and different arities, so in both cases you have only one unpredictable branch for each operation/opcode.
The main benefit of bytecodes is then cache friendliness, by removing all of those unpredictable loads and stores to obtain the expression details (opcode, arguments, etc.). All of those are inlined into the bytecode array which is already in cache.
It is unnecessary, only branching on expression operator type is strictly necessary. You're trying to abstract too much in your AST which yields an inefficient interpreter. I can invent an inefficient bytecode too, but that wouldn't be a correct evaluation of how fast bytecode interpretation could be when done right.
There is threaded bytecode as well, which uses direct jumping vs a switch for dispatch. This can improve branch prediction, though it is a debated topic and may not offer much improvement for modern processors.
Do you have perhaps some links/references on that?
I once tried benchmarking it by writing a tiny VM interpreter and a corresponding threaded one with direct jumps in Zig (which can force-inline a call, so I could do efficient direct jumps), and I found, to my surprise, that the naive while-switch loop was faster, even though the resulting assembly of the second approach seemed right.
I wasn’t sure if I saw it only due to my tiny language and dumb example program, or if it’s something deeper. E.g. the JVM does use direct threaded code for their interpreter.
The jump target is compiled into the bytecode, so rather than return to the big switch statement, it jumps straight to the next opcode's implementation. The process is called "direct threading". These days a decent switch-based interpreter should fit in cache, so I'm not sure direct threading is much of a win anymore.
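A minimal illustration of the computed-goto style, using GCC/Clang's "labels as values" extension. Strictly speaking this is token threading rather than full direct threading (the code array holds opcode numbers rather than the label addresses themselves), but it shows the switch being replaced by indirect jumps at the end of each handler:

#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

int main(void) {
    /* a tiny program: push 2, push 3, add, print, halt */
    int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    /* label table indexed by opcode; order must match the enum above */
    static void *labels[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
    int stack[16], sp = 0, pc = 0;

    #define DISPATCH() goto *labels[code[pc++]]
    DISPATCH();

op_push:  stack[sp++] = code[pc++];        DISPATCH();
op_add:   sp--; stack[sp-1] += stack[sp];  DISPATCH();
op_print: printf("%d\n", stack[sp-1]);     DISPATCH();
op_halt:  return 0;
}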
Running bytecode is much lower latency than compiling into native code. If you're not bottlenecked by running the bytecode (as opposed to memory or disk speed), you don't really have to JIT it any further into native code.
Which is why JavaScript engines (and JIT compilers for other languages) are typically designed to:
1. start interpreting the code once it has been parsed, and start jitting a function being called;
2. generate naive bytecode for the function that generates native code equivalents of the run actions of the AST (including some fast-path code for simple cases such as adding two 32-bit integers, and falling back to function calls to perform the add on more complex types);
3. generate more optimal code for sequences of instructions in the background, such as entire for/while loops, then patch in calls to those fast-path versions when ready.
That way you can start running the code immediately after parsing it, and can switch to the faster versions when they are ready if the code takes longer to run in the slower version.
Depends how much compilation you want to do. For example converting threaded code into native direct jumps need not be more expensive than plain bytecode. Of course the gains are also limited.
Yeah, but nobody is seriously considering that unless maybe for huge prepared statements. The argument is usually bytecode vs parser and associated data structures.
I don't have personal experience with it, but I've read that in practice it's not worth the effort—at least not yet. Apparently there are some issues with it and it barely speeds up queries (except perhaps certain ones?). I imagine this could be in large part because LLVM is not really a good fit for JIT compilation where you want to spend very little time on the compilation itself.
Yeah, my experience of the pg JIT is mostly negative: it’s quite slow, and it has a hard time estimating the cost of interpretation vs compilation + compiled execution, so more often than not it’s actively detrimental. It might fare better if you make heavy use of a limited number of prepared statements.
I was surprised the text didn’t mention one major difference between the byte code approach vs AST: coupling.
When your database engine runs in-process, there is no possibility of the server and the client library having diverging versions. But this is common with traditional databases.
Once you bake in the execution steps („how to execute“) instead of describing the query via AST („what to execute“), an important part of the execution logic now lives in the driver.
I suspect this complicates version upgrades and bugfixes, because the database is less self-contained.
Not an issue for sqlite, potentially disastrous for mysql.
Do clients typically communicate with the server in some AST representation instead of, well, SQL? I'd be surprised if that much parsing/planning happens on the client.
Converting a SELECT to a PREPARE does not really require parsing the complete query—or even if it did, some small concessions for this could be implemented in the line protocol to enable the server to build the prepared statement from the client's query.
> an important part of the execution logic now lives in the driver
> potentially disastrous for mysql
This is completely incorrect. The MySQL client side isn't responsible for execution steps and this isn't part of its wire protocol at all.
With prepared statements, the server parses the statement and gives the client back a handle, which is basically just a unique identifier that the client can use to invoke the prepared statement.
Perhaps my understanding is off, but I am pretty sure parsing and translating SQL into bytecode still involves an AST.
Just that query processing itself is done from the bytecode (produced presumably from an AST or something similar) rather than directly from the AST itself.
If I'm right, I can't really see how this performs better, unless you're excluding the parsing step from benchmarks.
An AST being generated as an intermediate step is mentioned in the article, at least in passing in section 2.4.
The reason bytecode is generally faster (not just for SQL, but in most interpreted languages you may use (Python, etc)) is that walking the AST is relatively expensive and doesn't treat caches nicely. Bytecode is generally located next to each other in memory, and you can make the bytecode fetch/dispatch pretty tight, relative to indirect function calls on an object in an AST.
Another advantage is that it avoids the branching of the while loop in the interpreter that iterates over the AST, providing better instruction pipelining by having all the run code next to each other.
The downside -- especially for dynamic languages like JavaScript -- is that you need to keep all of the type checks and fast-paths in the code, resulting in larger code blocks. With more type analysis you could group fast-path instructions together (e.g. within a while or for loop) but that takes time, which is typically why a JIT engine uses multiple passes -- generate the slower machine code first, then improve the fast-path blocks for code that is long running.
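As a rough sketch of that fast-path-plus-fallback shape (a hypothetical tagging scheme in C, not any particular engine's): the ADD handler keeps an inline check for two small integers and only calls out to a generic helper for everything else.

    #include <stdint.h>

    /* Hypothetical tagged value: low bit set means "small integer". */
    typedef uintptr_t Value;
    #define IS_SMALL_INT(v)   ((v) & 1)
    #define SMALL_INT_VAL(v)  ((intptr_t)(v) >> 1)
    #define MAKE_SMALL_INT(i) ((Value)(((intptr_t)(i) << 1) | 1))

    /* Slow-path stub; a real engine would handle doubles, strings, objects. */
    static Value generic_add(Value a, Value b) { (void)a; (void)b; return MAKE_SMALL_INT(0); }

    /* The inline fast path an OP_ADD handler would keep. */
    static Value op_add(Value a, Value b) {
        if (IS_SMALL_INT(a) && IS_SMALL_INT(b))
            return MAKE_SMALL_INT(SMALL_INT_VAL(a) + SMALL_INT_VAL(b));
        return generic_add(a, b);             /* fall back to a function call */
    }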
> Another advantage to that is that it avoids the branching of the while loop in the interpreter that iterates over the AST
Huh? A bytecode interpreter still ends up branching based on the decoded value of the byte codes. The VM bytecodes are not directly executed by the CPU (at least not usually, Jazelle and some older P-code stuff being rare exceptions).
The parent mentioned avoiding the dispatch of a virtual function call used to evaluate the AST/bytecode, so I was assuming it was a minimal JIT instead of running the resulting bytecode in an interpretive loop.
But yes, if you are interpreting the intermediate VM bytecode you will still have a switch or dispatch branch when evaluating each instruction.
You don't need one; Lua is another example where no AST is ever generated. In some sense the resulting bytecode closely corresponds to the AST that would have been generated, though.
Have you used any parser generator like yacc/bison? They have "actions", which are arbitrary pieces of code that will be executed when some grammar production is detected. For example, `expr ::= expr mulop expr { some code }` will execute `some code` when a multiplicative expression is detected, where intermediate results from the two `expr`s and `mulop` are available to that code. This concept of actions generally applies to all sorts of parsers, not just generated parsers.
Those actions would typically allocate and build (partial) ASTs, but you can do anything with them. You can for example directly evaluate the subexpression if your grammar is simple enough. Likewise bytecodes can be generated on the fly; the only concern here is a forward reference (a jump whose target has not been emitted yet), which has to be patched after the whole expression block gets generated, but otherwise you don't have to build any tree-like structure here. (Most practical parsers only need a stack to function.)
Consider that if the bytecode is just a flattened representation, then instead of generating and traversing the tree, you can just traverse what would have been turned into the tree in the same traversal order directly.
E.g. imagine you're in a leaf production for some simple grammar for an operator that can only take integers, w/all error handling ignored:
parse_fooexpr() {
    left = parse_int()        // parse the left operand ...
    emit_int(left)            // ... and emit its code immediately
    expect_token(FOO_OP)
    right = parse_int()
    emit_int(right)
    emit_foo_op()             // operands already emitted, so the operator goes last
}
Whether emit_int outputs a "push <someint>" VM instruction to a bytecode buffer, or constructs an IntNode and pushes it to an argument stack, and whether "emit_foo_op" emits a "foo_op" bytecode instruction or pops the argument stack and constructs a FooOpNode(...argument stack values) AST node is an implementation detail in terms of the parser.
Wirth's compilers usually didn't use an AST, and his book on compiler construction contains a thorough walkthrough on generating code as you parse, if you want to see it explained in a lot more detail and w/tricks to generate better code than the most naive approaches will.
The difficulty with this approach comes when you want to apply optimisations, or where it's inconvenient to generate the code in parse order because the language tries to be smart about what feels better to write, but you can get pretty far with this method.
Another way to think about this is to imagine you are working in a lazy-by-default language like Haskell. The code might seem to build up an AST and then evaluate it, but (disregarding parse errors) the AST is never materialized in memory all at once: the tree might only have one level, with its children represented by thunks that continue to parse more of the source code. (If you are unfamiliar with Haskell laziness, imagine that the so-called tree has children that are functions producing the next level of the tree.) Of course you will need a carefully designed bytecode in order to generate it from an AST whose children are not yet known.
The second half of Crafting Interpreters [0] does this—rather than building an AST, the bytecode interpreter emits bytecode directly from the parser. Essentially, the process of creating the AST is already very similar to the tree traversal you'd do to unpack it (recursive descent parsing is a traversal of the tree, just in the call stack rather than over a real data structure), so if you don't need the AST you can skip it and just start outputting bytecode as if you were already walking the tree.
Chapters 2 and 3, on expression parsing, should give you the idea. Instead of building the AST, you build code. So you're cutting out the middleman (the AST).
It's as if they generated an AST and then immediately converted it to bytecode, all at once.
The key restriction is that you must be able to generate the bytecode in a single pass over the source code. For example, the compilation of a function at the top of the file must not depend on declarations further down the file.
Depending on the form the dependencies take, you can handle that too. E.g. if it's "just" about knowing addresses, then building up a list of relocation records where you need to patch in addresses as they become known works just fine. Of course, the more work like that you have to do, the less you save over just building an AST anyway.
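A minimal sketch of that relocation/backpatching idea in C (all names here are hypothetical): a forward jump is emitted with a placeholder operand and the hole's offset is remembered; once the target becomes known, the hole is patched.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint8_t code[1024];
        size_t  len;
    } Emitter;

    static size_t emit_byte(Emitter *e, uint8_t b) {
        e->code[e->len] = b;
        return e->len++;
    }

    /* Emit a jump whose target is not known yet; return the hole's offset. */
    static size_t emit_jump_placeholder(Emitter *e, uint8_t jump_op) {
        emit_byte(e, jump_op);
        return emit_byte(e, 0xFF);            /* placeholder operand */
    }

    /* Later, when the target is known, fill the hole in. */
    static void patch_jump(Emitter *e, size_t hole, size_t target) {
        e->code[hole] = (uint8_t)target;      /* one-byte addresses, for brevity */
    }

In a while loop, for example, you'd record the hole for the branch-if-false before compiling the body, then call patch_jump with the current offset once the body and the back jump have been emitted.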
Parse trees and ASTs are slightly different. A parse tree can be implicit: a recursive descent parser can explore the parse tree without there being an explicit parse tree data structure.
How you go from source to target code without an AST is that the syntax-directed translation step in your implementation (that which would build the AST) doesn't bother with that and just builds the output code instead. The extra traversal is skipped; replaced by the original parser's traversal of the raw syntax.
E.g. pseudo-Yacc rules for compiling the while loop in a C-like notation.
while_loop : WHILE '(' expr ')' statement
{
let back = get_label();
let fwd = get_label();
let expr = $3; // code for expr recursively generated
let stmt = $5; // code for statement, recursively generated
$$ = make_code(stmt.reg(),
`$back:\n`
`${expr.code()}\n`
`BF ${expr.reg()}, $fwd\n` // branch if false
`${stmt.code()}\n`
`JMP $back\n`
`$fwd:\n`)
}
;
Every code fragment coming out of a rule has .code() and .reg(): the generated code, which is just a string, and the output register where it leaves its value. Such representational details are decided by the compiler writer.
The while statement produces no value, so we just borrow the statement's .reg() as a dummy in the call to make_code; our rule is that every code fragment has an output register, whether it produces a value or not.
When the LALR(1) parser reduces this while loop rule to the while_loop grammar symbol, the expr and statements have already been processed; so the rule action has ready access to the code objects for them. We just synthesize a new code object. We grab a pair of labels that we need for the forward and back jump.
I'm assuming we have a vaguely JS-like programming language being used for the grammar rule actions, in which we have template strings with interpolation, and adjacent strings get merged into one. The bytecode assembly is line oriented, so we have \n breaks.
One possible kind of expression is a simple integer constant, INTEGER:
expr : INTEGER
{
let reg = allocate_reg();
let val = $1;
$$ = make_code(reg,
`LOAD $reg, #$val\n`)
}
;
One possible statement is an empty statement, denoted by empty curly braces, which contributes no code.
So then when we have while (1) { } we might get R1 allocated in the expr rule, like this:
LOAD R1, #1\n ; output register is R1
then in the while loop, things get put together like this:
L0:\n ; back label
LOAD R1, #1\n ; code for expr
BF R1, L1\n ; branch if false to L1
; no code came from empty statement
JMP L0\n ; back branch
L1:\n ; fwd label
"The bytecode generated by SQLite is usually smaller than the corresponding AST coming out of the parser. During initial processing of SQL text (during the call to sqlite3_prepare() and similar) both the AST and the bytecode exist in memory at the same time and so more memory is used then. But that is a transient state. The AST is quickly discarded and its memory recycled [...]"
I asked on the mailing list years ago if there was an AST that I could generate from my general purpose language, and the answer was no. It's SQL and then it's bytecode.
> This is more difficult to achieve in a tree-of-objects design. When the prepared statement is a tree-of-objects, execution is normally accomplished by walking the tree. To pause the statement in the middle of a computation means unwinding the stack back up to the caller, all the while saving enough state to resume evaluation where it last left off. This is not impossible to do, but it is sufficiently difficult that I have never seen it actually done.
I don't think it's too difficult. You can avoid stack unwinding by just not using the native stack in the first place and using an explicit stack. This is one step towards a bytecode VM though, as you're virtualizing the stack. Definitely easier in bytecode though as it already makes all of that state explicit.
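A minimal sketch of that in C, assuming a toy expression tree: each node gets an explicit frame holding its phase and partial result, so the evaluator can stop after any number of steps and resume later without touching the native stack.

    typedef enum { LIT, OP_ADD, OP_MUL } Kind;
    typedef struct Node { Kind kind; long value; struct Node *lhs, *rhs; } Node;

    /* One explicit frame per tree node currently being evaluated. */
    typedef struct { const Node *node; int phase; long lhs; } Frame;

    typedef struct {
        Frame stack[64];
        int   top;       /* index of the current frame, -1 when finished */
        long  result;    /* value produced by the most recently popped frame */
    } Eval;

    void eval_start(Eval *e, const Node *root) {
        e->top = 0;
        e->stack[0] = (Frame){ root, 0, 0 };
    }

    /* Run at most max_steps small steps; returns 1 once evaluation is done. */
    int eval_step(Eval *e, int max_steps) {
        while (max_steps-- > 0 && e->top >= 0) {
            Frame *f = &e->stack[e->top];
            if (f->node->kind == LIT) {
                e->result = f->node->value;
                e->top--;                                  /* pop; value is in result */
            } else if (f->phase == 0) {
                f->phase = 1;                              /* descend into lhs */
                e->stack[e->top + 1] = (Frame){ f->node->lhs, 0, 0 };
                e->top++;
            } else if (f->phase == 1) {
                f->lhs = e->result;                        /* lhs done, descend into rhs */
                f->phase = 2;
                e->stack[e->top + 1] = (Frame){ f->node->rhs, 0, 0 };
                e->top++;
            } else {
                e->result = f->node->kind == OP_ADD ? f->lhs + e->result
                                                    : f->lhs * e->result;
                e->top--;
            }
        }
        return e->top < 0;
    }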
For the article overall, isn't it a bit trivial to see that a bytecode interpreter must be faster and more compact? Write out the expression tree to an array as a post-order traversal and this is basically an executable bytecode. It will be more compact because you no longer need hidden malloc/object headers and pointers for subexpressions (they're preceding indices!), and it will be faster because it's basically all in cache, so no cache misses from pointer chasing.
I can't imagine a situation in which the tree would ever be faster for execution, in general. At best, the overhead might be just too small to care about. We use trees in compilers because we need to perform incremental rewrites. This is trivial with trees but can get expensive when expanding and contracting arrays. Execution doesn't need to perform rewrites though, it needs linear, cache-friendly access patterns.
However, the article does make a good point that the query plan may be tuned on the fly during execution, e.g. the small rewrites I mentioned above occurring while executing. I'm not fully convinced this couldn't be done in bytecode though.
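For what it's worth, that flattening is just a post-order walk that appends to an array; here's a sketch in C (toy node types, not SQLite's). The flat array can then be executed left to right with a small value stack and no pointer chasing.

    #include <stddef.h>

    /* Toy expression tree and a flattened, post-order instruction array. */
    typedef enum { E_LIT, E_ADD, E_MUL } ExprKind;
    typedef struct Expr { ExprKind kind; long value; struct Expr *lhs, *rhs; } Expr;
    typedef struct { ExprKind op; long value; } Insn;

    /* Append the subtree rooted at n to out[] in post-order (operands first,
       then the operator); returns the next free slot. */
    static size_t flatten(const Expr *n, Insn *out, size_t pos) {
        if (n->kind != E_LIT) {
            pos = flatten(n->lhs, out, pos);
            pos = flatten(n->rhs, out, pos);
        }
        out[pos++] = (Insn){ n->kind, n->value };
        return pos;
    }

    /* Execute the flat array with a small value stack. */
    static long run(const Insn *code, size_t n) {
        long stack[64];
        size_t sp = 0;
        for (size_t i = 0; i < n; i++) {
            if (code[i].op == E_LIT)      stack[sp++] = code[i].value;
            else if (code[i].op == E_ADD) { sp--; stack[sp - 1] += stack[sp]; }
            else                          { sp--; stack[sp - 1] *= stack[sp]; }
        }
        return stack[0];
    }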
I think the article is just trying to be scrupulously fair by listing all possible disadvantages as well as advantages. But I agree with you, the bytecode approach seems obviously better.
I just wonder why hackers can't write homoiconic (if that's the word) languages that take array tokens and produce results like so. I often wonder: what if sqlite had also been designed to take its statements like ('select '* 'from 'table)? It would also have been better for third-party libraries implementing their ORMs.
Since all of this (bytecode, AST) is done by almost every programming language I've ever explored (Python, Ruby, Lua, ...), it would have been awesome if they were easily parseable, I guess.
Most ORM implementations are very error-prone because of those design decisions (especially for SQL databases).
What I find interesting is that this approach means the query plan is chosen when "compiling" the statement, not when running it. However, there are still some cases where SQLite will recompile the statement when running, such as when the schema has changed.
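As a concrete reminder of where that boundary sits in the C API (the table and column names below are made up): the plan is fixed by sqlite3_prepare_v2(), and sqlite3_step() just runs the resulting bytecode; with the _v2 interface, a schema change noticed at step time causes the statement to be re-prepared behind the scenes.

    #include <sqlite3.h>
    #include <stdio.h>

    /* Prepare once (plan chosen here), then run the bytecode with step(). */
    int print_names(sqlite3 *db) {
        sqlite3_stmt *stmt;
        int rc = sqlite3_prepare_v2(db,
                 "SELECT name FROM users WHERE age > ?1", -1, &stmt, NULL);
        if (rc != SQLITE_OK) return rc;

        sqlite3_bind_int(stmt, 1, 30);        /* binding does not re-plan */
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));

        return sqlite3_finalize(stmt);
    }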
“Cross-platform Code → SQLite runs on any platform with an 8-bit byte, two's complement 32-bit and 64-bit integers, and a C compiler. It is actively tested on all currently popular CPUs and operating systems. The extreme portability of the SQLite code and file format will help it remain viable on future platforms.”
I think this is unlikely for SQLite (other comments cover why it probably wouldn't happen even if it were likely to benefit), but I happened to have the copy and patch paper open in a tab, so I'll take the opportunity to share it here.
https://twitter.com/gorilla0513/status/1784756577465200740
I was surprised to receive a reply from you, the author. Thank you :) Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?
https://twitter.com/DRichardHipp/status/1784783482788413491
It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.