
There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way; it's designed like other programs, according to someone's explicit understanding. There's still active research in this field; I have a friend who's very deep into it.

The trouble is that statistical MT (the approach that later became neural-net MT) started achieving better quality metrics than rule-based MT sometime around 2008 or 2010 (if I remember correctly), and the gap between them has widened since then. Rule-based systems have gotten a little better each year, while statistical systems have gotten a lot better each year, and are also now receiving correspondingly much more investment.

The statistical systems are especially good at using context to disambiguate linguistic ambiguities. When a word has multiple meanings, human beings guess which one is relevant from overall context (merging evidence upwards and downwards from multiple layers within the language understanding process!). Statistical MT systems seem to do something somewhat similar. Much as human beings don't even perceive how we knew which meaning was relevant (but we usually guessed the right one without even thinking about it), these systems usually also guess the right one using highly contextual evidence.
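For a toy illustration of what "using context" can mean in the statistical setting (my own sketch in Python, with made-up sense inventories; no production system works this simply), you can score each sense of an ambiguous word by how much its typical context overlaps with the sentence at hand:

    # Toy contextual disambiguation (an illustration only, not a real MT system):
    # choose the sense of an ambiguous word whose typical context words
    # overlap most with the words of the surrounding sentence.
    SENSE_CONTEXTS = {
        "bank (riverside)": {"water", "shore", "fish", "mud", "flood"},
        "bank (finance)": {"loan", "deposit", "teller", "account", "interest"},
    }

    def disambiguate(sentence: str) -> str:
        words = set(sentence.lower().split())
        return max(SENSE_CONTEXTS, key=lambda sense: len(words & SENSE_CONTEXTS[sense]))

    print(disambiguate("she opened an account and asked about a loan at the bank"))
    print(disambiguate("the flood left mud along the bank of the river"))

Real statistical systems replace the hand-picked context sets with distributions (or embeddings) learned from data, and that learned part is exactly what no one writes down by hand.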

Linguistic example sentences like "time flies like an arrow" (my linguistics professor suggested "I can't wait for her to take me here") are formally susceptible of many different interpretations, each of which can be considered correct, but when we see or hear such sentences within a larger context, we somehow tend to know which interpretation is most relevant and so most plausible. We might never be able to replicate some of that with consciously-engineered rulesets!





This is the bitter lesson.[1]

I too used to think that rule-based AI would be better than statistical Markov-chain parrots, but here we are.

Though I still think/hope that some hybrid system of rule-based logic + LLMs will end up being the winner eventually.

----------------

[1] https://en.wikipedia.org/wiki/Bitter_lesson


These days it's pretty much the "sweet" lesson for everyone but Sutton and his peers, it seems.

It's bitter for me because I like looking at how things work under the hood and that's much less satisfying when it's "a bunch of stats and linear algebra that just happens to work"

So you prefer "a bunch of electrons, field effects, and clocks that just happen to work"?

If you're building on a programming language, you can say you understand the language's abstract machine, even though you don't know how we ever managed to make a physical device that instantiates it!

Yep, some domains have no hard rules at all.

Time flies like an arrow; fruit flies like a banana.


It's completely possible to write a parser that outputs every possible parse of "time flies like an arrow", then try interpreting each one and discard the ones that don't make sense according to some downstream rules (unknown noun phrase: "time fly").
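As a sketch of the enumerate-everything step (using NLTK with a toy grammar of my own; any chart parser over an ambiguous CFG would do):

    import nltk

    # A deliberately ambiguous toy grammar: "flies" and "like" each get
    # two parts of speech, so the sentence has two structural readings.
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> N | N N | Det N
        VP  -> V PP | V NP
        PP  -> P NP
        Det -> 'an'
        N   -> 'time' | 'flies' | 'arrow'
        V   -> 'flies' | 'like'
        P   -> 'like'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("time flies like an arrow".split()):
        print(tree)  # one tree per reading; filter these downstream

This prints both readings, time (N) flies (V) like an arrow and time flies (N N) like (V) an arrow, and the downstream rules would then throw out the parse whose subject is the unknown noun phrase "time fly".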

I did this for a text adventure parser, but it didn't work well, because there are exponentially many ways to group the words in a sentence like "put the ball on the bucket on the chair on the table on the floor".
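To put a number on the blow-up (my own back-of-envelope, not the parent's figures): with n trailing prepositional phrases, each "on the X" can attach to the verb or to any earlier noun, and the count of attachment structures grows like the Catalan numbers, i.e. roughly 4^n:

    from math import comb

    def catalan(n: int) -> int:
        # number of attachment structures for n prepositional phrases
        return comb(2 * n, n) // (n + 1)

    for n in range(1, 9):
        print(n, "PPs ->", catalan(n), "parses")

For the ball/bucket/chair/table/floor sentence (4 PPs) that's already 14 parses, and it runs into the hundreds and thousands soon after.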


I would argue that particular sentence only exists to convey the bamboozled feeling you get when you reach the end of it, so only sentient parsers can parse it properly.

> There's an older tradition of rule-based machine translation. In these methods, someone really does understand exactly what the program does, in a detailed way

I would softly disagree with this. Technically, we also understand exactly what an LLM does; we can analyze every instruction that is executed. Nothing is hidden from us. We don't always know what the outcome will be, but we also don't always know what the outcome will be in rule-based models, if we make the chain of logic too deep to reliably predict. There is a difference, but it is on a spectrum. In other words, explicit code may help, but it does not guarantee understanding, because nothing does and nothing can.


The grammars in rule-based MT are normally fully conceptually understood by the people who wrote them. That's a good start for human understanding.

You could say they don't understand why a human language evolved some feature but they fully understand the details of that feature in human conceptual terms.

I agree that in principle the statistical parts of statistical MT are not secret, and that computer code in high-level languages isn't guaranteed to be comprehensible to a human reader. More generally, binary code isn't guaranteed to be incomprehensible, and source code isn't guaranteed to be comprehensible.

But for MT, the hand-written grammars and rules are at least comprehended by their authors at the time they're initially constructed.


Sure, I agree with that, but that's a property of hand-writing more than rule-based systems. For instance, you could probably translate a 6B LLM into an extremely big rule system, but doing so would not help you understand how the LLM worked.

Do you know what the SOTA in rule-based MT is? I used to be deep into symbolic approaches but couldn't find much in the way of contemporary rule-based NLP.

My friend is working on Grammatical Framework, which has a Resource Grammar library of pre-written natural language grammars, at least for portions of them. The GF research community continues to add new ones over time, based on implementing portions of written reference grammars, or sometimes by native speakers based on their own native speaker intuitions. I'm not sure if there are larger grammar libraries elsewhere.
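For a flavor of the approach (a minimal sketch, assuming GF's Python runtime "pgf" and the Foods example grammar from the GF tutorial, compiled to Foods.pgf with gf -make): translation goes through a shared, human-written abstract syntax rather than through statistics:

    import pgf

    gr = pgf.readPGF("Foods.pgf")   # compiled multilingual grammar
    eng = gr.languages["FoodsEng"]
    ita = gr.languages["FoodsIta"]

    # Parse English into the abstract syntax, then linearize in Italian.
    for prob, expr in eng.parse("this fish is fresh"):
        print(expr)                  # e.g. Pred (This Fish) Fresh
        print(ita.linearize(expr))   # e.g. "questo pesce è fresco"
        break

Every step here is inspectable: the abstract tree and both concrete grammars are rules someone wrote and understood, which is exactly the comprehensibility property being discussed.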

There could be companies that made much better rule-based MT but kept the details as trade secrets. For example, I think Google Translate was rule-based for "a long time" (I don't remember until what year, although it was pretty apparent to users and researchers when it switched, and indeed I think some Google researchers even spoke publicly about it). They had made a lot of investment (very far beyond something like a GF resource grammar) but I don't think they ever published any of that underlying work even when they discontinued that version of the product.

So basically there may be a gap here: the academic work is advancing slowly, and yet it now represents the majority of examples in the field, because companies are so unlikely to have ongoing rule-based projects as part of their products. The available state of the art you can actually interact with may have gone backwards in recent years as a result!

nimi sina li pona tawa mi. (Toki Pona: "I like your name.")



