Right, this is exactly what I'm getting at - learning a distribution over a fixed sample space can be done with Bayesian methods, or entropy-based methods like the OP suggested, but I'm wondering if there are methods that can automatically adjust the sample space as well.
For well-defined mathematical problems like dice rolling or fixed classical mechanics scenarios, you don't need this, I guess, but for any real-world problem I imagine half the problem is figuring out a good sample space to begin with. This kind of thing must have been studied already; I just don't know what to look for!
There are some analogies to algorithms like NEAT, which automatically evolves a neural network architecture while training. But that's obviously a very different context.
We could discuss completeness of the sample space, and we can also discuss completeness of the hypothesis space.
In Solomonoff Induction, which purports to be a theory of universal inductive inference, the "complete hypothesis space" consists of all computable programs (note that all current physical theories are computable, so this hypothesis space is very general). Induction is then performed by keeping all programs consistent with the observations, weighted by two terms: the program's prior likelihood, and the probability that program assigns to the observations (the programs can be deterministic, in which case they assign probability 1).
The "prior likelihood" in Solomonoff Induction is the program's complexity (well, 2^(-Complexity), where the complexity is the length of the shortest representation of that program.
Altogether, the procedure looks like this: maintain a belief which is a mixture of all programs consistent with the observations, each weighted by 2^(-complexity) times the likelihood it assigns to the data. Of course, this procedure is still limited by the sample/observation space!
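To make that concrete, here's a minimal sketch in Python. To be clear about what's faked: the real hypothesis space is all computable programs and the complexities are shortest-description lengths; I'm standing in for both with a small hand-picked list of predictors and made-up complexity values.

```python
# Toy stand-in for the Solomonoff mixture: a hand-picked list of "programs",
# each predicting the probability that the next bit in a sequence is 1.
# The complexity values are invented for illustration; in the real theory
# they would be the bit-length of each program's shortest representation.
PROGRAMS = [
    {"name": "always-0",   "complexity": 3, "p_one": 0.0},
    {"name": "always-1",   "complexity": 3, "p_one": 1.0},
    {"name": "fair-coin",  "complexity": 5, "p_one": 0.5},
    {"name": "biased-3/4", "complexity": 8, "p_one": 0.75},
]

def likelihood(program, bits):
    """Probability the program assigns to the observed bit string.
    Deterministic programs (p_one of 0 or 1) assign probability 1 to the
    sequences they predict and probability 0 to everything else."""
    p = program["p_one"]
    result = 1.0
    for b in bits:
        result *= p if b == 1 else 1.0 - p
    return result

def posterior(bits):
    """Weight each program by prior 2^(-complexity) times its likelihood,
    then renormalize. Programs inconsistent with the data get weight 0."""
    weights = {
        prog["name"]: 2.0 ** -prog["complexity"] * likelihood(prog, bits)
        for prog in PROGRAMS
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

print(posterior([1, 1, 1, 1]))
# "always-0" is eliminated outright; most of the mass goes to "always-1",
# with the remainder split between the more complex hypotheses that also fit.
```

Note the sketch inherits exactly the limitation above: the observation space (binary sequences) is fixed before induction starts.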
That's our best formal theory of induction in a nutshell.
Someone else pointed me to Solomonoff induction too, which looks really cool as an "idealised" theory of induction, and it definitely answers my question in the abstract. But there are obvious difficulties with it in practice, like the fact that it's probably uncomputable, right?
I mean, I think even the "Complexity" term should be uncomputable in general: I'd guess a program that computes it could be turned into a solver for the halting problem, that sort of thing. I haven't worked through the details, but I believe the standard argument looks something like the sketch below.
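For what it's worth, here's the usual Berry-paradox-style sketch of why the complexity term K can't be computable, as I understand it:

```latex
% Assume, for contradiction, that K is computable. For each n, let x_n be
% the first string (in length-lexicographic order) with K(x_n) >= n; such a
% string exists because only finitely many strings have complexity below n.
% A program that searches for x_n needs only the code for K plus the value
% of n, so for some fixed constant c,
\[
  K(x_n) \le \log_2 n + c .
\]
% But by construction,
\[
  K(x_n) \ge n ,
\]
% which is a contradiction as soon as n > \log_2 n + c. So K is uncomputable.
```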
Would be interesting if there are practical algorithms for this: either direct approximations to SI, or maybe something else entirely that approaches SI in the limit, like a recursive neural-net training scheme? I'll do some digging, thanks!
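One crude but practical stand-in worth mentioning: any compressor gives a computable upper bound on description length, so compressed size is sometimes used as a proxy for the complexity term (this is the idea behind compression-based similarity measures such as normalized compression distance). A minimal sketch using Python's zlib, purely to illustrate:

```python
import os
import zlib

def complexity_proxy(data: bytes) -> int:
    """Computable upper bound on description length: compressed size in bits.
    True Kolmogorov complexity is uncomputable; this only ever bounds it."""
    return 8 * len(zlib.compress(data, 9))

print(complexity_proxy(b"01" * 500))       # highly regular: compresses a lot
print(complexity_proxy(os.urandom(1000)))  # random bytes: barely compresses
```

Of course this only gives upper bounds, so whether anything built on it genuinely "approaches SI in the limit" is a separate question.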