
> 2.5 No free lunch theorem
>
> Although it may be tempting to define the optimal learning algorithm that works optimally for all distributions, this is impossible. In other words, learning is only possible with assumptions.

A mention of the no free lunch theorem should come with a disclaimer that the theorem is not relevant in practice. The assumption that your data originates from the real world is sufficient to ensure that the no free lunch theorem is not a hindrance.

This book doesn't discuss this at all. Maybe it should mention that "all distributions" means a generalization, to higher-dimensional spaces of discontinuous functions (including the tiny subset of continuous functions), of something similar to all possible bit sequences generated by tossing a coin. So basically, if your data is generated from a uniform random distribution over "all possibilities", you cannot learn to predict the outcome of the next coin tosses, or anything similar.
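To make that concrete, here is a tiny stdlib-only sketch (the setup, input width, and sample sizes are invented for illustration): when the labels are independent fair coin flips, a rule fitted on the training half predicts the test half no better than chance.

```python
import random

random.seed(0)
xs = [tuple(random.randint(0, 1) for _ in range(8)) for _ in range(2000)]  # 8-bit inputs
ys = [random.randint(0, 1) for _ in range(2000)]                           # coin-flip labels

train_x, train_y = xs[:1000], ys[:1000]
test_x, test_y = xs[1000:], ys[1000:]

# "Learn" by memorizing: for each exact input seen in training, predict the
# majority label it carried.
table = {}
for x, y in zip(train_x, train_y):
    table.setdefault(x, []).append(y)

def predict(x):
    votes = table.get(x, [0])          # unseen input: arbitrarily guess 0
    return 1 if 2 * sum(votes) > len(votes) else 0

accuracy = sum(predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(accuracy)  # hovers around 0.5: there is nothing to learn from pure noise
```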




Yes, most no free lunch theorems and results of this kind, which make overly general assumptions, tend to be too pessimistic.

For example, many people naively think that static program analysis is infeasible due to the halting problem, Rice's theorem, etc.


Just curious: how exactly does program analysis on real-world programs circumvent the halting problem or Rice's theorem? In the real world, do we only ever have to statically analyze a special subset of all programs?


The main strategy is to build sound but imperfect analyzers, i.e. analyzers that never produce false negatives but that may raise some false positives. See §1.6 in [1], a simplified version of the classic Principles of Program Analysis. Good analyzers are practical, rarely raising false positives for well-written programs. The Astrée analyzer reported zero false positives for the 1 MLOC fly-by-wire A380 code.

Another complementary strategy is to avoid Turing-complete constructs as much as possible, i.e. use DSLs with restricted semantics. This way, advanced semantic properties such as termination are provable.

[1] Program Analysis, An Appetizer. https://arxiv.org/pdf/2012.10086.pdf
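To make the "sound but imperfect" idea concrete, here is a toy sketch (not how Astrée or any real analyzer is implemented): a sign analysis over-approximates the values a denominator can take, so it flags every division that could be by zero given what it tracks, at the price of sometimes flagging one that is actually safe.

```python
from itertools import product

# Abstract values: sets of possible signs. TOP = "could be anything".
NEG, ZERO, POS = "-", "0", "+"
TOP = {NEG, ZERO, POS}

def abs_const(n):
    return {ZERO} if n == 0 else ({POS} if n > 0 else {NEG})

def abs_add(a, b):
    out = set()
    for x, y in product(a, b):
        if x == ZERO:   out.add(y)
        elif y == ZERO: out.add(x)
        elif x == y:    out.add(x)   # (+)+(+) = +, (-)+(-) = -
        else:           out |= TOP   # (+)+(-) could be anything
    return out

def abs_div(num, den, warnings):
    if ZERO in den:
        warnings.append("possible division by zero")
    return TOP  # keep the toy simple: don't track the quotient's sign

warnings = []
y = {ZERO, POS}                      # all we know: y >= 0

# x / (y + 1): the denominator is provably positive, so no warning.
abs_div(TOP, abs_add(y, abs_const(1)), warnings)

# x / (y + z) with z unknown: the analysis must warn, even if the concrete
# program never actually passes a zero denominator (a false positive).
abs_div(TOP, abs_add(y, TOP), warnings)

print(warnings)  # ['possible division by zero'] -- sound, sometimes imprecise
```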


The halting problem only applies in the general case. I can trivially tell you that while(1) will never halt.

There are many examples of programs whose halting behavior is not known (the Collatz conjecture, for example), but many others where program analysis works just fine.


If you write a program whose behavior is Collatz-like (say, in some states it queues up more work for itself, and in other states it completes work from the queue, and you believe that in general it should always ultimately complete the queue) it is actually useful to have a static analyzer tell you ‘it’s not entirely clear that this code will ever terminate’.

You can make the analyzer happy by adding a max iteration limit or a recursion depth limit or something to make sure it fails out rather than looping forever.

Which is probably a good idea anyway, if you’re running code that you can’t mathematically prove will always complete.
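A minimal sketch of that iteration-cap idea, using the Collatz step itself as the "work" rule (the budget and names are made up): rather than trusting that the loop always finishes, give it a fixed budget and fail out loudly.

```python
MAX_STEPS = 10_000  # arbitrary budget; pick something generous for your workload

def drain(n: int) -> int:
    """Apply the Collatz-like rule until it reaches 1, or give up."""
    for step in range(MAX_STEPS):
        if n == 1:
            return step                              # terminated within budget
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    raise RuntimeError(f"gave up after {MAX_STEPS} steps")  # fail out instead of spinning forever

print(drain(27))  # 27 famously takes 111 steps to reach 1, well under the budget
```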


In the real world, if program analysis hits a blocker like this, you tweak stuff until it doesn't. The top-level post is correct in that, while the theory applies to the general case, the data and programs we actually use are not completely random/general - there are lots of properties baked in as a consequence of their being real-world, physical entities.


The commercial static analyzers I've seen generate false positives (bogus issues that just aren't there) and false negatives (e.g., missing an unconditional out-of-bounds pointer write that will occur if that particular piece of C code is ever executed). Some of that comes with the territory, because commonly used languages and their libraries are underspecified and are commonly used at the boundaries of what is specified. And these tools must always produce some result, even if they cannot even parse (or see) the entire code base.

Usually, when people say "static analysis" they accept unsoundness and the use of heuristics. Otherwise, they call the tool a type checker or a verifier. Such tools may run into the theoretical issues you mentioned. For them, the solution is to change the program until it compiles in a reasonable amount of time.


In the real world, Turing machines don't exist, only finite state machines. (No infinite tape or infinite time.)

I guess it's something related to this, one way or another.


You don't need an infinite tape to make a finite state machine that never halts. As Legend2440 pointed out upthread, while(1) is a simple finite state machine that never halts.


Sure, but the halting problem is solvable for finite state machines.


Could you expand or provide a link to a good resource for me to understand this?

If the judge program should say terminates yes/no, and the given program is `while True: continue`, I guess the argument is that in the finite case you could in principle just enumerate all programs that don't terminate and identify them as such?


In principle, you can enumerate all possible memory states of the system and determine what the next memory state would be from each one (including multiple possible next states if you account for things like interrupts).

Then you tree-shake the unreachable parts of that directed graph from the start state and look for closed loops in what remains.
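A small sketch of that procedure on a toy transition graph (the states and edges below are invented): enumerate the states reachable from the start, then check whether a closed loop survives.

```python
def reachable(start, edges):
    """Graph search from the start state: the 'tree-shake' step."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in edges.get(s, []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def has_cycle(nodes, edges):
    """Kahn's algorithm: if not every node can be peeled off, a closed loop remains."""
    indeg = {n: 0 for n in nodes}
    for n in nodes:
        for t in edges.get(n, []):
            indeg[t] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for t in edges.get(n, []):
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    return removed < len(nodes)

# Toy machine: state C loops back to B, so the machine may run forever.
edges = {"A": ["B"], "B": ["C", "HALT"], "C": ["B"]}
live = reachable("A", edges)
print(has_cycle(live, edges))  # True -> a reachable closed loop, i.e. "may not halt"
```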


Makes sense! Thanks!


A decider for the theoretical halting problem is required to return a yes/no answer, whereas in the real world it's actually really valuable to get back a "maybe doesn't halt" answer, so you can then more specifically iterate on those "here be dragons" areas of your system.
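A tiny sketch of that three-valued answer (the Verdict names and the toy pattern checks are purely illustrative, not a real analysis): the checker answers "unknown" whenever it can't decide, instead of pretending to give a yes/no.

```python
from enum import Enum

class Verdict(Enum):
    HALTS = "halts"
    LOOPS = "never halts"
    UNKNOWN = "maybe doesn't halt -- here be dragons, review this"

def check(snippet: str) -> Verdict:
    # Stand-in for a real analysis: recognize two trivial cases and punt on the rest.
    if snippet.strip() == "while(1);":
        return Verdict.LOOPS
    if not any(kw in snippet for kw in ("while", "for", "goto")):
        return Verdict.HALTS          # no loop constructs in this toy fragment language
    return Verdict.UNKNOWN

print(check("x = x + 1;"))               # Verdict.HALTS
print(check("while(1);"))                # Verdict.LOOPS
print(check("while (x > 1) x = f(x);"))  # Verdict.UNKNOWN
```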


It's analogous to the problem of induction (Hume). Without assumptions, it's impossible to connect past observations to predictions about the future. Observing the sun rise a thousand mornings does not automatically make it any more or less likely that it will rise tomorrow, unless we make some assumptions, for example that events tend to be similar over time. And those assumptions can't be extracted from the data. Maybe things tended to be similar over time in the past, but that says nothing about the future. The pattern might break tomorrow, and to say that it likely won't is simply an assumption at a higher level.

But it seems like science is "doing fine" despite this problem. Similarly, machine learning chugs along fine, because people use their experience and prior knowledge when designing the algorithms. These assumptions are also called inductive biases. They bias the learning towards certain patterns (like "things tend to be similar locally").


One just assumes Occam's razor and is good to go in the vast majority of cases.


If you formalize Occam's "simplicity" as description length, then it depends on the encoding. By assuming an encoding, you implicitly assume a distribution (via entropy coding, e.g. Huffman coding) and hence inject an inductive bias.

I'm not saying that it's bad to assume one. The point is that it is an assumption. My bigger point is that the no free lunch theorem should only bother you as much as the induction problem bothers you. Which in practice means not at all.
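A quick numeric illustration of the encoding point (the toy string and both "priors" are made up): an ideal entropy code spends -log2 P(x) bits on symbol x, so the same data gets a different description length under different assumed distributions.

```python
import math

data = "aaaaabbc"  # toy dataset: 5 a's, 2 b's, 1 c

def description_length(data, probs):
    """Ideal code length in bits under the code implied by `probs`."""
    return sum(-math.log2(probs[ch]) for ch in data)

uniform = {"a": 1/3, "b": 1/3, "c": 1/3}    # no preference: ~1.58 bits per symbol
skewed  = {"a": 3/4, "b": 3/16, "c": 1/16}  # prior that expects mostly a's

print(round(description_length(data, uniform), 2))  # 12.68 bits
print(round(description_length(data, skewed), 2))   # 10.91 bits: "simpler" under this prior
```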


I don't see how your disclaimer applies. My interpretation of no free lunch theorems is that no single algorithm works well for all classes of problems, not that some problems are unlearnable. The example in its proof might be contrived, but in actuality, additional assumptions can and do lead to different algorithms being picked, no?


Transformers go brrr


Work pretty well for all classes of problems?



