> nothing impressive at all when it comes to RL in the real world
Apparently, many people on this website think Go and StarCraft qualify as "the real world". The majority of them have probably never even heard of blocks world and SHRDLU.
AFAIK, the biggest problem in reinforcement learning is still credit assignment: connecting the final outcome with the individual actions you took to achieve it. This hits you hard when you move from toy domains to real-world problems, because you can't replay real-world scenarios thousands of times, and even if you could, you can get completely different results due to various random and external factors.
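To make that concrete, here's a minimal sketch of how a policy-gradient method like REINFORCE does credit assignment (the function name is mine, not from any library): every action gets credited with the discounted sum of rewards that came after it, which is a pretty blunt way to tie a sparse final outcome back to individual decisions.

    # Each action is credited with the discounted sum of
    # rewards that followed it in the episode.
    def discounted_returns(rewards, gamma=0.99):
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # Sparse outcome: only the final step is rewarded, yet every
    # earlier action is assigned a share of the credit.
    print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
    # [0.970299, 0.9801, 0.99, 1.0]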
This can be somewhat mitigated by taking Marvin Minsky's approach and building or using a simulation of the problem instead of the real thing. However, in many domains, building a realistic simulation is significantly harder than supervised learning.
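Roughly what that looks like in practice, with a toy simulator standing in for the real domain (the class here is hypothetical; giving it believable dynamics is the genuinely hard part):

    import random

    # Hypothetical stand-in for a domain simulator: cheap to reset
    # and replay thousands of times, unlike the real world.
    class ToySimulator:
        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            # Noise stands in for the random and external factors
            # a realistic simulator would have to model faithfully.
            self.state += action + random.choice([-1, 0, 1])
            done = abs(self.state) >= 10
            reward = 1.0 if self.state >= 10 else 0.0
            return self.state, reward, done

    env = ToySimulator()
    for episode in range(10_000):  # replaying is free in simulation
        state, done = env.reset(), False
        while not done:
            state, reward, done = env.step(action=1)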
Somehow this reminds me of one I missed: all the simulation-based driving improvements at Waymo, Cruise, and others. It's hard to know whether it's strictly RL from "we train neural networks while driving in simulation", but it's similar enough for my taste.
I'll be curious to see the set of domains where a hybrid approach (some real world, lots of simulation) works out. The nice thing about simulation is that you can experience lots of things you never want happening in the real world (e.g., a child runs in front of the car with only X00 ms to impact). Trading off the difficulty of accurate simulation against having to trust that the model will behave correctly in a situation you could have simulated will likely be an interesting liability challenge, for autonomous driving at the least.
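That rare-event angle is where simulation earns its keep. A hedged sketch of the idea (the scenario names and rates are made up): oversample the dangerous cases at frequencies you could never safely, or ethically, collect on real roads.

    import random

    # Hypothetical scenario mix, heavily skewed toward rare events
    # relative to their real-world frequency.
    SCENARIOS = [
        ("nominal_highway",      0.90),
        ("aggressive_cut_in",    0.08),
        ("child_runs_into_road", 0.02),  # essentially uncollectable for real
    ]

    def sample_scenario():
        r, cum = random.random(), 0.0
        for name, p in SCENARIOS:
            cum += p
            if r < cum:
                return name
        return SCENARIOS[-1][0]

    counts = {}
    for _ in range(10_000):
        name = sample_scenario()
        counts[name] = counts.get(name, 0) + 1
    print(counts)  # ~200 child-runs-out episodes per 10k simulated drives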
A common misconception is that self-driving car companies (outside of a few smaller startups) are using RL to drive the car. They are not. They use deep learning for perception systems, which produce tangible outputs that can be processed by what amounts to expert systems.
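In other words, the learned part stops at perception. A rough sketch of that handoff (the types and rules here are illustrative, not any company's actual stack):

    from dataclasses import dataclass

    # Deep perception emits tangible, typed detections...
    @dataclass
    class Detection:
        kind: str            # e.g. "pedestrian", "vehicle"
        distance_m: float
        closing_speed_mps: float

    # ...and a hand-written rule layer (an expert system, roughly)
    # decides what to do with them. Every rule is auditable.
    def plan(detections):
        for d in detections:
            if d.kind == "pedestrian" and d.distance_m < 20:
                return "emergency_brake"
            if d.kind == "vehicle" and d.closing_speed_mps > 5:
                return "slow_down"
        return "maintain_speed"

    print(plan([Detection("pedestrian", 12.0, 1.5)]))  # emergency_brake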
I work in this space, and even if you could assume the RL would never make a mistake, it's not auditable in the way you would need it to be for things like insurance. In general, RL isn't ready to be used in complex situations where people can die when things go bad. And that's before you get to the sample-efficiency challenges and the handling of unseen data.