
I find it interesting that AlphaGo improves its play by playing against itself. I wonder what the limits of this are.



In RL (reinforcement learning) you have two modes, "explore" and "exploit". In exploit mode the agent plays the best move it currently knows. In explore mode it doesn't always select the best known move; instead it selects a promising move for which it has less experience. This is how the surprising new strategies are discovered. In self-play there's no shame in losing.
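To make the explore/exploit trade-off concrete, here is a minimal sketch using UCB1, a standard bandit selection rule. It is an illustration only, not AlphaGo's actual selection rule, and the names `select_move`, `stats`, and the constant `c` are assumptions made up for this example. Each move's score is its average result so far plus a bonus that grows when the move has been tried less often.

```python
import math

def select_move(stats, c=1.4):
    """Pick a move by UCB1 score.

    stats: dict mapping move -> (visits, total_value), hypothetical bookkeeping.
    c: exploration weight (assumed value); c=0 means pure exploitation.
    """
    total_visits = sum(visits for visits, _ in stats.values())
    best_move, best_score = None, -math.inf
    for move, (visits, total_value) in stats.items():
        if visits == 0:
            # A move never tried has no estimate yet, so try it first.
            return move
        mean_value = total_value / visits  # exploit: average result so far
        bonus = c * math.sqrt(math.log(total_visits) / visits)  # explore: less-tried moves score higher
        score = mean_value + bonus
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# "a" has the better average, but "b" has far less experience behind it.
stats = {"a": (100, 60.0), "b": (5, 2.5)}
greedy_choice = select_move(stats, c=0.0)   # exploitation only
curious_choice = select_move(stats, c=1.4)  # exploitation plus exploration bonus
```

With `c=0.0` the rule always plays the move with the best average. With a positive `c`, a move that looks slightly worse but has been tried only a few times can win the comparison, which is the "promising move with less experience" behaviour described above.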



