I thought the AlphaZero paper was pretty cool: https://arxiv.org/abs/1712.01815 ...

not-my-account · on July 24, 2023

totally! MuZero is my favourite[1]

The super cool thing about MuZero is that it learns the dynamics of the problem, i.e. you don't have to give it the rules of the game, which makes the algorithm very general. For example, DeepMind threw MuZero at video compression (rate control in the VP9 codec) and found that it reduces compressed video size by an average of 6.28% at the same quality (massive for something like YouTube)[2][3].
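To make the "learns the dynamics" part concrete, here's a toy sketch of the three functions the MuZero paper [1] describes: a representation function h (observation to hidden state), a dynamics function g (hidden state + action to reward + next hidden state) and a prediction function f (hidden state to policy + value). The random linear maps, sizes and function names below are my own assumptions standing in for trained networks. The point is only that `imagine` unrolls a trajectory without ever calling the game's rules.

```python
import math
import random

# Toy sizes (assumed for illustration, not from the paper).
OBS_DIM = 4
HIDDEN = 4
NUM_ACTIONS = 3

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Untrained stand-ins for the networks MuZero would actually learn.
W_REPR = rand_matrix(HIDDEN, OBS_DIM)
W_DYN = rand_matrix(HIDDEN, HIDDEN + NUM_ACTIONS)
W_POLICY = rand_matrix(NUM_ACTIONS, HIDDEN)
W_REWARD = rand_matrix(1, HIDDEN)[0]
W_VALUE = rand_matrix(1, HIDDEN)[0]

def representation(observation):
    """h: real observation -> hidden state."""
    return [math.tanh(x) for x in matvec(W_REPR, observation)]

def dynamics(state, action):
    """g: (hidden state, action) -> (predicted reward, next hidden state)."""
    one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
    next_state = [math.tanh(x) for x in matvec(W_DYN, state + one_hot)]
    return dot(W_REWARD, next_state), next_state

def prediction(state):
    """f: hidden state -> (policy over actions, value estimate in [-1, 1])."""
    logits = matvec(W_POLICY, state)
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps], math.tanh(dot(W_VALUE, state))

def imagine(observation, actions):
    """Unroll the learned model along `actions`; no game rules or simulator involved."""
    state = representation(observation)
    rewards = []
    for action in actions:
        reward, state = dynamics(state, action)
        rewards.append(reward)
    policy, value = prediction(state)
    return rewards, policy, value

rewards, policy, value = imagine([0.5, -0.2, 0.1, 0.0], [0, 2, 1])
print(rewards, policy, value)
```

In the real algorithm the tree search runs over these hidden states, and the three functions are trained jointly so that the predicted rewards, values and policies match what actually happened in the environment.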

Does anyone know of other examples of MuZero being deployed outside of toy problems?

[1] https://arxiv.org/pdf/1911.08265.pdf
[2] https://arxiv.org/pdf/2202.06626.pdf
[3] https://www.deepmind.com/blog/muzeros-first-step-from-resear...

(edit s/Google/DeepMind)

smokel · on July 25, 2023

> you don't have to give it the rules of the game

To be fair, it uses MCTS, which requires many simulations of the game. For this, it needs to know which moves are valid, and when a player wins or loses the game.

So it does need to know the rules of the game, but it doesn't need any prior knowledge about which moves are better than others.
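For contrast, here's what a plain MCTS looks like when it searches a real simulator: classic UCT with random rollouts (not the network-guided variant AlphaZero/MuZero use). The game (single-pile Nim), the function names and the constants (c=1.4, 2000 iterations) are my own assumptions for illustration. Note that expansion, rollouts and the stopping condition all call `legal_moves` or `is_terminal`, i.e. the rules.

```python
import math
import random

# Toy game (assumed for illustration): one pile of stones, each turn remove 1-3,
# whoever takes the last stone wins. Plain MCTS has to be handed these rules.
def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def is_terminal(stones):
    return stones == 0

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = legal_moves(stones)  # rules needed for expansion
        self.visits = 0
        self.wins = 0.0  # wins for the player who made `move` to reach this node

def uct_child(node, c=1.4):
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def rollout(stones, rng):
    """Random playout. Returns True if the player to move at `stones` ends up winning."""
    start_player_turn = True
    while not is_terminal(stones):  # rules needed to know when the game ends
        stones -= rng.choice(legal_moves(stones))  # rules needed to pick valid moves
        start_player_turn = not start_player_turn
    # Whoever just moved took the last stone, so the player now "to move" has lost.
    return not start_player_turn

def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried legal move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: did the player who moved into `node` win?
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation, flipping perspective at each level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1.0
            mover_won = not mover_won
            node = node.parent
    # Pick the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))
```

Nothing in the search needs to know which moves are *good*, which matches the point above, but every step leans on the rules.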

not-my-account · on July 31, 2023

Not quite. You can define an illegal move as losing the game, and winning/losing is a “meta-observation”, i.e. if the player wins or loses, you don’t invoke another search.
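A minimal sketch of that suggestion, again on a toy Nim-like game I've assumed for illustration: the agent picks from a fixed action space, an illegal pick simply ends the episode with reward -1, and the end-of-game signal only controls the outer loop. `step`, `play_episode` and the reward values are hypothetical, and rewards here are from the perspective of whoever just moved.

```python
import random

# Toy game (assumed for illustration): one pile, remove 1-3 stones per turn,
# taking the last stone wins. The agent always picks from the full action space.
ACTIONS = (1, 2, 3)

def step(stones, action):
    """Environment transition. Returns (next_stones, reward_for_mover, done)."""
    if action > stones:
        return stones, -1.0, True  # illegal move: just ends the game as a loss
    stones -= action
    if stones == 0:
        return 0, 1.0, True  # took the last stone: win
    return stones, 0.0, False

def play_episode(stones, choose_action):
    """Outer loop. Win/loss is only used to decide whether to search again."""
    history = []
    done = False
    while not done:
        # `choose_action` stands in for a search over a learned model;
        # it is never told which moves are legal.
        action = choose_action(stones)
        stones, reward, done = step(stones, action)
        history.append((action, reward))
    return history

rng = random.Random(0)
print(play_episode(7, lambda s: rng.choice(ACTIONS)))
```

Once `done` is set, the loop exits without calling `choose_action` again, which is the "don't invoke another search" part.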