OpenAI Baselines: ACKTR and A2C

Dzugaru · on Aug 18, 2017

What these guys are doing is amazing. RL algos are so hard to get right and evaluate - they can be very unstable and depend on subtle details. I've tried it myself (DQN and my own version of actor-critic) on simple tasks and still don't know whether I had errors. It's like you're chasing a moving target with a neural net - second-order instability :) With this code to pick apart and compare against (especially A2C) it'll be so much easier to hack on RL.

fmap · on Aug 19, 2017

Can you expand a little bit on that? It seems like we are talking about very small programs here with simple specifications. What are some common problems with reinforcement learning implementations?

Is it really just that you can't easily test your program?

Dzugaru · on Aug 19, 2017

1) You can't easily test it - bugs can leave you with a semi-working thing, and you may need to run thousands of experiments and average the results

2) Many hyperparameters - it may be critical to get them right

3) No gradual implementation - it doesn't work until you get all the (often gimmicky) parts right. Take A3C for example - the version in the paper is parallel, which may be hard to implement and debug in your language (and on your hardware) of choice.

4) Choosing the task and reward function may be hard - take a look at my question https://stackoverflow.com/questions/44781401/is-openai-gym-c...

Generally, the performance of RL algos is underwhelming in my experience without heavy tuning (and so-called "domain expertise" in the reward function), but I'm not an expert, and the OpenAI guys show that you can make something that works, like the Dota 2 bot, so they give me hope.
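
To illustrate point 1 above, here is a minimal sketch of why averaging over seeds matters. Everything in it is an assumption made up for illustration and has nothing to do with the Baselines code: a toy two-armed bandit (ARM_PROBS), a plain epsilon-greedy agent (run_episode), and a deliberate bug where the pull count is never incremented, so the running average collapses to "last reward seen". In this toy setup the buggy agent should still do better than chance, which is exactly the kind of semi-working result that a single noisy run does not expose.

```python
import random
import statistics

ARM_PROBS = (0.3, 0.7)  # hypothetical Bernoulli payout probability per arm


def run_episode(seed, buggy=False, steps=500, eps=0.1):
    """Return the mean reward of one epsilon-greedy run on the toy bandit."""
    rng = random.Random(seed)  # private RNG so every seed is reproducible
    q = [0.0, 0.0]             # value estimate per arm
    counts = [0, 0]            # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < eps or q[0] == q[1]:
            arm = rng.randrange(2)         # explore, or break a tie randomly
        else:
            arm = 0 if q[0] > q[1] else 1  # exploit the better estimate
        reward = 1.0 if rng.random() < ARM_PROBS[arm] else 0.0
        if not buggy:
            counts[arm] += 1
        # Deliberate bug when buggy=True: the count stays 0, the step size
        # stays 1, and the "average" is just the most recent reward.
        q[arm] += (reward - q[arm]) / max(counts[arm], 1)
        total += reward
    return total / steps


def evaluate(seeds, buggy):
    """Mean reward and its standard error across independent seeds."""
    scores = [run_episode(s, buggy=buggy) for s in seeds]
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, stderr


if __name__ == "__main__":
    for label, buggy in (("correct", False), ("buggy", True)):
        mean, stderr = evaluate(range(200), buggy)
        print(f"{label}: {mean:.3f} +/- {stderr:.3f}")
```

Comparing one run of each variant can go either way depending on the seed; only the mean with its standard error over many seeds separates the two reliably. With a real RL algorithm each run is far more expensive, which is what makes this kind of testing painful.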

windowshopping · on Aug 18, 2017

The fact that there's no documentation on these is a surprise to me. I look at the subdirectory containing the files for a given algorithm such as ACKTR, and I just see a bunch of files with no README or anything like that and I wonder, am I supposed to read each file end-to-end to know where the entry point is?

evc123 · on Aug 19, 2017

The paper is the documentation.

windowshopping · on Aug 19, 2017

The paper makes no mention of any of the files in the repo, nor any of the classes defined inside them. I'm not really sure what you mean.