Thursday, March 23, 2017

Evolution Strategies as a Scalable Alternative to Reinforcement Learning - implementation -

We explore the use of Evolution Strategies, a class of black box optimization algorithms, as an alternative to popular RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using hundreds to thousands of parallel workers, ES can solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training time. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
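The ES loop the abstract describes (perturb the parameters with Gaussian noise, evaluate each perturbation as a black box, and move along the return-weighted average of the noise) can be sketched in a few lines. This is a minimal illustrative version on a toy objective, not the authors' released implementation; the function name and all hyperparameters here are illustrative choices:

```python
import random

def evolution_strategies(f, theta, sigma=0.1, alpha=0.01, npop=50, iters=300, seed=0):
    """Maximize a black-box function f with a basic ES loop:
    sample Gaussian perturbations of theta, score each perturbed
    parameter vector with f, and step along the score-weighted
    average of the noise (a finite-difference gradient estimate)."""
    rng = random.Random(seed)
    dim = len(theta)
    for _ in range(iters):
        # One perturbation per "worker"; each only reports a scalar return.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(npop)]
        returns = [f([t + sigma * e for t, e in zip(theta, eps)]) for eps in noise]
        # Standardize returns so the update is invariant to reward scale.
        mean = sum(returns) / npop
        std = (sum((r - mean) ** 2 for r in returns) / npop) ** 0.5 or 1.0
        adv = [(r - mean) / std for r in returns]
        # Gradient estimate: (1 / (npop * sigma)) * sum_i adv_i * eps_i
        for j in range(dim):
            grad_j = sum(a * eps[j] for a, eps in zip(adv, noise)) / (npop * sigma)
            theta[j] += alpha * grad_j
    return theta

# Toy black-box objective standing in for an RL episode return:
# maximize -||x - (1, -2)||^2, whose optimum is at (1, -2).
best = evolution_strategies(
    lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, [1.0, -2.0])),
    [0.0, 0.0],
)
```

Note that only scalar returns flow back into the update, with no backpropagation through the objective, which is what makes the method embarrassingly parallel across many CPU workers.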

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there!

1 comment:

SeanVN said...

I'm pretty sure I gave the link to this scale-free evolution strategies algorithm before: https://pdfs.semanticscholar.org/c980/dc8942b4d058be301d463dc3177e8aab850e.pdf
There are simple bit hacks you can use to generate that mutation probability distribution as well.

I found a good way to use it to evolve deep neural nets:
https://groups.google.com/forum/#!topic/artificial-general-intelligence/Nz_qW2FK8QY
https://discourse.numenta.org/t/overcoming-catastrophic-forgetting/2009
