AlphaGo Zero

Recently, DeepMind published an article about AlphaGo Zero, the latest evolution of AlphaGo. According to the published results, AlphaGo Zero is stronger than every earlier version, which would make it the strongest Go player in history. AlphaGo Zero starts tabula rasa, that is, from a blank slate: it uses only the board states and the games it plays against itself to tune its neural network and predict the right moves.

AlphaGo Zero uses a single deep neural network that takes the raw board representation (the current position and its recent history) as input and outputs both move probabilities and a value, an estimate of the outcome of the game from the current position. This one network therefore combines the roles of the policy network and the value network. It is trained solely from games of self-play, unlike previous AlphaGo versions, whose training began with supervised learning from human expert games. At each position, a Monte Carlo Tree Search (MCTS) guided by the neural network is performed, and the self-play reinforcement learning algorithm uses this search to choose each move.
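
To make the idea of one network with two outputs concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper, which is a deep residual convolutional network; the class name TinyPolicyValueNet, the single shared layer, and the layer sizes are all assumptions chosen for illustration. It only shows the shape of the interface: board features go in, and a probability distribution over moves plus a scalar value in [-1, 1] come out.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyPolicyValueNet:
    """Toy stand-in for a two-headed network (hypothetical, illustration only)."""

    def __init__(self, n_inputs, n_moves, n_hidden=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights, mirroring the random initialization in the text.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared = init(n_hidden, n_inputs)  # shared trunk
        self.w_policy = init(n_moves, n_hidden)   # policy head
        self.w_value = init(1, n_hidden)          # value head

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def forward(self, board_features):
        # Shared representation with a ReLU non-linearity.
        hidden = [max(0.0, h) for h in self._matvec(self.w_shared, board_features)]
        # Policy head: one probability per move.
        policy = softmax(self._matvec(self.w_policy, hidden))
        # Value head: a single number squashed into [-1, 1].
        value = math.tanh(self._matvec(self.w_value, hidden)[0])
        return policy, value


# Usage: a 3x3 toy board flattened to 9 features, with 9 candidate moves.
net = TinyPolicyValueNet(n_inputs=9, n_moves=9)
policy, value = net.forward([0.0] * 9)
```

Both heads read the same hidden representation, which is the sense in which a single network plays the roles of the policy network and the value network.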

Initially, the weights of the neural network are randomly initialized. At each iteration, many games of self-play are generated. At each time step of a game, an MCTS is run using the previous iteration of the neural network, producing search probabilities over the possible moves; a move is then played by sampling from these search probabilities. This is repeated until the game terminates. The game state, the search probabilities (the policy followed), and the final outcome are stored for each time step of the game. In parallel, the neural network is trained on data sampled uniformly from all the time steps of the previous iteration(s) of self-play. The weights are adjusted to minimize the error between the predicted value and the self-play winner, and to maximize the similarity of the network's move probabilities to the search probabilities.
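
The two quantities in this training loop can be sketched in a few lines of Python. The function names below are hypothetical. The sketch assumes, following the paper, that the search probabilities are proportional to the MCTS visit counts raised to the power 1/temperature, and that the training objective adds a squared error on the value to a cross-entropy between the search probabilities and the network's move probabilities. The paper's loss also includes an L2 weight-regularization term, which is omitted here for brevity.

```python
import math
import random


def search_probabilities(visit_counts, temperature=1.0):
    """Turn MCTS visit counts into move probabilities.

    pi(a) is proportional to N(a) ** (1 / temperature); temperature must be > 0.
    """
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_move(pi, rng=random):
    """Play a move by sampling from the search probabilities."""
    return rng.choices(range(len(pi)), weights=pi, k=1)[0]


def loss(p, v, pi, z, eps=1e-12):
    """Training loss for one stored time step (regularization omitted).

    p  : move probabilities predicted by the network
    v  : value predicted by the network
    pi : search probabilities produced by MCTS
    z  : self-play outcome from the current player's view (+1 win, -1 loss)
    """
    value_error = (z - v) ** 2
    # Minimizing this cross-entropy pulls p toward pi.
    cross_entropy = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_error + cross_entropy


# Usage: three legal moves visited 10, 30 and 60 times during the search.
pi = search_probabilities([10, 30, 60])
move = sample_move(pi, random.Random(0))
good = loss(p=[0.1, 0.3, 0.6], v=1.0, pi=pi, z=1.0)   # prediction matches the search
bad = loss(p=[0.6, 0.3, 0.1], v=-1.0, pi=pi, z=1.0)   # prediction disagrees with it
```

A prediction that agrees with the search probabilities and the eventual winner gets a lower loss than one that disagrees, which is the direction in which the weights are adjusted.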

After only 3 days of self-play training, AlphaGo Zero, playing on a single machine with four TPUs, beat the version of AlphaGo that had defeated Lee Sedol by 100 games to 0. AlphaGo Zero is based solely on reinforcement learning (RL). The details of its implementation can be found in the paper Mastering the game of Go without human knowledge, published in Nature in October 2017.
