Q Learning with Atari


Reinforcement learning is the third main type of machine learning, alongside supervised and unsupervised learning, in which a model learns to maximize its expected reward in an unknown environment. In other words, the model learns by trial and error which actions earn the most reward. Q-learning in particular is an off-policy reinforcement learning algorithm whose goal is to find the best action to take in a given state. It differs from on-policy algorithms in that it can learn the optimal policy without following that policy during training: the actions it takes while exploring do not have to come from the policy it is learning. Q-learning specifically aims to optimize the action value, which can be thought of as the expected cumulative reward of taking a given action in a particular state. More formally, the update is defined as follows:

Q Learning Formula
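
For reference, the standard tabular Q-learning update, as it is usually written in textbook notation, is sketched below. The symbols are the conventional ones and are assumed here rather than taken from this post: s_t and a_t are the current state and action, r_{t+1} is the reward received, alpha is the learning rate, and gamma is the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```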

We are interested in updating the current action value Q for whatever action we take. The learning rate controls how much each update changes the action value. We collect the reward and add the maximum estimated action value of the next state, scaled by a discount factor that balances short-term and long-term rewards.
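
To make a single update concrete, here is a minimal sketch in Python. It assumes a tabular Q stored in a dictionary keyed by (state, action); the function name q_update, the state labels, and the parameter values are illustrative choices, not taken from the notebook linked at the end of this post.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action in the next state; zero if the episode has ended.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    # Target = immediate reward plus discounted best next value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Worked example with made-up numbers (hypothetical states "s0" and "s1").
Q = defaultdict(float)   # unseen (state, action) pairs default to 0.0
Q[("s1", 0)] = 1.0
Q[("s1", 1)] = 2.0
new_value = q_update(Q, "s0", 0, reward=1.0, next_state="s1",
                     n_actions=2, alpha=0.5, gamma=0.9)
```

With these numbers the best next value is 2.0, so the target is 1.0 + 0.9 * 2.0 = 2.8, and the estimate for ("s0", 0) moves halfway from 0 toward it, to about 1.4.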

If you are interested in learning more, Richard Sutton and Andrew Barto have written a great book introducing these concepts, Reinforcement Learning: An Introduction.

OpenAI and Gym

OpenAI has a Python package called gym which contains many different environments for testing reinforcement learning algorithms. For example, I use the Atari environments to run a basic Q-learning algorithm that follows an epsilon-greedy policy on Ms. Pacman.
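
The sketch below shows what an epsilon-greedy Q-learning loop can look like. So that it runs without installing anything, it does not use gym: ChainEnv is a hypothetical five-cell corridor whose reset and step methods are only loosely modeled on the reset/step pattern, and its three-value return from step is a simplification, not gym's actual signature. All names and hyperparameters are assumptions for illustration and may differ from the notebook.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy corridor: start at cell 0, reward 1 at the last cell."""
    n_actions = 2  # 0 = move left, 1 = move right

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    best = max(values)
    # Break ties at random so unvisited states are not biased toward action 0.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def train(env, episodes=100, alpha=0.5, gamma=0.9, epsilon=0.1,
          max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action values default to 0.0
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, env.n_actions, epsilon, rng)
            next_state, reward, done = env.step(action)
            # Q-learning target uses the best next action, not the one taken next.
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in range(env.n_actions))
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q


Q = train(ChainEnv())
# Greedy action in each non-terminal cell after training.
greedy_policy = [max(range(ChainEnv.n_actions), key=lambda a: Q[(s, a)])
                 for s in range(4)]
```

On this toy problem the learned greedy policy should be to move right in every cell. The same loop structure carries over to an Atari environment in principle, as long as each observation is turned into something that can serve as a dictionary key.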

This is an implementation of Q-learning trained for 100 episodes on Ms. Pacman.

Finally, the source code for building your own Q-learning agents with gym’s Atari games can be found at the following link:
Google Colab Notebook

OpenAI Gym