Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7-minute version: DQN for Flappy Bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that the algorithm generalizes to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                      for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)     for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
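
For concreteness, here is a minimal Python sketch of the replay memory D from the pseudo-code above. The deque-based buffer and the function names are my own assumptions for illustration, not the repository's actual code:

import random
from collections import deque

REPLAY_MEMORY = 50000  # N: maximum number of stored experiences
BATCH = 32             # minibatch size used later during training

# Replay memory D: the oldest transitions are discarded once the buffer is full.
D = deque(maxlen=REPLAY_MEMORY)

def store(s_t, a_t, r_t, s_t1, terminal):
    # Store transition (s_t, a_t, r_t, s_(t+1)) in D.
    D.append((s_t, a_t, r_t, s_t1, terminal))

def sample_minibatch():
    # Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D.
    return random.sample(D, min(len(D), BATCH))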

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background from the original game makes training converge faster.

Network Architecture

According to [1], I first preprocessed the game screens with the following steps:

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
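
A sketch of these steps with OpenCV and NumPy might look as follows. The binary threshold is an extra step commonly used to flatten colors (it also helps with the background removal mentioned earlier); treat its exact values as assumptions:

import cv2
import numpy as np

def preprocess(frame, prev_stack=None):
    # frame: raw game screen as a NumPy array (BGR channel order assumed)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # 1. convert to grayscale
    small = cv2.resize(gray, (80, 80))               # 2. resize to 80x80
    # Optional: binarize so sprites stand out from any remaining background.
    _, small = cv2.threshold(small, 1, 255, cv2.THRESH_BINARY)
    if prev_stack is None:
        # First frame: repeat it 4 times to bootstrap the stack.
        return np.stack([small] * 4, axis=2)         # 3. shape (80, 80, 4)
    # Append the newest frame and drop the oldest of the previous 4.
    return np.append(prev_stack[:, :, 1:], small[:, :, np.newaxis], axis=2)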

The architecture of the network is as follows. The first layer convolves the input image with an 8x8x4x32 kernel at a stride of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2, and we max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1, followed by one more max pooling. The last hidden layer consists of 256 fully connected ReLU nodes.
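
TensorFlow 0.7's low-level API has long been superseded, so here is an equivalent sketch of this architecture in modern tf.keras; the layer shapes follow the description above, while the names and padding choices are my assumptions:

import tensorflow as tf

NUM_ACTIONS = 2  # Flappy Bird's valid actions: do nothing, flap

def build_q_network():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(80, 80, 4)),               # stacked frames
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same",
                               activation="relu"),       # 8x8x4x32 kernel, stride 4
        tf.keras.layers.MaxPooling2D(2, padding="same"), # 2x2 max pooling
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                               activation="relu"),       # 4x4x32x64 kernel, stride 2
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same",
                               activation="relu"),       # 3x3x64x64 kernel, stride 1
        tf.keras.layers.MaxPooling2D(2, padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),   # 256 fully connected ReLU nodes
        tf.keras.layers.Dense(NUM_ACTIONS),              # one Q value per valid action
    ])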

The final output layer has the same dimensionality as the number of valid actions in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function of the input state for each valid action. At each time step, the network performs whichever action corresponds to the highest Q value under an ϵ-greedy policy.
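
The ϵ-greedy selection itself is a few lines; q_network is the model sketched above, and the rest is generic illustration rather than the repository's code:

import random
import numpy as np

def select_action(q_network, state, epsilon, num_actions=2):
    # With probability epsilon, explore with a uniformly random action.
    if random.random() <= epsilon:
        return random.randrange(num_actions)
    # Otherwise exploit: take the action with the highest predicted Q value.
    q = q_network(state[np.newaxis].astype(np.float32))  # add a batch dimension
    return int(np.argmax(q[0]))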

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, keeping it at the top of the screen until it clumsily bumps into a pipe. This makes the Q function converge relatively slowly, since it only starts to see the other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.

During training, at each time step the network samples minibatches of size 32 from the replay memory and performs a gradient step on the loss function described above, using the Adam optimizer with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely with ϵ fixed at 0.0001.
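
Under the same tf.keras assumptions as above, this update (minibatch of 32 from sample_minibatch(), targets y_j, Adam at 0.000001) might be sketched as follows. The discount factor γ is not stated in this document, so its value below is a placeholder:

import numpy as np
import tensorflow as tf

GAMMA = 0.99  # discount factor gamma (assumed value)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-6)

def train_step(q_network, minibatch):
    # minibatch: list of (s_j, a_j, r_j, s_(j+1), terminal) tuples.
    s_j  = np.stack([t[0] for t in minibatch]).astype(np.float32)
    a_j  = np.array([t[1] for t in minibatch])
    r_j  = np.array([t[2] for t in minibatch], dtype=np.float32)
    s_j1 = np.stack([t[3] for t in minibatch]).astype(np.float32)
    term = np.array([t[4] for t in minibatch], dtype=np.float32)
    # y_j = r_j for terminal transitions, r_j + gamma * max_a' Q(s_(j+1), a') otherwise.
    q_next = q_network(s_j1).numpy().max(axis=1)
    y_j = tf.constant(r_j + GAMMA * q_next * (1.0 - term))
    with tf.GradientTape() as tape:
        q = q_network(s_j)                                        # shape (32, num_actions)
        q_a = tf.reduce_sum(q * tf.one_hot(a_j, q.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(y_j - q_a))               # (y_j - Q(s_j, a_j))^2
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))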

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1
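
These values mirror the schedule in the Training section: OBSERVE is the number of initial time steps during which actions are chosen uniformly at random to populate the replay memory, EXPLORE is the number of frames over which ϵ is linearly annealed, and INITIAL_EPSILON and FINAL_EPSILON are the endpoints of that annealing (0.1 down to 0.0001).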

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518(7540):529–533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. (Report | YouTube result)

Disclaimer

This work is based heavily on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames