Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the Deep Q-Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a random minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                        for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)       for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
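
As a rough translation of the target computation in the inner loop, here is a minimal NumPy sketch (the function name compute_targets, the batch layout, and the GAMMA value are illustrative, not taken from this repo's code):

import numpy as np

GAMMA = 0.99  # discount factor γ (illustrative value)

def compute_targets(rewards, next_q_values, terminals):
    # rewards:       shape (B,)   -- r_j for each sampled transition
    # next_q_values: shape (B, A) -- Q(s_(j+1), a'; θ) for every action a'
    # terminals:     shape (B,)   -- 1.0 if s_(j+1) is terminal, else 0.0
    max_next_q = np.max(next_q_values, axis=1)
    # y_j = r_j for terminal transitions, r_j + γ * max_(a') Q(s_(j+1), a'; θ) otherwise
    return rewards + (1.0 - terminals) * GAMMA * max_next_q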

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a small sketch of these steps follows the list):

  1. Convert image to grayscale
  2. Resize image to 80x80
  3. Stack last 4 frames to produce an 80x80x4 input array for network
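
A minimal OpenCV/NumPy sketch of these preprocessing steps, assuming a BGR frame grabbed from the game screen (the preprocess function and prev_stack argument are illustrative names; the repo's actual preprocessing may differ slightly):

import cv2
import numpy as np

def preprocess(frame, prev_stack=None):
    # 1. Convert the game frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 2. Resize it to 80x80
    small = cv2.resize(gray, (80, 80))
    if prev_stack is None:
        # First frame: repeat it 4 times to build the initial 80x80x4 stack
        return np.stack([small] * 4, axis=2)
    # 3. Otherwise append the newest frame and drop the oldest of the last 4
    return np.append(small[:, :, np.newaxis], prev_stack[:, :, :3], axis=2)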

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.
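
A hedged TensorFlow 1.x-style sketch of this architecture (the repo itself uses an older, lower-level TensorFlow API; build_q_network is an illustrative name):

import tensorflow as tf

def build_q_network(s, num_actions):
    # s is the [batch, 80, 80, 4] stack of preprocessed frames
    h = tf.layers.conv2d(s, filters=32, kernel_size=8, strides=4,
                         padding="same", activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2, padding="same")
    h = tf.layers.conv2d(h, filters=64, kernel_size=4, strides=2,
                         padding="same", activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2, padding="same")
    h = tf.layers.conv2d(h, filters=64, kernel_size=3, strides=1,
                         padding="same", activation=tf.nn.relu)
    h = tf.layers.max_pooling2d(h, pool_size=2, strides=2, padding="same")
    h = tf.layers.dense(tf.layers.flatten(h), 256, activation=tf.nn.relu)
    return tf.layers.dense(h, num_actions)  # one Q value per valid action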

The final output layer has the same dimensionality as the number of valid actions which can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network selects an action using an ϵ-greedy policy: with probability ϵ it picks a random action, and otherwise it performs the action with the highest Q value.
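
A minimal sketch of this ϵ-greedy action selection (select_action is an illustrative helper, not the repo's exact code):

import random
import numpy as np

def select_action(q_values, epsilon, num_actions):
    # With probability ϵ, explore with a random action ...
    if random.random() < epsilon:
        return random.randrange(num_actions)
    # ... otherwise exploit by taking the action with the highest Q value
    return int(np.argmax(q_values))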

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.
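
A sketch of such a bounded replay memory using collections.deque (the exact bookkeeping in the repo may differ):

from collections import deque

REPLAY_MEMORY_SIZE = 50000  # maximum number of stored experiences

# Once the memory is full, the oldest experiences are discarded automatically.
replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

# Each experience is stored as a (s_t, a_t, r_t, s_(t+1), terminal) tuple:
# replay_memory.append((state, action, reward, next_state, terminal))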

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ = 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game; a high ϵ makes it flap too much, which keeps it at the top of the game screen and eventually makes it bump into a pipe in a clumsy way. This makes the Q function converge relatively slowly, since it only starts to explore other situations once ϵ is low. In other games, however, initializing ϵ to 1 is more reasonable.
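
A sketch of this linear annealing schedule, using the same constants listed in the FAQ below (annealed_epsilon is an illustrative helper):

INITIAL_EPSILON = 0.1    # ϵ at the start of annealing
FINAL_EPSILON = 0.0001   # ϵ after annealing finishes
EXPLORE = 3000000        # number of frames over which ϵ is annealed

def annealed_epsilon(step):
    # Decrease ϵ linearly from INITIAL_EPSILON to FINAL_EPSILON over EXPLORE frames
    if step >= EXPLORE:
        return FINAL_EPSILON
    return INITIAL_EPSILON - (INITIAL_EPSILON - FINAL_EPSILON) * step / EXPLORE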

During training, at each time step the network samples a minibatch of size 32 from the replay memory to train on, and performs a gradient step on the loss function described above using the Adam optimization algorithm with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.
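
A TensorFlow 1.x-style sketch of this training step, reusing build_q_network from the architecture sketch above (the placeholder names a and y, the one-hot action encoding, and the session.run call in the comment are illustrative assumptions):

import tensorflow as tf

num_actions = 2  # Flappy Bird: do nothing (index 0) or flap
s = tf.placeholder(tf.float32, [None, 80, 80, 4])    # stacked input frames
q = build_q_network(s, num_actions)                   # network from the sketch above

a = tf.placeholder(tf.float32, [None, num_actions])   # one-hot actions a_j taken
y = tf.placeholder(tf.float32, [None])                # targets y_j from the pseudo-code

q_of_action = tf.reduce_sum(q * a, axis=1)            # Q(s_j, a_j; θ) per sample
loss = tf.reduce_mean(tf.square(y - q_of_action))     # (y_j - Q(s_j, a_j; θ))^2
train_step = tf.train.AdamOptimizer(learning_rate=0.000001).minimize(loss)

# Each training step: sample 32 experiences from the replay memory and run
# session.run(train_step, feed_dict={s: state_batch, a: action_batch, y: target_batch})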

FAQ

Checkpoint not found

Change first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529-533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result.

Disclaimer

This work is largely based on the following repos:

  1. sourabhv/FlapPyBird (https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames