Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Using Deep Q-Network to Learn How To Play Flappy Bird

7 mins version: DQN for flappy bird

Overview

This project follows the description of the Deep Q Learning algorithm described in Playing Atari with Deep Reinforcement Learning [2] and shows that this learning algorithm can be further generalized to the notorious Flappy Bird.

Installation Dependencies:

  • Python 2.7 or 3
  • TensorFlow 0.7
  • pygame
  • OpenCV-Python

How to Run?

git clone https://github.com/yenchenlin1994/DeepLearningFlappyBird.git
cd DeepLearningFlappyBird
python deep_q_network.py

What is Deep Q-Network?

It is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

For those who are interested in deep reinforcement learning, I highly recommend reading the following post:

Demystifying Deep Reinforcement Learning

Deep Q-Network Algorithm

The pseudo-code for the Deep Q Learning algorithm, as given in [1], can be found below:

Initialize replay memory D to size N
Initialize action-value function Q with random weights θ
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select a random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ)
        Execute action a_t in the emulator and observe reward r_t and state s_(t+1)
        Store transition (s_t, a_t, r_t, s_(t+1)) in D
        Sample a minibatch of transitions (s_j, a_j, r_j, s_(j+1)) from D
        Set y_j :=
            r_j                                     for terminal s_(j+1)
            r_j + γ * max_(a') Q(s_(j+1), a'; θ)    for non-terminal s_(j+1)
        Perform a gradient step on (y_j - Q(s_j, a_j; θ))^2 with respect to θ
    end for
end for
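
As a concrete illustration of the update rule above, here is a minimal Python sketch of the target computation y_j; the q_values_for(state) helper is hypothetical and simply stands in for a forward pass of the network:

import numpy as np

GAMMA = 0.99  # discount factor (value assumed for illustration)

def td_target(reward, next_state, terminal, q_values_for):
    """Return y_j as defined in the pseudo-code above.

    q_values_for: hypothetical function mapping a state to a NumPy array of
    Q-values, one per valid action (i.e. a forward pass of the network).
    """
    if terminal:
        return reward                                              # y_j = r_j
    return reward + GAMMA * np.max(q_values_for(next_state))       # r_j + γ max_(a') Q(s_(j+1), a')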

Experiments

Environment

Since the deep Q-network is trained on the raw pixel values observed from the game screen at each time step, [3] finds that removing the background that appears in the original game makes it converge faster. This process can be visualized in the following figure:

Network Architecture

According to [1], I first preprocessed the game screens with the following steps (a rough code sketch follows the list):

  1. Convert the image to grayscale
  2. Resize the image to 80x80
  3. Stack the last 4 frames to produce an 80x80x4 input array for the network
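
A rough sketch of these steps with OpenCV and NumPy (the function names are illustrative, not taken from the repo):

import cv2
import numpy as np

def preprocess(frame):
    """Turn one raw RGB game frame into an 80x80 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)   # step 1: grayscale
    return cv2.resize(gray, (80, 80))                # step 2: resize to 80x80

def stack_frames(last_four_frames):
    """Stack the last 4 preprocessed frames into the 80x80x4 network input (step 3)."""
    return np.stack(last_four_frames, axis=-1)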

The architecture of the network is shown in the figure below. The first layer convolves the input image with an 8x8x4x32 kernel at a stride size of 4. The output is then put through a 2x2 max pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2. We then max pool again. The third layer convolves with a 3x3x64x64 kernel at a stride of 1. We then max pool one more time. The last hidden layer consists of 256 fully connected ReLU nodes.

The final output layer has the same dimensionality as the number of valid actions that can be performed in the game, where the 0th index always corresponds to doing nothing. The values at this output layer represent the Q function given the input state for each valid action. At each time step, the network performs whichever action corresponds to the highest Q value, following an ϵ-greedy policy.
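
For reference, here is a sketch of this architecture using the modern tf.keras API; the repository itself was written against TensorFlow 0.7's graph API, so this is an approximation of the network described above, not the original code. The action-selection helper at the end illustrates the ϵ-greedy policy.

import random
import numpy as np
import tensorflow as tf

NUM_ACTIONS = 2  # Flappy Bird: index 0 = do nothing, index 1 = flap

def build_q_network():
    """Convolutional Q-network matching the layer sizes described above."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(80, 80, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(NUM_ACTIONS),  # one Q-value per valid action
    ])

def select_action(model, state, epsilon):
    """ϵ-greedy policy: random action with probability ϵ, otherwise argmax Q."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    q_values = model(np.expand_dims(state, axis=0))
    return int(np.argmax(q_values[0]))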

Training

At first, I initialize all weight matrices randomly using a normal distribution with a standard deviation of 0.01, then set the replay memory to a maximum size of 50,000 experiences.

I start training by choosing actions uniformly at random for the first 10,000 time steps, without updating the network weights. This allows the system to populate the replay memory before training begins.

Note that unlike [1], which initializes ϵ to 1, I linearly anneal ϵ from 0.1 to 0.0001 over the course of the next 3,000,000 frames. The reason I set it this way is that the agent can choose an action every 0.03 s (FPS = 30) in our game, so a high ϵ makes it flap too much, which keeps it at the top of the game screen until it eventually bumps into a pipe in a clumsy way. This condition makes the Q function converge relatively slowly, since the agent only starts to explore other situations once ϵ is low. However, in other games, initializing ϵ to 1 is more reasonable.
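
Written out explicitly, the annealing schedule looks roughly like this (the constants mirror those listed in the FAQ below; the helper function itself is only illustrative):

OBSERVE = 10000          # time steps of random play before training starts
EXPLORE = 3000000        # time steps over which ϵ is annealed
INITIAL_EPSILON = 0.1
FINAL_EPSILON = 0.0001

def epsilon_at(t):
    """Linearly anneal ϵ from INITIAL_EPSILON to FINAL_EPSILON after the observation phase."""
    if t <= OBSERVE:
        return INITIAL_EPSILON
    fraction = min(1.0, (t - OBSERVE) / EXPLORE)
    return INITIAL_EPSILON - fraction * (INITIAL_EPSILON - FINAL_EPSILON)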

During training, at each time step the network samples minibatches of size 32 from the replay memory to train on, and performs a gradient step on the loss function described above using the Adam optimization algorithm with a learning rate of 0.000001. After annealing finishes, the network continues to train indefinitely, with ϵ fixed at 0.0001.
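
Putting the pieces together, a single training step could look roughly like the sketch below. It assumes the Keras network from the architecture sketch above and eager-mode TensorFlow; the discount factor is an assumed value, and the repository's actual implementation differs in detail.

import random
from collections import deque

import numpy as np
import tensorflow as tf

REPLAY_MEMORY = 50000   # maximum number of stored experiences
BATCH_SIZE = 32
GAMMA = 0.99            # discount factor (assumed)
LEARNING_RATE = 1e-6

replay_memory = deque(maxlen=REPLAY_MEMORY)   # holds (s, a, r, s', terminal) tuples
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

def train_step(model):
    """Sample a minibatch from replay memory and take one Adam gradient step."""
    batch = random.sample(replay_memory, BATCH_SIZE)
    states = np.array([b[0] for b in batch], dtype=np.float32)
    actions = np.array([b[1] for b in batch], dtype=np.int32)
    rewards = np.array([b[2] for b in batch], dtype=np.float32)
    next_states = np.array([b[3] for b in batch], dtype=np.float32)
    terminals = np.array([b[4] for b in batch], dtype=np.float32)

    # y_j = r_j for terminal transitions, r_j + γ max_(a') Q(s_(j+1), a') otherwise
    next_q = tf.reduce_max(model(next_states), axis=1)
    targets = rewards + (1.0 - terminals) * GAMMA * next_q

    with tf.GradientTape() as tape:
        q_values = model(states)
        q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, q_values.shape[-1]), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)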

FAQ

Checkpoint not found

Change the first line of saved_networks/checkpoint to

model_checkpoint_path: "saved_networks/bird-dqn-2920000"

How to reproduce?

  1. Comment out these lines

  2. Modify deep_q_network.py's parameters as follows:

OBSERVE = 10000
EXPLORE = 3000000
FINAL_EPSILON = 0.0001
INITIAL_EPSILON = 0.1

References

[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level Control through Deep Reinforcement Learning. Nature, 518:529-533, 2015.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop, 2013.

[3] Kevin Chen. Deep Reinforcement Learning for Flappy Bird. Report | YouTube result

Disclaimer

This work is based heavily on the following repos:

  1. [sourabhv/FlapPyBird](https://github.com/sourabhv/FlapPyBird)
  2. asrivat1/DeepLearningVideoGames