An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

Last update: Dec 26, 2022

Overview

AlphaZero-Gomoku

This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) from pure self-play training. Gomoku is much simpler than Go or chess, so we can focus on the AlphaZero training scheme and obtain a fairly strong AI model on a single PC in a few hours.

References:

AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
AlphaGo Zero: Mastering the game of Go without human knowledge

Update 2018.2.24: supports training with TensorFlow!

Update 2018.1.17: supports training with PyTorch!

Example Games Between Trained Models

Each move with 400 MCTS playouts:

Requirements

To play with the trained AI models, you only need:

Python >= 2.7
Numpy >= 1.11

To train the AI model from scratch, you also need one of the following:

Theano >= 0.7 and Lasagne >= 0.1
or
PyTorch >= 0.2.0
or
TensorFlow

PS: if your Theano version is newer than 0.7, please follow this issue to install Lasagne; otherwise, force pip to downgrade Theano to 0.7:

pip install --upgrade theano==0.7.0

If you would like to train the model using other DL frameworks, you only need to rewrite policy_value_net.py.
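As a rough illustration of what a replacement has to provide, the sketch below shows a framework-free stand-in that maps the legal moves of a position to (move, probability) pairs plus a value estimate. The class name, the method name, and the plain-list input are assumptions made for this example; check policy_value_net.py for the interface that the MCTS and training code actually expect before porting.

```python
# Hypothetical stand-in for a policy-value network (illustrative names only).
# It uses no DL framework: every legal move gets the same prior probability
# and the position is always evaluated as neutral.
class UniformPolicyValueNet:
    def policy_value_fn(self, legal_moves):
        """Return ([(move, prior_probability), ...], value) for one position.

        legal_moves: list of integer move indices that are still available.
        value: estimate in [-1, 1] from the current player's point of view.
        """
        if not legal_moves:
            return [], 0.0
        prob = 1.0 / len(legal_moves)
        action_probs = [(move, prob) for move in legal_moves]
        value = 0.0  # neutral evaluation: neither side is favoured
        return action_probs, value


net = UniformPolicyValueNet()
priors, value = net.policy_value_fn([0, 3, 5, 7])
print(priors, value)
```

A real network would replace the uniform priors and the constant value with the outputs of a trained model, and would also need training and save/load methods.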

Getting Started

To play with provided models, run the following script from the directory:

python human_play.py

You may modify human_play.py to try different provided models or the pure MCTS.

To train the AI model from scratch with Theano and Lasagne, run:

python train.py

With PyTorch or TensorFlow, first edit train.py: comment out the line

from policy_value_net import PolicyValueNet  # Theano and Lasagne

and uncomment one of the following lines:

# from policy_value_net_pytorch import PolicyValueNet  # Pytorch
or
# from policy_value_net_tensorflow import PolicyValueNet # Tensorflow

Then run python train.py. To use the GPU with PyTorch, set use_gpu=True. If your PyTorch version is greater than 0.5, also change the return statement of the function train_step in policy_value_net_pytorch.py to return loss.item(), entropy.item().

The models (best_policy.model and current_policy.model) are saved every few updates (every 50 by default).

Note: the 4 provided models were trained using Theano/Lasagne. To use them with PyTorch, please refer to issue 5.

Tips for training:

A good starting point is a 6 * 6 board with 4 in a row. In this case, a reasonably good model can be obtained within 500~1000 self-play games, in about 2 hours.
For an 8 * 8 board with 5 in a row, it may take 2000~3000 self-play games to get a good model, which may take about 2 days on a single PC.
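To make the two settings above concrete, here is a minimal sketch of a win check parameterised by board size and line length. The function name and the dict-based board are assumptions made for this example and are not taken from the repository's game code.

```python
def has_n_in_row(stones, width, height, n, player):
    """Return True if `player` has at least n stones in an unbroken line.

    stones: dict mapping (row, col) -> player id for every occupied cell.
    The board is `height` rows by `width` columns.
    """
    # Right, down, down-right, down-left cover all four line orientations.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for (row, col), owner in stones.items():
        if owner != player:
            continue
        for d_row, d_col in directions:
            count = 0
            r, c = row, col
            # Walk along the direction while the cells belong to `player`.
            while (0 <= r < height and 0 <= c < width
                   and stones.get((r, c)) == player):
                count += 1
                r += d_row
                c += d_col
            if count >= n:
                return True
    return False


# 6 * 6 board, 4 in a row: player 1 holds a full diagonal of four stones.
stones = {(0, 0): 1, (1, 1): 1, (2, 2): 1, (3, 3): 1, (0, 1): 2}
print(has_n_in_row(stones, 6, 6, 4, 1))
```

The same function covers the 8 * 8, 5 in a row setting by passing width=8, height=8, n=5.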

Owner

Junxiao Song
