Deeprl - Standard DQN and dueling network for simple games

Overview

DeepRL

This code implements the standard deep Q-learning and dueling network with experience replay (memory buffer) for playing simple games.

DQN algorithm implemented in this code is from the Google DeepMind's paper Playing Atari with Deep Reinforcement Learning[link].

Dueling network is from the paper Dueling Network Architectures for Deep Reinforcement Learning [link]

Requirement

DeepRL is implemented with Torch and the packages of its ecosystem. This code is well worked on my Mac Pro with CPU (I haven't tested it on Linux and GPU). Install Torch7 firstly, then you should install the following packages by luarocks

luarocks install nn
luarocks install image
luarocks install qt
luarocks install optim

Running

You can run this code by tapping the command in the project dir.

qlua main.lua

The result looks like

DQN: I got the accuracy of 93.2% (932 success of 1000 epochs).

Dueling: I got the accuracy of 99.2% (992 success of 1000 epochs).

Code

The envir.lua indicates the environment in reinforcement learning stage, which receives the action and produces the states and a reward for agent.

The agent.lua is the implementation of agent which receives the states and reward to produce the action directed by the policy network.

The learner.lua is the learning algorithm of DQN with experience replay as the following.

MISC

I completed this code when I was an intern at Horizon Robotics. I will greatly thank the article of Andrej Karpathy and other implementations:SeanNaren's code and EderSantana's gist.

LICENSE

MIT

Owner
Yao Zhou
Computer Vision Researcher
Yao Zhou
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information This repository contains code, model, dataset for ChineseBERT at ACL2021. Ch

413 Dec 01, 2022
Yolact-keras实例分割模型在keras当中的实现

Yolact-keras实例分割模型在keras当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料 Reference 性能情况 训练数

Bubbliiiing 11 Dec 26, 2022
Export CenterPoint PonintPillars ONNX Model For TensorRT

CenterPoint-PonintPillars Pytroch model convert to ONNX and TensorRT Welcome to CenterPoint! This project is fork from tianweiy/CenterPoint. I impleme

CarkusL 149 Dec 13, 2022
Automatically replace ONNX's RandomNormal node with Constant node.

onnx-remove-random-normal This is a script to replace RandomNormal node with Constant node. Example Imagine that we have something ONNX model like the

Masashi Shibata 1 Dec 11, 2021
Navigating StyleGAN2 w latent space using CLIP

Navigating StyleGAN2 w latent space using CLIP an attempt to build sth with the official SG2-ADA Pytorch impl kinda inspired by Generating Images from

Mike K. 55 Dec 06, 2022
Pytorch implementation of Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization https://arxiv.org/abs/2008.11646

[TCSVT] Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization LPN [Paper] NEWs Prerequisites Python 3.6 GPU Memory = 8G Numpy 1.

46 Dec 14, 2022
Benchmarks for Model-Based Optimization

Design-Bench Design-Bench is a benchmarking framework for solving automatic design problems that involve choosing an input that maximizes a black-box

Brandon Trabucco 43 Dec 20, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

Christopher Ley 1 Jan 06, 2022
This repository includes code of my study about Asynchronous in Frequency domain of GAN images.

Exploring the Asynchronous of the Frequency Spectra of GAN-generated Facial Images Binh M. Le & Simon S. Woo, "Exploring the Asynchronous of the Frequ

4 Aug 06, 2022
NeurIPS 2021 paper 'Representation Learning on Spatial Networks' code

Representation Learning on Spatial Networks This repository is the official implementation of Representation Learning on Spatial Networks. Training Ex

13 Dec 29, 2022
Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021.

Code of the paper "Deep Human Dynamics Prior" in ACM MM 2021. Figure 1: In the process of motion capture (mocap), some joints or even the whole human

Shinny cui 3 Oct 31, 2022
A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196

img_sussifier A python script to convert images to animated sus among us crewmate twerk jifs as seen on r/196 Examples How to use install python pip i

41 Sep 30, 2022
Soft actor-critic is a deep reinforcement learning framework for training maximum entropy policies in continuous domains.

This repository is no longer maintained. Please use our new Softlearning package instead. Soft Actor-Critic Soft actor-critic is a deep reinforcement

Tuomas Haarnoja 752 Jan 07, 2023
Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order of magnitude using coresets and data selection.

COResets and Data Subset selection Reduce end to end training time from days to hours (or hours to minutes), and energy requirements/costs by an order

decile-team 244 Jan 09, 2023
基于Pytorch实现优秀的自然图像分割框架!(包括FCN、U-Net和Deeplab)

语义分割学习实验-基于VOC数据集 usage: 下载VOC数据集,将JPEGImages SegmentationClass两个文件夹放入到data文件夹下。 终端切换到目标目录,运行python train.py -h查看训练 (torch) Li Xiang 28 Dec 21, 2022

Highway networks implemented in PyTorch.

PyTorch Highway Networks Highway networks implemented in PyTorch. Just the MNIST example from PyTorch hacked to work with Highway layers. Todo Make th

Conner Vercellino 56 Dec 14, 2022
Code for the RA-L (ICRA) 2021 paper "SeqNet: Learning Descriptors for Sequence-Based Hierarchical Place Recognition"

SeqNet: Learning Descriptors for Sequence-Based Hierarchical Place Recognition [ArXiv+Supplementary] [IEEE Xplore RA-L 2021] [ICRA 2021 YouTube Video]

Sourav Garg 63 Dec 12, 2022
PySOT - SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask.

PySOT is a software system designed by SenseTime Video Intelligence Research team. It implements state-of-the-art single object tracking algorit

STVIR 4.1k Dec 29, 2022
Corruption Invariant Learning for Re-identification

Corruption Invariant Learning for Re-identification The official repository for Benchmarks for Corruption Invariant Person Re-identification (NeurIPS

Minghui Chen 73 Dec 08, 2022