Adaptive Attention Span for Reinforcement Learning

Overview

Adaptive Transformers in RL

Official implementation of Adaptive Transformers in RL

In this work we replicate several results from Stabilizing Transformers for RL on both Pong and rooms_select_nonmatching_object from DMLab30.

We also extend the Stable Transformer architecture with Adaptive Attention Span on a partially observable (POMDP) setting of Reinforcement Learning. To our knowledge this is one of the first attempts to stabilize and explore Adaptive Attention Span in an RL domain.

Steps to replicate what we did on your own machine

  1. Downloading DMLab:

  2. Downloading Atari: Getting Started with Gym– http://gym.openai.com/docs/#getting-started-with-gym

  3. Execution notes:

  • The experiments take around 4 hours on 32vCPUs and 2 P100 GPUs for 6 million environment interactions. To run without a GPU, use the flag “--disable_cuda”.
  • For more details on other flags, see the top of train.py (include a link to this file) which has descriptions for each.
  • All experiments use a slightly revised version of IMPALA from torchbeast

Snippets

Best performing adaptive attention span model on “rooms_select_nonmatching_object”:

python train.py --total_steps 20000000 \
--learning_rate 0.0001 --unroll_length 299 --num_buffers 40 --n_layer 3 \
--d_inner 1024 --xpid row85 --chunk_size 100 --action_repeat 1 \
--num_actors 32 --num_learner_threads 1 --sleep_length 20 \
--level_name rooms_select_nonmatching_object --use_adaptive \
--attn_span 400 --adapt_span_loss 0.025 --adapt_span_cache

Best performing Stable Transformer on Pong:

python train.py --total_steps 10000000 \
--learning_rate 0.0004 --unroll_length 239 --num_buffers 40 \
--n_layer 3 --d_inner 1024 --xpid row82 --chunk_size 80 \
--action_repeat 1 --num_actors 32 --num_learner_threads 1 \
--sleep_length 5 --atari True

Best performing Stable Transformer on “rooms_select_nonmatching_object”:

python train.py --total_steps 20000000 \
--learning_rate 0.0001 --unroll_length 299 \
--num_buffers 40 --n_layer 3 --d_inner 1024 \
--xpid row79 --chunk_size 100 --action_repeat 1 \
--num_actors 32 --num_learner_threads 1 --sleep_length 20 \
--level_name rooms_select_nonmatching_object  --mem_len 200

Reference

If you find this repository useful, do cite it with,

@article{kumar2020adaptive,
    title={Adaptive Transformers in RL},
    author={Shakti Kumar and Jerrod Parker and Panteha Naderian},
    year={2020},
    eprint={2004.03761},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation

DynaBOA Code repositoty for the paper: Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation Shanyan Guan, Jingwei Xu, Michell

197 Jan 07, 2023
CMSC320 - Introduction to Data Science - Fall 2021

CMSC320 - Introduction to Data Science - Fall 2021 Instructors: Elias Jonatan Gonzalez and José Manuel Calderón Trilla Lectures: MW 3:30-4:45 & 5:00-6

Introduction to Data Science 6 Sep 12, 2022
Efficiently computes derivatives of numpy code.

Note: Autograd is still being maintained but is no longer actively developed. The main developers (Dougal Maclaurin, David Duvenaud, Matt Johnson, and

Formerly: Harvard Intelligent Probabilistic Systems Group -- Now at Princeton 6.1k Jan 08, 2023
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with pretrained models.

This repository is a toolkit to do machine learning for programming languages. It implements tokenization, dataset preprocessing, model training and m

Facebook Research 408 Jan 01, 2023
Fully Connected DenseNet for Image Segmentation

Fully Connected DenseNets for Semantic Segmentation Fully Connected DenseNet for Image Segmentation implementation of the paper The One Hundred Layers

Somshubra Majumdar 84 Oct 31, 2022
Analysis of Smiles through reservoir sampling & RDkit

Analysis of Smiles through reservoir sampling and machine learning (under development). This is a simple project that includes two Jupyter files for t

Aurimas A. Nausėdas 6 Aug 30, 2022
Pytorch implementation of the DeepDream computer vision algorithm

deep-dream-in-pytorch Pytorch (https://github.com/pytorch/pytorch) implementation of the deep dream (https://en.wikipedia.org/wiki/DeepDream) computer

102 Dec 05, 2022
Code for "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds", CVPR 2021

PV-RAFT This repository contains the PyTorch implementation for paper "PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clou

Yi Wei 43 Dec 05, 2022
ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

(Comet-) ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs Paper Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sa

AI2 152 Dec 27, 2022
A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items

A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items This repository co

Taimur Hassan 3 Mar 16, 2022
Repository for the COLING 2020 paper "Explainable Automated Fact-Checking: A Survey."

Explainable Fact Checking: A Survey This repository and the accompanying webpage contain resources for the paper "Explainable Fact Checking: A Survey"

Neema Kotonya 42 Nov 17, 2022
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit).

Salesforce 334 Jan 06, 2023
This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems.

This repository contain code on Novelty-Driven Binary Particle Swarm Optimisation for Truss Optimisation Problems. The main directory include the code

0 Dec 23, 2021
The official implementation of VAENAR-TTS, a VAE based non-autoregressive TTS model.

VAENAR-TTS This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis". Sa

THUHCSI 138 Oct 28, 2022
render sprites into your desktop environment as shaped windows using GTK

spritegtk render static or animated sprites into your desktop environment as dynamic shaped windows using GTK requires pycairo and PYGobject: pip inst

hermit 20 Oct 27, 2022
This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.

AlphaRotate: A Rotation Detection Benchmark using TensorFlow Abstract AlphaRotate is maintained by Xue Yang with Shanghai Jiao Tong University supervi

yangxue 972 Jan 05, 2023
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer

CSAW-M This repository contains code for CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer. Source code for tr

Yue Liu 7 Oct 11, 2022
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

2D-TAN (Optimized) Introduction This is an optimized re-implementation repository for AAAI'2020 paper: Learning 2D Temporal Localization Networks for

Joya Chen 112 Dec 31, 2022
JAX-based neural network library

Haiku: Sonnet for JAX Overview | Why Haiku? | Quickstart | Installation | Examples | User manual | Documentation | Citing Haiku What is Haiku? Haiku i

DeepMind 2.3k Jan 04, 2023
RaceBERT -- A transformer based model to predict race and ethnicty from names

RaceBERT -- A transformer based model to predict race and ethnicty from names Installation pip install racebert Using a virtual environment is highly

Prasanna Parasurama 3 Nov 02, 2022