The Unsupervised Reinforcement Learning Benchmark (URLB)

Last update: Dec 26, 2022


URLB provides a set of leading algorithms for unsupervised reinforcement learning, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, activate the environment with:

conda activate urlb

Implemented Agents

Agent	Command	Implementation Author(s)	Paper
ICM	`agent=icm`	Denis	paper
ProtoRL	`agent=proto`	Denis	paper
DIAYN	`agent=diayn`	Misha	paper
APT(ICM)	`agent=icm_apt`	Hao, Kimin	paper
APT(Ind)	`agent=ind_apt`	Hao, Kimin	paper
APS	`agent=aps`	Hao, Kimin	paper
SMM	`agent=smm`	Albert	paper
RND	`agent=rnd`	Kevin	paper
Disagreement	`agent=disagreement`	Catherine	paper

Available Domains

We support the following domains.

Domain	Tasks
`walker`	`stand`, `walk`, `run`, `flip`
`quadruped`	`walk`, `run`, `stand`, `jump`
`jaco`	`reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right`
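
The fine-tuning example in this README passes `task=walker_run`, which suggests that a full task name joins the domain and the task with an underscore. The sketch below encodes the table above under that assumption; `DOMAIN_TASKS`, `task_names`, and `split_task` are hypothetical helpers for illustration and are not part of URLB.

```python
# Domains and tasks copied from the table above.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": [
        "reach_top_left",
        "reach_top_right",
        "reach_bottom_left",
        "reach_bottom_right",
    ],
}


def task_names(domain):
    """Return full task names, ASSUMING the `<domain>_<task>` convention."""
    return [f"{domain}_{task}" for task in DOMAIN_TASKS[domain]]


def split_task(name):
    """Split a full task name into (domain, task); raise on unknown names."""
    # Split at the first underscore only, so jaco task names stay intact.
    domain, _, task = name.partition("_")
    if task not in DOMAIN_TASKS.get(domain, []):
        raise ValueError(f"unknown task: {name}")
    return domain, task
```

Checking names this way before launching a long run can catch a mistyped `task=` value early.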

Domain observation mode

Each domain supports two observation modes: states and pixels.

Mode	Command
states	`obs_type=states`
pixels	`obs_type=pixels`

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

or, if you want to train a skill-based agent, like DIAYN, run:

python pretrain.py agent=diayn domain=walker

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/
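
To check which snapshot files a finished pre-training run should have left behind, the directory layout and the snapshot schedule above can be combined. This is a minimal sketch, assuming the file name pattern `snapshot_<frames>.pt` that appears in the fine-tuning example in this README; `SNAPSHOT_FRAMES` and `snapshot_paths` are hypothetical names, not part of URLB.

```python
from pathlib import Path

# Snapshot schedule stated above: 100k, 500k, 1M, and 2M frames.
SNAPSHOT_FRAMES = (100_000, 500_000, 1_000_000, 2_000_000)


def snapshot_paths(obs_type, domain, agent, root="./pretrained_models"):
    """Return the expected snapshot paths for one pre-training run.

    ASSUMPTION: files are named `snapshot_<frames>.pt`, as in the
    fine-tuning example; this is not read from the URLB code itself.
    """
    directory = Path(root) / obs_type / domain / agent
    return [directory / f"snapshot_{frames}.pt" for frames in SNAPSHOT_FRAMES]


if __name__ == "__main__":
    for path in snapshot_paths("states", "walker", "icm"):
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```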

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
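
When launching many fine-tuning runs, it can help to build the command line programmatically. The sketch below reproduces the two commands shown above; `finetune_command` and its `skill_based` parameter are hypothetical, and the caller decides whether an agent counts as skill-based, since this README does not list which agents do.

```python
def finetune_command(pretrained_agent, task, snapshot_ts,
                     obs_type="states", skill_based=False):
    """Build the argument list for a finetune.py run.

    Only the overrides shown in this README are emitted; `skill_based`
    is a hypothetical switch supplied by the caller.
    """
    args = [
        "python",
        "finetune.py",
        f"pretrained_agent={pretrained_agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if skill_based:
        # Skill-based methods also pass the agent and disable reward_free.
        args += [f"agent={pretrained_agent}", "reward_free=false"]
    return args


if __name__ == "__main__":
    print(" ".join(finetune_command("icm", "walker_run", 1000000)))
    print(" ".join(finetune_command("smm", "walker_run", 1000000,
                                    skill_based=True)))
```

The returned list can be passed directly to `subprocess.run` from the repository root.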

Monitoring

Logs are stored in the exp_local folder. To launch tensorboard run:

tensorboard --logdir exp_local

The console output is also available in the form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
