Decision Transformer: A brand new Offline RL Pattern

Overview

DecisionTransformer_StepbyStep

Intro

Decision Transformer: A brand new Offline RL Pattern.

这是关于NeurIPS 2021 热门论文Decision Transformer的复现。

👍 原文地址: Decision Transformer: Reinforcement Learning via Sequence Modeling

👍 官方的Git仓库: decision-transformer(official)

Decision Transformer

Decision Transformer属于Offline RL,所谓Offline RL,即从次优数据中学习策略来分配Agent,即从固定、有限的经验中产生最大有效的行为。

👀️ Motivation

DT将RL看成一个序列建模问题(Sequence Modeling Problem ),不用传统RL方法,而使用网络直接输出动作进行决策。传统RL方法存在一些问题,比如估计未来Return过程中Bootstrapping过程会导致Overestimate; 马尔可夫假设;

DT借助了Transformer的强大表征能力和时序建模能力。

  • Decision Transformer的表现达到甚至超过了目前最好的基于dynamic programming的主流方法;
  • 在一些需要long-term credit assignment的task【例如sparse reward或者delayed reward等】,Decision Transformer的表现远超过了最好的主流方法.

🚀️ DT的核心思想

image.png

Decision Transformer的核心思想; States、Actions、Returns被Fed into Modality-Specific的线性Embedding;并添加了带有时间步信息的positional episodic timestep; 这些Tokens被输入一个GPT架构,使用a causal self-attention mask来预测actions。

🎉️ DT的优势

  1. 无需Markov假设;
  2. 没有使用一个可学习的Value Function作为Training Target;
  3. 利用Transformer的特性,绕过长期信用分配进行“自举bootstrapping”的需要,避免了时序差分学习的“短视”行为;
  4. 可以通过self-attention直接执行信度分配。这与缓慢传播奖励并容易产生干扰信号的 Bellman Backup 相反,可以使 Transformer 在奖励稀少或分散注意力的情况下仍然有效地工作.

Dependencies

1. D4RL ( Dataset for Deep Data-Driven Reinforcement Learning )

2. MUJOCO 210

# 安装之前先安装absl-py和matplotlib 
pip install absl-py 
pip install matplotlib 

"""
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
pip install -e . # 这种方法不好使 !! 
"""

#首先在https://github.com/deepmind/dm_control这个库git clone
# cd
pip install -r requirement.txt 
# 然后 
pip install matplotlib 
# 然后 https://github.com/takuseno/d3rlpy 
pip install d3rlpy 
# 然后安装mujoco 210  
# 直接安装,然后添加环境变量 
# 装完之后进d4rl文件夹下
python setup.py install 
# 成功安装 d4rl 1.1 

3. GPT-2


pip install transformers

Experiments

Group1: Decision Transformer — Hopper-v3-Medium-Dataset

参数Config

class Config:
    env = "hopper"
    dataset = "medium"
    mode = "normal" # "delayed" : all rewards moved to end of trajectory
    device = 'cuda'
    log_dir = 'TB_log/'
    record_algo = 'DT_Hopper_v1'
    test_cycles = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')

    # 模型
    model_type = "DT"
    activation_function = 'relu'

    # Scalar
    max_length = 20 # max_len # K
    pct_traj = 1.
    batch_size = 64
    embed_dim = 128
    n_layer = 3
    n_head = 1
    dropout = 0.1
    lr = 1e-4
    wd = 1e-4
    warmup_steps = 1000
    num_eval_episodes = 100
    max_iters = 50
    num_steps_per_iter = 1000

    # Bool
    log_to_tb = True

效果

image.png

Owner
Irving
Irving
Doods2 - API for detecting objects in images and video streams using Tensorflow

DOODS2 - Return of DOODS Dedicated Open Object Detection Service - Yes, it's a b

Zach 101 Jan 04, 2023
Pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

About this repository This repo contains an Pytorch implementation for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Netwo

wxDai 7 Oct 14, 2022
Learning to Reconstruct 3D Manhattan Wireframes from a Single Image

Learning to Reconstruct 3D Manhattan Wireframes From a Single Image This repository contains the PyTorch implementation of the paper: Yichao Zhou, Hao

Yichao Zhou 50 Dec 27, 2022
Cascaded Pyramid Network (CPN) based on Keras (Tensorflow backend)

ML2 Takehome Project Reimplementing the paper: Cascaded Pyramid Network for Multi-Person Pose Estimation Dataset The model uses the COCO dataset which

Vo Van Tu 1 Nov 22, 2021
Source code for ZePHyR: Zero-shot Pose Hypothesis Rating @ ICRA 2021

ZePHyR: Zero-shot Pose Hypothesis Rating ZePHyR is a zero-shot 6D object pose estimation pipeline. The core is a learned scoring function that compare

R-Pad - Robots Perceiving and Doing 18 Aug 22, 2022
Code release for Universal Domain Adaptation(CVPR 2019)

Universal Domain Adaptation Code release for Universal Domain Adaptation(CVPR 2019) Requirements python 3.6+ PyTorch 1.0 pip install -r requirements.t

THUML @ Tsinghua University 229 Dec 23, 2022
Implements pytorch code for the Accelerated SGD algorithm.

AccSGD This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic O

205 Jan 02, 2023
Pytorch Implementation of Adversarial Deep Network Embedding for Cross-Network Node Classification

Pytorch Implementation of Adversarial Deep Network Embedding for Cross-Network Node Classification (ACDNE) This is a pytorch implementation of the Adv

陈志豪 8 Oct 13, 2022
a practicable framework used in Deep Learning. So far UDL only provide DCFNet implementation for the ICCV paper (Dynamic Cross Feature Fusion for Remote Sensing Pansharpening)

UDL UDL is a practicable framework used in Deep Learning (computer vision). Benchmark codes, results and models are available in UDL, please contact @

Xiao Wu 11 Sep 30, 2022
ML for NLP and Computer Vision.

Sparrow is our open-source ML product. It runs on Skipper MLOps infrastructure.

Katana ML 2 Nov 28, 2021
Multiple Object Extraction from Aerial Imagery with Convolutional Neural Networks

This is an implementation of Volodymyr Mnih's dissertation methods on his Massachusetts road & building dataset and my original methods that are publi

Shunta Saito 255 Sep 07, 2022
Image Restoration Using Swin Transformer for VapourSynth

SwinIR SwinIR function for VapourSynth, based on https://github.com/JingyunLiang/SwinIR. Dependencies NumPy PyTorch, preferably with CUDA. Note that t

Holy Wu 11 Jun 19, 2022
This repository contains the map content ontology used in narrative cartography

Narrative-cartography-ontology This repository contains the map content ontology used in narrative cartography, which is associated with a submission

Weiming Huang 0 Oct 31, 2021
Fully Convolutional DenseNets for semantic segmentation.

Introduction This repo contains the code to train and evaluate FC-DenseNets as described in The One Hundred Layers Tiramisu: Fully Convolutional Dense

485 Nov 26, 2022
Turning pixels into virtual points for multimodal 3D object detection.

Multimodal Virtual Point 3D Detection Turning pixels into virtual points for multimodal 3D object detection. Multimodal Virtual Point 3D Detection, Ti

Tianwei Yin 204 Jan 08, 2023
Python-experiments - A Repository which contains python scripts to automate things and make your life easier with python

Python Experiments A Repository which contains python scripts to automate things

Vivek Kumar Singh 11 Sep 25, 2022
Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021)

Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021) In this repository we provide PyTorch implementations for GeMCL; a

4 Apr 15, 2022
DeepStochlog Package For Python

DeepStochLog Installation Installing SWI Prolog DeepStochLog requires SWI Prolog to run. Run the following commands to install: sudo apt-add-repositor

KU Leuven Machine Learning Research Group 17 Dec 23, 2022
Multiwavelets-based operator model

Multiwavelet model for Operator maps Gaurav Gupta, Xiongye Xiao, and Paul Bogdan Multiwavelet-based Operator Learning for Differential Equations In Ne

Gaurav 33 Dec 04, 2022
Flybirds - BDD-driven natural language automated testing framework, present by Trip Flight

Flybird | English Version 行为驱动开发(Behavior-driven development,缩写BDD),是一种软件过程的思想或者

Ctrip, Inc. 706 Dec 30, 2022