Multi-objective gym environments for reinforcement learning.

Overview

tests Project Status: Active – The project has reached a stable, usable state and is being actively developed. License

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard gym's API, but return vectorized rewards as numpy arrays.

For details on multi-objective MPDS (MOMDP's) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

Environments

Env Obs/Action spaces Objectives Description
deep-sea-treasure-v0
Discrete / Discrete [treasure, time_penalty] Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from Yang et al. 2019.
resource-gathering-v0
Discrete / Discrete [enemy, gold, gem] Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barret & Narayanan 2008.
four-room-v0
Discrete / Discrete [item1, item2, item3] Agent must collect three different types of items in the map and reach the goal.
mo-mountaincar-v0
Continuous / Discrete [time_penalty, reverse_penalty, forward_penalty] Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
mo-reacher-v0
Continuous / Discrete [target_1, target_2, target_3, target_4] Reacher robot from PyBullet, but there are 4 different target positions.
minecart-v0
Continuous or Image / Discrete [ore1, ore2, fuel] Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
mo-supermario-v0
Image / Discrete [x_pos, time, death, coin, enemy] Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly as in Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

Comments
  • Adds the breakable bottles environment

    Adds the breakable bottles environment

    Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

    I wasn't really planning for creating a pull request, so the commit history is a bit messy...

    opened by rk1a 4
  • A few bug fixes

    A few bug fixes

    DST:

    • The bounds of the rewards were hardcoded for the convex map.
    • The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

    setup.py:

    • Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
    opened by ffelten 2
  • Consider using info field for reward vector

    Consider using info field for reward vector

    Hello,

    Thanks for this repository, it will be very useful to the MORL community :-).

    I was just wondering if you think it would be a good idea to enforce gym compatibility by specifying rewards as scalar and giving the vectorial rewards elsewhere. The idea would be to use a field in the info dictionary as they do in PGMORL. This would allow to use existing RL algorithms and logging libraries out of box (e.g. stable-baselines, tensorboard logs, ...).

    For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

    With kind regards,

    Florian

    opened by ffelten 2
  • Add MO reward wrappers

    Add MO reward wrappers

    I added two wrappers commonly used: normalize and clip.

    The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

    opened by ffelten 1
  • Fix notebook

    Fix notebook

    There are still issues with the video recorder :(

    /usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
      logger.warn(
    
    opened by ffelten 0
  • Add fishwood env

    Add fishwood env

    Code was provided by Denis Steckelmacher, I did a bit of refactoring and migrated it to 0.26.

    I didn't bother making the render with the images, but I did upload them in case somebody gets motivated, the env is super simple.

    opened by ffelten 0
  • Add wrapper to help logging episode returns

    Add wrapper to help logging episode returns

    The implementation is mostly a copy paste of the original gym. I had to copy paste instead of override and call to super because the way the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the return will always be a vector of zeros (because resetted)

    opened by ffelten 0
Releases(0.2.1)
Owner
Lucas Alegre
PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.
Lucas Alegre
基于PaddleOCR搭建的OCR server... 离线部署用

开头说明 DangoOCR 是基于大家的 CPU处理器 来运行的,CPU处理器 的好坏会直接影响其速度, 但不会影响识别的精度 ,目前此版本识别速度可能在 0.5-3秒之间,具体取决于大家机器的配置,可以的话尽量不要在运行时开其他太多东西。需要配合团子翻译器 Ver3.6 及其以上的版本才可以使用!

胖次团子 131 Dec 25, 2022
Visualizer for neural network, deep learning, and machine learning models

Netron is a viewer for neural network, deep learning and machine learning models. Netron supports ONNX (.onnx, .pb, .pbtxt), Keras (.h5, .keras), Tens

Lutz Roeder 21k Jan 06, 2023
Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

LiDAR-MOS: Moving Object Segmentation in 3D LiDAR Data This repo contains the code for our paper: Moving Object Segmentation in 3D LiDAR Data: A Learn

Photogrammetry & Robotics Bonn 394 Dec 29, 2022
A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks. The purpose of this project is to promote the

Zhanghan Ke 255 Dec 11, 2022
Implementation of Change-Based Exploration Transfer (C-BET)

Implementation of Change-Based Exploration Transfer (C-BET), as presented in Interesting Object, Curious Agent: Learning Task-Agnostic Exploration.

Simone Parisi 29 Dec 04, 2022
Image segmentation with private İstanbul Dataset

Image Segmentation This repo was created for academic research and test result. Repo will update after academic article online. This repo contains wei

İrem KÖMÜRCÜ 9 Dec 11, 2022
NeurIPS-2021: Neural Auto-Curricula in Two-Player Zero-Sum Games.

NAC Official PyTorch implementation of NAC from the paper: Neural Auto-Curricula in Two-Player Zero-Sum Games. We release code for: Gradient based ora

Xidong Feng 19 Nov 11, 2022
[PyTorch] Official implementation of CVPR2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency". https://arxiv.org/abs/2103.05465

PointDSC repository PyTorch implementation of PointDSC for CVPR'2021 paper "PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency",

153 Dec 14, 2022
Implemenets the Contourlet-CNN as described in C-CNN: Contourlet Convolutional Neural Networks, using PyTorch

C-CNN: Contourlet Convolutional Neural Networks This repo implemenets the Contourlet-CNN as described in C-CNN: Contourlet Convolutional Neural Networ

Goh Kun Shun (KHUN) 10 Nov 03, 2022
Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021)

HAIS Hierarchical Aggregation for 3D Instance Segmentation (ICCV 2021) by Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang*. (*) Corresp

Hust Visual Learning Team 145 Jan 05, 2023
Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow.

PAWS-TF 🐾 Implementation of Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (PAWS)

Sayak Paul 43 Jan 08, 2023
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
Neural Architecture Search Powered by Swarm Intelligence 🐜

Neural Architecture Search Powered by Swarm Intelligence 🐜 DeepSwarm DeepSwarm is an open-source library which uses Ant Colony Optimization to tackle

288 Oct 28, 2022
Learning from Synthetic Humans, CVPR 2017

Learning from Synthetic Humans (SURREAL) Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev and Cordelia Schmid,

Gul Varol 538 Dec 18, 2022
codes for Self-paced Deep Regression Forests with Consideration on Ranking Fairness

Self-paced Deep Regression Forests with Consideration on Ranking Fairness This is official codes for paper Self-paced Deep Regression Forests with Con

Learning in Vision 4 Sep 11, 2022
MLPs for Vision and Langauge Modeling (Coming Soon)

MLP Architectures for Vision-and-Language Modeling: An Empirical Study MLP Architectures for Vision-and-Language Modeling: An Empirical Study (Code wi

Yixin Nie 27 May 09, 2022
Motion planning algorithms commonly used on autonomous vehicles. (path planning + path tracking)

Overview This repository implemented some common motion planners used on autonomous vehicles, including Hybrid A* Planner Frenet Optimal Trajectory Hi

Huiming Zhou 1k Jan 09, 2023
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be

196 Jan 08, 2023
Text to Image Generation with Semantic-Spatial Aware GAN

text2image This repository includes the implementation for Text to Image Generation with Semantic-Spatial Aware GAN This repo is not completely. Netwo

CVDDL 124 Dec 30, 2022
Activating More Pixels in Image Super-Resolution Transformer

HAT [Paper Link] Activating More Pixels in Image Super-Resolution Transformer Xiangyu Chen, Xintao Wang, Jiantao Zhou and Chao Dong BibTeX @article{ch

XyChen 270 Dec 27, 2022