Multi-objective gym environments for reinforcement learning.

Last update: Jan 03, 2023

Overview

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard Gym API, but return vectorized rewards as NumPy arrays.

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see "A practical guide to multi-objective reinforcement learning and planning".

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import numpy as np
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
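
Linear scalarization amounts to a dot product between the weight vector and the reward vector. The following is a minimal NumPy sketch of that computation, independent of MO-Gym; the function name `scalarize` and the example reward values are hypothetical, and this is not the wrapper's actual source code.

```python
import numpy as np


def scalarize(vector_reward, weight):
    """Collapse a reward vector into a scalar via a weighted sum."""
    vector_reward = np.asarray(vector_reward, dtype=float)
    weight = np.asarray(weight, dtype=float)
    if vector_reward.shape != weight.shape:
        raise ValueError("weight must have one entry per objective")
    return float(np.dot(weight, vector_reward))


# Hypothetical reward for a 3-objective env such as minecart-v0: [ore1, ore2, fuel]
example_reward = np.array([1.0, 0.0, -0.5])
scalar = scalarize(example_reward, weight=[0.8, 0.2, 0.2])
print(scalar)
```

The weight vector needs one entry per objective, which is why the minecart example above passes three weights.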

Environments

| Env | Obs/Action spaces | Objectives | Description |
|---|---|---|---|
| `deep-sea-treasure-v0` | Discrete / Discrete | `[treasure, time_penalty]` | Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values are taken from Yang et al. 2019. |
| `resource-gathering-v0` | Discrete / Discrete | `[enemy, gold, gem]` | Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008. |
| `four-room-v0` | Discrete / Discrete | `[item1, item2, item3]` | Agent must collect three different types of items in the map and reach the goal. |
| `mo-mountaincar-v0` | Continuous / Discrete | `[time_penalty, reverse_penalty, forward_penalty]` | Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011. |
| `mo-reacher-v0` | Continuous / Discrete | `[target_1, target_2, target_3, target_4]` | Reacher robot from PyBullet, but with 4 different target positions. |
| `minecart-v0` | Continuous or Image / Discrete | `[ore1, ore2, fuel]` | Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019. |
| `mo-supermario-v0` | Image / Discrete | `[x_pos, time, death, coin, enemy]` | Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly to Yang et al. 2019. |

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

The minecart-v0 env is a refactor of https://github.com/axelabels/DynMORL.
The deep-sea-treasure-v0 and mo-supermario-v0 are based on https://github.com/RunzheYang/MORL.
The four-room-v0 is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.

Comments

Adds the breakable bottles environment

Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

I wasn't really planning to create a pull request, so the commit history is a bit messy...

opened by rk1a 4

A few bug fixes

DST:

The bounds of the rewards were hardcoded for the convex map.

The old way of setting the seed is deprecated. From what I saw in the official gym envs, the seed is now set through the reset method (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198).

setup.py:

Gym 0.25.0 introduces breaking changes, so I pinned the version to 0.24.1.
opened by ffelten 2
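
To illustrate the seeding change mentioned above, here is a minimal sketch of seeding through `reset`. `ToySeededEnv` is a hypothetical stand-in that does not depend on gym, and it returns a bare observation for brevity; in real gym environments the usual pattern is to forward the seed with `super().reset(seed=seed)`.

```python
import numpy as np


class ToySeededEnv:
    """Hypothetical stand-in for an environment; not part of gym or MO-Gym."""

    def __init__(self):
        self.np_random = np.random.default_rng()
        self.position = 0

    def reset(self, seed=None):
        # Re-create the generator only when a seed is given, so that
        # later resets without a seed continue the same random stream.
        if seed is not None:
            self.np_random = np.random.default_rng(seed)
        # Sample a start position in the range [0, 10).
        self.position = int(self.np_random.integers(0, 10))
        return self.position


env_a, env_b = ToySeededEnv(), ToySeededEnv()
# Two environments reset with the same seed start in the same state.
same_start = env_a.reset(seed=42) == env_b.reset(seed=42)
```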
Consider using info field for reward vector

Hello,

Thanks for this repository, it will be very useful to the MORL community :-).

I was just wondering whether you think it would be a good idea to enforce gym compatibility by returning a scalar reward and providing the vector reward elsewhere. The idea would be to use a field in the info dictionary, as is done in PGMORL. This would allow existing RL algorithms and logging libraries to be used out of the box (e.g. stable-baselines, tensorboard logs, ...).

For example: in a DST env, if you return only the treasure reward in the reward field, you can use the DQN implementation from baselines and get insight into the average reward, as well as the episode length, in the tensorboard logs. Of course, you can extract the full vector reward from the info dictionary in order to learn with MORL :-).

With kind regards,

Florian

opened by ffelten 2
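
The suggestion above could be sketched roughly as follows. Both classes are hypothetical and do not depend on gym; the info key name `vector_reward` and the `reward_index` parameter are assumptions made for illustration, not an existing MO-Gym or PGMORL convention.

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical 2-objective environment: [treasure, time_penalty]."""

    def step(self, action):
        obs = 0
        vector_reward = np.array([5.0, -1.0])
        done = True
        info = {}
        return obs, vector_reward, done, info


class ScalarRewardWithInfo:
    """Hypothetical wrapper: scalar reward out, full vector kept in info."""

    def __init__(self, env, reward_index=0, info_key="vector_reward"):
        self.env = env
        self.reward_index = reward_index
        self.info_key = info_key

    def step(self, action):
        obs, vector_reward, done, info = self.env.step(action)
        info = dict(info)  # avoid mutating the wrapped env's dict
        info[self.info_key] = vector_reward
        return obs, float(vector_reward[self.reward_index]), done, info


env = ScalarRewardWithInfo(ToyVectorEnv(), reward_index=0)
obs, reward, done, info = env.step(action=0)
```

A single-objective algorithm would see only `reward`, while a MORL algorithm could read the full vector from `info`.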
Add MO reward wrappers

I added two commonly used wrappers: normalize and clip.

The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

opened by ffelten 1
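
The per-component idea can be sketched with plain functions. The names `clip_component` and `scale_component` are hypothetical, and min-max scaling with known bounds is used here as a simple stand-in; the normalization in the actual wrapper may work differently (for instance with running statistics).

```python
import numpy as np


def clip_component(vector_reward, idx, low=-1.0, high=1.0):
    """Return a copy with only component `idx` clipped to [low, high]."""
    clipped = np.array(vector_reward, dtype=float)  # copy; input is untouched
    clipped[idx] = np.clip(clipped[idx], low, high)
    return clipped


def scale_component(vector_reward, idx, low, high):
    """Return a copy with component `idx` min-max scaled from [low, high] to [0, 1]."""
    scaled = np.array(vector_reward, dtype=float)  # copy; input is untouched
    scaled[idx] = (scaled[idx] - low) / (high - low)
    return scaled


reward = np.array([12.0, -3.0])
# Compose the two, as when stacking wrappers: clip objective 0, rescale objective 1.
processed = scale_component(clip_component(reward, idx=0), idx=1, low=-5.0, high=5.0)
```

Each function touches a single index and leaves the other components as they are, so several calls can be chained to cover every objective.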

Fix notebook

There are still issues with the video recorder :(

/usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
  logger.warn(

opened by ffelten 0

Add fishwood env

Code was provided by Denis Steckelmacher; I did a bit of refactoring and migrated it to 0.26.

I didn't bother implementing rendering with the images, but I did upload them in case somebody gets motivated. The env is super simple.

opened by ffelten 0

Add wrapper to help logging episode returns

The implementation is mostly a copy-paste of the original gym wrapper. I had to copy-paste instead of overriding and calling super because the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the logged return would always be a vector of zeros (because it has already been reset).

opened by ffelten 0
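
The aliasing problem described above can be reproduced with NumPy alone. This is a simplified illustration with hypothetical variable names, not the wrapper's actual code: logging a reference to the running return and then zeroing the array in place also wipes the logged value, whereas logging a copy preserves it.

```python
import numpy as np

# Buggy pattern: store a reference to the running return, then zero it in place.
episode_return = np.zeros(2)
episode_return += np.array([1.0, -2.0])          # accumulate one step's reward vector
info_buggy = {"episode_return": episode_return}  # reference, not a copy
episode_return[:] = 0.0                          # in-place reset also wipes the logged value

# Safer pattern: log a copy before resetting.
episode_return = np.zeros(2)
episode_return += np.array([1.0, -2.0])
info_fixed = {"episode_return": episode_return.copy()}
episode_return[:] = 0.0
```

With scalar rewards the issue does not appear, because reading a single number out of an array yields an independent value rather than a view.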

Releases(0.2.1)

0.2.1(Dec 9, 2022)
5 new environments: fishwood-v0 (ESR), mo-MountainCarContinuous-v0, water-reservoir-v0, mo-highway-v0 and mo-highway-fast-v0;

Revamped README file;

Linting and automatic imports optimization;

Updated bib file and citation;

A few bug fixes.

0.2.0(Sep 25, 2022)

Support for new Gym>=0.26 API
0.1.2(Sep 25, 2022)

0.1.1(Aug 24, 2022)


Owner

Lucas Alegre

PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.

