An OpenAI Gym environment for Super Mario Bros

Overview

gym-super-mario-bros

BuildStatus PackageVersion PythonVersion Stable Format License

Mario

An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) using the nes-py emulator.

Installation

The preferred installation of gym-super-mario-bros is from pip:

pip install gym-super-mario-bros

Usage

Python

You must import gym_super_mario_bros before trying to make an environment. This is because gym environments are registered at runtime. By default, gym_super_mario_bros environments use the full NES action space of 256 discrete actions. To contstrain this, gym_super_mario_bros.actions provides three actions lists (RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT) for the nes_py.wrappers.JoypadSpace wrapper. See gym_super_mario_bros/actions.py for a breakdown of the legal actions in each of these three lists.

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()

NOTE: gym_super_mario_bros.make is just an alias to gym.make for convenience.

NOTE: remove calls to render in training code for a nontrivial speedup.

Command Line

gym_super_mario_bros features a command line interface for playing environments using either the keyboard, or uniform random movement.

gym_super_mario_bros -e <the environment ID to play> -m <`human` or `random`>

NOTE: by default, -e is set to SuperMarioBros-v0 and -m is set to human.

Environments

These environments allow 3 attempts (lives) to make it through the 32 stages in the game. The environments only send reward-able game-play frames to agents; No cut-scenes, loading screens, etc. are sent from the NES emulator to an agent nor can an agent perform actions during these instances. If a cut-scene is not able to be skipped by hacking the NES's RAM, the environment will lock the Python process until the emulator is ready for the next action.

Environment Game ROM Screenshot
SuperMarioBros-v0 SMB standard
SuperMarioBros-v1 SMB downsample
SuperMarioBros-v2 SMB pixel
SuperMarioBros-v3 SMB rectangle
SuperMarioBros2-v0 SMB2 standard
SuperMarioBros2-v1 SMB2 downsample

Individual Stages

These environments allow a single attempt (life) to make it through a single stage of the game.

Use the template

SuperMarioBros-<world>-<stage>-v<version>

where:

  • <world> is a number in {1, 2, 3, 4, 5, 6, 7, 8} indicating the world
  • <stage> is a number in {1, 2, 3, 4} indicating the stage within a world
  • <version> is a number in {0, 1, 2, 3} specifying the ROM mode to use
    • 0: standard ROM
    • 1: downsampled ROM
    • 2: pixel ROM
    • 3: rectangle ROM

For example, to play 4-2 on the downsampled ROM, you would use the environment id SuperMarioBros-4-2-v1.

Random Stage Selection

The random stage selection environment randomly selects a stage and allows a single attempt to clear it. Upon a death and subsequent call to reset, the environment randomly selects a new stage. This is only available for the standard Super Mario Bros. game, not Lost Levels (at the moment). To use these environments, append RandomStages to the SuperMarioBros id. For example, to use the standard ROM with random stage selection use SuperMarioBrosRandomStages-v0. To seed the random stage selection use the seed method of the env, i.e., env.seed(1), before any calls to reset.

Step

Info about the rewards and info returned by the step method.

Reward Function

The reward function assumes the objective of the game is to move as far right as possible (increase the agent's x value), as fast as possible, without dying. To model this game, three separate variables compose the reward:

  1. v: the difference in agent x values between states
    • in this case this is instantaneous velocity for the given step
    • v = x1 - x0
      • x0 is the x position before the step
      • x1 is the x position after the step
    • moving right ⇔ v > 0
    • moving left ⇔ v < 0
    • not moving ⇔ v = 0
  2. c: the difference in the game clock between frames
    • the penalty prevents the agent from standing still
    • c = c0 - c1
      • c0 is the clock reading before the step
      • c1 is the clock reading after the step
    • no clock tick ⇔ c = 0
    • clock tick ⇔ c < 0
  3. d: a death penalty that penalizes the agent for dying in a state
    • this penalty encourages the agent to avoid death
    • alive ⇔ d = 0
    • dead ⇔ d = -15

r = v + c + d

The reward is clipped into the range (-15, 15).

info dictionary

The info dictionary returned by the step method contains the following keys:

Key Type Description
coins int The number of collected coins
flag_get bool True if Mario reached a flag or ax
life int The number of lives left, i.e., {3, 2, 1}
score int The cumulative in-game score
stage int The current stage, i.e., {1, ..., 4}
status str Mario's status, i.e., {'small', 'tall', 'fireball'}
time int The time left on the clock
world int The current world, i.e., {1, ..., 8}
x_pos int Mario's x position in the stage (from the left)
y_pos int Mario's y position in the stage (from the bottom)

Citation

Please cite gym-super-mario-bros if you use it in your research.

@misc{gym-super-mario-bros,
  author = {Christian Kauten},
  howpublished = {GitHub},
  title = {{S}uper {M}ario {B}ros for {O}pen{AI} {G}ym},
  URL = {https://github.com/Kautenja/gym-super-mario-bros},
  year = {2018},
}
Owner
Andrew Stelmach
Andrew Stelmach
From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

Code for CCA-SSG model proposed in the NeurIPS 2021 paper From Canonical Correlation Analysis to Self-supervised Graph Neural Networks.

Hengrui Zhang 44 Nov 27, 2022
Public Implementation of ChIRo from "Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations"

Learning 3D Representations of Molecular Chirality with Invariance to Bond Rotations This directory contains the model architectures and experimental

35 Dec 05, 2022
Morphable Detector for Object Detection on Demand

Morphable Detector for Object Detection on Demand (ICCV 2021) PyTorch implementation of the paper Morphable Detector for Object Detection on Demand. I

9 Feb 23, 2022
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

7 Jun 22, 2022
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
Scientific Computation Methods in C and Python (Open for Hacktoberfest 2021)

Sci - cpy README is a stub. Do expand it. Objective This repository is meant to be a ready reference for scientific computation methods. Do ⭐ it if yo

Sandip Dutta 7 Oct 12, 2022
An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" in Pytorch.

GLOM An implementation of Geoffrey Hinton's paper "How to represent part-whole hierarchies in a neural network" for MNIST Dataset. To understand this

50 Oct 19, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110.06922). Our implementations are built on top of MMdetection3D.

Wang, Yue 539 Jan 07, 2023
Python interface for the DIGIT tactile sensor

DIGIT-INTERFACE Python interface for the DIGIT tactile sensor. For updates and discussions please join the #DIGIT channel at the www.touch-sensing.org

Facebook Research 35 Dec 22, 2022
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023
Neural Factorization of Shape and Reflectance Under An Unknown Illumination

NeRFactor [Paper] [Video] [Project] This is the authors' code release for: NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown I

Google 283 Jan 04, 2023
【Arxiv】Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution

SANet Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution Dependencies numpy==1.18.5 scikit_image==0.16.2 torchvision==0.8.1 to

36 Jan 05, 2023
Sequential model-based optimization with a `scipy.optimize` interface

Scikit-Optimize Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements

Scikit-Optimize 2.5k Jan 04, 2023
[CVPR2021] De-rendering the World's Revolutionary Artefacts

De-rendering the World's Revolutionary Artefacts Project Page | Video | Paper In CVPR 2021 Shangzhe Wu1,4, Ameesh Makadia4, Jiajun Wu2, Noah Snavely4,

49 Nov 06, 2022
Transfer Learning Shootout for PyTorch's model zoo (torchvision)

pytorch-retraining Transfer Learning shootout for PyTorch's model zoo (torchvision). Load any pretrained model with custom final layer (num_classes) f

Alexander Hirner 169 Jun 29, 2022
CMP 414/765 course repository for Spring 2022 semester

CMP414/765: Artificial Intelligence Spring2021 This is the GitHub repository for course CMP 414/765: Artificial Intelligence taught at The City Univer

ch00226855 4 May 16, 2022
Time Series Forecasting with Temporal Fusion Transformer in Pytorch

Forecasting with the Temporal Fusion Transformer Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invari

Nicolás Fornasari 6 Jan 24, 2022
Includes PyTorch -> Keras model porting code for ConvNeXt family of models with fine-tuning and inference notebooks.

ConvNeXt-TF This repository provides TensorFlow / Keras implementations of different ConvNeXt [1] variants. It also provides the TensorFlow / Keras mo

Sayak Paul 87 Dec 06, 2022
Tweesent-back - Tweesent backend uses fastAPI as the web framework

TweeSent Backend Tweesent backend. This repo uses fastAPI as the web framework.

0 Mar 26, 2022
Trading and Backtesting environment for training reinforcement learning agent or simple rule base algo.

TradingGym TradingGym is a toolkit for training and backtesting the reinforcement learning algorithms. This was inspired by OpenAI Gym and imitated th

Yvictor 1.1k Jan 02, 2023