A visualisation tool for Deep Reinforcement Learning

Related tags

Deep Learningdrlvis
Overview

DRLVIS - Visualising Deep Reinforcement Learning


Created by Marios Sirtmatsis with the support of Alex Bäuerle.

DRLVis is an application used for visualising deep reinforcement learning. The goal is to enable developers to get a further understanding of broadly used algorithms across the deep reinforcement learning landscape. Also DRLVis shall provide a tool for researchers and developers to help them understand errors in their implemented algorithms.

Installation

  1. Install the drlvis pip package by using the following command pip install -e drlvis from the directory above the drlvis directory
  2. After that simply run drlvis --logdir @PATH_TO_LOGDIR
  3. Open your browser on http://localhost:8000

Implementation

Architecture

The application is split into a backend and a fronted, where the backend does most of the data preprocessing. The frontend provides meaningful visualisations for further understanding of what the agent is doing, how rewards, weights and actions develop over time and how confident the agent is in selecting its actions.

Workflow for using DRLVis

  1. Train agent and log data
  2. Run drlvis
  3. Interpret meaningful visualisations in your browser

Logging

Logging for the use of drlvis is done by logger.py. The file contains a documentation on which values should be passed for logging. Thlogger.py contains an individual function for every loggable value/values. Some (the most important) of these functions are:


def create_logger(logdir)

The create_logger() function has to be used for initializing the logger and specifying the target destination of the logging directory. It is always important, that the logdir either does not exist yet or is an empty directory.


def log_episode_return(episode_return, episode_count)

With log_episode_return() one is able to log the accumulated reward per episode, with the step being the curresponding current episode count.


def log_action_divergence(action_probs, action_probs_old, episode_count, apply_softmax )

With log_action_divergence() one can calculate the divergence between actions in the current episode and actions in the last episode. Therefore the action_probabilities for each observation per timestep in an episode has to be collected. In the end of an episode this collection of action probabilites and the collection from the episode before can be passed to the log_action_divergence() method, which then calculates the kl divergence between action probabilities of the last episode and the current episode. Example code snippet with a model with softmax activation in the last layer:


def log_frame(frame, episode_count, step)

Using log_frame() one can log the frame which is currently being observed, or which corresponds with the current timestep. The episode count is the current episode and the step is the timestep within the episode on which the frame is being observed or corresponds with.


from drlvis import logger
import numpy as np

probs_curr = []

for episode in range(episode_range):

    for timestep in range(optional_timestep_range):
    
        if end_of_current_episode: #done in openai gym
            if episode >= 1:
                logger.log_action_divergence(probs_old, probs_curr, episode)
            probs_old = probs_curr

        probs_curr.append(model(observation[np.newaxis,:]))

def log_action_probs(predictions, episode_count, step, apply_softmax)

One can use log_action_probs() for logging the predictions of ones model for the currently observed timestep in an episode. If the model does not output probabilites, one can set apply_softmax to True for creating probabilities based on predictions.


def log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, state_meanings, apply_softmax)

The log_experiment_random_states()function takes a highdimensional array containing randomly generated states in bounds of the environments capabilities. (obs_min, obs_max) It also needs the episode in which a random states experiment shall be performed. The function then reduces the dimensions to two dimensions with UMAP for visualisation purposes. The state meanings can be passed for easier environments to reflect what the different states mean. A random state experiment itself is just a method to evaluate the agents confidence in selecting certain actions for randomly generated states. Example code snippet:

from drlvis import logger
import numpy as np

def random_states_experiment(model, episode_num):
   
    obs_space = env.observation_space
    obs_min = obs_space.low
    obs_max = obs_space.high


    num_samples = 10000 # can be an arbitrary number
    random_state_samples = np.random.uniform(
        low=obs_min, high=obs_max, size=(num_samples, len(obs_min)))

    predicted_dists = model(random_state_samples)
   
    logger.log_experiment_random_states(random_state_samples, predicted_dists, obs_min, obs_max, episode_num, [])

def log_action_distribution(actions, episode_count)

The log_action_distribution() function calculates the distribution of actions in the specified episode. Therefore one solely has to pass the actions, which where selected in the current episode episode_count


def log_weights(weight_tensor, step, episode_count)

With log_weights()one can log the weights of the last layer of ones model in a given timestep in an episode. This can be done as follows (model is keras model but not of major importance):

from drlvis import logger

weights = agent.model.weights[-2].numpy()
logger.log_weights(weight_tensor=weights, step=timestep ,episode_count=episode)

Examples

Examples on how to use the logger functions in real DRL implementations can be found in the examples folder that contains simple cartpole implementation in dqn_cartpole.ipynb and a more complex DQN implementation for playing Atari Breakout in dqn/.

Bachelor Thesis

For further information on how to use DRLVis and details about the application, I refer to my bachelor thesis located at documents/bachelor_thesis_visdrl.pdf.

License

MIT

Owner
Marios Sirtmatsis
Marios Sirtmatsis
Pytorch Implementation of rpautrat/SuperPoint

SuperPoint-Pytorch (A Pure Pytorch Implementation) SuperPoint: Self-Supervised Interest Point Detection and Description Thanks This work is based on:

76 Dec 27, 2022
Ensemble Learning Priors Driven Deep Unfolding for Scalable Snapshot Compressive Imaging [PyTorch]

Ensemble Learning Priors Driven Deep Unfolding for Scalable Snapshot Compressive Imaging [PyTorch] Abstract Snapshot compressive imaging (SCI) can rec

integirty 6 Nov 01, 2022
Pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"

About this repository This repo contains an Pytorch implementation for the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Netwo

wxDai 7 Oct 14, 2022
No Code AI/ML platform

NoCodeAIML No Code AI/ML platform - Community Edition Video credits: Uday Kiran Typical No Code AI/ML Platform will have features like drag and drop,

Bhagvan Kommadi 5 Jan 28, 2022
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech - PyTorch Implementation PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech. Model Size Module Nor

Keon Lee 279 Jan 04, 2023
Syllabus del curso IIC2115 - Programación como Herramienta para la Ingeniería 2022/I

IIC2115 - Programación como Herramienta para la Ingeniería Videos y tutoriales Tutorial CMD Tutorial Instalación Python y Jupyter Tutorial de git-GitH

21 Nov 09, 2022
Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling"

Unseen Object Amodal Instance Segmentation (UOAIS) Seunghyeok Back, Joosoon Lee, Taewon Kim, Sangjun Noh, Raeyoung Kang, Seongho Bak, Kyoobin Lee This

GIST-AILAB 92 Dec 13, 2022
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations

Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations Requirements The code is implemented in Python and requires

1 Nov 03, 2021
O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning (CoRL 2021)

O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning Object-object Interaction Affordance Learning. For a given object-object int

Kaichun Mo 26 Nov 04, 2022
All public open-source implementations of convnets benchmarks

convnet-benchmarks Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below. Machine: 6-cor

Soumith Chintala 2.7k Dec 30, 2022
This repository contains the implementation of the paper Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans

Contrastive Instance Association for 4D Panoptic Segmentation using Sequences of 3D LiDAR Scans This repository contains the implementation of the pap

Photogrammetry & Robotics Bonn 40 Dec 01, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
The official TensorFlow implementation of the paper Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Action Transformer A Self-Attention Model for Short-Time Human Action Recognition This repository contains the official TensorFlow implementation of t

PIC4SeRCentre 20 Jan 03, 2023
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametric Head Model (CVPR 2022)".

HeadNeRF: A Real-time NeRF-based Parametric Head Model This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametr

294 Jan 01, 2023
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 05, 2023
This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper

Deep Continuous Clustering Introduction This is a Pytorch implementation of the DCC algorithms presented in the following paper (paper): Sohil Atul Sh

Sohil Shah 197 Nov 29, 2022
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

CurriculumNet Introduction This repo contains related code and models from the ECCV 2018 CurriculumNet paper. CurriculumNet is a new training strategy

156 Jul 04, 2022
An alarm clock coded in Python 3 with Tkinter

Tkinter-Alarm-Clock An alarm clock coded in Python 3 with Tkinter. Run python3 Tkinter Alarm Clock.py in a terminal if you have Python 3. NOTE: This p

CodeMaster7000 1 Dec 25, 2021
Ppq - A powerful offline neural network quantization tool with custimized IR

PPL Quantization Tool(PPL 量化工具) PPL Quantization Tool (PPQ) is a powerful offlin

605 Jan 03, 2023