Deep reinforcement learning library built on top of Neural Network Libraries

Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL only supports Python version >= 3.6 and NNabla version >= 1.17.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]
# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, specify the GPU id to run the algorithm on through the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has a friendly Python API which enables you to start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

To get more details about NNablaRL, see the documentation and examples.

Many built-in algorithms

Most famous/state-of-the-art deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.
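
Because the algorithms share a common interface, switching between them usually only requires changing the algorithm class and its config. A minimal sketch, assuming SAC on Pendulum-v0 (used here only as an example continuous-control task):

import gym

import nnabla_rl.algorithms as A

env = gym.make("Pendulum-v0")

config = A.SACConfig(gpu_id=-1)  # negative id runs on CPU
sac = A.SAC(env, config=config)
sac.train(env)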

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. The results may vary slightly depending on your machine, the nnabla/nnabla-rl package versions, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main training procedures for training the agent: online and offline. Online training alternates between data collection and network updates. Offline training, in contrast, updates the network using only previously collected data. With NNablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online in a simulated environment and fine-tune it offline with a real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that a simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine-tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title | Notebook | Target RL task
Simple reinforcement learning training to get started | Open In Colab | Pendulum
Learn how to use training algorithms | Open In Colab | Pendulum
Learn how to use customized network model for training | Open In Colab | Mountain car
Learn how to use different network solver for training | Open In Colab | Pendulum
Learn how to use different replay buffer for training | Open In Colab | Pendulum
Learn how to use your own environment for training | Open In Colab | Customized environment
Atari game training example | Open In Colab | Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments
  • Update cem function interface

    Update cem function interface

    Updated the interface of the cross entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function passed to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim). This is different from the previous interface. If you want to know the details, please see the function docs.
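
    A minimal sketch of an objective function written against the new call convention (numpy is used here only for illustration; the actual CEM functions may pass nnabla Variables, and the returned shape is an assumption, not part of the library API):

    import numpy as np

    def objective(x):
        # x arrives with shape (batch_size, sample_size, x_dim).
        # One scalar score per sampled candidate is assumed here,
        # i.e. an array of shape (batch_size, sample_size).
        return -np.sum(x ** 2, axis=-1)

    batch_size, sample_size, x_dim = 4, 32, 2
    x = np.random.randn(batch_size, sample_size, x_dim)
    assert objective(x).shape == (batch_size, sample_size)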

    opened by sbsekiguchi 1
  • Add implementation for RNN support and DRQN algorithm

    Add implementation for RNN support and DRQN algorithm

    Add RNN model support and DRQN algorithm.

    The following trainers support RNN models:

    • Q value-based trainers
    • Deterministic gradient and Soft policy trainers

    Other trainers may support RNN models in the future, but this is not implemented in the initial release.

    See this paper for the details of the DRQN algorithm.
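
    A minimal sketch of instantiating the new algorithm, assuming DRQN follows the same Algorithm/Config naming pattern as DQN in this README (only the common gpu_id field is shown; other config fields are omitted):

    import nnabla_rl.algorithms as A
    from nnabla_rl.utils.reproductions import build_atari_env

    env = build_atari_env("BreakoutNoFrameskip-v4")
    config = A.DRQNConfig(gpu_id=-1)  # negative id runs on CPU
    drqn = A.DRQN(env, config=config)
    drqn.train(env)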

    opened by ishihara-y 1
  • Implement SACD

    Implement SACD

    This PR implements the SAC-D algorithm: https://arxiv.org/abs/2206.13901

    These changes have been made:

    • New environments with factored reward functions have been added
      • FactoredLunarLanderContinuousV2NNablaRL-v1
      • FactoredAntV4NNablaRL-v1
      • FactoredHopperV4NNablaRL-v1
      • FactoredHalfCheetahV4NNablaRL-v1
      • FactoredWalker2dV4NNablaRL-v1
      • FactoredHumanoidV4NNablaRL-v1
    • SACD algorithm has been added
    • SoftQDTrainer has been added
    • _InfluenceMetricsEvaluator has been added
    • reproduction script has been added (not benchmarked yet)

    visualizing influence metrics

    import gym
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    
    evaluation_hook = H.EvaluationHook(
        eval_env,
        EpisodicEvaluator(run_per_evaluation=10),
        timing=5000,
        writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
    )
    iteration_num_hook = H.IterationNumHook(timing=100)
    
    config = A.SACDConfig(gpu_id=0, reward_dimension=9)
    sacd = A.SACD(env, config=config)
    sacd.set_hooks([iteration_num_hook, evaluation_hook])
    sacd.train_online(env, total_iterations=100000)
    
    influence_history = []
    
    state = env.reset()
    while True:
        action = sacd.compute_eval_action(state)
        influence = sacd.compute_influence_metrics(state, action)
        influence_history.append(influence)
        state, _, done, _ = env.step(action)
        if done:
            break
    
    influence_history = np.array(influence_history)
    for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_engine", "side_engine", "failure", "success"]):
        plt.plot(influence_history[:, i], label=label)
    plt.xlabel("step")
    plt.ylabel("influence metrics")
    plt.legend()
    plt.show()
    


    opened by ishihara-y 0
  • Add gmm and Update gaussian

    Add gmm and Update gaussian

    Added numpy-based GMM and Gaussian models. In addition, updated the Gaussian distribution's API.

    The API change is as follows:

    Previous:

    import numpy as np

    import nnabla as nn
    import nnabla_rl.distributions as D

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    distribution = D.Gaussian(mean, ln_var)
    # return nn.Variable
    assert isinstance(distribution.sample(), nn.Variable)
    

    Updated:

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    # Pass nn.Variable if you want all class methods to return nn.Variable.
    distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
    assert isinstance(distribution.sample(), nn.Variable)
    
    # If you pass np.ndarray, then all class methods return np.ndarray
    # Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
    distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
    assert isinstance(distribution.sample(), np.ndarray)
    
    opened by sbsekiguchi 0
  • Support nnabla-browser

    Support nnabla-browser

    • [x] add MonitorWriter
    • [x] save computational graph as nntxt

    example

    import gym
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    # save training computational graph
    training_graph_hook = H.TrainingGraphHook(outdir="test")
    
    # evaluation hook with nnabla's Monitor
    eval_env = gym.make("Pendulum-v0")
    evaluator = EpisodicEvaluator(run_per_evaluation=10)
    evaluation_hook = H.EvaluationHook(
        eval_env,
        evaluator,
        timing=10,
        writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
    )
    
    env = gym.make("Pendulum-v0")
    sac = A.SAC(env)
    sac.set_hooks([training_graph_hook, evaluation_hook])
    
    sac.train_online(env, total_iterations=100)
    


    opened by ishihara-y 0
  • Add iLQR and LQR

    Add iLQR and LQR

    Implementation of the Linear Quadratic Regulator (LQR) and iterative LQR (iLQR) algorithms.
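
    For reference, a minimal numpy sketch of the finite-horizon discrete-time LQR backward Riccati recursion (a generic illustration of the technique, not the library's implementation):

    import numpy as np

    def lqr_gains(A, B, Q, R, horizon):
        # Backward Riccati recursion for x_{t+1} = A x_t + B u_t
        # with stage cost x^T Q x + u^T R u.
        P = Q
        gains = []
        for _ in range(horizon):
            K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_t = K x_t
            P = Q + A.T @ P @ (A + B @ K)
            gains.append(K)
        return gains[::-1]  # gains[t] is the feedback gain for time step t

    # Example: a double integrator driven to the origin
    dt = 0.1
    A_mat = np.array([[1.0, dt], [0.0, 1.0]])
    B_mat = np.array([[0.0], [dt]])
    gains = lqr_gains(A_mat, B_mat, Q=np.eye(2), R=0.1 * np.eye(1), horizon=50)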

    Co-authored-by: Yu Ishihara [email protected]
    Co-authored-by: Shunichi Sekiguchi [email protected]

    opened by ishihara-y 0
  • Check np_random instance and use correct randint alternative

    Check np_random instance and use correct randint alternative

    I am not sure when this change was made, but in some environments, env.unwrapped.np_random returns a Generator instead of a RandomState.

    # in case of RandomState
    # this line works
    env.unwrapped.np_random.randint(...)
    # in case of Generator
    # randint does not exist and we must use integers as an alternative
    env.unwrapped.np_random.integers(...)
    

    This PR fixes this issue and chooses the correct function for sampling integers.
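
    A minimal sketch of such a check (the helper name is illustrative, not necessarily the function introduced by this PR):

    import numpy as np

    def sample_integer(rng, low, high):
        # Legacy RandomState exposes randint; the newer Generator exposes integers.
        if isinstance(rng, np.random.Generator):
            return int(rng.integers(low, high))
        return int(rng.randint(low, high))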

    opened by ishihara-y 0
  • Add icra2018 qtopt

    Add icra2018 qtopt

    opened by sbsekiguchi 0
Releases: v0.12.0

Owner: Sony Group Corporation