Deep reinforcement learning library built on top of Neural Network Libraries

Overview

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL only supports Python version >= 3.6 and NNabla version >= 1.17.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]
# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, specify the GPU id to run the algorithm on through the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has a friendly Python API which enables you to start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

To get more details about NNablaRL, see the documentation and examples.

Many built-in algorithms

Most famous/state-of-the-art deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.
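
Because the algorithms share a common interface, switching between them usually only requires changing the algorithm class and its config. A minimal sketch, assuming SAC on Pendulum-v0 (used here only as an example continuous-control task):

import gym

import nnabla_rl.algorithms as A

env = gym.make("Pendulum-v0")

config = A.SACConfig(gpu_id=-1)  # negative id runs on CPU
sac = A.SAC(env, config=config)
sac.train(env)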

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. The results may vary slightly depending on your machine, the nnabla/nnabla-rl package versions, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main training procedures for training the agent: online and offline. Online training alternates between data collection and network updates. Offline training, in contrast, updates the network using only previously collected data. With NNablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online in a simulated environment and fine-tune it offline with a real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that a simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine-tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title | Notebook | Target RL task
Simple reinforcement learning training to get started | Open In Colab | Pendulum
Learn how to use training algorithms | Open In Colab | Pendulum
Learn how to use customized network model for training | Open In Colab | Mountain car
Learn how to use different network solver for training | Open In Colab | Pendulum
Learn how to use different replay buffer for training | Open In Colab | Pendulum
Learn how to use your own environment for training | Open In Colab | Customized environment
Atari game training example | Open In Colab | Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments
  • Update cem function interface

    Update cem function interface

    Updated the interface of the cross entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function passed to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim). This is different from the previous interface. If you want to know the details, please see the function docs.
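
    A minimal sketch of an objective function written against the new call convention (numpy is used here only for illustration; the actual CEM functions may pass nnabla Variables, and the returned shape is an assumption, not part of the library API):

    import numpy as np

    def objective(x):
        # x arrives with shape (batch_size, sample_size, x_dim).
        # One scalar score per sampled candidate is assumed here,
        # i.e. an array of shape (batch_size, sample_size).
        return -np.sum(x ** 2, axis=-1)

    batch_size, sample_size, x_dim = 4, 32, 2
    x = np.random.randn(batch_size, sample_size, x_dim)
    assert objective(x).shape == (batch_size, sample_size)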

    opened by sbsekiguchi 1
  • Add implementation for RNN support and DRQN algorithm

    Add implementation for RNN support and DRQN algorithm

    Add RNN model support and DRQN algorithm.

    The following trainers support RNN models:

    • Q value-based trainers
    • Deterministic gradient and Soft policy trainers

    Other trainers may support RNN models in the future, but this is not implemented in the initial release.

    See this paper for the details of the DRQN algorithm.
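
    A minimal sketch of instantiating the new algorithm, assuming DRQN follows the same Algorithm/Config naming pattern as DQN in this README (only the common gpu_id field is shown; other config fields are omitted):

    import nnabla_rl.algorithms as A
    from nnabla_rl.utils.reproductions import build_atari_env

    env = build_atari_env("BreakoutNoFrameskip-v4")
    config = A.DRQNConfig(gpu_id=-1)  # negative id runs on CPU
    drqn = A.DRQN(env, config=config)
    drqn.train(env)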

    opened by ishihara-y 1
  • Implement SACD

    Implement SACD

    This PR implements the SAC-D algorithm: https://arxiv.org/abs/2206.13901

    These changes have been made:

    • New environments with factored reward functions have been added
      • FactoredLunarLanderContinuousV2NNablaRL-v1
      • FactoredAntV4NNablaRL-v1
      • FactoredHopperV4NNablaRL-v1
      • FactoredHalfCheetahV4NNablaRL-v1
      • FactoredWalker2dV4NNablaRL-v1
      • FactoredHumanoidV4NNablaRL-v1
    • SACD algorithm has been added
    • SoftQDTrainer has been added
    • _InfluenceMetricsEvaluator has been added
    • reproduction script has been added (not benchmarked yet)

    visualizing influence metrics

    import gym
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    
    evaluation_hook = H.EvaluationHook(
        eval_env,
        EpisodicEvaluator(run_per_evaluation=10),
        timing=5000,
        writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
    )
    iteration_num_hook = H.IterationNumHook(timing=100)
    
    config = A.SACDConfig(gpu_id=0, reward_dimension=9)
    sacd = A.SACD(env, config=config)
    sacd.set_hooks([iteration_num_hook, evaluation_hook])
    sacd.train_online(env, total_iterations=100000)
    
    influence_history = []
    
    state = env.reset()
    while True:
        action = sacd.compute_eval_action(state)
        influence = sacd.compute_influence_metrics(state, action)
        influence_history.append(influence)
        state, _, done, _ = env.step(action)
        if done:
            break
    
    influence_history = np.array(influence_history)
    for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_engine", "side_engine", "failure", "success"]):
        plt.plot(influence_history[:, i], label=label)
    plt.xlabel("step")
    plt.ylabel("influence metrics")
    plt.legend()
    plt.show()
    


    opened by ishihara-y 0
  • Add gmm and Update gaussian

    Add gmm and Update gaussian

    Added numpy-based GMM and Gaussian models. In addition, updated the Gaussian distribution's API.

    The API change is as follows:

    Previous:

    import numpy as np

    import nnabla as nn
    import nnabla_rl.distributions as D

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    distribution = D.Gaussian(mean, ln_var)
    # return nn.Variable
    assert isinstance(distribution.sample(), nn.Variable)
    

    Updated:

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    # Pass nn.Variable if you want all class methods to return nn.Variable.
    distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
    assert isinstance(distribution.sample(), nn.Variable)
    
    # If you pass np.ndarray, then all class methods return np.ndarray
    # Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
    distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
    assert isinstance(distribution.sample(), np.ndarray)
    
    opened by sbsekiguchi 0
  • Support nnabla-browser

    Support nnabla-browser

    • [x] add MonitorWriter
    • [x] save computational graph as nntxt

    example

    import gym
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    # save training computational graph
    training_graph_hook = H.TrainingGraphHook(outdir="test")
    
    # evaluation hook with nnabla's Monitor
    eval_env = gym.make("Pendulum-v0")
    evaluator = EpisodicEvaluator(run_per_evaluation=10)
    evaluation_hook = H.EvaluationHook(
        eval_env,
        evaluator,
        timing=10,
        writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
    )
    
    env = gym.make("Pendulum-v0")
    sac = A.SAC(env)
    sac.set_hooks([training_graph_hook, evaluation_hook])
    
    sac.train_online(env, total_iterations=100)
    


    opened by ishihara-y 0
  • Add iLQR and LQR

    Add iLQR and LQR

    Implementation of the Linear Quadratic Regulator (LQR) and iterative LQR (iLQR) algorithms.
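
    For reference, a minimal numpy sketch of the finite-horizon discrete-time LQR backward Riccati recursion (a generic illustration of the technique, not the library's implementation):

    import numpy as np

    def lqr_gains(A, B, Q, R, horizon):
        # Backward Riccati recursion for x_{t+1} = A x_t + B u_t
        # with stage cost x^T Q x + u^T R u.
        P = Q
        gains = []
        for _ in range(horizon):
            K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_t = K x_t
            P = Q + A.T @ P @ (A + B @ K)
            gains.append(K)
        return gains[::-1]  # gains[t] is the feedback gain for time step t

    # Example: a double integrator driven to the origin
    dt = 0.1
    A_mat = np.array([[1.0, dt], [0.0, 1.0]])
    B_mat = np.array([[0.0], [dt]])
    gains = lqr_gains(A_mat, B_mat, Q=np.eye(2), R=0.1 * np.eye(1), horizon=50)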

    Co-authored-by: Yu Ishihara [email protected]
    Co-authored-by: Shunichi Sekiguchi [email protected]

    opened by ishihara-y 0
  • Check np_random instance and use correct randint alternative

    Check np_random instance and use correct randint alternative

    I am not sure when this change was made, but in some environments, env.unwrapped.np_random returns a Generator instead of a RandomState.

    # in case of RandomState
    # this line works
    env.unwrapped.np_random.randint(...)
    # in case of Generator
    # randint does not exist and we must use integers as an alternative
    env.unwrapped.np_random.integers(...)
    

    This PR fixes this issue and chooses the correct function for sampling integers.
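
    A minimal sketch of such a check (the helper name is illustrative, not necessarily the function introduced by this PR):

    import numpy as np

    def sample_integer(rng, low, high):
        # Legacy RandomState exposes randint; the newer Generator exposes integers.
        if isinstance(rng, np.random.Generator):
            return int(rng.integers(low, high))
        return int(rng.randint(low, high))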

    opened by ishihara-y 0
  • Add icra2018 qtopt

    Add icra2018 qtopt

    opened by sbsekiguchi 0
Releases: v0.12.0

Owner: Sony Group Corporation