ElegantRL is a lightweight, efficient, and stable DRL library for researchers and practitioners.

Last update: Jan 08, 2023

Overview

Lightweight, Efficient and Stable DRL Implementation Using PyTorch

Lightweight: The core code is under 1,000 lines (see elegantrl/tutorial) and uses PyTorch (training), OpenAI Gym (environments), NumPy, and Matplotlib (plotting).
Efficient: Performance is comparable to Ray RLlib.
Stable: As stable as Stable Baselines3.

Currently implemented model-free deep reinforcement learning (DRL) algorithms:

DDPG, TD3, SAC, A2C, PPO, PPO(GAE) for continuous actions
DQN, DoubleDQN, D3QN for discrete actions

For background on these DRL algorithms, see the educational webpage OpenAI Spinning Up.

Check out the ElegantRL documentation.

News
File Structure
Training Pipeline
Experimental Results
Requirements
Model-free DRL Algorithms

News

[Towardsdatascience] ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library
ElegantRL: Mastering PPO Algorithms (Part I)
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I)
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II)

File Structure

An agent in agent.py uses networks in net.py and is trained in run.py by interacting with an environment in env.py.

-----kernel file----

elegantrl/net.py # Neural networks.
- Q-Net,
- Actor Network,
- Critic Network,
elegantrl/agent.py # RL algorithms.
- AgentBase
elegantrl/run.py # run DEMO 1 ~ 4
- Parameter initialization,
- Training loop,
- Evaluator.

-----utils file----

elegantrl/envs/ # gym env or custom env, including FinanceStockEnv.
- gym_utils.py: A PreprocessEnv class for gym-environment modification.
- Stock_Trading_Env: A self-created stock trading environment as an example for user customization.
eRL_demo_BipedalWalker.ipynb # BipedalWalker-v2 in Jupyter notebooks
eRL_demos.ipynb # Demos 1~4 in Jupyter notebooks. Shows how to use the tutorial version and the advanced version.
eRL_demo_SingleFilePPO.py # Trains PPO from a single file; simpler than the tutorial version
eRL_demo_StockTrading.py # Stock trading application in Jupyter notebooks

As a high-level overview, the relations among the files are as follows. Initialize an environment in env.py and an agent in agent.py. The agent is constructed with Actor and Critic networks from net.py. In each training step in run.py, the agent interacts with the environment, generating transitions that are stored in a Replay Buffer. Then, the agent fetches transitions from the Replay Buffer to train its networks. After each update, an evaluator evaluates the agent's performance and saves the agent if the performance is good.
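The store-then-sample flow described above can be sketched with a minimal replay buffer. This is an illustrative sketch that uses only the Python standard library; it is not ElegantRL's ReplayBuffer, and the class name SimpleReplayBuffer and its methods are hypothetical.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Hypothetical fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, max_size):
        self.data = deque(maxlen=max_size)

    def append(self, state, action, reward, next_state, done):
        # One transition is what the agent observed for a single environment step.
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


buffer = SimpleReplayBuffer(max_size=3)
for step in range(5):
    buffer.append(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = buffer.sample(batch_size=2)
```

Because the capacity is 3, only the transitions from steps 2, 3, and 4 remain when the batch is drawn.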

Training Pipeline

Initialization:

hyper-parameters args.
env = PreprocessEnv() : creates an environment (in the OpenAI gym format).
agent = agent.XXX() : creates an agent for a DRL algorithm.
evaluator = Evaluator() : evaluates and stores the trained model.
buffer = ReplayBuffer() : stores the transitions.

Then, the training process is controlled by a while-loop:

agent.explore_env(…): the agent explores the environment within target steps, generates transitions, and stores them into the ReplayBuffer.
agent.update_net(…): the agent uses a batch from the ReplayBuffer to update the network parameters.
evaluator.evaluate_save(…): evaluates the agent's performance and keeps the trained model with the highest score.

The while-loop terminates when a stopping condition is met, e.g., reaching a target score, reaching the maximum number of steps, or a manual break.
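The pipeline above can be sketched as a small, self-contained loop. The method names explore_env, update_net, and evaluate_save come from the description; everything else (the toy environment, the single-parameter "agent", the argument lists, and the plain list used as a buffer) is a hypothetical stand-in and does not reflect ElegantRL's actual signatures.

```python
import random


class ToyEnv:
    """Hypothetical one-step environment: reward 1 for action 1, else 0."""

    def step(self, action):
        return 1.0 if action == 1 else 0.0


class ToyAgent:
    """Hypothetical agent whose only parameter is the probability of action 1."""

    def __init__(self):
        self.prob_one = 0.5

    def select_action(self, rng):
        return 1 if rng.random() < self.prob_one else 0

    def explore_env(self, env, buffer, target_step, rng):
        # Interact with the environment and store (action, reward) transitions.
        for _ in range(target_step):
            action = self.select_action(rng)
            buffer.append((action, env.step(action)))
        return target_step

    def update_net(self, buffer, batch_size, rng):
        # Stand-in for a gradient step: move toward the rewarded actions in a batch.
        batch = rng.sample(buffer, min(batch_size, len(buffer)))
        rewarded = [a for a, r in batch if r > 0]
        if rewarded:
            target = sum(rewarded) / len(rewarded)
            self.prob_one += 0.2 * (target - self.prob_one)


class ToyEvaluator:
    def __init__(self, target_score):
        self.target_score = target_score
        self.best_score = float("-inf")
        self.best_prob = None  # stands in for a saved model

    def evaluate_save(self, agent, env, rng, episodes=200):
        score = sum(env.step(agent.select_action(rng)) for _ in range(episodes)) / episodes
        if score > self.best_score:  # keep only the best-scoring "model"
            self.best_score = score
            self.best_prob = agent.prob_one
        return score >= self.target_score


def train(max_step=5000, target_score=0.9, seed=0):
    rng = random.Random(seed)
    env, agent, buffer = ToyEnv(), ToyAgent(), []
    evaluator = ToyEvaluator(target_score)
    total_step = 0
    if_reach_goal = False
    # Stop on the target score or on the step budget, whichever comes first.
    while not if_reach_goal and total_step < max_step:
        total_step += agent.explore_env(env, buffer, target_step=32, rng=rng)
        agent.update_net(buffer, batch_size=16, rng=rng)
        if_reach_goal = evaluator.evaluate_save(agent, env, rng)
    return evaluator.best_score, total_step


best_score, total_step = train()
```

The two stopping conditions in the while statement correspond to the target score and maximum steps mentioned above; a manual break would be a third exit inside the loop body.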

Experimental Results

Results using ElegantRL

LunarLanderContinuous-v2

BipedalWalkerHardcore-v2

BipedalWalkerHardcore is a difficult task in a continuous action space. Only a few RL implementations can reach the target reward.

Check out a video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.

Requirements

Required:
| Python 3.6+     |
| PyTorch 1.6+    |

Optional:
| NumPy 1.18+     | For ReplayBuffer. NumPy is installed along with PyTorch.
| gym 0.17.0      | For env. Gym provides tutorial environments for DRL training. (env.render() has a bug with gym==0.18 and pyglet==1.6; use gym==0.17.0 and pyglet==1.5 instead.)
| pybullet 2.7+   | For env. We use PyBullet (free) as an alternative to MuJoCo (not free).
| box2d-py 2.3.8  | For gym. Install with pip install Box2D (instead of box2d-py).
| matplotlib 3.2  | For plots that evaluate the agent's performance.

pip3 install gym==0.17.0 pybullet Box2D matplotlib

Citation:

To cite this repository:

@misc{rlalgorithms,
  author = {Xiao-Yang Liu and Zechu Li and Zhaoran Wang and Jiahao Zheng},
  title = {ElegantRL: A Lightweight and Stable Deep Reinforcement Learning Library},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-LLC/ElegantRL}},
}

Comments

Bump py from 1.6.0 to 1.10.0 in /elegantrl/envs/SMAC
Bumps py from 1.6.0 to 1.10.0.

Changelog

Sourced from py's changelog.

1.10.0 (2020-12-12)

Fix a regular expression DoS vulnerability in the py.path.svnwc SVN blame functionality (CVE-2020-29651)

Update vendored apipkg: 1.4 => 1.5

Update vendored iniconfig: 1.0.0 => 1.1.1

1.9.0 (2020-06-24)

Add type annotation stubs for the following modules:

py.error

py.iniconfig

py.path (not including SVN paths)

py.io

py.xml

There are no plans to type other modules at this time.

The type annotations are provided in external .pyi files, not inline in the code, and may therefore contain small errors or omissions. If you use py in conjunction with a type checker, and encounter any type errors you believe should be accepted, please report it in an issue.

1.8.2 (2020-06-15)

On Windows, py.path.locals which differ only in case now have the same Python hash value. Previously, such paths were considered equal but had different hashes, which is not allowed and breaks the assumptions made by dicts, sets and other users of hashes.

1.8.1 (2019-12-27)

Handle FileNotFoundError when trying to import pathlib in path.common on Python 3.4 (#207).

py.path.local.samefile now works correctly in Python 3 on Windows when dealing with symlinks.

1.8.0 (2019-02-21)

add "importlib" pyimport mode for python3.5+, allowing unimportable test suites to contain identically named modules.

fix LocalPath.as_cwd() not calling os.chdir() with None, when being invoked from a non-existing directory.

... (truncated)

Commits

e5ff378 Update CHANGELOG for 1.10.0

94cf44f Update vendored libs

5e8ded5 testing: comment out an assert which fails on Python 3.9 for now

afdffcc Rename HOWTORELEASE.rst to RELEASING.rst

2de53a6 Merge pull request #266 from nicoddemus/gh-actions

fa1b32e Merge pull request #264 from hugovk/patch-2

887d6b8 Skip test_samefile_symlink on pypy3 on Windows

e94e670 Fix test_comments() in test_source

fef9a32 Adapt test

4a694b0 Add GitHub Actions badge to README

Additional commits viewable in compare view

opened by dependabot[bot] 11
Bump pyyaml from 3.13 to 5.4 in /elegantrl/envs/SMAC
Bumps pyyaml from 3.13 to 5.4.

Changelog

Sourced from pyyaml's changelog.

5.4 (2021-01-19)

yaml/pyyaml#407 -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA

yaml/pyyaml#472 -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader

yaml/pyyaml#441 -- Fix memory leak in implicit resolver setup

yaml/pyyaml#392 -- Fix py2 copy support for timezone objects

yaml/pyyaml#378 -- Fix compatibility with Jython

5.3.1 (2020-03-18)

yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

yaml/pyyaml#290 -- Use is instead of equality for comparing with None

yaml/pyyaml#270 -- Fix typos and stylistic nit

yaml/pyyaml#309 -- Fix up small typo

yaml/pyyaml#161 -- Fix handling of slots

yaml/pyyaml#358 -- Allow calling add_multi_constructor with None

yaml/pyyaml#285 -- Add use of safe_load() function in README

yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF

yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff

yaml/pyyaml#359 -- Use full_load in yaml-highlight example

yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython

yaml/pyyaml#329 -- Fix for Python 3.10

yaml/pyyaml#310 -- Increase size of index, line, and column fields

yaml/pyyaml#260 -- Remove some unused imports

yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such

yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)

Repair incompatibilities introduced with 5.1. The default Loader was changed, but several methods like add_constructor still used the old default

yaml/pyyaml#279 -- A more flexible fix for custom tag constructors

yaml/pyyaml#287 -- Change default loader for yaml.add_constructor

yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver

Make FullLoader safer by removing python/object/apply from the default FullLoader

yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor

Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff

yaml/pyyaml#276 -- Fix logic for quoting special characters

Other PRs:

yaml/pyyaml#280 -- Update CHANGES for 5.1

5.1.2 (2019-07-30)

Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+

... (truncated)

Commits

58d0cb7 5.4 release

a60f7a1 Fix compatibility with Jython

ee98abd Run CI on PR base branch changes

ddf2033 constructor.timezone: _copy & deepcopy

fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers

a001f27 Fix for CVE-2020-14343

fe15062 Add 3.9 to appveyor file for completeness sake

1e1c7fb Add a newline character to end of pyproject.toml

0b6b7d6 Start sentences and phrases for capital letters

c976915 Shell code improvements

Additional commits viewable in compare view

opened by dependabot[bot] 11
Bump py from 1.6.0 to 1.10.0 in /elegantrl/elegantrl/envs/starcraft
Bumps py from 1.6.0 to 1.10.0.

opened by dependabot[bot] 10
Bump pyyaml from 3.13 to 5.4 in /elegantrl/elegantrl/envs/starcraft
Bumps pyyaml from 3.13 to 5.4.

opened by dependabot[bot] 10

Unable to train on many agents

Whenever I try to train on agents, I consistently get the error:

AttributeError: type object 'AgentXYZ' has no attribute 'if_off_policy'

For example, here is the error for AgentDQN:

Traceback (most recent call last):
  File "/home/momin/Documents/GitHub/ElegantRL/tests/test_training_agents.py", line 50, in test_should_create_arguments_for_each_agent
    Arguments(agent, env_func=gym.make, env_args=self.discrete_env_args)
  File "/home/momin/Documents/GitHub/ElegantRL/elegantrl/train/config.py", line 140, in __init__
    self.if_off_policy = agent.if_off_policy  # agent is on-policy or off-policy
AttributeError: type object 'AgentDQN' has no attribute 'if_off_policy'

I can confirm that this error affects the following agents: AgentDQN, AgentD3QN, AgentDDPG, AgentDiscretePPO, AgentDoubleDQN, AgentDuelingDQN, AgentModSAC, AgentPPO_H, AgentPPO, AgentSAC_H, AgentSAC, AgentTD3

@shixun404

bug

opened by hmomin 8

Cannot find reference 'ActorMAPPO' in 'net.py'

Hi, I want to use this library for multi-agent RL. In the AgentMAPPO.py file there are two undefined references, ActorMAPPO and CriticMAPPO, in this import:

from elegantrl.agents.net import ActorMAPPO, CriticMAPPO

How can I fix this?
bug

opened by josyulakrishna 7
Several issues found in the recent update
In train/config.py, the code calls self.get_if_off_policy(), but the function is actually named if_off_policy().

In train/config.py, the code calls self.agent_class.name, but the 'Arguments' object has no attribute 'agent_class'.

In train/run.py, the code calls args.agent_class(), but there is no agent_class in Arguments (similar to the issue above).

In train/run.py, the code accesses args.max_memo. Error message: 'Arguments' object has no attribute 'max_memo'

In train/run.py, the code accesses args.eval_env_func. Error message: 'Arguments' object has no attribute 'eval_env_func' 6...

bug good first issue
opened by richardhuo 6
SAC: why does the actor have a target network? Why does ModSAC have a Reliable lambda and TTUR?

Hello, I noticed that in the code the SAC actor also has a target_net. This does not appear in other implementations such as stable_baseline3 or spinning_up. Spinning Up: SAC also emphasizes:

Unlike in TD3, the next-state actions used in the target come from the current policy instead of a target policy.

Is the target network added in order to obtain a more stable actor?
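For reference, the Spinning Up statement quoted above corresponds to the standard SAC critic target, in which the next action is sampled from the current policy and only the critics have target copies. The sketch below uses plain Python callables as stand-ins for networks; all function and parameter names are hypothetical, and it does not reproduce or explain ElegantRL's own design choice.

```python
def sac_critic_target(reward, done, next_state, policy_sample, q1_target, q2_target,
                      gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * (1 - d) * (min Q_targ(s', a') - alpha * log pi(a'|s'))."""
    # a' and log pi(a'|s') come from the *current* policy, not from a target policy.
    next_action, next_logprob = policy_sample(next_state)
    # Clipped double-Q: use the smaller of the two target critics.
    next_q = min(q1_target(next_state, next_action), q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_logprob)


# Toy stand-ins (assumptions) so the function can be exercised without neural networks.
def policy_sample(state):
    return 0.5 * state, -1.0  # (action, log-probability)


def q1_target(state, action):
    return state + action


def q2_target(state, action):
    return state + 2.0 * action


target = sac_critic_target(reward=1.0, done=0.0, next_state=2.0,
                           policy_sample=policy_sample,
                           q1_target=q1_target, q2_target=q2_target)
terminal_target = sac_critic_target(reward=1.0, done=1.0, next_state=2.0,
                                    policy_sample=policy_sample,
                                    q1_target=q1_target, q2_target=q2_target)
```

With these toy values the next action is 1.0, the smaller target critic gives 3.0, and the entropy term adds 0.2, so the target is 1 + 0.99 * 3.2; for a terminal transition the target reduces to the reward.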
dicussion

opened by wsgdrfz 6
Fix dead links to `elegantrl_helloworld`

The links to ElegantRL/helloworld in the "Hello World" section at the latest documentation (https://elegantrl.readthedocs.io/en/latest/helloworld/intro.html) are broken. I believe it was renamed in the repo but the change wasn't reflected in the docs. This fixes the broken links to point to the new remote url https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/helloworld.

(Other docs pages that reference ElegantRL/helloworld/ might still be broken too (!!) )

opened by Siraj-Qazi 5

Fail to run tutorial_Isaac_Gym.py

Hello! Thank you for creating this brilliant library! This is so helpful on a personal project I am working on. I faced an error when trying to run tutorial_Isaac_Gym.py in the example folder:

Traceback (most recent call last):
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/meow/ElegantRL/elegantrl/train/run.py", line 162, in run
    env = build_env(args.env, args.env_func, args.env_args)
  File "/home/meow/ElegantRL/elegantrl/train/config.py", line 249, in build_env
    env = env_func(**kwargs_filter(env_func.__init__, env_args.copy()))
  File "/home/meow/ElegantRL/elegantrl/envs/IsaacGym.py", line 45, in __init__
    env: VecTask = isaac_task(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 69, in __init__
    super().__init__(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/base/vec_task.py", line 213, in __init__
    self.create_sim()
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 156, in create_sim
    self._create_envs(
  File "/home/meow/ElegantRL/elegantrl/envs/isaac_tasks/ant.py", line 199, in _create_envs
    self.joint_gears = to_torch(motor_efforts, device=self.device)
  File "/home/meow/Downloads/IsaacGym_Preview_3_Package/isaacgym/python/isaacgym/torch_utils.py", line 16, in to_torch
    return torch.tensor(x, dtype=dtype, device=device, requires_grad=requires_grad)
  File "/home/meow/anaconda3/envs/igym/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: out of memory

I'm running this on NVIDIA RTX3070TI with 8GB VRAM, and my CUDA version is:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

The same Ant example (with 2048 envs) worked when I tested it using the original Isaac Gym train.py. I'm fairly sure I had free VRAM (~7.2GB) when running this, but the CUDA out-of-memory error still appears. My torch version is 1.11.0.

I have also tried to reduce the number of envs, batch size, network size and other parameters, but the error remains.

Once again thank you so much for any possible help on this issue

bug

opened by planetbalileua 5

TypeError: __init__() missing 1 required positional argument: 'action_dim'

Running the library's own example reports an error. Could you please help find out what the problem is?

/opt/anaconda3/envs/elegant_RL/bin/python /home/lhs/PycharmProjects/elegant_RL/tutorial_BipedalWalker-v3.py
WARNING: env.action_space.high [1. 1. 1. 1.]
env_args = {
    'env_num': 1,
    'env_name': 'BipedalWalker-v3',
    'max_step': 1600,
    'state_dim': 24,
    'action_dim': 4,
    'if_discrete': False,
    'target_return': 300,
}
| Arguments Remove cwd: ./BipedalWalker-v3_PPO_0
Traceback (most recent call last):
  File "/home/lhs/PycharmProjects/elegant_RL/tutorial_BipedalWalker-v3.py", line 32, in <module>
    train_and_evaluate(args)
  File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/train/run.py", line 87, in train_and_evaluate
    agent = init_agent(args, gpu_id, env)
  File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/train/run.py", line 16, in init_agent
    agent = args.agent_class(args.net_dim, args.state_dim, args.action_dim, gpu_id=gpu_id, args=args)
  File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/agents/AgentPPO.py", line 38, in __init__
    AgentBase.__init__(self, net_dim, state_dim, action_dim, gpu_id, args)
  File "/opt/anaconda3/envs/elegant_RL/lib/python3.8/site-packages/elegantrl/agents/AgentBase.py", line 53, in __init__
    self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
TypeError: __init__() missing 1 required positional argument: 'action_dim'
bug

opened by haisheng666 4
Is there a bug in the __init__ of ActorFixSAC or AgentBase?
Running tutorial_LunarLanderContinuous_v2.ipynb raises an error:

ElegantRL\elegantrl\agents\AgentBase.py:58, in AgentBase.__init__(self, net_dim, state_dim, action_dim, gpu_id, args)
     56 cri_class = getattr(self, "cri_class", None)
     57 print(act_class)
---> 58 self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
     59 self.cri = cri_class(net_dim, state_dim, action_dim).to(self.device)
     60 if cri_class else self.act
     62 '''optimizer'''

TypeError: __init__() missing 1 required positional argument: 'action_dim'

3. On inspection, the class being called is ActorFixSAC in net.py, which is defined with def __init__(self, mid_dim, num_layer, state_dim, action_dim):
4. It has an extra parameter, num_layer, which is not used anywhere else, so it should probably be removed.
5. After I modified AgentBase's __init__ as follows, it finally ran successfully:

self.act = act_class(net_dim, self.num_layer, state_dim, action_dim).to(self.device)
self.cri = cri_class(net_dim, self.num_layer, state_dim, action_dim).to(self.device)
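The mismatch described in this report can be reproduced without PyTorch. The sketch below uses two hypothetical stand-in classes (not the real ActorFixSAC) to show how a three-argument positional call into a four-parameter __init__ yields this TypeError.

```python
class ActorThreeArgs:
    """Hypothetical class matching the call act_class(net_dim, state_dim, action_dim)."""

    def __init__(self, mid_dim, state_dim, action_dim):
        self.dims = (mid_dim, state_dim, action_dim)


class ActorFourArgs:
    """Hypothetical class with the extra num_layer parameter described in the report."""

    def __init__(self, mid_dim, num_layer, state_dim, action_dim):
        self.dims = (mid_dim, num_layer, state_dim, action_dim)


def build(act_class, net_dim, state_dim, action_dim):
    # Mirrors the positional call shown in the traceback.
    return act_class(net_dim, state_dim, action_dim)


ok = build(ActorThreeArgs, 256, 24, 4)

try:
    build(ActorFourArgs, 256, 24, 4)
    error_message = None
except TypeError as exc:
    # state_dim fills num_layer and action_dim fills state_dim,
    # so nothing is left to fill action_dim.
    error_message = str(exc)
```

Passing the dimensions by keyword instead (mid_dim=..., state_dim=..., action_dim=...) would make the same mismatch surface as a missing num_layer, which points more directly at the extra parameter.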
bug
opened by flhang 0
Issue with MADDPG and MATD3

Hi! I am trying to use ElegantRL for multi-agent RL training as it seems very well written.

I tried to use MADDPG or MATD3. But none of these agents seem to be runnable. For example, the construction method for AgentDDPG requires arguments: https://github.com/AI4Finance-Foundation/ElegantRL/blob/b447f3a04993e0ab8fc11017c1b20c6d560f493b/elegantrl/agents/AgentDDPG.py#L29

But the MADDPG or MATD3 implementation doesn't provide that. https://github.com/AI4Finance-Foundation/ElegantRL/blob/b447f3a04993e0ab8fc11017c1b20c6d560f493b/elegantrl/agents/AgentMADDPG.py#L43

There are also other places that don't seem to be compatible.

I wonder if this is a problem with the multi-agent implementations using a legacy version of the codebase. And is it possible to provide a minimal working demo for MADDPG or MATD3?

Thanks a lot!!
bug

opened by Gabr1e1 2
A policy update bug in AgentPPO?
The following code shows that the policy used to explore the env (to generate the action and logprob) is 'self.act':

get_action = self.act.get_action
convert = self.act.convert_action_for_env
for i in range(horizon_len):
    state = torch.as_tensor(ary_state, dtype=torch.float32, device=self.device)
    action, logprob = [t.squeeze() for t in get_action(state.unsqueeze(0))]

In the update function, the actions and the policy used to calculate 'new_logprob' are exactly the same as the ones above:

new_logprob, obj_entropy = self.act.get_logprob_entropy(state, action)
ratio = (new_logprob - logprob.detach()).exp()

I think 'ratio' will always be 1. Is this a bug, or is there something I misunderstand?
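A small worked example may help with this question. In PPO in general, the ratio does equal 1 on the first pass over freshly collected data, but its gradient with respect to the policy parameters is not zero, so the ratio moves away from 1 as soon as one gradient step has been taken (PPO usually performs several update passes per batch of data). The sketch below assumes a one-parameter Gaussian policy with a fixed standard deviation, a hand-derived gradient, and no clipping; it illustrates the general mechanism and is not ElegantRL's AgentPPO.

```python
import math


def gaussian_logprob(action, mean, std=1.0):
    return -0.5 * ((action - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def ratio_and_grad(action, advantage, mean, old_logprob, std=1.0):
    """Return ratio = exp(new_logprob - old_logprob) and d(ratio * advantage)/d(mean)."""
    ratio = math.exp(gaussian_logprob(action, mean, std) - old_logprob)
    # d(ratio)/d(mean) = ratio * d(logprob)/d(mean) = ratio * (action - mean) / std**2
    grad = advantage * ratio * (action - mean) / std ** 2
    return ratio, grad


mean = 0.0                    # policy parameter used during exploration
action, advantage = 1.0, 2.0  # one collected sample with a positive advantage
old_logprob = gaussian_logprob(action, mean)  # stored at collection time, then held fixed

ratio_before, grad_before = ratio_and_grad(action, advantage, mean, old_logprob)
mean += 0.1 * grad_before     # one gradient-ascent step on the surrogate objective
ratio_after, _ = ratio_and_grad(action, advantage, mean, old_logprob)
```

Here ratio_before is exactly 1 while grad_before is 2, and after the single step the mean is 0.2 and the ratio has grown to exp(0.18), roughly 1.197.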
dicussion
opened by huge123 0
Fix the AgentBase.__init__() for all the DRL algorithms in folder /elegantrl/agents
In the commit we still have

self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
self.cri = cri_class(net_dim, state_dim, action_dim).to(self.device) \
    if cri_class else self.act

The example still crashes for me.

Originally posted by @JonathanLehner in https://github.com/AI4Finance-Foundation/ElegantRL/issues/239#issuecomment-1352250265
refactoring
opened by Yonv1943 0

Bug: TypeError: Value after * must be an iterable, not int

Running FinRL_MultiCrypto_Trading.py raises the following error. Please fix it.

binance successfully connected
tech_indicator_list:  ['macd', 'rsi', 'cci', 'dx']
indicator:  macd
indicator:  rsi
indicator:  cci
indicator:  dx
Succesfully add technical indicators
Successfully transformed into array
| Arguments Remove cwd: ./ppo
Traceback (most recent call last):
  File "/Users/quran/SourceCode/easy_live/FinRL-Meta/tutorials/3-Practical/FinRL_MultiCrypto_Trading.py", line 69, in <module>
    train(
  File "/Users/quran/SourceCode/easy_live/FinRL-Meta/train.py", line 51, in train
    trained_model = agent.train_model(
  File "/Users/quran/SourceCode/easy_live/FinRL-Meta/agents/elegantrl_models.py", line 79, in train_model
    train_and_evaluate(model)
  File "/Users/quran/opt/anaconda3/envs/finrl-meta/lib/python3.10/site-packages/elegantrl/train/run.py", line 95, in train_and_evaluate
    agent = init_agent(args, gpu_id, env)
  File "/Users/quran/opt/anaconda3/envs/finrl-meta/lib/python3.10/site-packages/elegantrl/train/run.py", line 24, in init_agent
    agent = args.agent_class(args.net_dim, args.state_dim, args.action_dim, gpu_id=gpu_id, args=args)
  File "/Users/quran/opt/anaconda3/envs/finrl-meta/lib/python3.10/site-packages/elegantrl/agents/AgentPPO.py", line 40, in __init__
    AgentBase.__init__(self, net_dim, state_dim, action_dim, gpu_id, args)
  File "/Users/quran/opt/anaconda3/envs/finrl-meta/lib/python3.10/site-packages/elegantrl/agents/AgentBase.py", line 57, in __init__
    self.act = act_class(net_dim, state_dim, action_dim).to(self.device)
  File "/Users/quran/opt/anaconda3/envs/finrl-meta/lib/python3.10/site-packages/elegantrl/agents/net.py", line 397, in __init__
    self.net = build_mlp_net(dims=[state_dim, *dims, action_dim])
TypeError: Value after * must be an iterable, not int

Process finished with exit code 1
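
The final frame of the traceback suggests that `dims` was an `int` where the list literal `[state_dim, *dims, action_dim]` expects an iterable. The sketch below reproduces that error in plain Python and shows one possible workaround. `as_dims` is a hypothetical helper, not part of ElegantRL, and the sizes are made up; the actual fix in the library may differ.

```python
def as_dims(net_dim):
    """Return layer widths as a list, accepting an int or an iterable (hypothetical helper)."""
    if isinstance(net_dim, int):
        return [net_dim]
    return list(net_dim)


state_dim, action_dim = 8, 2  # hypothetical sizes
dims = 128                    # an int, as the traceback suggests

# Unpacking an int with * fails at runtime with a TypeError.
try:
    layer_sizes = [state_dim, *dims, action_dim]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Normalizing first makes both calling conventions work.
fixed_from_int = [state_dim, *as_dims(dims), action_dim]
fixed_from_list = [state_dim, *as_dims([64, 32]), action_dim]

print(error_message)
print(fixed_from_int, fixed_from_list)
```

On the caller's side, the equivalent workaround would be to pass a list of layer widths where the installed version expects one, assuming the network class accepts it.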

bug

opened by Praying 4

Releases(v0.3.5)

v0.3.5(Jun 26, 2022)


Owner

AI4Finance

Open Source Community in Finance.
