PyTorch implementation of TRPO

Try my implementation of PPO (a newer, better variant of TRPO) unless you need to use TRPO for some specific reason.

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

This code is mostly ported from the original implementation by John Schulman. In contrast to another implementation of TRPO in PyTorch, this implementation uses an exact Hessian-vector product instead of a finite-difference approximation.
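To make the difference concrete, here is a minimal sketch of the two ways to compute a Hessian-vector product in PyTorch. It is not the code from this repository: `hessian_vector_product`, `finite_difference_hvp`, and `toy_objective` are hypothetical names, and the toy objective only stands in for the quantity TRPO actually differentiates (typically the mean KL divergence between the old and new policies).

```python
import torch


def hessian_vector_product(f, x, v):
    """Exact Hessian-vector product H(x) @ v via double backpropagation."""
    # First pass: keep the graph so the gradient itself stays differentiable.
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    # Second pass: differentiating the scalar (grad . v) w.r.t. x gives H @ v.
    (hvp,) = torch.autograd.grad((grad * v).sum(), x)
    return hvp


def finite_difference_hvp(f, x, v, eps=1e-4):
    """Approximate H(x) @ v as (grad f(x + eps * v) - grad f(x)) / eps."""

    def grad_at(point):
        # Work on a fresh leaf tensor so each gradient is computed independently.
        point = point.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(f(point), point)
        return g

    return (grad_at(x + eps * v) - grad_at(x)) / eps


def toy_objective(x):
    # Hypothetical stand-in objective, chosen so the Hessian is easy to check by hand.
    return (x ** 4).sum() / 4 + x[0] * x[1]


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
v = torch.tensor([0.5, -1.0, 2.0], dtype=torch.float64)

exact = hessian_vector_product(toy_objective, x, v)
approx = finite_difference_hvp(toy_objective, x, v)
```

The exact version costs two backward passes and has no step size to tune. The finite-difference version depends on `eps`, which trades truncation error against floating-point round-off.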

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

python main.py --env-name "Reacher-v1"

Recommended hyper parameters

InvertedPendulum-v1: 5000

Reacher-v1, InvertedDoublePendulum-v1: 15000

HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000

Ant-v1, Humanoid-v1: 50000

Results

Results are more or less similar to those of the original code. Plots are coming soon.

Todo

Plots.
Collect data in multiple threads.
