Xingyu-Lin/mbpo_pytorch

Overview

This is a PyTorch re-implementation of the model-based RL algorithm MBPO, as described in the paper "When to Trust Your Model: Model-Based Policy Optimization".

This code builds on an earlier NeurIPS reproducibility challenge paper, which reproduced the results with a TensorFlow ensemble model but reported a significant drop in performance with a PyTorch ensemble model. This code re-implements the ensemble dynamics model in PyTorch and closes that gap.

Reproduced results

The comparison was run on two tasks only; other tasks have not been tested. On those two tasks, the PyTorch implementation achieves performance similar to the official TensorFlow code.

Dependencies

MuJoCo 1.5 & MuJoCo 2.0

Usage

python main_mbpo.py --env_name 'Walker2d-v2' --num_epoch 300 --model_type 'pytorch'

python main_mbpo.py --env_name 'Hopper-v2' --num_epoch 300 --model_type 'pytorch'
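
To run both tasks back to back, the two commands above can be wrapped in a small launcher. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_all` are hypothetical, and only the flags shown in the commands above (`--env_name`, `--num_epoch`, `--model_type`) are used. It assumes it is run from the repository root, where `main_mbpo.py` lives.

```python
import shlex
import subprocess


def build_command(env_name, num_epoch=300, model_type="pytorch"):
    """Build the argument list for one run (hypothetical helper).

    Only the flags that appear in the usage commands above are used.
    """
    return [
        "python", "main_mbpo.py",
        "--env_name", env_name,
        "--num_epoch", str(num_epoch),
        "--model_type", model_type,
    ]


def run_all(env_names, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in sequence."""
    commands = [build_command(name) for name in env_names]
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True stops the loop if a training run exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(["Walker2d-v2", "Hopper-v2"], dry_run=False)` would start the two runs one after the other; with the default `dry_run=True` it only prints the commands.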

Reference

Official TensorFlow implementation: https://github.com/JannerM/mbpo
Code for the reproducibility challenge paper: https://github.com/jxu43/replication-mbpo
