Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning [ArXiv]

Setup

Python 3.8.0
pip install -r req.txt
Mujoco 200 license

Main Files

main.py: main run file for model training
models.py: neural networks for policy and critic models
optim.py: second-order approximations for realizing the natural gradient
utils.py: helper functions

Reproducing Experiments

scripts/: bash training scripts formatted for compute canada/SLURM jobs
visualize/json: training hyperparameters for each experiment
visualize/csv: training results in .csv format
visualize/performance.py: (after training) view results & create .csv results
- best to run with VSCode ipython cells

Experiment Example

To run the baseline experiments:

Tune hparams: bash scripts/hparams/baseline.sh
- runs will be saved in runs/hparams_baseline/...
Extract best hparams from runs: python baseline_hparams.py
- the best hparams will be saved in visualize/json/baseline.json
Run training with hparams: bash scripts/baseline/diagonal.sh
- runs will be saved in runs/5e6_baseline/...
Run speed tests: bash scripts/speed/baseline.sh
- runs will be saved in runs/baseline_speed/...
View results: run interactive ipython in visualize/performance.py

# %%
runs_path = pathlib.Path("../runs/5e6_baseline/")
speed_runs_path = pathlib.Path("../runs/baseline_speed/")
name = "baseline"
baseline_data = analyze(runs_path, speed_runs_path)
baseline_df = mean_df(*baseline_data, name, save=True)

Second-order Approximation References

Implementations

Other

Code formatted with Black
Experiment runs format: runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}/...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts

scripts

visualize

visualize

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

main.py

main.py

models.py

models.py

optim.py

optim.py

req.txt

req.txt

utils.py

utils.py

Repository files navigation

Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning [ArXiv]

Setup

Main Files

Reproducing Experiments

Experiment Example

Second-order Approximation References

Implementations

Other

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
scripts		scripts
visualize		visualize
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
models.py		models.py
optim.py		optim.py
req.txt		req.txt
utils.py		utils.py

License

gebob19/natural-policy-gradient-reinforcement-learning

Folders and files

Latest commit

History

Repository files navigation

Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning [ArXiv]

Setup

Main Files

Reproducing Experiments

Experiment Example

Second-order Approximation References

Implementations

Other

About

Resources

License

Stars

Watchers

Forks

Languages