DvD-TD3: Diversity via Determinants for TD3 version

An implementation of the paper "Effective Diversity in Population Based Reinforcement Learning".

Install

Install pbrl, then clone this repo and run the training script:

git clone https://github.com/jjccero/DvD_TD3
cd DvD_TD3
python train_dvd.py

Notes

Kernel Matrix

When the DPP kernel matrix uses a dot-product kernel (or cosine similarity, see loss.py) instead of an RBF kernel for its entries, a linear mapping can be applied so that each value lies between 0 and 1. The beta term makes the matrix positive definite.
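A minimal, dependency-free sketch of the idea above, not the code in loss.py. It assumes the linear mapping is (1 + cos) / 2 and that beta is added to the diagonal; both are assumptions made for illustration, as are the names `cosine_similarity` and `kernel_matrix`. Embeddings are assumed to be non-zero vectors, one per policy.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kernel_matrix(embeddings, beta=0.01):
    """Build a similarity matrix with off-diagonal entries in [0, 1].

    Assumption: cosine similarity is mapped linearly from [-1, 1] to [0, 1]
    with (1 + cos) / 2, and beta is added to the diagonal. The mapped matrix
    is positive semi-definite, so any beta > 0 makes it positive definite.
    """
    n = len(embeddings)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            K[i][j] = (1.0 + sim) / 2.0
            if i == j:
                K[i][j] += beta  # hypothetical placement of beta
    return K


# Two identical policies: without beta the matrix is singular.
same = [[1.0, 0.0], [1.0, 0.0]]
for beta in (0.0, 0.01):
    K = kernel_matrix(same, beta=beta)
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    print(f"beta={beta}: det={det}")
```

With two identical embeddings the unregularized matrix has determinant zero, which is the degenerate case that the diagonal term is meant to avoid.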

logdet

I'm not sure whether the logarithm of the determinant should be taken. The author believes that it does not matter. However, I find that the numerical instability of logdet may be the reason the gradients of the policy networks explode or vanish, so I optimize det instead of logdet.
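One way to see why logdet can be harder to optimize is a toy 2x2 kernel [[1, r], [r, 1]], where r is the similarity between two policies. Its determinant is 1 - r^2, so the gradient of det with respect to r is -2r, while the gradient of logdet is -2r / (1 - r^2). The sketch below only illustrates this toy case; it is not a diagnosis of this repo's training runs, and the function names are hypothetical.

```python
import math


def det_and_grad(r):
    """det of [[1, r], [r, 1]] and its derivative with respect to r."""
    det = 1.0 - r * r
    return det, -2.0 * r


def logdet_and_grad(r):
    """log det of [[1, r], [r, 1]] and its derivative with respect to r.

    The derivative divides by det, so it grows without bound as the two
    policies become more similar (r -> 1) and the matrix nears singularity.
    """
    det = 1.0 - r * r
    return math.log(det), -2.0 * r / det


for r in (0.5, 0.9, 0.999):
    _, g_det = det_and_grad(r)
    _, g_logdet = logdet_and_grad(r)
    print(f"r={r}: d(det)/dr={g_det:.3f}, d(logdet)/dr={g_logdet:.3f}")
```

In this toy case the det gradient stays bounded for every r in (-1, 1), whereas the logdet gradient blows up near r = 1.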

State/Reward Filter

To scale observations and rewards (when obs_norm=True), I compute a local RunningMeanStd for each policy. When the central Q-function is used, the global RunningMeanStd must be computed from the local RunningMeanStds.
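The combination of local statistics into a global one can be sketched with the standard parallel mean/variance merge. This is an illustrative, scalar-only version and an assumption about how such a merge could look: the class below is a hypothetical stand-in, not the RunningMeanStd used in this repo or in pbrl, and in practice the same update would be applied per observation dimension.

```python
from dataclasses import dataclass


@dataclass
class RunningMeanStd:
    """Hypothetical scalar running statistics (population variance)."""
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def update(self, batch):
        """Fold a batch of scalar samples into the running statistics."""
        n = len(batch)
        if n == 0:
            return
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        self.merge(RunningMeanStd(b_mean, b_var, n))

    def merge(self, other):
        """Combine two sets of statistics without revisiting the samples."""
        total = self.count + other.count
        if total == 0:
            return
        delta = other.mean - self.mean
        # Sum of squared deviations of the union of both sample sets.
        m2 = (self.var * self.count + other.var * other.count
              + delta ** 2 * self.count * other.count / total)
        self.mean += delta * other.count / total
        self.var = m2 / total
        self.count = total


def global_stats(local_stats):
    """Merge per-policy statistics into one global RunningMeanStd."""
    merged = RunningMeanStd()
    for stats in local_stats:
        merged.merge(stats)
    return merged


# Two policies, each with its own local filter.
policy_a, policy_b = RunningMeanStd(), RunningMeanStd()
policy_a.update([1.0, 2.0, 3.0])
policy_b.update([10.0, 20.0])
print(global_stats([policy_a, policy_b]))
```

The merged result matches the statistics computed over all samples at once, so the central Q-function can normalize its inputs consistently while each policy keeps its own filter.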

Thanks to Jack Parker-Holder (the author of the paper) for his help. Feel free to get in touch with me if you have any questions about this implementation.
