
Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In this repo I implemented two techniques for tackling the mentioned tradeoff. Methods include:

  • Epsilon Greedy (with different epsilons)
  • Thompson Sampling (also known as posterior sampling)

The reason for choosing only these two is to show the lower and upper ends of the spectrum: epsilon-greedy methods are the usual starting point for dealing with this tradeoff, while Thompson Sampling is considered a recent state of the art in this field.

ENV SPECIFICATIONS - A 10-armed testbed is simulated, the same as demonstrated in the Sutton-Barto book.
True reward distribution (here Action 2 is the best arm)
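As a concrete reference, here is a minimal sketch of such a testbed, assuming the standard Sutton-Barto setup: true action values q*(a) drawn from N(0, 1) and observed rewards drawn from N(q*(a), 1). The `Testbed` class name is illustrative, not taken from this repo's code:

```python
import numpy as np

class Testbed:
    """10-armed testbed as in Sutton & Barto:
    q*(a) ~ N(0, 1), observed reward ~ N(q*(a), 1)."""

    def __init__(self, k=10, rng=None):
        self.rng = rng or np.random.default_rng()
        self.q_star = self.rng.normal(0.0, 1.0, size=k)  # true action values
        self.best_action = int(np.argmax(self.q_star))   # optimal arm index

    def pull(self, action):
        # Reward is the arm's true value plus unit-variance Gaussian noise
        return self.rng.normal(self.q_star[action], 1.0)
```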

Comparison of Greedy (or Epsilon-Greedy) agents and TS

We used three different epsilons here for testing:

  • epsilon = 0 => Greedy Agent
  • epsilon = 0.01 => exploration with 1% probability
  • epsilon = 0.1 => exploration with 10% probability

and TS; a minimal sketch of the epsilon-greedy agent follows below.
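This is a hedged sketch of an epsilon-greedy agent using incremental sample-average value estimates; class and method names are illustrative, not the repo's actual API:

```python
import numpy as np

class EpsilonGreedyAgent:
    """Epsilon-greedy agent with incremental sample-average estimates.
    epsilon = 0 reduces to the purely greedy agent."""

    def __init__(self, k=10, epsilon=0.01, rng=None):
        self.rng = rng or np.random.default_rng()
        self.epsilon = epsilon
        self.q_est = np.zeros(k)   # estimated action values
        self.counts = np.zeros(k)  # pull counts per arm

    def select_action(self):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.q_est)))  # explore uniformly
        return int(np.argmax(self.q_est))                   # exploit best estimate

    def update(self, action, reward):
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        self.counts[action] += 1
        self.q_est[action] += (reward - self.q_est[action]) / self.counts[action]
```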

Results are averaged over 2500 independent runs of 1500 timesteps each.
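Below is a sketch of Gaussian Thompson Sampling and the averaging loop, assuming unit reward variance and a standard-normal prior on each arm's mean, so the posterior after n pulls with reward sum s is N(s / (n + 1), 1 / (n + 1)). It reuses the `Testbed` sketch above; all names are assumptions for illustration, not the repo's implementation:

```python
import numpy as np

class ThompsonSamplingAgent:
    """Gaussian Thompson Sampling under a known unit reward variance
    and an N(0, 1) prior on each arm's mean."""

    def __init__(self, k=10, rng=None):
        self.rng = rng or np.random.default_rng()
        self.counts = np.zeros(k)  # pulls per arm
        self.sums = np.zeros(k)    # reward sums per arm

    def select_action(self):
        # Draw one sample per arm from its posterior and act greedily on the draws
        post_mean = self.sums / (self.counts + 1.0)
        post_std = np.sqrt(1.0 / (self.counts + 1.0))
        samples = self.rng.normal(post_mean, post_std)
        return int(np.argmax(samples))

    def update(self, action, reward):
        self.counts[action] += 1
        self.sums[action] += reward


def run_experiment(make_agent, runs=2500, steps=1500, k=10):
    """Fraction of runs that pick the optimal arm at each timestep."""
    optimal = np.zeros(steps)
    for _ in range(runs):
        env = Testbed(k)
        agent = make_agent(k)
        for t in range(steps):
            a = agent.select_action()
            agent.update(a, env.pull(a))
            optimal[t] += (a == env.best_action)
    return optimal / runs
```

For example, `run_experiment(lambda k: EpsilonGreedyAgent(k, epsilon=0.01))` versus `run_experiment(lambda k: ThompsonSamplingAgent(k))` would produce the percentage-optimal-action curves compared below.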

Comparison


Percentage of actions selected for epsilon = 0.01 and TS

Conclusion -> epsilon = 0.01 can be considered the best of the eps-greedies: its curve keeps increasing, though slowly, and its percentage of optimal actions reaches around 80% in the later stages. Thompson Sampling, on the other hand, shows a significant improvement: it explores quickly and then exploits the optimal arm, with the percentage going up to almost 100% even very early!

In case you want to know more about TS, visit this Reference.

Updates:
