Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In this repo I implemented two techniques for tackling the exploration-exploitation tradeoff. The methods are:

Epsilon Greedy (With different epsilons)
Thompson Sampling (also known as posterior sampling)

These two were chosen to show the lower and upper ends of the range: epsilon-greedy is a common starting point for dealing with this tradeoff, while Thompson Sampling is considered a recent state-of-the-art approach in this field.
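To make the idea of posterior sampling concrete, here is a minimal sketch of a Thompson Sampling agent. It assumes Gaussian rewards with unit variance and an N(0, 1) prior on each arm's mean; the class name `GaussianThompsonAgent` and its methods are hypothetical and are not taken from the repo's code.

```python
import random


class GaussianThompsonAgent:
    """Thompson Sampling sketch: unit-variance Gaussian rewards, N(0, 1) prior per arm."""

    def __init__(self, n_arms, rng=None):
        self.rng = rng or random.Random()
        self.counts = [0] * n_arms          # number of pulls per arm
        self.reward_sums = [0.0] * n_arms   # total reward collected per arm

    def posterior(self, arm):
        # Conjugate update: prior precision 1 plus one unit of precision per observation.
        precision = 1.0 + self.counts[arm]
        mean = self.reward_sums[arm] / precision
        std = precision ** -0.5
        return mean, std

    def select_action(self):
        # Draw one sample from each arm's posterior and play the arm with the largest draw.
        samples = [self.rng.gauss(*self.posterior(a)) for a in range(len(self.counts))]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.reward_sums[arm] += reward
```

Arms that have been pulled rarely keep a wide posterior, so they still win the draw now and then; that is where the exploration comes from, without any epsilon parameter.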

ENV SPECIFICATIONS - A 10-armed testbed is simulated, the same as the one demonstrated in the Sutton-Barto book.
True reward distribution (here Action-2 is best)
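A testbed of this kind can be sketched as follows. This assumes the usual Sutton-Barto setup, where each arm's true value is drawn from N(0, 1) and each reward is the true value plus unit-variance Gaussian noise; the class name `TenArmedTestbed` is hypothetical, and the repo's actual environment may differ.

```python
import random


class TenArmedTestbed:
    """Sketch of a k-armed Gaussian bandit in the style of the Sutton-Barto testbed."""

    def __init__(self, n_arms=10, rng=None):
        self.rng = rng or random.Random()
        # Assumption: true action values are drawn once from a standard normal.
        self.true_values = [self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        self.optimal_action = max(range(n_arms), key=self.true_values.__getitem__)

    def step(self, action):
        # Reward is the arm's true value plus unit-variance Gaussian noise.
        return self.rng.gauss(self.true_values[action], 1.0)
```

Passing a seeded `random.Random` instance makes the true values reproducible, which helps when the same reward distribution has to be reused across agents.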

Comparison of Greedy (or Epsilon-Greedy) Agents and TS

We used three different epsilons here for testing, i.e.:

epsilon = 0 => Greedy Agent
epsilon = 0.01 => exploration with 1% probability
epsilon = 0.1 => exploration with 10% probability

and TS
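The role of epsilon in the three settings above can be sketched as below. This is a minimal illustration that assumes sample-average value estimates and random tie-breaking; the function names are hypothetical and not taken from the repo.

```python
import random


def epsilon_greedy_action(estimates, epsilon, rng=random):
    """Pick a random arm with probability epsilon, otherwise a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    best = max(estimates)
    # Break ties between equally good arms at random.
    return rng.choice([a for a, q in enumerate(estimates) if q == best])


def update_estimate(estimates, counts, action, reward):
    """Incremental sample-average update of the chosen arm's value estimate."""
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
```

With `epsilon = 0` the random branch is never taken, so the agent is purely greedy; with `epsilon = 0.1` roughly one step in ten is a uniformly random pull.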

Averaged over 2500 independent runs of 1500 timesteps each
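The averaging step can be sketched like this for an epsilon-greedy agent. It assumes a fixed list of true arm values that is reused in every run, unit-variance Gaussian rewards, and ties broken in favour of the lowest arm index; the function name and parameters are hypothetical, and the repo's own experiment loop may be organised differently.

```python
import random


def percent_optimal_action(true_values, epsilon, n_runs=2500, n_steps=1500, seed=0):
    """Percentage of runs that chose the best arm at each timestep."""
    rng = random.Random(seed)
    n_arms = len(true_values)
    best_arm = max(range(n_arms), key=true_values.__getitem__)
    optimal_counts = [0] * n_steps
    for _ in range(n_runs):
        # Each run starts from scratch with zeroed estimates.
        estimates = [0.0] * n_arms
        counts = [0] * n_arms
        for t in range(n_steps):
            if rng.random() < epsilon:
                action = rng.randrange(n_arms)
            else:
                action = max(range(n_arms), key=estimates.__getitem__)
            reward = rng.gauss(true_values[action], 1.0)
            counts[action] += 1
            estimates[action] += (reward - estimates[action]) / counts[action]
            optimal_counts[t] += action == best_arm
    return [100.0 * c / n_runs for c in optimal_counts]
```

Calling it once per epsilon value gives one curve per agent. In pure Python the full 2500 x 1500 setting is slow, so smaller `n_runs` and `n_steps` are convenient for a quick check.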

Comparison

Percentage of actions selected for epsilon = 0.01 and TS

Conclusion -> epsilon = 0.01 can be considered the best of the epsilon-greedy agents: its percentage of optimal actions keeps increasing, although slowly, and reaches around 80% in the later stages. Thompson Sampling, on the other hand, shows a significant improvement over these results: it explores quickly and then exploits the optimal action, with the percentage of optimal actions reaching almost 100% very early.

In case you want to know more about TS, visit this Reference.

Owner

Aman Mishra
