batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code for our paper "The Impact of Batch Learning in Stochastic Bandits", accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

The repository lets you run simulations or replay logged datasets in a sequential batch manner: the agent interacts with the environment sequentially, but responses are grouped into batches and observed only at the end of each batch. Broadly speaking, sequential batch learning is a more general way of learning that covers both the offline and online settings as special cases, bringing together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support the batch_size parameter: the length of a period during which the agent interacts with the environment "blindly", without observing responses. Although the batch setting is a property of the environment, this limitation is treated from the policy perspective. That is, rather than an online agent working with a batch environment, a batch policy interacts with an online environment.
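To make the policy-side view of batching concrete, here is a minimal sketch of an ε-greedy policy that buffers feedback and updates its estimates only once a full batch has been collected. It is an illustration only and does not use the repository's code: the class BatchEpsilonGreedy, the helper run, and the Bernoulli reward model are assumptions; only the batch_size parameter name comes from the description above.

```python
import random


class BatchEpsilonGreedy:
    """Hypothetical epsilon-greedy policy that learns only at batch boundaries."""

    def __init__(self, n_arms, batch_size, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.batch_size = batch_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms    # number of processed pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.buffer = []              # feedback not yet visible to the policy

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.values[a])

    def observe(self, arm, reward):
        # Store the response; estimates stay frozen until the batch is full.
        self.buffer.append((arm, reward))
        if len(self.buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        # End of batch: process all buffered responses at once.
        for arm, reward in self.buffer:
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.buffer.clear()


def run(agent, arm_means, n_steps, seed=1):
    """Simulate an online Bernoulli environment (an assumed reward model)."""
    env_rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        arm = agent.select_arm()
        reward = 1.0 if env_rng.random() < arm_means[arm] else 0.0
        agent.observe(arm, reward)
        total += reward
    return total
```

In this sketch, batch_size=1 recovers the fully online setting, while a batch_size equal to the horizon means the policy never updates during the run.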

The project is built on the RL-Glue framework, which provides an interface for connecting agents, environments, and experiment programs. Note that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε-greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical guarantees
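As an illustration of evaluating a policy on logged data, the sketch below uses the replay (rejection-sampling) idea: a logged event counts only when the evaluated policy picks the same arm that was logged. Whether the repository's offline evaluator works exactly this way is an assumption; the function replay_evaluate and the event layout (context, logged_arm, reward) are hypothetical.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a fixed policy's average reward from logged bandit data.

    policy: callable mapping a context to an arm index (hypothetical interface).
    logged_events: iterable of (context, logged_arm, reward) tuples.
    Returns (average reward over matched events, number of matched events).
    """
    matched = 0
    total_reward = 0.0
    for context, logged_arm, reward in logged_events:
        # Keep the event only if the policy agrees with the logged action;
        # otherwise the reward for the policy's choice is unknown.
        if policy(context) == logged_arm:
            matched += 1
            total_reward += reward
    average = total_reward / matched if matched else float("nan")
    return average, matched


# Toy usage: the policy plays the arm named by the first context feature.
events = [((0,), 0, 1.0), ((0,), 1, 0.0), ((1,), 1, 1.0), ((1,), 0, 0.0)]
average, matched = replay_evaluate(lambda ctx: ctx[0], events)
```

This kind of estimate is generally trusted only when the logged arms were chosen uniformly at random; with other logging policies the matched events are a biased sample.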

Owner

Danil Provodin
