An introduction of Markov decision process (MDP) and two algorithms that solve MDPs (value iteration, policy iteration) along with their Python implementations.

Overview

Markov Decision Process

A Markov decision process (MDP), by definition, is a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards. It consists of a set of states, a set of actions, a transition model, and a reward function. Here's an example.

This is a simple 4 x 3 environment, and each block represents a state. The agent can move left, right, up, or down from a state. The "intended" outcome occurs with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction. A collision with the wall or boundary results in no movement. The two terminal states have reward +1 and -1, respectively, and all other states have a constant reward (e.g. of -0.01).

Also, a MDP usually has a discount factor γ , a number between 0 and 1, that describes the preference of an agent for current rewards over future rewards.

Policy

A solution to a MDP is called a policy π(s). It specifies an action for each state s. In a MDP, we aim to find the optimal policy that yields the highest expected utility. Here's an example of a policy.

Value Iteration

Value iteration is an algorithm that gives an optimal policy for a MDP. It calculates the utility of each state, which is defined as the expected sum of discounted rewards from that state onward.

This is called the Bellman equation. For example, the utility of the state (1, 1) in the MDP example shown above is:

For n states, there are n Bellman equations with n unknowns (the utilities of states). To solve this system of equations, value iteration uses an iterative approach that repeatedly updates the utility of each state (starting from zero) until an equilibrium is reached (converge). The iteration step, called a Bellman update, looks like this:

Here's the pseudocode for calculating the utilities of states.

Then, after the utilities of states are calculated, we can use them to select an optimal action for each state.

valueIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

(Modified) Policy Iteration

Policy iteration is another algorithm that solves MDPs. It starts with a random policy and alternates the following two steps until the policy improvement step yields no change:

(1) Policy evaluation: given a policy, calculate the utility U(s) of each state s if the policy is executed;

(2) Policy improvement: update the policy based on U(s).

For the policy evaluation step, we use a simplified version of the Bellman equation to calculate the utility of each state.

For n states, there are n linear equations with n unknowns (the utilities of states), which can be solved in O(n^3) time. To make the algorithm more efficient, we can perform some number of simplified Bellman updates (simplified because the policy is fixed) to get an approximation of the utilities instead of calculating the exact solutions.

Here's the pseudocode for policy iteration.

policyIteration.py contains the Python implementation of this algorithm for the MDP example shown above.

Reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach (3rd ed.).

Owner
Yu Shen
UCLA '24
Yu Shen
API-key based security utilities for FastAPI, focused on simplicity of use

FastAPI simple security API key based security package for FastAPI, focused on simplicity of use: Full functionality out of the box, no configuration

Tolki 154 Jan 03, 2023
User-related REST API based on the awesome Django REST Framework

Django REST Registration User registration REST API, based on Django REST Framework. Documentation Full documentation for the project is available at

Andrzej Pragacz 399 Jan 03, 2023
A Python inplementation for OAuth2

OAuth2-Python Discord Inplementation for OAuth2 login systems. This is a simple Python 'app' made to inplement in your programs that require (shitty)

Prifixy 0 Jan 06, 2022
Plotly Dash plugin to allow authentication through 3rd party OAuth providers.

dash-auth-external Integrate your dashboards with 3rd parties and external OAuth providers. Overview Do you want to build a Plotly Dash app which pull

James Holcombe 15 Dec 11, 2022
Boilerplate/Starter Project for building RESTful APIs using Flask, SQLite, JWT authentication.

auth-phyton Boilerplate/Starter Project for building RESTful APIs using Flask, SQLite, JWT authentication. Setup Step #1 - Install dependencies $ pip

sandhika 0 Aug 03, 2022
JWT authentication for Pyramid

JWT authentication for Pyramid This package implements an authentication policy for Pyramid that using JSON Web Tokens. This standard (RFC 7519) is of

Wichert Akkerman 73 Dec 03, 2021
A simple Boilerplate to Setup Authentication using Django-allauth 🚀

A simple Boilerplate to Setup Authentication using Django-allauth, with a custom template for login and registration using django-crispy-forms.

Yasser Tahiri 13 May 13, 2022
Includes Automation and Personal Projects

Python Models, and Connect Forclient & OpenCv projects Completed Automation** Alarm (S

tushar malhan 1 Jan 15, 2022
FastAPI-Login tries to provide similar functionality as Flask-Login does.

FastAPI-Login FastAPI-Login tries to provide similar functionality as Flask-Login does. Installation $ pip install fastapi-login Usage To begin we hav

417 Jan 07, 2023
FastAPI extension that provides JWT Auth support (secure, easy to use, and lightweight)

FastAPI JWT Auth Documentation: https://indominusbyte.github.io/fastapi-jwt-auth Source Code: https://github.com/IndominusByte/fastapi-jwt-auth Featur

Nyoman Pradipta Dewantara 468 Jan 01, 2023
A JSON Web Token authentication plugin for the Django REST Framework.

Simple JWT Abstract Simple JWT is a JSON Web Token authentication plugin for the Django REST Framework. For full documentation, visit django-rest-fram

Simple JWT 3.3k Jan 01, 2023
Corsair_scan is a security tool to test Cross-Origin Resource Sharing (CORS).

Welcome to Corsair_scan Corsair_scan is a security tool to test Cross-Origin Resource Sharing (CORS) misconfigurations. CORS is a mechanism that allow

Santander Security Research 116 Nov 09, 2022
Accounts for Django made beautifully simple

Django Userena Userena is a Django application that supplies your Django project with full account management. It's a fully customizable application t

Bread & Pepper 1.3k Sep 18, 2022
Simple two factor authemtication system, made by me.

Simple two factor authemtication system, made by me. Honestly, i don't even know How 2FAs work I just used my knowledge and did whatever i could. Send

Refined 5 Jan 04, 2022
Django Authetication with Twitch.

Django Twitch Auth Dependencies Install requests if not installed pip install requests Installation Install using pip pip install django_twitch_auth A

Leandro Lopes Bueno 1 Jan 02, 2022
Extending the Django authentication system with a phone verification step.

Extending the Django authentication system with a phone verification step.

Miguel Grinberg 50 Dec 04, 2022
Login-python - Login system made in Python, using native libraries

login-python Sistema de login feito 100% em Python, utilizando bibliotecas nativ

Nicholas Gabriel De Matos Leal 2 Jan 28, 2022
Skit-auth - Authorization for skit.ai's platform

skit-auth This is a simple authentication library for Skit's platform. Provides

Skit 3 Jan 08, 2022
Simple Login - Login Extension for Flask - maintainer @cuducos

Login Extension for Flask The simplest way to add login to flask! Top Contributors Add yourself, send a PR! How it works First install it from PyPI. p

Flask Extensions 181 Jan 01, 2023
This Python based program checks your CC Stripe Auth 1$ Based Checker

CC-Checker This Python based program checks your CC Stripe Auth 1$ Based Checker About Author Coded by xBlackx Reach Me On Telegram @xBlackx_Coder jOI

xBlackxCoder 11 Nov 20, 2022