An Abstract Cyber Security Simulation and Markov Game for OpenAI Gym

Overview

gym-idsgame is a reinforcement learning environment for simulating attack and defense operations in an abstract network intrusion game. The environment extends the abstract model described in (Elderman et al. 2017). The model constitutes a two-player Markov game between an attacker agent and a defender agent that face each other in a simulated computer network. The reinforcement learning environment exposes an interface to a partially observed Markov decision process (POMDP) model of the Markov game. The interface can be used to train, simulate, and evaluate attack and defense policies against each other. Moreover, the repository contains code to reproduce baseline results for various reinforcement learning algorithms, including:

  • Tabular Q-learning
  • Neural-fitted Q-learning using the DQN algorithm
  • REINFORCE with baseline
  • Actor-Critic REINFORCE
  • PPO

Please use this BibTeX entry if you make use of this code in your publications (paper: https://arxiv.org/abs/2009.08120):

@INPROCEEDINGS{Hamm2011:Finding,
AUTHOR="Kim Hammar and Rolf Stadler",
TITLE="Finding Effective Security Strategies through Reinforcement Learning and
{Self-Play}",
BOOKTITLE="International Conference on Network and Service Management (CNSM 2020)
(CNSM 2020)",
ADDRESS="Izmir, Turkey",
DAYS=1,
MONTH=nov,
YEAR=2020,
KEYWORDS="Network Security; Reinforcement Learning; Markov Security Games",
ABSTRACT="We present a method to automatically find security strategies for the use
case of intrusion prevention. Following this method, we model the
interaction between an attacker and a defender as a Markov game and let
attack and defense strategies evolve through reinforcement learning and
self-play without human intervention. Using a simple infrastructure
configuration, we demonstrate that effective security strategies can emerge
from self-play. This shows that self-play, which has been applied in other
domains with great success, can be effective in the context of network
security. Inspection of the converged policies show that the emerged
policies reflect common-sense knowledge and are similar to strategies of
humans. Moreover, we address known challenges of reinforcement learning in
this domain and present an approach that uses function approximation, an
opponent pool, and an autoregressive policy representation. Through
evaluations we show that our method is superior to two baseline methods but
that policy convergence in self-play remains a challenge."
}

Included Environments

A rich set of configurations of the Markov game are registered as OpenAI Gym environments. The environments are specified and implemented in gym_idsgame/envs/idsgame_env.py; see also gym_idsgame/__init__.py.

minimal_defense

This is an environment where the agent plays the attacker in the Markov game and the defender follows the defend_minimal baseline defense policy. The defend_minimal policy entails that the defender always defends the attribute with the minimal value among all of its neighbors.
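
For intuition, the gist of defend_minimal can be sketched as an argmin over a matrix of defense values. The sketch below is illustrative only and not the repo's implementation; the function name and the (num_nodes, num_attributes) layout are assumptions.

import numpy as np

def defend_minimal(defense_values: np.ndarray) -> tuple:
    # Pick the (node, attribute) pair with the smallest defense value.
    # defense_values is assumed to have shape (num_nodes, num_attributes).
    node, attribute = np.unravel_index(np.argmin(defense_values), defense_values.shape)
    return int(node), int(attribute)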

Registered configurations:

  • idsgame-minimal_defense-v0
  • idsgame-minimal_defense-v1
  • idsgame-minimal_defense-v2
  • idsgame-minimal_defense-v3
  • idsgame-minimal_defense-v4
  • idsgame-minimal_defense-v5
  • idsgame-minimal_defense-v6
  • idsgame-minimal_defense-v7
  • idsgame-minimal_defense-v8
  • idsgame-minimal_defense-v9
  • idsgame-minimal_defense-v10
  • idsgame-minimal_defense-v11
  • idsgame-minimal_defense-v12
  • idsgame-minimal_defense-v13
  • idsgame-minimal_defense-v14
  • idsgame-minimal_defense-v15
  • idsgame-minimal_defense-v16
  • idsgame-minimal_defense-v17
  • idsgame-minimal_defense-v18
  • idsgame-minimal_defense-v19
  • idsgame-minimal_defense-v20

maximal_attack

This is an environment where the agent plays the defender and the attacker follows the attack_maximal baseline attack policy. The attack_maximal policy entails that the attacker always attacks the attribute with the maximum value among all of its neighbors.
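
For intuition, attack_maximal is the mirror image of the defend_minimal sketch above: an argmax over the attack values instead of an argmin over the defense values.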

Registered configurations:

  • idsgame-maximal_attack-v0
  • idsgame-maximal_attack-v1
  • idsgame-maximal_attack-v2
  • idsgame-maximal_attack-v3
  • idsgame-maximal_attack-v4
  • idsgame-maximal_attack-v5
  • idsgame-maximal_attack-v6
  • idsgame-maximal_attack-v7
  • idsgame-maximal_attack-v8
  • idsgame-maximal_attack-v9
  • idsgame-maximal_attack-v10
  • idsgame-maximal_attack-v11
  • idsgame-maximal_attack-v12
  • idsgame-maximal_attack-v13
  • idsgame-maximal_attack-v14
  • idsgame-maximal_attack-v15
  • idsgame-maximal_attack-v16
  • idsgame-maximal_attack-v17
  • idsgame-maximal_attack-v18
  • idsgame-maximal_attack-v19
  • idsgame-maximal_attack-v20

random_attack

This is an environment where the agent plays the defender and the attacker follows a random baseline attack policy.
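
For intuition, a random baseline simply samples a (node, attribute) pair uniformly. The sketch below is illustrative only; the function name and value ranges are assumptions, not the repo's API.

import numpy as np

def random_attack(num_nodes: int, num_attributes: int, rng: np.random.Generator) -> tuple:
    # Sample a (node, attribute) pair uniformly at random.
    return int(rng.integers(num_nodes)), int(rng.integers(num_attributes))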

Registered configurations:

  • idsgame-random_attack-v0
  • idsgame-random_attack-v1
  • idsgame-random_attack-v2
  • idsgame-random_attack-v3
  • idsgame-random_attack-v4
  • idsgame-random_attack-v5
  • idsgame-random_attack-v6
  • idsgame-random_attack-v7
  • idsgame-random_attack-v8
  • idsgame-random_attack-v9
  • idsgame-random_attack-v10
  • idsgame-random_attack-v11
  • idsgame-random_attack-v12
  • idsgame-random_attack-v13
  • idsgame-random_attack-v14
  • idsgame-random_attack-v15
  • idsgame-random_attack-v16
  • idsgame-random_attack-v17
  • idsgame-random_attack-v18
  • idsgame-random_attack-v19
  • idsgame-random_attack-v20

random_defense

An environment where the agent plays the attacker and the defender follows a random baseline defense policy.

Registered configurations:

  • idsgame-random_defense-v0
  • idsgame-random_defense-v1
  • idsgame-random_defense-v2
  • idsgame-random_defense-v3
  • idsgame-random_defense-v4
  • idsgame-random_defense-v5
  • idsgame-random_defense-v6
  • idsgame-random_defense-v7
  • idsgame-random_defense-v8
  • idsgame-random_defense-v9
  • idsgame-random_defense-v10
  • idsgame-random_defense-v11
  • idsgame-random_defense-v12
  • idsgame-random_defense-v13
  • idsgame-random_defense-v14
  • idsgame-random_defense-v15
  • idsgame-random_defense-v16
  • idsgame-random_defense-v17
  • idsgame-random_defense-v18
  • idsgame-random_defense-v19
  • idsgame-random_defense-v20

two_agents

This is an environment where neither the attacker nor the defender is part of the environment, i.e. it is intended for two-agent simulations or RL training. In the experiments folder you can find examples of using this environment to train a PPO attacker vs. a PPO defender, a DQN attacker vs. a REINFORCE defender, etc. A minimal sketch of such a two-agent interaction loop follows below.
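
The loop below is illustrative only: it assumes the two-agent variant consumes a joint (attacker_action, defender_action) tuple per step; consult gym_idsgame/envs/idsgame_env.py for the exact step() signature.

import gym

env = gym.make("idsgame-v0")
obs = env.reset()
done = False
while not done:
    # Placeholder policies: sample random attack and defense actions.
    attack_action = env.action_space.sample()
    defense_action = env.action_space.sample()
    # The joint-action layout below is an assumption; see idsgame_env.py.
    obs, reward, done, info = env.step((attack_action, defense_action))
env.close()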

Registered configurations:

  • idsgame-v0
  • idsgame-v1
  • idsgame-v2
  • idsgame-v3
  • idsgame-v4
  • idsgame-v5
  • idsgame-v6
  • idsgame-v7
  • idsgame-v8
  • idsgame-v9
  • idsgame-v10
  • idsgame-v11
  • idsgame-v12
  • idsgame-v13
  • idsgame-v14
  • idsgame-v15
  • idsgame-v16
  • idsgame-v17
  • idsgame-v18
  • idsgame-v19
  • idsgame-v20

Requirements

  • Python 3.5+
  • OpenAI Gym
  • NumPy
  • Pyglet (OpenGL 3D graphics)
  • GPU for 3D graphics acceleration (optional)
  • jsonpickle (for configuration files)
  • torch (for baseline algorithms)

Installation & Tests

# install from pip
pip install gym-idsgame==1.0.12

# local install from source
pip install -e gym-idsgame

# force upgrade dependencies
pip install -e gym-idsgame --upgrade

# clone and install from source
git clone https://github.com/Limmen/gym-idsgame
cd gym-idsgame
pip3 install -e .

# run unit tests
pytest

# run integration tests
cd experiments
make tests

Usage

The environment can be accessed like any other OpenAI Gym environment with gym.make(). Once the environment has been created, the API functions step(), reset(), render(), and close() can be used to train any RL algorithm of your choice.

import gym
from gym_idsgame.envs import IdsGameEnv
env_name = "idsgame-maximal_attack-v3"
env = gym.make(env_name)
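
Once the environment object exists, the classic Gym interaction loop applies (this release predates the gymnasium API, so step() is assumed to return the familiar 4-tuple). A minimal rollout sketch with a random placeholder policy:

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder: random action
    obs, reward, done, info = env.step(action)
env.close()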

The environment ships with implementations of several baseline algorithms, e.g. the tabular Q(0) algorithm; see the example code below.

import gym
from gym_idsgame.agents.training_agents.q_learning.q_agent_config import QAgentConfig
from gym_idsgame.agents.training_agents.q_learning.tabular_q_learning.tabular_q_agent import TabularQAgent
random_seed = 0
# Note: util.create_artefact_dirs and default_output_dir are helper functions from
# the repository's experiments scripts (see the experiments folder); they set up
# output directories for results.
util.create_artefact_dirs(default_output_dir(), random_seed)
q_agent_config = QAgentConfig(gamma=0.999, alpha=0.0005, epsilon=1, render=False, eval_sleep=0.9,
                              min_epsilon=0.01, eval_episodes=100, train_log_frequency=100,
                              epsilon_decay=0.9999, video=True, eval_log_frequency=1,
                              video_fps=5, video_dir=default_output_dir() + "/results/videos/" + str(random_seed), num_episodes=20001,
                              eval_render=False, gifs=True, gif_dir=default_output_dir() + "/results/gifs/" + str(random_seed),
                              eval_frequency=1000, attacker=True, defender=False, video_frequency=101,
                              save_dir=default_output_dir() + "/results/data/" + str(random_seed))
env_name = "idsgame-minimal_defense-v2"
env = gym.make(env_name, save_dir=default_output_dir() + "/results/data/" + str(random_seed))
attacker_agent = TabularQAgent(env, q_agent_config)
attacker_agent.train()
train_result = attacker_agent.train_result
eval_result = attacker_agent.eval_result

Manual Play

You can also run the environment in "manual control" mode:

import gym
from gym_idsgame.agents.manual_agents.manual_defense_agent import ManualDefenseAgent
random_seed = 0
env_name = "idsgame-random_attack-v2"
env = gym.make(env_name)
ManualDefenseAgent(env.idsgame_config)

Baseline Experiments

The experiments folder contains results, hyperparameters, and code to reproduce the reported results for this environment. For more information about each individual experiment, see the README in the experiments folder.

Clean All Experiment Results

cd experiments # cd into experiments folder
make clean

Run All Experiments (Takes a long time)

cd experiments # cd into experiments folder
make all

Run All Experiments For a specific environment (Takes a long time)

cd experiments # cd into experiments folder
make v0

Run a specific experiment

cd experiments/training/v0/random_defense/tabular_q_learning/ # cd into the experiment folder
make run

Clean a specific experiment

cd experiments/training/v0/random_defense/tabular_q_learning/ # cd into the experiment folder
make clean

Start TensorBoard for a specific experiment

cd experiments/training/v0/random_defense/tabular_q_learning/ # cd into the experiment folder
make tensorboard

Fetch Baseline Experiment Results

By default, the experiment results are not included when cloning the repo. To fetch the experiment results, install and set up git-lfs, then run:

git lfs fetch --all
git lfs pull

Author & Maintainer

Kim Hammar [email protected]

Copyright and license

LICENSE

MIT

(C) 2020, Kim Hammar
