CorrProxies - Optimizing Machine Learning Inference Queries with Correlative Proxy Models

Overview

CorrProxies

Declaration

This repo is for paper: Optimizing Machine Learning Inference Queries with Correlative Proxy Models.

Setup ENV

Quick Start

  1. We provide a fully ready Docker Image ready to use out-of-box.
  2. Optionally, you can also follow the steps to build your own testing environment.

The Provided Docker Environment

Steps to run the Docker Environment

  • Get the docker image from this link.
  • Load the docker image. docker load -i corrproxies-image.tar
  • Run the docker image in a container. docker run --name=CorrProxies -i -t -d corrproxies-image
    • it will return you the docker container ID, for example d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6.
  • Use CLI to control the container with the specific ID generated. docker exec -it d979af9a17f23345cb2894b22dc8527680acdfd7a7e1aaed6a7a28ea134e66e6 /bin/zsh

ENV Spec

File structure:

  • The home directory for CorrProxies locates at /home/CorrProxies.
  • The Python executable locates at /home/anaconda3/envs/condaenv/bin/python3.
  • The models locate at /home/CorrProxies/model.
  • The datasets locate at /home/CorrProxies/data.
  • The starting scripts locate at /home/CorrProxies/scripts.

Build Your Own Environment

This instruction is based on a clean distribution of [email protected]

  1. Install pre-requisites.

    apt-get update && apt-get install -y build-essential

  2. Install Anaconda.

    • wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh && bash Anaconda3-5.3.1-Linux-x86_64.sh -b -p
    • export PATH=" /bin/:$PATH"
  3. Install [email protected] with Anaconda3.

    conda create -n condaenv python=3.6.6

  4. Activate the newly installed Python ENV.

    conda activate condaenv

  5. Install dependencies with pip.

    pip3 install -r requirements.txt

  6. Install Java (openjdk-8) (for standford-nlp usage).

    apt-get install -y openjdk-8-jdk

Queries & Datasets

  • We use Twitter text dataset, COCO image dataset and UCF101 video dataset as our benchmark datasets. Please see this page for examples of detailed Queries and Datasets examples we use in our experiments.

  • After you setup the environment, either manually or using the docker image provided by us, the next step is to download the datasets.

    • To get the COCO dataset: cd /home/CorrProxies/data/image/coco && ./get_coco_dataset.sh
    • To get the UCF101 dataset: cd /home/CorrProxies/data/video/ucf101 && wget -c https://www.crcv.ucf.edu/data/UCF101/UCF101.rar && unrar x UCF101.rar.

Execution

Please pull the latest code before executing the code. Command cd /home/CorrProxies && git pull

Run Operators Individually

To run and see each operator we used in our experiment, simply execute python3 . For example: python3 operators/ml_operators/image_video_operators/video_activity_recognition.py.

Run Experiments

We use scripts/run.sh to start experiments. The script will take in command line arguments.

  • Text(Twitter)

    • Since we do not provide text dataset, we will skip the experiment.
  • Image(COCO)

    Example: ./scripts/run.sh -w 2 -t 1 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • Video(UCF101)

    Example: ./scripts/run.sh -w 2 -t 2 -i '1' -a 0.9 -s 3 -o 2 -e 1

  • arguments detail.

    • w int: experiment type in [1, 2, 3, 4] referring to /home/CorrProxies/ml_workflow/exps/WorkflowExp*.py;
    • t int: query type in [0, 1, 2]. Int 0, 1, 2 means queries on the Twitter, COCO, and UCF101 datasets, respectively;
    • i int: query index in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    • a float: query accuracy;
    • s int: scheme in [0, 1, 2, 3, 4, 5, 6]. Int 0, 1, 2, 3, 4, 5, 6 means 'ORIG', 'NS', 'PP', 'CORE', 'COREa', 'COREh' and 'REORDER' schemes, respectively;
    • o int: number of threads used in optimization phase;
    • e int: number of threads used in execution phase after generating an optimized plan.
Owner
ZhihuiYangCS
ZhihuiYangCS
A toolbox to iNNvestigate neural networks' predictions!

iNNvestigate neural networks! Table of contents Introduction Installation Usage and Examples More documentation Contributing Releases Introduction In

Maximilian Alber 1.1k Jan 05, 2023
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Facebook 15.4k Jan 07, 2023
Learn Machine Learning Algorithms by doing projects in Python and R Programming Language

Learn Machine Learning Algorithms by doing projects in Python and R Programming Language. This repo covers all aspect of Machine Learning Algorithms.

Ravi Chaubey 6 Oct 20, 2022
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
neurodsp is a collection of approaches for applying digital signal processing to neural time series

neurodsp is a collection of approaches for applying digital signal processing to neural time series, including algorithms that have been proposed for the analysis of neural time series. It also inclu

NeuroDSP 224 Dec 02, 2022
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application

Intel(R) Extension for Scikit-learn* Installation | Documentation | Examples | Support | FAQ With Intel(R) Extension for Scikit-learn you can accelera

Intel Corporation 858 Dec 25, 2022
Open MLOps - A Production-focused Open-Source Machine Learning Framework

Open MLOps - A Production-focused Open-Source Machine Learning Framework Open MLOps is a set of open-source tools carefully chosen to ease user experi

Data Revenue 590 Dec 28, 2022
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023
A complete guide to start and improve in machine learning (ML)

A complete guide to start and improve in machine learning (ML), artificial intelligence (AI) in 2021 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art

Louis-François Bouchard 3.3k Jan 04, 2023
Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn.

Repository Status for Scikit-learn Live webpage Auto updating website that tracks closed & open issues/PRs on scikit-learn/scikit-learn. Running local

Thomas J. Fan 6 Dec 27, 2022
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models.

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow scikit-learn's functionality wit

Soledad Galli 33 Dec 27, 2022
Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

1 Dec 22, 2021
Applied Machine Learning for Graduate Program in Computer Science (PPGCC)

Applied Machine Learning for Graduate Program in Computer Science (PPGCC) - Federal University of Santa Catarina

Jônatas Negri Grandini 1 Dec 22, 2021
MICOM is a Python package for metabolic modeling of microbial communities

Welcome MICOM is a Python package for metabolic modeling of microbial communities currently developed in the Gibbons Lab at the Institute for Systems

57 Dec 21, 2022
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
BioPy is a collection (in-progress) of biologically-inspired algorithms written in Python

BioPy is a collection (in-progress) of biologically-inspired algorithms written in Python. Some of the algorithms included are mor

Jared M. Smith 40 Aug 26, 2022
MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

Axel 1.4k Jan 06, 2023
A Python implementation of the Robotics Toolbox for MATLAB

Robotics Toolbox for Python A Python implementation of the Robotics Toolbox for MATLAB® GitHub repository Documentation Wiki (examples and details) Sy

Peter Corke 1.2k Jan 07, 2023
Penguins species predictor app is used to classify penguins species created using python's scikit-learn, fastapi, numpy and joblib packages.

Penguins Classification App Penguins species predictor app is used to classify penguins species using their island, sex, bill length (mm), bill depth

Siva Prakash 3 Apr 05, 2022