This is the repo for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement".

Overview

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

This is the repository for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement". The repository is structured as follows:

  • PyPruning: This repository contains the implementations for all pruning algorithms and can be installed as a regular python package and used in other projects. For more information have a look at the Readme file in PyPruning/Readme.md and its documentation in PyPruning/docs.
  • experiment_runner: This is a simple package / script which can be used to run multiple experiments in parallel on the same machine or distributed across many different machines. It can also be installed as a regular python package and used for other projects. For more information have a look at the Readme file in experiment_runner/Readme.md.
  • {adult, bank, connect, ..., wine-quality}: Each folder contains a script init.sh which downloads the necessary files and performs pre-processing if necessary (e.g. extracting archives).
  • init_all.sh: Iterates over all datasets and calls the respective init.sh files. Depending on your internet connection this may take some time.
  • environment.yml: Anaconda environment file which contains all dependencies. For more details see below.
  • LeafRefinement.py: This is the implementation of the LeafRefinement method. We initially implemented a more complex method which uses Proximal Gradient Descent to simultaneously learn the weights and refine leaf nodes. During our experiments we discovered that leaf-refinement by itself was sufficient and much simpler. We kept our old code, but implemented the LeafRefinement.py class for easier usage.
  • run.py: The script which executes the experiments. For more details see the examples below.
  • plot_results.py: This script is used to explore and display the results. It also creates the plots for the paper.

Getting everything ready

This git repository contains two submodules PyPruning and experiment_runner which need to be cloned first.

git clone --recurse-submodules git@github.com:sbuschjaeger/leaf-refinement-experiments.git

After the code has been obtained you need to install all dependencies. If you use Anaconda you can simply call

conda env create -f environment.yml

to create the environment LR. Then activate it with

conda activate LR

so that the following packages end up in that environment, and install the Python packages PyPruning and experiment_runner via pip:

pip install -e file:PyPruning
pip install -e file:experiment_runner

Lastly, you will need to get some data. If you are interested in a specific dataset you can use the accompanying init.sh script via

cd ${Dataset}
./init.sh

or if you want to download all datasets use

./init_all.sh

Depending on your internet connection this may take some time.

Running experiments

If everything worked as expected you should now be able to run the run.py script to prune some ensembles. This script has a fair number of parameters. See further below for a minimal working example.

  • n_jobs: Number of jobs / threads used for multiprocessing
  • base: Base learner used for experiments. Can be {RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier, HeterogenousForest}. Can be a list of arguments for multiple experiments.
  • nl: Maximum number of leaf nodes (corresponds to scikit-learn's max_leaf_nodes parameter)
  • dataset: Dataset used for experiment. Can be a list of arguments for multiple experiments.
  • n_estimators: Number of estimators trained for the base learner.
  • n_prune: Size of the pruned ensemble. Can be a list of arguments for multiple experiments.
  • xval: Number of cross validation runs (default is 5)
  • use_prune: If set then the script uses a train / prune / test split. If not set then the training data is also used for pruning.
  • timeout: Maximum number of seconds per run. If the runtime exceeds the provided value, stop execution (default is 5400 seconds)
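
The effect of use_prune can be pictured with a minimal sketch. It is not taken from run.py: the function name split_for_pruning, the 50/50 split fraction and the seed handling are assumptions made purely for illustration.

```python
import random

def split_for_pruning(train_idx, use_prune, prune_fraction=0.5, seed=0):
    """Return (fit_idx, prune_idx) for the training part of one fold.

    prune_fraction is a hypothetical value; this README does not state
    how run.py sizes the pruning set.
    """
    idx = list(train_idx)
    if not use_prune:
        # Without use_prune the training data is also used for pruning.
        return idx, idx
    rng = random.Random(seed)
    rng.shuffle(idx)
    n_prune = int(len(idx) * prune_fraction)
    # Disjoint parts: one for fitting the forest, one for pruning it.
    return idx[n_prune:], idx[:n_prune]

fit_idx, prune_idx = split_for_pruning(range(10), use_prune=True)
```

With use_prune set, the forest is fitted on fit_idx and pruned on the disjoint prune_idx; the test data of the fold is untouched in both cases.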

Note that all base ensembles for all cross validation splits of a dataset are trained before any of the pruning algorithms are used. If you want to evaluate many datasets / hyperparameter configurations in one run, this requires a lot of memory.
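
To get a feeling for how quickly the number of runs grows, the following sketch counts the configurations produced by list-valued arguments. It assumes that the lists are combined as a Cartesian product, which the parameter descriptions suggest but do not state explicitly; the helper name expand_grid is hypothetical and not part of run.py.

```python
from itertools import product

def expand_grid(**options):
    """Yield one dict per combination of the given option lists."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(expand_grid(
    dataset=["adult", "magic"],
    n_prune=[2, 4, 8],
    nl=[64, 128],
))
print(len(configs))  # 2 * 3 * 2 = 12
```

Under this assumption, two datasets with three pruned sizes and two leaf limits already give twelve configurations, before counting the cross validation splits.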

To train and prune forests on the magic dataset you can for example do

./run.py --dataset magic --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

The results are stored in ${Dataset}/results/${base}/${use_prune}/${date}/results.jsonl where ${Dataset} is the dataset (e.g. magic) and ${date} is the current time and date.
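
Since results.jsonl is a JSON Lines file, it can be read with the Python standard library alone. The sketch below only assumes one JSON object per line; which fields each object holds is not documented here, so it makes no use of specific field names. The function name load_results is hypothetical.

```python
import json

def load_results(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows
```

Calling load_results on one of the results.jsonl files and inspecting the keys of the first entry is a quick way to see which values were recorded.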

In order to reproduce the experiments from the paper you can call:

./run.py --dataset adult anura bank chess connect eeg elec postures japanese-vowels magic mozilla mnist nomao avila ida2016 satimage --n_estimators 256 --n_prune 2 4 8 16 32 64 128 256 --nl 64 128 256 512 1024 --n_jobs 128 --xval 5 --base RandomForestClassifier

Important: This call uses 128 threads and requires a substantial amount of memory (something in the range of 64 GB) to work.

Exploring the results

After you have run the experiments you can view the results with the plot_results.py script. We recommend using an interactive Python environment for that, such as Jupyter or VSCode with the ability to execute cells, but you should also be able to run the script as-is. The script is fairly well-commented, so please have a look at it for more details.
