Code for the paper "There is no Double-Descent in Random Forests"

This repository contains the code to run the experiments for our paper called "There is no Double-Descent in Random Forest". In the paper we highlight experiments on the 5 different datasets (adult, bank, eeg, magic, nomao), but this implementation also supports more datasets out of the box. Most of the code should be somewhat commented and self-explanatory given the two caveats below. To run the experiments simply clone this repository

[email protected]:sbuschjaeger/rf-double-descent.git

(Optional) Build the conda environment and activate it:

conda env creat -f environment.yml --force
conda activate rfdd

Run experiments on the adult dataset with M = 256 trees over a 5 fold cross validation with different number of max_nodes with 96 threads:

./run.py -x 5 -M 256 --n_jobs 96 --max_nodes 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 -d adult

Important 1: This will run all experiments with 96 threads. The experiments are executed in a multiprocessing.Pool environment which means that the entire dataset is copied for each cross-validation run. Hence this may take a decent amount of memory (up to 200GB) and some time.

Important 2: The command-line argument n_jobs only determines the total number of threads in the processing pool, but not the total number of threads used by this script. We currently supply n_jobs = n_jobs_per_forest = None to scikit-learns RandomForestClassifier when fitting the (initial) RF. Hence, scikit-learn uses a heuristic to choose the number of jobs used for fitting the RF. If required, then you can set n_jobs_per_forest in the script manually (line 132).

Important 3: Datasets which are not found in the tempfolder (issued by tempfile.gettmpdir() which likely points to /tmp on Linux systems) are automatically downloaded. If you have already downloaded the datasets or you simply do not like the temp folder you can set this via --tmpdir ${your_new_tmp_dir}.

Plot the results on the adult dataset and store the them in the current folder:

./plot.py -d adult -o .

Alternativley, plot.py is also divided into execution cells which you can run via an inline interpreter (e.g. VSCode or a Juypter Notebook).

Code for the paper "There is no Double-Descent in Random Forests"

Related tags

Overview

Code for the paper "There is no Double-Descent in Random Forests"

Owner

EASY - Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients.

Official implementation of "Robust channel-wise illumination estimation"

Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

PyTorch implementation for our paper Learning Character-Agnostic Motion for Motion Retargeting in 2D, SIGGRAPH 2019

An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches

Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining

Data Augmentation with Variational Autoencoders

Paper Title: Heterogeneous Knowledge Distillation for Simultaneous Infrared-Visible Image Fusion and Super-Resolution

A way to store images in YAML.

《Improving Unsupervised Image Clustering With Robust Learning》(2020)

Fast and Easy Infinite Neural Networks in Python

Code for our SIGCOMM'21 paper "Network Planning with Deep Reinforcement Learning".

Code for "Adversarial Training for a Hybrid Approach to Aspect-Based Sentiment Analysis

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

PyTorch implementation of our ICCV 2021 paper Intrinsic-Extrinsic Preserved GANs for Unsupervised 3D Pose Transfer.

Adds timm pretrained backbone to pytorch's FasterRcnn model

A task-agnostic vision-language architecture as a step towards General Purpose Vision

Individual Treatment Effect Estimation

Randstad Artificial Intelligence Challenge (powered by VGEN). Soluzione proposta da Stefano Fiorucci (anakin87) - primo classificato

Diffusion Probabilistic Models for 3D Point Cloud Generation (CVPR 2021)