Generalized Random Forests

Last update: Dec 25, 2022

Overview

A pluggable package for forest-based statistical estimation and inference. GRF currently provides non-parametric methods for least-squares regression, quantile regression, survival regression, and treatment effect estimation (optionally using instrumental variables), with support for missing values.

In addition, GRF supports 'honest' estimation (where one subset of the data is used for choosing splits, and another for populating the leaves of the tree), and confidence intervals for least-squares regression and treatment effect estimation.
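As a minimal sketch of how honest estimation and confidence intervals fit together for least-squares regression, the example below trains a `regression_forest` on simulated data and builds pointwise 95% intervals from the variance estimates. The simulated data, the seed, and the variable names (`forest`, `ci`) are illustrative assumptions; the 1.96 multiplier assumes approximate normality of the estimates.

```r
library(grf)

# Simulated data (illustrative assumption): Y depends only on the first covariate.
set.seed(1)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)

# honesty = TRUE is stated explicitly here for clarity: one subsample chooses
# the splits, and another populates the leaves.
forest <- regression_forest(X, Y, honesty = TRUE, num.trees = 2000)

# Test points that vary only along the first covariate.
X.test <- matrix(0, 5, p)
X.test[, 1] <- seq(-1, 1, length.out = 5)

# Request variance estimates alongside the point predictions.
pred <- predict(forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(pred$variance.estimates)

# Pointwise 95% confidence intervals under a normal approximation.
ci <- data.frame(
  x1 = X.test[, 1],
  estimate = pred$predictions,
  lower = pred$predictions - 1.96 * sigma.hat,
  upper = pred$predictions + 1.96 * sigma.hat
)
print(ci)
```

The intervals are pointwise, so they describe uncertainty at each test point separately rather than over the whole curve.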

Some helpful links for getting started:

The R package documentation contains usage examples and method reference.
The GRF reference gives a detailed description of the GRF algorithm and includes troubleshooting suggestions.
For community questions and answers around usage, see GitHub issues labelled 'question'.

The repository first started as a fork of the ranger repository -- we owe a great deal of thanks to the ranger authors for their useful and free package.

Installation

The latest release of the package can be installed through CRAN:

install.packages("grf")

conda users can install from the conda-forge channel:

conda install -c conda-forge r-grf

The current development version can be installed from source using devtools.

devtools::install_github("grf-labs/grf", subdir = "r-package/grf")

Note that to install from source, a compiler that implements C++11 is required (clang 3.3 or higher, or g++ 4.8 or higher). If installing on Windows, the RTools toolchain is also required.

Usage Examples

The following script demonstrates how to use GRF for heterogeneous treatment effect estimation. For examples of how to use other types of forests, such as those for quantile regression and causal effect estimation using instrumental variables, please consult the R documentation on the relevant forest methods (quantile_forest, instrumental_forest, etc.).

library(grf)

# Generate data.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)

# Train a causal forest.
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)

# Estimate treatment effects for the training data using out-of-bag prediction.
tau.hat.oob <- predict(tau.forest)
hist(tau.hat.oob$predictions)

# Estimate treatment effects for the test sample.
tau.hat <- predict(tau.forest, X.test)
plot(X.test[, 1], tau.hat$predictions, ylim = range(tau.hat$predictions, 0, 2), xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
average_treatment_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more trees is now recommended.
tau.forest <- causal_forest(X, Y, W, num.trees = 4000)
tau.hat <- predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(tau.hat$variance.estimates)
plot(X.test[, 1], tau.hat$predictions, ylim = range(tau.hat$predictions + 1.96 * sigma.hat, tau.hat$predictions - 1.96 * sigma.hat, 0, 2), xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 1)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).

# Generate new data.
n <- 4000
p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")

# Check whether causal forest predictions are well calibrated.
test_calibration(tau.forest)
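The other forest types mentioned at the start of this section follow the same train-then-predict pattern. The sketch below is not taken from the package documentation: the data-generating processes, the seed, and names such as `q.mat` and `iv.hat` are illustrative assumptions. Depending on the installed grf version, `predict` on a quantile forest may return either a matrix or a list with a `predictions` element, so the example handles both.

```r
library(grf)

set.seed(42)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 11, p)
X.test[, 1] <- seq(-2, 2, length.out = 11)

# --- Quantile regression ---
# The noise scale doubles when X[, 1] > 0, so the conditional quantiles
# should spread out on that side.
Y <- X[, 1] + rnorm(n) * (1 + (X[, 1] > 0))
q.forest <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))
q.hat <- predict(q.forest, X.test, quantiles = c(0.1, 0.5, 0.9))

# Handle both return shapes (list with $predictions, or a plain matrix).
q.mat <- if (is.list(q.hat)) q.hat$predictions else q.hat
print(round(q.mat, 2))

# --- Treatment effects with an instrumental variable ---
# Z is a binary instrument, U an unobserved confounder of W and Y.
Z <- rbinom(n, 1, 0.5)
U <- rnorm(n)
W <- rbinom(n, 1, plogis(2 * Z - 1 + U))
tau <- pmax(X[, 1], 0)
Y.iv <- tau * W + X[, 2] + U + rnorm(n)

iv.forest <- instrumental_forest(X, Y.iv, W, Z)
iv.hat <- predict(iv.forest, X.test)$predictions
print(round(iv.hat, 2))
```

Each row of `q.mat` corresponds to one test point and each column to one requested quantile, in the order given.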

Developing

In addition to providing out-of-the-box forests for quantile regression and causal effect estimation, GRF provides a framework for creating forests tailored to new statistical tasks. If you'd like to develop using GRF, please consult the algorithm reference and development guide.

Funding

Development of GRF is supported by the National Science Foundation, the Sloan Foundation, the Office of Naval Research (Grant N00014-17-1-2131) and Schmidt Futures.

References

Susan Athey and Stefan Wager. Estimating Treatment Effects with Causal Forests: An Application. Observational Studies, 5, 2019. [paper, arxiv]

Susan Athey, Julie Tibshirani and Stefan Wager. Generalized Random Forests. Annals of Statistics, 47(2), 2019. [paper, arxiv]

Yifan Cui, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests. 2020. [arxiv]

Rina Friedberg, Julie Tibshirani, Susan Athey, and Stefan Wager. Local Linear Forests. Journal of Computational and Graphical Statistics, 2020. [paper, arxiv]

Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager, and Julie Josse. Doubly Robust Treatment Effect Estimation with Missing Attributes. Annals of Applied Statistics, 14(3), 2020. [paper, arxiv]

Stefan Wager and Susan Athey. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 2018. [paper, arxiv]
