SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

Datasets | Website | Raw Data | OpenReview

Christopher Yeh, Chenlin Meng, Sherrie Wang, Anne Driscoll, Erik Rozi, Patrick Liu, Jihyeon Lee, Marshall Burke, David B. Lobell, Stefano Ermon

California Institute of Technology, Stanford University, and UC Berkeley

SustainBench is a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. Datasets for 11 of the 15 tasks are released publicly for the first time. Our goals for SustainBench are to

  1. lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs;
  2. provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and
  3. encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.

Overview

SustainBench provides datasets and standardized benchmarks for 15 SDG-related tasks, listed below. Details for each dataset and task can be found in our paper and on our website. The raw data can be downloaded from Google Drive and is released under a CC-BY-SA 4.0 license.

  • SDG 1: No Poverty
    • Task 1A: Predicting poverty over space
    • Task 1B: Predicting change in poverty over time
  • SDG 2: Zero Hunger
  • SDG 3: Good Health and Well-being
  • SDG 4: Quality Education
    • Task 4A: Women educational attainment
  • SDG 6: Clean Water and Sanitation
  • SDG 13: Climate Action
  • SDG 15: Life on Land
    • Task 15A: Feature learning for land cover classification
    • Task 15B: Out-of-domain land cover classification

Dataloaders

For each dataset, we provide Python dataloaders that load the data as PyTorch tensors. Please see the sustainbench folder as well as our website for detailed documentation.
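As a rough illustration of how such a dataloader plugs into PyTorch, the sketch below shows a generic map-style dataset. It is a hypothetical example, not the sustainbench package's actual API (see the sustainbench folder for that). The class name ArrayDataset and its transform parameter are assumptions made for illustration. PyTorch's DataLoader only requires a map-style dataset to implement __len__ and __getitem__.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class ArrayDataset:
    """Hypothetical map-style dataset over paired inputs and labels.

    Implementing __len__ and __getitem__ is all torch.utils.data.DataLoader
    needs from a map-style dataset; the DataLoader then batches the samples.
    """

    def __init__(
        self,
        inputs: Sequence[Any],
        labels: Sequence[Any],
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        if len(inputs) != len(labels):
            raise ValueError("inputs and labels must have the same length")
        self.inputs = inputs
        self.labels = labels
        self.transform = transform  # e.g. normalization or augmentation

    def __len__(self) -> int:
        return len(self.inputs)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        x = self.inputs[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.labels[idx]
```

Suppose the inputs are NumPy image arrays. Then DataLoader(ArrayDataset(images, labels), batch_size=64, shuffle=True) yields batches of tensors, because the default collate function stacks NumPy arrays and Python numbers into tensors.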

Running Baseline Models

We provide baseline models for many of the benchmark tasks included in SustainBench. See the baseline_models folder for the code and detailed instructions to reproduce our results.

Dataset Preprocessing

11 of the 15 SustainBench benchmark tasks involve data that is being publicly released for the first time. We release the processed versions of our datasets on Google Drive. We also provide code and detailed instructions for how we preprocessed the datasets in the dataset_preprocessing folder. You do NOT need anything from the dataset_preprocessing folder to download the processed datasets or to run our baseline models.

Computing Requirements

This code was tested on a system with the following specifications:

  • operating system: Ubuntu 16.04.7 LTS
  • CPU: Intel(R) Xeon(R) CPU E5-2620 v4
  • memory (RAM): 125 GB
  • disk storage: 5 TB
  • GPU: NVIDIA P100 GPU

The main software requirements are Python 3.7 with TensorFlow r1.15, PyTorch 1.9, and R 4.1. The complete list of required packages and libraries is given in the two conda environment YAML files (env_create.yml and env_bench.yml), which are meant to be used with conda (version 4.10). See the Miniconda documentation for instructions on installing conda. Once conda is installed, run one of the following commands to set up the desired conda environment:

conda env update -f env_create.yml --prune
conda env update -f env_bench.yml --prune

The conda environment files default to CPU-only packages. If you have a GPU, comment or uncomment the appropriate lines in the environment files; you may also need to install CUDA 10 or 11 and cuDNN 7.
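After activating an environment, you may want a quick sanity check that it matches the requirements above. The helper below is a hypothetical sketch and is not part of the repo. It assumes the frameworks are importable as tensorflow and torch, and it checks only the Python side (R is not checked). GPU detection goes through PyTorch's torch.cuda.is_available().

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Assumed import names of the main Python frameworks listed above (R is not checked).
FRAMEWORKS = ("tensorflow", "torch")


def importable(modules: Iterable[str] = FRAMEWORKS) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def is_python_37() -> bool:
    """The environments target Python 3.7."""
    return sys.version_info[:2] == (3, 7)


def torch_sees_gpu() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works in CPU-only setups without PyTorch
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("Python 3.7:", is_python_37())
    print("Importable:", importable())
    print("PyTorch GPU:", torch_sees_gpu())
```

A False for "PyTorch GPU" on a GPU machine usually means the CPU-only package lines are still active in the environment file, or that CUDA is not installed.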

Code Formatting and Type Checking

This repo uses flake8 for Python linting and mypy for type-checking. Configuration files for each are included in this repo: .flake8 and mypy.ini.

To run either code linting or type checking, set the current directory to the repo root directory. Then run any of the following commands:

# LINTING
# =======

# entire repo
flake8

# all modules within utils directory
flake8 utils

# a single module
flake8 path/to/module.py

# a jupyter notebook - ignore these error codes, in addition to the ignored codes in .flake8:
# - E305: expected 2 blank lines after class or function definition
# - E402: Module level import not at top of file
# - F404: from __future__ imports must occur at the beginning of the file
# - W391: Blank line at end of file
jupyter nbconvert path/to/notebook.ipynb --stdout --to script | flake8 - --extend-ignore=E305,E402,F404,W391


# TYPE CHECKING
# =============

# entire repo
mypy .

# all modules within utils directory
mypy -p utils

# a single module
mypy path/to/module.py

# a jupyter notebook
mypy -c "$(jupyter nbconvert path/to/notebook.ipynb --stdout --to script)"
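To check many notebooks at once, for example in a CI job, the two notebook pipelines above can also be scripted. The helper below is a hypothetical sketch and is not part of the repo. It assumes that jupyter, flake8, and mypy are on PATH, and that it is run from the repo root so that .flake8 and mypy.ini are picked up. It mirrors the shell commands exactly: nbconvert writes the script to stdout, which is piped to "flake8 -" and passed to "mypy -c".

```python
import subprocess
from typing import List

# Codes ignored for notebooks on top of those in .flake8 (same list as the shell command).
NOTEBOOK_IGNORES = ["E305", "E402", "F404", "W391"]


def nbconvert_cmd(notebook: str) -> List[str]:
    """Command that converts a notebook to a Python script on stdout."""
    return ["jupyter", "nbconvert", notebook, "--stdout", "--to", "script"]


def flake8_cmd() -> List[str]:
    """Command that lints source code read from stdin ("-")."""
    return ["flake8", "-", "--extend-ignore=" + ",".join(NOTEBOOK_IGNORES)]


def mypy_cmd(source: str) -> List[str]:
    """Command that type-checks a program passed as a string."""
    return ["mypy", "-c", source]


def notebook_to_script(notebook: str) -> str:
    """Run nbconvert and return the converted script text."""
    result = subprocess.run(nbconvert_cmd(notebook), check=True,
                            capture_output=True, text=True)
    return result.stdout


def check_notebook(notebook: str) -> bool:
    """Lint and type-check one notebook; True if both tools exit with status 0."""
    script = notebook_to_script(notebook)
    lint = subprocess.run(flake8_cmd(), input=script, text=True)
    types = subprocess.run(mypy_cmd(script))
    return lint.returncode == 0 and types.returncode == 0
```

Both tools always run, so a single invocation reports lint and type errors together. For example, all(check_notebook(p) for p in glob.glob("**/*.ipynb", recursive=True)) could gate a CI step.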

Citation

Please cite this article as follows, or use the BibTeX entry below.

C. Yeh, C. Meng, S. Wang, A. Driscoll, E. Rozi, P. Liu, J. Lee, M. Burke, D. B. Lobell, and S. Ermon, "SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning," in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), Dec. 2021. [Online]. Available: https://openreview.net/forum?id=5HR3vCylqD.

@inproceedings{
    yeh2021sustainbench,
    title = {{SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning}},
    author = {Christopher Yeh and Chenlin Meng and Sherrie Wang and Anne Driscoll and Erik Rozi and Patrick Liu and Jihyeon Lee and Marshall Burke and David B. Lobell and Stefano Ermon},
    booktitle = {Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
    year = {2021},
    month = {12},
    url = {https://openreview.net/forum?id=5HR3vCylqD}
}