A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

Last update: Dec 30, 2022

Overview

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

The WILDS package contains:

Data loaders that automatically handle data downloading, processing, and splitting, and
Dataset evaluators that standardize model evaluation for each dataset.

In addition, the example scripts contain default models, allowing new algorithms to be easily added and run on all of the WILDS datasets.

For more information, please read our paper or visit our website. For questions and feedback, please post on the discussion board.

Installation

We recommend using pip to install WILDS:

pip install wilds

If you have already installed it, please check that you have the latest version:

python -c "import wilds; print(wilds.__version__)"
# This should print "1.0.0". If it doesn't, update by running:
pip install -U wilds

If you plan to edit or contribute to WILDS, you should install from source:

git clone git@github.com:p-lambda/wilds.git
cd wilds
pip install -e .

Requirements

numpy>=1.19.1
pandas>=1.1.0
pillow>=7.2.0
torch>=1.7.0
tqdm>=4.53.0
pytz>=2020.4
outdated>=0.2.0
ogb>=1.2.3
torch-scatter>=2.0.5
torch-geometric>=1.6.1

Running pip install wilds will check for all of these requirements except for the torch-scatter and torch-geometric packages, which require a quick manual install.
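
Since pip does not check for these two packages, it can help to confirm that they are importable before using WILDS. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of the WILDS package, and the import names torch_scatter and torch_geometric are assumed from the package names listed above.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Import names (assumed) for the two packages that pip install wilds does not check.
manual = ["torch_scatter", "torch_geometric"]
print("Missing manual dependencies:", missing_packages(manual))
```

An empty list means both packages were found; any name that is printed still needs a manual install.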

Default models

After installing the WILDS package, you can use the scripts in examples/ to train default models on the WILDS datasets. These scripts are not part of the installed WILDS package. To use them, you should clone the repo (assuming you did not install from source):

git clone git@github.com:p-lambda/wilds.git

To run these scripts, you will need to install these additional dependencies:

torchvision>=0.8.1
transformers>=3.5.0

All baseline experiments in the paper were run on Python 3.8.5 and CUDA 10.1.

Usage

Default models

In the examples/ folder, we provide a set of scripts that we used to train models on the WILDS package. These scripts are configured with the default models and hyperparameters that we used for all of the baselines described in our paper. All baseline results in the paper can be easily replicated with commands like:

cd examples
python run_expt.py --dataset iwildcam --algorithm ERM --root_dir data
python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data

The scripts are set up to facilitate general-purpose algorithm development: new algorithms can be added to examples/algorithms and then run on all of the WILDS datasets using the default models.

The first time you run these scripts, you might need to download the datasets. You can do so with the --download argument, for example:

python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data --download

Data loading

The WILDS package provides a simple, standardized interface for all datasets in the benchmark. This short Python snippet covers all of the steps of getting started with a WILDS dataset, including dataset download and initialization, accessing various splits, and preparing a user-customizable data loader.

>>> from wilds.datasets.iwildcam_dataset import IWildCamDataset
>>> from wilds.common.data_loaders import get_train_loader
>>> import torchvision.transforms as transforms

# Load the full dataset, and download it if necessary
>>> dataset = IWildCamDataset(download=True)

# Get the training set
>>> train_data = dataset.get_subset('train',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the standard data loader
>>> train_loader = get_train_loader('standard', train_data, batch_size=16)

# Train loop
>>> for x, y_true, metadata in train_loader:
...   ...

The metadata contains information like the domain identity, e.g., which camera a photo was taken from, or which hospital the patient's data came from, etc.

Domain information

To allow algorithms to leverage domain annotations as well as other groupings over the available metadata, the WILDS package provides Grouper objects. These Grouper objects extract group annotations from metadata, allowing users to specify the grouping scheme in a flexible fashion.

>>> from wilds.common.grouper import CombinatorialGrouper

# Initialize grouper, which extracts domain information
# In this example, we form domains based on location
>>> grouper = CombinatorialGrouper(dataset, ['location'])

# Train loop
>>> for x, y_true, metadata in train_loader:
...   z = grouper.metadata_to_group(metadata)
...   ...
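
To make the idea of a grouping scheme concrete, here is a simplified pure-Python illustration of mapping combinations of metadata fields to integer group ids. It is a sketch of the concept only, not the CombinatorialGrouper implementation; the function names and the is_night field are hypothetical.

```python
from itertools import product

def build_group_index(field_values):
    """Map every combination of field values to a distinct integer group id.

    field_values: dict of field name -> list of possible values.
    """
    fields = list(field_values)
    combos = product(*(field_values[f] for f in fields))
    return fields, {combo: idx for idx, combo in enumerate(combos)}

def rows_to_groups(metadata_rows, fields, index):
    """Look up the group id for each metadata row (a dict of field -> value)."""
    return [index[tuple(row[f] for f in fields)] for row in metadata_rows]

# Hypothetical metadata: three locations and a binary is_night flag give 6 groups.
fields, index = build_group_index({"location": [0, 1, 2], "is_night": [0, 1]})
rows = [{"location": 0, "is_night": 1}, {"location": 2, "is_night": 0}]
print(rows_to_groups(rows, fields, index))  # [1, 4]
```

Grouping by a single field, as in the location example above, is the special case where each value of that field is its own group.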

The Grouper can be used to prepare a group-aware data loader that, for each minibatch, first samples a specified number of groups, then samples examples from those groups. This allows our data loaders to accommodate a wide array of training algorithms, some of which require specific data loading schemes.

# Prepare a group data loader that samples from user-specified groups
>>> train_loader = get_train_loader('group', train_data,
...                                 grouper=grouper,
...                                 n_groups_per_batch=2,
...                                 batch_size=16)
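
The two-stage sampling described above (groups first, then examples within those groups) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the WILDS sampler: the function name is hypothetical, examples are split evenly across the sampled groups, and sampling within a group is done with replacement.

```python
import random
from collections import defaultdict

def sample_group_batch(group_ids, n_groups_per_batch, batch_size, rng):
    """Sample one minibatch of example indices: pick groups first, then examples."""
    by_group = defaultdict(list)
    for idx, g in enumerate(group_ids):
        by_group[g].append(idx)
    # Stage 1: choose distinct groups for this minibatch.
    groups = rng.sample(sorted(by_group), n_groups_per_batch)
    per_group = batch_size // n_groups_per_batch
    batch = []
    for g in groups:
        # Stage 2: sample with replacement so small groups can still fill their share.
        batch.extend(rng.choices(by_group[g], k=per_group))
    return batch

rng = random.Random(0)
group_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]  # group id of each training example
batch = sample_group_batch(group_ids, n_groups_per_batch=2, batch_size=16, rng=rng)
print(len(batch), {group_ids[i] for i in batch})
```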

Evaluators

The WILDS package standardizes and automates evaluation for each dataset. Invoking the eval method of each dataset yields all metrics reported in the paper and on the leaderboard.

>>> from wilds.common.data_loaders import get_eval_loader

# Get the test set
>>> test_data = dataset.get_subset('test',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the data loader
>>> test_loader = get_eval_loader('standard', test_data, batch_size=16)

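# Get predictions for the full test set
# (model is assumed to be a trained torch.nn.Module)
>>> import torch
>>> all_y_pred, all_y_true, all_metadata = [], [], []
>>> for x, y_true, metadata in test_loader:
...   y_pred = model(x)
...   all_y_pred.append(y_pred)
...   all_y_true.append(y_true)
...   all_metadata.append(metadata)

# Concatenate the per-batch tensors
>>> all_y_pred = torch.cat(all_y_pred)
>>> all_y_true = torch.cat(all_y_true)
>>> all_metadata = torch.cat(all_metadata)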

# Evaluate
>>> dataset.eval(all_y_pred, all_y_true, all_metadata)
{'recall_macro_all': 0.66, ...}
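
For intuition about a metric such as recall_macro_all, the sketch below accumulates labels and predictions over batches and computes macro-averaged recall in plain Python. It is only an illustration of the metric on toy data; it is not the code that dataset.eval runs, and the function name macro_recall is hypothetical.

```python
def macro_recall(y_true, y_pred):
    """Average of per-class recall over the classes that appear in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

# Accumulate toy (y_true, y_pred) batches, mirroring the evaluation loop above.
all_y_true, all_y_pred = [], []
batches = [([0, 0, 1], [0, 1, 1]), ([2, 2, 1], [2, 0, 1])]
for y_true, y_pred in batches:
    all_y_true.extend(y_true)
    all_y_pred.extend(y_pred)

print(macro_recall(all_y_true, all_y_pred))
```

Because every class contributes equally regardless of its frequency, rare classes affect this metric as much as common ones.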

Citing WILDS

If you use WILDS datasets in your work, please cite our paper (Bibtex):

WILDS: A Benchmark of in-the-Wild Distribution Shifts (2020). Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang.

Please also cite the original papers that introduce the datasets, as listed on the datasets page.

Acknowledgements

The design of the WILDS benchmark was inspired by the Open Graph Benchmark, and we are grateful to the Open Graph Benchmark team for their advice and help in setting up WILDS.

Comments

Installing via pip seems to miss `torch_scatter` dependency
Hey,

I noticed that the installation via pip install wilds seems to miss the torch_scatter dependency that is also listed in the README. When e.g. trying to do from wilds.datasets.amazon_dataset import AmazonDataset I got

    from wilds.datasets.amazon_dataset import AmazonDataset
  File "/Users/deul/Desktop/wilds/wilds/datasets/amazon_dataset.py", line 6, in <module>
    from wilds.common.utils import map_to_id_array
  File "/Users/deul/Desktop/wilds/wilds/common/utils.py", line 1, in <module>
    import torch, torch_scatter
ModuleNotFoundError: No module named 'torch_scatter'

As far as I can see, the solution should be as easy as adding torch_scatter>=2.0.5 to the install_requires attribute in setup.py. In my case, the error was resolved after installing torch_scatter separately.
opened by Kaleidophon 6
Data loader for PovertyMap is very slow

Hi -

Ran into a bit of an issue with data loading the Povertymap dataset - loading a single minibatch with 128 examples takes about 5-6 seconds. This is not a huge deal but slow enough to make me curious if there's a faster way of doing this.

Digging into the code a bit, it looks like the slowdown is mostly due to the array copy on line 239 of poverty_dataset.py: https://github.com/p-lambda/wilds/blob/f984047af654eed6be51a7f770804a1c1b1ad0a0/wilds/datasets/poverty_dataset.py#L239. FWIW, this looks like a known issue for memory-mapped NumPy arrays on Linux systems (https://stackoverflow.com/questions/42864320/numpy-memmap-performance-issues).

I'm not sure if there are any recommendations for getting around this, or if there's another way the data could be loaded in? Or let me know if I'm totally off-base here. Thanks!

opened by dmadras 6
`assert` error in new wilds version with FMoW
Hello, I am using the new version of WILDS and getting the error:

... wilds/common/utils.py" line 86, in avg_over_groups assert v.numel()==g.numel()

any ideas? It may be a bug on my end and if I catch it I'll update here.
opened by mitchellnw 4
Unable to Train ERM model with civilcomments

Hi,

I am having trouble running the code with the command python3 wilds/examples/run_expt.py --dataset civilcomments --algorithm ERM --root_dir data --download. Everything gets stuck: no error is reported, and neither the GPU nor the CPU is being used.

If I press Ctrl+C, it shows

The same thing didn't happen when I tried to run the same script but with groupDRO.

It would be very helpful if you have any clue on this, and thank you a lot for your amazing, well developed code!

opened by Bluepossibility 3
Issue in OOD data distribution when Grouper is set to "regions" for FMoW

Hi,

I am trying to change the groupby from "year" to "region". I have followed the instructions on the README page and am currently using the following command: python3 wilds/examples/run_expt.py --dataset fmow --algorithm ERM --groupby_fields region --root_dir wilds_fmow/

However, the issue is that the training dataset is not being separated into distinct regions for the ID and OOD splits. That is, all regions are included in ID as well as OOD. Here is a screenshot of the output:

Therefore, I was wondering if that is a bug in the code or am I missing something?

Thanks Sara A. Al-Emadi

opened by saraalemadi 3
Minor issue: `pip install wilds` changes pytorch version

A really minor issue, but the pip install wilds changed my pytorch version which then caused some prior evals on non-wilds datasets to change slightly. Is it possible for this to not occur? No worries if not.

opened by mitchellnw 3
Understanding the prediction_dir format for leaderboard submission
I wonder if the log folder used during training is the prediction_dir described in Get Started: Evaluating trained models.

I tried to reproduce the ERM result on a subset of camelyon with the following command:

python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --frac 0.1 --log_dir log_erm_01

Training goes well.

But my file camelyon17_split:id_val_seed:0_epoch is empty.

Then I ran the following command: python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

And I got this:

Traceback (most recent call last):
  File "examples/evaluate.py", line 282, in <module>
    main()
  File "examples/evaluate.py", line 244, in main
    evaluate_benchmark(
  File "examples/evaluate.py", line 136, in evaluate_benchmark
    predictions_file = get_prediction_file(
  File "examples/evaluate.py", line 89, in get_prediction_file
    raise FileNotFoundError(
FileNotFoundError: Could not find CSV or pth prediction file that starts with camelyon17_split:id_val_seed:0.

So my question is whether the log file is the prediction_dir described in Get Started ?
opened by jmamath 3
How do I access data from only one group?
Hello, Thanks for the fantastic library!

I have two questions:

Is there any way I can get a per-group dataloader in wilds? This will help with, for instance, training a separate model for each group of data.

Can I change the split of data for each dataset? My application requires 50% of the data for each group/domain for testing.

Thanks!
opened by krishnap25 3

Model loaded from a .pth predicts only zeros

Hello !

I downloaded your trained model for the Camelyon17 dataset from CodaLab (ERM, seed 0). I have installed all packages according to your README and load the model as follows:

path = "/best_model.pth"
state = torch.load(path)['algorithm']

state_dict = {}
 
for key in list(state.keys()):
    state_dict[key.replace('model.', '')] = state[key]

model.load_state_dict(state_dict)

model.eval()

I initialize the dataset I use for testing the model as follows:

import datasets_load  # from wilds package
dataset = datasets_load.Dataset('camelyon17', 32, '/data', 0.75, False)

For the prediction I used the following piece of code:

from wilds.common.data_loaders import get_eval_loader

test_data = dataset.test_set
test_loader = get_eval_loader('standard', test_data, batch_size=32)

with torch.no_grad():
    for x, y_true, metadata in test_loader:
          y_pred = model(x)
          labels = y_true
          _, predicted = torch.max(y_pred, 1)
          # print statements to check the output
          print("Labels: ", labels)
          print("Predicted: ", predicted)
          print("Correct: ", (predicted == labels).sum().item())

So far so good. When I run the code, it prints the labels (which are all 1 at the beginning, because shuffle=False) and the predictions, which are always 0.

Labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Predicted:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Correct:  0

I would appreciate any advice or assistance. Many thanks in advance. Tim

opened by tim1188 3

Cannot fetch 'ogb-molpcba' dataset due to missing arg

dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

Results in the following error:

--------------------------------------------------------------------
TypeError                          Traceback (most recent call last)
<ipython-input-2-c369817b9157> in <module>
----> 1 dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/get_dataset.py in get_dataset(dataset, version, **dataset_kwargs)
     51     elif dataset == 'ogb-molpcba':
     52         from wilds.datasets.ogbmolpcba_dataset import OGBPCBADataset
---> 53         return OGBPCBADataset(version=version, **dataset_kwargs)
     54 
     55     elif dataset == 'poverty':

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/datasets/ogbmolpcba_dataset.py in __init__(self, version, root_dir, download, split_scheme)
     88             download_url('https://snap.stanford.edu/ogb/data/misc/ogbg_molpcba/scaffold_group.npy', os.path.join(self.ogb_dataset.root, 'raw'))
     89         self._metadata_array = torch.from_numpy(np.load(metadata_file_path)).reshape(-1,1).long()
---> 90         self._collate = PyGCollater(follow_batch=[])
     91 
     92         self._metric = Evaluator('ogbg-molpcba')

TypeError: __init__() missing 1 required positional argument: 'exclude_keys'

Versions:

wilds 1.1.0
torch_geometric 1.7.0

opened by arnaudvl 3

Could you provide the trained weights?

Hello,

I am training BERT+ERM on the Amazon dataset, but it is very time-consuming. Is it possible to provide the best trained parameters to users? (Just as BERT provides pretrained weights, maybe you could have another folder under examples that contains all the weights.) It would save users about a week (and the computation). Thank you!

opened by yachuan 3
camelyon17 split scheme: in-dist

I am not able to run Camelyon17 with --split_scheme in-dist (I'm assuming this corresponds to the setting with ID val data).

Any pointers on how to run this, or in general how to run camelyon with the ID val data?

Thank you for the help!

opened by thomaspzollo 1
run_expt.py: --device argument doesn't set the device

Hey, I'm running Wilds on a p2.8xlarge AWS EC2 instance with 8 K80 GPUs. I noticed that when I try to run run_expt.py and use the --device argument to divide the jobs I'm trying to run between the GPUs, they all end up running on GPU 0. I verified this by the memory usage in nvidia-smi as well as printing the device used by torch using torch.cuda.current_device(). My guess is that the CUDA_VISIBLE_DEVICES environment variable, set here, is set too late and PyTorch just defaults to device 0.

I've worked around this by setting the CUDA_VISIBLE_DEVICES variable manually, before running the script. I just thought I'd let you know I encountered this issue.

Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use run_expt pretty easily to run my own experiments.

opened by vvolhejn 0

Releases(v2.0.0)

v2.0.0(Dec 13, 2021)
The v2.0.0 release adds unlabeled data to 8 datasets and several new algorithms for taking advantage of the unlabeled data. It also updates the standard data augmentations used for the image datasets.

All labeled (training, validation, and test) datasets are exactly the same. Evaluation metrics are also exactly the same. All results on the datasets in v1.x are therefore still current and directly comparable to results obtained on v2.

For more information, please read our paper on the unlabeled data.

New datasets with unlabeled data

We have added unlabeled data to the following datasets:

iwildcam

camelyon17

ogb-molpcba

globalwheat

civilcomments

fmow

poverty

amazon

The following datasets have no unlabeled data and have not been changed:

rxrx1

py150

The labeled training, validation, and test data in all datasets have been kept exactly the same.

The unlabeled data comes from the same underlying sources as the original labeled data, and they can be from the source, validation, extra, or target domains. We describe each dataset in detail in our paper.

Each unlabeled dataset has its own corresponding data loader defined in wilds/datasets/unlabeled. Please see the README for more details on how to use them.

New algorithms for using unlabeled data

In our scripts in the examples folder, we have updated and/or added new algorithms that make use of the unlabeled data:

CORAL (Sun and Saenko, 2016)

DANN (Ganin et al., 2016)

AFN (Xu et al., 2019)

Pseudo-Label (Lee, 2013)

FixMatch (Sohn et al., 2020)

Noisy Student (Xie et al., 2020)

SwAV pre-training (Caron et al., 2020)

Masked language model pre-training (Devlin et al., 2019)

Other changes

GlobalWheat v1.0 -> v1.1

We have corrected some errors in the metadata for the previous version of the GlobalWheat (labeled) dataset. Users who did not explicitly make use of the location or stage metadata (which should be most users) will not be affected. All baseline results are unchanged.

DomainNet support

We have included data loaders for the DomainNet dataset (Peng et al., 2019) as a means of benchmarking the algorithms we implemented on existing datasets.

Data augmentation

We have added support for RandAugment (Cubuk et al., 2019) for RGB images, and we have also implemented a set of data augmentations for the multi-spectral Poverty dataset. These augmentations are used in all of the algorithms for unlabeled data listed above.

Hyperparameters

In our experiments to benchmark the algorithms for using unlabeled data, we tuned hyperparameters by random search instead of grid search. The default hyperparameters in examples/configs/datasets.py still work well but do not reflect the exact hyperparameters we used for our experiments. To see those, please view our CodaLab worksheet.

Miscellaneous

In our example scripts, we have added support for gradient accumulation by specifying the gradient_accumulation_steps parameter.
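
As a reminder of what gradient accumulation does, the toy sketch below shows that summing scaled micro-batch gradients reproduces the full-batch gradient. It uses a one-parameter model y = w * x with mean squared error and plain Python arithmetic; it is not taken from the example scripts, and only the parameter name gradient_accumulation_steps comes from the text above. The sketch assumes the batch size is divisible by the number of steps.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, gradient_accumulation_steps):
    """Accumulate scaled micro-batch gradients before a single optimizer step."""
    n = len(xs) // gradient_accumulation_steps
    total = 0.0
    for step in range(gradient_accumulation_steps):
        lo, hi = step * n, (step + 1) * n
        # Scale each micro-batch gradient so the sum matches the full-batch mean.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / gradient_accumulation_steps
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, gradient_accumulation_steps=2)
print(full, accum)
```

The practical benefit is that each micro-batch needs only a fraction of the memory of the full batch while the resulting update is the same.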

We have also added support for logging using Weights and Biases.

v1.2.2(Aug 4, 2021)
v1.2.2 contains several minor changes:

Added a check to make sure that a group data loader is used whenever n_groups_per_batch or distinct_groups are passed in as arguments to examples/run_expt.py. (https://github.com/p-lambda/wilds/issues/79)

Data augmentations now only transform x by default. Set do_transform_y when initializing the WILDSSubset to modify both x and y. (https://github.com/p-lambda/wilds/issues/77)

For FasterRCNN, we now use the PyTorch implementation of smooth_l1_loss instead of the custom torchvision implementation, which was removed in torchvision v0.10.

Updated the requirements to include torchvision, scipy, and scikit-learn. Previously, torchvision was only needed for the example scripts. However, it is now also used for computing metrics in the GlobalWheat-WILDS dataset, so we have moved it into the core set of requirements.

v1.2.1(Jul 19, 2021)
v1.2.1 adds two new benchmark datasets: the GlobalWheat wheat head detection dataset, and the RxRx1 cellular microscopy dataset. Please see our paper for more details on these datasets.

It also simplifies saving and evaluating predictions made across different replicates and datasets.

New datasets

New benchmark dataset: GlobalWheat-WILDS v1.0

The Global Wheat Head detection dataset comprises images of wheat fields collected from 12 countries around the world. The task is to draw bounding boxes around instances of wheat heads in each image, and the distribution shift is over images taken in different locations.

Model performance is measured by the proportion of the predicted bounding boxes that sufficiently overlap with the ground truth bounding boxes (IoU > 0.5). The example script implements a FasterRCNN baseline.
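
The overlap criterion can be illustrated with a short intersection-over-union (IoU) computation. This is a generic sketch, not the GlobalWheat-WILDS evaluator; the function name and the (x_min, y_min, x_max, y_max) box format are assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred = (0, 0, 10, 10)   # predicted box
truth = (0, 0, 10, 8)   # ground truth box
print(box_iou(pred, truth) > 0.5)  # True
```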

This dataset is adapted from the Global Wheat Head Dataset 2021, which was recently used in a public competition held in conjunction with the Computer Vision in Plant Phenotyping and Agriculture Workshop at ICCV 2021.

New benchmark dataset: RxRx1-WILDS v1.0

The RxRx1 dataset comprises images of genetically-perturbed cells taken with fluorescent microscopy and collected across 51 experimental batches. The task is to classify the identity of the genetic perturbation applied to each cell, and the distribution shift is over different experimental batches.

Model performance is measured by average classification accuracy. The example script implements a ResNet-50 baseline.

This dataset is adapted from the RxRx1 dataset released by Recursion.

Additional dataset: ENCODE

The ENCODE dataset is based on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge. The task is to classify if a given genomic location will be bound by a particular transcription factor, and the distribution shift is over different cell types.

We did not include this dataset in the official benchmark as we were unable to learn a model that could generalize across all the cell types simultaneously, even in an in-distribution setting, which suggested that the model family and/or feature set might not be rich enough.

Other changes

Saving and evaluating predictions

To ease evaluation and leaderboard submission, we have made the following changes:

Predictions are now automatically saved in the format described in our submission guidelines.

We have added an evaluation script that evaluates these saved predictions across multiple replicates and datasets. See the updated README and examples/evaluate.py for more details.

Code changes to support detection tasks

To support detection tasks, we have modified the example scripts as well as made slight changes to the WILDS data loaders. All interfaces should be backwards-compatible.

The labels y and the model outputs no longer need to be a Tensor. For example, for detection tasks, a model might return a dictionary containing bounding box coordinates as well as class predictions for each bounding box. Accordingly, several helper functions have been rewritten to be more flexible.

Models can now optionally take in y in the forward call. For example, during training, a model might use ground truth bounding boxes to train a bounding box classifier.

Data transforms can now transform both x and y. We have also merged train_transform and eval_transform functions into a single function that takes an is_training parameter.

Miscellaneous changes

We have changed the names of the in-distribution split_scheme values to match the terminology in Section 5 of the updated paper.

The FMoW-WILDS and PovertyMap-WILDS constructors no longer use the oracle_training_set parameter to use an in-distribution split. This is now controlled through split_scheme to be consistent with the other datasets.

We fixed a minor bug in the PovertyMap-WILDS in-distribution baseline. The Val (ID) and Test (ID) splits are slightly changed.

The FMoW-WILDS constructor now sets use_ood_val=True by default. This change has no effect for users using the example scripts, as use_ood_val is already set in config/datasets.py.

Users who are only using the data loaders and not the evaluation metrics or example scripts will no longer need to install torch_scatter (thanks Ke Alexander Wang).

The Waterbirds dataset now computes the adjusted average accuracy on the validation and test sets, as described in Appendix C.1 of the corresponding paper.

The behavior of algorithm.eval() is now consistent with algorithm.model.eval() in that both preserve the grad_fn attribute (thanks Divya Shanmugam). See https://github.com/p-lambda/wilds/issues/45.

The dataset name for OGB-MolPCBA has been changed from ogbg-molpcba to ogb-molpcba for consistency.

We have updated the OGB-MolPCBA data loader to be compatible with v1.7 of the pytorch_geometric dependency (thanks arnaudvl). See https://github.com/p-lambda/wilds/issues/52.

v1.1.0(Mar 10, 2021)
The v1.1.0 release contains a new Py150 benchmark dataset for code completion, as well as updates to several existing datasets and default models to make them significantly faster and easier to use.

Some of these changes are breaking changes that will impact users who are currently running experiments with WILDS. We sincerely apologize for the inconvenience. We ask all users to update their package to v1.1.0, which will automatically update your datasets. In addition, please update your default models, for example by using the latest example scripts in this repo. These changes were primarily made to accelerate model training, which was a bottleneck for many users; at this time, we do not expect to have to make further changes to the existing datasets or default models.

New datasets

New benchmark dataset: Py150

The Py150-WILDS dataset is a code completion dataset, where the distribution shift is over code from different GitHub repositories.

We focus on accuracy on the subpopulation of class and method tokens, as prior work has shown that those are the most frequent queries in real-world code completion settings.

It is a variant of the Py150 dataset from Raychev et al., 2016.

See our paper for more details.

Additional dataset: SQF

The SQF dataset is based on the stop-question-and-frisk dataset released by the New York Police Department. We adapt the version processed by Goel et al., 2016. The task is to predict criminal possession of a weapon.

We use this dataset to study distribution shifts in an algorithmic fairness context. Specifically, we consider subpopulation shifts across locations and race groups. However, while there are large performance gaps, we did not find that they were caused by the distribution shift. We therefore did not include this dataset as part of the official benchmark.

Major updates to existing datasets

Note that datasets are versioned separately from the main WILDS version. We have two major updates (i.e., breaking, non-backwards-compatible changes) to datasets.

Amazon v1.0 -> v2.0

To speed up model training, we have subsampled the number of reviewers in this dataset to 25% of its original size, while keeping the same number of reviews per reviewer.

iWildCam v1.0 -> v2.0

Previously, the ID split was done uniformly at random, meaning that images from the same sequence (i.e., taken within a few seconds of each other by the same camera) could be found across all of the training / validation (ID) / test (ID) sets.

In v2.0, we have redone the ID split so that all images taken on the same day by the same camera are in only one of the training, validation (ID), or test (ID) sets. In other words, these sets still comprise images from the same cameras, but taken on different days.
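
One way to guarantee that all images sharing a camera and a day end up in the same set is to make the split a function of the (camera, day) pair. The sketch below does this with a hash; it is a swapped-in technique for illustration and not the procedure used to build iWildCam v2.0. The function name, split names, and fractions are hypothetical.

```python
import hashlib

def assign_split(camera_id, date,
                 fractions=(("train", 0.8), ("id_val", 0.1), ("id_test", 0.1))):
    """Deterministically map a (camera, day) pair to exactly one split."""
    key = f"{camera_id}|{date}".encode("utf-8")
    # Hash the pair to a number in [0, 1) so every image sharing it lands together.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) / 16**64
    cumulative = 0.0
    for name, frac in fractions:
        cumulative += frac
        if bucket < cumulative:
            return name
    return fractions[-1][0]

# Two images from the same camera and day, and one from the next day.
images = [("cam1", "2013-06-01"), ("cam1", "2013-06-01"), ("cam1", "2013-06-02")]
print([assign_split(c, d) for c, d in images])
```

The first two images always receive the same split because the assignment depends only on the pair, never on the individual image.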

In line with the new iWildCam 2021 challenge on Kaggle, we have also removed the following images:

images that include humans or pictures taken indoors.

images with non-animal categories such as start and unidentifiable.

images in categories such as unknown, unknown raptor and unknown rat.

We added back in location 537 that was previously removed as we mistakenly believed those images were corrupted.

We have re-split the data into training, validation (ID), test (ID), validation (OOD), and test (OOD) sets. This is a different random split from v1.0.

Since we remove any classes that do not end up in the train split, removing those images and redoing the split gave us a different set of species. There are now 182 classes instead of 186. Specifically, the following classes have been removed: ['unknown', 'macaca fascicularis', 'proechimys sp', 'unidentifiable', 'turtur calcospilos', 'streptopilia senegalensis', 'equus africanus', 'macaca nemestrina', 'start', 'paleosuchus sp', 'unknown raptor', 'unknown rat', 'misfire', 'mustela lutreolina', 'canis latrans', 'myoprocta pratti', 'xerus rutilus', 'end', 'psophia crepitans', 'ictonyx striatus']. The following classes have been added: ['praomys tullbergi', 'polyplectron chalcurum', 'ardeotis kori', 'phaetornis sp', 'mus minutoides', 'raphicerus campestris', 'tigrisoma mexicanum', 'leptailurus serval', 'malacomys longipes', 'oenomys hypoxanthus', 'turdus olivaceus', 'macaca sp', 'leiothrix argentauris', 'lophura sp', 'mazama temama', 'hippopotamus amphibius']. For convenience, we have also added a categories.csv that maps from label IDs to species names.

To speed up downloading and model training (by reducing the I/O bottleneck), we have also resized all images to have a height of 448px while keeping the original aspect ratio. All images are wide (so they now have a min dimension of 448px). Note that as JPEG compression is lossy, this procedure gives different images from resizing the full-sized image in the code after loading it.
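
The size arithmetic for this kind of resize is simple enough to sketch. The helper below is hypothetical and assumes the width is rounded to the nearest pixel, which may differ from the rounding used when the dataset was built.

```python
def resized_dims(width, height, target_height=448):
    """Scale (width, height) so the height equals target_height, keeping aspect ratio."""
    scale = target_height / height
    return round(width * scale), target_height

print(resized_dims(2048, 1536))  # (597, 448)
```

For a wide image the height is the smaller side, so fixing the height at 448px also fixes the minimum dimension at 448px.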

Minor updates to existing datasets

We made two backwards-compatible changes to existing datasets. We encourage all users to update these datasets; these updates should leave results unchanged (modulo training randomness). In future versions of the WILDS package, we will deprecate the older versions of these datasets.

FMoW v1.0 -> v1.1

Previously, the images were stored as chunks in .npy files and read in using NumPy memmapping.

Now, we have converted them (losslessly) into individual PNG images. This should help with disk I/O and memory usage, and make them more convenient to visualize and use in other pipelines.

PovertyMap v1.0 -> v1.1

Previously, the images were stored in a single .npy file and read in using NumPy memmapping.

Now, we have converted them (losslessly) into individual compressed .npz files. This should help with disk I/O and memory usage.

We have correspondingly updated the default number of workers for the data loader from 1 to 4.

Default model updates

We have updated the default models for several datasets. Please take note of these changes if you are currently running experiments with these datasets.

Amazon and CivilComments

To speed up model training, we have switched from BERT-base-uncased to DistilBERT-base-uncased. This obtains roughly similar accuracy but at twice the speed.

For CivilComments, we have also increased the number of replicates from 3 to 5, to reduce variability in the reported performance.

Camelyon17

Previously, we were upsizing each image to 224x224 before passing it into the model.

We now leave the images at their original resolution of 96x96, which significantly speeds up model training.

iWildCam

Previously, we were resizing each image to 224x224 before passing it into the model. However, this limited model accuracy, as the animals in the images can sometimes be quite small.

We now resize each image to 448x448 before passing it into the model, which improves accuracy and macro F1 across the board.

FMoW

For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from acc_avg to acc_worst_region.
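To make the difference between the two criteria concrete, here is a minimal sketch that computes an average accuracy and a worst-region accuracy on toy data. The function name accuracy_by_region and all labels, predictions, and region assignments are made up for illustration; the actual metrics are computed by the dataset's own evaluator.

```python
from collections import defaultdict


def accuracy_by_region(y_true, y_pred, regions):
    """Return a dict mapping each region to the accuracy on its examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, region in zip(y_true, y_pred, regions):
        total[region] += 1
        correct[region] += int(truth == pred)
    return {region: correct[region] / total[region] for region in total}


# Toy data (hypothetical).
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 0]
regions = ["Asia", "Asia", "Africa", "Africa", "Europe", "Europe"]

per_region = accuracy_by_region(y_true, y_pred, regions)
acc_avg = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
acc_worst_region = min(per_region.values())
```

Here the average accuracy is pulled up by the region the model handles well, while the worst-region value reflects the weakest region, so early stopping on it selects a different kind of checkpoint.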

PovertyMap

For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from r_all to r_wg.

Other changes

We have uploaded an executable version of our paper to CodaLab. This contains the exact commands, code, and data used for each experiment reported in our paper. The trained model weights for every experiment can also be found there.

To ease downloading, we have added wilds/download_datasets.py, which allows users to download all (or a subset of) datasets at once. Please see the README for instructions.

We have added a convenience function for getting the appropriate constructor for each dataset in wilds/get_dataset.py. This function allows you to specify a version argument. If this is not specified, it defaults to the latest available version for that dataset. If that version is not downloaded and the download argument is also set, then it will automatically download that version.
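The version fallback described above can be sketched as follows. This is not the implementation in wilds/get_dataset.py: resolve_version, its arguments, and the error raised when a version is missing and download is not set are all hypothetical and only illustrate the decision logic.

```python
def resolve_version(requested, available, downloaded, download=False):
    """Return (version, needs_download) following the rules described above."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    # No version given: fall back to the latest available one.
    version = requested if requested is not None else max(available, key=as_tuple)
    if version not in available:
        raise ValueError(f"Unknown version: {version}")
    if version in downloaded:
        return version, False
    if download:
        return version, True
    # Assumption: a missing version without download=True is treated as an error.
    raise FileNotFoundError(f"Version {version} is not downloaded")


latest = resolve_version(None, ["1.0", "2.0"], downloaded={"1.0"}, download=True)
pinned = resolve_version("1.0", ["1.0", "2.0"], downloaded={"1.0"})
```

In the package itself this would correspond to a call along the lines of get_dataset(dataset="iwildcam", version="2.0", download=True); pinning the version explicitly keeps results comparable when a newer dataset version is released.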

The example script examples/run_expt.py now also takes in a version argument.

We have added download sizes and expected training times to the README.

We have updated the default inputs for the WILDSDataset.eval methods of various datasets. For example, eval for most classification datasets now takes in predicted labels by default, whereas predictions were previously passed in as logits. The default inputs vary across datasets, and we document this in the docstring of each eval method.
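If your code still produces logits, a conversion to predicted labels is needed before calling eval on such a dataset. A minimal, library-free sketch is shown below; the helper name logits_to_labels and the example values are hypothetical, and you should check the docstring of the specific eval method for the expected input.

```python
def logits_to_labels(logits):
    """Return the index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Two examples, three classes (made-up values).
logits = [[2.0, -1.0, 0.5], [0.1, 0.3, 0.2]]
y_pred = logits_to_labels(logits)
```

With PyTorch tensors, the same conversion is typically written as logits.argmax(dim=-1).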

We made a few updates to the code in examples/ to interface better with language modeling tasks (for Py150). None of these changes affect the results or the interface with algorithms.

We updated the code in examples/ to save model predictions in an appropriate format for submissions to the leaderboard.

Finally, we have also updated our paper to streamline the writing and include these new numbers and datasets.


Owner

P-Lambda

GitHub Repository https://wilds.stanford.edu

