A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

Last update: Dec 30, 2022

Overview

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

The WILDS package contains:

Data loaders that automatically handle data downloading, processing, and splitting, and
Dataset evaluators that standardize model evaluation for each dataset.

In addition, the example scripts contain default models, allowing new algorithms to be easily added and run on all of the WILDS datasets.

For more information, please read our paper or visit our website. For questions and feedback, please post on the discussion board.

Installation

We recommend using pip to install WILDS:

pip install wilds

If you have already installed it, please check that you have the latest version:

python -c "import wilds; print(wilds.__version__)"
# This should print the latest released version (2.0.0 according to the release list on this page). If it doesn't, update by running:
pip install -U wilds

If you plan to edit or contribute to WILDS, you should install from source:

git clone git@github.com:p-lambda/wilds.git
cd wilds
pip install -e .

Requirements

numpy>=1.19.1
pandas>=1.1.0
pillow>=7.2.0
torch>=1.7.0
tqdm>=4.53.0
pytz>=2020.4
outdated>=0.2.0
ogb>=1.2.3
torch-scatter>=2.0.5
torch-geometric>=1.6.1

Running pip install wilds will check for all of these requirements except for the torch-scatter and torch-geometric packages, which require a quick manual install.
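
To see whether the two manually installed packages are already available, a small check along these lines can help. This is only a sketch: the helper name missing_modules is made up for illustration, and it assumes the packages are imported as torch_scatter and torch_geometric.

```python
import importlib.util

# Import names of the two packages that pip does not install automatically
# (assumption: torch-scatter and torch-geometric are imported as
# torch_scatter and torch_geometric).
MANUAL_PACKAGES = ["torch_scatter", "torch_geometric"]


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(MANUAL_PACKAGES)
if missing:
    print("Still need a manual install:", ", ".join(missing))
else:
    print("torch-scatter and torch-geometric are importable.")
```

The check only looks the modules up and does not import them, so it is cheap to run before starting a long job.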

Default models

After installing the WILDS package, you can use the scripts in examples/ to train default models on the WILDS datasets. These scripts are not part of the installed WILDS package. To use them, you should clone the repo (assuming you did not install from source):

git clone git@github.com:p-lambda/wilds.git

To run these scripts, you will need to install these additional dependencies:

torchvision>=0.8.1
transformers>=3.5.0

All baseline experiments in the paper were run on Python 3.8.5 and CUDA 10.1.

Usage

Default models

In the examples/ folder, we provide a set of scripts that we used to train models on the WILDS datasets. These scripts are configured with the default models and hyperparameters that we used for all of the baselines described in our paper. All baseline results in the paper can be easily replicated with commands like:

cd examples
python run_expt.py --dataset iwildcam --algorithm ERM --root_dir data
python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data

The scripts are set up to facilitate general-purpose algorithm development: new algorithms can be added to examples/algorithms and then run on all of the WILDS datasets using the default models.

The first time you run these scripts, you might need to download the datasets. You can do so with the --download argument, for example:

python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data --download

Data loading

The WILDS package provides a simple, standardized interface for all datasets in the benchmark. This short Python snippet covers all of the steps of getting started with a WILDS dataset, including dataset download and initialization, accessing various splits, and preparing a user-customizable data loader.

>>> from wilds.datasets.iwildcam_dataset import IWildCamDataset
>>> from wilds.common.data_loaders import get_train_loader
>>> import torchvision.transforms as transforms

# Load the full dataset, and download it if necessary
>>> dataset = IWildCamDataset(download=True)

# Get the training set
>>> train_data = dataset.get_subset('train',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the standard data loader
>>> train_loader = get_train_loader('standard', train_data, batch_size=16)

# Train loop
>>> for x, y_true, metadata in train_loader:
...   ...

The metadata contains information like the domain identity, e.g., which camera a photo was taken from, or which hospital the patient's data came from, etc.

Domain information

To allow algorithms to leverage domain annotations as well as other groupings over the available metadata, the WILDS package provides Grouper objects. These Grouper objects extract group annotations from metadata, allowing users to specify the grouping scheme in a flexible fashion.

>>> from wilds.common.grouper import CombinatorialGrouper

# Initialize grouper, which extracts domain information
# In this example, we form domains based on location
>>> grouper = CombinatorialGrouper(dataset, ['location'])

# Train loop
>>> for x, y_true, metadata in train_loader:
...   z = grouper.metadata_to_group(metadata)
...   ...
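
To make the idea of extracting group annotations from metadata concrete, here is a toy, pure-Python sketch. It is not the WILDS CombinatorialGrouper; the class TinyCombinatorialGrouper, the "hour" field, and the example values are hypothetical. It only shows the underlying idea that every combination of values of the chosen metadata fields gets its own integer group id.

```python
from itertools import product


class TinyCombinatorialGrouper:
    """Toy stand-in for a grouper: maps metadata rows to integer group ids.

    Hypothetical helper for illustration only; it is not the WILDS class.
    """

    def __init__(self, field_names, field_values, groupby_fields):
        # Column positions of the fields we group by.
        self.columns = [field_names.index(f) for f in groupby_fields]
        # Enumerate every combination of values of the chosen fields.
        combos = product(*(field_values[f] for f in groupby_fields))
        self.group_of = {combo: idx for idx, combo in enumerate(combos)}

    @property
    def n_groups(self):
        return len(self.group_of)

    def metadata_to_group(self, metadata_rows):
        """Return one group id per metadata row."""
        return [self.group_of[tuple(row[c] for c in self.columns)]
                for row in metadata_rows]


field_names = ["location", "hour"]
field_values = {"location": [0, 1, 2], "hour": [0, 1]}
toy_grouper = TinyCombinatorialGrouper(field_names, field_values,
                                       ["location", "hour"])

toy_metadata = [(0, 0), (0, 1), (2, 1), (1, 0)]
print(toy_grouper.metadata_to_group(toy_metadata))
```

Grouping by a single field, such as ['location'] in the example above, is the special case where each value of that field is its own group.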

The Grouper can be used to prepare a group-aware data loader that, for each minibatch, first samples a specified number of groups, then samples examples from those groups. This allows our data loaders to accommodate a wide array of training algorithms, some of which require specific data loading schemes.

# Prepare a group data loader that samples from user-specified groups
>>> train_loader = get_train_loader('group', train_data,
...                                 grouper=grouper,
...                                 n_groups_per_batch=2,
...                                 batch_size=16)
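
The two-stage sampling described above (first sample groups, then sample examples from those groups) can be sketched without any dependencies. The function group_batches below is a hypothetical illustration, not the sampler used by get_train_loader; in particular, sampling groups uniformly and sampling examples with replacement are assumptions of this sketch.

```python
import random
from collections import defaultdict


def group_batches(group_ids, batch_size, n_groups_per_batch, n_batches, seed=0):
    """Yield lists of example indices, each drawn from a few sampled groups.

    Toy sketch only: groups are sampled uniformly without replacement, and
    examples within a group are sampled with replacement so that small groups
    can still fill their share of the batch.
    """
    if batch_size % n_groups_per_batch != 0:
        raise ValueError("batch_size must be divisible by n_groups_per_batch")
    rng = random.Random(seed)

    # Index the examples by the group they belong to.
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)

    per_group = batch_size // n_groups_per_batch
    groups = sorted(members)
    for _ in range(n_batches):
        chosen = rng.sample(groups, n_groups_per_batch)
        batch = []
        for group in chosen:
            batch.extend(rng.choices(members[group], k=per_group))
        yield batch


toy_group_ids = [i % 5 for i in range(100)]  # 100 examples in 5 groups
toy_batches = list(group_batches(toy_group_ids, batch_size=16,
                                 n_groups_per_batch=2, n_batches=3))
for batch in toy_batches:
    print(sorted({toy_group_ids[i] for i in batch}), len(batch))
```

Each batch in this sketch contains 16 indices coming from exactly 2 groups, which mirrors the n_groups_per_batch=2, batch_size=16 call above.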

Evaluators

The WILDS package standardizes and automates evaluation for each dataset. Invoking the eval method of each dataset yields all metrics reported in the paper and on the leaderboard.

>>> from wilds.common.data_loaders import get_eval_loader

# Get the test set
>>> test_data = dataset.get_subset('test',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the data loader
>>> test_loader = get_eval_loader('standard', test_data, batch_size=16)

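# Get predictions for the full test set
# (model is assumed to be your trained model, returning predictions in the format that dataset.eval expects)
>>> import torch
>>> all_y_pred, all_y_true, all_metadata = [], [], []
>>> for x, y_true, metadata in test_loader:
...   y_pred = model(x)
...   all_y_pred.append(y_pred)
...   all_y_true.append(y_true)
...   all_metadata.append(metadata)
...
>>> all_y_pred = torch.cat(all_y_pred)
>>> all_y_true = torch.cat(all_y_true)
>>> all_metadata = torch.cat(all_metadata)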

# Evaluate
>>> dataset.eval(all_y_pred, all_y_true, all_metadata)
{'recall_macro_all': 0.66, ...}
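
As a rough illustration of what a metric like recall_macro_all measures, the sketch below computes macro-averaged recall in plain Python. It assumes the name denotes recall averaged with equal weight over the classes that occur in the ground truth; the function recall_macro is illustrative and is not the code behind dataset.eval.

```python
from collections import Counter


def recall_macro(y_true, y_pred):
    """Average the per-class recall over the classes that occur in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[label] / count for label, count in total.items()]
    return sum(recalls) / len(recalls)


toy_y_true = [0, 0, 1, 1, 2]
toy_y_pred = [0, 1, 1, 1, 0]
# Per-class recall is 1/2, 2/2 and 0/1, so the macro average is 0.5.
print(recall_macro(toy_y_true, toy_y_pred))
```

Because every class counts equally, a rare class that is always missed pulls the score down as much as a common one would.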

Citing WILDS

If you use WILDS datasets in your work, please cite our paper (Bibtex):

WILDS: A Benchmark of in-the-Wild Distribution Shifts (2020). Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang.

Please also cite the original papers that introduce the datasets, as listed on the datasets page.

Acknowledgements

The design of the WILDS benchmark was inspired by the Open Graph Benchmark, and we are grateful to the Open Graph Benchmark team for their advice and help in setting up WILDS.

Comments

Installing via pip seems to miss the `torch_scatter` dependency
Hey,

I noticed that the installation via pip install wilds seems to miss the torch_scatter dependency that is also listed in the README. When e.g. trying to do from wilds.datasets.amazon_dataset import AmazonDataset I got

from wilds.datasets.amazon_dataset import AmazonDataset
  File "/Users/deul/Desktop/wilds/wilds/datasets/amazon_dataset.py", line 6, in <module>
    from wilds.common.utils import map_to_id_array
  File "/Users/deul/Desktop/wilds/wilds/common/utils.py", line 1, in <module>
    import torch, torch_scatter
ModuleNotFoundError: No module named 'torch_scatter'

As far as I can see, the solution should be as easy as adding torch_scatter>=2.0.5 to the install_requires attribute in setup.py. In my case, the error was resolved after installing torch_scatter separately.
opened by Kaleidophon 6
Data loader for PovertyMap is very slow

Hi -

I ran into a bit of an issue when loading data from the PovertyMap dataset: loading a single minibatch with 128 examples takes about 5-6 seconds. This is not a huge deal, but it is slow enough to make me curious whether there's a faster way of doing this.

Digging into the code a bit, it looks like the slowdown is mostly due to the array copy on line 239 of poverty_dataset.py (https://github.com/p-lambda/wilds/blob/f984047af654eed6be51a7f770804a1c1b1ad0a0/wilds/datasets/poverty_dataset.py#L239). FWIW, it looks like this is a known issue for memory-mapped numpy arrays on Linux systems (https://stackoverflow.com/questions/42864320/numpy-memmap-performance-issues).

I'm not sure if there are any recommendations for getting around this, or if there's another way the data could be loaded in? Or let me know if I'm totally off-base here. Thanks!

opened by dmadras 6
`assert` error in new wilds version with FMoW
Hello, I am using the new version of WILDS and getting the error:

... wilds/common/utils.py" line 86, in avg_over_groups assert v.numel()==g.numel()

any ideas? It may be a bug on my end and if I catch it I'll update here.
opened by mitchellnw 4
Unable to Train ERM model with civilcomments

Hi,

I am having trouble running the code with the following command:

python3 wilds/examples/run_expt.py --dataset civilcomments --algorithm ERM --root_dir data --download

Everything gets stuck: no error is reported, and neither the GPU nor the CPU is being used.

If ctrl+C, it shows

The same thing didn't happen when I tried to run the same script but with groupDRO.

It would be very helpful if you have any clue on this, and thank you a lot for your amazing, well developed code!

opened by Bluepossibility 3
Issue in OOD data distribution when Grouper is set to "regions" for FMoW

Hi,

I am trying to change the groupby field from "year" to "region". I have followed the instructions on the README page and am currently using the following command:

python3 wilds/examples/run_expt.py --dataset fmow --algorithm ERM --groupby_fields region --root_dir wilds_fmow/

However, the issue is that the training dataset is not being separated in terms of distinct regions for ID and OOD manner. That is, all regions are included in ID as well as OOD. Here is a screenshot of the output:

Therefore, I was wondering if that is a bug in the code or am I missing something?

Thanks Sara A. Al-Emadi

opened by saraalemadi 3
Minor issue: `pip install wilds` changes pytorch version

A really minor issue, but the pip install wilds changed my pytorch version which then caused some prior evals on non-wilds datasets to change slightly. Is it possible for this to not occur? No worries if not.

opened by mitchellnw 3
Understanding the prediction_dir format for leaderboard submission
I wonder if the log folder used during training is the prediction_dir described in Get Started: Evaluating trained models.

I tried to reproduce the ERM result on a subset of camelyon with the following command:

python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --frac 0.1 --log_dir log_erm_01

Training goes well.

But my file camelyon17_split:id_val_seed:0_epoch is empty.

Then I ran the following command:

python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

And I got this:

Traceback (most recent call last):
  File "examples/evaluate.py", line 282, in <module>
    main()
  File "examples/evaluate.py", line 244, in main
    evaluate_benchmark(
  File "examples/evaluate.py", line 136, in evaluate_benchmark
    predictions_file = get_prediction_file(
  File "examples/evaluate.py", line 89, in get_prediction_file
    raise FileNotFoundError(
FileNotFoundError: Could not find CSV or pth prediction file that starts with camelyon17_split:id_val_seed:0.

So my question is whether the log file is the prediction_dir described in Get Started ?
opened by jmamath 3
How do I access data from only one group?
Hello, Thanks for the fantastic library!

I have two questions:

Is there any way I can get a per-group dataloader in wilds? This will help with, for instance, training a separate model for each group of data.

Can I change the split of data for each dataset? My application requires 50% of the data for each group/domain for testing.

Thanks!
opened by krishnap25 3

Model loaded from a .pth predicts only zeros

Hello !

I downloaded for the Camelyon17 dataset your trained model from CodaLab (ERM and seed0). I have installed all packages correctly according to your readme and load the model as follows:

path = "/best_model.pth"
state = torch.load(path)['algorithm']

state_dict = {}
 
for key in list(state.keys()):
    state_dict[key.replace('model.', '')] = state[key]

model.load_state_dict(state_dict)

model.eval()

I initialize the dataset I use for testing the model as follows:

import datasets_load  # from wilds package
dataset = datasets_load.Dataset('camelyon17', 32, '/data', 0.75, False)

For the prediction I used the following piece of code:

from wilds.common.data_loaders import get_eval_loader

test_data = dataset.test_set
test_loader = get_eval_loader('standard', test_data, batch_size=32)

with torch.no_grad():
    for x, y_true, metadata in test_loader:
        y_pred = model(x)
        labels = y_true
        _, predicted = torch.max(y_pred, 1)
        # print statements to check the output
        print("Labels: ", labels)
        print("Predicted: ", predicted)
        print("Correct: ", (predicted == labels).sum().item())

So far so good. When I run the code, it prints the labels (which are all 1 at the beginning, because shuffle=False) and the predictions, which are always 0.

Labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Predicted:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Correct:  0

I would appreciate any advice or assistance. Many thanks in advance. Tim

opened by tim1188 3

Cannot fetch 'ogb-molpcba' dataset due to missing arg

dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

Results in the following error:

--------------------------------------------------------------------
TypeError                          Traceback (most recent call last)
<ipython-input-2-c369817b9157> in <module>
----> 1 dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/get_dataset.py in get_dataset(dataset, version, **dataset_kwargs)
     51     elif dataset == 'ogb-molpcba':
     52         from wilds.datasets.ogbmolpcba_dataset import OGBPCBADataset
---> 53         return OGBPCBADataset(version=version, **dataset_kwargs)
     54 
     55     elif dataset == 'poverty':

~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/datasets/ogbmolpcba_dataset.py in __init__(self, version, root_dir, download, split_scheme)
     88             download_url('https://snap.stanford.edu/ogb/data/misc/ogbg_molpcba/scaffold_group.npy', os.path.join(self.ogb_dataset.root, 'raw'))
     89         self._metadata_array = torch.from_numpy(np.load(metadata_file_path)).reshape(-1,1).long()
---> 90         self._collate = PyGCollater(follow_batch=[])
     91 
     92         self._metric = Evaluator('ogbg-molpcba')

TypeError: __init__() missing 1 required positional argument: 'exclude_keys'

Versions:

wilds 1.1.0
torch_geometric 1.7.0

opened by arnaudvl 3

Could you provide the trained weights?

Hello,

I am training BERT+ERM on the Amazon dataset, but it is very time-consuming. Is it possible to provide the best trained parameters to users? (Just as BERT provides pretrained weights, maybe you could have another folder under examples that contains all the weights for users.) It would save users about a week (and the computation). Thank you!

opened by yachuan 3
camelyon17 split scheme: in-dist

I am not able to run Camelyon17 with --split_scheme in-dist (I'm assuming this corresponds to the setting with ID val data).

Any pointers on how to run this, or in general how to run camelyon with the ID val data?

Thank you for the help!

opened by thomaspzollo 1
run_expt.py: --device argument doesn't set the device

Hey, I'm running Wilds on a p2.8xlarge AWS EC2 instance with 8 K80 GPUs. I noticed that when I try to run run_expt.py and use the --device argument to divide the jobs I'm trying to run between the GPUs, they all end up running on GPU 0. I verified this by the memory usage in nvidia-smi as well as printing the device used by torch using torch.cuda.current_device(). My guess is that the CUDA_VISIBLE_DEVICES environment variable, set here, is set too late and PyTorch just defaults to device 0.

I've worked around this by setting the CUDA_VISIBLE_DEVICES variable manually, before running the script. I just thought I'd let you know I encountered this issue.
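
For readers who hit the same problem, the workaround can also be applied from inside Python, as long as it happens before CUDA is initialized. The helper below is a hypothetical sketch (restrict_visible_gpus is not part of WILDS or PyTorch); it only sets the environment variable, and doing so before importing torch is the conservative choice.

```python
import os


def restrict_visible_gpus(device_ids):
    """Expose only the given GPU indices to CUDA libraries loaded afterwards."""
    value = ",".join(str(d) for d in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before the first `import torch` (or at least before CUDA is
# initialized); changing the variable after that point does not change
# which devices the process can see.
restrict_visible_gpus([3])

# import torch  # would now see physical GPU 3 as cuda:0
```

With this in place, each job can be pinned to a different physical GPU while the script itself keeps using device 0.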

Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use run_expt pretty easily to run my own experiments.

opened by vvolhejn 0

Releases(v2.0.0)

v2.0.0(Dec 13, 2021)
The v2.0.0 release adds unlabeled data to 8 datasets and several new algorithms for taking advantage of the unlabeled data. It also updates the standard data augmentations used for the image datasets.

All labeled (training, validation, and test) datasets are exactly the same. Evaluation metrics are also exactly the same. All results on the datasets in v1.x are therefore still current and directly comparable to results obtained on v2.

For more information, please read our paper on the unlabeled data.

New datasets with unlabeled data

We have added unlabeled data to the following datasets:

iwildcam

camelyon17

ogb-molpcba

globalwheat

civilcomments

fmow

poverty

amazon

The following datasets have no unlabeled data and have not been changed:

rxrx1

py150

The labeled training, validation, and test data in all datasets have been kept exactly the same.

The unlabeled data comes from the same underlying sources as the original labeled data, and they can be from the source, validation, extra, or target domains. We describe each dataset in detail in our paper.

Each unlabeled dataset has its own corresponding data loader defined in wilds/datasets/unlabeled. Please see the README for more details on how to use them.

New algorithms for using unlabeled data

In our scripts in the examples folder, we have updated and/or added new algorithms that make use of the unlabeled data:

CORAL (Sun and Saenko, 2016)

DANN (Ganin et al., 2016)

AFN (Xu et al., 2019)

Pseudo-Label (Lee, 2013)

FixMatch (Sohn et al., 2020)

Noisy Student (Xie et al., 2020)

SwAV pre-training (Caron et al., 2020)

Masked language model pre-training (Devlin et al., 2019)

Other changes

GlobalWheat v1.0 -> v1.1

We have corrected some errors in the metadata for the previous version of the GlobalWheat (labeled) dataset. Users who did not explicitly make use of the location or stage metadata (which should be most users) will not be affected. All baseline results are unchanged.

DomainNet support

We have included data loaders for the DomainNet dataset (Peng et al., 2019) as a means of benchmarking the algorithms we implemented on existing datasets.

Data augmentation

We have added support for RandAugment (Cubuk et al., 2019) for RGB images, and we have also implemented a set of data augmentations for the multi-spectral Poverty dataset. These augmentations are used in all of the algorithms for unlabeled data listed above.

Hyperparameters

In our experiments to benchmark the algorithms for using unlabeled data, we tuned hyperparameters by random search instead of grid search. The default hyperparameters in examples/configs/datasets.py still work well but do not reflect the exact hyperparameters we used for our experiments. To see those, please view our CodaLab worksheet.

Miscellaneous

In our example scripts, we have added support for gradient accumulation by specifying the gradient_accumulation_steps parameter.
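
The idea behind gradient accumulation can be shown with a one-parameter model in plain Python. This is a sketch of the general technique, not the code in the example scripts; the function names and the toy data are made up, and the micro-batches are assumed to be of equal size.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def step_full_batch(w, xs, ys, lr):
    """One SGD step on the whole batch at once."""
    return w - lr * grad_mse(w, xs, ys)


def step_accumulated(w, xs, ys, lr, accumulation_steps):
    """One SGD step whose gradient is accumulated over equal micro-batches.

    Assumes len(xs) is divisible by accumulation_steps.
    """
    size = len(xs) // accumulation_steps
    accumulated = 0.0
    for k in range(accumulation_steps):
        micro_x = xs[k * size:(k + 1) * size]
        micro_y = ys[k * size:(k + 1) * size]
        # Scale each micro-batch gradient so the sum is the full-batch mean.
        accumulated += grad_mse(w, micro_x, micro_y) / accumulation_steps
    # The weight is updated only once, after all micro-batches.
    return w - lr * accumulated


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w_full = step_full_batch(0.0, xs, ys, lr=0.01)
w_accumulated = step_accumulated(0.0, xs, ys, lr=0.01, accumulation_steps=4)
print(w_full, w_accumulated)
```

Both updates land on the same weight (up to floating-point error), which is why accumulation lets a small per-step batch emulate a larger effective batch size when memory is limited.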

We have also added support for logging using Weights and Biases.

v1.2.2(Aug 4, 2021)
v1.2.2 contains several minor changes:

Added a check to make sure that a group data loader is used whenever n_groups_per_batch or distinct_groups are passed in as arguments to examples/run_expt.py. (https://github.com/p-lambda/wilds/issues/79)

Data augmentations now only transform x by default. Set do_transform_y when initializing the WILDSSubset to modify both x and y. (https://github.com/p-lambda/wilds/issues/77)

For FasterRCNN, we now use the PyTorch implementation of smooth_l1_loss instead of the custom torchvision implementation, which was removed in torchvision v0.10.

Updated the requirements to include torchvision, scipy, and scikit-learn. Previously, torchvision was only needed for the example scripts. However, it is now also used for computing metrics in the GlobalWheat-WILDS dataset, so we have moved it into the core set of requirements.
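
The data augmentation change listed above (transform only x by default, or both x and y when do_transform_y is set) can be pictured with a toy wrapper. TinySubset and the two flip functions are hypothetical and are not the WILDSSubset implementation; the "image" is just a short list with one bright pixel whose index serves as the label.

```python
class TinySubset:
    """Toy dataset wrapper showing the x-only versus (x, y) transform switch.

    Hypothetical class for illustration; it is not the WILDSSubset code.
    """

    def __init__(self, examples, transform=None, do_transform_y=False):
        self.examples = examples
        self.transform = transform
        self.do_transform_y = do_transform_y

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, index):
        x, y = self.examples[index]
        if self.transform is not None:
            if self.do_transform_y:
                # The transform sees and returns both the input and the label.
                x, y = self.transform(x, y)
            else:
                # Default: only the input is transformed.
                x = self.transform(x)
        return x, y


def flip(x):
    """Mirror the input only."""
    return x[::-1]


def flip_with_label(x, y):
    """Mirror the input and move the label (a pixel index) with it."""
    return x[::-1], len(x) - 1 - y


examples = [([0, 0, 9, 0, 0, 0], 2)]  # bright pixel at index 2
x_only = TinySubset(examples, transform=flip)
both = TinySubset(examples, transform=flip_with_label, do_transform_y=True)
print(x_only[0], both[0])
```

For classification labels the x-only default is what you want; for position-like labels such as bounding boxes, a geometric augmentation has to update y as well, which is the case do_transform_y is meant for.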

v1.2.1(Jul 19, 2021)
v1.2.1 adds two new benchmark datasets: the GlobalWheat wheat head detection dataset, and the RxRx1 cellular microscopy dataset. Please see our paper for more details on these datasets.

It also simplifies saving and evaluating predictions made across different replicates and datasets.

New datasets

New benchmark dataset: GlobalWheat-WILDS v1.0

The Global Wheat Head detection dataset comprises images of wheat fields collected from 12 countries around the world. The task is to draw bounding boxes around instances of wheat heads in each image, and the distribution shift is over images taken in different locations.

Model performance is measured by the proportion of the predicted bounding boxes that sufficiently overlap with the ground truth bounding boxes (IoU > 0.5). The example script implements a FasterRCNN baseline.
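
The overlap criterion (IoU > 0.5) can be written down in a few lines. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples, which is an assumption of this example rather than something stated in the release notes, and it does not reproduce how the benchmark matches predicted boxes to ground-truth boxes.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def sufficiently_overlaps(predicted, ground_truth, threshold=0.5):
    """True if the predicted box passes the IoU threshold."""
    return box_iou(predicted, ground_truth) > threshold


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))                 # small overlap
print(sufficiently_overlaps((0, 0, 10, 10), (0, 0, 10, 8)))  # large overlap
```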

This dataset is adapted from the Global Wheat Head Dataset 2021, which was recently used in a public competition held in conjunction with the Computer Vision in Plant Phenotyping and Agriculture Workshop at ICCV 2021.

New benchmark dataset: RxRx1-WILDS v1.0

The RxRx1 dataset comprises images of genetically-perturbed cells taken with fluorescent microscopy and collected across 51 experimental batches. The task is to classify the identity of the genetic perturbation applied to each cell, and the distribution shift is over different experimental batches.

Model performance is measured by average classification accuracy. The example script implements a ResNet-50 baseline.

This dataset is adapted from the RxRx1 dataset released by Recursion.

Additional dataset: ENCODE

The ENCODE dataset is based on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge. The task is to classify if a given genomic location will be bound by a particular transcription factor, and the distribution shift is over different cell types.

We did not include this dataset in the official benchmark as we were unable to learn a model that could generalize across all the cell types simultaneously, even in an in-distribution setting, which suggested that the model family and/or feature set might not be rich enough.

Other changes

Saving and evaluating predictions

To ease evaluation and leaderboard submission, we have made the following changes:

Predictions are now automatically saved in the format described in our submission guidelines.

We have added an evaluation script that evaluates these saved predictions across multiple replicates and datasets. See the updated README and examples/evaluate.py for more details.

Code changes to support detection tasks

To support detection tasks, we have modified the example scripts as well as made slight changes to the WILDS data loaders. All interfaces should be backwards-compatible.

The labels y and the model outputs no longer need to be a Tensor. For example, for detection tasks, a model might return a dictionary containing bounding box coordinates as well as class predictions for each bounding box. Accordingly, several helper functions have been rewritten to be more flexible.

Models can now optionally take in y in the forward call. For example, during training, a model might use ground truth bounding boxes to train a bounding box classifier.

Data transforms can now transform both x and y. We have also merged train_transform and eval_transform functions into a single function that takes an is_training parameter.

Miscellaneous changes

We have changed the names of the in-distribution split_scheme options to match the terminology in Section 5 of the updated paper.

The FMoW-WILDS and PovertyMap-WILDS constructors no longer use the oracle_training_set parameter to use an in-distribution split. This is now controlled through split_scheme to be consistent with the other datasets.

We fixed a minor bug in the PovertyMap-WILDS in-distribution baseline. The Val (ID) and Test (ID) splits are slightly changed.

The FMoW-WILDS constructor now sets use_ood_val=True by default. This change has no effect for users using the example scripts, as use_ood_val is already set in config/datasets.py.

Users who are only using the data loaders and not the evaluation metrics or example scripts will no longer need to install torch_scatter (thanks Ke Alexander Wang).

The Waterbirds dataset now computes the adjusted average accuracy on the validation and test sets, as described in Appendix C.1 of the corresponding paper.

The behavior of algorithm.eval() is now consistent with algorithm.model.eval() in that both preserve the grad_fn attribute (thanks Divya Shanmugam). See https://github.com/p-lambda/wilds/issues/45.

The dataset name for OGB-MolPCBA has been changed from ogbg-molpcba to ogb-molpcba for consistency.

We have updated the OGB-MolPCBA data loader to be compatible with v1.7 of the pytorch_geometric dependency (thanks arnaudvl). See https://github.com/p-lambda/wilds/issues/52.

v1.1.0(Mar 10, 2021)
The v1.1.0 release contains a new Py150 benchmark dataset for code completion, as well as updates to several existing datasets and default models to make them significantly faster and easier to use.

Some of these changes are breaking changes that will impact users who are currently running experiments with WILDS. We sincerely apologize for the inconvenience. We ask all users to update their package to v1.1.0, which will automatically update your datasets. In addition, please update your default models, for example by using the latest example scripts in this repo. These changes were primarily made to accelerate model training, which was a bottleneck for many users; at this time, we do not expect to have to make further changes to the existing datasets or default models.

New datasets

New benchmark dataset: Py150

The Py150-WILDS dataset is a code completion dataset, where the distribution shift is over code from different GitHub repositories.

We focus on accuracy on the subpopulation of class and method tokens, as prior work has shown that those are the most frequent queries in real-world code completion settings.

It is a variant of the Py150 dataset from Raychev et al., 2016.

See our paper for more details.

Additional dataset: SQF

The SQF dataset is based on the stop-question-and-frisk dataset released by the New York Police Department. We adapt the version processed by Goel et al., 2016. The task is to predict criminal possession of a weapon.

We use this dataset to study distribution shifts in an algorithmic fairness context. Specifically, we consider subpopulation shifts across locations and race groups. However, while there are large performance gaps, we did not find that they were caused by the distribution shift. We therefore did not include this dataset as part of the official benchmark.

Major updates to existing datasets

Note that datasets are versioned separately from the main WILDS version. We have two major updates (i.e., breaking, non-backwards-compatible changes) to datasets.

Amazon v1.0 -> v2.0

To speed up model training, we have subsampled the number of reviewers in this dataset to 25% of its original size, while keeping the same number of reviews per reviewer.

iWildCam v1.0 -> v2.0

Previously, the ID split was done uniformly at random, meaning that images from the same sequence (i.e., taken within a few seconds of each other by the same camera) could be found across all of the training / validation (ID) / test (ID) sets.

In v2.0, we have redone the ID split so that all images taken on the same day by the same camera are in only one of the training, validation (ID), or test (ID) sets. In other words, these sets still comprise images from the same cameras, but taken on different days.
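
A split with this property can be sketched by assigning whole (camera, date) groups to splits instead of assigning individual images. The function split_by_camera_day, the split names, and the fractions below are hypothetical; this is not the procedure used to build the released dataset.

```python
import random
from collections import defaultdict


def split_by_camera_day(images, fractions, seed=0):
    """Assign whole (camera, date) groups to splits.

    images: list of (image_id, camera, date) tuples.
    fractions: dict mapping split name to the fraction of groups it receives.
    """
    groups = defaultdict(list)
    for image_id, camera, date in images:
        groups[(camera, date)].append(image_id)

    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    splits = {name: [] for name in fractions}
    names = list(fractions)
    start = 0
    for position, name in enumerate(names):
        if position == len(names) - 1:
            end = len(keys)  # the last split takes the remainder
        else:
            end = start + round(fractions[name] * len(keys))
        for key in keys[start:end]:
            splits[name].extend(groups[key])
        start = end
    return splits


# Toy data: 3 cameras, 10 days each, 4 images per camera-day.
toy_images = [(f"img_{c}_{d}_{k}", f"cam{c}", f"2020-01-{d:02d}")
              for c in range(3) for d in range(1, 11) for k in range(4)]
toy_splits = split_by_camera_day(
    toy_images, {"train": 0.6, "val_id": 0.2, "test_id": 0.2})
print({name: len(ids) for name, ids in toy_splits.items()})
```

Because the unit of assignment is the camera-day, near-duplicate frames from the same sequence cannot end up on both sides of a train/evaluation boundary.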

In line with the new iWildCam 2021 challenge on Kaggle, we have also removed the following images:

images that include humans or pictures taken indoors.

images with non-animal categories such as start and unidentifiable.

images in categories such as unknown, unknown raptor and unknown rat.

We added back in location 537 that was previously removed as we mistakenly believed those images were corrupted.

We have re-split the data into training, validation (ID), test (ID), validation (OOD), and test (OOD) sets. This is a different random split from v1.0.

Since we remove any classes that do not end up in the train split, removing those images and redoing the split gave us a different set of species. There are now 182 classes instead of 186. Specifically, the following classes have been removed: ['unknown', 'macaca fascicularis', 'proechimys sp', 'unidentifiable', 'turtur calcospilos', 'streptopilia senegalensis', 'equus africanus', 'macaca nemestrina', 'start', 'paleosuchus sp', 'unknown raptor', 'unknown rat', 'misfire', 'mustela lutreolina', 'canis latrans', 'myoprocta pratti', 'xerus rutilus', 'end', 'psophia crepitans', 'ictonyx striatus']. The following classes have been added: ['praomys tullbergi', 'polyplectron chalcurum', 'ardeotis kori', 'phaetornis sp', 'mus minutoides', 'raphicerus campestris', 'tigrisoma mexicanum', 'leptailurus serval', 'malacomys longipes', 'oenomys hypoxanthus', 'turdus olivaceus', 'macaca sp', 'leiothrix argentauris', 'lophura sp', 'mazama temama', 'hippopotamus amphibius']. For convenience, we have also added a categories.csv that maps from label IDs to species names.

To speed up downloading and model training (by reducing the I/O bottleneck), we have also resized all images to have a height of 448px while keeping the original aspect ratio. All images are wide (so they now have a min dimension of 448px). Note that as JPEG compression is lossy, this procedure gives different images from resizing the full-sized image in the code after loading it.
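
The target size for such a resize follows from the aspect ratio alone. The helper below is a small sketch (resize_to_height is a made-up name, and the example image sizes are illustrative); it only computes dimensions and does not touch image data.

```python
def resize_to_height(width, height, target_height=448):
    """Return (new_width, new_height) that keeps the aspect ratio."""
    scale = target_height / height
    return round(width * scale), target_height


for size in [(1920, 1080), (2048, 1536)]:
    new_width, new_height = resize_to_height(*size)
    # For wide images the height is the smaller side, so it ends up at 448.
    print(size, "->", (new_width, new_height))
```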

Minor updates to existing datasets

We made two backwards-compatible changes to existing datasets. We encourage all users to update these datasets; these updates should leave results unchanged (modulo training randomness). In future versions of the WILDS package, we will deprecate the older versions of these datasets.

FMoW v1.0 -> v1.1

Previously, the images were stored as chunks in .npy files and read in using NumPy memmapping.

Now, we have converted them (losslessly) into individual PNG images. This should help with disk I/O and memory usage, and make them more convenient to visualize and use in other pipelines.

PovertyMap v1.0 -> v1.1

Previously, the images were stored in a single .npy file and read in using NumPy memmapping.

Now, we have converted them (losslessly) into individual compressed .npz files. This should help with disk I/O and memory usage.

We have correspondingly updated the default number of workers for the data loader from 1 to 4.

Default model updates

We have updated the default models for several datasets. Please take note of these changes if you are currently running experiments with these datasets.

Amazon and CivilComments

To speed up model training, we have switched from BERT-base-uncased to DistilBERT-base-uncased. This obtains roughly similar accuracy but at twice the speed.

For CivilComments, we have also increased the number of replicates from 3 to 5, to reduce variability in the reported performance.

Camelyon17

Previously, we were upsizing each image to 224x224 before passing it into the model.

We now leave the images at their original resolution of 96x96, which significantly speeds up model training.

iWildCam

Previously, we were resizing each image to 224x224 before passing it into the model. However, this limited model accuracy, as the animals in the images can sometimes be quite small.

We now resize each image to 448x448 before passing it into the model, which improves accuracy and macro F1 across the board.

FMoW

For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from acc_avg to acc_worst_region.
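The difference between the two criteria can be sketched with a toy example: average accuracy pools all examples, while worst-region accuracy takes the minimum of the per-region accuracies. The helper `accuracy_by_group` and the toy labels below are hypothetical and are not the WILDS evaluator itself.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy data: labels, predictions, and the region of each example.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 0]
regions = ["Asia", "Asia", "Africa", "Africa", "Africa", "Europe"]

per_region = accuracy_by_group(y_true, y_pred, regions)
acc_avg = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
acc_worst_region = min(per_region.values())

print(per_region, acc_avg, acc_worst_region)
```

Here the pooled accuracy looks reasonable while one region does much worse, which is the case that early stopping on the worst-region metric is meant to catch.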

PovertyMap

For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from r_all to r_wg.

Other changes

We have uploaded an executable version of our paper to CodaLab. This contains the exact commands, code, and data used for each experiment reported in our paper. The trained model weights for every experiment can also be found there.

To ease downloading, we have added wilds/download_datasets.py, which allows users to download all (or a subset of) datasets at once. Please see the README for instructions.

We have added a convenience function for getting the appropriate constructor for each dataset in wilds/get_dataset.py. This function allows you to specify a version argument. If this is not specified, it defaults to the latest available version for that dataset. If that version is not downloaded and the download argument is also set, then it will automatically download that version.

The example script examples/run_expt.py now also takes in a version argument.
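The version-defaulting behavior described above (use the requested version if given, otherwise fall back to the latest available one) can be sketched as follows. This is a standalone illustration: `resolve_version` and its error handling are hypothetical and are not the code in wilds/get_dataset.py.

```python
from typing import Iterable, Optional


def _version_key(version: str):
    """Turn "1.10" into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_version(requested: Optional[str], available: Iterable[str]) -> str:
    """Pick the dataset version to load.

    Hypothetical helper: returns the requested version if it exists,
    or the latest available version when none is requested.
    """
    available = list(available)
    if requested is None:
        return max(available, key=_version_key)
    if requested not in available:
        raise ValueError(f"Version {requested} not available; choose from {available}")
    return requested


versions = ["1.0", "1.1", "2.0"]
print(resolve_version(None, versions))   # falls back to the latest version
print(resolve_version("1.1", versions))  # an explicit request is honored
```

Pinning an explicit version is the safer choice when comparing against previously reported numbers, since the default changes as new versions are released.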

We have added download sizes and expected training times to the README.

We have updated the default inputs for WILDSDatasets.eval methods for various datasets. For example, eval for most classification datasets now takes in predicted labels by default, whereas predictions were previously passed in as logits. The default inputs vary across datasets, and we document this in the docstring of each eval method.
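For classification, going from logits to predicted labels is an argmax over the class dimension. The sketch below uses plain Python lists so that it runs without extra dependencies; the helper name `logits_to_labels` is hypothetical, and with PyTorch tensors the equivalent would typically be `logits.argmax(dim=-1)`.

```python
def logits_to_labels(logits):
    """Return the index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Two examples, three classes each.
logits = [
    [2.0, -1.0, 0.5],
    [0.1, 0.2, 3.0],
]
predicted = logits_to_labels(logits)
print(predicted)
```

Since the expected input differs across datasets, check the docstring of the relevant eval method before passing predictions in.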

We made a few updates to the code in examples/ to interface better with language modeling tasks (for Py150). None of these changes affect the results or the interface with algorithms.

We updated the code in examples/ to save model predictions in an appropriate format for submissions to the leaderboard.

Finally, we have also updated our paper to streamline the writing and include these new numbers and datasets.


Owner

P-Lambda

GitHub Repository https://wilds.stanford.edu

