A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

Overview

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

The WILDS package contains:

  1. Data loaders that automatically handle data downloading, processing, and splitting, and
  2. Dataset evaluators that standardize model evaluation for each dataset.

In addition, the example scripts contain default models, allowing new algorithms to be easily added and run on all of the WILDS datasets.

For more information, please read our paper or visit our website. For questions and feedback, please post on the discussion board.

Installation

We recommend using pip to install WILDS:

pip install wilds

If you have already installed it, please check that you have the latest version:

python -c "import wilds; print(wilds.__version__)"
# This should print the latest release version ("2.0.0" as of the Releases section below). If it doesn't, update by running:
pip install -U wilds
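
The same check can be scripted. The sketch below uses only the standard library; `version_tuple`, `is_up_to_date`, and `installed_version` are hypothetical helper names, the minimum version is a placeholder to adjust, and the simple parser assumes purely numeric versions such as "1.2.2".

```python
from importlib import metadata

# Placeholder: set this to the release you need.
MINIMUM_VERSION = "1.0.0"


def version_tuple(version: str) -> tuple:
    """Turn a purely numeric version string like '1.2.2' into (1, 2, 2)."""
    return tuple(int(part) for part in version.split("."))


def is_up_to_date(installed: str, required: str) -> bool:
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def installed_version(package: str):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("wilds")
if found is None:
    print("wilds is not installed; run: pip install wilds")
elif not is_up_to_date(found, MINIMUM_VERSION):
    print(f"wilds {found} is outdated; run: pip install -U wilds")
else:
    print(f"wilds {found} is installed")
```

Comparing tuples of integers rather than raw strings avoids ordering "1.10.0" before "1.9.0".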

If you plan to edit or contribute to WILDS, you should install from source:

git clone git@github.com:p-lambda/wilds.git
cd wilds
pip install -e .

Requirements

  • numpy>=1.19.1
  • pandas>=1.1.0
  • pillow>=7.2.0
  • torch>=1.7.0
  • tqdm>=4.53.0
  • pytz>=2020.4
  • outdated>=0.2.0
  • ogb>=1.2.3
  • torch-scatter>=2.0.5
  • torch-geometric>=1.6.1

Running pip install wilds will check for all of these requirements except for the torch-scatter and torch-geometric packages, which require a quick manual install.
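
Before running anything that needs these two packages, it can help to confirm that they are importable. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the import names `torch_scatter` and `torch_geometric` are the module names of the packages listed above.

```python
from importlib.util import find_spec

# Import names of the two packages that pip does not install for you.
MANUAL_INSTALLS = ["torch_scatter", "torch_geometric"]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(MANUAL_INSTALLS)
if missing:
    print("Please install manually:", ", ".join(missing))
else:
    print("torch-scatter and torch-geometric are available")
```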

Default models

After installing the WILDS package, you can use the scripts in examples/ to train default models on the WILDS datasets. These scripts are not part of the installed WILDS package. To use them, you should clone the repo (assuming you did not install from source):

git clone git@github.com:p-lambda/wilds.git

To run these scripts, you will need to install these additional dependencies:

  • torchvision>=0.8.1
  • transformers>=3.5.0

All baseline experiments in the paper were run on Python 3.8.5 and CUDA 10.1.

Usage

Default models

In the examples/ folder, we provide a set of scripts that we used to train models on the WILDS datasets. These scripts are configured with the default models and hyperparameters that we used for all of the baselines described in our paper. All baseline results in the paper can be replicated with commands like:

cd examples
python run_expt.py --dataset iwildcam --algorithm ERM --root_dir data
python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data

The scripts are set up to facilitate general-purpose algorithm development: new algorithms can be added to examples/algorithms and then run on all of the WILDS datasets using the default models.

The first time you run these scripts, you might need to download the datasets. You can do so with the --download argument, for example:

python run_expt.py --dataset civilcomments --algorithm groupDRO --root_dir data --download

Data loading

The WILDS package provides a simple, standardized interface for all datasets in the benchmark. This short Python snippet covers all of the steps of getting started with a WILDS dataset, including dataset download and initialization, accessing various splits, and preparing a user-customizable data loader.

>>> from wilds.datasets.iwildcam_dataset import IWildCamDataset
>>> from wilds.common.data_loaders import get_train_loader
>>> import torchvision.transforms as transforms

# Load the full dataset, and download it if necessary
>>> dataset = IWildCamDataset(download=True)

# Get the training set
>>> train_data = dataset.get_subset('train',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the standard data loader
>>> train_loader = get_train_loader('standard', train_data, batch_size=16)

# Train loop
>>> for x, y_true, metadata in train_loader:
...   ...

The metadata contains information like the domain identity, e.g., which camera a photo was taken from, or which hospital the patient's data came from, etc.

Domain information

To allow algorithms to leverage domain annotations as well as other groupings over the available metadata, the WILDS package provides Grouper objects. These Grouper objects extract group annotations from metadata, allowing users to specify the grouping scheme in a flexible fashion.

>>> from wilds.common.grouper import CombinatorialGrouper

# Initialize grouper, which extracts domain information
# In this example, we form domains based on location
>>> grouper = CombinatorialGrouper(dataset, ['location'])

# Train loop
>>> for x, y_true, metadata in train_loader:
...   z = grouper.metadata_to_group(metadata)
...   ...
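
To make the idea of a grouping scheme concrete, here is a pure-Python sketch of how several metadata fields could be combined into a single group index. It is an illustration only: `build_group_index`, the toy `time_of_day` field, and the dictionary-based metadata rows are assumptions for this example, not the CombinatorialGrouper implementation.

```python
from itertools import product


def build_group_index(field_values):
    """Map every combination of field values to a distinct integer group.

    field_values: dict of field name -> list of possible values.
    Returns a dict from value tuples (in field order) to group indices.
    """
    fields = list(field_values)
    combos = product(*(field_values[f] for f in fields))
    return {combo: idx for idx, combo in enumerate(combos)}


def metadata_to_group(rows, fields, group_index):
    """Look up the group of each metadata row (a dict of field -> value)."""
    return [group_index[tuple(row[f] for f in fields)] for row in rows]


# Toy metadata: two locations and two times of day give four groups.
field_values = {"location": [0, 1], "time_of_day": ["day", "night"]}
group_index = build_group_index(field_values)

rows = [
    {"location": 0, "time_of_day": "night"},
    {"location": 1, "time_of_day": "day"},
]
groups = metadata_to_group(rows, list(field_values), group_index)
print(groups)  # [1, 2]
```

Grouping on a single field, as in the `['location']` example above, is the special case where each value of that field is its own group.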

The Grouper can be used to prepare a group-aware data loader that, for each minibatch, first samples a specified number of groups, then samples examples from those groups. This allows our data loaders to accommodate a wide array of training algorithms, some of which require specific data loading schemes.

# Prepare a group data loader that samples from user-specified groups
>>> train_loader = get_train_loader('group', train_data,
...                                 grouper=grouper,
...                                 n_groups_per_batch=2,
...                                 batch_size=16)
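
The two-stage sampling described above (groups first, then examples within those groups) can be sketched in plain Python. This is not the WILDS sampler: `sample_group_batch` is a hypothetical function, and sampling uniformly over groups and with replacement within a group are assumptions made for the illustration.

```python
import random
from collections import defaultdict


def sample_group_batch(group_of_example, n_groups_per_batch, batch_size, rng):
    """Sample one minibatch of example indices in two stages.

    Stage 1 picks n_groups_per_batch distinct groups; stage 2 draws an equal
    share of the batch (with replacement) from each chosen group.
    """
    if batch_size % n_groups_per_batch != 0:
        raise ValueError("batch_size must be divisible by n_groups_per_batch")

    # Index the examples by group.
    members = defaultdict(list)
    for example_idx, group in enumerate(group_of_example):
        members[group].append(example_idx)

    chosen_groups = rng.sample(sorted(members), n_groups_per_batch)
    per_group = batch_size // n_groups_per_batch

    batch = []
    for group in chosen_groups:
        batch.extend(rng.choices(members[group], k=per_group))
    return batch


# Toy data: 100 examples spread over 5 groups.
group_of_example = [i % 5 for i in range(100)]
rng = random.Random(0)
batch = sample_group_batch(group_of_example, n_groups_per_batch=2,
                           batch_size=16, rng=rng)
batch_groups = {group_of_example[i] for i in batch}
print(len(batch), len(batch_groups))  # 16 examples drawn from 2 groups
```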

Evaluators

The WILDS package standardizes and automates evaluation for each dataset. Invoking the eval method of each dataset yields all metrics reported in the paper and on the leaderboard.

>>> from wilds.common.data_loaders import get_eval_loader

# Get the test set
>>> test_data = dataset.get_subset('test',
...                                 transform=transforms.Compose([transforms.Resize((224,224)),
...                                                               transforms.ToTensor()]))

# Prepare the data loader
>>> test_loader = get_eval_loader('standard', test_data, batch_size=16)

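# Get predictions for the full test set
# (model is your trained torch.nn.Module; it is assumed here to return class logits)
>>> import torch
>>> all_y_pred, all_y_true, all_metadata = [], [], []
>>> for x, y_true, metadata in test_loader:
...   y_pred = model(x).argmax(dim=-1)
...   all_y_pred.append(y_pred)
...   all_y_true.append(y_true)
...   all_metadata.append(metadata)

# Evaluate on the concatenated predictions, labels, and metadata
>>> dataset.eval(torch.cat(all_y_pred), torch.cat(all_y_true), torch.cat(all_metadata))
{'recall_macro_all': 0.66, ...}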
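
The key `recall_macro_all` in the output above suggests a macro-averaged recall, i.e., the per-class recall averaged with equal weight per class. As a rough sketch of what such a metric computes (not the WILDS implementation; averaging only over classes that occur in `y_true` is an assumption):

```python
def macro_recall(y_true, y_pred):
    """Average the per-class recall over the classes present in y_true."""
    per_class_total = {}
    per_class_hits = {}
    for truth, pred in zip(y_true, y_pred):
        per_class_total[truth] = per_class_total.get(truth, 0) + 1
        if truth == pred:
            per_class_hits[truth] = per_class_hits.get(truth, 0) + 1

    recalls = [per_class_hits.get(c, 0) / n for c, n in per_class_total.items()]
    return sum(recalls) / len(recalls)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(macro_recall(y_true, y_pred))  # 0.75: recall 0.5 for class 0, 1.0 for class 1
```

Because each class counts equally, rare classes influence this metric as much as common ones, which matters for long-tailed label distributions.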

Citing WILDS

If you use WILDS datasets in your work, please cite our paper (Bibtex):

  • WILDS: A Benchmark of in-the-Wild Distribution Shifts (2020). Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang.

Please also cite the original papers that introduce the datasets, as listed on the datasets page.

Acknowledgements

The design of the WILDS benchmark was inspired by the Open Graph Benchmark, and we are grateful to the Open Graph Benchmark team for their advice and help in setting up WILDS.

Comments
  • Installing via pip seems to miss `torch_scatter` dependency

    Hey,

    I noticed that the installation via pip install wilds seems to miss the torch_scatter dependency that is also listed in the README. When e.g. trying to do from wilds.datasets.amazon_dataset import AmazonDataset I got

    from wilds.datasets.amazon_dataset import AmazonDataset
      File "/Users/deul/Desktop/wilds/wilds/datasets/amazon_dataset.py", line 6, in <module>
        from wilds.common.utils import map_to_id_array
      File "/Users/deul/Desktop/wilds/wilds/common/utils.py", line 1, in <module>
        import torch, torch_scatter
    ModuleNotFoundError: No module named 'torch_scatter'
    

    As far as I can see, the solution should be as easy as adding torch_scatter>=2.0.5 to the install_requires attribute in setup.py. In my case, the error was resolved after installing torch_scatter separately.

    opened by Kaleidophon 6
  • Data loader for PovertyMap is very slow

    Hi -

    Ran into a bit of an issue with data loading the Povertymap dataset - loading a single minibatch with 128 examples takes about 5-6 seconds. This is not a huge deal but slow enough to make me curious if there's a faster way of doing this.

    Digging into the code a bit, it looks like the slowdown is mostly due to the array copy on line 239 of poverty_dataset.py https://github.com/p-lambda/wilds/blob/f984047af654eed6be51a7f770804a1c1b1ad0a0/wilds/datasets/poverty_dataset.py#L239 FWIW it looks like this is a known issue for memory-mapped numpy arrays on Linux systems (https://stackoverflow.com/questions/42864320/numpy-memmap-performance-issues).

    I'm not sure if there are any recommendations for getting around this, or if there's another way the data could be loaded in? Or let me know if I'm totally off-base here. Thanks!

    opened by dmadras 6
  • `assert` error in new wilds version with FMoW

    Hello, I am using the new version of WILDS and getting the error:

    ... wilds/common/utils.py" line 86, in avg_over_groups
        assert v.numel()==g.numel()
    

    any ideas? It may be a bug on my end and if I catch it I'll update here.

    opened by mitchellnw 4
  • Unable to Train ERM model with civilcomments

    Hi,

    I am having trouble running the code with the command python3 wilds/examples/run_expt.py --dataset civilcomments --algorithm ERM --root_dir data --download. Everything gets stuck: no error is reported, and neither the GPU nor the CPU is being used.

    If I press Ctrl+C, it shows the following: [image]

    The same thing didn't happen when I tried to run the same script but with groupDRO.

    It would be very helpful if you have any clue on this, and thank you a lot for your amazing, well developed code!

    opened by Bluepossibility 3
  • Issue in OOD data distribution when Grouper is set to "regions" for FMoW

    Hi,

    I am trying to change the groupby field from "year" to "region". I have followed the instructions in the README and am currently using the following command: python3 wilds/examples/run_expt.py --dataset fmow --algorithm ERM --groupby_fields region --root_dir wilds_fmow/

    However, the training dataset is not being separated into distinct regions for ID and OOD. That is, all regions are included in both the ID and the OOD data. Here is a screenshot of the output: [screenshot, 2022-11-09 at 15:37:47]

    Therefore, I was wondering if that is a bug in the code or am I missing something?

    Thanks Sara A. Al-Emadi

    opened by saraalemadi 3
  • Minor issue: `pip install wilds` changes pytorch version

    A really minor issue, but the pip install wilds changed my pytorch version which then caused some prior evals on non-wilds datasets to change slightly. Is it possible for this to not occur? No worries if not.

    opened by mitchellnw 3
  • Understanding the prediction_dir format for leaderboard submission

    I wonder if the log folder used during training is the prediction_dir described in Get Started: Evaluating trained models.

    I tried to reproduce the ERM result on a subset of camelyon with the following command:

    python examples/run_expt.py --dataset camelyon17 --algorithm ERM --root_dir data --frac 0.1 --log_dir log_erm_01

    Training goes well.

    But my file camelyon17_split:id_val_seed:0_epoch is empty.

    Then I ran the following command: python examples/evaluate.py log_erm_01 erm_01_output --root-dir data --dataset camelyon17

    And I got this:

    Traceback (most recent call last):
      File "examples/evaluate.py", line 282, in <module>
        main()
      File "examples/evaluate.py", line 244, in main
        evaluate_benchmark(
      File "examples/evaluate.py", line 136, in evaluate_benchmark
        predictions_file = get_prediction_file(
      File "examples/evaluate.py", line 89, in get_prediction_file
        raise FileNotFoundError(
    FileNotFoundError: Could not find CSV or pth prediction file that starts with camelyon17_split:id_val_seed:0.
    

    So my question is whether the log file is the prediction_dir described in Get Started?

    opened by jmamath 3
  • How do I access data from only one group?

    Hello, Thanks for the fantastic library!

    I have two questions:

    1. Is there any way I can get a per-group dataloader in wilds? This will help with, for instance, training a separate model for each group of data.
    2. Can I change the split of data for each dataset? My application requires 50% of the data for each group/domain for testing.

    Thanks!

    opened by krishnap25 3
  • Model loaded from a .pth predicts only zeros

    Hello !

    I downloaded your trained model for the Camelyon17 dataset from CodaLab (ERM, seed 0). I have installed all packages according to your README and load the model as follows:

    path = "/best_model.pth"
    state = torch.load(path)['algorithm']
    
    state_dict = {}
     
    for key in list(state.keys()):
        state_dict[key.replace('model.', '')] = state[key]
    
    model.load_state_dict(state_dict)
    
    model.eval()
    

    I initialize the dataset I use for testing the model as follows:

    import datasets_load  # from wilds package
    dataset = datasets_load.Dataset('camelyon17', 32, '/data', 0.75, False)
    

    For the prediction I used the following piece of code:

    from wilds.common.data_loaders import get_eval_loader
    
    test_data = dataset.test_set
    test_loader = get_eval_loader('standard', test_data, batch_size=32)
    
    with torch.no_grad():
        for x, y_true, metadata in test_loader:
              y_pred = model(x)
              labels = y_true
              _, predicted = torch.max(y_pred, 1)
              # print statements to check the output
              print("Labels: ", labels)
              print("Predicted: ", predicted)
              print("Correct: ", (predicted == labels).sum().item())
    
    

    So far so good. When I run the code, it prints the labels (which are all 1 at the beginning, because shuffle=False) and the predictions, which are always 0.

    Labels:  tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
    Predicted:  tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    Correct:  0
    

    I would appreciate any advice or assistance. Many thanks in advance. Tim

    opened by tim1188 3
  • Cannot fetch 'ogb-molpcba' dataset due to missing arg

    dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')
    

    Results in the following error:

    --------------------------------------------------------------------
    TypeError                          Traceback (most recent call last)
    <ipython-input-2-c369817b9157> in <module>
    ----> 1 dataset = get_dataset(dataset='ogb-molpcba', download=True, root_dir='../data/')
    
    ~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/get_dataset.py in get_dataset(dataset, version, **dataset_kwargs)
         51     elif dataset == 'ogb-molpcba':
         52         from wilds.datasets.ogbmolpcba_dataset import OGBPCBADataset
    ---> 53         return OGBPCBADataset(version=version, **dataset_kwargs)
         54 
         55     elif dataset == 'poverty':
    
    ~/anaconda3/envs/benchmark/lib/python3.7/site-packages/wilds/datasets/ogbmolpcba_dataset.py in __init__(self, version, root_dir, download, split_scheme)
         88             download_url('https://snap.stanford.edu/ogb/data/misc/ogbg_molpcba/scaffold_group.npy', os.path.join(self.ogb_dataset.root, 'raw'))
         89         self._metadata_array = torch.from_numpy(np.load(metadata_file_path)).reshape(-1,1).long()
    ---> 90         self._collate = PyGCollater(follow_batch=[])
         91 
         92         self._metric = Evaluator('ogbg-molpcba')
    
    TypeError: __init__() missing 1 required positional argument: 'exclude_keys'
    

    Versions:

    wilds 1.1.0 torch_geometric 1.7.0

    opened by arnaudvl 3
  • Could you provide the trained weights?

    Hello,

    I am training BERT+ERM on the Amazon dataset, but it is very time-consuming. Is it possible to provide the best trained parameters to users? (Just as BERT provides pretrained weights, maybe you could have another folder under examples that contains all the weights.) It would save users about a week (and a lot of computation). Thank you!

    opened by yachuan 3
  • camelyon17 split scheme: in-dist

    I am not able to run Camelyon17 with --split_scheme in-dist (I'm assuming this corresponds to the setting with ID val data).

    Any pointers on how to run this, or in general how to run camelyon with the ID val data?

    Thank you for the help!

    opened by thomaspzollo 1
  • run_expt.py: --device argument doesn't set the device

    Hey, I'm running Wilds on a p2.8xlarge AWS EC2 instance with 8 K80 GPUs. I noticed that when I try to run run_expt.py and use the --device argument to divide the jobs I'm trying to run between the GPUs, they all end up running on GPU 0. I verified this by the memory usage in nvidia-smi as well as printing the device used by torch using torch.cuda.current_device(). My guess is that the CUDA_VISIBLE_DEVICES environment variable, set here, is set too late and PyTorch just defaults to device 0.

    I've worked around this by setting the CUDA_VISIBLE_DEVICES variable manually, before running the script. I just thought I'd let you know I encountered this issue.

    Really appreciate the project by the way! Being able to access multiple datasets for domain generalization with the same interface is really useful, and I managed to use run_expt pretty easily to run my own experiments.

    opened by vvolhejn 0
Releases(v2.0.0)
  • v2.0.0(Dec 13, 2021)

    The v2.0.0 release adds unlabeled data to 8 datasets and several new algorithms for taking advantage of the unlabeled data. It also updates the standard data augmentations used for the image datasets.

    All labeled (training, validation, and test) datasets are exactly the same. Evaluation metrics are also exactly the same. All results on the datasets in v1.x are therefore still current and directly comparable to results obtained on v2.

    For more information, please read our paper on the unlabeled data.

    New datasets with unlabeled data

    We have added unlabeled data to the following datasets:

    1. iwildcam
    2. camelyon17
    3. ogb-molpcba
    4. globalwheat
    5. civilcomments
    6. fmow
    7. poverty
    8. amazon

    The following datasets have no unlabeled data and have not been changed:

    1. rxrx1
    2. py150

    The labeled training, validation, and test data in all datasets have been kept exactly the same.

    The unlabeled data comes from the same underlying sources as the original labeled data, and they can be from the source, validation, extra, or target domains. We describe each dataset in detail in our paper.

    Each unlabeled dataset has its own corresponding data loader defined in wilds/datasets/unlabeled. Please see the README for more details on how to use them.

    New algorithms for using unlabeled data

    In our scripts in the examples folder, we have updated and/or added new algorithms that make use of the unlabeled data:

    1. CORAL (Sun and Saenko, 2016)
    2. DANN (Ganin et al., 2016)
    3. AFN (Xu et al., 2019)
    4. Pseudo-Label (Lee, 2013)
    5. FixMatch (Sohn et al., 2020)
    6. Noisy Student (Xie et al., 2020)
    7. SwAV pre-training (Caron et al., 2020)
    8. Masked language model pre-training (Devlin et al., 2019)

    Other changes

    GlobalWheat v1.0 -> v1.1

    We have corrected some errors in the metadata for the previous version of the GlobalWheat (labeled) dataset. Users who did not explicitly make use of the location or stage metadata (which should be most users) will not be affected. All baseline results are unchanged.

    DomainNet support

    We have included data loaders for the DomainNet dataset (Peng et al., 2019) as a means of benchmarking the algorithms we implemented on existing datasets.

    Data augmentation

    We have added support for RandAugment (Cubuk et al., 2019) for RGB images, and we have also implemented a set of data augmentations for the multi-spectral Poverty dataset. These augmentations are used in all of the algorithms for unlabeled data listed above.

    Hyperparameters

    In our experiments to benchmark the algorithms for using unlabeled data, we tuned hyperparameters by random search instead of grid search. The default hyperparameters in examples/configs/datasets.py still work well but do not reflect the exact hyperparameters we used for our experiments. To see those, please view our CodaLab worksheet.

    Miscellaneous

    • In our example scripts, we have added support for gradient accumulation by specifying the gradient_accumulation_steps parameter.
    • We have also added support for logging using Weights and Biases.
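
Gradient accumulation sums the gradients of several small batches before taking one optimizer step, which emulates a larger batch when memory is limited. The sketch below shows the arithmetic for a one-parameter least-squares model in plain Python. It is an illustration rather than the code in the example scripts; all function names are hypothetical, and equal-sized micro-batches are assumed.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of equal-sized micro-batches."""
    n_steps = len(xs) // micro_batch_size
    total = 0.0
    for step in range(n_steps):
        lo = step * micro_batch_size
        hi = lo + micro_batch_size
        # Scale each micro-batch gradient by 1/n_steps so the sum is a mean.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / n_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)
accumulated = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accumulated)  # -22.5 -22.5

# One update step using the accumulated gradient.
learning_rate = 0.01
w_new = w - learning_rate * accumulated
```

With equal-sized micro-batches, the accumulated gradient matches the full-batch gradient, so the update is the same as if the whole batch had fit in memory.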
  • v1.2.2(Aug 4, 2021)

    v1.2.2 contains several minor changes:

    • Added a check to make sure that a group data loader is used whenever n_groups_per_batch or distinct_groups are passed in as arguments to examples/run_expt.py. (https://github.com/p-lambda/wilds/issues/79)
    • Data augmentations now only transform x by default. Set do_transform_y when initializing the WILDSSubset to modify both x and y. (https://github.com/p-lambda/wilds/issues/77)
    • For FasterRCNN, we now use the PyTorch implementation of smooth_l1_loss instead of the custom torchvision implementation, which was removed in torchvision v0.10.
    • Updated the requirements to include torchvision, scipy, and scikit-learn. Previously, torchvision was only needed for the example scripts. However, it is now also used for computing metrics in the GlobalWheat-WILDS dataset, so we have moved it into the core set of requirements.
  • v1.2.1(Jul 19, 2021)

    v1.2.1 adds two new benchmark datasets: the GlobalWheat wheat head detection dataset, and the RxRx1 cellular microscopy dataset. Please see our paper for more details on these datasets.

    It also simplifies saving and evaluating predictions made across different replicates and datasets.

    New datasets

    New benchmark dataset: GlobalWheat-WILDS v1.0

    • The Global Wheat Head detection dataset comprises images of wheat fields collected from 12 countries around the world. The task is to draw bounding boxes around instances of wheat heads in each image, and the distribution shift is over images taken in different locations.
    • Model performance is measured by the proportion of the predicted bounding boxes that sufficiently overlap with the ground truth bounding boxes (IoU > 0.5). The example script implements a FasterRCNN baseline.
    • This dataset is adapted from the Global Wheat Head Dataset 2021, which was recently used in a public competition held in conjunction with the Computer Vision in Plant Phenotyping and Agriculture Workshop at ICCV 2021.
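
The overlap criterion above relies on intersection over union (IoU). A minimal sketch of the computation for two axis-aligned boxes follows; the `(x1, y1, x2, y2)` corner format and the function name `box_iou` are assumptions for the example, not the format used by the dataset.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
iou = box_iou(predicted, ground_truth)
print(iou > 0.5)  # False: the overlap covers 1 of 7 area units
```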

    New benchmark dataset: RxRx1-WILDS v1.0

    • The RxRx1 dataset comprises images of genetically-perturbed cells taken with fluorescent microscopy and collected across 51 experimental batches. The task is to classify the identity of the genetic perturbation applied to each cell, and the distribution shift is over different experimental batches.
    • Model performance is measured by average classification accuracy. The example script implements a ResNet-50 baseline.
    • This dataset is adapted from the RxRx1 dataset released by Recursion.

    Additional dataset: ENCODE

    • The ENCODE dataset is based on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge. The task is to classify if a given genomic location will be bound by a particular transcription factor, and the distribution shift is over different cell types.
    • We did not include this dataset in the official benchmark as we were unable to learn a model that could generalize across all the cell types simultaneously, even in an in-distribution setting, which suggested that the model family and/or feature set might not be rich enough.

    Other changes

    Saving and evaluating predictions

    To ease evaluation and leaderboard submission, we have made the following changes:

    • Predictions are now automatically saved in the format described in our submission guidelines.
    • We have added an evaluation script that evaluates these saved predictions across multiple replicates and datasets. See the updated README and examples/evaluate.py for more details.

    Code changes to support detection tasks

    To support detection tasks, we have modified the example scripts as well as made slight changes to the WILDS data loaders. All interfaces should be backwards-compatible.

    • The labels y and the model outputs no longer need to be a Tensor. For example, for detection tasks, a model might return a dictionary containing bounding box coordinates as well as class predictions for each bounding box. Accordingly, several helper functions have been rewritten to be more flexible.
    • Models can now optionally take in y in the forward call. For example, during training, a model might use ground truth bounding boxes to train a bounding box classifier.
    • Data transforms can now transform both x and y. We have also merged the train_transform and eval_transform functions into a single function that takes an is_training parameter.

    Miscellaneous changes

    • We have changed the names of the in-distribution split_scheme values to match the terminology in Section 5 of the updated paper.
    • The FMoW-WILDS and PovertyMap-WILDS constructors no longer use the oracle_training_set parameter to select an in-distribution split. This is now controlled through split_scheme, consistent with the other datasets.
    • We fixed a minor bug in the PovertyMap-WILDS in-distribution baseline. The Val (ID) and Test (ID) splits are slightly changed.
    • The FMoW-WILDS constructor now sets use_ood_val=True by default. This change has no effect for users using the example scripts, as use_ood_val is already set in config/datasets.py.
    • Users who are only using the data loaders and not the evaluation metrics or example scripts will no longer need to install torch_scatter (thanks Ke Alexander Wang).
    • The Waterbirds dataset now computes the adjusted average accuracy on the validation and test sets, as described in Appendix C.1 of the corresponding paper.
    • The behavior of algorithm.eval() is now consistent with algorithm.model.eval() in that both preserve the grad_fn attribute (thanks Divya Shanmugam). See https://github.com/p-lambda/wilds/issues/45.
    • The dataset name for OGB-MolPCBA has been changed from ogbg-molpcba to ogb-molpcba for consistency.
    • We have updated the OGB-MolPCBA data loader to be compatible with v1.7 of the pytorch_geometric dependency (thanks arnaudvl). See https://github.com/p-lambda/wilds/issues/52.
  • v1.1.0(Mar 10, 2021)

    The v1.1.0 release contains a new Py150 benchmark dataset for code completion, as well as updates to several existing datasets and default models to make them significantly faster and easier to use.

    Some of these changes are breaking changes that will impact users who are currently running experiments with WILDS. We sincerely apologize for the inconvenience. We ask all users to update their package to v1.1.0, which will automatically update your datasets. In addition, please update your default models, for example by using the latest example scripts in this repo. These changes were primarily made to accelerate model training, which was a bottleneck for many users; at this time, we do not expect to have to make further changes to the existing datasets or default models.

    New datasets

    New benchmark dataset: Py150

    • The Py150-WILDS dataset is a code completion dataset, where the distribution shift is over code from different GitHub repositories.
    • We focus on accuracy on the subpopulation of class and method tokens, as prior work has shown that those are the most frequent queries in real-world code completion settings.
    • It is a variant of the Py150 dataset from Raychev et al., 2016.
    • See our paper for more details.

    Additional dataset: SQF

    • The SQF dataset is based on the stop-question-and-frisk dataset released by the New York Police Department. We adapt the version processed by Goel et al., 2016. The task is to predict criminal possession of a weapon.
    • We use this dataset to study distribution shifts in an algorithmic fairness context. Specifically, we consider subpopulation shifts across locations and race groups. However, while there are large performance gaps, we did not find that they were caused by the distribution shift. We therefore did not include this dataset as part of the official benchmark.

    Major updates to existing datasets

    Note that datasets are versioned separately from the main WILDS version. We have two major updates (i.e., breaking, non-backwards-compatible changes) to datasets.

    Amazon v1.0 -> v2.0

    • To speed up model training, we have subsampled the number of reviewers in this dataset to 25% of its original size, while keeping the same number of reviews per reviewer.
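
Subsampling reviewers rather than individual reviews keeps each remaining reviewer's set of reviews intact. A minimal sketch of that idea on toy data (the function name, the `(reviewer_id, text)` format, and the seed are hypothetical; this is not the script used to build Amazon v2.0):

```python
import random
from collections import defaultdict


def subsample_reviewers(reviews, fraction, seed=0):
    """Keep all reviews written by a random fraction of the reviewers.

    reviews: list of (reviewer_id, text) pairs.
    """
    by_reviewer = defaultdict(list)
    for reviewer_id, text in reviews:
        by_reviewer[reviewer_id].append(text)

    reviewer_ids = sorted(by_reviewer)
    n_keep = int(len(reviewer_ids) * fraction)
    kept_ids = set(random.Random(seed).sample(reviewer_ids, n_keep))

    # Reviews of a kept reviewer are never dropped individually.
    return [(rid, text) for rid, text in reviews if rid in kept_ids]


# Toy data: 8 reviewers with 3 reviews each.
reviews = [(rid, f"review {k}") for rid in range(8) for k in range(3)]
subset = subsample_reviewers(reviews, fraction=0.25)
kept = {rid for rid, _ in subset}
print(len(kept), len(subset))  # 2 6
```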

    iWildCam v1.0 -> v2.0

    • Previously, the ID split was done uniformly at random, meaning that images from the same sequence (i.e., taken within a few seconds of each other by the same camera) could be found across all of the training / validation (ID) / test (ID) sets.
    • In v2.0, we have redone the ID split so that all images taken on the same day by the same camera are in only one of the training, validation (ID), or test (ID) sets. In other words, these sets still comprise images from the same cameras, but taken on different days.
    • In line with the new iWildCam 2021 challenge on Kaggle, we have also removed the following images:
      • images that include humans or pictures taken indoors.
      • images with non-animal categories such as start and unidentifiable.
      • images in categories such as unknown, unknown raptor and unknown rat.
    • We added back in location 537 that was previously removed as we mistakenly believed those images were corrupted.
    • We have re-split the data into training, validation (ID), test (ID), validation (OOD), and test (OOD) sets. This is a different random split from v1.0.
    • Since we remove any classes that do not end up in the train split, removing those images and redoing the split gave us a different set of species. There are now 182 classes instead of 186. Specifically, the following classes have been removed: ['unknown', 'macaca fascicularis', 'proechimys sp', 'unidentifiable', 'turtur calcospilos', 'streptopilia senegalensis', 'equus africanus', 'macaca nemestrina', 'start', 'paleosuchus sp', 'unknown raptor', 'unknown rat', 'misfire', 'mustela lutreolina', 'canis latrans', 'myoprocta pratti', 'xerus rutilus', 'end', 'psophia crepitans', 'ictonyx striatus']. The following classes have been added: ['praomys tullbergi', 'polyplectron chalcurum', 'ardeotis kori', 'phaetornis sp', 'mus minutoides', 'raphicerus campestris', 'tigrisoma mexicanum', 'leptailurus serval', 'malacomys longipes', 'oenomys hypoxanthus', 'turdus olivaceus', 'macaca sp', 'leiothrix argentauris', 'lophura sp', 'mazama temama', 'hippopotamus amphibius']. For convenience, we have also added a categories.csv that maps from label IDs to species names.
    • To speed up downloading and model training (by reducing the I/O bottleneck), we have also resized all images to have a height of 448px while keeping the original aspect ratio. All images are wide (so they now have a min dimension of 448px). Note that as JPEG compression is lossy, this procedure gives different images from resizing the full-sized image in the code after loading it.
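
    The day-level ID split described above can be illustrated with a minimal sketch. This is not the code used to build iWildCam v2.0: the function name, the record fields (camera, date, image), and the 80/10/10 fractions are all hypothetical, chosen only to show how grouping by (camera, day) keeps near-duplicate frames out of different sets.

```python
import random
from collections import defaultdict


def split_by_camera_day(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign every (camera, date) group wholly to one ID set.

    Hypothetical helper: field names and fractions are assumptions.
    """
    # Group images so that one camera-day is never divided across sets.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["camera"], rec["date"])].append(rec)

    # Sort before shuffling so the result depends only on the seed.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train = int(fractions[0] * len(keys))
    n_val = int(fractions[1] * len(keys))
    names = (
        ["train"] * n_train
        + ["val_id"] * n_val
        + ["test_id"] * (len(keys) - n_train - n_val)
    )

    splits = {"train": [], "val_id": [], "test_id": []}
    for key, name in zip(keys, names):
        splits[name].extend(groups[key])
    return splits


# Toy data: 2 cameras x 10 days x 3 frames per day.
records = [
    {"camera": cam, "date": f"2013-01-{day:02d}", "image": f"{cam}_{day}_{i}.jpg"}
    for cam in ("cam_a", "cam_b")
    for day in range(1, 11)
    for i in range(3)
]
splits = split_by_camera_day(records)
```

    Splitting at the level of individual images instead would reproduce the v1.0 behaviour, where frames from one burst could land in both the training and test (ID) sets.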

    Minor updates to existing datasets

    We made two backwards-compatible changes to existing datasets. We encourage all users to update these datasets; these updates should leave results unchanged (modulo training randomness). In future versions of the WILDS package, we will deprecate the older versions of these datasets.

    FMoW v1.0 -> v1.1

    • Previously, the images were stored as chunks in .npy files and read in using NumPy memmapping.
    • Now, we have converted them (losslessly) into individual PNG images. This should help with disk I/O and memory usage, and make them more convenient to visualize and use in other pipelines.

    PovertyMap v1.0 -> v1.1

    • Previously, the images were stored in a single .npy file and read in using NumPy memmapping.
    • Now, we have converted them (losslessly) into individual compressed .npz files. This should help with disk I/O and memory usage.
    • We have correspondingly updated the default number of workers for the data loader from 1 to 4.

    Default model updates

    We have updated the default models for several datasets. Please take note of these changes if you are currently running experiments with these datasets.

    Amazon and CivilComments

    • To speed up model training, we have switched from BERT-base-uncased to DistilBERT-base-uncased. This obtains roughly similar accuracy but at twice the speed.
    • For CivilComments, we have also increased the number of replicates from 3 to 5, to reduce variability in the reported performance.

    Camelyon17

    • Previously, we were upsizing each image to 224x224 before passing it into the model.
    • We now leave the images at their original resolution of 96x96, which significantly speeds up model training.

    iWildCam

    • Previously, we were resizing each image to 224x224 before passing it into the model. However, this limited model accuracy, as the animals in the images can sometimes be quite small.
    • We now resize each image to 448x448 before passing it into the model, which improves accuracy and macro F1 across the board.

    FMoW

    • For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from acc_avg to acc_worst_region.
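
    To make the difference between the two criteria concrete, here is a minimal sketch of an average accuracy versus a worst-region accuracy. The metric names acc_avg and acc_worst_region come from the text above; the helper function, the toy labels, and the region names are assumptions, and this is not the WILDS evaluator itself.

```python
from collections import defaultdict


def accuracy_metrics(y_true, y_pred, regions):
    """Return average accuracy and the minimum per-region accuracy.

    Hypothetical helper for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, region in zip(y_true, y_pred, regions):
        total[region] += 1
        correct[region] += int(truth == pred)

    per_region = {region: correct[region] / total[region] for region in total}
    return {
        "acc_avg": sum(correct.values()) / sum(total.values()),
        "acc_worst_region": min(per_region.values()),
    }


# Toy example: the model is perfect on one region and at chance on the other.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 1]
regions = ["Asia", "Asia", "Asia", "Asia", "Africa", "Africa"]
metrics = accuracy_metrics(y_true, y_pred, regions)
```

    In this toy case the average accuracy is 5/6 while the worst-region accuracy is 0.5, so early stopping on the worst-region value favours checkpoints that do not sacrifice the weakest region.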

    PovertyMap

    • For consistency with the other datasets, we have changed the early stopping validation criterion (val_metric) from r_all to r_wg.

    Other changes

    • We have uploaded an executable version of our paper to CodaLab. This contains the exact commands, code, and data used for each experiment reported in our paper. The trained model weights for every experiment can also be found there.
    • To ease downloading, we have added wilds/download_datasets.py, which allows users to download all (or a subset of) datasets at once. Please see the README for instructions.
    • We have added a convenience function for getting the appropriate constructor for each dataset in wilds/get_dataset.py. This function allows you to specify a version argument. If this is not specified, it defaults to the latest available version for that dataset. If that version is not downloaded and the download argument is also set, then it will automatically download that version.
    • The example script examples/run_expt.py now also takes in a version argument.
    • We have added download sizes and expected training times to the README.
    • We have updated the default inputs for WILDSDatasets.eval methods for various datasets. For example, eval for most classification datasets now takes in predicted labels by default, whereas predictions were previously passed in as logits. The default inputs vary across datasets, and we document this in the docstring of each eval method.
    • We made a few updates to the code in examples/ to interface better with language modeling tasks (for Py150). None of these changes affect the results or the interface with algorithms.
    • We updated the code in examples/ to save model predictions in an appropriate format for submissions to the leaderboard.
    • Finally, we have also updated our paper to streamline the writing and include these new numbers and datasets.
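
    The version-selection behaviour described above for wilds/get_dataset.py can be sketched as follows. This is an illustrative reimplementation of the described logic, not the package's code: resolve_version and its arguments are hypothetical, and raising an error when a version is missing and download is not set is an assumption, since the text does not say what happens in that case.

```python
def resolve_version(requested, available, downloaded, download=False):
    """Pick a dataset version and report whether it must be downloaded.

    Hypothetical helper; returns a (version, needs_download) pair.
    """
    if requested is None:
        # Default to the latest available version, compared numerically.
        version = max(available, key=lambda v: tuple(int(x) for x in v.split(".")))
    elif requested in available:
        version = requested
    else:
        raise ValueError(f"Unknown version {requested!r}; choose from {sorted(available)}")

    if version in downloaded:
        return version, False
    if download:
        return version, True
    # Assumption: without the download flag, a missing version is an error.
    raise FileNotFoundError(f"Version {version} is not downloaded; set download=True")


# No version requested: the latest one is chosen and flagged for download.
latest = resolve_version(None, ["1.0", "2.0"], downloaded={"1.0"}, download=True)
# An older version that is already on disk is used as-is.
pinned = resolve_version("1.0", ["1.0", "2.0"], downloaded={"1.0"})
```

    Pinning the version explicitly is useful when reproducing results reported on an older release of a dataset.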