SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

Overview

License: MIT Python GitHub code size in bytes Downloads GitHub Workflow Status PyPI version GitHub issues GitHub commit activity GitHub last commit arXiv

[arXiv]

The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, which was actually in operation for a decade. In addition, the SHIFT15M dataset has several types of dataset shifts, allowing us to evaluate the robustness of the model to different types of shifts (e.g., covariate shift and target shift).

We provide the Datasheet for SHIFT15M. This datasheet is based on the Datasheets for Datasets [1] template.

System Python 3.6 Python 3.7 Python 3.8
Linux CPU
Linux GPU
Windows CPU / GPU Status Currently Unavailable Status Currently Unavailable Status Currently Unavailable
Mac OS CPU

SHIFT15M is a large-scale dataset based on approximately 15 million items accumulated by the fashion search service IQON.

Installation

From PyPi

$ pip install shift15m

From source

$ git clone https://github.com/st-tech/zozo-shift15m.git
$ cd zozo-shift15m
$ poetry build
$ pip install dist/shift15m-xxxx-py3-none-any.whl

Download SHIFT15M dataset

Use Dataset class

You can download SHIFT15M dataset as follows:

from shift15.datasets import NumLikesRegression

dataset = NumLikesRegression(root="./data", download=True)

Download directly by using download scripts

Please download the dataset as follows:

$ bash scripts/download_all.sh

To avoid downloading the test dataset for set matching (80GB), which is not required in training, you can use the following script.

$ bash scripts/download_all_wo_set_testdata.sh

Tasks

The following tasks are now available:

Tasks Task type Shift type # of input dim # of output dim
NumLikesRegression regression target shift (N, 25) (N, 1)
SumPricesRegression regression covariate shift, target shift (N, 1) (N, 1)
ItemPriceRegression regression target shift (N, 4096) (N, 1)
ItemCategoryClassification classification target shift (N, 4096) (N, 7)
Set2SetMatching set-to-set matching covariate shift (N, 4096)x(M, 4096) (1)

Benchmarks

As templates for numerical experiments on the SHIFT15M dataset, we have published experimental results for each task with several models.

Original Dataset Structure

The original dataset is maintained in json format, and a row consists of the following:

{
  "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
  "like_num":"xx",
  "set_id":"xxx",
  "items":[
    {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
    ...
  ],
  "publish_date":"yyyy-mm-dd"
}

Contributing

To learn more about making a contribution to SHIFT15M, please see the following materials:

License

The dataset itself is provided under a CC BY-NC 4.0 license. On the other hand, the software in this repository is provided under the MIT license.

Dataset metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name SHIFT15M Dataset
alternateName SHIFT15M
alternateName shift15m-dataset
url
sameAs https://github.com/st-tech/zozo-shift15m
description SHIFT15M is a multi-objective, multi-domain dataset which includes multiple dataset shifts.
provider
property value
name ZOZO Research
sameAs https://ja.wikipedia.org/wiki/ZOZO
license
property value
name CC BY-NC 4.0
url

Citation

@misc{Kimura_SHIFT15M_Multiobjective_LargeScale_2021,
author = {Kimura, Masanari and Nakamura, Takuma and Saito, Yuki},
month = {8},
title = {SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts},
year = {2021}
}

Errata

No errata are currently available.

References

  • [1] Gebru, Timnit, et al. "Datasheets for datasets." arXiv preprint arXiv:1803.09010 (2018).
Comments
Releases(v0.2.0)
  • v0.2.0(Sep 20, 2022)

    • add tags info as follows:
    {
      "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
      "like_num":"xx",
      "set_id":"xxx",
      "items":[
        {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
        ...
      ],
      "publish_date":"yyyy-mm-dd",
      "tags": "tag_a, tag_b, tag_c, ..."
    }
    
    • add superset matching benchmark
    • fix a label creation bug on set matching with multiple splits
    Source code(tar.gz)
    Source code(zip)
  • v.0.1.2(Nov 24, 2021)

Owner
ZOZO, Inc.
ZOZO, Inc.
U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

Dennis Bappert 104 Nov 25, 2022
MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks.

MVGCN MVGCN: a novel multi-view graph convolutional network (MVGCN) framework for link prediction in biomedical bipartite networks. Developer: Fu Hait

13 Dec 01, 2022
A module for solving and visualizing Schrödinger equation.

qmsolve This is an attempt at making a solid, easy to use solver, capable of solving and visualize the Schrödinger equation for multiple particles, an

506 Dec 28, 2022
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and ap

3.4k Jan 04, 2023
Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis"

StrengthNet Implementation of "StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis" https://arxiv.org/abs/2110

RuiLiu 65 Dec 20, 2022
Notebooks for my "Deep Learning with TensorFlow 2 and Keras" course

Deep Learning with TensorFlow 2 and Keras – Notebooks This project accompanies my Deep Learning with TensorFlow 2 and Keras trainings. It contains the

Aurélien Geron 1.9k Dec 15, 2022
[3DV 2020] PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction

PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction International Conference on 3D Vision, 2020 Sai Sagar Jinka1, Rohan

Rohan Chacko 39 Oct 12, 2022
Dashboard for the COVID19 spread

COVID-19 Data Explorer App A streamlit Dashboard for the COVID-19 spread. The app is live at: [https://covid19.cwerner.ai]. New data is queried from G

Christian Werner 22 Sep 29, 2022
LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

LaneDetectionAndLaneKeeping This project is part of my bachelor's thesis. The go

5 Jun 27, 2022
Text Generation by Learning from Demonstrations

Text Generation by Learning from Demonstrations The README was last updated on March 7, 2021. The repo is based on fairseq (v0.9.?). Paper arXiv Prere

38 Oct 21, 2022
Implementation of "Semi-supervised Domain Adaptive Structure Learning"

Semi-supervised Domain Adaptive Structure Learning - ASDA This repo contains the source code and dataset for our ASDA paper. Illustration of the propo

3 Dec 13, 2021
Simple Dynamic Batching Inference

Simple Dynamic Batching Inference 解决了什么问题? 众所周知,Batch对于GPU上深度学习模型的运行效率影响很大。。。 是在Inference时。搜索、推荐等场景自带比较大的batch,问题不大。但更多场景面临的往往是稀碎的请求(比如图片服务里一次一张图)。 如果

116 Jan 01, 2023
Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness?

Adversrial Machine Learning Benchmarks This code belongs to the papers: Is RobustBench/AutoAttack a suitable Benchmark for Adversarial Robustness? Det

Adversarial Machine Learning 9 Nov 27, 2022
Keeper for Ricochet Protocol, implemented with Apache Airflow

Ricochet Keeper This repository contains Apache Airflow DAGs for executing keeper operations for Ricochet Exchange. Usage You will need to run this us

Ricochet Exchange 5 May 24, 2022
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Jieyu Zhang 118 Jan 02, 2023
Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER (WIP) Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEER is an e

Alipay 6 Dec 17, 2022
Predict and time series avocado hass

RECOMMENDER SYSTEM MARKETING TỔNG QUAN VỀ HỆ THỐNG DỮ LIỆU 1. Giới thiệu - Tiki là một hệ sinh thái thương mại "all in one", trong đó có tiki.vn, là

hieulmsc 3 Jan 10, 2022
This repo contains the source code and a benchmark for predicting user's utilities with Machine Learning techniques for Computational Persuasion

Machine Learning for Argument-Based Computational Persuasion This repo contains the source code and a benchmark for predicting user's utilities with M

Ivan Donadello 4 Nov 07, 2022
Coded illumination for improved lensless imaging

CodedCam Coded Illumination for Improved Lensless Imaging Paper | Supplementary results | Data and Code are available. Coded illumination for improved

Computational Sensing and Information Processing Lab 1 Nov 29, 2021
Development kit for MIT Scene Parsing Benchmark

Development Kit for MIT Scene Parsing Benchmark [NEW!] Our PyTorch implementation is released in the following repository: https://github.com/hangzhao

MIT CSAIL Computer Vision 424 Dec 01, 2022