Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

GitHub Tests Codecov

Documentation

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with aggregate
  • loaders for popular crowdsourced datasets

The library is currently in a heavy development state, and interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into Pandas DataFrame with columns task, performer, label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the performer responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available () and in progress ( 🟡 ).

Categorical Responses

Method Status
Majority Vote
Dawid-Skene
Gold Majority Vote
M-MSR
Wawa
Zero-Based Skill
GLAD
BCC 🟡

Textual Responses

Method Status
RASA
HRRASA
ROVER

Image Segmentation

Method Status
Segmentation MV
Segmentation RASA
Segmentation EM

Pairwise Comparisons

Method Status
Bradley-Terry
Noisy Bradley-Terry

Citation

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

License

© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • Crowd-Kit Learning

    Crowd-Kit Learning

    This is just an example of what this subpackage will contain.

    We need to configure setup.cfg and add new tests. Here I suggest to discuss the concept.

    opened by pilot7747 10
  • Fix the documentation generation issues

    Fix the documentation generation issues

    Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    documentation enhancement 
    opened by dustalov 9
  • Add MACE

    Add MACE

    Is it possible that you add MACE ? It is often used in my field but there is only a Java implementation that is hard to integrate into Python projects.

    enhancement good first issue 
    opened by jcklie 4
  • Add MACE aggregation model

    Add MACE aggregation model

    I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

    Description

    Based on the original VB inference implementation, I wrote it in Python.

    Connected issues (if any)

    https://github.com/Toloka/crowd-kit/issues/5

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by pilot7747 3
  • Documentation updates

    Documentation updates

    Updated index.md and the Classification section:

    1. added extra information to the models descriptions;
    2. added descriptions for parameters;
    3. fixed error and typos in descriptions.
    opened by Natalyl3 2
  • Binary Relevance aggregation

    Binary Relevance aggregation

    Description

    I have added code for Binary Relevance aggregation - simple method for multi-label classification. This approach treats each label as a class in binary classification task and aggregates it separately.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by denaxen 2
  • Use mypy --strict

    Use mypy --strict

    Description

    This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As the NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations might broke, but I believe it is a worthy contribution.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement 
    opened by dustalov 2
  • Run Jupyter notebooks with tests

    Run Jupyter notebooks with tests

    Description

    This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement good first issue 
    opened by dustalov 2
  • Dramatically improve the code maintainability

    Dramatically improve the code maintainability

    This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

    Description

    In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. During this, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

    I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    bug documentation enhancement 
    opened by dustalov 2
  • Add header and LM-based aggregation item

    Add header and LM-based aggregation item

    Description

    This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    documentation 
    opened by dustalov 2
  • Renamed columns?

    Renamed columns?

    Hi, the guide says

    df = pd.read_csv('results.csv') # should contain columns: task, performer, label

    but when I load this file, then the second column is worker and not performer. I had used crowdkit with dataframes that had columns: task, performer, label, but after an update, it broke.

    opened by jcklie 2
  • Ordinal Labels

    Ordinal Labels

    Is it possible to support aggregation of ordinal labels as a part of this toolkit via this reduction algorithm.

    • Labels are categorical but have an ordering defined 1 < ... < K.
    • The K class ordinal labels are transformed into K−1 binary class label data.
    • Each of the binary task is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1,...,K −1.
    • The probability of the actual class values can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1]−Pr[yi > c].
    • The class with the maximum probability is assigned to the instance
    enhancement 
    opened by vikasraykar 2
Releases(v1.2.0)
Owner
Toloka
Data labeling platform for ML
Toloka
Este conversor criará a medida exata para sua receita de capuccino gelado da grandiosa Rafaella Ballerini!

ConversorDeMedidas_CapuccinoGelado Este conversor criará a medida exata para sua receita de capuccino gelado da grandiosa Rafaella Ballerini! Requirem

Arthur Ottoni Ribeiro 48 Nov 15, 2022
Official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch.

Multi-speaker DGP This repository provides official implementation of deep Gaussian process (DGP)-based multi-speaker speech synthesis with PyTorch. O

sarulab-speech 24 Sep 07, 2022
Deep metric learning methods implemented in Chainer

Deep Metric Learning Implementation of several methods for deep metric learning in Chainer v4.2.0. Proxy-NCA: No Fuss Distance Metric Learning using P

ronekko 156 Nov 28, 2022
Autonomous Driving on Curvy Roads without Reliance on Frenet Frame: A Cartesian-based Trajectory Planning Method

C++/ROS Source Codes for "Autonomous Driving on Curvy Roads without Reliance on Frenet Frame: A Cartesian-based Trajectory Planning Method" published in IEEE Trans. Intelligent Transportation Systems

Bai Li 88 Dec 23, 2022
VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries

VACA Code repository for the paper "VACA: Designing Variational Graph Autoencoders for Interventional and Counterfactual Queries (arXiv)". The impleme

Pablo Sánchez-Martín 16 Oct 10, 2022
Degree-Quant: Quantization-Aware Training for Graph Neural Networks.

Degree-Quant This repo provides a clean re-implementation of the code associated with the paper Degree-Quant: Quantization-Aware Training for Graph Ne

35 Oct 07, 2022
Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic Scenes", ICCV 2021.

Deep 3D Mask Volume for View Synthesis of Dynamic Scenes Official PyTorch Implementation of paper "Deep 3D Mask Volume for View Synthesis of Dynamic S

Ken Lin 17 Oct 12, 2022
Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19 (Oral).

Pose-Transfer Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19(Oral). The paper is available here. Video generation

Tengteng Huang 679 Jan 04, 2023
Keras implementation of "One pixel attack for fooling deep neural networks" using differential evolution on Cifar10 and ImageNet

One Pixel Attack How simple is it to cause a deep neural network to misclassify an image if an attacker is only allowed to modify the color of one pix

Dan Kondratyuk 1.2k Dec 26, 2022
The official repository for Deep Image Matting with Flexible Guidance Input

FGI-Matting The official repository for Deep Image Matting with Flexible Guidance Input. Paper: https://arxiv.org/abs/2110.10898 Requirements easydict

Hang Cheng 51 Nov 10, 2022
This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing described in paper Discontinuous Grammar as a Foreign Language.

Discontinuous Grammar as a Foreign Language This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing

Daniel Fernández-González 2 Apr 07, 2022
Python scripts for performing lane detection using the LSTR model in ONNX

ONNX LSTR Lane Detection Python scripts for performing lane detection using the Lane Shape Prediction with Transformers (LSTR) model in ONNX. Requirem

Ibai Gorordo 29 Aug 30, 2022
Project repo for Learning Category-Specific Mesh Reconstruction from Image Collections

Learning Category-Specific Mesh Reconstruction from Image Collections Angjoo Kanazawa*, Shubham Tulsiani*, Alexei A. Efros, Jitendra Malik University

438 Dec 22, 2022
Classical OCR DCNN reproduction based on PaddlePaddle framework.

Paddle-SVHN Classical OCR DCNN reproduction based on PaddlePaddle framework. This project reproduces Multi-digit Number Recognition from Street View I

1 Nov 12, 2021
A Pytorch loader for MVTecAD dataset.

MVTecAD A Pytorch loader for MVTecAD dataset. It strictly follows the code style of common Pytorch datasets, such as torchvision.datasets.CIFAR10. The

Jiyuan 1 Dec 27, 2021
Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image

Defocus Map Estimation and Deblurring from a Single Dual-Pixel Image This repository is an implementation of the method described in the following pap

21 Dec 15, 2022
sktime companion package for deep learning based on TensorFlow

NOTE: sktime-dl is currently being updated to work correctly with sktime 0.6, and wwill be fully relaunched over the summer. The plan is Refactor and

sktime 573 Jan 05, 2023
This repository contains all source code, pre-trained models related to the paper "An Empirical Study on GANs with Margin Cosine Loss and Relativistic Discriminator"

An Empirical Study on GANs with Margin Cosine Loss and Relativistic Discriminator This is a Pytorch implementation for the paper "An Empirical Study o

Cuong Nguyen 3 Nov 15, 2021
Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform This repository is the implementation of "Variable-Rate Deep Image C

Myungseo Song 47 Dec 13, 2022