IndoNLI: A Natural Language Inference Dataset for Indonesian

This is a repository for data and code accompanying our EMNLP 2021 paper "IndoNLI: A Natural Language Inference Dataset for Indonesian". The datasets used for our experiments can be found under the data directory:

indonli: human-annotated NLI data, split into train, val, and test (test_lay and test_expert)

diagnostic: subset of examples from test_expert that are annotated with linguistic and logical phenomena
translate_train.tar.gz: MNLI dataset translated to Indonesian (train and dev)
translate_train_small.tar.gz: a sample of translate_train used for the translate_train_small experiment
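As a rough illustration of how the indonli splits could be inspected, here is a minimal sketch. It assumes each split is stored as a JSON Lines file named after the split (for example data/indonli/train.jsonl) and that each record has a "label" field. Both the file names and the field name are assumptions for illustration and are not confirmed by this README, so check the data directory and adjust them as needed.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line (assumed file format)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def label_counts(examples, label_key="label"):
    """Count examples per label; "label" is an assumed field name."""
    return Counter(ex[label_key] for ex in examples)


if __name__ == "__main__":
    data_dir = Path("data/indonli")
    for split in ("train", "val", "test_lay", "test_expert"):
        # Hypothetical file name: <split>.jsonl inside data/indonli
        path = data_dir / f"{split}.jsonl"
        if path.exists():
            examples = load_jsonl(path)
            print(split, len(examples), dict(label_counts(examples)))
        else:
            print(f"{path} not found; adjust the assumed file name")
```

Printing the label distribution per split is a quick sanity check before running any experiment.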

The experiment code can be found under the experiment directory. Please check the README file there for details.

License

We use premises taken from Indonesian Wikipedia, news, and web articles.

Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

For the news genre, we use premise text from the Indonesian PUD and GSD treebanks provided by Universal Dependencies 2.5 (Zeman et al., 2019) and from IndoSum (Kurniawan and Louvan, 2018). The Indonesian PUD and GSD treebanks are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA). IndoSum is licensed under the Apache License, Version 2.0.

Citation

If you use our corpus in your work, please consider citing our paper:

@inproceedings{indonli,
    title = "IndoNLI: A Natural Language Inference Dataset for Indonesian",
    author = "Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
