Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Overview

This is the accompanying repository for the ICAIL 2021 paper entitled "Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains". The data (except the Canadian subset, see below) and the code used in the experiments reported in the paper can be found here.

Data

The data set consists of 807 adjudicatory decisions from 7 different countries (6 languages), annotated with the following type system:

  • Out of Scope - Parts outside of the main document body (e.g., metadata, editorial content, dissents, end notes, appendices).
  • Heading - Typically an incomplete sentence or marker starting a section (e.g., “Discussion,” “Analysis,” “II.”).
  • Background - The part where the court describes procedural history, relevant facts, or the parties’ claims.
  • Analysis - The section containing reasoning of the court, issues, and application of law to the facts of the case.
  • Introductory Summary - A brief summary of the case at the beginning of the decision.
  • Outcome - A few sentences stating how the case was decided (i.e., the overall outcome of the case).

The country-specific subsets:

  • Canada - A random selection of cases from multiple provinces, retrieved from www.canlii.org. The selection is not limited to any specific topic or court.
  • Czech Republic - A random selection of cases from the Constitutional Court (30), the Supreme Court (40), and the Supreme Administrative Court (30). Temporal distribution was taken into account.
  • France - A selection of cases decided by the Cour de cassation between 2011 and 2019, chosen by stratified sampling based on the year of publication of the decision.
  • Germany - A stratified sample from the federal jurisprudence database spanning all federal courts (civil, criminal, labor, finance, patent, social, constitutional, and administrative).
  • Italy - The top 100 criminal court cases stored between 2015 and 2020 that mention “stalking” and are keyed to Article 612 bis of the Criminal Code.
  • Poland - A stratified sample from trial-level, appellate, and administrative courts, the Supreme Court, and the Constitutional Tribunal. The cases mention “democratic country ruled by law.”
  • U.S.A. I - Federal district court decisions in employment law mentioning “motion for summary judgment,” “employee,” and “independent contractor.”
  • U.S.A. II - Administrative decisions from the U.S. Department of Labor. The top 100 rulings in reverse chronological order, starting from October 2020, were selected.

For more detailed information, please refer to the original paper.

How to Use

ICAIL 2021 Data

The data used in the ICAIL 2021 experiments can be found in the following paths:

data/Country-Language-*/annotator-*-ICAIL2021.csv
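As a minimal sketch that assumes only the directory layout given above (the function name find_icail2021_files is hypothetical and not part of the repository), the per-annotator ICAIL 2021 files can be collected and grouped by subset with Python's pathlib:

```python
from collections import defaultdict
from pathlib import Path


def find_icail2021_files(data_dir="data"):
    """Group annotator-*-ICAIL2021.csv files by their subset directory.

    Assumes the layout data/<Country>-<Language>-<N>/annotator-<k>-ICAIL2021.csv.
    Returns a dict such as {"Canada-EN-1": [Path(...), ...], ...} with one
    path per annotator, sorted by file path.
    """
    files = defaultdict(list)
    for path in sorted(Path(data_dir).glob("*-*-*/annotator-*-ICAIL2021.csv")):
        files[path.parent.name].append(path)
    return dict(files)


if __name__ == "__main__":
    # Print each subset and the annotator files found for it.
    for subset, paths in find_icail2021_files().items():
        print(subset, [p.name for p in paths])
```

Files such as annotator-1.csv or annotator-1-clean.csv do not match the pattern, so only the ICAIL 2021 versions are returned.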

Note that the Canadian subset could not be included in this repository due to concerns about personal information protection in Canada. However, it can be obtained upon request at [email protected]. Once you have the data, create the data/Canada-EN-1 directory and place all the files there.
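A hypothetical helper for this step, assuming the files you receive have been unpacked into a local folder (the delivery format is not described here, and install_canadian_subset is an illustrative name only), could look like this:

```python
import shutil
from pathlib import Path


def install_canadian_subset(source_dir, data_dir="data"):
    """Copy the separately obtained Canadian files into data/Canada-EN-1.

    source_dir is assumed to be the folder holding the received files.
    Returns the target directory and the sorted names of the copied items.
    """
    target = Path(data_dir) / "Canada-EN-1"
    target.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    copied = []
    for item in Path(source_dir).iterdir():
        if item.is_dir():
            # e.g. a texts/ folder; dirs_exist_ok needs Python 3.8+
            shutil.copytree(item, target / item.name, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target / item.name)
        copied.append(item.name)
    return target, sorted(copied)
```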

If you would like to experiment with different preprocessing techniques, the original texts are placed in the following paths:

data/Country-Language-*/texts

You can find the annotations corresponding to these texts here:

data/Country-Language-*/annotator-*.csv

The texts cleaned of the Out of Scope and Heading segments (via dataset_clean.py) are placed in the following paths:

data/Country-Language-*/texts-clean-annotator-*

Note that the cleaning depends on the annotations. Hence, if a document was annotated by multiple annotators, there are several versions of it at this stage. The annotations corresponding to the cleaned texts are here:

data/Country-Language-*/annotator-*-clean.csv
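The cleaning idea can be illustrated with a small sketch. It assumes a hypothetical in-memory representation (a list of (text, label) pairs for one document as labelled by one annotator); this is not the file format read or written by dataset_clean.py.

```python
OUT_OF_SCOPE = "Out of Scope"
HEADING = "Heading"
REMOVED_LABELS = {OUT_OF_SCOPE, HEADING}


def clean_segments(segments):
    """Drop Out of Scope and Heading segments, keeping the rest in order.

    segments: list of (text, label) pairs for one document and one annotator
    (a hypothetical representation used only for illustration).
    """
    return [(text, label) for text, label in segments if label not in REMOVED_LABELS]


doc = [
    ("Decision", HEADING),
    ("Case no. 1", OUT_OF_SCOPE),
    ("The plaintiff sued.", "Background"),
    ("Appeal dismissed.", "Outcome"),
]
print(clean_segments(doc))
```

Because the filter is driven by the labels, two annotators who label the same span differently (for example, one marking "II." as Heading and the other as Analysis) produce different cleaned texts, which is why there is one texts-clean-annotator-* directory per annotator.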

The script dataset_ICAIL2021.py contains the processing code that was applied to the cleaned texts and annotations to generate the ICAIL 2021 dataset (see above). Note that the script skips the Czech Republic subset by default, because this subset requires an external resource for sentence segmentation (czech-pdt-ud-X.X-XXXXXX.udpipe). To include it:

  • Obtain the file at https://universaldependencies.org/ and place it in the data directory.
  • Remove the Czech_Republic-CZ-1 string from the EXCLUDED tuple in dataset_ICAIL2021.py.
  • In utils.py, replace the data/czech-pdt-ud-2.5-191206.udpipe string with the path of the file you have downloaded.

After these changes, the script will also process the Czech Republic part of the dataset.
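To find the exact string to put into utils.py, a small check like the following may help. It is a sketch that assumes the downloaded model keeps the czech-pdt-ud-*.udpipe naming pattern; find_udpipe_model is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_udpipe_model(data_dir="data"):
    """Return the path of the single czech-pdt-ud-*.udpipe file in data_dir.

    The result uses forward slashes (e.g. data/czech-pdt-ud-2.5-191206.udpipe),
    matching the form of the path string in utils.py.
    """
    matches = sorted(Path(data_dir).glob("czech-pdt-ud-*.udpipe"))
    if not matches:
        raise FileNotFoundError(f"no czech-pdt-ud-*.udpipe file in {data_dir!r}")
    if len(matches) > 1:
        raise ValueError(f"several UDPipe models in {data_dir!r}: {[m.name for m in matches]}")
    return matches[0].as_posix()
```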

Dataset Statistics

To replicate the inter-annotator agreement analysis performed in the ICAIL 2021 paper you can use the ia_agreement.ipynb notebook.
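The agreement measure used in the notebook is not described in this README. Purely as an illustration of the idea, Cohen's kappa for two annotators who labelled the same sentences can be computed as follows (the function name and the flat label lists are assumptions):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of sentence labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of sentences given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


a = ["Background", "Background", "Analysis", "Analysis"]
b = ["Background", "Analysis", "Analysis", "Analysis"]
print(cohens_kappa(a, b))  # observed 0.75, expected 0.5, kappa 0.5
```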

To generate the dataset statistics reported in the ICAIL 2021 paper you can use the dataset_statistics.ipynb notebook.
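One basic statistic of this kind is the distribution of labels within a subset. A minimal sketch, assuming the sentence labels of one subset have already been loaded into a flat list (label_distribution is a hypothetical name):

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to (count, share of all labels), most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


print(label_distribution(["Analysis", "Analysis", "Outcome", "Background"]))
```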

Experiments

The notebook ICAIL2021_experiments.ipynb contains the code necessary to run the experiments presented in the paper. This includes the code to embed the sentences of the cases into a multilingual vector representation, the definition of the Gated Recurrent Unit (GRU) model, and the code to train and evaluate the model in the different experiments described in the paper. It also contains the code to create the visualizations presented in the discussion section of the paper.
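To illustrate how a GRU can label a sequence of sentence embeddings, here is a minimal NumPy sketch of the forward pass only. It is not the model defined in the notebook: the class name, the single-direction architecture, the layer sizes, and the random initialisation are all assumptions, and training (e.g. with a cross-entropy loss in a deep learning framework) is not shown.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


class GRUSentenceTagger:
    """Hypothetical tagger: one sentence embedding in, one label distribution out, per step."""

    def __init__(self, embed_dim, hidden_dim, num_labels, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)

        def init(*shape):
            # Uniform initialisation in [-1/sqrt(hidden_dim), 1/sqrt(hidden_dim)).
            return rng.uniform(-scale, scale, size=shape)

        # Input-to-hidden (W) and hidden-to-hidden (U) weights plus biases for
        # the update gate z, the reset gate r, and the candidate state n.
        self.W = {g: init(hidden_dim, embed_dim) for g in "zrn"}
        self.U = {g: init(hidden_dim, hidden_dim) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        # Linear output layer mapping each hidden state to label scores.
        self.W_out = init(num_labels, hidden_dim)
        self.b_out = np.zeros(num_labels)
        self.hidden_dim = hidden_dim
        self.num_labels = num_labels

    def step(self, x, h):
        """One GRU update from input x and previous hidden state h."""
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # update gate
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # reset gate
        n = np.tanh(self.W["n"] @ x + r * (self.U["n"] @ h) + self.b["n"])  # candidate
        return (1.0 - z) * n + z * h  # blend candidate with previous state

    def forward(self, sentences):
        """Return label probabilities of shape (num_sentences, num_labels)."""
        if len(sentences) == 0:
            return np.zeros((0, self.num_labels))
        h = np.zeros(self.hidden_dim)
        logits = []
        for x in sentences:  # sentences are processed in document order
            h = self.step(x, h)
            logits.append(self.W_out @ h + self.b_out)
        logits = np.stack(logits)
        logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)
```

Usage: for a document represented as an array of shape (num_sentences, embed_dim), `GRUSentenceTagger(embed_dim, hidden_dim, num_labels).forward(doc)` returns one probability vector per sentence; the predicted type is the argmax of each row. Because the hidden state is carried from sentence to sentence, each prediction can depend on the preceding context, which suits the largely sequential structure of decisions (background before analysis before outcome).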

The notebook can be run in two different ways:

Attribution

We kindly ask you to cite the following paper:

@inproceedings{savelka2021,
    title={Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains},
    author={Jaromir Savelka and Hannes Westermann and Karim Benyekhlef and Charlotte S. Alexander and Jayla C. Grant and David Restrepo Amariles and Rajaa El Hamdani and S\'{e}bastien Mee\`{u}s and Aurore Troussel and Micha\l\ Araszkiewicz and Kevin D. Ashley and Alexandra Ashley and Karl Branting and Mattia Falduti and Matthias Grabmair and Jakub Hara\v{s}ta and Tereza Novotn\'{a} and Elizabeth Tippett and Shiwanni Johnson},
    year={2021},
    booktitle={Proceedings of the 18th International Conference on Artificial Intelligence and Law},
    publisher={Association for Computing Machinery},
    doi={10.1145/3462757.3466149}
}

Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S. Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani, Sébastien Meeùs, Aurore Troussel, Michał Araszkiewicz, Kevin D. Ashley, Alexandra Ashley, Karl Branting, Mattia Falduti, Matthias Grabmair, Jakub Harašta, Tereza Novotná, Elizabeth Tippett, and Shiwanni Johnson. 2021. Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains. In Eighteenth International Conference for Artificial Intelligence and Law (ICAIL’21), June 21–25, 2021, São Paulo, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3462757.3466149
