Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models.

Overview

Statutory Interpretation Data Set

This repository contains the data set created for the following research papers:

Savelka, Jaromir, and Kevin D. Ashley. "Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models." Findings of the Association for Computational Linguistics: EMNLP 2021. 2021.

Jaromir Savelka, Huihui Xu, and Kevin D. Ashley. 2019. Improving Sentence Retrieval from Case Law for Statutory Interpretation. In Seventeenth International Conference on Artificial Intelligence and Law (ICAIL ’19), June 17–21, 2019, Montreal, QC, Canada, Floris Bex (Ed.). ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3322640.3326736

Task

Given a statutory provision, a user's interest in the meaning of a phrase from that provision, and a list of sentences, we would like to rank more highly the sentences that elaborate on the meaning of the statutory phrase of interest, such as:

  • definitional sentences (e.g., a sentence that provides a test for when the phrase applies)
  • sentences that state explicitly in a different way what the statutory phrase means or state what it does not mean
  • sentences that provide an example, instance, or counterexample of the phrase
  • sentences that show how a court determines whether something is such an example, instance, or counterexample.
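
The task can be pictured as a scoring-and-sorting problem. The following is a minimal sketch, not the method of the cited papers: `rank_sentences` and `toy_score` are hypothetical names, the cue-word scorer is only a placeholder for a real model, and the example sentences are invented rather than taken from the corpus.

```python
from typing import Callable, List


def rank_sentences(phrase: str, sentences: List[str],
                   score: Callable[[str, str], float]) -> List[str]:
    """Return the sentences ordered from highest to lowest score."""
    return sorted(sentences, key=lambda s: score(phrase, s), reverse=True)


def toy_score(phrase: str, sentence: str) -> float:
    """Hypothetical placeholder scorer based on definitional cue words."""
    cues = ("means", "defined as", "includes", "does not include", "for example")
    text = sentence.lower()
    if phrase.lower() not in text:
        return 0.0  # the sentence does not mention the phrase at all
    return 1.0 + sum(text.count(cue) for cue in cues)


# Invented example sentences, for illustration only.
sentences = [
    "The court cited the term 'common business purpose' in passing.",
    "A 'common business purpose' means activities directed to the same business objective.",
    "The appeal was dismissed.",
]
ranked = rank_sentences("common business purpose", sentences, toy_score)
```

Any model that maps a (phrase, sentence) pair to a number could be passed in place of `toy_score`.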

Corpus Overview

For this corpus, we selected forty-two terms from different provisions of the United States Code.

For each term, we collected a set of sentences by extracting all the sentences that mention the term from the court decisions retrieved from the Caselaw Access Project data.

In total the corpus consists of 26,959 sentences.

The sentences are classified into four categories according to their usefulness for interpreting the term:

  • high value - sentence intended to define or elaborate on the meaning of the term
  • certain value - sentence that provides grounds to elaborate on the term's meaning
  • potential value - sentence that provides additional information beyond what is known from the provision the term comes from
  • no value - no additional information over what is known from the provision

See Annotation guidelines for additional details.
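
Because the four categories are ordered, a ranking can be scored with a graded-relevance measure. The sketch below uses normalized discounted cumulative gain (NDCG) as one common choice. The numeric gains in `GAIN` are an assumption, and so is the premise that the label strings in the data match the category names above.

```python
import math
from typing import List

# Assumed mapping from category name to numeric gain (not defined by the data set).
GAIN = {"no value": 0, "potential value": 1, "certain value": 2, "high value": 3}


def dcg(labels: List[str], k: int) -> float:
    """Discounted cumulative gain over the top k labels of a ranking."""
    return sum(GAIN[label] / math.log2(i + 2) for i, label in enumerate(labels[:k]))


def ndcg(labels: List[str], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = sorted(labels, key=lambda label: GAIN[label], reverse=True)
    best = dcg(ideal, k)
    return dcg(labels, k) / best if best > 0 else 0.0
```

Here `labels` is the list of gold labels in the order a system ranked the sentences, so a perfect ranking yields 1.0.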

Data Structure

Each zip file contains data related to one of the forty-two queries. Each archive holds four files containing the texts at different levels of granularity. These make it possible to replicate the experiments reported in the paper cited above.

  • case
    • original_id - case id from the Caselaw Access Project
    • name
    • short_name
    • date
    • official_date
    • official_citation
    • alternate_citations
    • court
    • short_court - court abbreviation
    • jurisdiction
    • short_jurisdiction - jurisdiction abbreviation
    • attorneys
    • parties
    • judges
    • text
  • opinion
    • case_id - pointer to the case the opinion belongs to
    • author
    • type - e.g., concurrence, dissent
    • position - position of the opinion within the case
    • text
  • paragraph
    • case_id - pointer to the case the paragraph belongs to
    • opinion_id - pointer to the opinion the paragraph belongs to
    • position - position of the paragraph within the opinion
    • text
  • sentence
    • case_id - pointer to the case the sentence belongs to
    • opinion_id - pointer to the opinion the sentence belongs to
    • paragraph_id - pointer to the paragraph the sentence belongs to
    • position - position of the sentence within the paragraph
    • text
    • label - human-created gold label of the sentence value
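
The pointer fields let a sentence be traced back to its paragraph, opinion, and case. The sketch below assumes the four files have already been loaded into dictionaries keyed by record identifier. The on-disk file format and the way identifiers are stored are not described here, so the loading step is left out, and `sentence_context` is a hypothetical helper name.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def sentence_context(sentence: Record,
                     paragraphs: Dict[Any, Record],
                     opinions: Dict[Any, Record],
                     cases: Dict[Any, Record]) -> Record:
    """Follow the pointer fields of a sentence to its enclosing records."""
    paragraph = paragraphs[sentence["paragraph_id"]]
    opinion = opinions[sentence["opinion_id"]]
    case = cases[sentence["case_id"]]
    return {
        "sentence": sentence["text"],
        "label": sentence["label"],
        "paragraph": paragraph["text"],
        "opinion_type": opinion["type"],
        "case_name": case["name"],
    }


# Tiny invented records; the identifier values are assumptions.
cases = {1: {"name": "Example v. Example", "text": "..."}}
opinions = {10: {"case_id": 1, "type": "majority", "position": 0, "text": "..."}}
paragraphs = {100: {"case_id": 1, "opinion_id": 10, "position": 0,
                    "text": "First paragraph."}}
sentence = {"case_id": 1, "opinion_id": 10, "paragraph_id": 100,
            "position": 0, "text": "First sentence.", "label": "no value"}

context = sentence_context(sentence, paragraphs, opinions, cases)
```

Keeping the lookups in plain dictionaries makes it easy to give a model the paragraph or opinion around a sentence rather than the sentence text alone.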

Terms of Use

If you use the data, we kindly ask you to provide the following two attributions:

Savelka, Jaromir, and Kevin D. Ashley. "Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models." Findings of the Association for Computational Linguistics: EMNLP 2021. 2021.

The President and Fellows of Harvard University. Caselaw Access Project. 2018.
