When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings

Related tags

Deep Learningcasehold
Overview

When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings

This is the repository for the paper, When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings (Zheng and Guha et al., 2021), accepted to ICAIL 2021.

It includes models, datasets, and code for computing pretrain loss and finetuning Legal-BERT, Custom Legal-BERT, and BERT (double) models on legal benchmark tasks: Overruling, Terms of Service, CaseHOLD.

Download Models & Datasets

The legal benchmark task datasets and Legal-BERT, Custom Legal-BERT, and BERT (double) model files can be downloaded from the casehold Google Drive folder. For more information, see the Description of the folder.

The models can also be accessed directly from the Hugging Face model hub. To load a model from the model hub in a script, pass its Hugging Face model repository name to the model_name_or_path script argument. See demo.ipynb for more details.

Hugging Face Model Repositories

Download the legal benchmark task datasets and the models (optional, scripts can directly load models from Hugging Face model repositories) from the casehold Google Drive folder and unzip them under the top-level directory like:

reglab/casehold
├── data
│ ├── casehold.csv
│ └── overruling.csv
├── models
│ ├── bert-double
│ │ ├── config.json
│ │ ├── pytorch_model.bin
│ │ ├── special_tokens_map.json
│ │ ├── tf_model.h5
│ │ ├── tokenizer_config.json
│ │ └── vocab.txt
│ └── custom-legalbert
│ │ ├── config.json
│ │ ├── pytorch_model.bin
│ │ ├── special_tokens_map.json
│ │ ├── tf_model.h5
│ │ ├── tokenizer_config.json
│ │ └── vocab.txt
│ └── legalbert
│ │ ├── config.json
│ │ ├── pytorch_model.bin
│ │ ├── special_tokens_map.json
│ │ ├── tf_model.h5
│ │ ├── tokenizer_config.json
│ │ └── vocab.txt

Requirements

This code was tested with Python 3.7 and Pytorch 1.8.1.

Install required packages and dependencies:

pip install -r requirements.txt

Install transformers from source (required for tokenizers dependencies):

pip install git+https://github.com/huggingface/transformers

Model Descriptions

Legal-BERT

Training Data

The pretraining corpus was constructed by ingesting the entire Harvard Law case corpus from 1965 to the present. The size of this corpus (37GB) is substantial, representing 3,446,187 legal decisions across all federal and state courts, and is larger than the size of the BookCorpus/Wikipedia corpus originally used to train BERT (15GB). We randomly sample 10% of decisions from this corpus as a holdout set, which we use to create the CaseHOLD dataset. The remaining 90% is used for pretraining.

Training Objective

This model is initialized with the base BERT model (uncased, 110M parameters), bert-base-uncased, and trained for an additional 1M steps on the MLM and NSP objective, with tokenization and sentence segmentation adapted for legal text (cf. the paper).

Custom Legal-BERT

Training Data

Same pretraining corpus as Legal-BERT

Training Objective

This model is pretrained from scratch for 2M steps on the MLM and NSP objective, with tokenization and sentence segmentation adapted for legal text (cf. the paper).

The model also uses a custom domain-specific legal vocabulary. The vocabulary set is constructed using SentencePiece on a subsample (approx. 13M) of sentences from our pretraining corpus, with the number of tokens fixed to 32,000.

BERT (double)

Training Data

BERT (double) is pretrained using the same English Wikipedia corpus that the base BERT model (uncased, 110M parameters), bert-base-uncased, was pretrained on. For more information on the pretraining corpus, refer to the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper.

Training Objective

This model is initialized with the base BERT model (uncased, 110M parameters), bert-base-uncased, and trained for an additional 1M steps on the MLM and NSP objective.

This facilitates a direct comparison to our BERT-based models for the legal domain, Legal-BERT and Custom Legal-BERT, which are also pretrained for 2M total steps.

Legal Benchmark Task Descriptions

Overruling

We release the Overruling dataset in conjunction with Casetext, the creators of the dataset.

The Overruling dataset corresponds to the task of determining when a sentence is overruling a prior decision. This is a binary classification task, where positive examples are overruling sentences and negative examples are non-overruling sentences extracted from legal opinions. In law, an overruling sentence is a statement that nullifies a previous case decision as a precedent, by a constitutionally valid statute or a decision by the same or higher ranking court which establishes a different rule on the point of law involved. The Overruling dataset consists of 2,400 examples.

Terms of Service

We provide a link to the Terms of Service dataset, created and made publicly accessible by the authors of CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service (Lippi et al., 2019).

The Terms of Service dataset corresponds to the task of identifying whether contractual terms are potentially unfair. This is a binary classification task, where positive examples are potentially unfair contractual terms (clauses) from the terms of service in consumer contracts. Article 3 of the Directive 93/13 on Unfair Terms in Consumer Contracts defines an unfair contractual term as follows. A contractual term is unfair if: (1) it has not been individually negotiated; and (2) contrary to the requirement of good faith, it causes a significant imbalance in the parties rights and obligations, to the detriment of the consumer. The Terms of Service dataset consists of 9,414 examples.

CaseHOLD

We release the CaseHOLD dataset, created by the authors of our paper, When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings (Zheng and Guha et al., 2021).

The CaseHOLD dataset (Case Holdings On Legal Decisions) provides 53,000+ multiple choice questions with prompts from a judicial decision and multiple potential holdings, one of which is correct, that could be cited. Holdings are central to the common law system. They represent the the governing legal rule when the law is applied to a particular set of facts. It is what is precedential and what litigants can rely on in subsequent cases. The CaseHOLD task derived from the dataset is a multiple choice question answering task, with five candidate holdings (one correct, four incorrect) for each citing context.

For more details on the construction of these legal benchmark task datasets, please see our paper.

Hyperparameters for Downstream Tasks

We split each task dataset into a train and test set with an 80/20 split for hyperparameter tuning. For the baseline model, we performed a random search with batch size set to 16 and 32 over learning rates in the bounded domain 1e-5 to 1e-2, training for a maximum of 20 epochs. To set the model hyperparameters for fine-tuning our BERT and Legal-BERT models, we refer to the suggested hyperparameter ranges for batch size, learning rate and number of epochs in Devlin et al. as a reference point and perform two rounds of grid search for each task. We performed the coarse round of grid search with batch size set to 16 for Overruling and Terms of Service and batch size set to 128 for Citation, over learning rates: 1e-6, 1e-5, 1e-4, training for a maximum of 4 epochs. From the coarse round, we discovered that the optimal learning rates for the legal benchmark tasks were smaller than the lower end of the range suggested in Devlin et al., so we performed a finer round of grid search over a range that included smaller learning rates. For Overruling and Terms of Service, we performed the finer round of grid search over batch sizes (16, 32) and learning rates (5e-6, 1e-5, 2e-5, 3e-5, 5e-5), training for a maximum of 4 epochs. For CaseHOLD, we performed the finer round of grid search with batch size set to 128 over learning rates (1e-6, 3e-6, 5e-6, 7e-6, 9e-6), training for a maximum of 4 epochs. We report the hyperparameters used for evaluation in the table below.

Hyperparameter Table

Results

The results from the paper for the baseline BiLSTM, base BERT model (uncased, 110M parameters), BERT (double), Legal-BERT, and Custom Legal-BERT, finetuned on the legal benchmark tasks, are displayed below.

Demo

demo.ipynb provides examples of how to run the scripts to compute pretrain loss and finetune Legal-BERT/Custom Legal-BERT models on the legal benchmark tasks. These examples should be able to run on a GPU that has 16GB of RAM using the hyperparameters specified in the examples.

See demo.ipynb for details on calculating domain specificity (DS) scores for tasks or task examples by taking the difference in pretrain loss on BERT (double) and Legal-BERT. DS score may be readily extended to estimate domain specificity of tasks in other domains using BERT (double) and existing pretrained models (e.g., SciBERT).

Citation

If you are using this work, please cite it as:

@inproceedings{zhengguha2021,
	title={When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset},
	author={Lucia Zheng and Neel Guha and Brandon R. Anderson and Peter Henderson and Daniel E. Ho},
	year={2021},
	eprint={2104.08671},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	booktitle={Proceedings of the 18th International Conference on Artificial Intelligence and Law},
	publisher={Association for Computing Machinery},
	note={(in press)}
}

Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, and Daniel E. Ho. 2021. When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset. In Proceedings of the 18th International Conference on Artificial Intelligence and Law (ICAIL '21), June 21-25, 2021, São Paulo, Brazil. ACM Inc., New York, NY, (in press). arXiv: 2104.08671 [cs.CL].

Owner
RegLab
RegLab
This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf

Behavior-Sequence-Transformer-Pytorch This is a pytorch implementation for the BST model from Alibaba https://arxiv.org/pdf/1905.06874.pdf This model

Jaime Ferrando Huertas 83 Jan 05, 2023
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection (ICCV 2021)

Preparation Please see dataset/README.md to get more details about our datasets-VIL100 Please see INSTALL.md to install environment and evaluation too

82 Dec 15, 2022
Swapping face using Face Mesh with TensorFlow Lite

Swapping face using Face Mesh with TensorFlow Lite

iwatake 17 Apr 26, 2022
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
PyTorch implementation of our paper How robust are discriminatively trained zero-shot learning models?

How robust are discriminatively trained zero-shot learning models? This repository contains the PyTorch implementation of our paper How robust are dis

Mehmet Kerim Yucel 5 Feb 04, 2022
List of all dependencies affected by node-ipc malicious commit

node-ipc-dependencies-list List of all dependencies affected by node-ipc malicious commit as of 17/3/2022 - 19/3/2022 (timestamp) Please improve upon

99 Oct 15, 2022
COVID-Net Open Source Initiative

The COVID-Net models provided here are intended to be used as reference models that can be built upon and enhanced as new data becomes available

Linda Wang 1.1k Dec 26, 2022
Pytorch based library to rank predicted bounding boxes using text/image user's prompts.

pytorch_clip_bbox: Implementation of the CLIP guided bbox ranking for Object Detection. Pytorch based library to rank predicted bounding boxes using t

Sergei Belousov 50 Nov 27, 2022
Deep Q-network learning to play flappybird.

AI Plays Flappy Bird I've trained a DQN that learns to play flappy bird on it's own. Try the pre-trained model First install the pip requirements and

Anish Shrestha 3 Mar 01, 2022
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
PyTorch code for our paper "Attention in Attention Network for Image Super-Resolution"

Under construction... Attention in Attention Network for Image Super-Resolution (A2N) This repository is an PyTorch implementation of the paper "Atten

Haoyu Chen 71 Dec 30, 2022
A High-Performance Distributed Library for Large-Scale Bundle Adjustment

MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment This repo contains an official implementation of MegBA. MegBA is a

旷视研究院 3D 组 336 Dec 27, 2022
Synthesize photos from PhotoDNA using machine learning 🌱

Ribosome Synthesize photos from PhotoDNA. See the blog post for more information. Installation Dependencies You can install Python dependencies using

Anish Athalye 112 Nov 23, 2022
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Detectron is deprecated. Please see detectron2, a ground-up rewrite of Detectron in PyTorch. Detectron Detectron is Facebook AI Research's software sy

Facebook Research 25.5k Jan 07, 2023
"NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search".

NAS-Bench-301 This repository containts code for the paper: "NAS-Bench-301 and the Case for Surrogate Benchmarks for Neural Architecture Search". The

AutoML-Freiburg-Hannover 57 Nov 30, 2022
Code Impementation for "Mold into a Graph: Efficient Bayesian Optimization over Mixed Spaces"

Code Impementation for "Mold into a Graph: Efficient Bayesian Optimization over Mixed Spaces" This repo contains the implementation of GEBO algorithm.

Jaeyeon Ahn 2 Mar 22, 2022
PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation

PyGRANSO PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation Please check https://ncvx.org/PyGRANSO for detailed instructions (introd

SUN Group @ UMN 26 Nov 16, 2022
Python KNN model: Predicting a probability of getting a work visa. Tableau: Non-immigrant visas over the years.

The value of international students to the United States. Probability of getting a non-immigrant visa. Project timeline: Jan 2021 - April 2021 Project

Zinaida Dvoskina 2 Nov 21, 2021
Wordplay, an artificial Intelligence based crossword puzzle solver.

Wordplay, AI based crossword puzzle solver A crossword is a word puzzle that usually takes the form of a square or a rectangular grid of white- and bl

Vaibhaw 4 Nov 16, 2022
Source codes for "Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs"

Structure-Aware-BART This repo contains codes for the following paper: Jiaao Chen, Diyi Yang:Structure-Aware Abstractive Conversation Summarization vi

GT-SALT 56 Dec 08, 2022