Time Masking for Temporal Language Models

This repository provides a reference implementation of the paper:

Time Masking for Temporal Language Models
Guy D. Rosin, Ido Guy, and Kira Radinsky
Accepted to WSDM 2022
Preprint: https://arxiv.org/abs/2110.06366

Abstract:

Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time.
In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information.
We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on diverse datasets in terms of time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.

Prerequisites

Python 3.8
Install requirements using pip install -r requirements.txt
Obtain datasets for training and evaluation:
- For semantic change detection: the LiverpoolFC dataset or the SemEval-2020 Task 1 datasets.
- For sentence time prediction: our NYT dataset can be found in the datasets directory.

Usage

Train TempoBERT using train_tempobert.py. This script is similar to Hugging Face's language modeling training script and introduces two new arguments: time_embedding_type, which should be set to "prepend_token", and time_mlm_probability, which is optional and can be used to set a custom probability for time masking.
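Hugging Face's example training scripts declare their arguments as dataclasses, so the two extra arguments can be pictured as the hypothetical dataclass below. The class name TemporalTrainingArguments and the validation rules are illustrative assumptions, not the repository's code; they only encode what is stated above: time_embedding_type should be "prepend_token", and time_mlm_probability is optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalTrainingArguments:
    """Hypothetical container for the two extra arguments of train_tempobert.py."""

    # "prepend_token" is the only value documented for this argument.
    time_embedding_type: str = "prepend_token"
    # None means no custom time-masking probability is set.
    time_mlm_probability: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject values the README does not describe.
        if self.time_embedding_type != "prepend_token":
            raise ValueError(
                f"time_embedding_type should be 'prepend_token', got {self.time_embedding_type!r}"
            )
        # A probability must lie in [0, 1] when it is given.
        p = self.time_mlm_probability
        if p is not None and not 0.0 <= p <= 1.0:
            raise ValueError("time_mlm_probability must lie in [0, 1]")
```

For example, TemporalTrainingArguments(time_mlm_probability=0.5) is accepted, while TemporalTrainingArguments(time_mlm_probability=1.5) raises ValueError.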
Evaluate TempoBERT using semantic_change_detection.py for semantic change detection and sentence_time_prediction.py for sentence time prediction.
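Because time masking trains the model to recover the time information from the text, sentence time prediction can be framed as choosing the time point whose time token the model scores highest. The sketch below shows only that selection step, under stated assumptions: score is a hypothetical callable standing in for the model, assumed to return the log-probability of a year's time token at the masked time position. It is not the code in sentence_time_prediction.py.

```python
from typing import Callable, Dict, Iterable


def predict_time(
    sentence: str,
    candidate_years: Iterable[int],
    score: Callable[[str, int], float],
) -> int:
    """Return the candidate year whose time token gets the highest score.

    `score(sentence, year)` is a hypothetical stand-in for the model: it is
    assumed to return the log-probability of the time token for `year` at
    the masked time position of `sentence`.
    """
    scores: Dict[int, float] = {year: score(sentence, year) for year in candidate_years}
    if not scores:
        raise ValueError("candidate_years must not be empty")
    # argmax over candidate years
    return max(scores, key=scores.get)
```

With a toy scorer such as lambda s, y: -abs(y - 2015), predict_time("some sentence", [2010, 2015, 2020], toy) returns 2015.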

Pointers

The modification to the input texts is performed in tokenization_utils_fast.py, in TempoPreTrainedTokenizerFast._batch_encode_plus().
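With time_embedding_type set to "prepend_token", the input modification amounts to prefixing each text with a token that encodes its time point. The sketch below illustrates the idea on plain strings; the "<year>" token format and the helper names time_token and prepend_time are assumptions for illustration, and the actual change lives inside the tokenizer's _batch_encode_plus().

```python
from typing import List


def time_token(year: int) -> str:
    # Hypothetical token format; the repository's actual time-token string may differ.
    return f"<{year}>"


def prepend_time(texts: List[str], years: List[int]) -> List[str]:
    """Prefix each text with a token encoding its time point."""
    if len(texts) != len(years):
        raise ValueError("texts and years must have the same length")
    return [f"{time_token(year)} {text}" for text, year in zip(texts, years)]
```

For example, prepend_time(["the phone rang"], [2010]) returns ["<2010> the phone rang"]. Doing this at tokenization time means the time token simply becomes part of the input sequence, so no change to the model architecture is needed.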
Time masking is performed in temporal_data_collator.py.
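Time masking can be pictured as masking the prepended time token with its own probability and keeping it as the only prediction target. The sketch below assumes a single time token at a known position and uses -100 as the ignored label value, following the Hugging Face/PyTorch convention. The function mask_time_token is hypothetical, and ordinary MLM masking of the other tokens is deliberately omitted; it is not the contents of temporal_data_collator.py.

```python
import random
from typing import List, Optional, Tuple

# Label value ignored by the loss (PyTorch CrossEntropyLoss default ignore_index).
IGNORE_INDEX = -100


def mask_time_token(
    input_ids: List[int],
    time_position: int,
    mask_token_id: int,
    time_mlm_probability: float,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Mask the time token at `time_position` with probability `time_mlm_probability`.

    Returns (masked_input_ids, labels). Labels are IGNORE_INDEX everywhere
    except at the time position when it was masked, so the loss trains the
    model to recover the time token from the rest of the text.
    """
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    if rng.random() < time_mlm_probability:
        labels[time_position] = input_ids[time_position]
        masked[time_position] = mask_token_id
    return masked, labels
```

With time_mlm_probability=1.0 the time token is always replaced by the mask token; with 0.0 the input is left unchanged and every label is -100. A separate probability for the time token lets it be masked more or less often than ordinary tokens.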
