TCube generates rich and fluent narratives that describe the characteristics, trends, and anomalies of any time-series data (domain-agnostic) using the transfer learning capabilities of pre-trained language models (PLMs).

Last update: Oct 31, 2021

Overview

TCube: Domain-Agnostic Neural Time series Narration

This repository contains the code for the paper: "TCube: Domain-Agnostic Neural Time series Narration" (to appear in IEEE ICDM 2021).

The PLMs used in this effort (T5, BART, and GPT-2) are implemented using the HuggingFace library (https://huggingface.co/) and fine-tuned on the WebNLG v3 (https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0) and DART (https://arxiv.org/abs/2007.02871) datasets.

Clones of both datasets are available under /Finetune PLMs/Datasets in this repository.

The PLMs fine-tuned on WebNLG/DART could not be uploaded because of the 1 GB limit of Git LFS. However, this repository includes ready-made scripts (detailed below) for conveniently fine-tuning these models.

The entire repository is based on Python 3.6, and the results are visualized through IPython (Jupyter) notebooks.

Dependencies

Interactive Environments

notebook
ipywidgets==7.5.1

Deep Learning Frameworks

torch==1.7.1 (choose the build suited to your CUDA version)
pytorch-lightning==0.9.0
transformers==3.1.0

NLP Toolkits

sentencepiece==0.1.91
nltk

Scientific Computing, Data Manipulation, and Visualizations

numpy
scipy
scikit-learn (imported as sklearn)
matplotlib
pandas
pwlf

Evaluation

rouge-score
textstat
lexical_diversity
language-tool-python

Misc

xlrd
tqdm
cython

Please install the Python packages listed above, at the specified versions where one is given, in a separate virtual environment.
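As a convenience, the pinned versions can be checked from inside the virtual environment. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and only the packages that carry an explicit version in the list above are checked.

```python
# Pinned versions copied from the dependency list above.
PINS = {
    "ipywidgets": "7.5.1",
    "torch": "1.7.1",
    "pytorch-lightning": "0.9.0",
    "transformers": "3.1.0",
    "sentencepiece": "0.1.91",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6/3.7 (ships with setuptools)
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # torch wheels may carry a local CUDA suffix such as "1.7.1+cu110";
        # compare only the part before the "+".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script prints one line per package that is missing or installed at a different version, and prints nothing when every pin is satisfied.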

Data-Preprocessing Scripts

Under /Finetune PLMs in this repository there are two scripts for pre-processing the WebNLG and DART datasets:

preprocess_webnlg.py
preprocess_dart.py

These scripts read the original datasets from /Finetune PLMs/Datasets/WebNLGv3 and /Finetune PLMs/Datasets/DART and write CSV files to /Finetune PLMs/Datasets. The CSV files split each dataset into train, dev, and test sets in the format required by our PLMs.
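Both WebNLG and DART pair sets of (subject, predicate, object) triples with reference texts, so this kind of preprocessing amounts to flattening each triple set into a single source string and writing source/target pairs to CSV. The sketch below illustrates the idea only. The separators, the column names `input_text` and `target_text`, and the helper names are assumptions, not the format produced by preprocess_webnlg.py or preprocess_dart.py.

```python
import csv
import io


def linearize(triples):
    """Join (subject, predicate, object) triples into one source string.

    The " | " and " && " separators are illustrative choices.
    """
    return " && ".join(f"{s} | {p} | {o}" for s, p, o in triples)


def write_split(rows, handle):
    """Write (triples, reference_text) pairs as a two-column CSV."""
    writer = csv.writer(handle)
    writer.writerow(["input_text", "target_text"])  # hypothetical column names
    for triples, text in rows:
        writer.writerow([linearize(triples), text])


# A single made-up record in the style of a triple-to-text dataset.
example = [
    ([("Alan_Bean", "occupation", "Test_pilot")], "Alan Bean was a test pilot."),
]
buffer = io.StringIO()  # stands in for an open train/dev/test CSV file
write_split(example, buffer)
print(buffer.getvalue())
```

In practice the same function would be called once per split, with `handle` being an open file for the train, dev, or test CSV.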

Fine-tuning Scripts

Under /Finetune PLMs in this repository there are three scripts for fine-tuning T5, BART, and GPT-2:

finetuneT5.py
finetuneBART.py
finetuneGPT2.py

Visualization and Evaluation Notebooks

The root directory contains 10 notebooks. For descriptions of the time-series datasets used:

Datatsets.ipynb

For comparisons of segmentation and regime-change detection algorithms:

Error Determination.ipynb
Regime Detection.ipynb
Segmentation.ipynb
Trend Detection Plot.ipynb
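For readers unfamiliar with time-series segmentation, the sketch below shows one generic approach: a sliding-window piecewise linear segmentation that grows each segment until a least-squares line no longer fits within a tolerance. It is an illustration only and is not claimed to be any of the algorithms compared in these notebooks. The function names and the `max_error` parameter are hypothetical, and pure Python is used instead of pwlf to keep the example self-contained.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    n = len(ys)
    if n == 1:
        return 0.0, float(ys[0])
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(ys):
    """Largest absolute deviation of ys from its least-squares line."""
    slope, intercept = fit_line(ys)
    return max(abs(y - (slope * i + intercept)) for i, y in enumerate(ys))


def sliding_window_segments(series, max_error):
    """Split series into (start, end) index ranges, end exclusive.

    Each segment is extended one point at a time for as long as a single
    line fits all of its points within max_error.
    """
    segments = []
    start = 0
    while start < len(series):
        end = min(start + 2, len(series))
        while end < len(series) and max_residual(series[start:end + 1]) <= max_error:
            end += 1
        segments.append((start, end))
        start = end
    return segments


# A rise followed by a fall is split at the peak.
print(sliding_window_segments([0, 1, 2, 3, 4, 3, 2, 1, 0], max_error=0.1))
```

The sign of each segment's fitted slope then gives a simple trend label (rising, falling, or flat) for that stretch of the series.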

For the evaluation of the TCube framework on respective time-series datasets:

T3-COVID.ipynb
T3-DOTS.ipynb
T3-Pollution.ipynb
T3-Population.ipynb
T3-Temperature.ipynb

Citation and Contact

If any part of this code repository or the TCube framework is used in your work, please cite our paper. Thanks!

Contact: Mandar Sharma ([email protected]), First Author.
