CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.

Last update: Jan 03, 2023

CausalNLP

Install

pip install -U pip
pip install causalnlp

Usage

Example: What is the causal impact of a positive review on a product click?

import pandas as pd
df = pd.read_csv('sample_data/music_seed50.tsv', sep='\t', on_bad_lines='skip')

The file music_seed50.tsv is a semi-simulated dataset from here. Columns of relevance include:

Y_sim: outcome, where 1 means the product was clicked and 0 means it was not.
text: raw text of the review
rating: rating associated with the review (1 through 5)
T_true: 1 means a rating less than 3, 0 means a rating of 5, where T_true affects the outcome Y_sim.
T_ac: an approximation of the true review sentiment (T_true) created with Autocoder from the raw review text
C_true: confounding categorical variable (1 = audio CD, 0 = other)

We'll pretend the true sentiment (i.e., review rating and T_true) is hidden and only use T_ac as the treatment variable.

Using the text_col parameter, we include the raw review text as another "controlled-for" variable.

from causalnlp.causalinference import CausalInferenceModel
from lightgbm import LGBMClassifier
cm = CausalInferenceModel(df,
                          method='t-learner', learner=LGBMClassifier(num_leaves=500),
                          treatment_col='T_ac', outcome_col='Y_sim', text_col='text',
                          include_cols=['C_true'])
cm.fit()

outcome column (categorical): Y_sim
treatment column: T_ac
numerical/categorical covariates: ['C_true']
text covariate: text
preprocess time:  1.1179866790771484  sec
start fitting causal inference model
time to fit causal inference model:  10.361494302749634  sec
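The 't-learner' option selected above refers to a T-learner, which, as the technique is usually described, fits one outcome model on treated rows and a separate one on control rows, then estimates each row's effect as the difference between the two models' predictions. The sketch below illustrates that idea with scikit-learn's LogisticRegression on synthetic data with a single binary confounder. The helper names (fit_t_learner, predict_ite) and the data-generating numbers are hypothetical, and this is not CausalNLP's own implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_t_learner(X, t, y):
    """Fit one outcome model per treatment arm (hypothetical helper)."""
    X, t, y = np.asarray(X), np.asarray(t), np.asarray(y)
    model_control = LogisticRegression(max_iter=1000).fit(X[t == 0], y[t == 0])
    model_treated = LogisticRegression(max_iter=1000).fit(X[t == 1], y[t == 1])
    return model_control, model_treated


def predict_ite(models, X):
    """Per-row effect: P(Y=1 | x, T=1) - P(Y=1 | x, T=0)."""
    model_control, model_treated = models
    X = np.asarray(X)
    return model_treated.predict_proba(X)[:, 1] - model_control.predict_proba(X)[:, 1]


# Synthetic data: binary confounder c raises both the chance of treatment
# and the chance of a click; treatment adds 0.15 to the click probability.
rng = np.random.default_rng(0)
n = 5000
c = rng.integers(0, 2, size=n)
t = rng.binomial(1, 0.3 + 0.4 * c)
y = rng.binomial(1, 0.2 + 0.2 * c + 0.15 * t)
X = c.reshape(-1, 1)

models = fit_t_learner(X, t, y)
ite = predict_ite(models, X)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"T-learner ATE estimate: {ite.mean():.3f}")
print(f"naive difference in click rates: {naive:.3f}")
```

Because each arm's model conditions on the confounder c, the averaged difference should land near the simulated effect of 0.15, whereas the naive difference in click rates is inflated by c (about 0.23 in expectation for these numbers).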

Estimating Treatment Effects

CausalNLP supports estimation of heterogeneous treatment effects (i.e., how causal impacts vary across observations, which could be documents, emails, posts, individuals, or organizations).

We will first calculate the overall average treatment effect (or ATE), which shows that a positive review increases the probability of a click by 13 percentage points in this dataset.

Average Treatment Effect (or ATE):

print(cm.estimate_ate())

{'ate': 0.1309311542209525}

Conditional Average Treatment Effect (or CATE) for reviews that mention the word "toddler":

print(cm.estimate_ate(df['text'].str.contains('toddler')))

{'ate': 0.15559234254638685}
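The ATE and CATE outputs above can be read, assuming they are averages of per-row effect estimates, as the mean of all predicted effects and the mean over the rows where a boolean mask is True. The sketch below shows that arithmetic with pandas on four made-up reviews; average_effect is a hypothetical helper, not CausalNLP's estimate_ate, and the effect values are invented for illustration.

```python
import numpy as np
import pandas as pd


def average_effect(ite, mask=None):
    """Mean of per-row effects, optionally restricted to rows where mask is True."""
    ite = np.asarray(ite, dtype=float)
    if mask is not None:
        ite = ite[np.asarray(mask, dtype=bool)]
    return {"ate": float(ite.mean())}


# Made-up per-row effects standing in for a fitted model's predictions.
reviews = pd.DataFrame({
    "text": ["great toddler songs", "love this album", "my toddler dances", "meh"],
    "ite": [0.20, 0.10, 0.30, 0.00],
})

overall = average_effect(reviews["ite"])  # mean over all four rows
toddler_mask = reviews["text"].str.contains("toddler", na=False)  # rows 0 and 2
toddler = average_effect(reviews["ite"], toddler_mask)
print(overall, toddler)
```

Passing na=False to str.contains keeps rows with missing text out of the subgroup instead of putting missing values into the mask.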

Individualized Treatment Effects (or ITE):

test_df = pd.DataFrame({'T_ac': [1], 'C_true': [1],
                        'text': ['I never bought this album, but I love his music and will soon!']})
effect = cm.predict(test_df)
print(effect)

[[0.80538201]]

Model Interpretability:

print(cm.interpret(plot=False)[1][:10])

v_music    0.079042
v_cd       0.066838
v_album    0.055168
v_like     0.040784
v_love     0.040635
C_true     0.039949
v_just     0.035671
v_song     0.035362
v_great    0.029918
v_heard    0.028373
dtype: float64

Features with the v_ prefix are word features. C_true is the categorical variable indicating whether or not the product is a CD.

Text is Optional in CausalNLP

Despite the "NLP" in CausalNLP, the library can be used for causal inference on data without text (e.g., only numerical and categorical variables). See the examples for more info.

Documentation

API documentation and additional usage examples are available at: https://amaiya.github.io/causalnlp/

How to Cite

Please cite the following paper when using CausalNLP in your work:

@article{maiya2021causalnlp,
    title={CausalNLP: A Practical Toolkit for Causal Inference with Text},
    author={Arun S. Maiya},
    year={2021},
    eprint={2106.08043},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    journal={arXiv preprint arXiv:2106.08043},
}

Comments

Does your model support languages other than English?

Hi Amaiya, thanks for your great package. Could you let me know whether it supports languages other than English when using CausalBert?

I'm also interested in knowing whether I can use other Transformers models from the Hugging Face Hub.

Opened by behroozazarkhalili

Error while fitting the model

Hi,

I ran into this bug while fitting the model. I checked the data and everything looks good, but I can't find the root cause of this error.

File /opt/conda/lib/python3.8/site-packages/causalnlp/meta/slearner.py:80, in BaseSLearner.fit(self, X, treatment, y, p)
     78 mask = (treatment == group) | (treatment == self.control_name)
     79 treatment_filt = treatment[mask]
---> 80 X_filt = X[mask]
     81 y_filt = y[mask]
     83 w = (treatment_filt == group).astype(int)

IndexError: boolean index did not match indexed array along dimension 0

Opened by hfarhidzadeh

Releases (v0.7.0)

v0.7.0 (Aug 2, 2022)
0.7.0 (2022-08-02)

New:

N/A

Changed:

Updated dependencies

Fixed:

N/A

v0.6.0 (Oct 20, 2021)
0.6.0 (2021-10-20)

New:

Added model_name parameter to CausalBertModel to support other DistilBert models (e.g., multilingual)

Changed:

N/A

Fixed:

N/A

v0.5.0 (Sep 3, 2021)
0.5.0 (2021-09-03)

New:

Added support for CausalBert

Changed:

Added p parameter to CausalInferenceModel.fit and CausalInferenceModel.predict for user-supplied propensity scores in X-Learner and R-Learner.

Removed CV from propensity score computations in X-Learner and R-Learner and increased default max_iter to 10000

Fixed:

Resolved problem with CausalInferenceModel.tune_and_use_default_learner when outcome is continuous

Changed to max_iter=10000 for default LogisticRegression base learner

v0.4.0 (Sep 3, 2021)
0.4.0 (2021-07-20)

New:

N/A

Changed:

Use LinearRegression and LogisticRegression as default base learners for s-learner.

Changed parameter name of metalearner_type to method in CausalInferenceModel.

Fixed:

Resolved mis-references in _balance method (renamed from _minimize_bias).

Fixed convergence issues and factored out propensity score computations to CausalInferenceModel.compute_propensity_scores.

v0.3.1 (Jul 19, 2021)
0.3.1 (2021-07-19)

New:

N/A

Changed:

N/A

Fixed:

Added sample_size parameter to CausalInferenceModel.evaluate_robustness

v0.3.0 (Jul 15, 2021)
0.3.0 (2021-07-15)

New:

Added CausalInferenceModel.evaluate_robustness method to assess robustness of causal estimates using sensitivity analysis

Changed:

Reduced dependencies with local metalearner implementations

Fixed:

N/A

v0.2.0 (Jun 21, 2021)
0.2.0 (2021-06-21)

New:

Key driver analysis

Changed:

CausalInferenceModel.fit returns self

Fixed:

N/A

v0.1.3 (Jun 17, 2021)
0.1.3 (2021-06-17)

New:

N/A

Changed:

N/A

Fixed:

Version fix

v0.1.2 (Jun 17, 2021)
0.1.2 (2021-06-17)

New:

N/A

Changed:

Better interpretability and explainability of treatment effects

Fixed:

Fixed some bugs in preprocessing

v0.1.1 (Jun 17, 2021)
0.1.1 (2021-06-16)

New:

N/A

Changed:

Refactored DataFrame preprocessing

Fixed:

N/A

v0.1.0 (Jun 16, 2021)
0.1.0 (2021-06-15)

New:

First release.

Changed:

N/A

Fixed:

N/A

Owner

Arun S. Maiya

computer scientist

GitHub Repository https://amaiya.github.io/causalnlp/
