LCG T-TEST USING EUCLIDEAN METHOD

Overview

Advanced Analytics and Growth Marketing Telkomsel


  • Project Supervisor : Rizli Anshari, General Manager of AAGM Telkomsel
  • Writers : Azka Rohbiya Ramadani, Muhammad Gilang, Demi Lazuardi

This project was created for statistical use: determining ATL takers and non-takers using the LCG t-test and the Euclidean method, especially for internal business cases in Telkomsel.

Background


When offering a digital product, a business analyst must consider which customer criteria indicate the strongest potential to buy. As an illustration, targeting gamers in a game-product campaign is a better decision than targeting random customers without knowing their behaviour. However, the business would not target every gamer: the marketing team deliberately withholds the offer from a random subset of them, called the Control Group, to serve as a comparison. This makes it possible to measure how successful the campaign is.

After the campaign, takers are expected to come from the campaign target, while non-takers are expected to come from the Control Group and from customers outside the target. Analysis problems emerge at this point, because non-takers may also come from outside the target. For this reason, our team built a statistical technique in Python that determines the Control Group by applying a Euclidean-method-combined t-test, comparing the behaviour of takers and non-takers before the campaign; here we focus on pre-campaign revenue. As a result, business analysts can evaluate how successful the campaign was.

Installation Guide


This algorithm has been uploaded to pypi.org, so you can easily install the package with the following command:

pip3 install lcgeuclideanmethod
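
To verify the installation, try importing the package's main class (the import path matches the usage examples below); if the command prints nothing, the install succeeded:

python3 -c "from lcgttest.lcgttest import EuclideanMethod"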

Requirements


This package requires Python 3.1 or later (python_requires='>=3.1'). Make sure the other required packages meet the version constraints below:

  • pandas>=1.1.5,
  • numpy>=1.18.5,
  • scipy>=1.2.0,
  • matplotlib>=3.1.0,
  • statsmodels>=0.8.0
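
If pip does not resolve these dependencies automatically, you can install the pinned versions manually, for example:

pip3 install "pandas>=1.1.5" "numpy>=1.18.5" "scipy>=1.2.0" "matplotlib>=3.1.0" "statsmodels>=0.8.0"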

Usage Guide


1. EuclideanMethod

  • Input:
    • df_takers : takers dataframe containing two columns: customers and pre-campaign revenue
    • df_nontakers : non-takers dataframe containing two columns: customers and pre-campaign revenue
  • Output:
    • summary : the expected Control Group population size, chosen at the maximum p-value, along with general statistics such as the average, std, max, and min
    • df_result : intermediate table used to find the p-value from random non-takers samples
    • df_tukey : main result containing the customer categories, based on the summary calculation
    • tukey : Tukey HSD evaluation (read more: Tukey HSD)

Sample code

from lcgttest.lcgttest import EuclideanMethod
import pandas as pd

# load the takers and non-takers CSV files
df_takers = pd.read_csv('takers.csv')
df_nontakers = pd.read_csv('nontakers.csv')

model = EuclideanMethod(df_takers, df_nontakers)
model.run()

# output
print(model.summary)
print(model.df_result)
print(model.df_tukey)
print(model.tukey)
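
If you do not have CSV files at hand, you can also build the two-column inputs directly. A minimal sketch, where the column names customer and revenue_before are illustrative assumptions rather than names enforced by the package:

import pandas as pd

# hypothetical takers: customer identifier and pre-campaign revenue
df_takers = pd.DataFrame({
    'customer': ['A01', 'A02', 'A03'],
    'revenue_before': [120000, 95000, 143000],
})

# hypothetical non-takers in the same two-column format
df_nontakers = pd.DataFrame({
    'customer': ['B01', 'B02', 'B03', 'B04'],
    'revenue_before': [110000, 88000, 150000, 97000],
})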

2. MapEuclideanMethod

This works like Python's built-in map function: it applies EuclideanMethod to each pair of takers and non-takers dataframes.

  • Input:
    • arr_df_takers : array of takers dataframes
    • arr_df_nontakers : array of non-takers dataframes
    • labels : labels for each takers/non-takers pair, in array form
  • Output:
    • df_summary : the expected Control Group population sizes, chosen at the maximum p-value, along with general statistics such as the average, std, max, and min, in dataframe form
    • dict_df_result : intermediate tables used to find the p-value from random non-takers samples, in dictionary form
    • dict_df_tukey : main results containing the customer categories, based on the summary calculation, in dictionary form
    • dict_tukey : Tukey HSD evaluations (read more: Tukey HSD), in dictionary form

Sample code

from lcgttest.lcgttest import MapEuclideanMethod
import pandas as pd
import numpy as np

# arrays of takers and non-takers dataframes (each loaded as in the first example)
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_nontakers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA', 'campaignB', 'campaignC']

model2 = MapEuclideanMethod(arr_df_takers, arr_df_nontakers, labels=labels)
model2.run()

# output
print(model2.df_summary)
print(model2.dict_df_result)
print(model2.dict_df_tukey)
print(model2.dict_tukey)
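
Since the dictionary outputs are keyed by the labels you pass in, you can inspect each campaign's result separately by iterating over them. A minimal sketch, assuming the keys are the label strings above:

# print the Tukey result table for each campaign label
for label, df_tukey in model2.dict_df_tukey.items():
    print(label)
    print(df_tukey)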

3. EuclideanMethodAscDesc

This runs MapEuclideanMethod twice, once ascending and once descending (a technique to randomize the non-takers samples).

  • Input:
    • arr_df_takers : array of takers dataframes
    • arr_df_nontakers : array of non-takers dataframes
    • labels : labels for each takers/non-takers pair, in array form
  • Output:
    • df_summary : the expected Control Group population sizes, chosen at the maximum p-value, along with general statistics such as the average, std, max, and min, in dataframe form
    • dict_df_result : intermediate tables used to find the p-value from random non-takers samples, in dictionary form
    • dict_df_tukey : main results containing the customer categories, based on the summary calculation, in dictionary form
    • dict_tukey : Tukey HSD evaluations (read more: Tukey HSD), in dictionary form

Sample code

from lcgttest.lcgttest import EuclideanMethodAscDesc
import pandas as pd
import numpy as np

# arrays of takers and non-takers dataframes (each loaded as in the first example)
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_nontakers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA', 'campaignB', 'campaignC']

model3 = EuclideanMethodAscDesc(arr_df_takers, arr_df_nontakers, labels=labels)
model3.run()

# output
print(model3.df_summary)
print(model3.dict_df_result)
print(model3.dict_df_tukey)
print(model3.dict_tukey)

# additional output
print(model3.df_asc_desc_avg)
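
Because the summaries are pandas dataframes, you can persist them for reporting with standard pandas I/O. A short sketch, assuming df_asc_desc_avg is also a dataframe; the file names are arbitrary:

# save the run summaries to CSV for reporting
model3.df_summary.to_csv('summary_asc_desc.csv', index=False)
model3.df_asc_desc_avg.to_csv('asc_desc_avg.csv', index=False)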