LCG T-TEST USING EUCLIDEAN METHOD

Overview

LCG T-TEST USING EUCLIDEAN METHOD


Advanced Analytics and Growth Marketing Telkomsel


  • Project Supervisor : Rizli Anshari, General Manager of AAGM Telkomsel
  • Writer : Azka Rohbiya Ramadani, Muhammad Gilang, Demi Lazuardi

This project has been created for statistical usage, purposing for determining ATL takers and nontakers using LCG ttest and Euclidean Method, especially for internal business case in Telkomsel.

Background


In offering digital product, bussiness analyst must have considered what is the most suitable criterias of customers who have potential to buy the product. As an illustration, targetting gamers for games product campaign is better decision rather than targetting random customers without knowing customers behaviour. However, bussiness wouldn't make all the gamers as the campaign target, otherwise, market team deliberately wouldn't offer to several gamers randomly as comparison, called as Control Group. Therefore, it enables the team in measuring how success the campaign is.

After the campaign, there are product takers who are expected comes from campaign target. Additionaly, nontakers group is expected comes from Control Group and non-target customers. The analysis problems emerge afterwards, since nontakers might come from outside target. For this reason, our team made statistical technique in python algorithm to determine Control Group by applying euclideanmethod-combined t-test, in order for comparing takers and nontakers behaviour before the campaign, in this case we're focusing on revenue before. As a result, it enables bussiness analyst evaluating how success the campaign is.

Installation Guide


This algorith has been uploaded to pypi.org. Therefore, in order to get package, you can easely download using the following command

pip3 install lcgeuclideanmethod

Requirements


This python requires related package more importantly python_requires='>=3.1', so that package can be install Make sure the other packages meet the requirements below

  • pandas>=1.1.5,
  • numpy>=1.18.5,
  • scipy>=1.2.0,
  • matplotlib>=3.1.0,
  • statsmodels>=0.8.0

Usage Guide


1. EuclideanMethod

  • Input:
    • df_takers : dataframe takers containing two columns, customers and revenue before campaign
    • df_nontakers : dataframe nontakers containing two columns, customers and revenue before campaign
  • Output:
    • summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc.
    • df_result : subprocess table to find p-value from random nontakers
    • df_tukey : main result containing category customers category, based on summary calculation
    • tukey : tukey HSD evaluation, readmore Tukey HSD

sample code

from lcgttest.lcgttest import EuclideanMethod
import pandas as pd

# where you put takers and nontakers file
df_takers = pd.read_csv('takers.csv')
df_nontakers = pd.read_csv('nontakers.csv')

model = EuclideanMethod(df_takers, df_nontakers)
model.run()

# output
print(model.summary)
print(model.df_result)
print(model.df_tukey)
print(model.tukey)

2. MapEuclideanMethod

This is like map function in python

  • Input:
    • arr_df_takers : dataframe takers but in array form
    • arr_df_nontakers : dataframe nontakers but in array form
    • labels : labels of both takers and nontakers in array form
  • Output:
    • df_summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc in dataframe form.
    • dict_df_result : subprocess table to find p-value from random nontakers in dicttionary type.
    • dict_df_tukey : main result containing category customers category, based on summary calculation in dicttionary type.
    • dict_tukey : tukey HSD evaluation, readmore Tukey HSD in dicttionary type.

sample code

from lcgttest.lcgttest import MapEuclideanMethod
import pandas as pd
import numpy as np

# where you put takers and nontakers file
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_takers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA','campaignB','campaignC']

model2 = MapEuclideanMethod(arr_df_takers, arr_df_nontakers, label = labels )

# output
print(model.df_summary)
print(model.dict_df_result)
print(model.dict_df_tukey)
print(model.dict_tukey)

3. EuclideanMethodAscDesc

This is run twice MapEuclideanMethod ascending and descending (technique to randomize the nontakers samples)

  • Input:
    • arr_df_takers : dataframe takers but in array form
    • arr_df_nontakers : dataframe nontakers but in array form
    • labels : labels of both takers and nontakers in array form
  • Output:
    • df_summary : containing the number of expected Control Group populations based on max-p value, and other general information like average, std, max, min, etc in dataframe form.
    • dict_df_result : subprocess table to find p-value from random nontakers in dicttionary type.
    • dict_df_tukey : main result containing category customers category, based on summary calculation in dicttionary type.
    • dict_tukey : tukey HSD evaluation, readmore Tukey HSD in dicttionary type.

sample code

from lcgttest.lcgttest import EuclideanMethodAscDesc
import pandas as pd
import numpy as np

# where you put takers and nontakers file
arr_df_takers = np.array([df_takers, df_takers2, df_takers3])
arr_df_takers = np.array([df_nontakers, df_nontakers2, df_nontakers3])
labels = ['campaignA','campaignB','campaignC']

model3 = EuclideanMethodAscDesc(arr_df_takers, arr_df_nontakers, label = labels )

# output
print(model3.df_summary)
print(model3.dict_df_result)
print(model3.dict_df_tukey)
print(model3.dict_tukey)

# additional input
print(model3.df_asc_desc_avg)
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Official PyTorch Implementation of paper "NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting", EGSR 2021.

NeLF: Neural Light-transport Field for Single Portrait View Synthesis and Relighting Official PyTorch Implementation of paper "NeLF: Neural Light-tran

Ken Lin 38 Dec 26, 2022
中文医疗信息处理基准CBLUE: A Chinese Biomedical LanguageUnderstanding Evaluation Benchmark

English | 中文说明 CBLUE AI (Artificial Intelligence) is playing an indispensabe role in the biomedical field, helping improve medical technology. For fur

452 Dec 30, 2022
This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields Project Page | Paper | Supplementary | Video | Slides | Blog | Talk If

1.1k Dec 27, 2022
Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

Zumo Labs 253 Dec 21, 2022
History Aware Multimodal Transformer for Vision-and-Language Navigation

History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra

Shizhe Chen 46 Nov 23, 2022
The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

Sergio Arnaud Gomez 2 Jan 28, 2022
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Phil Wang 364 Jan 06, 2023
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr

Explosion 222 Dec 16, 2022
Python powered crossword generator with database with 20k+ polish words

crossword_generator Generate simple crossword puzzle from words and definitions fetched from krzyżowki.edu.pl endpoints -/ string:word - returns js

0 Jan 04, 2022
A Structured Self-attentive Sentence Embedding

Structured Self-attentive sentence embeddings Implementation for the paper A Structured Self-Attentive Sentence Embedding, which was published in ICLR

Kaushal Shetty 488 Nov 28, 2022
A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

flair 12.3k Dec 31, 2022
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
문장단위로 분절된 나무위키 데이터셋. Releases에서 다운로드 받거나, tfds-korean을 통해 다운로드 받으세요.

Namuwiki corpus 문장단위로 미리 분절된 나무위키 코퍼스. 목적이 LM등에서 사용하기 위한 데이터셋이라, 링크/이미지/테이블 등등이 잘려있습니다. 문장 단위 분절은 kss를 활용하였습니다. 라이선스는 나무위키에 명시된 바와 같이 CC BY-NC-SA 2.0

Jeong Ukjae 16 Apr 02, 2022
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Wav2Vec2 STT Python Beta Software Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 mode

David Zurow 22 Dec 29, 2022
Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

🌳 Fingerprinting Fine-tuned Language Models in the wild This is the code and dataset for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned La

LCS2-IIITDelhi 5 Sep 13, 2022
Python library for parsing resumes using natural language processing and machine learning

CVParser Python library for parsing resumes using natural language processing and machine learning. Setup Installation on Linux and Mac OS Follow the

nafiu 0 Jul 29, 2021
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
A paper list of pre-trained language models (PLMs).

Large-scale pre-trained language models (PLMs) such as BERT and GPT have achieved great success and become a milestone in NLP.

RUCAIBox 124 Jan 02, 2023