Churn prediction with PySpark

Last update: Aug 13, 2021

Related tags

Data Analysis Churn_Prediction

Overview

Churn Prediction

Objective

The goal is to develop a machine learning model that predicts which customers will leave the company.

About Dataset

The dataset consists of 10,000 observations and 12 variables.
The independent variables contain information about customers.
The dependent variable indicates whether the customer has left the company (churned).

Variables

Surname – Customer surname
CreditScore – Customer's credit score
Geography – Country where the customer is located
Gender – Customer's gender
Age – Customer's age
Tenure – Number of years the person has been a customer
NumOfProducts – Number of bank products the customer uses
HasCrCard – Credit card status (0=No,1=Yes)
IsActiveMember – Active Membership status (0=No,1=Yes)
EstimatedSalary – Customer's estimated salary
Exited – Whether the customer exited (0=No,1=Yes)
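
As a rough illustration of how the variables above could feed a PySpark model, the sketch below builds an unfitted Spark ML pipeline that predicts Exited. It is not the repository's actual code. The split into categorical and numeric features, the decision to ignore Surname, the choice of logistic regression, and the function name build_churn_pipeline are all assumptions made for this example, and only the columns named in the list above are used. The multi-column StringIndexer and OneHotEncoder calls assume Spark 3.0 or later.

```python
# Column roles follow the variable list above; the categorical/numeric
# split is an assumption of this sketch, not something the source states.
CATEGORICAL_COLS = ["Geography", "Gender"]
NUMERIC_COLS = [
    "CreditScore",
    "Age",
    "Tenure",
    "NumOfProducts",
    "HasCrCard",
    "IsActiveMember",
    "EstimatedSalary",
]
LABEL_COL = "Exited"
IGNORED_COLS = ["Surname"]  # assumed to carry no predictive signal


def build_churn_pipeline():
    """Return an unfitted Spark ML Pipeline that predicts the Exited label."""
    # Imported here so the column lists above can be reused without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    indexed_cols = [name + "_idx" for name in CATEGORICAL_COLS]
    encoded_cols = [name + "_vec" for name in CATEGORICAL_COLS]

    # Map each category string (e.g. a country name) to a numeric index.
    indexer = StringIndexer(
        inputCols=CATEGORICAL_COLS,
        outputCols=indexed_cols,
        handleInvalid="keep",  # unseen categories get their own index
    )
    # Expand each index into a sparse one-hot vector.
    encoder = OneHotEncoder(inputCols=indexed_cols, outputCols=encoded_cols)
    # Combine numeric columns and one-hot vectors into one feature vector.
    assembler = VectorAssembler(
        inputCols=NUMERIC_COLS + encoded_cols,
        outputCol="features",
    )
    # Logistic regression is a stand-in; any Spark ML classifier fits here.
    classifier = LogisticRegression(featuresCol="features", labelCol=LABEL_COL)

    return Pipeline(stages=[indexer, encoder, assembler, classifier])
```

Given a Spark DataFrame with these columns (train_df and test_df are hypothetical names), build_churn_pipeline().fit(train_df) would return a fitted model, and calling transform(test_df) on that model would add a prediction column.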

