Data imputations library to preprocess datasets with missing data

Last update: Dec 05, 2022

Overview

Impyute

Impyute is a library of missing data imputation algorithms. It was designed to be lightweight; here is a sneak peek at what impyute can do.

>>> import numpy as np
>>> n = 5
>>> arr = np.random.uniform(high=6, size=(n, n))
>>> # Knock out up to three randomly chosen cells
>>> for _ in range(3):
...     arr[np.random.randint(n), np.random.randint(n)] = np.nan
...
>>> arr
array([[0.25288643, 1.8149261 , 4.79943748, 0.54464834,        nan],
       [4.44798362, 0.93518716, 3.24430922, 2.50915032, 5.75956805],
       [0.79802036,        nan, 0.51729349, 5.06533123, 3.70669172],
       [1.30848217, 2.08386584, 2.29894541,        nan, 3.38661392],
       [2.70989501, 3.13116687, 0.25851597, 4.24064355, 1.99607231]])
>>> import impyute as impy
>>> # Each missing cell is replaced by the mean of its column
>>> impy.mean(arr)
array([[0.25288643, 1.8149261 , 4.79943748, 0.54464834, 3.7122365 ],
       [4.44798362, 0.93518716, 3.24430922, 2.50915032, 5.75956805],
       [0.79802036, 1.99128649, 0.51729349, 5.06533123, 3.70669172],
       [1.30848217, 2.08386584, 2.29894541, 3.08994336, 3.38661392],
       [2.70989501, 3.13116687, 0.25851597, 4.24064355, 1.99607231]])
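
For readers curious about what mean imputation does to the array above, here is a minimal sketch in plain NumPy. It is not impyute's implementation; the function name `column_mean_impute` is hypothetical and it assumes a 2-D float array where missing values are encoded as NaN.

```python
import numpy as np


def column_mean_impute(arr):
    """Return a copy of `arr` with each NaN replaced by its column's mean.

    Hypothetical helper for illustration only; not part of impyute.
    """
    filled = np.array(arr, dtype=float)      # copy, so the input stays untouched
    col_means = np.nanmean(filled, axis=0)   # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))  # coordinates of the missing cells
    filled[rows, cols] = col_means[cols]     # fill each cell from its column
    return filled


data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 9.0, 6.0]])
result = column_mean_impute(data)
print(result)
```

In this small example the missing cell in the second column becomes 7.0 (the mean of 5.0 and 9.0) and the one in the third column becomes 4.5 (the mean of 3.0 and 6.0). A column that is entirely NaN has no mean to draw from, so this sketch would leave it as NaN.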

Feature Support

Imputation of Cross-Sectional Data

- K-Nearest Neighbours
- Multivariate Imputation by Chained Equations
- Expectation Maximization
- Mean Imputation
- Mode Imputation
- Median Imputation
- Random Imputation

Imputation of Time Series Data

- Last Observation Carried Forward
- Moving Window
- Autoregressive Integrated Moving Average (WIP)

Diagnostic Tools

- Loggers
- Distribution of Null Values
- Comparison of imputations
- Little's MCAR Test (WIP)
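
To make one of the time series methods listed above concrete, the following is a rough sketch of Last Observation Carried Forward in plain NumPy. It is an illustration under stated assumptions rather than impyute's own code: the function name `locf_1d` is hypothetical, it handles a single 1-D series only, and it leaves leading NaNs in place because there is no earlier observation to carry forward.

```python
import numpy as np


def locf_1d(series):
    """Fill each NaN with the most recent observed value before it.

    Hypothetical helper for illustration only; not part of impyute.
    Leading NaNs are left as NaN.
    """
    values = np.asarray(series, dtype=float)
    observed = ~np.isnan(values)
    # At observed positions keep the position's own index, elsewhere use 0 ...
    idx = np.where(observed, np.arange(len(values)), 0)
    # ... then a running maximum gives the index of the last observed value.
    idx = np.maximum.accumulate(idx)
    return values[idx]


series = np.array([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
filled = locf_1d(series)
print(filled)
```

Here the gaps after 1.0 are filled with 1.0 and the gap after 4.0 is filled with 4.0, while the first element stays NaN. Other conventions for the leading gap (for example, back-filling from the first observation) are equally plausible, so check the library documentation for the behaviour it actually uses.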

Versions

Currently tested on Python 2.7, 3.4, 3.5, 3.6 and 3.7.

Installation

To install impyute, run the following:

$ pip install impyute

Or to get the most current version:

$ git clone https://github.com/eltonlaw/impyute
$ cd impyute
$ python setup.py install

Documentation

Documentation is available here: http://impyute.readthedocs.io/

How to Contribute

Check out CONTRIBUTING

Owner

Elton Law
