Data Inspector is an open-source Python library that provides more than 15 functions to make exploratory data analysis (EDA) and data cleaning easier.

Overview

Data Inspector

Author: Kazi Amit Hasan

Project Description:

Data Inspector provides more than 15 essential exploratory data analysis and data cleaning automations that make a dataset easier to understand. It is a good starting point for getting to know your data.

Latest Added Feature:

Regression plots (regplots) have been added to the library.

Installation:

pip install data-inspector

Package available at https://pypi.org/project/data-inspector/

Available automations:

  1. Line plot: line_plot(data, x_data, y_data, x_label="", y_label="", title="")
  2. Plot a skewed feature: plot_skewed_feature(data, column)
  3. Show a column's data distribution: show_distribution(data, column)
  4. Scatter plot: plot_scatter(data, x_data, y_data)
  5. Correlation plot: plot_correlation(data)
  6. Histogram: histogram(data, column, x_label, y_label, title)
  7. Bar plot: plot_bar(data, column, xlabel, ylabel, title)
  8. Box plots of all features: box_plot(data)
  9. Check the dataset's shape: datasetShape(data)
  10. Diagnostic plots for a variable: diagnostic_plots(data, variable)
  11. Divide numerical and categorical features: divideFeatures(data)
  12. Fill NaN values: fillNan(data, column, value)
  13. Pearson correlation between two variables: get_correlation(column_1, column_2, data)
  14. KDE plots: plot_cont_kde(data, var)
  15. Calculate missing values and their percentages, with a visualization: calculating_missing_values(data)
  16. Regression plot with a 95% CI: plot_regplot(data, x_data, y_data)
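
To give a feel for what the data cleaning automations in the list compute, here is a minimal sketch written in plain pandas. It is not the library's implementation and does not call Data Inspector. The helper names (missing_value_summary, split_features, pearson) and the toy dataset are hypothetical, chosen only to mirror items 11, 12, 13 and 15 above.

```python
import pandas as pd

# Hypothetical toy dataset; the column names and values are illustrative only.
data = pd.DataFrame(
    {
        "age": [25, 32, None, 41, 29],
        "income": [30000.0, 42000.0, 50000.0, None, 36000.0],
        "city": ["Dhaka", None, "Khulna", "Dhaka", "Sylhet"],
    }
)


def missing_value_summary(df):
    """Count missing values per column and express them as a percentage."""
    counts = df.isna().sum()
    percent = counts / len(df) * 100
    return pd.DataFrame({"missing": counts, "percent": percent})


def split_features(df):
    """Return (numerical, categorical) column-name lists based on dtype."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


def pearson(df, column_1, column_2):
    """Pearson correlation between two columns; rows with NaN are skipped."""
    return df[column_1].corr(df[column_2], method="pearson")


summary = missing_value_summary(data)
numerical, categorical = split_features(data)

# Fill the missing ages with the column mean (one possible fill value).
filled = data.fillna({"age": data["age"].mean()})

print(summary)
print("numerical:", numerical, "categorical:", categorical)
print("pearson(age, income):", pearson(data, "age", "income"))
```

The library functions presumably wrap similar steps and add plots on top; check the tutorial notebook for their actual behavior and return values.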

Tutorial:

Link: https://github.com/AmitHasanShuvo/data-inspector/blob/main/notebook/example%20notebook.ipynb
Colab link: https://colab.research.google.com/drive/1mj9gz2XyQprSYdKMUKlKkJ9Qi8XmleHW?usp=sharing

How to cite:

@online{data-inspector,
title={data-inspector},
url={https://pypi.org/project/data-inspector/},
urldate = {2021-08-21}, 
publisher={Kazi Amit Hasan}
}

Future Work:

  1. Add some automations for time series data.

How to contribute:

Contributions are highly appreciated. Please read the contribution guidelines on GitHub before submitting.

Releases:

  1. eda (Aug 19, 2021): Data Inspector brings a total of 15 essential exploratory data analysis and data cleaning automations to make a dataset understandable.

Owner:

Kazi Amit Hasan, ML Engineer at ACI Limited, Kaggle Competition Expert (x4), Researcher