Important dataframe statistics with a single command

Last update: Dec 19, 2021

Overview

quick_eda

Get dataframe statistics with one command

Project description

A Python package for data scientists, students, ML engineers and anyone who wants dataframe metadata without having to type numerous commands.

Installation

Use pip to install quick-eda by typing or copying the following command.

pip install quick-eda

License

This package is licensed under the BSD 3-Clause License.

Example usage

Users of the package can import the individual modules from this package, for example:

import quick_eda.df_eda
import quick_eda.column_eda

This loads the submodules quick_eda.df_eda and quick_eda.column_eda. They must be referenced with their full name.

quick_eda.df_eda.df_eda(<df>)
quick_eda.column_eda.column_eda(<df>, <column_name>)

An alternative way of importing the submodules is:

from quick_eda import df_eda
from quick_eda import column_eda

This also loads the submodules quick_eda.df_eda and quick_eda.column_eda, and makes them available without the package prefix, so they can be used as follows:

df_eda.df_eda(<df>)
column_eda.column_eda(<df>, <column_name>)

Yet another variation is to import the desired functions directly:

from quick_eda.df_eda import df_eda
from quick_eda.column_eda import column_eda

Again, this loads the submodules, but makes the functions themselves directly available:

df_eda(<df>)
column_eda(<df>, <column_name>)
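The three import styles differ only in which name ends up bound in your namespace; the object behind the name is the same each time. The sketch below shows this with the standard library package json and its submodule json.decoder as a stand-in, so that it runs without quick-eda installed. The choice of json is an assumption made purely for illustration; quick_eda.df_eda is expected to follow the same Python import rules.

```python
# Stand-in illustration: `json` / `json.decoder` play the role of
# `quick_eda` / `quick_eda.df_eda` (an assumption for demonstration only).
import json.decoder                    # style 1: full dotted name required
from json import decoder               # style 2: submodule without package prefix
from json.decoder import JSONDecoder   # style 3: the object itself

# All three styles resolve to the same module and the same class object.
same_module = json.decoder is decoder
same_class = json.decoder.JSONDecoder is decoder.JSONDecoder is JSONDecoder

print(same_module, same_class)
```

Which style to pick is mostly a matter of taste: the full dotted name is the most explicit, while importing the function directly gives the shortest calls.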

Imagine you have a dataframe called pets with the columns name, age and color. You could then run statistics on either the entire dataframe or a single column, such as age, with

df_eda(pets)
column_eda(pets, "age")
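To try this out, a pets dataframe can be built with pandas as sketched below. The sample values are made up, and the manual pandas commands are shown only for comparison with what a single-command summary is meant to replace; exactly which statistics quick-eda reports is not described here, so the selection below is an assumption.

```python
import pandas as pd

# Hypothetical sample data; names, ages and colors are invented for illustration.
pets = pd.DataFrame(
    {
        "name": ["Rex", "Tom", "Kiwi", "Nemo"],
        "age": [3, 5, 2, 1],
        "color": ["brown", "grey", "green", "orange"],
    }
)

# Manual inspection with plain pandas, one command per question.
shape = pets.shape                  # (number of rows, number of columns)
dtypes = pets.dtypes                # data type of each column
missing = pets.isna().sum()         # missing values per column
age_stats = pets["age"].describe()  # count, mean, std, min, quartiles, max

print(shape)
print(dtypes)
print(missing)
print(age_stats)

# With quick-eda installed, the calls from the text would be:
# from quick_eda.df_eda import df_eda
# from quick_eda.column_eda import column_eda
# df_eda(pets)
# column_eda(pets, "age")
```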

Source code & further information

The source code is maintained at https://github.com/sveneschlbeck/quick_eda
The repository also contains further information on the BSD license, contributing guidelines and more.

Owner

Sven Eschlbeck
