Flenser

Have you ever been handed a dataset you've never seen before?

Flenser is a minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column in a dataset and writes an HTML file noting which tests trigger for each column, alongside relevant outputs.

Flenser is intended to be run at the earliest stages of data exploration, when you have no familiarity with the dataset. It will do its best to tell you what is actually going on in the dataset, regardless of what is supposed to be going on in the dataset.

Flenser is designed to be helpful, not 'helpful': it will not attempt to modify or make assumptions about your dataset. Instead, it applies each test to every column and shows you outputs that let your human brain decide what is actually going on.

Additional tests can be added by modifying the Test dataclass.
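As a rough illustration of what a dataclass-based test might look like, here is a minimal sketch. The class name ColumnTest, its fields, and the two example checks are all hypothetical; the actual Test dataclass in flenser.py may be shaped quite differently, so read the source before adding a test of your own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ColumnTest:
    """Hypothetical stand-in for Flenser's Test dataclass; real fields may differ."""
    name: str
    description: str
    check: Callable[[List[str]], bool]  # returns True when the test triggers


def has_duplicates(values: List[str]) -> bool:
    # Triggers when at least one value appears more than once.
    return len(set(values)) < len(values)


def has_surrounding_whitespace(values: List[str]) -> bool:
    # Triggers when any value has leading or trailing whitespace.
    return any(v != v.strip() for v in values)


# Adding a test means adding one more entry to this list.
TESTS = [
    ColumnTest("duplicates", "Column contains repeated values", has_duplicates),
    ColumnTest("whitespace", "Values have leading or trailing whitespace",
               has_surrounding_whitespace),
]


def run_tests(column: List[str], tests: List[ColumnTest] = TESTS) -> List[str]:
    # Apply every test to the column and report the names of those that trigger.
    return [t.name for t in tests if t.check(column)]


print(run_tests(["a", "b ", "a"]))  # ['duplicates', 'whitespace']
```

Each check only reads the column and reports a result. Nothing is modified, which matches Flenser's stated aim of not altering your dataset.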

How to run

python3 flenser.py 'filename.csv'

Flenser will print its default list of NaN values (the strings it treats as missing data). You may specify one or more additional NaN values to use, as follows:

python3 flenser.py 'filename.csv' 'nan1' 'nan2' 'nan3' ...
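To show why extra NaN values matter, the sketch below counts missing entries per column with and without additional tokens. It is not Flenser's code: the DEFAULT_NANS set, the count_missing function, and the sample data are assumptions made up for this example, and Flenser's real default list is whatever it prints when run.

```python
import csv
import io

# Illustrative defaults only; Flenser's actual default list may differ.
DEFAULT_NANS = {"", "nan", "NaN", "NA", "N/A", "null"}


def count_missing(csv_text, extra_nans=()):
    """Count, per column, how many values match a NaN token."""
    nan_tokens = DEFAULT_NANS | set(extra_nans)
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            if name not in counts:
                continue  # skip overflow fields from malformed rows
            if value is None or value.strip() in nan_tokens:
                counts[name] += 1
    return counts


sample = "id,score\n1,7\n2,-999\n3,NA\n4,unknown\n"

print(count_missing(sample))                                  # {'id': 0, 'score': 1}
print(count_missing(sample, extra_nans=["-999", "unknown"]))  # {'id': 0, 'score': 3}
```

Sentinel values such as -999 or 'unknown' look like ordinary data until you name them, which is the kind of dataset-specific knowledge the extra command-line arguments let you supply.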

With thanks to

Recurse
Kelly F
Rebecca H
Azhad S
Shivam S
Christina M
Adam K
Edith V
Justin R


Owner

John McCambridge
