demir.ai Dataset Operations

Overview

demir.ai Dataset Operations

With this application, you can have the empty values (nan/null) deleted or filled before giving your dataset to machine learning algorithms, you can access visual or numerical information about your dataset and have more detailed information about your attributes.

The application is written in Python programming language, Flask framework is used in the backend, Html is used in the frontent. Pandas framework is used to navigate over the dataset, all numerical operations on the dataset were written by me and no ready-made functions were used, while the plots were created from scratch by me using the Opencv framework.

Before running the application, you can install the necessary packages for the application with the following command.

pip3 install -r requirements.txt

You can launch the web application with the following command, and then you can use the application by going to http://localhost:5000/.

python3 main.py

With this web application, you can delete rows or columns with empty values (nan/null) on your dataset or fill these empty values in three different ways.

  • Null value (nan) operations you can do on your dataset with demir.ai Dataset Operations:

    • Column-based deletion of null data (nan/null)
    • Row-based deletion of null data (nan/null)
    • Filling in blank data by mean, median and mode

Again, thanks to this web application, you can reach visual or numerical results about your dataset and have detailed information about your dataset.

  • Information you can learn about your dataset with demir.ai Dataset Operations:

    • Mean of columns
    • Median of columns
    • Mode of columns
    • Frequency of columns
    • Interquartile range value (IQR) of columns
    • Outliers of columns
    • Five number summary of columns
    • Box Chart of columns
    • Variance and standard deviation of columns

Null value (nan/null) operations

  • Column-based deletion of null data (nan/null): The number of nulls is calculated for each column, then the percentage of nulls is calculated and if this percentage is greater than the percentage the user enters, this column is deleted.

  • Row-based deletion of null data (nan/null): The number of nulls is calculated for each line, and if this number of nulls is greater than the number entered by the user, this line is deleted.

  • Filling in blank data by mean, median and mode:

    • Mean: The sum of the non-blank values of the columns is taken and divided by the total number of non-blank values, the average obtained is written instead of the empty values.

    • Median: The median is calculated according to the non-blank values in the columns, and then this median value is written instead of the empty columns.

    • Mode: The mode is calculated according to the non-blank values in the columns, and then this mode value is written instead of the empty columns

Information you can learn about your dataset

  • Mean of columns: The mean is calculated for each column separately and the column mean information is presented to the user.

  • Median of columns: The median is calculated for each column separately and the column median information is presented to the user.

  • Mode of columns: The mode is calculated for each column separately and the column mode information is presented to the user.

  • Frequency of columns: Frequency is calculated for each column and the frequency information of the columns is presented to the user. In this section, frequency visualization is also done by creating a bar plot from scratch with Opencv.

  • Interquartile range value (IQR) of columns: Q1 and Q3 values are found for each column, then the IQR value of the columns is found with Q3-Q1 and presented to the user.

  • Outliers of columns: If the data in the column is less than (Q1-IQR * 1.5) and greater than (Q3+IQR * 1.5), it is called outlier and this information is presented to the user.

  • Five number summary of columns: Minimum, Q1, median, Q3 and Maximum values are calculated and presented to the user.

  • Box Chart of columns: After finding the minimum, Q1, median, Q3 and maximum values for each column, a box chart is created from scratch with Opencv and this chart is presented to the user.

  • Variance and standard deviation of columns: The variance and standard deviation for each column are calculated and presented to the user.

Application video

demirai.mp4
Owner
Ahmet Furkan DEMIR
Hi, my name is Ahmet Furkan DEMIR. I study computer engineering at Necmettin Erbakan University.
Ahmet Furkan DEMIR
Data visualization electromagnetic spectrum

Datenvisualisierung-Elektromagnetischen-Spektrum Anhand des Moduls matplotlib sollen die Daten des elektromagnetischen Spektrums dargestellt werden. D

Pulsar 1 Sep 01, 2022
Schema validation for Xarray objects

xarray-schema Schema validation for Xarray installation This package is in the early stages of development. Install it from source: pip install git+gi

carbonplan 22 Oct 31, 2022
script to generate HeN ipfs app exports of GLSL shaders

HeNerator A simple script to generate HeN ipfs app exports from any frag shader created with: GlslViewer GlslEditor The Book of Shaders glslCanvas VS

Patricio Gonzalez Vivo 22 Dec 21, 2022
HM02: Visualizing Interesting Datasets

HM02: Visualizing Interesting Datasets This is a homework assignment for CSCI 40 class at Claremont McKenna College. Go to the project page to learn m

Qiaoling Chen 11 Oct 26, 2021
阴阳师后台全平台(使用网易 MuMu 模拟器)辅助。支持御魂,觉醒,御灵,结界突破,秘闻副本,地域鬼王。

阴阳师后台全平台辅助 Python 版本:Python 3.8.3 模拟器:网易 MuMu | 雷电模拟器 模拟器分辨率:1024*576 显卡渲染模式:兼容(OpenGL) 兼容 Windows 系统和 MacOS 系统 思路: 利用 adb 截图后,使用 opencv 找图找色,模拟点击。使用

简讯 27 Jul 09, 2022
Plotting data from the landroid and a raspberry pi zero to a influx-db

landroid-pi-influx Plotting data from the landroid and a raspberry pi zero to a influx-db Dependancies Hardware: Landroid WR130E Raspberry Pi Zero Wif

2 Oct 22, 2021
Simple Inkscape Scripting

Simple Inkscape Scripting Description In the Inkscape vector-drawing program, how would you go about drawing 100 diamonds, each with a random color an

Scott Pakin 140 Dec 27, 2022
Simple spectra visualization tool for astronomers

SpecViewer A simple visualization tool for astronomers. Dependencies Python = 3.7.4 PyQt5 = 5.15.4 pyqtgraph == 0.10.0 numpy = 1.19.4 How to use py

5 Oct 07, 2021
基于python爬虫爬取COVID-19爆发开始至今全球疫情数据并利用Echarts对数据进行分析与多样化展示。

COVID-19-Epidemic-Map 基于python爬虫爬取COVID-19爆发开始至今全球疫情数据并利用Echarts对数据进行分析与多样化展示。 觉得项目还不错的话欢迎给一个star! 项目的源码可以正常运行,各个库的版本、数据库的建表语句、运行过程中遇到的坑以及解决方式在笔记.md中都

31 Dec 15, 2022
A small script written in Python3 that generates a visual representation of the Mandelbrot set.

Mandelbrot Set Generator A small script written in Python3 that generates a visual representation of the Mandelbrot set. Abstract The colors in the ou

1 Dec 28, 2021
Cartopy - a cartographic python library with matplotlib support

Cartopy is a Python package designed to make drawing maps for data analysis and visualisation easy. Table of contents Overview Get in touch License an

1.2k Jan 01, 2023
Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset.

Visualization-of-Human3.6M-Dataset Plot and save the ground truth and predicted results of human 3.6 M and CMU mocap dataset. human-motion-prediction

Gaurav Kumar Yadav 5 Nov 18, 2022
The interactive graphing library for Python (includes Plotly Express) :sparkles:

plotly.py Latest Release User forum PyPI Downloads License Data Science Workspaces Our recommended IDE for Plotly’s Python graphing library is Dash En

Plotly 12.7k Jan 05, 2023
Jupyter Notebook extension leveraging pandas DataFrames by integrating DataTables and ChartJS.

Jupyter DataTables Jupyter Notebook extension to leverage pandas DataFrames by integrating DataTables JS. About Data scientists and in fact many devel

Marek Čermák 142 Dec 28, 2022
Visualizations of linear algebra algorithms for people who want a deep understanding

Visualising algorithms on symmetric matrices Examples QR algorithm and LR algorithm Here, we have a GIF animation of an interactive visualisation of t

ogogmad 3 May 05, 2022
Some method of processing point cloud

Point-Cloud Some method of processing point cloud inversion the completion pointcloud to incomplete point cloud Some model of encoding point cloud to

Tan 1 Nov 19, 2021
By default, networkx has problems with drawing self-loops in graphs.

By default, networkx has problems with drawing self-loops in graphs. It makes it hard to draw a graph with self-loops or to make a nicely looking chord diagram. This repository provides some code to

Vladimir Shitov 5 Jan 06, 2022
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t

Plotly 17.9k Dec 31, 2022
A small tool to test and visualize protein embeddings and amino acid proportions.

polyprotein_stats A small tool to test and visualize protein embeddings and amino acid proportions. Currently deployed on streamlit.io. Given a set of

2 Jan 07, 2023
Small project to recursively calculate and plot each successive order of the Hilbert Curve

hilbert-curve Small project to recursively calculate and plot each successive order of the Hilbert Curve. After watching 3Blue1Brown's video on Hilber

Stefan Mejlgaard 2 Nov 15, 2021