Data-Analytics-on-Genomes-and-Genetics

Data analytics performed on the Of Genomes and Genetics dataset to predict genetic disorders and disorder subclasses based on familial history and its effects.

This repository contains the visualisations, models, and source code produced as part of the Final Project for the Data Analytics Course (UE19CS312) at PES University.

To run the code, place the train and test CSV files from this repository in the same folder as the notebook. Then open the of Genomes and Genetics.ipynb file in Google Colab, Jupyter Notebook, or any suitable platform, and run it either cell by cell in order or with the Run All option.
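Before running the notebook, it can help to confirm that both data files are actually in the working folder. The snippet below is a minimal sketch of such a check, not part of the project itself. The file names `train.csv` and `test.csv` and the helper `missing_data_files` are assumptions made for illustration; adjust the names to match the CSV files in this repository.

```python
from pathlib import Path

# Assumed file names (hypothetical); change them to match the repository's CSV files.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_data_files(folder=".", expected=EXPECTED_FILES):
    """Return the expected CSV file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All data files found; the notebook can be run.")
```

Running this from the folder that holds the notebook reports any file that still needs to be copied over.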

Link to dataset - https://www.kaggle.com/aryarishabh/of-genomes-and-genetics-hackerearth-ml-challenge

Link to the video - https://drive.google.com/file/d/1s6c2P6uMhr8yOOZ62fS2h5Wx0dGasc9p/view

Team SIGMA

B Pravena - PES2UG19CS076

Lavanya Yavagal - PES2UG19CS904

Swarnamalya A S - PES2UG19CS418

Varna Satyanarayana - PES2UG19CS448
