Pandas Profiling Documentation | Slack | Stack Overflow Generates profile reports from a pandas DataFrame. The pandas df.describe() function is great
TheMachineScraper 🐱👤 is a tool made purely for analysing machine data for any reason.
FilterPy - Kalman filters and other optimal and non-optimal estimation filters in Python. NOTE: Imminent drop of support of Python 2.7, 3.4. See secti
Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. Some of its main features are listed below. F
SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part
HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays o
Hue Editor: Open source SQL Query Assistant for Databases/Warehouses
1 Python for Power System Analysis Contents 1 Python for Power System Analysis 1.1 About 1.2 Documentation 1.3 Functionality 1.4 Example scripts as Ju
Generates a simple report about the current Covid-19 cases and deaths in Malaysia. Results are delay one day, data provided by the Ministry of Health Malaysia Covid-19 public data.
ChatNoir Resiliparse A collection of robust and fast processing tools for parsing and analyzing web archive data. Resiliparse is part of the ChatNoir
To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof
PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an
GPyTorch GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian
xMCA | Maximum Covariance Analysis in Python The aim of this package is to provide a flexible tool for the climate science community to perform Maximu
Mathematical modeling is a powerful method for the analysis of complex biological systems. Although there are many researches devoted on produ
Meltano is open source, self-hosted, CLI-first, debuggable, and extensible. Pipelines are code, ready to be version c
PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-
Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab
ZhuSuan is a Python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and
MetPy MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data. MetPy follows semantic versioni
Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work! pomegranate is a package for building probabilistic
Data lineage made simple, reliable, and automated. Effortlessly track the flow of data, understand dependencies and analyze impact. Features Visualiza
TensorFlow Probability TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. As part of the TensorFl
Disclaimer This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to chang
About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien
Stock Statistics/Indicators Calculation Helper VERSION: 0.3.2 Introduction Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline s
Sample code for Harry's Airflow online trainng course You can find the videos on youtube or bilibili. I am working on adding below things: the slide p
bigdata_analyse 大数据分析项目 wish 采用不同的技术栈，通过对不同行业的数据集进行分析，期望达到以下目标： 了解不同领域的业务分析指标 深化数据处理、数据分析、数据可视化能力 增加大数据批处理、流处理的实践经验 增加数据挖掘的实践经验
Population Shift Monitoring popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets.
Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backte
tedana: TE Dependent ANAlysis TE-dependent analysis (tedana) is a Python library for denoising multi-echo functional magnetic resonance imaging (fMRI)
MatrixProfile MatrixProfile is a Python 3 library, brought to you by the Matrix Profile Foundation, for mining time series data. The Matrix Profile is
pySerialTransfer Python package to transfer data in a fast, reliable, and packetized form.
pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit
Alphalens Alphalens is a Python Library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open sour
Orchest is a browser based IDE for Data Science. It integrates your favorite Data Science tools out of the box, so you don’t have to. The application is easy to use and can run on your laptop as well as on a large scale cloud cluster.
scikit-posthocs is a Python package that provides post hoc tests for pairwise multiple comparisons that are usually performed in statistical data anal
Nested Collaborative Learning for Long-Tailed Visual Recognition This repository is the official PyTorch implementation of the paper in CVPR 2022: Nes
What is Retentioneering? Retentioneering is a Python framework and library to assist product analysts and marketing analysts as it makes it easier to
datascienv is package that helps your to setup your environment in single line of code with all dependency and it is also include pyforest that provide single line of import all required ml libraries
pandas_summary An extension to pandas dataframes describe function. The module contains DataFrameSummary object that extend describe() with: propertie
PostQF Copyright © 2022 Ralph Seichter PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j. See the ma
dbt2looker Use dbt2looker to generate Looker view files automatically from dbt models. Features Column descriptions synced to looker Dimension for eac
MoRecon - A tool for reconstructing missing frames in motion capture data.
MDAnalysis Repository README [*] MDAnalysis is a Python library for the analysis of computer simulations of many-body systems at the molecular scale,
A library to create multi-page Streamlit applications with ease.
AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana
A neural-based binary analysis tool Introduction This directory contains the demo of a neural-based binary analysis tool. We test the framework using
Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Functional Data Analysis Python package
hmmlearn hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models. For supervised learning learning of HMMs and
Python utility to extract differences between two pandas dataframes.
Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc