Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

Overview

HackerMath for Machine Learning

“Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard Feynman

Math literacy, including proficiency in Linear Algebra and Statistics,is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”. Instead of going back to formulae and proofs, we teach the concepts by writing code. And in practical applications. Concepts don’t remain sticky if the usage is never taught.

The focus will be on depth rather than breadth. Three areas are chosen - Hypothesis Testing, Supervised Learning and Unsupervised Learning. They will be covered to sufficient depth - 50% of the time will be on the concepts and 50% of the time will be spent coding them.

More details at http://amitkaps.com/hackermath

See it in action: Binder

Module #1: Hypothesis Testing

Math Concepts

  • Basic Metrics: Mean, Variance, Covariance, Correlation
  • Discrete Probability Distributions: Bernoulli, Binomial
  • Cumulative Mass Function, Probability Mass Function
  • Continuous Probability Distributions: Poisson, Uniform, Normal, Beta, Gamma
  • Cumulative Distribution Function, Probability Density Function

ML Applications

  • Direct Simulation
  • Shuffling
  • Bootstrapping
  • Application to A/B Testing

Module #2: Supervised Learning

Math Concepts

  • Basics of Matrix Operation
  • Matrix Determinant, Inverse
  • Basics of Linear Algebra
  • Solve for Ax=b for nxn
  • Solve for Ax=b for nxp+1

ML Applications

  • Linear Regression
  • L2 Regularization
  • Gradient Descent
  • Linear Classifier
  • Logistic Regression

Module #3: Unsupervised Learning

Math Concepts

  • Matrix Projections
  • Solve for Ax=λx for nxn
  • Eigenvectors & Eigenvalues
  • Distance in Vector Space

ML Applications

  • Dimensionality Reduction
  • Principle Component Analysis
  • Cluster Analysis

Target Audience

  • Someone with a background in programming who wants to pick the math needed for data science and get a flavor for different data science problems
  • Someone who is a beginner in data science or has been doing data analysis (at least using Excel at a minimum) and wants to pick skills to take the next step in their data science career

Pre-requisites

  • Having a basic understanding of linear algebra would help. And we know you may have forgotten all about it from your school or college days. So here is an amazing video playlist by @3blue1brown to learn The Essence of Linear Algebra in a very visual way.
  • Also, a touch of calculus knowledge would make it also easier. So if you want to brush up your basic calculus skills, then @3blue1brown has another amazing video playlist to learn The Essence of Calculus in a very visual way.
  • Programming knowledge is mandatory. You should, at the bare minimum, be able to write conditional statements, use loops, be comfortable writing functions and be able to understand code snippets and come up with programming logic. Since we will be using Python - brush up your basics there. Specifically, we expect you to know the first three sections from this: http://anandology.com/python-practice-book/

Software Requirements

You will require the Python data stack for the workshop. Please install Ananconda for Python 3.5 for the workshop. That has everything we need for the workshop. For attendees more curious, we will be using Jupyter Notebook as our IDE. We will be introducing numpy, scipy, seaborn, matplotlib, plotnine, statsmodel and scikit-learn.

The working repo for this workshop is at https://github.com/amitkaps/hackermath/


Authors:

Amit Kapoor

Bargava Subramanian

Owner
Amit Kapoor
Crafting Visual Stories with Data.
Amit Kapoor
Face uncertainty quantification or estimation using PyTorch.

Face-uncertainty-pytorch This is a demo code of face uncertainty quantification or estimation using PyTorch. The uncertainty of face recognition is af

Kaen 3 Sep 16, 2022
Implementation of Nyström Self-attention, from the paper Nyströmformer

Nyström Attention Implementation of Nyström Self-attention, from the paper Nyströmformer. Yannic Kilcher video Install $ pip install nystrom-attention

Phil Wang 95 Jan 02, 2023
Deep-learning X-Ray Micro-CT image enhancement, pore-network modelling and continuum modelling

EDSR modelling A Github repository for deep-learning image enhancement, pore-network and continuum modelling from X-Ray Micro-CT images. The repositor

Samuel Jackson 7 Nov 03, 2022
SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss This repository implements the SAFL in pytorch. Installation conda env create -f environm

6 Aug 24, 2022
This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

slue-toolkit We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE

ASAPP Research 39 Sep 21, 2022
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022
InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images

InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images Hong Wang, Yuexiang Li, Haimiao Zhang, Deyu Men

Hong Wang 4 Dec 27, 2022
ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

(Comet-) ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs Paper Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sa

AI2 152 Dec 27, 2022
Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers

Official TensorFlow implementation of the unsupervised reconstruction model using zero-Shot Learned Adversarial TransformERs (SLATER). (https://arxiv.

ICON Lab 22 Dec 22, 2022
Implementation of the paper ''Implicit Feature Refinement for Instance Segmentation''.

Implicit Feature Refinement for Instance Segmentation This repository is an official implementation of the ACM Multimedia 2021 paper Implicit Feature

Lufan Ma 17 Dec 28, 2022
ML-based medical imaging using Azure

Disclaimer This code is provided for research and development use only. This code is not intended for use in clinical decision-making or for any other

Microsoft Azure 68 Dec 23, 2022
[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

CAT arXiv Pytorch implementation of our method for compressing image-to-image models. Teachers Do More Than Teach: Compressing Image-to-Image Models Q

Snap Research 160 Dec 09, 2022
Open-Domain Question-Answering for COVID-19 and Other Emergent Domains

Open-Domain Question-Answering for COVID-19 and Other Emergent Domains This repository contains the source code for an end-to-end open-domain question

7 Sep 27, 2022
An Api for Emotion recognition.

PLAYEMO Playemo was built from the ground-up with Flask, a python tool that makes it easy for developers to build APIs. Use Cases Is Python your langu

greek geek 2 Jul 16, 2022
Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations, CVPR 2019 (Oral)

Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations The code of: Weakly Supervised Learning of Instance Segmentation with I

Jiwoon Ahn 472 Dec 29, 2022
A FAIR dataset of TCV experimental results for validating edge/divertor turbulence models.

TCV-X21 validation for divertor turbulence simulations Quick links Intro Welcome to TCV-X21. We're glad you've found us! This repository is designed t

0 Dec 18, 2021
Continuum Learning with GEM: Gradient Episodic Memory

Gradient Episodic Memory for Continual Learning Source code for the paper: @inproceedings{GradientEpisodicMemory, title={Gradient Episodic Memory

Facebook Research 360 Dec 27, 2022
Companion repo of the UCC 2021 paper "Predictive Auto-scaling with OpenStack Monasca"

Predictive Auto-scaling with OpenStack Monasca Giacomo Lanciano*, Filippo Galli, Tommaso Cucinotta, Davide Bacciu, Andrea Passarella 2021 IEEE/ACM 14t

Giacomo Lanciano 0 Dec 07, 2022
An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Optex An implementation of Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport for TU Delft CS4240. You c

Hans Brouwer 33 Jan 05, 2023
PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs) (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Re

isaac 222 Dec 28, 2022