The code from the Machine Learning Bookcamp book and a free course based on the book

Overview

Machine Learning Bookcamp

The code from the Machine Learning Bookcamp book

Useful links:

Machine Learning Zoomcamp

Machine Learning Zoomcamp is a course based on the book

  • It's online and free
  • You can join at any moment
  • More information in the course-zoomcamp folder

Reading Plan

Chapters

Chapter 1: Introduction to Machine Learning

  • Understanding machine learning and the problems it can solve
  • CRISP-DM: Organizing a successful machine learning project
  • Training and selecting machine learning models
  • Performing model validation

No code

Chapter 2: Machine Learning for Regression

  • Creating a car-price prediction project with a linear regression model
  • Doing an initial exploratory data analysis with Jupyter notebooks
  • Setting up a validation framework
  • Implementing the linear regression model from scratch
  • Performing simple feature engineering for the model
  • Keeping the model under control with regularization
  • Using the model to predict car prices

Code: chapter-02-car-price/02-carprice.ipynb

Chapter 3: Machine Learning for Classification

  • Predicting customers who will churn with logistic regression
  • Doing exploratory data analysis for identifying important features
  • Encoding categorical variables to use them in machine learning models
  • Using logistic regression for classification

Code: chapter-03-churn-prediction/03-churn.ipynb

Chapter 4: Evaluation Metrics for Classification

  • Accuracy as a way of evaluating binary classification models and its limitations
  • Determining where our model makes mistakes using a confusion table
  • Deriving other metrics like precision and recall from the confusion table
  • Using ROC and AUC to further understand the performance of a binary classification model
  • Cross-validating a model to make sure it behaves optimally
  • Tuning the parameters of a model to achieve the best predictive performance

Code: chapter-03-churn-prediction/04-metrics.ipynb

Chapter 5: Deploying Machine Learning Models

  • Saving models with Pickle
  • Serving models with Flask
  • Managing dependencies with Pipenv
  • Making the service self-contained with Docker
  • Deploying it to the cloud using AWS Elastic Beanstalk

Code: chapter-05-deployment

Chapter 6: Decision Trees and Ensemble Learning

  • Predicting the risk of default with tree-based models
  • Decision trees and the decision tree learning algorithm
  • Random forest: putting multiple trees together into one model
  • Gradient boosting as an alternative way of combining decision trees

Code: chapter-06-trees/06-trees.ipynb

Chapter 7: Neural Networks and Deep Learning

  • Convolutional neural networks for image classification
  • TensorFlow and Keras — frameworks for building neural networks
  • Using pre-trained neural networks
  • Internals of a convolutional neural network
  • Training a model with transfer learning
  • Data augmentations — the process of generating more training data

Code: chapter-07-neural-nets/07-neural-nets-train.ipynb

Chapter 8: Serverless Deep Learning

  • Serving models with TensorFlow-Lite — a light-weight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

Code: chapter-08-serverless

Chapter 9: Kubernetes and Kubeflow

Kubernetes:

  • Understanding different methods of deploying and serving models in the cloud.
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes

Code: chapter-09-kubernetes

Kubeflow:

  • Using Kubeflow and KFServing for simplifying the deployment process

Code: chapter-09-kubeflow

Articles from mlbookcamp.com:

Appendices

Appendix A: Setting up the Environment

  • Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
  • Running a Jupyter Notebook service from a remote machine
  • Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
  • Creating an EC2 machine on AWS using the web interface and the command-line interface

Code: no code

Articles from mlbookcamp.com:

Appendix B: Introduction to Python

  • Basic python syntax: variables and control-flow structures
  • Collections: lists, tuples, sets, and dictionaries
  • List comprehensions: a concise way of operating on collections
  • Reusability: functions, classes and importing code
  • Package management: using pip for installing libraries
  • Running python scripts

Code: appendix-b-python.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to NumPy and Linear Algebra

  • One-dimensional and two-dimensional NumPy arrays
  • Generating NumPy arrays randomly
  • Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
  • Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
  • Finding the inverse of a matrix and solving the normal equation

Code: appendix-c-numpy.ipynb

Articles from mlbookcamp.com:

Appendix C: Introduction to Pandas

  • The main data structures in Pandas: DataFrame and Series
  • Accessing rows and columns of a DataFrame
  • Element-wise and summarizing operations
  • Working with missing values
  • Sorting and grouping

Code: appendix-d-pandas.ipynb

Appendix D: AWS SageMaker

  • Increasing the GPU quota limits
  • Renting a Jupyter notebook with GPU in AWS SageMaker
You might also like...
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Turns your machine learning code into microservices with web API, interactive GUI, and more.
Turns your machine learning code into microservices with web API, interactive GUI, and more.

Turns your machine learning code into microservices with web API, interactive GUI, and more.

TorchDrug is a PyTorch-based machine learning toolbox designed for drug discovery

A powerful and flexible machine learning platform for drug discovery

Machine learning template for projects based on sklearn library.

Machine learning template for projects based on sklearn library.

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Painless Machine Learning for python based on scikit-learn

PlainML Painless Machine Learning Library for python based on scikit-learn. Install pip install plainml Example from plainml import KnnModel, load_ir

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Comments
  • Adding setup with docker

    Adding setup with docker

    Hi @alexeygrigorev ,

    I created a small guide for anyone who feels comfortable using Docker or might want to try it for setting up the environment.

    Since I saw a couple of questions today related to environment setup, I thought of sharing what I usually use when working on projects or courses, then it can be re-usable.

    Hoping is helpful :)

    Changelog:

    • Updated readme with link to guide to create docker container
    • Added new guide to build docker container and run it
    • Added Dockerfile and environment.yml
    opened by laurauzcategui 5
  • While converting keras to tflite error

    While converting keras to tflite error

    While converting keras to tflite error :

    raise ValueError('Unrecognized keyword arguments:', kwargs.keys()) ValueError: ('Unrecognized keyword arguments:', dict_keys(['ragged']))

    Traceback (most recent call last): File "convert.py", line 5, in <module> model = keras.models.load_model('xception_v4_large_08_0.894.h5')

    opened by saisubramani 5
  • notes correction in 06 Decision Trees...

    notes correction in 06 Decision Trees...

    Inside 02-data-prep.md , in the train/val/test split bullet note at the moment is : "Split the data with the distribution of 80% train, 20% validation, and 20% test sets with random seed to 11"

    should be:

    Split the data with the distribution of 60% train, 20% validation, and 20% test sets with random seed to 11

    opened by lucapug 4
  • Update homework.md

    Update homework.md

    Updated Question 4 text from "when one grows" to "when one grows up" and the F1 formula from "F1 = 2 * P * R / (P + R)" to "$$F1 = {2.}\frac{P . R}{P+R}$$"

    opened by ukokobili 3
Releases(chapter7-model)
Owner
Alexey Grigorev
Alexey Grigorev
A data preprocessing package for time series data. Design for machine learning and deep learning.

A data preprocessing package for time series data. Design for machine learning and deep learning.

Allen Chiang 152 Jan 07, 2023
Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics

Facebook Research 4.1k Dec 29, 2022
A simple python program which predicts the success of a movie based on it's type, actor, actress and director

Movie-Success-Prediction A simple python program which predicts the success of a movie based on it's type, actor, actress and director. The program us

Mahalinga Prasad R N 1 Dec 17, 2021
Real-time stream processing for python

Streamz Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelin

Python Streamz 1.1k Dec 28, 2022
healthy and lesion models for learning based on the joint estimation of stochasticity and volatility

health-lesion-stovol healthy and lesion models for learning based on the joint estimation of stochasticity and volatility Reference please cite this p

5 Nov 01, 2022
A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement.

Organic Alkalinity Sausage Machine A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement. Getting started To mak

Charles Turner 1 Feb 01, 2022
PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors.

PyNNDescent PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors. It provides a python implementation of Nearest Neighbo

Leland McInnes 699 Jan 09, 2023
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Jan 03, 2023
A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Demand-Forecasting Business Problem A chain of stores, 10 different stores and 50 different requests a 3-month demand forecast for its product.

Ayşe Nur Türkaslan 3 Mar 06, 2022
Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Kaggle Tweet Sentiment Extraction Competition: 1st place solution (Dark of the Moon team)

Artsem Zhyvalkouski 64 Nov 30, 2022
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just

wenqi 2 Jun 26, 2022
Scikit learn library models to account for data and concept drift.

liquid_scikit_learn Scikit learn library models to account for data and concept drift. This python library focuses on solving data drift and concept d

7 Nov 18, 2021
ML-powered Loan-Marketer Customer Filtering Engine

In Loan-Marketing business employees are required to call the user's to buy loans of several fields and in several magnitudes. If employees are calling everybody in the network it is also very length

Sagnik Roy 13 Jul 02, 2022
Library for machine learning stacking generalization.

stacked_generalization Implemented machine learning *stacking technic[1]* as handy library in Python. Feature weighted linear stacking is also availab

114 Jul 19, 2022
MegFlow - Efficient ML solutions for long-tailed demands.

Efficient ML solutions for long-tailed demands.

旷视天元 MegEngine 371 Dec 21, 2022
CD) in machine learning projectsImplementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Iterative 19 Oct 03, 2022
Code Repository for Machine Learning with PyTorch and Scikit-Learn

Code Repository for Machine Learning with PyTorch and Scikit-Learn

Sebastian Raschka 1.4k Jan 03, 2023
A demo project to elaborate how Machine Learn Models are deployed on production using Flask API

This is a salary prediction website developed with the help of machine learning, this makes prediction of salary on basis of few parameters like interview score, experience test score.

1 Feb 10, 2022
Climin is a Python package for optimization, heavily biased to machine learning scenarios

climin climin is a Python package for optimization, heavily biased to machine learning scenarios distributed under the BSD 3-clause license. It works

Biomimetic Robotics and Machine Learning at Technische Universität München 177 Sep 02, 2022
using Machine Learning Algorithm to classification AppleStore application

AppleStore-classification-with-Machine-learning-Algo- using Machine Learning Algorithm to classification AppleStore application. the first step : 1: p

Mohammed Hussien 2 May 02, 2022