whylogs Workshop

The code from the whylogs workshop in DataTalks.Club on 29 March 2022

whylogs: the open source standard for data logging (don't forget to give it a star!)

Workshop

In this hands-on workshop, we’ll learn how to set up a system that monitors your data pipelines, so you can ensure data quality and detect changes in your data.

Without data monitoring, you can’t guarantee to your stakeholders that the data they use for analytics and machine learning is trustworthy. A data observability system gives you visibility into the health of your data pipelines and, with it, builds your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs: the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs
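The batch and streaming topics above rest on one idea: instead of storing raw rows, summarize each batch into a compact statistical profile, merge profiles across batches (for example, Kafka micro-batches), and compare profiles to spot changes. The sketch below illustrates that idea with the standard library only. It is not the whylogs API: `ColumnProfile`, `profile` and `mean_shifted` are hypothetical names, and the mean-based drift rule is an assumption chosen for simplicity. Real whylogs profiles track far richer statistics (such as approximate distributions and cardinality) and are designed to be mergeable in the same spirit.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Toy summary of one numeric column (hypothetical, not the whylogs API)."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: Optional[float]) -> None:
        # Count missing values separately so data-quality checks can see them.
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        # Every field combines associatively, so per-batch profiles can be
        # merged into a profile of the whole stream without revisiting rows.
        return ColumnProfile(
            count=self.count + other.count,
            nulls=self.nulls + other.nulls,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    @property
    def null_fraction(self) -> float:
        seen = self.count + self.nulls
        return self.nulls / seen if seen else 0.0


def profile(values: Iterable[Optional[float]]) -> ColumnProfile:
    """Build a profile from one batch of values."""
    p = ColumnProfile()
    for v in values:
        p.track(v)
    return p


def mean_shifted(reference: ColumnProfile, current: ColumnProfile,
                 tolerance: float = 0.1) -> bool:
    # Naive drift rule (an assumption for illustration): flag when the mean
    # moves by more than `tolerance` relative to the reference mean.
    if reference.count == 0 or current.count == 0:
        return False
    return abs(current.mean - reference.mean) > tolerance * abs(reference.mean)


# Two micro-batches, profiled independently and then merged.
batch_1 = profile([10.0, 12.0, None, 11.0])
batch_2 = profile([9.0, 13.0])
combined = batch_1.merge(batch_2)
print(combined.count, combined.nulls, combined.mean)  # 5 1 11.0
print(mean_shifted(combined, profile([15.0, 16.0])))  # True
```

Because merging never needs the original rows, the same code path serves a one-off batch job and a long-running stream consumer that profiles each window and folds it into a running total.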

By the end of this workshop, you’ll be able to set up such a system yourself.

Code

This repository contains the files needed for the workshop:

ccloud_lib.py - helper for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - the code for producing events to Kafka
requirements.txt - all the dependencies for the workshop
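Kafka client configuration for Confluent Cloud is commonly kept as `key=value` lines, and a file like confluent_credentials.txt is typically read into a dictionary before a producer is created. The sketch below shows one way to do that with the standard library. The helper name `read_config` is hypothetical, and the keys in the sample (`bootstrap.servers`, `security.protocol`, `sasl.mechanisms`, `sasl.username`, `sasl.password`) are an assumption about what the template contains; the repository's ccloud_lib.py may parse the file differently.

```python
import os
import tempfile
from typing import Dict


def read_config(path: str) -> Dict[str, str]:
    """Parse a key=value client config file, skipping blanks and # comments.

    Hypothetical helper for illustration; ccloud_lib.py may differ.
    """
    config: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Split on the first "=" only, so values may themselves contain "=".
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"malformed config line: {line!r}")
            config[key.strip()] = value.strip()
    return config


# Placeholder values only (assumed layout); never commit real credentials.
SAMPLE = """# Confluent Cloud connection settings
bootstrap.servers=<BOOTSTRAP_SERVER>
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<API_KEY>
sasl.password=<API_SECRET>
"""

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name
try:
    settings = read_config(sample_path)
finally:
    os.remove(sample_path)

print(settings["security.protocol"])  # SASL_SSL
```

In the Confluent Python client, a dictionary like `settings` is the form of configuration that `confluent_kafka.Producer` accepts, which is why keeping the credentials in a separate, uncommitted file is convenient.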

Confluent Cloud

For this workshop, you'll need:

An account in Deepnote
An account in Confluent Cloud (instructions)


Owner: DataTalksClub
