A workshop on data visualization in Python with notebooks and exercises for following along.

Overview

Beyond the Basics: Data Visualization in Python

Binder Nbviewer View slides in browser

The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.

Workshop Outline

This is a workshop on data visualization in Python first delivered at ODSC West 2021 and subsequently at ODSC East 2022 and PyCon Italia 2022. It's divided into the following sections:

Section 1: Getting Started With Matplotlib

We will begin by familiarizing ourselves with Matplotlib. Moving beyond the default options, we will explore how to customize various aspects of our visualizations. By the end of this section, you will be able to generate plots using the Matplotlib API directly, as well as customize the plots that libraries like pandas and Seaborn create for you.

Section 2: Moving Beyond Static Visualizations

Static visualizations are limited in how much information they can show. To move beyond these limitations, we can create animated and/or interactive visualizations. Animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines). Interactivity makes it possible to explore the data visually by hiding and displaying information based on user interest. In this section, we will focus on creating animated visualizations using Matplotlib before moving on to create interactive visualizations in the next section.

Section 3: Building Interactive Visualizations for Data Exploration

When exploring our data, interactive visualizations can provide the most value. Without having to create multiple iterations of the same plot, we can use mouse actions (e.g., click, hover, zoom, etc.) to explore different aspects and subsets of the data. In this section, we will learn how to use a few of the libraries in the HoloViz ecosystem to create interactive visualizations for exploring our data utilizing the Bokeh backend.


Prerequisites

You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself using notebooks in JupyterLab and additional functionality in JupyterLab. In addition, a basic understanding of pandas will be beneficial, but is not required; reviewing the first section of my pandas workshop will be sufficient.


Setup Instructions

  1. Install Anaconda/Miniconda. Note that you can use this Binder environment instead if you don't want to install anything on your machine.

  2. Fork this repository:

    location of fork button in GitHub

  3. Clone your forked repository:

    location of clone button in GitHub

  4. Create and activate a conda virtual environment (on Windows, these commands should be run in Anaconda Prompt):

    $ cd python-data-viz-workshop
    ~/python-data-viz-workshop$ conda install mamba -n base -c conda-forge
    ~/python-data-viz-workshop$ mamba env create --file environment.yml
    ~/python-data-viz-workshop$ conda activate data_viz_workshop
    (data_viz_workshop) ~/python-data-viz-workshop$
  5. Launch JupyterLab:

    (data_viz_workshop) ~/python-data-viz-workshop$ jupyter lab
  6. Navigate to the 0-check_your_env.ipynb notebook in the notebooks/ folder:

    open 0-check_your_env.ipynb

  7. Run the notebook to confirm everything is set up properly:

    check env


About the Author

Stefanie Molin (@stefmolin) is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Related Content

All examples herein were developed exclusively for this workshop. Hands-On Data Analysis with Pandas contains additional examples and exercises, as does this blog post and this workshop on pandas.

Owner
Stefanie Molin
Developer | Data Scientist | Author of "Hands-On Data Analysis with Pandas" | occasional hacker
Stefanie Molin
A Jupyter - Three.js bridge

pythreejs A Python / ThreeJS bridge utilizing the Jupyter widget infrastructure. Getting Started Installation Using pip: pip install pythreejs And the

Jupyter Widgets 844 Dec 27, 2022
trade bot connected to binance API/ websocket.,, include dashboard in plotly dash to visualize trades and balances

Crypto trade bot 1. What it is Trading bot connected to Binance API. This project made for fun. So ... Do not use to trade live before you have backte

G 3 Oct 07, 2022
Implementation of SOMs (Self-Organizing Maps) with neighborhood-based map topologies.

py-self-organizing-maps Simple implementation of self-organizing maps (SOMs) A SOM is an unsupervised method for learning a mapping from a discrete ne

Jonas Grebe 6 Nov 22, 2022
100 Days of Code The Complete Python Pro Bootcamp for 2022

100-Day-With-Python 100 Days of Code - The Complete Python Pro Bootcamp for 2022. In this course, I spend with python language over 100 days, and I up

Rajdip Das 8 Jun 22, 2022
Visualization Data Drug in thailand during 2014 to 2020

Visualization Data Drug in thailand during 2014 to 2020 Data sorce from ข้อมูลเปิดภาครัฐ สำนักงาน ป.ป.ส Inttroducing program Using tkinter module for

Narongkorn 1 Jan 05, 2022
A toolkit to generate MR sequence diagrams

mrsd: a toolkit to generate MR sequence diagrams mrsd is a Python toolkit to generate MR sequence diagrams, as shown below for the basic FLASH sequenc

Julien Lamy 3 Dec 25, 2021
The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based metabolomics.

MINT (Metabolomics Integrator) The Metabolomics Integrator (MINT) is a post-processing tool for liquid chromatography-mass spectrometry (LCMS) based m

Sören Wacker 0 May 04, 2022
📊 Extensions for Matplotlib

📊 Extensions for Matplotlib

Nico Schlömer 519 Dec 30, 2022
kyle's vision of how datadog's python client should look

kyle's datadog python vision/proposal not for production use See examples/comprehensive.py for a mostly working example of the proposed API. 📈 🐶 ❤️

Kyle Verhoog 2 Nov 21, 2021
A Python toolbox for gaining geometric insights into high-dimensional data

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." - Geoff Hinton Overview

Contextual Dynamics Laboratory 1.8k Dec 29, 2022
TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow with breakpoints + real-time visualization of the data flowing through the computational graph

TensorDebugger (TDB) is a visual debugger for deep learning. It extends TensorFlow (Google's Deep Learning framework) with breakpoints + real-time visualization of the data flowing through the comput

Eric Jang 1.4k Dec 15, 2022
PyFlow is a general purpose visual scripting framework for python

PyFlow is a general purpose visual scripting framework for python. State Base structure of program implemented, such things as packages disco

1.8k Jan 07, 2023
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.

Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook environment as an extension.

Visual Python 564 Jan 03, 2023
Python library that makes it easy for data scientists to create charts.

Chartify Chartify is a Python library that makes it easy for data scientists to create charts. Why use Chartify? Consistent input data format: Spend l

Spotify 3.2k Jan 01, 2023
Visualization of the World Religion Data dataset by Correlates of War Project.

World Religion Data Visualization Visualization of the World Religion Data dataset by Correlates of War Project. Mostly personal project to famirializ

Emile Bangma 1 Oct 15, 2022
Compute and visualise incidence (reworking of the original incidence package)

incidence2 incidence2 is an R package that implements functions and classes to compute, handle and visualise incidence from linelist data. It refocuss

15 Nov 22, 2022
A simple interpreted language for creating basic mathematical graphs.

graphr Introduction graphr is a small language written to create basic mathematical graphs. It is an interpreted language written in python and essent

2 Dec 26, 2021
Painlessly create beautiful matplotlib plots.

Announcement Thank you to everyone who has used prettyplotlib and made it what it is today! Unfortunately, I no longer have the bandwidth to maintain

Olga Botvinnik 1.6k Jan 06, 2023
finds grocery stores and stuff next to route (gpx)

Route-Report Route report is a command-line utility that can be used to locate points-of-interest near your planned route (gpx). The results are based

Clemens Mosig 5 Oct 10, 2022
Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

Francois Bertrand 2.3k Jan 05, 2023