A repository with scraping code and a soccer dataset from understat.com.

Last update: Jan 03, 2023

UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. They provide Expected Goals (xG) stats for every shot taken in the top 5 leagues in Europe, as well as the Russian league.

After watching an awesome tutorial by McKay Johns (great channel btw, loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shots data available at Understat. The result is this dataset, which contains shots data from the 2014/2015 season through every match played in the 2020/2021 season, for the top division in the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL

Besides shots data, I also managed to scrape very detailed season stats on every single player that took part in these matches.

The datasets have been split into one folder per league, so every folder has 7 .csv files for shots data and 7 .csv files for players data (1 for every season since 14/15). The full dataset, with every league and season combined, is also available in the "datasets" folder. I plan on updating the datasets every day, but I also uploaded the Python code that generates and updates the datasets. Feel free to play with it and suggest improvements (hit me up on Twitter). To update it yourself, just save "scraping" and "datasets" in the same folder, start Python with that folder as the current working directory, and then run the update.py script, which is located in "scraping".
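Since each league folder holds one .csv file per season, stitching the seasons back together yourself is not hard. Below is a minimal sketch that uses only the standard library. The function name `load_csvs` and the glob pattern are hypothetical (the actual file names in the repository are not described here), and it assumes every matched file shares the same header row.

```python
import csv
from pathlib import Path


def load_csvs(folder, pattern="*.csv"):
    """Read every CSV file in `folder` matching `pattern` into one list of dicts.

    `pattern` is an assumption: adjust it to whatever naming scheme the
    files in the league folder actually use.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Remember which season file each row came from.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

If the shots and players files in a folder can be told apart by name, a narrower pattern keeps the two kinds of rows from being mixed in one list.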

Most of the columns in the datasets are pretty straightforward, but some aren't. So I uploaded a couple of .pdf files in "documentation", explaining every column.
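Once the column names are confirmed against the documentation, a typical first step is to add up xG per player. The sketch below is only an illustration: the column names `player` and `xG` are assumptions (check the .pdf files for the real ones), the function name is hypothetical, and the sample rows are made up to show the input shape.

```python
from collections import defaultdict


def total_xg_by_player(shots, player_col="player", xg_col="xG"):
    """Sum expected goals per player over an iterable of shot rows (dicts).

    `player_col` and `xg_col` are assumed column names, not confirmed ones.
    """
    totals = defaultdict(float)
    for shot in shots:
        # CSV readers return strings, so convert xG to a number first.
        totals[shot[player_col]] += float(shot[xg_col])
    return dict(totals)


# Made-up rows, not real Understat data.
sample = [
    {"player": "Player A", "xG": "0.25"},
    {"player": "Player B", "xG": "0.5"},
    {"player": "Player A", "xG": "0.5"},
]
print(total_xg_by_player(sample))
```

Keeping the column names as parameters means the function does not need to change if the real headers turn out to be different.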

Owner

douglasbc
