A repository with scraping code and a soccer dataset from understat.com.

Last update: Jan 03, 2023


UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. It provides Expected Goals (xG) values for every shot taken in Europe's top five leagues, as well as in the Russian league.
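Because Understat assigns an xG value to each individual shot, team-level xG for a match is simply the sum over that team's shots. Below is a minimal sketch, assuming each shot row has "match_id", "h_a" ("h" for home, "a" for away) and "xG" fields. These are hypothetical column names (the .pdf files in "documentation" describe the real ones), and the values are made up for illustration.

```python
from collections import defaultdict


def match_xg(shots):
    """Sum shot-level xG into per-match totals for the home ("h") and away ("a") sides.

    Assumes (hypothetically) that every shot is a dict with "match_id", "h_a" and
    "xG" keys. xG may be a string, as it is when read from a .csv file.
    """
    totals = defaultdict(lambda: {"h": 0.0, "a": 0.0})
    for shot in shots:
        totals[shot["match_id"]][shot["h_a"]] += float(shot["xG"])
    return dict(totals)


# Made-up example shots from a single match.
shots = [
    {"match_id": "1", "h_a": "h", "xG": "0.35"},
    {"match_id": "1", "h_a": "h", "xG": "0.05"},
    {"match_id": "1", "h_a": "a", "xG": "0.76"},
]
print(match_xg(shots))
```

The same loop works on rows produced by csv.DictReader, so it can be pointed directly at one of the shots files once the real column names are confirmed.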

After watching an awesome tutorial by McKay Johns (great channel, btw, with loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shots data available on Understat. As a result, I generated this dataset, which contains shots data from the 2014/2015 season up to the most recent match played in the 2020/2021 season, for the top division in the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL
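One common way to pull shots from Understat is to read the JSON embedded in a match page's HTML. The sketch below is not the repository's scraping code. It assumes that a match page embeds its shots in a script tag as var shotsData = JSON.parse('...') with hex-escaped characters (e.g. \x7B for "{"), and that the decoded object has "h" (home) and "a" (away) lists. Understat can change this layout at any time.

```python
from __future__ import annotations

import json
import re

# Assumption: shots appear in the page as
#   var shotsData = JSON.parse('<hex-escaped JSON>');
# This is an observed page layout, not a documented or stable API.
SHOTS_RE = re.compile(r"var\s+shotsData\s*=\s*JSON\.parse\('(.*?)'\)")


def parse_shots(html: str) -> list[dict]:
    """Return all shots (home first, then away) embedded in a match page's HTML."""
    match = SHOTS_RE.search(html)
    if match is None:
        raise ValueError("shotsData not found in page")
    # Turn escape sequences such as \x7B and \x22 into real characters.
    # Note: this simple approach can garble non-ASCII player names.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    data = json.loads(decoded)  # assumed shape: {"h": [...], "a": [...]}
    return data["h"] + data["a"]
```

To use it on a live page, fetch the HTML first, for example with urllib.request.urlopen on a match URL of the assumed form https://understat.com/match/<id>, decode the response body, and pass the resulting string to parse_shots.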

Besides shots data, I also scraped very detailed season stats for every player who took part in these matches.

The datasets are split into one folder per league, so each folder has 7 .csv files of shots data and 7 .csv files of player data (one per season since 2014/15). The full dataset, with every league and season combined, is also available in the "datasets" folder. I plan on updating the datasets every day, and I have also uploaded the Python code that generates and updates them. Feel free to play with it and suggest improvements (hit me up on Twitter). To update the datasets yourself, save the "scraping" and "datasets" folders in the same parent folder, run Python with that parent folder as the current working directory, and then run the update.py script located in "scraping".
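If you only need a subset of leagues or seasons, a combined file can be rebuilt from the per-league, per-season .csv files. Below is a minimal sketch using only the standard library. It assumes that every input file shares the same header row, and it is not the code the repository uses to build its own full dataset.

```python
import csv


def combine_csvs(paths, out_path):
    """Concatenate CSV files that share one header into a single CSV.

    Returns the number of data rows written. Raises ValueError if a file's
    header differs from the first file's header.
    """
    rows_written = 0
    header = None
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # The first file defines the header for the combined output.
                    header = reader.fieldnames
                    writer = csv.DictWriter(out, fieldnames=header)
                    writer.writeheader()
                elif reader.fieldnames != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, combine_csvs(sorted(Path("datasets").glob("EPL/shots_*.csv")), "epl_shots.csv") would merge all seasons of one league. The folder and file-name pattern here is hypothetical, so adjust it to the actual names in "datasets". If the per-season files do not already carry league and season columns, you may want to add them before merging.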

Most of the columns in the datasets are pretty straightforward, but some aren't, so I uploaded a couple of .pdf files to the "documentation" folder that explain every column.

Owner

douglasbc