Scraping-test-matches-data

This is a web scraping project that uses Python and BeautifulSoup to scrape basic information about every Test match played up to January 2022.

To see the code, open the test-match-records.ipynb file.

The data is scraped from the ESPNCricinfo Stats website using BeautifulSoup and written to CSV files using Pandas.
Link to the Source page: https://stats.espncricinfo.com/ci/content/records/307847.html
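
As a rough illustration of the BeautifulSoup-to-Pandas step, the sketch below parses an HTML table into a DataFrame that can then be written to CSV. It is not the notebook's code: the function name `table_to_dataframe`, the sample markup and the column names are assumptions made for the example, and the real pages may be structured differently.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up sample markup; the real ESPNCricinfo pages may use a different layout.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Team 1</th><th>Team 2</th><th>Winner</th><th>Ground</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>Team B</td><td>Team A</td><td>Ground X</td></tr>
    <tr><td>Team A</td><td>Team B</td><td>Team B</td><td>Ground Y</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in `html` into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return pd.DataFrame()

    # Header cells become the column names.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip the header row (no <td> cells) and any malformed rows.
        if cells and len(cells) == len(headers):
            rows.append(cells)

    return pd.DataFrame(rows, columns=headers)


if __name__ == "__main__":
    df = table_to_dataframe(SAMPLE_HTML)
    # index=False keeps the Pandas row index out of the CSV file.
    df.to_csv("matches_sample.csv", index=False)
```

For a live page, the HTML string would come from an HTTP request instead of `SAMPLE_HTML`.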

The source page arranges the data by year. The scraper first extracts the links to the individual year pages, then scrapes each of those pages to collect the data for all Test matches.
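
The link-extraction step could look roughly like the sketch below. The helper name `extract_year_links`, the sample markup and the `href` values are hypothetical; it simply assumes that each year appears as a link whose text is a four-digit year.

```python
from typing import Dict
from urllib.parse import urljoin

from bs4 import BeautifulSoup

SOURCE_URL = "https://stats.espncricinfo.com/ci/content/records/307847.html"

# Made-up index markup with hypothetical hrefs, for illustration only.
SAMPLE_INDEX_HTML = """
<ul>
  <li><a href="/example/year/1877">1877</a></li>
  <li><a href="/example/year/1879">1879</a></li>
  <li><a href="/example/home">Records home</a></li>
</ul>
"""


def extract_year_links(index_html: str, base_url: str = SOURCE_URL) -> Dict[str, str]:
    """Map each four-digit year label to the absolute URL of its page."""
    soup = BeautifulSoup(index_html, "html.parser")
    links: Dict[str, str] = {}
    for anchor in soup.find_all("a", href=True):
        label = anchor.get_text(strip=True)
        # Assumption: year links are exactly those whose text is a 4-digit number.
        if len(label) == 4 and label.isdigit():
            # urljoin resolves relative hrefs against the source page URL.
            links[label] = urljoin(base_url, anchor["href"])
    return links


if __name__ == "__main__":
    for year, url in extract_year_links(SAMPLE_INDEX_HTML).items():
        print(year, url)
```

Each returned URL would then be fetched and parsed in turn to collect that year's matches.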

The scraped year links are used to create the By_Year folder, which contains one CSV file for each year's matches. These CSV files are then read and combined into a master CSV file of all matches, stored as All_Matches.csv.
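
One way to build the master file from the per-year files is sketched below. The function name `build_master_csv` is hypothetical, and the sketch assumes every CSV in By_Year shares the same columns; the notebook may do this differently.

```python
from pathlib import Path

import pandas as pd


def build_master_csv(year_dir: str, output_path: str) -> pd.DataFrame:
    """Concatenate every CSV in `year_dir` and write the result to `output_path`."""
    # Sorting the file names keeps the years in a stable order.
    csv_files = sorted(Path(year_dir).glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {year_dir}")

    frames = [pd.read_csv(path) for path in csv_files]
    # ignore_index=True renumbers the rows instead of repeating each file's index.
    master = pd.concat(frames, ignore_index=True)
    master.to_csv(output_path, index=False)
    return master


if __name__ == "__main__":
    all_matches = build_master_csv("By_Year", "All_Matches.csv")
    print(f"{len(all_matches)} matches written")
```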

The All_Matches.csv file is then used to split the data into further folders: By_Ground, By_Team and By_Hosting_Nation.
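
A minimal sketch of such a split, writing one CSV per distinct value of a column, is shown below. The helper name `split_by_column` and the column name `Ground` are assumptions; check the header of All_Matches.csv for the actual column names.

```python
import re
from pathlib import Path
from typing import List

import pandas as pd


def split_by_column(master: pd.DataFrame, column: str, out_dir: str) -> List[Path]:
    """Write one CSV per distinct value of `column` into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for value, group in master.groupby(column):
        # Replace characters that are awkward in file names with underscores.
        safe_name = re.sub(r"[^\w\-]+", "_", str(value)).strip("_")
        path = out / f"{safe_name}.csv"
        group.to_csv(path, index=False)
        written.append(path)
    return written


if __name__ == "__main__":
    all_matches = pd.read_csv("All_Matches.csv")
    # "Ground" is an assumed column name.
    split_by_column(all_matches, "Ground", "By_Ground")
```

A per-team split needs a little more care, since each match involves two teams and would belong in both teams' files. Two values that sanitize to the same file name would also overwrite each other in this sketch.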

You may find some anomalies in the Host Team column of the CSV files. They are explained in the Jupyter notebook.

The above dataset is also uploaded to Kaggle: https://www.kaggle.com/bong952/test-matches-played-from-1877-jan-2022
The Jupyter notebook was originally posted and edited on Jovian: https://jovian.ai/ash007online/test-match-records

This is my first web scraping project. Please give it a star if you like it!

darthSoura/Scraping-test-matches-data

Folders and files

Latest commit

History

Repository files navigation

Scraping-test-matches-data

About

Topics

Resources

Stars

Watchers

Forks

Languages