Web Scraping COVID 19 Meta Portal with Python

Overview

Web-Scraping-COVID-19-Meta-Portal-with-Python

Requests API and Beautiful Soup to scrape real-time COVID statistics from worldometer website and perform data cleaning and visual analysis in Jupyter notebook.

Data Preparation Notebook

In the first module, web scraping techniques using requests, beautifulsoup packages are utilized to collect and manipulate COVID related data from the worldometer website

The notebook has a total of five code blocks.

The first four code blocks provide the following data:

  1. Summary Data for ALL Global COVID Cases
  2. Summary Data for ACTIVE Global COVID Cases
  3. Summary Data for CLOSED Global COVID Cases
  4. Tabular Data for COVID Cases by Country

The fifth and final code block provides an interactive interface for exporting each of these four tables

Data Analysis Notebook

In the second module, data analysis techniques using pandas, numpy, seaborn and statsmodels packages are utilized to collect effective insights from the data and plot necessary graphs. The raw csv data is the same table we collected in Part A of the project taken from the worldometer website regarding COVID cases tabulated by country.

The notebook has a total of twelve code blocks.

  1. Importing a CSV file, reading it and counting no. of rows and columns
  2. Using the to_numeric method to ensure all numerical columns get passed as numeric
  3. Using the describe function to display and analyze basic statistical data on the numerical columns of the imported data
  4. Working with a smaller set of imported data - Top 20 countries with most cases
  5. Horizontal bar chart to analyze total cases in the top 20 countries
  6. Vertical bar chart to analyze total deaths in the top 20 countries with most cases
  7. Distribution plot to analyze spread of data for Deaths/1M Population of the 20 countries
  8. Using the describe function to display basic statistical data on the numerical columns of the REDUCED dataset
  9. Comparing and analyzing mean and standard deviation between population of the Full dataset and the Reduced dataset
  10. Using regression scatter plot to check for data independence between tests/million people and the size of the population
  11. Finding and analyzing correlations between the variables in the dataset
  12. Applying a statistical model to collect useful information about Total Cases and Total Deaths in the full data set

-- Aarif M Jahan -- May 08, 2021

Owner
Aarif Munwar Jahan
Aarif Munwar Jahan
A python module to parse the Open Graph Protocol

OpenGraph is a module of python for parsing the Open Graph Protocol, you can read more about the specification at http://ogp.me/ Installation $ pip in

Erik Rivera 213 Nov 12, 2022
Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Guilherme Silva Uchoa 3 Oct 04, 2022
A simple reddit scraper to get memes (only images) from r/ProgrammerHumor.

memey A simple reddit scraper to get memes (only images) from r/ProgrammerHumor. Note Only works if you have firefox installed (yet). Instructions foo

2 Nov 16, 2021
crypto currency scraping

SCRYPTO What ? Crypto currencies scraping (At the moment, only bitcoin and ethereum crypto currencies are supported) How ? A python script is running

15 Sep 01, 2022
Scrapy uses Request and Response objects for crawling web sites.

Requests and Responses¶ Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and p

Md Rashidul Islam 1 Nov 03, 2021
Scrap-mtg-top-8 - A top 8 mtg scraper using python

Scrap-mtg-top-8 - A top 8 mtg scraper using python

1 Jan 24, 2022
NASA APOD Discord Bot - Fetches information from NASA APOD site.

NASA APOD Discord Bot - Fetches information from NASA APOD site.

Astronomy Club IITK 4 Apr 23, 2022
This is a module that I had created along with my friend. It's a basic web scraping module

QuickInfo PYPI link : https://pypi.org/project/quickinfo/ This is the library that you've all been searching for, it's built for developers and allows

OneBit 2 Dec 13, 2021
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

IST Research 1.1k Jan 06, 2023
A web crawler for recording posts in "sina weibo"

Web Crawler for "sina weibo" A web crawler for recording posts in "sina weibo" Introduction This script helps collect attributes of posts in "sina wei

4 Aug 20, 2022
A high-level distributed crawling framework.

Cola: high-level distributed crawling framework Overview Cola is a high-level distributed crawling framework, used to crawl pages and extract structur

Xuye (Chris) Qin 1.5k Dec 24, 2022
Luis M. Capdevielle 1 Jan 14, 2022
河南工业大学 完美校园 自动校外打卡

HAUT-checkin 河南工业大学自动校外打卡 由于github actions存在明显延迟,建议直接使用腾讯云函数 特点 多人打卡 使用简单,仅需账号密码以及用于微信推送的uid 自动获取上一次打卡信息用于打卡 向所有成员微信单独推送打卡状态 完美校园服务器繁忙时造成打卡失败会自动重新打卡

36 Oct 27, 2022
Html Content / Article Extractor, web scrapping lib in Python

Python-Goose - Article Extractor Intro Goose was originally an article extractor written in Java that has most recently (Aug2011) been converted to a

Xavier Grangier 3.8k Jan 02, 2023
New World Market Scraper

Bean Seller A New Worlds market scraper. Deployment This must be installed on Windows as it uses the Windows api to do its stuff Install Prerequisites

4 Sep 21, 2022
feapder 是一款简单、快速、轻量级的爬虫框架。以开发快速、抓取快速、使用简单、功能强大为宗旨。支持分布式爬虫、批次爬虫、多模板爬虫,以及完善的爬虫报警机制。

feapder 是一款简单、快速、轻量级的爬虫框架。起名源于 fast、easy、air、pro、spider的缩写,以开发快速、抓取快速、使用简单、功能强大为宗旨,历时4年倾心打造。支持轻量爬虫、分布式爬虫、批次爬虫、爬虫集成,以及完善的爬虫报警机制。 之

boris 1.4k Dec 29, 2022
Crawler in Python 3.7, 3.8. 3.9. Pypy3

Description Python Crawler written Python 3. (Supports major Python releases Python3.6, Python3.7 and Python 3.8) Installation and Use Setup VirtualEn

Vinit Kumar 2 Mar 12, 2022
A way to scrape sports streams for use with Jellyfin.

Sportyfin Description Stream sports events straight from your Jellyfin server. Sportyfin allows users to scrape for live streamed events and watch str

axelmierczuk 38 Nov 05, 2022
中国大学生在线 四史自动答题刷分(现仅支持英雄篇)

中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 若对您有所帮助,记得点个Star 🌟 !!! 中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 🥰 🥰 🥰 依赖 本项目依赖的第三方库: requests 在终端执行以下

XWhite 229 Dec 12, 2022
This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

crawler_to_visual_gmane Analyzing an EMAIL Archive from gmane and vizualizing the data using the D3 JavaScript library. This is a set of tools that al

Saim Zafar 1 Dec 20, 2021