The open-source web scrapers that feed the Los Angeles Times California coronavirus tracker.

Last update: Dec 14, 2022

Related tags

Web Crawling, python, scraper, news, jupyter-notebook, journalism, california, data-journalism, coronavirus, covid-19, git-scraping

Overview

Processed data ready for analysis is available at datadesk/california-coronavirus-data.

Scrapers

The scrapers are written in Python and Jupyter notebooks, scheduled and run via GitHub Actions, and archived with git.

module	status	maintainer
bed-surges		Ben Welsh
cases-deaths-demographics		Ben Welsh
cases-deaths-tests		Sean Greene
demographics-age		Sean Greene
demographics-race-by-county		Rahul Mukherjee
demographics-race-statewide		Aida Ylanan
federal-prisons		Iris Lee
homeless-impact		Jennifer Lu
hopkins		Ben Welsh
hospital-patients		Ben Welsh
hospital-capacity		Ben Welsh
hospital-locations		Ben Welsh
ice-detainees		Iris Lee
icu-capacity		Sean Greene
local-adult-detention-facilities		Iris Lee
local-juvenile-detention-facilities		Iris Lee
places		Et al.
probable-cases		Ben Welsh
reopening-tiers	Retired	Ben Welsh
school-reopenings	Retired	Iris Lee
skilled-nursing-facilities		Ben Welsh
skilled-nursing-totals		Ben Welsh
state-prisons		Iris Lee
vaccine-breakthrough-cases		Sean Greene
vaccine-cdc-state-totals		Ben Welsh
vaccine-doses-on-hand		Sean Greene
vaccine-progress		Sean Greene
vaccine-hpi		Sean Greene
vaccine-demographics-by-county		Sean Greene
vaccine-demographics-statewide		Sean Greene
vaccine-shipped-delivered		Sean Greene
variant-proportions-states		Matt Stiles
variant-toplines-ca		Matt Stiles
vaccine-zip-codes		Sean Greene, Matt Stiles
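
The scrape-then-archive pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names `rows_to_csv` and `save_snapshot`, the column names, and the output path are all hypothetical. The idea is to write each scrape to a file in a deterministic form, so that git only records a change when the underlying data changed.

```python
import csv
import io
from pathlib import Path


def rows_to_csv(rows, fieldnames):
    """Serialize rows deterministically so unchanged data yields identical text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Sort rows so the source's ordering does not create spurious diffs.
    for row in sorted(rows, key=lambda r: [str(r[f]) for f in fieldnames]):
        writer.writerow(row)
    return buffer.getvalue()


def save_snapshot(rows, fieldnames, path):
    """Write the CSV and return True only if the file content changed.

    Hypothetical helper: a scheduled job could commit to git when this
    returns True and skip the commit otherwise.
    """
    path = Path(path)
    new_text = rows_to_csv(rows, fieldnames)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_text, encoding="utf-8")
    return True
```

A scheduled job could call `save_snapshot` after each scrape and run `git commit` only when it returns `True`, which keeps the commit history limited to real changes in the data.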

Installation

Clone the repository and install the Python dependencies.

pipenv install

Run all of the scraper commands.

make

Run one of the scraper commands.

make -f vaccine-hpi/Makefile
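
To run a handful of scrapers rather than all of them or just one, the per-module command above can be wrapped in a short script. This is only a sketch: `make_command` and `run_modules` are hypothetical helpers, and it assumes each module directory contains a `Makefile`, as `vaccine-hpi` does in the command above.

```python
import subprocess


def make_command(module):
    """Build the make invocation for one scraper module's Makefile."""
    return ["make", "-f", f"{module}/Makefile"]


def run_modules(modules, dry_run=True):
    """Run each module's Makefile in turn; with dry_run, only return the commands."""
    commands = [make_command(module) for module in modules]
    if not dry_run:
        for command in commands:
            # check=True stops at the first scraper that exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: print the commands instead of executing them.
    for command in run_modules(["vaccine-hpi", "hospital-patients"]):
        print(" ".join(command))
```

Passing `dry_run=False` executes the commands from the repository root, inside the environment created by `pipenv install`.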

Owner

Los Angeles Times Data and Graphics Department

Reporting, editing, computer programming

Tracker: https://www.latimes.com/projects/california-coronavirus-cases-tracking-outbreak/

Visual scraping for Scrapy

Portia Portia is a tool that allows you to visually scrape websites without any programming knowledge required. With Portia you can annotate a web pag

8.7k Jan 05, 2023

OSTA web scraper, for checking the status of school buses in Ottawa

OSTA-La-Vista OSTA web scraper, for checking the status of school buses in Ottawa. Getting Started Using a Raspberry Pi, download Python 3, and option

1 Jan 28, 2022

Scraping and visualising India's real-time COVID-19 data from the MOHFW dataset.

COVID19-WEB-SCRAPER Open Source Tech Lab - Project [SEMESTER IV] OSTL Assignments OSTL Assignments - 1 OSTL Assignments - 2 Project COVID19 India Data

8 Apr 28, 2022

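JD.com Moutai flash-sale purchasing script, latest version as of April 2021

Jd_Seckill Special notice: Any scripts in the jd_seckill project published in this repository are for testing, learning, and research only. Commercial use is prohibited. Their legality, accuracy, completeness, and validity cannot be guaranteed; please use your own judgment. No public account or self-media outlet may repost or publish any resource file from this project in any form. For any script issues, huanghyw accepts no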

45 Dec 14, 2022

Scraping weather data using Python to receive umbrella reminders

A Python package that scrapes weather data from Google and sends daily umbrella reminders to a specified email address at a specified time.

1 Aug 23, 2022

Python script that crawls the first Shodan page and checks for the DBLTEK vulnerability

🐛 MASS DBLTEK EXPLOIT CHECKER USING SHODAN 🕸 Python script that crawls the first Shodan page and checks for the DBLTEK vulnerability

4 Jan 09, 2022

A web crawler script that crawls the target website and lists its links

A web crawler script that scans the target website and lists its links.

2 Apr 29, 2022

Crawler job that scrapes comments from social media posts and saves them in an S3 bucket.

Toxicity comments crawler Crawler job that scrapes comments from social media posts and saves them in an S3 bucket. Twitter Tweets and replies are scra

2 Jan 24, 2022

Crawl BookCorpus

These are scripts to reproduce BookCorpus by yourself.

590 Jan 03, 2023

Screen scraping and web crawling framework

Pomp Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the

61 Jun 21, 2021

A Python module to bypass Cloudflare's anti-bot page.

cloudflare-scrape A simple Python module to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode", or IUAM), implemented with Reque

3k Jan 04, 2023

WebScraper - A script that prints out a list of all EXTERNAL references in the HTML response to an HTTP/S request

Project A: WebScraper A script that prints out a list of all EXTERNAL references

2 Apr 26, 2022

Snowflake database loading utility with Scrapy integration

Snowflake Stage Exporter Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON serializable objects into S

0 Dec 06, 2021

Scraping web pages to get data

Scraping Data: get public data and save it in a database. This project uses Python. How to run the project: 1 - Clone the repository 2 - Install beautifulsoup4

2 Nov 01, 2021

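Point notifications for JD Cloud Wuxianbao (京东云无线宝), with support for viewing point usage across multiple devices

JDRouterPush Project overview: this project calls the JD Cloud Wuxianbao API and can push a scheduled daily summary of point earnings, helping you keep track of the key information. Changelog 2021-03-02: query the bound JD account; improved notification layout; script update detection; support for ServerChan (Server酱) Turbo. 2021-02-25: implemented multi-device queries; query today's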

199 Dec 12, 2022

Command line program to download documents from web portals.

command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re

16 Dec 26, 2022

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

AutoScraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python This project is made for automatic web scraping to make scraping easy. It

4.8k Jan 04, 2023

A Pixiv web crawler module

Pixiv-spider A Pixiv spider module WARNING It's an unfinished work; browse the code carefully before using it. Features 0004 - Readme.md updated, co

1 Nov 14, 2021

PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

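PaperRobot PaperRobot is a paper-crawling tool that can quickly download large numbers of papers in bulk, making ongoing paper management and study easier. PaperRobot fetches papers through multiple interfaces, and its success rate currently stays above 90%. By editing the Config file, it can crawl papers from any computer science conference. Installation Down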

47 Nov 23, 2022

Facebook Group Scraping Using Beautiful Soup & Selenium

Extract Facebook group posts that are related to a specific topic and write them to a .json file.

14 Aug 12, 2022