Uses MIT/MEDSL, New York Times, and US Census data sources to analyze per-county COVID-19 deaths.

Last update: Dec 22, 2021

Overview

Covid County

Executive summary

Setup

Install Miniconda, then run the following on the command line:

conda create -n covid-county
conda activate covid-county
conda install pandas ipython matplotlib tabulate

(Let me know if you want pure-Python no-Conda instructions via venv.)

2020 US presidential election

I've already downloaded countypres_2000-2020.csv from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ and committed it, but you can download it again to confirm that I haven't committed bad data.

The 2020 data appears to be missing counts for the District of Columbia (FIPS 11001), so its party split is taken from the 2016 election.
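
A minimal sketch of how such a fallback might look in pandas. It is not necessarily what main.py does. The column names (year, county_fips, party, candidatevotes) are assumed to match the MEDSL file, the vote counts are made up, and the sketch assumes DC's 2020 rows are simply absent.

```python
import pandas as pd

# Toy rows shaped like countypres_2000-2020.csv (assumed columns, made-up counts).
votes = pd.DataFrame(
    {
        "year": [2016, 2016, 2020, 2020],
        "county_fips": [11001, 11001, 1001, 1001],
        "party": ["DEMOCRAT", "REPUBLICAN", "DEMOCRAT", "REPUBLICAN"],
        "candidatevotes": [900, 100, 300, 700],
    }
)


def party_split(df, year):
    """Return per-county vote totals by party for one election year."""
    one_year = df[df["year"] == year]
    return one_year.pivot_table(
        index="county_fips", columns="party", values="candidatevotes", aggfunc="sum"
    )


split_2020 = party_split(votes, 2020)
split_2016 = party_split(votes, 2016)

# Counties absent from 2020 (here DC, FIPS 11001) fall back to their 2016 split.
missing = split_2016.index.difference(split_2020.index)
split = pd.concat([split_2020, split_2016.loc[missing]]).sort_index()
print(split)
```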

Census

From https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates/2020-evaluation-estimates/2010s-counties-total.html I downloaded co-est2020.csv from the "Annual Resident Population Estimates for States and Counties: April 1, 2010 to July 1, 2019; April 1, 2020; and July 1, 2020 (CO-EST2020)" link. It's committed in this repo but you can download it yourself too.
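
To join the Census populations with the election and Covid data, each county needs a five-digit FIPS code. The sketch below shows one way to build it, assuming the file has STATE, COUNTY, CTYNAME, and POPESTIMATE2020 columns and that COUNTY equal to 0 marks a state-level total row. The sample rows and population values are made up.

```python
import io

import pandas as pd

# Made-up rows; the column names are assumptions about co-est2020.csv.
# (The real file may need encoding="latin-1" when read with pd.read_csv.)
sample = io.StringIO(
    "STATE,COUNTY,CTYNAME,POPESTIMATE2020\n"
    "11,0,District of Columbia,1000\n"
    "11,1,District of Columbia,1000\n"
    "36,61,New York County,2000\n"
)
census = pd.read_csv(sample)

# Assumed convention: COUNTY == 0 is a state total, so keep only real counties.
counties = census[census["COUNTY"] != 0].copy()

# Five-digit FIPS = two-digit state code followed by three-digit county code.
counties["fips"] = counties["STATE"] * 1000 + counties["COUNTY"]
population = counties.set_index("fips")["POPESTIMATE2020"]
print(population)
```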

Covid

Install Git and run this in this directory: git clone --depth 1 https://github.com/nytimes/covid-19-data.git (it might take a while)
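
The New York Times county file reports cumulative counts, so the latest row per county carries the running death total. A minimal sketch of extracting it, assuming the date, county, state, fips, cases, and deaths columns of us-counties.csv; the sample rows are made up, and main.py may do this differently.

```python
import io

import pandas as pd

# Made-up rows in the layout assumed for covid-19-data/us-counties.csv.
sample = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2021-12-20,Autauga,Alabama,01001,10,2\n"
    "2021-12-21,Autauga,Alabama,01001,12,3\n"
    "2021-12-21,New York City,New York,,50,7\n"
)

# Read fips as text so leading zeros survive; the NYC row has no FIPS at all.
covid = pd.read_csv(sample, dtype={"fips": str}, parse_dates=["date"])

# Counts are cumulative: keep only the most recent row for each county.
latest = covid.sort_values("date").drop_duplicates(["state", "county"], keep="last")
print(latest[["county", "state", "fips", "deaths"]])
```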

Note that the New York Times data combines the five boroughs of NYC into a single "county". To account for this, the 2020 presidential votes from all five boroughs are merged into a single county too: since we can't split the Covid deaths into individual boroughs, this is the best we can do. The fix follows the recommendation in upstream issue 105.
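
The borough merge can be sketched as follows. The five FIPS codes are those of the Bronx, Kings, New York, Queens, and Richmond counties; the "NYC" key, the dem and rep column names, and the vote counts are hypothetical and only illustrate the idea.

```python
import pandas as pd

# FIPS codes of the five boroughs: Bronx, Kings, New York, Queens, Richmond.
NYC_BOROUGH_FIPS = {36005, 36047, 36061, 36081, 36085}
NYC_KEY = "NYC"  # hypothetical placeholder key for the combined "county"

# Made-up vote counts; 36001 (Albany) stands in for any non-NYC county.
votes = pd.DataFrame(
    {
        "county_fips": [36005, 36047, 36061, 36081, 36085, 36001],
        "dem": [10, 20, 30, 40, 5, 8],
        "rep": [1, 2, 3, 4, 6, 7],
    }
)

# Replace each borough's FIPS with the shared key, then sum votes per key.
key = votes["county_fips"].map(
    lambda fips: NYC_KEY if fips in NYC_BOROUGH_FIPS else str(fips)
)
merged = votes.drop(columns="county_fips").groupby(key).sum()
print(merged)
```

The merged row can then be joined against the single "New York City" row in the Covid data.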

Run

python main.py

(Takes ~45 seconds on my 2015-vintage laptop.)

More results

party bin	total Covid-19 deaths
Rep 80+%	38284
Rep 60–79%	211416
Rep 50–59%	123587
Dem 50–59%	196084
Dem 60–79%	210070
Dem 80+%	18331
unknown	5243

Simply by party:

Dem: 424485
Rep: 373287
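
A sketch of how counties might be assigned to party bins, followed by a check that the per-party totals equal the sums of the corresponding bins. The party_bin helper is hypothetical: the two-party vote share, the tie-breaking rule, and the use of "unknown" for counties without votes are all assumptions, not confirmed details of main.py.

```python
def party_bin(dem, rep):
    """Label a county by its leading party and that party's two-party vote share."""
    total = dem + rep
    if total == 0:
        return "unknown"  # assumed meaning: no usable vote counts
    # Assumption: ties go to Dem; the real analysis may handle them differently.
    party, share = ("Dem", dem / total) if dem >= rep else ("Rep", rep / total)
    if share >= 0.8:
        return f"{party} 80+%"
    if share >= 0.6:
        return f"{party} 60–79%"
    return f"{party} 50–59%"


# Bin totals copied from the results table.
deaths_by_bin = {
    "Rep 80+%": 38284,
    "Rep 60–79%": 211416,
    "Rep 50–59%": 123587,
    "Dem 50–59%": 196084,
    "Dem 60–79%": 210070,
    "Dem 80+%": 18331,
    "unknown": 5243,
}

# Collapse the bins to one total per party.
rep_total = sum(n for label, n in deaths_by_bin.items() if label.startswith("Rep"))
dem_total = sum(n for label, n in deaths_by_bin.items() if label.startswith("Dem"))
print(f"Dem: {dem_total}")  # Dem: 424485
print(f"Rep: {rep_total}")  # Rep: 373287
```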

Owner

Ahmed Fasih
