Python code for working with NFL play-by-play data.

Overview

nfl_data_py

nfl_data_py is a Python library for interacting with NFL data sourced from nflfastR, nfldata, dynastyprocess, and Draft Scout.

The library includes import functions for play-by-play data, weekly data, seasonal data, rosters, win totals, scoring lines, officials, draft picks, draft pick values, schedules, team descriptive info, combine results, and id mappings across various sites.

Installation

Use the package manager pip to install nfl_data_py.

pip install nfl_data_py

Usage

import nfl_data_py as nfl

Working with play-by-play data

nfl.import_pbp_data(years, columns, downcast=True, cache=False, alt_path=None)

Returns play-by-play data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%; slows the initial load by ~50%

cache : optional, determines whether to pull pbp data from the nflverse GitHub repo or from a local cache generated by nfl.cache_pbp()

alt_path : optional, required only if nfl.cache_pbp() was called with an alternate path instead of the default cache location

nfl.see_pbp_cols()

Returns a list of the columns available in the play-by-play dataset
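
For example, a minimal sketch (the column names are only examples drawn from the pbp dataset; adjust as needed):

import nfl_data_py as nfl

# pull two seasons, limited to a few columns to keep memory usage down
pbp = nfl.import_pbp_data([2021, 2022], ['epa', 'week', 'possession_team'])

# list every column available in the play-by-play dataset
print(nfl.see_pbp_cols())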

Working with weekly data

nfl.import_weekly_data(years, columns, downcast)

Returns weekly data for the years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%; slows the initial load by ~50%

nfl.see_weekly_cols()

Returns a list of the columns available in the weekly dataset
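
A minimal sketch pulling one season of weekly data with downcasting enabled (the year is illustrative):

import nfl_data_py as nfl

# weekly player-level data for 2022; downcast trades load speed for memory
weekly = nfl.import_weekly_data([2022], downcast=True)

print(nfl.see_weekly_cols())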

Working with seasonal data

nfl.import_seasonal_data(years)

Returns seasonal data, including various calculated market share stats

years : required, list of years to pull data for (earliest available is 1999)
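
For example (this mirrors the call shown in the comments below):

import nfl_data_py as nfl

# seasonal totals, including market share stats, for the 2010 season
seasonal = nfl.import_seasonal_data([2010])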

Additional data imports

nfl.import_rosters(years, columns)

Returns roster information for years and columns specified

years : required, list of years to pull data for (earliest available is 1999)

columns : optional, list of columns to pull data for

nfl.import_win_totals(years)

Returns win total lines for years specified

years : optional, list of years to pull

nfl.import_sc_lines(years)

Returns scoring lines for years specified

years : optional, list of years to pull

nfl.import_officials(years)

Returns official information by game for the years specified

years : optional, list of years to pull

nfl.import_draft_picks(years)

Returns list of draft picks for the years specified

years : optional, list of years to pull

nfl.import_draft_values()

Returns relative values by generic draft pick according to various popular valuation methods

nfl.import_team_desc()

Returns dataframe with color/logo/etc. information for all NFL teams

nfl.import_schedules(years)

Returns dataframe with schedule information for years specified

years : required, list of years to pull data for (earliest available is 1999)

nfl.import_combine_data(years, positions)

Returns dataframe with combine results for years and positions specified

years : optional, list or range of years to pull data from

positions : optional, list of positions to be pulled (standard format - WR/QB/RB/etc.)

nfl.import_ids(columns, ids)

Returns dataframe with mapped ids for all players across most major NFL and fantasy football data platforms

columns : optional, list of columns to return

ids : optional, list of ids to return
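
A short sketch combining several of the imports above (only parameters documented here are used; the years are illustrative):

import nfl_data_py as nfl

rosters = nfl.import_rosters([2022])      # roster info for 2022
schedules = nfl.import_schedules([2022])  # 2022 schedule information
teams = nfl.import_team_desc()            # team colors/logos/etc.
ids = nfl.import_ids()                    # player id mappings across platforms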

nfl.import_ngs_data(stat_type, years)

Returns dataframe with specified NGS data

stat_type : required, type of data to pull (passing, rushing, receiving)

years : optional, list of years to return data for
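
For example (the call without a year filter is the form confirmed working in the comments below):

import nfl_data_py as nfl

# all available seasons of passing NGS data; filter by season afterwards if needed
ngs = nfl.import_ngs_data('passing')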

nfl.import_depth_charts(years)

Returns dataframe with depth chart data

years : optional, list of years to return data for

nfl.import_injuries(years)

Returns dataframe of injury reports

years : optional, list of years to return data for

nfl.import_qbr(years, level, frequency)

Returns dataframe with QBR history

years : optional, years to return data for

level : optional, competition level to return data for, nfl or college, default nfl

frequency : optional, frequency to return data for, weekly or season, default season
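
A hedged sketch with the documented defaults written out explicitly (the year is illustrative):

import nfl_data_py as nfl

# season-level NFL QBR history
qbr = nfl.import_qbr([2021], level='nfl', frequency='season')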

nfl.import_pfr_passing(years)

Returns dataframe of PFR passing data

years : optional, years to return data for

nfl.import_snap_counts(years)

Returns dataframe with snap count records

years : optional, list of years to return data for

Additional features

nfl.cache_pbp(years, downcast=True, alt_path=None)

Caches play-by-play data locally to speed up load times. If the specified years have already been cached, they will be overwritten, so when using this in-season, re-cache once per week to pick up the most recent data

years : required, list or range of years to cache

downcast : optional, converts float64 columns to float32, reducing memory usage by ~30%; slows the initial load by ~50%

alt_path : optional, alternate path to store the pbp cache - by default the program creates a folder in the user's Local directory
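
A typical cache-then-load workflow (this mirrors the example in the Missing EPA data comment below):

import nfl_data_py as nfl

# cache 2021 pbp data locally; re-run weekly in-season to refresh
nfl.cache_pbp([2021], downcast=True)

# later loads read from the local cache instead of the nflverse repo
data = nfl.import_pbp_data([2021], cache=True)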

nfl.clean_nfl_data(df)

Runs descriptive data (team name, player name, etc.) through various cleaning processes

df : required, dataframe to be cleaned
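
For example, a small sketch cleaning a freshly imported roster dataframe:

import nfl_data_py as nfl

rosters = nfl.import_rosters([2022])
rosters = nfl.clean_nfl_data(rosters)  # standardizes team/player name variants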

Recognition

I'd like to recognize Ben Baldwin, Sebastian Carl, and Lee Sharpe for making this data freely available and easy to access. I'd also like to thank Tan Ho, who has been an invaluable resource as I've worked through this project, and Josh Kazan for the resources and assistance he's provided.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Comments
  • Unable to import data

    I am trying to run:

    import nfl_data_py as nfl
    nfl.import_seasonal_data([2010])

    but I get the following error:

    OverflowError: value too large
    Exception ignored in: 'fastparquet.cencoding.read_bitpacked'

    I have tried uninstalling and reinstalling fastparquet, which did not work, and the same issue has arisen in my other function calls.

    More specifically:

    line 530, in read_col
        part[defi == max_defi] = dic[val]
    IndexError: index 8389024 is out of bounds for axis 0 with size 94518

    opened by samob917 6
  • Caching

    I think the library would benefit from having a caching strategy for downloaded or processed files. One option is to download the files outside of pandas and use an http cache. This tends to be temporary, no more than 7 days. Another option is to give the user the option to read from and save to a cache, which could just involve reading/writing parquet files from the existing data directory, which would be more permanent, or to the system tmp directory, which is more transient. Let me know if either option is of interest.

    opened by sansbacon 6
  • read_parquet engine parameter: 'fastparquet' vs. default 'auto' argument

    Pandas will try pyarrow and fall back to fastparquet if pyarrow is unavailable. Is there a specific reason you are specifying engine='fastparquet'? If not, I think it would be better to leave it as the default 'auto' argument.

    opened by sansbacon 3
  • ID Tables Small Issue?

    Hi, I'm a newb and this is my first issue post, so please delete if this is wrong. I suspect your code is fine and that the source data has some bugs, but I'm not sure who to alert to help them out.

    I believe there are some slight data issues on the player ID table.

    There are probably more than those listed below, but I have to run for now. These are pretty small things - doubtful they will be relevant to anything imminent for anyone. I ran into it on a merge I was doing on gsis_id that didn't like my many-to-one relationship because of it. I'm happy to share and/or try to help fix if manual edits are an option (I stink at coding though).

    gsis_id has four duplications: 00-0020270, 00-0019641, 00-0016098, and 00-0029435. On further research, each of these same cases also has a duplication of the pff_id. There are no other pff_id duplications.

    espn_ids have some duplication: 17257, 2578554, 2574009, 5774, 5730, 12771, 2516049, 13490, 2574010, 14660, 2582138, 16094, 17101. I'm not sure of the source of all of these - some seem to be extremely similar names but different people, and others seem to be typos (e.g. for 5774, one of the pair should simply be 15774).

    yahoo_ids have one dup: 33495; the one from Duke should be 33542.

    opened by robertryanharrison 2
  • .import_ngs_data() HTTPError when filtering for year(s)

    HTTPError: HTTP Error 404: Not Found

    I receive a 404: not found error when I try to import ngs data from a specific year:

    This works just fine and returns the dataframe:

    ngs_data = nfl.import_ngs_data('passing')
    
    ngs_data
    

    When I add a specific year argument, this breaks:

    ngs_data = nfl.import_ngs_data('passing', [2020])
    
    ngs_data
    

    Yes: I realize that nfl.import_ngs_data('passing') is comprehensive of all seasons that are available and I can filter with that.

    I don't know much about these different file formats, but when I look in nflverse-data/releases/nextgen_stats it looks like there are no .parquet files specific to a season (only .qs, .csv.gz, .rds), only the larger ones (sans season) called in the first block. (screenshot omitted)

    So I think something just needs to be changed around here?

    Thanks for putting together this awesome library!! It's been a huge help as I get ready for the season.

    opened by jbf302 2
  • What version of Python is this for? Getting an error

    I have tried to install it for both Python 3.8 and 3.6 and I am getting this error:

    Collecting pandas>1 (from nfl_data_py)
      Could not find a version that satisfies the requirement pandas>1 (from nfl_data_py) (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3)

    opened by epmck 2
  • Reduce memory usage of multi-year pbp dataframe

    If you load 20 years of data into one dataframe, you could start pushing up against memory limits on certain users' computers. You can reduce memory usage about 30% by converting float64 values to float32 when loading the yearly play-by-play data.

    import numpy as np

    # df is any dataframe loaded from nfl_data_py;
    # select every float64 column and downcast it to float32
    cols = df.select_dtypes(include=[np.float64]).columns
    df.loc[:, cols] = df.loc[:, cols].astype(np.float32)
    

    On my computer, this reduced the memory usage of a single year from 129.7 MB to 94.5 MB. I don't think the lost precision is going to matter for anything we are doing with football stats.

    If you are interested, I can submit a pull request that implements this change. You could also make it optional with the default being to downcast but allow the user to override if they want np.float64 as the dtype.

    opened by sansbacon 2
  • HTTP Error 404: Not Found

    There's a chance the links have changed in nflfastR. This link doesn't seem to work:

    data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')

    Using Python v3.10.7 with nfl_data_py v0.2.5

    Call: nfl.import_weekly_data([2021, 2022])

    Error:

    HTTPError                                Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_15024/1768614258.py in <module>
    ----> 1 nfl.import_weekly_data([2021, 2022])

    c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\nfl_data_py\__init__.py in import_weekly_data(years, columns, downcast)
        215
        216 # read weekly data
    --> 217 data = pandas.read_parquet(r'https://github.com/nflverse/nflfastR-data/raw/master/data/player_stats.parquet', engine='auto')
        218 data = data[data['season'].isin(years)]
        219

    c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, **kwargs)
        493 impl = get_engine(engine)
        494
    --> 495 return impl.read(
        496     path,
        497     columns=columns,

    c:\Users\andre\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parquet.py in read(self, path, columns, use_nullable_dtypes, storage_options, **kwargs)
        230     to_pandas_kwargs["split_blocks"] = True  # type: ignore[assignment]
        231
    --> 232     path_or_handle, handles, kwargs["filesystem"] = _get_path_or_handle(
        233         path,
        234         kwargs.pop("filesystem", None),
    ...
    --> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)
        644
        645 class HTTPRedirectHandler(BaseHandler):

    HTTPError: HTTP Error 404: Not Found

    opened by andre03051 1
  • Pandas version incompatibility

    Per discussion in nflverse discord.

    Release 0.2.7 updated Pandas to a version not compatible with Python 3.6. This requirement should be reverted if possible. Alternatively if functionality in the updated Pandas is found to be needed, then the package metadata should be updated to specify it requires Python >= 3.7.

    opened by alecglen 1
  • Pandas .append method deprecation

    nfl.import_pbp_data gives the warning below. The method should be updated to use pandas.concat rather than the deprecated append method.

    FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
      plays = plays.append(raw)

    opened by martydertz 1
  • import_pbp_data function not working

    When trying to run import_pbp_data, I receive the following error message: No such file or directory: [path]

    Code:

    import nfl_data_py as nfl
    data = nfl.import_pbp_data([2021])

    opened by Josephhero 1
  • Missing EPA data

    In week 6 of 2021, the Chicago Bears lost 24-14 to the Green Bay Packers: https://www.nfl.com/games/packers-at-bears-2021-reg-6. When trying to load the EPA data for this game, I get an empty dataframe. Could someone let me know if I am doing something wrong?

    import nfl_data_py as nfl
    
    seasons = [2021]
    cols = ['epa', 'week', 'possession_team']
    
    nfl.cache_pbp(seasons, downcast=True, alt_path=None)
    data = nfl.import_pbp_data(seasons, cols, cache=True)
    
    print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 5)])
    print(data.loc[(data['possession_team'] == 'CHI') & (data['week'] == 6)])
    
    
    opened by pphili 2
  • Changed downcast code to use df[cols] instead of df.loc[:, cols]

    pandas was giving me the following FutureWarning when importing data:

    /nfl_data_py/__init__.py:137: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt
    to set the values inplace instead of always setting a new array. To retain the old behavior, use either
    `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`
        plays.loc[:, cols] = plays.loc[:, cols].astype(numpy.float32)
    

    This change seems to have eliminated the warning.

    opened by wstrausser 0
  • import_win_totals() years not optional

    The GitHub README and PyPI docs say years is optional, but years is required. This is a bit difficult because not all years are in the data, so it's hard to know why 2022 doesn't show.

    Suggest making the argument years=None and adding the following in the method/lambda:

    if years is None:
        years = []

    Let me know how I can contribute. --David

    opened by DavidAllyn68 0
  • Data Dictionary

    Hey guys - thanks for pulling this all together!!! Do you have a data dictionary explaining the columns? Most are self-explanatory, but a few are cryptic (e.g. in weekly_data: dakota, pacr, racr, wopr, and ..._epa).

    opened by DavidAllyn68 2
  • Missing Personnel Data Week 4 NFL

    It appears that the values for formation data (offense/defense), players on the play (offense/defense), men in the box, # of pass rushers, etc. are unavailable for 2022 Week 4 data (which is otherwise available). Is this a known issue, an intentional omission, or something else? Where is this data sourced from?

    Wondering if it will be available in the near future or if that is a result of an availability issue

    opened by jwald3 1