Snakemake NFL Report

A report generator for weekly NFL stats using Snakemake

Dec 2024

Overview

As a fan of both sports and data, I was super excited to start this project. Of course, there was also the side benefit of possibly getting better at Fantasy Football 😂… But that is beside the point. To go back to the start, I had been tasked at work with learning Snakemake to automate our bioinformatics pipelines. While I had been making progress there, I wanted to cement my understanding of the system (and also experiment with report generation).

I figured there was no better way to do this than with an interest of mine: NFL football. To start out, I had to find an appropriate data source. Thankfully, someone on Reddit (where else) had the source I needed.

Setting it all up

And there it was: my starting point. In particular, I wanted to see which players were leading the way in various statistics as of a particular date (the Tuesday after the weekend, so that all of a week's data is included).

Here are the stats I was interested in (defined at top of the Snakefile):

CATEGORIES = [
    'Passing_Yards', 'Rushing_Yards', 'Receiving_Yards', 'Total_Tackles', 'Sacks',
    'Kickoff_Yards', 'Interceptions', 'Passing_Touchdowns', 'Quarterback_Rating',
    'Rushing_Touchdowns', 'Receptions', 'Receiving_Touchdowns', 'Total_Points',
    'Total_Touchdowns', 'Punt_Yards',
]

Creating the Pipeline

I then created a Python script to fetch the data, using these categories (as well as the date) as Snakemake wildcards. Of course, these results also depend on the season, so I added the season as a parameter for the script. I did the same for plotting the results (using Altair, as described by the docs).
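To make the wildcard idea concrete: Snakemake compares a requested file path against a rule's output pattern and fills in the wildcards from it. The plain-Python sketch below mimics that for the data/{DATE}/<category>.csv layout used in this project. The CATEGORY wildcard name, the parse_wildcards helper, and the regular expression are assumptions for illustration only; Snakemake's real matching is more general.

```python
import re

# Hypothetical stand-in for an output pattern such as "data/{DATE}/{CATEGORY}.csv".
# Each wildcard becomes a named group that matches one path segment.
OUTPUT_PATTERN = re.compile(r"data/(?P<DATE>[^/]+)/(?P<CATEGORY>[^/]+)\.csv")


def parse_wildcards(path):
    """Return the wildcard values for a requested path, or None if it does not match."""
    match = OUTPUT_PATTERN.fullmatch(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_wildcards("data/2025-11-25/Sacks.csv"))
```

A path that does not fit the pattern returns None, which corresponds to Snakemake deciding that this rule cannot produce the requested file.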

Eventually, I decided to also call the API to retrieve the scores for a given week and update the records for all teams (why not?!). Fetching the scores required a new weeknum parameter to identify the week, so I added that too.

Report Generation

This was cool and all (it creates a lot of results in the data, scores, team_records, and plots folders), but it lacked refinement. That’s where Datavzrd comes in! I was hoping to display all old reports via the subdirectories, but this didn’t seem possible. Thankfully, the maintainer was quite helpful in figuring this out: https://github.com/datavzrd/datavzrd/issues/1085.

Eventually, I decided to just create each week’s report one at a time. Each report includes all of the plots and HTML files for each section (leaders, scores, records). Note that the “leaders” index file has a table for all categories:

rule view_leaders_with_datavzrd:
    input:
        Interceptions = "data/{DATE}/Interceptions.csv",
        Kickoff_Yards = "data/{DATE}/Kickoff_Yards.csv",
        Passing_Touchdowns = "data/{DATE}/Passing_Touchdowns.csv",
        Passing_Yards = "data/{DATE}/Passing_Yards.csv",
        Punt_Yards = "data/{DATE}/Punt_Yards.csv",
        Quarterback_Rating = "data/{DATE}/Quarterback_Rating.csv",
        Receiving_Touchdowns = "data/{DATE}/Receiving_Touchdowns.csv",
        Receiving_Yards = "data/{DATE}/Receiving_Yards.csv",
        Receptions = "data/{DATE}/Receptions.csv",
        Rushing_Yards = "data/{DATE}/Rushing_Yards.csv",
        Rushing_Touchdowns = "data/{DATE}/Rushing_Touchdowns.csv",
        Sacks = "data/{DATE}/Sacks.csv",
        Total_Points = "data/{DATE}/Total_Points.csv",
        Total_Tackles = "data/{DATE}/Total_Tackles.csv",
        Total_Touchdowns = "data/{DATE}/Total_Touchdowns.csv",
        config="resources/datavzrd/summary.yaml"
    output:
        report(
            directory("results/tables/{DATE}/leaders"),
            htmlindex="index.html",
            caption="workflow/report/stats.rst",
            category="Tables"
        ),
    wrapper:
        "v8.0.0/utils/datavzrd"
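The fifteen named inputs in this rule all follow the same pattern, so they could in principle be generated from CATEGORIES rather than typed out. The sketch below is only a suggestion and not what the repository does; leader_inputs is a hypothetical helper, and the doubled braces keep {DATE} as a literal wildcard placeholder in the resulting strings.

```python
CATEGORIES = [
    "Passing_Yards", "Rushing_Yards", "Receiving_Yards", "Total_Tackles", "Sacks",
    "Kickoff_Yards", "Interceptions", "Passing_Touchdowns", "Quarterback_Rating",
    "Rushing_Touchdowns", "Receptions", "Receiving_Touchdowns", "Total_Points",
    "Total_Touchdowns", "Punt_Yards",
]


def leader_inputs(categories):
    """Map each category name to its CSV pattern, plus the Datavzrd config file."""
    # "{{DATE}}" in an f-string yields the literal text "{DATE}".
    inputs = {name: f"data/{{DATE}}/{name}.csv" for name in categories}
    inputs["config"] = "resources/datavzrd/summary.yaml"
    return inputs


if __name__ == "__main__":
    for key, pattern in leader_inputs(CATEGORIES).items():
        print(f"{key} = {pattern}")
```

In a Snakefile, a dictionary like this should be usable as `input: **leader_inputs(CATEGORIES)`, although that is an assumption worth verifying against the Snakemake version in use. The explicit listing in the rule above has the advantage of being easy to read next to the Datavzrd config.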

I also create a “dead end” branch in the GitHub repo for each week and run the pipeline from it. This way I have a record of each run of the pipeline and can go back into the archive if needed. Running it is now (usually) as simple as:

  1. Run the Snakemake pipeline: snakemake --cores {CORES} --use-conda
  2. Create the report: snakemake --sdm conda --report report.zip
  3. Done!

Inputs

Here is the (very easy!) YAML that connects it all together:

# made to run on Tuesdays after the week is over
year: 2025 # season (2023 for the 2023/24 season, 2024 for the 2024/25 season, etc.)
date: 2025-11-25 # date in YYYY-MM-DD format
weeknum: 12 # week number, used to fetch scores
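Since the config is meant for Tuesday runs, a small sanity check before launching the pipeline could catch a mistyped date. This is a minimal sketch and not part of the workflow described here; check_config is a hypothetical helper, and it assumes the YAML has already been loaded into a dictionary.

```python
from datetime import date


def check_config(config):
    """Return a list of problems found in the run configuration (empty if none)."""
    problems = []
    run_date = config["date"]
    # YAML loaders such as PyYAML may turn an unquoted 2025-11-25 into a date
    # object, so a string is parsed here and a date object is used as-is.
    if isinstance(run_date, str):
        try:
            run_date = date.fromisoformat(run_date)
        except ValueError:
            return [f"date {run_date!r} is not in YYYY-MM-DD format"]
    if run_date.weekday() != 1:  # Monday is 0, so Tuesday is 1
        problems.append(f"date {run_date.isoformat()} is not a Tuesday")
    weeknum = config["weeknum"]
    if not isinstance(weeknum, int) or weeknum < 1:
        problems.append(f"weeknum {weeknum!r} should be a positive integer")
    return problems


if __name__ == "__main__":
    print(check_config({"year": 2025, "date": "2025-11-25", "weeknum": 12}))
```

The example values are the ones from the YAML above; 2025-11-25 is a Tuesday, so that config produces no problems.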

Outputs

You can find a full example report result here! I hope to update this on a weekly basis 🙂


Image: screenshot of a report generated by the Snakemake workflow