FSU Sports Analytics Speaker Series

Slide Link: fsu2025.sportsdataverse.org

Saiem Gilani

Overview

The topic of our conversation will center around how to get access to sports data and why you should know your way around the data generated by the sports in which you are interested.

About me

Saiem Gilani - Lead engineer and founder of the SportsDataverse
@saiemgilani @saiemgilani @saiemgilani.bsky.social

Background

Born and raised a Seminole and a proud Tallahassee native. I am an FSU alumnus in mathematics and went to graduate school at Georgia Tech for analytics. My general domain of work is machine learning and data science with a current focus on sports.

Where are data skills used in sports?

  • Data-Driven Decision Making
    • Performance Optimization: measuring player performance, tracking fitness, etc.
    • Tactical Insights: analyzing opponent tendencies, optimizing game plans, etc.
  • Player Evaluation
    • Scouting: evaluating player performance, identifying potential recruits, etc.
    • Drafting: evaluating draft prospects, making informed decisions on draft picks, etc.

Where are data skills used in sports? (cont’d)

  • Fan Engagement
    • Social Media: analyzing fan sentiment, creating engaging content, etc.
    • Marketing: targeted advertising, personalized experiences, etc.
  • Revenue Generation
    • Sponsorship Insights: data helps teams demonstrate ROI to sponsors by quantifying fan engagement and brand exposure
    • Dynamic Pricing: engineers build ticket pricing models that adjust to demand in order to maximize revenue
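
As a rough illustration of the dynamic pricing bullet, here is a minimal Python sketch. The function name, the 60-day sales window, the linear adjustment rule, and every number are assumptions made up for illustration, not a model any team is known to use.

```python
def dynamic_ticket_price(base_price, seats_sold, capacity, days_to_game,
                         max_markup=0.5, max_discount=0.3):
    """Hypothetical demand-based ticket price.

    Compares the share of seats sold with a naive expected sell-through
    pace and moves the price up or down in proportion to the gap.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    sold_share = seats_sold / capacity
    # Assumed pace: sales ramp linearly over the 60 days before the game.
    expected_share = max(0.0, min(1.0, 1 - days_to_game / 60))
    gap = sold_share - expected_share  # > 0 means demand is ahead of pace
    # Clamp the adjustment so prices stay within the allowed band.
    adjustment = max(-max_discount, min(max_markup, gap))
    return round(base_price * (1 + adjustment), 2)


# Same base price and date, very different demand (made-up numbers).
hot_game = dynamic_ticket_price(40.0, seats_sold=9000, capacity=10000, days_to_game=30)
slow_game = dynamic_ticket_price(40.0, seats_sold=2000, capacity=10000, days_to_game=30)
```

A production model would likely learn the expected pace from historical sales rather than assume a straight line, but the shape is the same: measure demand against a baseline, then adjust within limits.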

Where are data skills used in sports? (cont’d)

  • Injury Prevention
    • Wearable Technology: monitoring player health data, identifying injury risks, etc.
    • Load Management: tracking training loads and recovery times to minimize overtraining and optimize performance
  • Operational Efficiency
    • Automated Workflows: streamlining processes like video analysis, data collection, and reporting, saving time for coaches and analysts
    • Integration of Systems: integrating disparate data sources, creating a cohesive ecosystem for decision-making
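
To make the load management bullet concrete, here is a small Python sketch of one commonly cited heuristic, the acute-to-chronic workload ratio. The talk does not name a specific method, so the choice of this ratio, the 7-day and 28-day windows, and the sample numbers are all assumptions.

```python
from statistics import mean


def acute_chronic_ratio(daily_loads, acute_days=7, chronic_days=28):
    """Ratio of recent average load to longer-term average load.

    daily_loads: training load per day, oldest first (units are up to
    the team, e.g. session minutes times perceived exertion).
    """
    if len(daily_loads) < chronic_days:
        raise ValueError("need at least chronic_days of history")
    acute = mean(daily_loads[-acute_days:])
    chronic = mean(daily_loads[-chronic_days:])
    if chronic == 0:
        return None  # no baseline to compare against
    return acute / chronic


# Three steady weeks followed by a heavier week (made-up numbers).
loads = [300] * 21 + [450] * 7
ratio = acute_chronic_ratio(loads)
steady_ratio = acute_chronic_ratio([300] * 28)
```

A ratio well above 1 means the most recent week was heavier than the athlete's recent baseline. What counts as "too high" is a judgment for the performance staff, so no threshold is hard-coded here.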

How did you get started?

I attended the FSU Sports Analytics Summit 2020!

I wrote up some of my thoughts and observations on incoming coach Mike Norvell’s presentation on a handful of analytics-related topics. This became my first article for Tomahawk Nation, the SBNation blog covering the Seminoles.

Further reading

Started meeting great folks online

Simultaneously, I started working with the {cfbscrapR} package (now archived) to help write analytics-driven articles.

I would not be here without my collaborators from the cfbscrapR team:

I quickly became involved with contributing to my first open-source package on GitHub, eventually becoming a co-author. I then developed the successor to the package, {cfbfastR}.

Went to another conference

  • I used my experience from going to the FSU Sports Analytics Symposium to sharpen my networking and communication skills

  • A couple of weeks later, I went to the 2020 MIT Sloan Sports Analytics Conference

  • Got to meet and see some sports analytics celebrities like Seth Partnow, John Hollinger, and Alok Pattani

  • Competed in the Hackathon, an exceptional opportunity to work with and chat with very talented individuals about shared research ideas and further steps we could take with our projects

Everything came to a screeching halt

Then, I had an idea 💡

I had a thought that I am sure many long-standing members of the sports analytics community have had.

  • what if getting sports data for analysis was easy?

  • what if we worked together to build the data infrastructure for research?

  • how much further would we get?

The SportsDataverse

  • An organization trying to make the sports data and analytics industry more diverse, inclusive, and accessible by providing high-quality resources for end-users and opportunities for practical coding skill development for those who join the effort
    💡 + 💻 + 📈

  • A set of packages for loading and scraping sports data in R, Python, and Node.js with focus placed on play-by-play data
    R + python + nodejs

The strength of the SportsDataverse

  • A community of developers committed to developing and maintaining open-source sports data packages and pipelines as ongoing public utilities
    👥 + 💬 + 👩‍💻 + 📦

  • A set of corresponding data repositories that allow fast loading of the data for users and collectively form one of the largest open-source sports data resources, with over 250 GB of data produced from the packages I contribute to
    🔑 + 👑

  • Our organization helps establish the bench of developers from diverse backgrounds to spearhead projects and make contributions

Our progress so far: R

20+ R packages with over a dozen sports leagues covered.

Pro Leagues

  • NBA
  • WNBA
  • NBA G-League
  • MLB
  • NHL
  • PWHL
  • NWSL
  • A boatload of soccer leagues

Collegiate Leagues

  • College Football
  • Men’s College Basketball
  • Women’s College Basketball
  • College Baseball
  • College Softball
  • College Football Recruiting
  • College Basketball Recruiting

Our progress so far: Python + Node.js

Access to loadable SDV-provided data and functions, plus access to ESPN endpoints, in the sportsdataverse Python module. Additional modules include sportypy, collegebaseball, nwslpy, and recruitR-py.

Access to ESPN endpoints (among other websites) via the sportsdataverse Node.js module for easy web application development.

Why use the SportsDataverse?

The first public conversation on the SportsDataverse projects happened at the Carnegie Mellon Sports Analytics Conference. The paper I wrote for the conference was selected as the winner for the Data and Software contribution, Open Track for their reproducible research competition.

  • It was built for you: enthusiasts and soon-to-be entrants into the field
  • It lets users quickly access seasons' worth of datasets (updated nightly via automated GitHub Actions) through loading function calls, taking the burden of maintaining web scraping scripts off users
  • This in turn makes reproducible research and reporting significantly easier
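
The loading pattern described above can be sketched in a few lines of Python. This is not the API of any SportsDataverse package: the URL template, `load_seasons`, `fetch_text`, and `fake_fetch` are hypothetical names used only to show the idea of reading one pre-built file per season instead of scraping.

```python
import csv
import io
from urllib.request import urlopen

# Hypothetical URL template; real package repositories use their own paths.
URL_TEMPLATE = "https://example.com/data/pbp_{season}.csv"


def fetch_text(url):
    """Download a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def load_seasons(seasons, fetch=fetch_text, url_template=URL_TEMPLATE):
    """Load one pre-built CSV per season and return all rows as dicts."""
    rows = []
    for season in seasons:
        text = fetch(url_template.format(season=season))
        for row in csv.DictReader(io.StringIO(text)):
            row["season"] = str(season)  # tag each row with its season
            rows.append(row)
    return rows


def fake_fetch(url):
    """Offline stand-in for fetch_text so the sketch runs without a network."""
    return "game_id,play_text\n1,Kickoff\n2,Rush for 5 yards\n"


plays = load_seasons([2023, 2024], fetch=fake_fetch)
```

Because the files are rebuilt on a schedule by the maintainers, the same call returns the same shape of data for everyone, which is what makes shared analyses easy to reproduce.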

Further reading

Great… but how does this help me?

Well, there is a fairly direct pipeline from…

  • Contributing to open-source projects
  • Using open-source resources to create your own open-source sports analytics projects and portfolio
  • Being an active member of the open-source sports analytics community
  • Attending conferences and meetups to meet other sports analytics enthusiasts and professionals (I recommend the CMU Sports Analytics Conference)
  • Competing in collaborative hackathons and data challenges

…to getting a job in Sports Analytics

More on this topic

How do I get started?

  • Start with one of the many open-source sports data packages available from the SportsDataverse and their tutorials at their respective documentation sites
  • Use the data to create your own projects and analyses to publish in blogs or on social media
  • Work through introductory sports analytics texts like Analyzing Baseball Data with R, 3rd Ed. which uses the baseballr package to get you started with baseball data analysis

Find opportunities to build your portfolio

  • Contests are a great way to get involved with the community and meet other sports analytics enthusiasts
  • Start sharing your code and projects online so that people may see them
  • You may get opportunities to share projects created during the interview process on GitHub and in your portfolio
  • Listen to thought-provoking conversations and let your curiosity lead you to new projects which address the questions using the data you have access to

Then share on social media!

  • There are countless examples of people getting hired straight off their analysis on Twitter

  • Build a following to increase your network reach

  • Embrace non-traditional media and practice adapting communication of your analyses to different formats and audiences

  • Be prepared to have not nice things said about your work

  • Take feedback constructively and incrementally improve your projects

Continue your learning journey

AthlyticZ offers courses in Python, R, Shiny Applications, Stan programming and beyond!

🛠 Platform & Infrastructure

  • Fully functional Learning Management System built with Node.js and React + Google Cloud Platform for a seamless coding experience.

🤝 Partnerships & Team

  • Curricula designed by industry and academic veterans from the NBA, MLB, and ESPN, notable authors of recognized R/Python textbooks, and members of Columbia University and UPenn.

AthlyticZ Academy

State-of-the-Art Offerings

  • Customized curricula to meet the market needs for a blend of data science with sports analytics applications

Per-Student IDEs/CPUs

  • All coding modules will appear with preinstalled packages and data through a familiar IDE

Custom CMS

  • Instead of an off-the-shelf LMS/CMS, we invested in developing a customized environment to ensure scalability, bespoke content management, code ownership, and more.

AthlyticZ Preview

Sign up for a preview of our Bayesian Modeling course using real NBA and soccer data

FREE TRIAL: https://athlyticz.com/stan-i-preview

Some Inspirations and Heroes

My beautiful and brilliant wife, Madiha, and my family

My collaborators from the cfbfastR team:

  • Akshay Easwaran
  • Jared Lee
  • Eric Hess

The creator of CollegeFootballData.com:

  • Bill Radjewski

The nflverse team:

  • Sebastian Carl
  • Ben Baldwin
  • Tan Ho

Thank you

  • FSU Sports Analytics Club for creating a wonderful speaker series
  • The seriously awesome community of developers that helps build and maintain resources
  • All y’all for listening in

Learn more

Questions?