Slide Link: fsu2025.sportsdataverse.org
Saiem Gilani
This conversation centers on how to get access to sports data and why you should know your way around the data generated by the sports that interest you.
Saiem Gilani - Lead engineer and founder of the SportsDataverse
Born and raised a Seminole and a proud Tallahassee native. I am an FSU alumnus in mathematics and went to graduate school at Georgia Tech for analytics. My general domain of work is machine learning and data science with a current focus on sports.
I attended the FSU Sports Analytics Summit 2020!
I wrote up some of my thoughts and observations on incoming coach Mike Norvell’s presentation on a handful of analytics-related topics. This became my first article for Tomahawk Nation, the SBNation blog covering the Seminoles.
Further reading
Simultaneously, I started working with the {cfbscrapR} package (now archived) to help write analytics-driven articles.
I would not be here without my collaborators from the cfbscrapR team.
I quickly became involved with contributing to my first open-source package on GitHub, eventually becoming a co-author. I then developed the successor to the package, {cfbfastR}.
I used my experience from going to the FSU Sports Analytics Symposium to sharpen my networking and communication skills
A couple weeks later, I went to the 2020 MIT Sloan Sports Analytics Conference
Got to meet and see some sports analytics celebrities like Seth Partnow, John Hollinger, and Alok Pattani
Competed in the Hackathon, an exceptional opportunity to work with and chat with very talented individuals about shared research ideas and further steps we could take with our projects
I had a thought I am sure many of the long-standing members of the sports analytics community have had.
what if getting sports data for analysis was easy?
what if we worked together to build the data infrastructure for research?
how much further would we get?
An organization trying to make the sports data and analytics industry more diverse, inclusive, and accessible by providing high-quality resources for end-users and opportunities for practical code skill development for those who join the effort
💡 + 💻 + 📈
A set of packages for loading and scraping sports data in R, Python, and Node.js with focus placed on play-by-play data
A community of developers committed to developing and maintaining open-source sports data packages and pipelines as ongoing public utilities
👥 + 💬 + 👩💻 + 📦
A set of corresponding data repositories that allow fast loading of the data for users. Together they form one of the largest open-source sports data resources, with over 250 GB of data produced from the packages I contribute to
🔑 + 👑
Our organization helps establish the bench of developers from diverse backgrounds to spearhead projects and make contributions
20+ R packages with over a dozen sports leagues covered.
Pro Leagues
Collegiate Leagues
Access to loadable SDV-provided data and functions in the sportsdataverse Python module and access to ESPN endpoints. Additional modules include: sportypy, collegebaseball, nwslpy, and recruitR-py
Access to ESPN endpoints (among other websites) via the sportsdataverse Node.js module for easy web application development.
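Play-by-play data, the focus of these packages, has one row per play. As a minimal sketch of the kind of analysis that becomes easy once such data is loaded, the Python below averages yards per play for each offense. The rows, team names, and field names (`offense`, `down`, `yards_gained`) are made up for illustration and are not the schema of any SportsDataverse package.

```python
from collections import defaultdict

# Hypothetical play-by-play rows. Field names and values are illustrative
# only and are not the schema returned by any SportsDataverse package.
plays = [
    {"offense": "Team A", "down": 1, "yards_gained": 4},
    {"offense": "Team A", "down": 2, "yards_gained": 7},
    {"offense": "Team B", "down": 1, "yards_gained": -2},
    {"offense": "Team B", "down": 2, "yards_gained": 12},
    {"offense": "Team A", "down": 1, "yards_gained": 0},
]


def yards_per_play(rows):
    """Return the average yards gained per play for each offense."""
    totals = defaultdict(lambda: [0, 0])  # offense -> [total yards, play count]
    for row in rows:
        entry = totals[row["offense"]]
        entry[0] += row["yards_gained"]
        entry[1] += 1
    return {team: yards / count for team, (yards, count) in totals.items()}


summary = yards_per_play(plays)
for team, avg in sorted(summary.items()):
    print(f"{team}: {avg:.2f} yards per play")
```

The same grouping idea carries over to real play-by-play tables, where a data frame library would usually replace the hand-written loop.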
The first public conversation on the SportsDataverse projects happened at the Carnegie Mellon Sports Analytics Conference. The paper I wrote for the conference was selected as the winner for the Data and Software contribution, Open Track for their reproducible research competition.
Further reading
baseballr package to get you started with baseball data analysis
There are countless examples of people getting hired straight off their analysis on Twitter
Build a following to increase your network reach
Embrace non-traditional media and practice adapting communication of your analyses to different formats and audiences
Be prepared to have not nice things said about your work
Take feedback constructively and incrementally improve your projects
AthlyticZ offers courses in Python, R, Shiny Applications, Stan programming and beyond!
Sign up for a preview of our Bayesian Modeling course using real NBA and soccer data
FREE TRIAL: https://athlyticz.com/stan-i-preview
cfbfastR team
nflverse team
Game on Paper - for a look at the sportsdataverse Python package serving live CFB expected points and win probability metrics
AthlyticZ.com - for state-of-the-art data science training from sports industry veterans
guideR - compilation of sports analytics resources
Slides link: fsu2025.sportsdataverse.org | Source code | Author: Saiem Gilani