Math and Data on the Diamond

By Tim Chartier @timchartier, Davidson College

Baseball is full of history and traditions. President William Howard Taft pioneered the presidential first pitch on Opening Day in 1910. The seventh-inning stretch fills stadiums around the country with the sound of “Take Me Out to the Ball Game.” Statistics have been part of baseball since the 1800s, when journalist Henry Chadwick printed tallies of basic statistics such as runs, hits, errors, strikeouts and batting averages. The use of mathematics in baseball has grown and now plays an important role at both the team and league levels. How can you get in the game? To find out, we turned to Scott Shapiro, who works for Major League Baseball (MLB).

Chartier: Tell us about your job at Major League Baseball. What do you do and what role does mathematics play in your job?

Scott Shapiro: I work on the Data Quality team at MLB. We monitor and evaluate the accuracy of all Statcast data captured by the video tracking systems for use by MLB clubs, broadcasters and other vendors. From a mathematical standpoint, we track outliers and provide best practices for translating raw data into metrics.

Chartier: Baseball fans have long enjoyed the game's statistics such as a pitcher's Earned Run Average or a batter's Runs Batted In. Are new analytics still being developed? How does one choose what's most important among the many statistics of baseball?

Scott Shapiro: Yes! The baseball industry has gone through many phases of data analysis. Baseball has always been a sport ripe for statistical analysis, in part because it consists of many binary outcomes and individual contributions within what is nominally a team game. Bill James popularized an early wave in the 1980s, and Moneyball built on this in the 2000s. Now, video tracking data lets us evaluate a host of new behaviors, from a hitter’s “Exit Velocity” and “Launch Angle” off the bat to a pitcher’s “Spin Rate” or “Horizontal/Vertical Break” on every pitch.

Chartier: You interned with Excel Sports Management during college, then worked there in analytics after graduation. What's your advice on getting internships? What did you do in your job, and what kind of analytics did you study?

Scott Shapiro: My best advice would be to network, specifically in the online baseball analytics community. Baseball analytics has historically had a particularly strong online presence and open-source culture, so there are great resources and avenues to get involved if you’re interested. Organizations like SABR and FanGraphs, and tools like “baseballr,” are great starting points for building connections in the community and can lead to internship opportunities. At Excel, specifically, my focus was on contract negotiations, and I relied heavily on the open-source work done by this community to bridge the resource gap between my agency and MLB clubs.

Chartier: Data plays an ever-increasing role in our world. What advice do you have for students studying mathematics, for those interested in working in data, and for those who may not plan to work in data?

Scott Shapiro: Data is obviously a very hot topic and a buzzy term, but that popularity has actually diversified the types of roles centered around data. Whether you plan to work directly with data or not, you will benefit from being able to speak the language of data and understand its different uses and limitations. Those studying mathematics should take advantage of the broad scope of publicly available baseball data and incorporate it into their coursework. While students should not feel the need to find a specialty early on, it helps to understand which part of the data funnel you enjoy most, since data roles range from data engineering and infrastructure to data visualization and analysis.

Chartier: For a student with a dream job, such as working as a data analyst for Major League Baseball, what advice do you have as they prepare in college or begin looking for that first job?

Scott Shapiro: I will echo my point above about networking and taking advantage of the online community. Additionally, I would stress patience, as there are many non-linear paths that might lead you toward a sports-related dream job, and breaking into the industry can be very difficult early on. Your first few jobs will be a great opportunity to add specific skills to a well-rounded college background that includes the ability to break down complex problems, communication skills, and technical and mathematical fundamentals. Even for students who are technically strong on the coding side, your early career will be a chance to learn business context and understand how your skills fit into a larger organization. So I guess all this is to say: focus on discovering the subjects and skills you enjoy, as the context and application will come later.

Chartier: Any final words for professors and students of mathematics?

Scott Shapiro: I’ll leave you with two things. First, always remember to evaluate your inputs and biases as rigorously as your results. We have a lot more control over the process than over the results, and knowing how to interpret this and ask the right questions will be valuable in the long term. Second, get inspired in unusual places. You might not think that reading fiction, watching a movie, or studying history will improve your data skills, but creativity and communication are underappreciated and important skills in the data world.