How Winning Percentage Correlates in Baseball
The Motive and Dataset
- Analyze each team's winning percentage in every season from 2010 to 2022
- Analyze MLB team performance over the 2010 to 2022 seasons in terms of On-Base Percentage (OBP), Slugging Percentage (SLG), Batting Average (AVG), and Strikeout-to-Walk Ratio (K/BB)
- Test whether winning percentage (Win Pct.) correlates with OBP, SLG, AVG, or K/BB
- Data: Retrosheet game logs for the 2010 to 2022 MLB seasons
Winning Shares
- Home and away wins are counted for each team and summed
- Home and away games are counted for each team and summed; winning percentage is total wins divided by total games
- The Los Angeles Dodgers have the highest winning percentage since 2010, at 58.9%
- The Miami Marlins have the lowest winning percentage since 2010, at 43%
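
The counting steps above can be sketched as follows. This is a minimal illustration, assuming each game log row has already been reduced to a tuple of (home team, away team, home score, away score). The tuple layout, the team codes, and the function name `winning_percentages` are hypothetical, not Retrosheet column names or the original analysis code.

```python
from collections import defaultdict

def winning_percentages(games):
    """Return {team: wins / games played} from (home, away, home_score, away_score) rows."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_score, away_score in games:
        # Every game counts once for the home team and once for the away team.
        played[home] += 1
        played[away] += 1
        if home_score > away_score:
            wins[home] += 1
        elif away_score > home_score:
            wins[away] += 1
        # A tie adds a game played but no win for either side.
    return {team: wins[team] / played[team] for team in played}

# Toy rows for illustration only, not real results.
sample_games = [
    ("LAN", "MIA", 5, 2),
    ("MIA", "LAN", 1, 3),
    ("LAN", "MIA", 2, 4),
]
pct = winning_percentages(sample_games)
```

Grouping the same counts by season instead of over the whole period would give the per-season win shares.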

Calculated Metrics
- The following metrics are calculated and the table below is obtained:
- On-Base Percentage (OBP): (Hits + Walks + Hit by Pitch) / (At Bats + Walks + Hit by Pitch + Sacrifice Flies)
- Slugging Percentage (SLG): (Singles + 2 x Doubles + 3 x Triples + 4 x Home Runs) / At Bats
- Batting Average (AVG): Hits / At Bats
- Strikeout-to-Walk Ratio (K/BB): Strikeouts / Walks
- Each metric is calculated twice: from the stats the team produced (for) and from the stats it allowed (against)
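
The four formulas translate directly into code. The sketch below assumes season totals are already aggregated per team; the function names and the sample totals are made up for illustration.

```python
def on_base_percentage(hits, walks, hbp, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = (1B + 2 x 2B + 3 x 3B + 4 x HR) / AB."""
    return (singles + 2 * doubles + 3 * triples + 4 * home_runs) / at_bats

def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats

def strikeout_to_walk_ratio(strikeouts, walks):
    """K/BB = SO / BB."""
    return strikeouts / walks

# Made-up season totals for one team (not taken from the dataset).
at_bats, hits = 5500, 1400
doubles, triples, home_runs = 280, 25, 200
walks, hbp, sac_flies, strikeouts = 520, 60, 40, 1300
singles = hits - doubles - triples - home_runs  # singles are the remaining hits

obp = on_base_percentage(hits, walks, hbp, at_bats, sac_flies)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats)
avg = batting_average(hits, at_bats)
k_bb = strikeout_to_walk_ratio(strikeouts, walks)
```

Running the same functions on the totals a team allowed gives the "against" version of each metric.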

- Each season's top win share is shown on the left
- The COVID-19 season (2020) can be considered an outlier, which makes sense: teams played only about 60 games, roughly one third of a regular season
- The Los Angeles Dodgers have the highest win share in 2022, at 68%
- In every season since 2010, the best team has a win share of at least 60%
- The highest win share of each year tends to be higher in recent seasons than in the 2010s

Regressions and Correlation
- An ordinary least squares (OLS) regression model is fitted
- For every metric other than On-Base Percentage and Slugging Percentage, there is insufficient evidence to conclude that its correlation with winning percentage is non-zero
- Only OBP and SLG are kept, and a new regression model is built, as shown on the right
- On-Base Percentage correlates more strongly with winning percentage than Slugging Percentage does
- The scored (for) and allowed (against) measures have very similar coefficients with opposite signs, which makes sense: a metric that helps a team when it bats hurts it when its opponents do
- In addition to the regression results, the correlation matrix below is obtained
- Besides the high correlation of OBP and SLG with winning percentage, OBP and SLG are also highly correlated with each other, as expected
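
As a rough illustration of the reduced model, the sketch below fits an OLS regression of winning percentage on OBP and SLG (for and against) and computes a Pearson correlation, using only the standard library. The data are synthetic: the winning percentages are generated from made-up coefficients (+2/-2 for OBP, +1/-1 for SLG, intercept 0.5) so that the fit can be checked, and none of the numbers come from the analysis. In practice a statistics package would normally be used, since it also reports the p-values behind the "insufficient evidence" judgment.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols_fit(features, target):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Synthetic team-seasons: (OBP for, OBP against, SLG for, SLG against).
team_seasons = [
    (0.330, 0.310, 0.430, 0.400),
    (0.315, 0.325, 0.405, 0.420),
    (0.340, 0.300, 0.450, 0.380),
    (0.305, 0.330, 0.390, 0.435),
    (0.322, 0.318, 0.415, 0.410),
    (0.335, 0.320, 0.425, 0.395),
    (0.310, 0.305, 0.440, 0.425),
    (0.328, 0.335, 0.400, 0.390),
]
# Synthetic target built from assumed coefficients, purely so the fit is checkable.
win_pct = [0.5 + 2.0 * (of - oa) + 1.0 * (sf - sa) for of, oa, sf, sa in team_seasons]

coefficients = ols_fit(team_seasons, win_pct)
obp_for = [row[0] for row in team_seasons]
obp_win_correlation = pearson(obp_for, win_pct)
```

Because the target is an exact linear function of the inputs, the fit recovers the assumed coefficients, including the mirrored for/against signs. Real data would carry noise, and the coefficients would only be approximately mirrored.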


Batting Analysis in Home Runs
The Motive and Dataset
- To analyze players' batting performance in terms of home runs
- Seasonal analysis
- Overall analysis across the 2017 to 2022 seasons
- Comparison of the launch speed and launch angle values of two of the best batters
- Comparison of the average exit velocity of the balls the best batters hit for home runs
- Comparison of the pitch locations on which the best batters hit home runs
- Data: Statcast data for the 2017 to 2022 seasons

Home Run Performances
- Aaron Judge has the most home runs over the last six seasons, 216 in total
- Aaron Judge broke the American League single-season home run record with 62 home runs in 2022
- Mike Trout has two seasons, 2019 and 2022, among the ten highest home-runs-per-game season performances
- As with total home runs, Aaron Judge's 2022 season is the highest home-runs-per-game performance of the last six years
- Mike Trout and Aaron Judge are chosen for comparison
Launch Speed vs Launch Angle and Exit Velocity
- Most of the hits have a launch speed between 100 and 120 mph
- As with launch speed, the two players have very similar launch angle distributions
- The median exit velocity of Judge's home runs is slightly higher than that of Trout's
- Comparing seasons, the velocity of the pitches that Judge hit for home runs does not differ from one season to another

Pitch Location
- The strike zone is divided into a 3x3 matrix, and the pitch locations are plotted for both Judge and Trout
- As seen in the figures below, neither player hits many home runs on pitches in the upper third of the zone
- Both players favor pitches in the horizontal and vertical middle of the plate
- Although the locations differ slightly, as in the earlier comparisons of other metrics, both players have similar pitch location matrices
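
The 3x3 split of the strike zone can be sketched as below. It assumes each pitch location is given as (x, z) in feet, with x the horizontal offset from the center of the plate and z the height above the ground. The zone boundaries used as defaults are illustrative assumptions, not values taken from the analysis, and the function names are hypothetical.

```python
def zone_cell(x, z, x_min=-0.83, x_max=0.83, z_bot=1.5, z_top=3.5):
    """Return (row, col) in a 3x3 grid, or None if the pitch is outside the zone.

    Row 0 is the upper third of the zone; column 0 is the leftmost third.
    The default boundaries are assumed values for illustration.
    """
    if not (x_min <= x <= x_max and z_bot <= z <= z_top):
        return None
    # Scale the position to [0, 3) and clamp the far edge into the last cell.
    col = min(int((x - x_min) / (x_max - x_min) * 3), 2)
    row = min(int((z_top - z) / (z_top - z_bot) * 3), 2)
    return row, col

def zone_counts(pitches):
    """Count pitches per cell of the 3x3 grid, ignoring pitches outside the zone."""
    grid = [[0] * 3 for _ in range(3)]
    for x, z in pitches:
        cell = zone_cell(x, z)
        if cell is not None:
            row, col = cell
            grid[row][col] += 1
    return grid

# Made-up home run pitch locations (x, z) in feet; the last one is outside the zone.
sample_pitches = [(0.0, 2.5), (0.1, 2.4), (-0.7, 3.4), (0.6, 1.6), (1.5, 2.5)]
grid = zone_counts(sample_pitches)
```

Building one such grid per batter gives two matrices that can be compared cell by cell.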
Run Expectancy in MLB
The Motive and Dataset
- To examine the weights given to events in Slugging Percentage
- To compare the run expectancy (RE) of singles, doubles, triples, and home runs with their weights in the SLG calculation
- To analyze the correlation between run expectancy and winning percentage at the team level
- All events from 2014 to 2018 are analyzed at the player level
Event Occurrences
- Of all events, singles have a share of nearly 15%, while doubles have a share of 4.4%
- The shares of event occurrences do not differ much across seasons
Run Expectancy and Winning Correlation
- Run expectancy is the average number of runs expected in the remainder of an inning, given the current number of outs and the placement of baserunners.
- There are 24 start states: zero, one, or two outs combined with 8 baserunner arrangements. The run expectancy of an event is calculated as follows:
- RE = RE End State - RE Start State + Runs Scored in the Event
- RE Start State and RE End State are calculated by averaging the runs observed from each state. While the Start State has 24 possible values, the End State has 8 additional combinations: the 8 baserunner arrangements with 3 outs.
- It might be wiser to use the weights seen in Table 3, rather than one, two, three, and four, for Single, Double, Triple, and Home Run
- There is a high correlation between run expectancy and teams' winning percentage in the 2018 season
- The correlation between run expectancy and winning percentage is similar in other seasons
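
The RE formula above can be sketched as a table lookup. The expectancy values below are placeholders chosen for illustration, not the values computed in the analysis, and only a few of the 24 start states are filled in. The sketch also assumes that any three-out end state is worth zero, since the half-inning is over. The names `RUN_EXPECTANCY`, `state_value`, and `run_value` are hypothetical.

```python
EMPTY = (False, False, False)   # runners on (first, second, third)
FIRST = (True, False, False)
SECOND = (False, True, False)

# Placeholder table: (outs, bases) -> expected runs for the rest of the inning.
RUN_EXPECTANCY = {
    (0, EMPTY): 0.50,
    (0, FIRST): 0.90,
    (0, SECOND): 1.10,
    (1, EMPTY): 0.25,
    (2, FIRST): 0.20,
}

def state_value(outs, bases, table=RUN_EXPECTANCY):
    """Look up the run expectancy of a state; three-out states are assumed to be 0."""
    if outs >= 3:
        return 0.0
    return table[(outs, bases)]

def run_value(start, end, runs_scored, table=RUN_EXPECTANCY):
    """RE = RE End State - RE Start State + Runs Scored in the Event."""
    return state_value(*end, table) - state_value(*start, table) + runs_scored

# Events starting from bases empty with no outs, using the placeholder table.
single_value = run_value((0, EMPTY), (0, FIRST), 0)
double_value = run_value((0, EMPTY), (0, SECOND), 0)
home_run_value = run_value((0, EMPTY), (0, EMPTY), 1)
strikeout_value = run_value((0, EMPTY), (1, EMPTY), 0)
inning_ending_out = run_value((2, FIRST), (3, EMPTY), 0)
```

Averaging `run_value` over every single, double, triple, and home run in the play-by-play data would give per-event weights of the kind suggested as a replacement for one, two, three, and four.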


