Wednesday, March 25, 2015

Correlation between Winning Percentage and OBP

This week has been interesting, with Brandon Phillips sharing some not-so-smart thoughts about on-base percentage. Bob Nightengale of USA Today asked Phillips about it, and Phillips responded with this:

"I don't do that MLB Network on-base percentage (stuff). That's the new thing now. I feel like all of these stats and all of these geeks upstairs, they're messing up baseball, they're just changing the game. It's all about on-base percentage. If you don't get on base, then you suck. That's basically what they're saying. People don't care about RBI or scoring runs, it's all about getting on base."

Coming from a guy like Phillips, this isn't very surprising, but after being around a guy like Joey Votto, you'd think he'd be a little more educated. Now, I don't expect players to be into advanced statistics, but they should at least understand the value of getting on base.

Well today Jon Morosi over at Fox Sports tweeted this:

Jon's tweet got me thinking. I would assume there is some sort of connection between on base percentage and winning percentage, but just how strong is that correlation? So I decided to crunch the numbers. 

I took the winning percentage of all 30 teams over the past five seasons (2010-2014), which gave me a decent sample of 150 team-seasons. I ran a simple correlation between winning percentage and two different statistics: OBP and wOBA (weighted on-base average, which you can read about HERE). Here are the results, in both table form and as scatter plots:

             Winning %    OBP      wOBA
Winning %      1.000
OBP             .535     1.000
wOBA            .547      .908     1.000
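For anyone who wants to run the same kind of calculation, here is a minimal sketch in Python of a pairwise correlation table. The function names (pearson_r, correlation_matrix) and the column names are my own for illustration, and the numbers in sample are made-up placeholders, not the actual 2010-2014 team data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict of r values from {name: [values, ...]}."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up placeholder rows (one entry per team-season), NOT the real data.
sample = {
    "win_pct": [0.586, 0.432, 0.500, 0.556, 0.395],
    "obp": [0.330, 0.305, 0.318, 0.325, 0.300],
}

matrix = correlation_matrix(sample)
print(round(matrix["win_pct"]["obp"], 3))
```

With the real data, each list would hold 150 values, one per team per season, and a wOBA column would be added the same way.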

Winning Percentage vs. OBP


Winning Percentage vs. wOBA 


As you can see from the scatter plots, there is a pretty strong correlation between winning percentage and both OBP and wOBA, with wOBA being slightly stronger.

The r value, which describes how strong the correlation is, is .535 for OBP and .547 for wOBA.

In general, here is a guide to how strong a correlation is at a given r value (from http://faculty.quinnipiac.edu/libarts/polsci/Statistics.html):

If r =
+.70 or higher Very strong positive relationship
+.40 to +.69 Strong positive relationship
+.30 to +.39 Moderate positive relationship
+.20 to +.29 Weak positive relationship
+.01 to +.19 No or negligible relationship
-.01 to -.19 No or negligible relationship
-.20 to -.29 Weak negative relationship
-.30 to -.39 Moderate negative relationship
-.40 to -.69 Strong negative relationship
-.70 or higher Very strong negative relationship
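The guide above can be turned into a small lookup. This is just a sketch: describe_r is a name I made up, and since the guide leaves small gaps between bands (for example between .19 and .20), I am assuming each band starts at its lower bound and runs up to the next one.

```python
def describe_r(r):
    """Label a correlation coefficient using the bands in the guide above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)
    # Assumption: anything below .20 in magnitude counts as negligible.
    if size < 0.20:
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} relationship"


print(describe_r(0.535))  # OBP vs. winning %:  Strong positive relationship
print(describe_r(0.547))  # wOBA vs. winning %: Strong positive relationship
print(describe_r(0.908))  # OBP vs. wOBA:       Very strong positive relationship
```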

This shows that the correlation is indeed a strong one, as I had originally suspected. Take note, Brandon Phillips: on-base percentage is important.