Hello! I'm a pretty big Habs fan, and although I've maybe dropped a comment or two here and there, this is my first real post on EOTP. I first started reading the site regularly towards the beginning of this season, and I'm excited to join because I feel that I may be able to contribute every now and again, as I am a huge fancystats nerd. I have about a half dozen projects on my computer that I work on when I get the chance, and this one is pretty quick and easy, so I figured it'll be the first of hopefully many that I post.
So, this project was triggered a few weeks ago when I had a conversation with a friend of mine as we were both procrastinating during the dreaded finals week. Both being hockey fans, our conversation naturally shifted towards the first round of the playoffs, and he, being a Chicago fan, said "Man, our team is really good, but I just think Crawford is really overrated," to which my response was "Well, it doesn't really matter that Crawford is overrated because Chicago doesn't need a world-beater like Montreal does, just a solid, consistent goalie."
Save for a few melodramatic Hawks fans, I really doubt that any hockey fan would disagree with my assertion in my response to my distressed Hawks-fan friend (to be fair, it was the day after they fell to St. Louis 2-0). There is a perception, even if it's not spoken of very much, that there is an inverse relationship between puck possession and how good your goalie has to be. As another example, look at Canada's goalie situation during the Olympics. While we obviously wanted the best goalie possible to start, general consensus was that we wouldn't need a goaltender to steal us games, just one that wasn't going to lose us games. The same can be said for Chicago this year, and the reason is because both Team Canada and the Chicago Blackhawks are damn good teams, and damn good teams possess the puck.
So, is this true? Do teams that possess the puck rely less on their goaltenders and, to be more general, luck? In order to assess this, I looked at the correlation over the past 3 seasons between PDO and success. In this context, success is going to be points per game, because of the 48-game schedule last season. There will be 3 groups of teams, those that were top-10, middle-10, and bottom-10 in terms of possession (Fenwick) for each season, meaning there is a total sample size of 30 teams for each group.
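For anyone who wants to reproduce the grouping, here is a minimal sketch of one way to do it. The function names, the tuple layout, and the placeholder teams are assumptions made up for illustration; they are not the actual spreadsheet behind this post.

```python
def points_per_game(points, games_played):
    """Success metric: standings points divided by games played."""
    return points / games_played


def split_by_possession(season):
    """Split one season's 30 teams into top-10, middle-10 and bottom-10 by Fenwick.

    `season` is a list of (team, fenwick_pct, points, games_played) tuples.
    Returns a dict mapping group name to a list of (team, points_per_game).
    """
    if len(season) != 30:
        raise ValueError("expected 30 teams in a season")
    # Best possession first.
    ranked = sorted(season, key=lambda row: row[1], reverse=True)
    groups = {
        "top-10": ranked[:10],
        "middle-10": ranked[10:20],
        "bottom-10": ranked[20:],
    }
    return {
        name: [(team, points_per_game(pts, gp)) for team, _, pts, gp in rows]
        for name, rows in groups.items()
    }


# Hypothetical season: 30 placeholder teams with made-up Fenwick and points.
fake_season = [(f"Team{i:02d}", 45.0 + 0.3 * i, 60 + i, 82) for i in range(30)]
groups = split_by_possession(fake_season)
```

Running this once per season and pooling the three seasons gives 30 team-seasons per group, which matches the sample size described above.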
Correlations will be measured using the R^2 value that comes with a linear regression analysis. The R^2 value will always be between 0 and 1, and the higher it is, the better the linear regression is at explaining the data. I used a Fisher r-to-z transformation to get a z-value, and then finally a p-value to determine if there is a significant difference between correlations. The p-value also takes on a value between 0 and 1, and it is the probability of seeing a difference at least as big as the one observed if the two correlations were really the same and only random, natural variation in the data were at work. If we get high R^2 values and low p-values, then we'll know we're onto something. Using these as the final analysis, we will be able to say whether or not the importance of PDO varies with how good a team is at puck possession.
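Here is a rough sketch of the Fisher r-to-z comparison described above, using only Python's standard library. A regression reports R^2, so r has to be recovered first as the square root of R^2 with the sign of the slope. The function names and the example correlations are assumptions for illustration. The textbook formula also assumes the two samples are independent, which is only loosely true when a 30-team group is compared against the full league that contains it.

```python
import math


def fisher_z(r1, n1, r2, n2):
    """z-statistic for the difference between two correlation coefficients.

    Each r is converted with the Fisher transformation atanh(r); the
    difference is divided by its standard error sqrt(1/(n1-3) + 1/(n2-3)).
    This formula assumes the two samples are independent.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error


def p_one_tailed(z):
    """Upper-tail probability of a standard normal beyond |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


def p_two_tailed(z):
    """Probability of a standard normal at least |z| away from 0, either side."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical correlations (not figures from this post): r = 0.70 from
# 30 team-seasons versus r = 0.45 from 90 team-seasons.
z = fisher_z(0.70, 30, 0.45, 90)
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed(z):.4f}, "
      f"two-tailed p = {p_two_tailed(z):.4f}")
```

The one-tailed and two-tailed p-values differ by a factor of two for the same z, so it is worth being explicit about which one is being quoted.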
All data comes from ExtraSkater.com, and is taken 5 on 5 while the score is close.
Part I: PDO Analysis
In order to analyze the correlations between PDO and success for each group, we need something to compare them to. Rather than compare the 3 groups directly to each other, I feel that it is best to compare them to the overall, full-league data. So, each team for each season is part of the "control" group, with no sorting, meaning there are 90 data points. On the x-axis is a team's PDO, and the y-axis is their points per game. The best fit line is shown, and the R^2 is shown as well.
Now, here's the correlation between PDO and success for the group of top-10 possession teams:
The first thing to notice is that the R^2 values aren't really all that different. The Fisher r-to-z transformation yields a z-value of .05, and the p-value associated with that is .4801, meaning there's a 48% chance you get this result assuming there isn't a significant difference between the two, and that the observed difference is just due to random chance. This means that I would say that the correlation between PDO and success for top-10 teams is not significantly different from the overall correlation. While you could argue there is still the possibility of a significant difference, the difference in correlations would only be .01, so approximately nobody would care.
Now for the middle-10 teams. This is where things get a little bit more interesting:
Simply put, I did not see this coming. I was so surprised, in fact, that I double and then triple checked the data to make sure I didn't just screw up. Turns out, I didn't, and the difference is, of course, significant. The data yields a z-value of 2.42, and a p-value of .0139. If you wanted to argue that there isn't a significant difference between middle-10 teams and all teams, you'd be arguing in favor of roughly 1.4% odds, so, uh, good luck with that.
Finally, the bottom-10 teams, where things get a little interesting again:
The difference between correlations here leads to a z-value of 1.26, and a p-value of .2077. This is a tougher set of data to read. While it's certainly tempting to argue that there is a significant difference, as there's only about a 21% chance of seeing a difference this big from random variation alone, it's certainly possible to argue the contrary, as was the case with the top-10 possession teams. The difference between correlations is much greater than was the case with top-10 possession teams, though, so I'd label this one as inconclusive.
So what does it mean?
Well, the first thing to notice is that the middle-10 teams tend to align with PDO much more than their counterparts, who either suck or are really good. This makes sense when you think about it. Most of the middle-10 possession teams are the ones that are fighting for playoff spots, and the difference that a hot goalie or a hot line can make down the stretch is huge, and is often the difference between making the playoffs and not. As for the top-10 and bottom-10 teams, we really can't say too much. There is some evidence to suggest bottom-10 teams align more with PDO as well, but it's not nearly strong enough to be irrefutably true.
Part II: There's got to be more to it than this
There sure is. The problem is that there are two ways for importance of PDO to be reflected. While the correlation of a linear regression is sure to change, the actual slope of the line can change as well. A steep slope would indicate a large difference in success based on a small difference in PDO, whereas a shallow slope would indicate that PDO could change a lot, but success wouldn't be affected as much.
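To make the slope-versus-correlation distinction concrete, here is a small sketch with made-up numbers. The `linear_fit` helper and both data sets are assumptions for illustration, not data from this project. Stretching every team's points per game twice as far from 1.0 doubles the slope of the best-fit line while leaving R^2 unchanged, which is why the two have to be checked separately.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up PDO values and points per game, for illustration only.
pdo = [98.0, 99.0, 100.0, 101.0, 102.0]
ppg = [1.00, 1.10, 1.05, 1.30, 1.25]

# Same scatter, but every team's points per game swings twice as far from 1.0.
ppg_stretched = [1.0 + 2.0 * (y - 1.0) for y in ppg]

slope_a, _, r2_a = linear_fit(pdo, ppg)
slope_b, _, r2_b = linear_fit(pdo, ppg_stretched)
print(f"original:  slope = {slope_a:.3f}, R^2 = {r2_a:.3f}")
print(f"stretched: slope = {slope_b:.3f}, R^2 = {r2_b:.3f}")
```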
So what do the slopes show? Well, they get a little steeper with each progressively worse possession grouping, which could suggest that a small difference in PDO is more important in determining success for bad possession teams, but the difference in slopes isn't so big that the result is statistically significant, so, once again, I'd label this as inconclusive, although more data would paint a clearer and potentially very telling picture.
Part III: What if we break PDO down?
PDO is the sum of a team's save percentage and shooting percentage, so why don't we look at the results for just shooting and save percentages?
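As a quick illustration of that sum, here is a sketch that builds PDO from raw goal and shot totals. The function name and the totals are made up, and the 100-point scale is just one convention (some sites report PDO around 1000 or around 1.000 instead).

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO on a 100-point scale: shooting percentage plus save percentage."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


# Made-up 5-on-5 totals: 8% shooting and a .920 save percentage,
# which should come out at about 100.
example = pdo(goals_for=80, shots_for=1000, goals_against=80, shots_against=1000)
print(f"PDO = {example:.1f}")
```

Because every goal one team scores is a goal another team allows, the league as a whole averages out to 100 on this scale, which is what makes regression to the mean such a useful idea for PDO.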
Without getting into making this a longer read than it already is, shooting percentages paint much the same picture that PDO does, with the difference in correlation being strongest in the middle-10 group of possession teams, and the slopes actually being conclusively similar, rather than being inconclusive.
As for saving, well things get interesting again. The correlations align the same way that PDO did, but the slopes are really interesting. This graph is the same as the ones above, except for the fact that the x-axis is now save percentage instead of PDO. I have all 3 groups shown, and the black line is the baseline, or the slope of the line for all teams.
The only slope that differs much from the slope of the line for all teams is the one for top-10 possession teams. It is, in fact, about 10% as steep, which means that even a large change in goaltending performance leads to just a small change in success for those teams.
That means the narrative that elite teams don't necessarily need elite goaltending is in fact supported by data, and not just lazy narrative-building, built on almost nothing except for decades of a raging confirmation bias and idiotic media members who just decide to say the same things over and over again. In other words, the mainstream media actually got something right.
So what did we learn? Well, essentially 3 things:
1. PDO has a much stronger correlation to success for middle-10 possession teams. That means that successful teams that are neither elite nor terrible at possession probably have a high PDO.
2. We learned that the slope of the linear regression line for success compared to save percentage for elite possession teams is very low. That means that the difference between elite and sub-par goaltending has little bearing on how elite possession teams fare during the regular season.
3. Other than those two things, there is evidence to suggest that the slopes of best fit lines for success and PDO get steeper as possession gets worse. To be sure, more data needs to be collected and analyzed again. If it's true that the slopes are steeper, that would explain why regression to the mean for PDO has a much greater impact this season on a team like Toronto (a terrible possession team) than New Jersey (an elite possession team).
Finally, just so that you don't have to take me at my word for some of the numbers that I used, I've got a table of all the correlations (R^2 values), the p-values associated with them, and the slopes of each best-fit linear regression line that I used during this project.
So there you have it. To expand on the idea, I think it would be interesting to see if success of special teams has a similar impact, being less significant for elite possession teams than mediocre ones. That would explain why Boston was able to win the Stanley Cup with a powerplay only slightly less miserable than the existence of the post-67 Toronto Maple Leafs and why the Habs will struggle to win another game, much less the series, against Boston if their powerplay goes cold. Because let's face it: In this series so far, basically one team has had the puck.