In the search for the magical shot quality, plenty of fancy stat hoarders have relied on the NHL shot data archive to come to absolute conclusions. Considering I have spent the last 18 months of my life on the mystical search for the imaginary unicorn and numerous hours watching how inaccurate this data is, it is frustrating when it gets referenced to prove anything with certainty.
There have been plenty of studies that use the x,y data from the NHL.com database and they are all very rational and intelligent interpretations of probabilities. The problem is that the data is almost useless.
Recently James Mirtle used shot distance to show how Carlyle has improved the Leaf defense (an assertion that I believe). To do so, he used Greg Sinclair's (@theninjagreg) shot application. Andrew also used it at the end of last season, and it is a great little app. Unfortunately, the NHL's data sucks.
I recently went over the Sabres/Canadiens game to illustrate its inaccuracy. I understand that this is a one-game sample, but I have tracked 500+ games and this inaccuracy is consistent with my experience.
Some of these might not seem like a big deal, but expected goals by distance increase the closer you get to the net.
Of those 32 shots, ten were off by more than 10 feet.
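| Player | NHL distance (ft) | Actual distance (ft) | Difference (ft) |
|---|---|---|---|
| 67 - PACIORETTY | 22 | 49 | +27 |
| 17 - BOURQUE | 52 | 26 | -26 |
| 14 - PLEKANEC | 41 | 17 | -24 |
| 21 - GIONTA | 14 | 31 | +17 |
| 51 - DESHARNAIS | 70 | 53 | -17 |
| 61 - DIAZ | 57 | 42 | -15 |
| 72 - COLE | 20 | 6 | -14 |
| 81 - ELLER | 60 | 48 | -12 |
| 20 - ARMSTRONG | 18 | 28 | +10 |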
Another 6 shots were between 5 and 10 feet off. Only 4 were 100% accurate. Essentially, only 50% of the NHL data was accurate to within 5 feet. That is distance alone; it doesn't take into account the accuracy of the co-ordinates. A shot from 20 feet out in the slot is much different than a shot from 20 feet out below the face-off circle.
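For anyone who wants to check that tally, here is a minimal Python sketch. It uses only the counts quoted above; the variable names are mine and purely illustrative.

```python
# Counts taken from the one-game sample described in the text.
total_shots = 32
off_by_more_than_10 = 10   # shots off by more than 10 feet
off_by_5_to_10 = 6         # shots off by between 5 and 10 feet
exactly_right = 4          # shots recorded with 100% accuracy

# Everything that is not in the two "bad" buckets landed within 5 feet.
within_5 = total_shots - off_by_more_than_10 - off_by_5_to_10


def share(count, total=total_shots):
    """Return count as a percentage of total."""
    return 100.0 * count / total


print(f"within 5 feet:    {within_5} shots ({share(within_5):.1f}%)")
print(f"5 to 10 feet off: {off_by_5_to_10} shots ({share(off_by_5_to_10):.1f}%)")
print(f"10+ feet off:     {off_by_more_than_10} shots ({share(off_by_more_than_10):.1f}%)")
print(f"exactly right:    {exactly_right} shots ({share(exactly_right):.1f}%)")
```

The "within 5 feet" bucket includes the 4 exact shots, which is why only half the sample passes even a generous 5-foot tolerance.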
The average NHL shot distance measured 36.7 feet; the actual measurement was 34.3, a difference of almost 2.5 feet. Mirtle's article concludes that Carlyle is responsible for shots coming from 4 feet further out than last season. If the NHL shot data is off by 2.5 feet, is that number actually 1.5 feet? Is it 6.5 feet? Or is it somehow accurate because the data is skewed by exactly the same random amounts each time?
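The trouble with comparing averages is that misses in opposite directions cancel out. As a rough sketch, using only the difference column from the nine rows in the table above, compare the average signed miss with the average absolute miss. This covers just the worst shots in the sample, not all 32.

```python
from statistics import mean

# Difference column (in feet) from the table of the worst misses.
differences = {
    "PACIORETTY": 27,
    "BOURQUE": -26,
    "PLEKANEC": -24,
    "GIONTA": 17,
    "DESHARNAIS": -17,
    "DIAZ": -15,
    "COLE": -14,
    "ELLER": -12,
    "ARMSTRONG": 10,
}

# Signed average: positive and negative misses offset each other.
signed_mean = mean(differences.values())

# Absolute average: how far off a typical shot in this table really is.
absolute_mean = mean(abs(d) for d in differences.values())

print(f"average signed miss:   {signed_mean} feet")
print(f"average absolute miss: {absolute_mean} feet")
```

The absolute figure comes out at three times the size of the signed one, which is the whole problem with leaning on an average distance: the average can look respectable while the individual shots are nowhere near where the data says they are.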
Here are the actual shots and the NHL locations, with the true shot indicated in green (red for goals) and a screen capture for confirmation.
This is post-bounty Rene Bourque producing 8 shots (NHL.com only credited him with 7 because of a phantom Gionta tip-in). As you can see by the evidence, these results are all over the place. When you add up the actual and the recorded distances, we get almost exactly the same total (208 to 210), but that is only because I credited Bourque with the phantom shot awarded to Gionta from the lip of the crease.
Distance-wise this might not seem to be too inaccurate, but if you use x,y co-ordinates and equate them to the home plate area that Olivier uses to track scoring chances, you can see how much it can impact the data.
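To make the x,y point concrete, here is a hedged Python sketch that turns a co-ordinate into a distance and an angle from the net. Everything about the setup is my assumption: co-ordinates in feet with the origin at center ice, the net at x = 89, and two made-up shot locations chosen so that both sit exactly 20 feet from the net. None of these values come from the NHL data itself.

```python
import math

# Assumed co-ordinate system: feet, origin at center ice, net on the goal line.
NET_X = 89.0
NET_Y = 0.0


def distance_and_angle(x, y):
    """Return (distance in feet, angle in degrees off the center line)."""
    dx = NET_X - x          # how far out from the goal line
    dy = y - NET_Y          # how far to the side of the net
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(abs(dy), dx))
    return distance, angle


# Hypothetical shots, both 20 feet from the net.
slot_shot = distance_and_angle(69.0, 0.0)    # straight out in the slot
wide_shot = distance_and_angle(77.0, 16.0)   # off to the side, near the circle

for name, (dist, ang) in [("slot", slot_shot), ("wide", wide_shot)]:
    print(f"{name}: {dist:.1f} feet, {ang:.1f} degrees off center")
```

Both shots would show up as 20 feet in a distance-only study, yet one is dead center and the other is well off to the side. A recording error that moves a shot sideways can flip it in or out of a scoring chance area without changing the distance at all.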
The shot distance numbers are extremely erratic in regards to Tomas Plekanec's data. He had 5 shots on goal (two of which were goals) that totaled 199 feet; in actuality the number was 159 feet. That is 8 extra feet per shot.
Colby Armstrong would appear to have created 3 scoring chances according to the home plate indicator and the NHL data. Looking at the screen captures, I would be hard pressed to award him more than 1.
Pretty accurate reading for Raphael Diaz, although the one slap shot is a little more dangerous than the data would indicate.
The reading for P.K. Subban is also fairly accurate. It is off by a couple feet, not 100% accurate, but in the ballpark of acceptability.
Here is where the data can be extremely frustrating. It has credited Max Pacioretty with 2 shots and provided David Desharnais with a phantom shot. So they removed a shot from Max, nailed one with 100% accuracy and missed the other shot by 27 feet. The NHL shot data, everyone.
Brian Gionta. Phantom tip.
Brandon Prust. I could live with this.
For Erik Cole, the goal probability difference between these two locations is fairly wide.
Brendan Gallagher and Alex Galchenyuk didn't come up in the app. My guess is that, because they are rookies, their names haven't been entered into the database, so the accuracy of their shots could not be measured.
Now multiply these findings by the 2,460 team-games in an 82-game season (30 teams x 82 games) and you will have 24,600 shots that are off by more than 10 feet. You will have 39,360 shots that are inaccurate by more than 5 feet. If you average each team at 30 shots per game and apply the 12.5% accuracy of this sample, you are looking at 9,225 of 73,800 shots being accurate. I am supposed to rely on this data to make conclusions? Even with accurate data you can have sample size problems. When nearly 90% of your data is inaccurate and about 30% spectacularly so, this data is worth what exactly?
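The season projection is simple multiplication, and this short Python sketch reproduces it. It assumes, as the paragraph does, that every team-game looks like this one-game sample, which is obviously a big assumption. The constant and function names are mine.

```python
TEAMS = 30
GAMES_PER_TEAM = 82
TEAM_GAMES = TEAMS * GAMES_PER_TEAM   # one team's shots in one game

# Counts from the 32-shot sample.
SAMPLE_TOTAL = 32
SAMPLE_OFF_MORE_THAN_10 = 10
SAMPLE_OFF_MORE_THAN_5 = 16           # the 10 above plus 6 more off by 5 to 10 feet
SAMPLE_EXACT = 4

SHOTS_PER_TEAM_GAME = 30              # round league-wide average used in the text


def project(per_team_game):
    """Scale a per-team-game count up to a full season."""
    return per_team_game * TEAM_GAMES


season_off_10 = project(SAMPLE_OFF_MORE_THAN_10)
season_off_5 = project(SAMPLE_OFF_MORE_THAN_5)
season_shots = project(SHOTS_PER_TEAM_GAME)

# Apply the sample's exact-hit rate (4 of 32) to the season shot total.
exact_rate = SAMPLE_EXACT / SAMPLE_TOTAL
season_exact = season_shots * exact_rate

print(f"team-games:            {TEAM_GAMES}")
print(f"off by 10+ feet:       {season_off_10}")
print(f"off by 5+ feet:        {season_off_5}")
print(f"exactly right:         {season_exact:.0f} of {season_shots}")
```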
It's a great little app and would be invaluable to me if the data sets were actually accurate. In this day and age, with the NBA using the Synergy tracking system, the NHL embarrasses itself with these data sets. If the readings are this inaccurate, I don't really understand the point in tracking it at all.