Over the past few weeks, as I’ve been teaching myself R (an incredibly powerful tool that I highly recommend for anyone interested in any kind of numerical analysis or data science), I’ve been using a number of big data sets as fodder to play around with and learn from. One nearly bottomless source of data that I happen to be extremely interested in is hockey statistics: I’ve been an avid reader of, and occasional contributor to, Eyes on the Prize, a very intelligent and analytics-oriented blog covering the Habs, which has planted me firmly in the ‘fancy stats Excel nerds’ camp of hockey fans.
As with most well-watched sports, there is a huge wealth of data freely available that anyone can play around with: although NHL.com has a few (very basic) numbers available, the majority of the information used by the analytics community comes from third-party aggregation websites like war-on-ice.com, behindthenet.ca, and many more. I’ve been having a lot of fun going through these vast treasure troves of data, and learning my way around R at the same time. Here’s one example of something neat I uncovered, which I hope you all find interesting.
For anyone who doesn’t follow hockey closely, one important concept to understand is the Zone Start. Whenever the puck is dropped to start play, the players on any given team find themselves in one of three situations: in their own zone, where their opponents are threatening to score; in their opponents’ zone, where the reverse is true; or in the neutral zone, which lies in between. Some teams (most notably the Vancouver Canucks during the years when they were insanely good) developed strategies to maximize player value by shifting around the deployment of their lines. Others tend to use offensive zone starts as a way to shelter certain players or lines, typically younger players with high offensive potential who are still learning the defensive aspects of the game.
With this in mind, I recently sifted through some data and asked the following question: is there a consistent relationship between Zone Start Ratio (Offensive Zone Starts / Defensive Zone Starts) and age? I expected a significant negative one: older players would take on more defensive zone starts, as a proportion of the total, than younger players.
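For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the ratio above, alongside the ZS% form you’ll often see quoted elsewhere (commonly OZS / (OZS + DZS), with neutral-zone starts ignored). The function names are my own, purely for illustration. The two measures carry the same ordering information, since ZS% = 100 × R / (1 + R) for a finite ratio R.

```python
def zone_start_ratio(oz_starts: int, dz_starts: int) -> float:
    # Zone Start Ratio as defined above: offensive zone starts / defensive zone starts.
    # A player with no defensive zone starts gets an infinite ratio rather than an error.
    if dz_starts == 0:
        return float("inf")
    return oz_starts / dz_starts


def zone_start_pct(oz_starts: int, dz_starts: int) -> float:
    # ZS% in the common OZS / (OZS + DZS) form, as a percentage.
    # Neutral-zone starts are not part of either measure.
    total = oz_starts + dz_starts
    if total == 0:
        return float("nan")
    return 100.0 * oz_starts / total


print(zone_start_ratio(300, 200))  # 1.5
print(zone_start_pct(300, 200))    # 60.0
```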
To answer that question, I looked at every player-season from the last 8 years (15 games minimum), which gave well over 5000 data points: hopefully enough to show something. Here’s what came out for the entire league over those 8 years, graphed as a hex-bin (basically a two-dimensional histogram):
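As a rough illustration of the filtering and binning step, here is a sketch in Python (not the R I actually used). The PlayerSeason fields, the 0.25 bin width, and dropping seasons with zero D-Zone starts are all my own assumptions. It also swaps the hexagonal cells of a true hex-bin for plain rectangular (age, ratio) cells, which is the simpler two-dimensional histogram the hex-bin is based on.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout for one player-season.
    player: str
    team: str
    age: int
    games: int
    oz_starts: int
    dz_starts: int


def qualifying(seasons, min_games=15):
    # Keep seasons that meet the games-played minimum; seasons with no
    # D-Zone starts are dropped here so the ratio is always finite (an assumption).
    return [s for s in seasons if s.games >= min_games and s.dz_starts > 0]


def bin_counts(seasons, ratio_width=0.25):
    # Count player-seasons per (age, ratio bin) cell: a rectangular 2-D histogram.
    counts = Counter()
    for s in seasons:
        ratio = s.oz_starts / s.dz_starts
        counts[(s.age, int(ratio // ratio_width))] += 1
    return counts


sample = [
    PlayerSeason("A", "MTL", 20, 70, 300, 200),
    PlayerSeason("B", "MTL", 20, 60, 310, 200),
    PlayerSeason("C", "VAN", 31, 10, 50, 40),   # under 15 games, filtered out
    PlayerSeason("D", "VAN", 31, 80, 200, 250),
]
print(bin_counts(qualifying(sample)))
```

Each cell count is what gets shaded in the plot. A real hex-bin just tiles the same plane with hexagons, which tends to look smoother for dense scatter data.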
This surprised me a bit; I figured younger players would see more zonally-sheltered minutes, with more D-Zone starts taken by older players. You do indeed see a small curve towards D-Zone starts in the middle-aged years, with a minimum in the early 30s, but the magnitude was a lot smaller than I expected.
When I looked at the same thing team by team, though, something really interesting emerged: there appear to be three distinct ‘signatures’ in how teams allocate zone starts between the younger and older members of their rosters.
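For the team-by-team view, a minimal Python sketch of one way to build each team’s age curve is below: group qualifying player-seasons by team and age, then average the Zone Start Ratio in each group. The tuple layout and function names are hypothetical. Averaging per-season ratios is one choice among several; pooling total OZ and DZ starts for each age before dividing would weight heavy-minute players more. The lowest_ratio_age helper just picks the age where a curve bottoms out, like the early-30s dip in the league-wide picture.

```python
from collections import defaultdict
from statistics import mean


def age_curves_by_team(rows, min_games=15):
    # rows: iterable of (team, age, games, oz_starts, dz_starts) tuples (hypothetical layout).
    buckets = defaultdict(lambda: defaultdict(list))
    for team, age, games, oz, dz in rows:
        if games < min_games or dz == 0:
            continue  # same 15-game minimum; skip seasons with no D-Zone starts
        buckets[team][age].append(oz / dz)
    # Mean Zone Start Ratio per age, for each team, with ages in ascending order.
    return {
        team: {age: mean(ratios) for age, ratios in sorted(ages.items())}
        for team, ages in buckets.items()
    }


def lowest_ratio_age(curve):
    # Age with the smallest mean ratio, i.e. where the curve bottoms out.
    return min(curve, key=curve.get)


rows = [
    ("MTL", 20, 70, 300, 200),
    ("MTL", 20, 60, 200, 200),
    ("MTL", 31, 80, 150, 250),
    ("VAN", 25, 82, 240, 240),
    ("VAN", 25, 10, 500, 100),  # under 15 games, ignored
]
curves = age_curves_by_team(rows)
print(curves)
print(lowest_ratio_age(curves["MTL"]))
```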
Group 1, which I call the ‘normal group’, is more or less what I’d expected the league to look like: a sharp tendency towards offensively-weighted zone starts for younger players, balanced out by the rest of the team:
The second group, or the ‘flat line’ group as I call them, shows essentially zero relationship between age and ZS%:
And finally, the third group of teams I call the ‘well teams’, after the representative shape of their curves. Rather than prioritizing OZ starts for a specific group of young players and spreading the remaining load across the rest of the team, they take a different approach: they load one group of players (usually older ones, but not necessarily) with a heavy DZ load, and the remaining OZ time is then distributed more or less evenly by age across the rest of the team:
Hope you enjoyed these idle observations; I’ll post more neat things as I uncover them (and hopefully with more sophisticated analysis later as I become more proficient with R).
-All numbers thanks to www.war-on-ice.com. Thanks to the great people there who keep the site running!