Over the course of a season, the average Premier League team suffers 26 injuries! Naturally, any fan would wonder how well their team might have performed had it not been for inconvenient injuries.
The data and code I used for this analysis can be found on my GitHub page.
We can try to answer how much impact injuries can have on a given team’s performance using 10 seasons of available Premier League data. I’ve collected final league tables from Wikipedia, season total performance statistics from the Stats section of the Premier League and Football Data websites, as well as injury statistics from Physio Room.
To start answering how much injuries affect Premier League teams, we can plot the number of points earned against the total days lost to injury for each team over the past 10 Premier League seasons and check how strongly the two are correlated.
As you can see, the correlation plot suggests there is only a very weak relationship between the points a team earns and the injuries it suffers. So I guess that’s that, end of analysis.
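For readers who want to reproduce the number behind such a plot, here is a minimal sketch of a Pearson correlation in plain Python. It is an illustration rather than the exact code used for this analysis: the function name `pearson_r` and the toy values are made up for the example and are not the real Premier League data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy values, NOT the real data: one entry per team-season.
days_lost = [620, 1340, 980, 1510, 760, 1120]
points = [71, 48, 55, 62, 44, 80]

print(f"r = {pearson_r(days_lost, points):.3f}")
```

A value of r near 0 corresponds to the "very small relationship" a scatter plot would show, while values near -1 or 1 would indicate a strong linear relationship.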
Actually no. This result is highly suspicious, since basic football knowledge would suggest injuries should have an impact on a team’s performance. It may be that our simple correlation plot is leaving out a lot of important factors that are related to both points earned and injuries. Possession could be a potential factor, as teams which enjoy more possession could suffer more injuries when opposing teams seek to aggressively win the ball back.
I specified a least squares model with points earned as my response variable, and various combinations of touches, shots, tackles, bookings, and days lost to injury as my predictors. I used days lost to injury rather than number of injuries for each team, because it better captures the severity of injuries each team faced during a given season.
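To make the idea of a least squares model concrete, here is a minimal sketch that fits one in plain Python by solving the normal equations. It is not the specification used in this analysis: it assumes just two predictors (shots and days lost to injury), leaves out the squared terms, and the function name `ols_fit` and the toy numbers are invented for the example.

```python
def ols_fit(rows, y):
    """Least squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented toy team-seasons: (shots, days lost to injury) and points earned.
predictors = [(400, 600), (450, 1200), (520, 900), (610, 1500), (480, 700), (550, 1000)]
points = [42, 38, 55, 61, 50, 57]

intercept, b_shots, b_days = ols_fit(predictors, points)
print(f"intercept={intercept:.2f}, shots={b_shots:.4f}, days_lost={b_days:.4f}")
```

In practice a statistics package would also report standard errors and significance levels, which this sketch does not compute.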
The directions of the coefficient estimates in my models were as expected: injuries had a negative impact on points, shots had a positive impact on points, and so on. Two OLS models found a significant coefficient for injuries:
| |Model 1|Model 2|
|Days Lost to Injury|-0.019**|-0.020**|
|Booking Points (squared)|0.0001|0.0001|
|Days Lost to Injury (squared)|0.00001**|0.00001**|
|Residual Std. Error|10.399 (df = 191)|10.712 (df = 193)|
|F Statistic|41.872*** (df = 8; 191)|50.459*** (df = 6; 193)|
|Note:|*p<0.1; **p<0.05; ***p<0.01| |
Going by the linear coefficient, both models suggest that every 100 days lost to injury costs a team about 2 points for the season. The average team loses about 1,000 days to injury, which on that basis equates to roughly 20 lost points a season! The small positive coefficient on the squared term implies the cost per day shrinks as days lost pile up, so the true figure is likely somewhat lower. Evaluating the models’ predictive value by mean absolute error, accounting for injuries appears to produce slightly more accurate models, although Model 2 actually performed worse than the benchmark model that uses only touches and shots as predictors.
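The arithmetic behind these figures can be checked with a short sketch. It plugs in the rounded Model 1 coefficients from the table (-0.019 for days lost and 0.00001 for days lost squared) and ignores every other predictor. The function name `injury_effect` is made up for the example, and because the squared coefficient is reported to only one significant digit, the curved part of the estimate is very rough.

```python
def injury_effect(days_lost, linear=-0.019, squared=0.00001):
    """Predicted change in season points from the two injury terms alone.

    Defaults are the ROUNDED Model 1 coefficients from the table above;
    all other predictors in the model are ignored here.
    """
    return linear * days_lost + squared * days_lost ** 2


for days in (100, 500, 1000):
    linear_only = injury_effect(days, squared=0.0)
    with_square = injury_effect(days)
    print(f"{days:>5} days lost: linear only {linear_only:+.1f}, "
          f"with squared term {with_square:+.1f} points")
```

With these rounded inputs, the linear term alone gives about -19 points at 1,000 days lost, while including the squared term gives about -9, so the headline figure of 20 points is best read as an upper bound.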
Injuries by Elite, Middle of the Pack, and Relegated Teams
Besides looking at all teams at once, could it be that injuries matter more for some teams, such as the elite who have higher quality players?
I created three sub-populations from the original data set: Premier League teams competing in the UEFA Champions League (UCL), relegated teams, and teams which experienced neither. UCL participants can be viewed as “elite”, since they finished in the EPL top four the previous year. Relegated teams are “bad”, because they ended up getting relegated (a revolutionary insight). And finally, teams which finished between 5th and 17th are “middle-of-the-pack” or average.
Why create these three sub-groups? Well, there’s actually a theory to my madness…
I figure that teams which participate in the UCL will play more matches and face a higher chance of injuries. UCL-participating teams may suffer more injuries, but their squad depth allows them to still perform well in the EPL. Relegated teams have a limited squad, talent-wise, and hence suffer most from injuries. Squad depth within middle-of-the-pack teams can vary greatly, and so does the impact of injuries on their season.
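The grouping rule can be written down as a small function. This is only a sketch: the function name `classify_team`, the two boolean flags, and the toy rows are invented for the example, and giving UCL participation precedence over relegation in the (unlikely) case that a team has both is my assumption.

```python
from collections import defaultdict


def classify_team(played_in_ucl, was_relegated):
    """Assign a team-season to one of the three sub-populations."""
    if played_in_ucl:
        return "elite"  # assumption: UCL status wins if both flags are set
    if was_relegated:
        return "relegated"
    return "middle-of-the-pack"


# Invented toy rows: (team label, played in UCL, relegated, days lost to injury)
team_seasons = [
    ("Team A", True, False, 1250),
    ("Team B", False, True, 1100),
    ("Team C", False, False, 640),
    ("Team D", False, False, 1480),
]

# Collect days lost to injury per sub-population
days_by_group = defaultdict(list)
for name, ucl, relegated, days_lost in team_seasons:
    days_by_group[classify_team(ucl, relegated)].append(days_lost)

for group, days in days_by_group.items():
    print(group, days)
```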
I’m essentially saying injuries impact each team’s performance differently based on their own squad depth. Maybe I found no relationship between injuries and points because elite teams do well despite injuries, relegated teams suffer from injuries, and the rest of the teams muddy up the picture because their impact from injury varies. So how does this theory hold up to the actual data?
If my theory held up, I’d expect there to be two clusters on the graph above. Elite teams would be clustered around the top right corner with lots of injuries and points, while relegated teams would be clustered around the bottom right corner with lots of injuries and a low number of points. Middle-of-the-pack teams would be all over the place.
Looking at the graph above, middle-of-the-pack teams are really spread out, but there are no obvious clusters of elite or relegated teams.
I’ve also conducted a difference of means test between each sub-population and the total population mean for days lost to injury. Unsurprisingly, the tests indicate that there is no statistically significant difference between injuries suffered by each sub-population and the overall population.
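One common way to run this kind of comparison is a one-sample t-test of a sub-population against the overall mean. The sketch below computes just the t-statistic and degrees of freedom in plain Python. Treating the test as a one-sample t-test is an assumption on my part for illustration, and the function name `one_sample_t` and the toy numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """t-statistic and degrees of freedom for H0: sample mean == population_mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev() is the sample std. dev.
    t_stat = (mean(sample) - population_mean) / standard_error
    return t_stat, n - 1


# Invented toy values, NOT the real data: days lost to injury for one sub-group
relegated_days_lost = [1100, 950, 1300, 870, 1210, 1040]
overall_mean_days_lost = 1000  # the rough overall average mentioned earlier

t_stat, df = one_sample_t(relegated_days_lost, overall_mean_days_lost)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

Turning the t-statistic into a p-value requires the t-distribution, which a library routine such as `scipy.stats.ttest_1samp` handles directly.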
Our regression models suggest that injuries cost the average team up to roughly 20 points a season. However, injuries don’t appear to affect certain groups of teams differently. This may be because the days lost to injury variable is fairly normally distributed, meaning that a lot of teams suffer a similar number and severity of injuries. In theory, reducing the number of injuries could help a team gain more points over the course of a season, but that’s much easier said than done. And because most teams suffer a significant number of injuries over a season, the relative disadvantage largely cancels out.