There’s No 2019 Preview, Why?

In the past 10 months of coding, I’ve built up a number of tools and modules that navigate the data I have collected and ratings I have calculated. I had lofty ambitions of using these, and developing more, to provide a comprehensive preview of the upcoming season from the SOLDIER perspective.

It took a while to articulate properly, but it became blindingly obvious that although such a prediction was possible, it would be pointless and inaccurate! The reason is easy to explain: the key outcomes I have focused on with the SOLDIER model so far are form-based predictions of upcoming games. The game-prediction model uses recent (5-game) and longer-term (20-game) form of player and team performances, together with the players selected on the team sheets, to predict an outcome. Predicting 24 rounds ahead (plus finals) is a very long bow to draw when the ammunition is a set of darts.

The Problem

Later in the 2018 season, I started producing some weekly predictions that simulated the rest of the season to establish end-of-season predictions based on current form. Naturally, my first port of call in predicting 2019 was to use this method in two ways: on a level playing field (a round-robin, each team playing every other team home and away) to rate each team, and on the actual 2019 fixture as a more practical measure.

The results were surprising at first.

Wow, are Geelong that good? Are Sydney that bad? There’s a lot to unpack here, but something doesn’t quite add up. With such a broad prediction, the first sanity check for me is to compare to the bookies. After all, they’re the professionals in this caper. Just looking at the top 8 percentages, the ladder simulation is a lot more certain of things than the odds suggest*. Sydney, in particular, are about a 50% chance with the bookies to get into the top 8.

While this serves as a strong argument for the uselessness of such a long-range prediction, it is also a reminder of the strengths and weaknesses of what I'm modelling. Furthermore, it provides direction for how such predictions could be improved in the future.

The hurdles to overcome are plentiful if I were to predict a season with the model as it is:

  1. Which players will play each game?
  2. What effect does the off-season have?
  3. How do you account for natural evolution of players?
  4. Will a team’s game plan change?
  5. Will rule changes or “in-vogue” tactics change what statistical measures win games?

*mainly because the player/team form distribution is fixed in the above simulation rather than using a Brownian-motion inspired model

1. Which players will play each game?

The SOLDIER model encompasses player statistics and form. The chosen players for each team have a small but noticeable effect on the predicted outcome of the game. For the above simulations I used the "first choice team" as opined on afl.com.au for each club, adjusted for known injuries. But of course, no team goes unchanged all season; injuries play a part, younger players get tried out, and selection can be based on the opposition. A more sensible way would be to look at a squad (say, of 30) and average out to 22 players, which would be fairly easy to do.
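As a rough illustration of what I mean, here is a minimal sketch that weights a hypothetical 30-man squad by selection likelihood so that the expected team aggregate corresponds to a weekly 22. The ratings and probabilities are invented for the example, not taken from the model.

```python
import numpy as np

# Hypothetical per-player aggregate SOLDIER ratings for a 30-man squad
# (illustrative numbers only).
rng = np.random.default_rng(0)
squad_ratings = rng.normal(loc=50, scale=10, size=30)

# Rough weekly selection probabilities: 18 near-certain starters plus a
# rotating fringe, chosen so the expected number selected is about 22.
selection_prob = np.concatenate([np.full(18, 0.95), np.full(12, 0.43)])
print(f"Expected players selected: {selection_prob.sum():.1f}")   # ~22

# Expected team aggregate rating for a typical week.
expected_team_rating = float((selection_prob * squad_ratings).sum())
print(f"Expected aggregate rating of the weekly 22: {expected_team_rating:.1f}")
```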

2. What effect does the off-season have?

It’s simple to argue that form will not necessarily carry over the off-season. There are too many immeasurables and unknowables to look at the individual off-seasons and pre-seasons of all 500+ players and adjust “form” accordingly. Are there any rules of thumb? To probe this question, I browsed player data for the last five home-and-away rounds of 2017 and the first five home-and-away rounds of 2018, and looked at the difference in each player’s average performance between these two periods. While most statistics dropped slightly going into the new season, the differences were not statistically significant. As an easily relatable example, out of the 293 qualifying players for this study, players scored on average 0.5 fewer Supercoach points after the season break (p=0.7)*.
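For the curious, a minimal sketch of that comparison, assuming a dataframe of one row per player per game with columns named player, season, round and supercoach (the column names and window boundaries are my own shorthand, not the model’s actual schema):

```python
import pandas as pd
from scipy import stats

def off_season_change(player_games: pd.DataFrame) -> pd.Series:
    """Paired comparison of per-player averages across the 2017/18 break."""
    late_2017 = player_games.query("season == 2017 and round >= 19")   # last 5 H&A rounds
    early_2018 = player_games.query("season == 2018 and round <= 5")   # first 5 H&A rounds

    pre = late_2017.groupby("player")["supercoach"].mean()
    post = early_2018.groupby("player")["supercoach"].mean()

    # Only players appearing in both windows qualify for the comparison.
    both = pre.index.intersection(post.index)
    diff = post[both] - pre[both]

    _, p = stats.ttest_rel(post[both], pre[both])
    print(f"N={len(both)}, mean change={diff.mean():+.2f} points, p={p:.2f}")
    return diff
```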

A further thought was that less- and more-experienced players may be affected differently by the summer break. Further splitting the already-filtered data into players with fewer than 50 games at the end of 2017 (N=92) and more than 200 games (N=28) also proved fruitless, with no statistically significant differences across the break. There’s more that could be looked at here, but I strongly suspect little progress would be made.

*This isn’t the best measure to use here as Supercoach points are scaled per game, but the p-values are similar for other unscaled measures.

3. How do you account for the natural evolution of players?

Sam Walsh will definitely play this year, barring tragedy. So, how does one predict what Sam Walsh will produce this year? There is no data on how he performs against other AFL-quality teams in games for premiership points. How could I handle him and every other rookie that may or may not play this year?

Currently, if a debutant is playing, the model assigns the player’s performance to be a plain old average of every first-game player’s historical output, regardless of draft pick, playing role, team, etc. From the player’s second game onwards, it uses their personal recorded data. This is a decent trade-off for simplicity when handling debutants on a week-by-week basis, but it does not hold up for long-term predictions.

HPNFooty have done some magnificent work with player-value projection, analysing a current player’s output and comparing it with other players on a similar trajectory, discounting for factors such as player age. On a slightly different arc, The Arc has used clustering algorithms to classify players into particular roles. By implementing similar concepts and merging the two, it could be possible to (manually) assign a debutant a playing role (say, key defender or small forward) and project a more meaningful prediction for the season ahead.
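A sketch of the clustering half of that idea, assuming a table of career-average SOLDIER ratings per player; the column names, cluster count and seeding approach are illustrative choices rather than anything The Arc or I have actually implemented:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

SOLDIER_COLS = ["S", "O", "L", "D", "I", "E", "R"]   # assumed column names

def assign_roles(player_profiles: pd.DataFrame, n_roles: int = 8) -> pd.DataFrame:
    """Cluster players into rough 'roles' from their career-average ratings."""
    X = StandardScaler().fit_transform(player_profiles[SOLDIER_COLS])
    km = KMeans(n_clusters=n_roles, n_init=10, random_state=42).fit(X)
    out = player_profiles.copy()
    out["role"] = km.labels_
    return out

# A debutant could then be hand-assigned a role (say, "small forward") and his
# projected ratings seeded from the centroid of that role's cluster.
```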

Other thoughts this question brought up are how to handle players undergoing positional change (e.g. James Sicily, Tom McDonald) and old fogies put out to stud in the forward line (GAJ), but these are more one-offs that are probably not worth trying to manually override.

4. Will a team’s game plan change?

While player performance is a focus of this model, as important (if not more so) are the team measures that input into the predictions. Each team itself gets a rating in 6/7 SOLDIER categories based on team form. These measures incorporate team-aggregate statistics that cannot be allocated to individual players (say, tackles per opposition contested possessions). These could, conceivably, be a function of both team performance as a whole and the team’s game plan.

How well does this team form carry over to a new season? Do big personnel changes effect a noticeable change in a team’s output? These are questions I planned to have answered before this moment but they’ll have to wait.

5. Will rule changes affect game balance?

The AFL is an evolving competition; rule changes come frequently enough that the game never really settles to a point where all strategies under a given set of rules are explored. Having said that, assuming the player and team performances are projected as well as possible over a whole season, will the model’s prediction be accurate when the effect of rule changes is unknown?

The fit of the model is updated every round when there is fresh data. It compares the player and team SOLDIER scores, as calculated from the published statistics, and fits them against the game results. More recent games are weighted more strongly to reflect the prevailing style of football – and what combinations and strategies beat what. Significantly better results have been obtained with this approach than by fitting the model with all historical data weighted equally.
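A minimal sketch of that recency weighting, assuming X holds the 14 model inputs per game, y the result margins, and games_ago how old each game is; the SVR settings and half-life are placeholder values, not the tuned ones:

```python
import numpy as np
from sklearn.svm import SVR

def fit_with_recency(X, y, games_ago, half_life=50):
    """Fit a margin model with exponentially decaying sample weights."""
    # A game `half_life` games old counts half as much as last week's game.
    weights = 0.5 ** (np.asarray(games_ago) / half_life)
    model = SVR(kernel="rbf", C=10.0)
    model.fit(X, y, sample_weight=weights)
    return model
```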

The effects the new 2019 rules will have are very much unknown, and teams’ tactical responses will naturally evolve over the season.

Conclusion

I had planned on presenting more data to back up the above points, but time got the better of me; hopefully I’ll expand on this throughout the season.

Without a number of pending (and as-yet-unplanned) improvements to my processes, a long-term prediction covering a whole season is not going to be meaningfully indicative of reality. Sure, Geelong could top the ladder, but for the above reasons I wouldn’t bank on it!

The SOLDIER model

In this first post for 2019 I will give an outline of the AFL Lab model to be used for its debut full season. I originally started this project out of my love of sport and a desire to learn about machine learning and data science in general. It also coincided with a career move, which left me with a bit of free time for a while! It is by no means complete, professional or optimised, and it never will be.

***

Summary

AFL is a team sport that, like many sports, relies on a combination of individual and team efforts. A number of freely-available statistics, recorded for each player in each game, are accessible to the public. For many of these statistics, the difference between the teams’ aggregates correlates well with the game outcome. These key statistics are selected from the data and assigned to one of seven categories (SOLDIER), and each player in each past game is given a rating in these seven areas. In addition to these “player” ratings, certain team features (derived statistics) that strongly correlate with match outcomes have been identified, and these afford a “team” rating in the same seven areas. These team statistics are not attributable to particular players and could be considered a descriptor of an overall game plan, or just team performance. A machine learning model was trained using standard supervised learning techniques and parameter tuning. The inputs to the model are the difference in the teams’ aggregate player ratings (seven variables) and the difference in the teams’ team ratings (seven variables), with the output being the game margin. Future games are predicted by using recent data for the selected players and teams to predict each of the model inputs for the game, with appropriate error tolerances for variation in form. This allows Monte Carlo simulations of each game, producing a distribution of outcomes. Simulations of past seasons produce accuracy similar to other published AFL model results. The model has the potential to bring deeper insight into many facets of the sport, including team tactics and the impact of individual players.

Aim

The aim of building this model is to implement machine-learning techniques to predict and understand the outcomes of AFL games.

Raw Data

There are a number of sources available that compile and store statistics for AFL matches, without which projects like these just can’t go ahead. AFLTables provides comprehensive coverage of historical matches in an easy-to-handle manner. Footywire provides additional statistics for more recent games, and AFL.com.au fills the gaps and provides some text describing games. All three of these sources are scraped responsibly to maintain a database.

The statistics recorded, and the availability of statistics, have changed over the past decade. The full gamut of statistics used in the current model has been available since 2014, so earlier data is not used. While it would be possible to adjust the following analysis to account for missing statistics in past data, a key focus of this work is the changing nature and tactics of AFL football and, as such, including earlier games may be counterproductive to understanding the game as it is played today.

SOLDIER

The raw statistics were analysed against game outcomes to understand which have the strongest correlation: In each game, for each raw statistic, the sum of away player contributions was subtracted from the sum of home player contributions to obtain a “margin” statistic (if the margin statistic is positive, the home team accrued more of the statistic). The margin statistics were tested against the score margin and a simple Pearson’s correlation coefficient was calculated.
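A sketch of that screening step, assuming a per-game table of home/away aggregates for each raw statistic; the column naming convention is invented for the example:

```python
import pandas as pd
from scipy.stats import pearsonr

def rank_statistics(games: pd.DataFrame, stat_names: list) -> pd.Series:
    """Correlate each statistic's home-minus-away margin with the score margin.

    `games` is assumed to have a 'score_margin' column (home minus away) and,
    for each statistic, '<stat>_home' / '<stat>_away' aggregate columns.
    """
    correlations = {}
    for stat in stat_names:
        stat_margin = games[f"{stat}_home"] - games[f"{stat}_away"]
        r, _ = pearsonr(stat_margin, games["score_margin"])
        correlations[stat] = r
    # Strongest (absolute) correlations first.
    return pd.Series(correlations).sort_values(key=lambda s: s.abs(), ascending=False)
```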

Following this, rather than naively looking at the raw statistics alone, a number of features (derived statistics) were calculated and tested in the same manner. Features have the potential to provide better context for the raw statistics. For example, a team recording many Rebound 50s (a defensive act where the ball is moved out of the defensive 50) is not that impressive if their opponent has many more Inside 50s; what matters is their success at defending the opponent’s Inside 50s, not the raw number.

The relevant raw statistics and features were then allocated different categories depending on what aspect of gameplay they represent. Seven different categories were identified:

  • Scoring – Directly scoring goals/behind, setting up others who do the same.
  • Outside Play – Also called “Uncontested”. Staying out of the contest and being efficient at it.
  • Long Play – Moving the ball quickly with marks and kicks. Getting the ball in the forward 50.
  • Discipline – Also called “Defence”. Doing the tough stuff. Spoils, intercepts, winning free kicks (and not giving away free kicks)
  • Inside Play – Also called “Contested”. Getting the ball in the contest, clearing it from the contest, and tackling. Efficiency important but less so than uncontested play.
  • Experience – How experienced are the players? The number of games played, finals played, Brownlow votes received.
  • aeRial Play – Commanding the ball overhead. Contested marks, hit outs, raw height.

The raw statistics and some of the features can be directly attributed to individual players, but most of the features are representative of the team itself rather than the individuals. These team measures could be considered a way to quantify teamwork and/or game plan. Each chosen statistic and feature has been assigned to one of the above seven categories, and each is further split by whether it is player-specific or team-specific.

Category | Player Examples | Team Measure Example
S (Scoring) | Goals, Goal Assists, Points/I50, Marks I50 | Percentage
O (Outside) | Metres Gained, Uncontested Possessions | Cont. Pos. Ratio
L (Long Play) | Marks, Kicks, Inside 50s | Inside 50 Efficiency
D (Discipline) | One Percenters, Intercepts, Free Kicks | Rebound 50 Efficiency
I (Inside) | Contested Possessions, Clearances, Tackles | Cont. Pos. Margin
E (Experience) | Games Played, Past Brownlow Votes | HGA adjustment*
R (aeRial) | Height, Contested Marks, Hit Outs | Cont. Marks conceded
*The Team Experience measure is currently taken to be a completely deterministic variable that depends on how far each team has travelled to get to the venue.

For each game, each player and each team get a rating in these seven categories based on the above statistics and features. From this, it is natural to consider extensions such as overall ratings, analysis of form, and determination of a player’s role in a team. However, for the moment, the focus will be on development of the model for predicting match outcomes.

Model Construction

Match outcomes are to be predicted using a machine-learning model. The large number of input variables chosen in this project favours machine-learning models over other models widely (and very successfully) adopted in the sports modelling space.

Machine-learning models, in particular supervised-learning models, are designed to learn from known results and determine non-linear relationships that relate the inputs to particular outcomes. The variety and complexity of machine-learning models is vast, each with their advantages and disadvantages. This project implements techniques from the scikit-learn library (https://scikit-learn.org/), allowing many models to be tested side by side.

The model has fourteen inputs, and a single output:

Inputs:

  • Margin of player SOLDIER scores (7 variables)
  • Margin of team SOLDIER scores (6 variables)
  • Venue/HGA adjustment (1 variable)

Output:

  • Points margin of the game

Fitting models is very simple once player and team SOLDIER scores are calculated and rescaled. A common approach for selecting a model and tuning its parameters is a train-and-test split, where a proportion (say, 70%) of the data is used to fit the model, which is then tested against the remaining proportion. However, predicting an unplayed game is quite different; the player and team SOLDIER scores are not known a priori. It is necessary to make predictions of how each player and each team will perform in a given game in order to predict the outcome using the proposed model.

In a previous piece, I examined how one could measure a player’s form, and what other factors can affect a player’s output. For the game to be predicted, the form of the involved teams and their players is calculated (mean and variation) to determine probable distributions for the inputs to the model. As predicting unplayed games is the goal, simulating games using no foreknowledge (i.e. only considering the past) is the only appropriate way to test the model. The only exception is that the Team Experience (aka Home Ground Advantage) input is known, as this is determined from the fixture.
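To make the simulation step concrete, here is a minimal sketch of how one game can be simulated from a fitted regressor and form-based estimates; the array layout, function name and parameter names are assumptions for illustration:

```python
import numpy as np

def simulate_game(model, form_mean, form_std, hga, n_sims=10_000, rng=None):
    """Monte Carlo simulation of a single game.

    form_mean/form_std: length-13 arrays for the player (7) and team (6)
    SOLDIER margin inputs, estimated from recent form; hga: the known
    venue/experience adjustment. `model` is any fitted regressor with a
    scikit-learn style predict().
    """
    rng = rng or np.random.default_rng()
    sampled = rng.normal(form_mean, form_std, size=(n_sims, len(form_mean)))
    X = np.hstack([sampled, np.full((n_sims, 1), hga)])   # the 14 model inputs
    margins = model.predict(X)                             # predicted home margins
    return {"home_win_prob": float((margins > 0).mean()),
            "median_margin": float(np.median(margins))}
```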

Results and Discussion

I have performed full simulations of the 2017 and 2018 seasons to test a variety of models and tune parameters. The testing procedure is as follows, using 2017 as an example (a code sketch of the loop follows the list):

  1. Train model using pre-2017 data.
  2. Predict round one performances using pre-2017 data.
  3. Predict round one results a large number of times (N=10,000) and record.
  4. Retrain model with real round one data.
  5. Predict round two performances using pre-2017 data and real round one data.
  6. Predict round two results a large number of times (N=10,000) and record.
  7. Repeat 4-6 for remaining rounds.
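A rough sketch of that walk-forward loop, assuming a games table with invented column names for the realised inputs, the pre-game form estimates and the margin; it repeats the simulate_game idea from the previous sketch inline:

```python
import numpy as np
import pandas as pd

def simulate_season(games: pd.DataFrame, feature_cols, make_model, season=2017,
                    n_sims=10_000, seed=1):
    """Walk-forward test: retrain after each round, predict the next using past data only.

    `feature_cols` lists the 13 player/team margin columns followed by 'hga'.
    Each game row is assumed to also carry '<col>_mu' and '<col>_sd' form
    estimates for the first 13 columns, plus the actual 'margin'.
    """
    rng = np.random.default_rng(seed)
    history = games[games["season"] < season].copy()
    records = []
    for rnd in sorted(games.loc[games["season"] == season, "round"].unique()):
        model = make_model().fit(history[feature_cols].to_numpy(),
                                 history["margin"].to_numpy())
        fixtures = games[(games["season"] == season) & (games["round"] == rnd)]
        for _, g in fixtures.iterrows():
            mu = g[[f"{c}_mu" for c in feature_cols[:-1]]].to_numpy(dtype=float)
            sd = g[[f"{c}_sd" for c in feature_cols[:-1]]].to_numpy(dtype=float)
            sims = rng.normal(mu, sd, size=(n_sims, len(mu)))
            X = np.hstack([sims, np.full((n_sims, 1), g["hga"])])
            margins = model.predict(X)
            records.append({"round": rnd,
                            "home_win_prob": float((margins > 0).mean()),
                            "median_margin": float(np.median(margins)),
                            "actual_margin": g["margin"]})
        history = pd.concat([history, fixtures])   # real round data becomes available
    return pd.DataFrame(records)
```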

The large number of predictions gives a distribution that allows a win probability and a median margin to be recorded for comparison against the actual results. In the following tables, the results from four models are presented with the number of tips they got correct, the number of “Bits” (higher is better) and the average error in the margin (lower is better). The “BEST” row is the best performance in each measure from squiggle.com.au.
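For reference, a sketch of how those three measures can be computed from the simulation output above; the Bits formula assumes the usual footy-tipping info score (1 + log2 of the probability assigned to the eventual winner), and draws are ignored for brevity:

```python
import numpy as np

def evaluate(preds) -> dict:
    """Score a season's predictions (list of dicts from the simulation sketch)."""
    tips, bits, abs_errors = 0, 0.0, []
    for g in preds:
        p_home = g["home_win_prob"]
        home_won = g["actual_margin"] > 0
        tips += int((p_home >= 0.5) == home_won)
        p_winner = p_home if home_won else 1 - p_home
        bits += 1 + np.log2(p_winner)     # a draw would instead score 1 + 0.5*log2(p*(1-p))
        abs_errors.append(abs(g["median_margin"] - g["actual_margin"]))
    return {"tips": tips,
            "bits": round(bits, 2),
            "avg_margin_error": round(float(np.mean(abs_errors)), 2)}
```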

2017

Model | Tips | Bits | Av Margin
SVR1 | 120 | 12.06 | 31.08
SVR2 | 125 | 12.72 | 30.27
XGBoost | 128 | 11.19 | 30.18
KNR | 121 | 1.73 | 30.48
BEST | 137 | 20.57 | 29.18

2018

Model | Tips | Bits | Av Margin
SVR1 | 141 | 35.68 | 28.42
SVR2 | 143 | 34.98 | 28.11
XGBoost | 141 | 29.55 | 28.57
KNR | 150 | 33.39 | 27.80
BEST | 147 | 39.76 | 26.55

Models

What is immediately noticeable is not only that different models are better at different prediction types, but also that performance is season-dependent. On that second point, if these machine-learning models are picking up gameplay and tactical patterns, doesn’t it make sense that this would change from season to season? In training the models, more recent data is given stronger weighting to reflect this, and small improvements (consistent, but not necessarily statistically significant) have been observed.

The SVR2 model appears to be the most consistent over many seasons, and in 2018 it was comparable in success to other models with published results. This model, with a few additional tweaks, is the one that will be adopted for the 2019 season.

A deeper investigation into individual games reveals that, with all the models, there is a tendency to under-predict the margin. Games expected to be blowouts are predicted to be merely comfortable victories. While this does not affect the tip for the game, it evidently does affect the margin error and, to a lesser extent, the Bits. One example is the 2018 Round 18 game between Carlton and Hawthorn. Hawthorn were expected to win by over 10 goals, and they did. The SVR2 model predicted a median margin of only -25 points (in the away team’s favour).

[Figure: 2018 Round 18, Carlton vs Hawthorn predicted margin distribution (SVR2). Actual margin: -72.]

 

That this happens with all models tested suggests an issue with how the inputs to the model are calculated rather than with the models themselves. Recall that each player and team performance is simulated by sampling from a normal distribution with that player’s or team’s individual mean and variance. This implies that in a given game it’s equally probable that each player will perform better or worse than their average. This doesn’t really make sense! One would assume that against a very strong team, player outputs would be lower than a normal distribution would suggest, and against a very weak team, higher. The best way to adjust for this is not obvious and is a focus of ongoing work.

Conclusions and Further Work

The model as presented today is in working order, has the capacity to predict results in the ballpark of other models, and still has many avenues for improvement. In particular, the following are of interest:

  • Home Ground Advantage: The model still uses a flat score based on where the teams are from and where the game is played. There is clearly a lot more to Home Ground Advantage than that.
  • Team Experience score: Currently this is where Home Ground Advantage lives, originally it was planned to be a measure of how experienced the team is playing together; are there a lot of list changes? Coaching staff changes? This is difficult to quantify, and difficult to account for without manual intervention so it has been shelved for the moment.
  • Weather Effects: Wet weather affects the outcome of AFL matches, especially with regards to the expected scores and efficiency (see Part 1 and Part 2)

The game prediction model is just one arm of this project but is definitely the most technical one. By learning about and improving this model it is hoped that further insights into the sport can be uncovered.

Environmental factors affecting AFL outcomes – the weather, part 2

Today I continue my focus on the weather. In particular, I will look at some key statistics that differentiate dry-weather football from wet-weather football. Unfortunately, for the most part, the results are completely self-evident. However, there is a nugget or two in there that I think are interesting.

In the first part, I discussed how the prevailing weather and conditions of the game affect the outcome, as part of the larger overview of how all environmental factors affect all aspects of the game. There are a few obvious independent variables within the weather space that could affect the game: precipitation, wind and heat.

Precipitation anecdotally affects the game in a number of ways. Rain falling during the game keeps the ball and the ground wet, impacting the efficiency of skills and even the choice of skills (“wet weather football”). The lasting effect of rain after it stops, and other effects such as dew, cause similar impacts, perhaps to a lesser extent. Wet weather games are anecdotally characterised by “scrappy football”: less handballing, more kicking, and low scores.

Wind comes in a couple of flavours. In all cases the main effect is expected to be on long kicking and consequently goal kicking accuracy. Prevailing winds down the length of the ground provide a bias towards scoring at one end of the ground (i.e. a “five goal breeze”). A prevailing cross wind is a bit more of an unknown. Swirling winds can result from changeable conditions or heavy weather conditions, but also from the geometry of the venue; large grandstands particularly at the goal ends can produce some erratic conditions. It’s possible that this can be somewhat predictable based on knowledge of the venue.

Footy is a winter game but heat sometimes plays a part, especially near the front and back ends of the season. I don’t expect heat to be a huge factor, perhaps it affects player fatigue and creates a more open, high-scoring game.

Evaluating the conditions in past games

It is a relatively straightforward procedure to watch a game of football (or merely some highlights) and, with some knowledge of the game, evaluate the effect of certain environmental conditions on the outcome. An avid football watcher could easily do this on a week-by-week basis and keep a database. However, lacking an ongoing database, it would be very time-consuming to do this individually for a past season’s games, let alone multiple seasons. How can one efficiently and accurately record the conditions at past matches?

In the previous piece I scraped daily rainfall data from the Bureau of Meteorology at the closest weather station to each AFL ground and attached the data to each game. Then I examined the distribution of total points scored for a few different rainfall ranges. The hope was that games with more rainfall would be lower scoring.

[Figure: Total points scored vs daily rainfall]

Unfortunately this was quite unsuccessful. It did, however, make clear that what really matters is the conditions at the ground at the time of the game. The “daily rainfall” numbers reset at 9am, so there’s a good chance that game-day rain could fall well before or after the game is played and not affect conditions at all.

I then moved on to looking at published match reports for past games. The main idea is that if the conditions affected the game significantly, it would be discussed in the match report.

Methodology for match report scraping

I chose to use the match reports published on http://www.afl.com.au for the simple reason that the URL is formulaic, making it easy to scrape large quantities of data. For example, http://www.afl.com.au/match-centre/2018/17/adel-v-geel is from 2018, Round 17, an Adelaide home game against Geelong. For all games from 2014 onwards, I scraped the match report text into a database for ease of handling.
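A minimal sketch of that scraping step; the URL pattern comes from the example above, but the choice of HTML elements to pull the report text from is a guess and would need checking against the real pages:

```python
import requests
from bs4 import BeautifulSoup

def fetch_match_report(season: int, rnd: int, slug: str) -> str:
    """Fetch one match report, e.g. fetch_match_report(2018, 17, "adel-v-geel")."""
    url = f"http://www.afl.com.au/match-centre/{season}/{rnd}/{slug}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Pull all paragraph text; the real pages may need a more specific selector.
    return "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
```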

I then used Microsoft Excel to flag match reports containing certain weather-related keywords. The keywords I chose (rain, slippery, windy, storm, etc.) came from a brainstorm and from reading samples of match reports. This allowed me to pass over the vast majority of match reports where weather wasn’t (seemingly) a factor. I also, as a matter of curiosity, flagged some reports where the total points were particularly low.
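The same flagging could be done outside Excel in a couple of lines; a sketch assuming the scraped reports sit in a dataframe with a report_text column, with an indicative (not exhaustive) keyword list:

```python
import pandas as pd

WEATHER_KEYWORDS = ["rain", "wet", "slippery", "greasy", "windy", "gale",
                    "storm", "dew", "humid", "heat", "sodden"]

def flag_weather_mentions(reports: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean column marking reports that mention any weather keyword."""
    pattern = "|".join(WEATHER_KEYWORDS)
    out = reports.copy()
    out["weather_flag"] = out["report_text"].str.contains(pattern, case=False, na=False)
    return out
```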

For the flagged match reports, I pasted the report text into Notepad++ and defined a custom syntax to highlight the list of keywords. This allowed me to efficiently and selectively read match reports to summarise the conditions described.

False flags

If journalists could stop using the following cliches, that would be marvellous:

  • <TEAM> stormed into contention…
  • <TEAM> stormed home…
  • It was raining goals…
  • <TEAM> came home with a wet sail…
  • <PLAYER> put the heat on…
  • etc.

How do you represent the conditions quantitatively?

Now we have a good summary of the weather for weather-affected games. How do we quantify this in a meaningful way so that it may be used in a model? As mentioned in the first piece, one could be as specific as they like in describing the conditions of a game that’s already happened. This would give a very good measure of the effect on past games. However, my interest (at the moment, at least) is in modelling games that haven’t happened yet. Having sophisticated measures of conditions is useless if you can’t predict the conditions with the same accuracy you can measure them with. After looking at the summaries of conditions I recorded, I decided to record the weather with four binary (yes or no) variables (a small encoding sketch follows the list):

  • If there was mention of wind (or inferred through description of “sideways rain”, etc.) the game would be classified as “windy”.
  • If there was mention of heat it would be a “hot” game.
  • If conditions were slippery (wet ground, actual rain, dew, humidity, etc.) it would be “damp”.
  • If rain fell for a significant portion of the match it would be “rainy” (and of course, also “damp”).
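As promised, a toy sketch of mapping a hand-written conditions summary to those four flags; the keyword lists are illustrative only, and the real classification is done by actually reading the flagged reports:

```python
def encode_conditions(summary: str) -> dict:
    """Map a free-text conditions summary to the four binary weather flags."""
    text = summary.lower()
    windy = any(w in text for w in ("wind", "gale", "sideways rain"))
    hot = any(w in text for w in ("hot", "heat", "warm"))
    rainy = any(w in text for w in ("rain", "shower", "downpour"))
    damp = rainy or any(w in text for w in ("dew", "slippery", "greasy", "wet", "humid"))
    return {"windy": windy, "hot": hot, "damp": damp, "rainy": rainy}

# e.g. encode_conditions("steady rain early, swirling wind at one end")
#      -> {'windy': True, 'hot': False, 'damp': True, 'rainy': True}
```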

These variables should be very easy to measure in the future, and also relatively predictable from looking at weather forecasts.

Some initial results

The first thing to do is to see if the data passes a sniff test. When looking at the rainfall data I used total points scored as a measure. Reports generally mention rain/damp conditions more often than wind or heat, so let’s start with those.

[Figure: Total points distributions for Dry, Damp and Rain games]

This was a relief; the time spent was worth it! The samples are statistically different (t-test: p(Dry~Damp)<10^{-7}, p(Dry~Rain)<10^{-16}, p(Damp~Rain)\approx 0.0015) and are logical in that a dry game is expected to be higher scoring than a damp game, and a damp game higher scoring than a rainy game.
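A sketch of those pairwise comparisons (here with Welch’s t-test, which doesn’t assume equal variances), assuming a games table with total_points and a conditions label; the column names are invented:

```python
from itertools import combinations

import pandas as pd
from scipy import stats

def compare_totals(games: pd.DataFrame) -> None:
    """Pairwise t-tests on total points between Dry, Damp and Rain games."""
    groups = {c: games.loc[games["conditions"] == c, "total_points"]
              for c in ("Dry", "Damp", "Rain")}
    for a, b in combinations(groups, 2):
        _, p = stats.ttest_ind(groups[a], groups[b], equal_var=False)
        print(f"{a} vs {b}: p = {p:.2g}")
```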

For what it’s worth, the “Rain” mode (peak of the curve) is approximately 132 points, “Damp” is 145. The median total score is probably a better measure though:

  • Dry: 178 points
  • Damp: 151.5 points
  • Rain: 136 points

Some problems

While these results look good, they must be scrutinised. The AFL Data Twitterati suggested a number of things to look into when I tweeted the above plot.

  1. What if the conditions just aren’t mentioned in the match report?
  2. Are certain match report authors more likely to mention the conditions?
  3. Is there an agenda in the reporting that might affect exaggerating/understating of the conditions?

These are excellent points. The first was also a prime concern of mine when doing this. To alleviate it going forward, now that I have a database of past games, I plan to record conditions for subsequent games week by week based on my own observations.

Over Round 17, when watching games/highlights I kept some notes about the conditions. I noted two games where conditions were a factor. Fremantle v Port was affected by rain (and atrocious skills, mind :/) and this was noted in the match report. Hawthorn v Brisbane in Tasmania was beset by dew (as mentioned regularly by the commentators) and conditions were slippery. There was no mention of the slick conditions in the match report. Arguably the conditions were on the minor side and scoring wasn’t hugely affected, but nevertheless I would want this recorded in my database.

It is conceivable that my match-report parsing process mainly flags games where the adverse conditions had a noticeable effect, or where the reporter mentioned them in passing (“skills were good despite the tricky conditions” and the like appeared sometimes). The consequence is that the distributions plotted above are most likely biased. This is not good for predictive purposes; I can predict whether a game will be damp, but not whether the teams will perform and score well despite the conditions.

What I can say for sure is that the games marked as wet, damp or windy are affected. So let’s see what sets these games apart. Today I’m just going to look at the distributions of certain key statistics that are considerably affected by the weather. Most of it is really self-evident, but it’s always good to have some quantitative confirmation of well-known theories.

Wet Weather Football

What changes? Everything! Well, almost. Let’s start with something that shouldn’t be affected too much as a bit of a control measure. An inside 50 is the movement of the ball (by carrying or disposal) into the forward 50 from outside the 50. I would argue the numbers should be largely independent of the weather; the efficiency will be the main difference.

[Figure: Inside 50s per game by weather condition]

There’s a noticeable increase in Inside 50s in “Damp” games and it is a statistically significant difference. Speaking of efficiency, let’s look at how “Inside 50 Efficiency” is affected. I define this as:

\text{I50 Efficiency}=\frac{\text{Inside 50s} - \text{Rebound 50s}}{\text{Inside 50s}}\times 100\%

[Figure: Inside 50 efficiency by weather condition]

Efficiency is lower in “Rain” games, as expected, but even lower in “Damp” games! Perhaps this can be explained by teams not respecting slightly difficult conditions and trying to play a normal game style. While we’re on Inside 50s, Marks Inside 50 are a strong predictor of AFL success.

[Figure: Marks Inside 50 by weather condition]

Indeed, there are fewer Marks Inside 50 in weather-affected games. Not surprising at all; it’s harder to mark in the wet and harder to hit targets.

Scoring in the wet

Sticking with scoring, goal accuracy is strongly affected by the weather; not just by rain, but by the wind too.

[Figures: Scoring shots and goal accuracy by weather condition]

[Figures: Total goals and total behinds by weather condition]

There are fewer scoring shots, and a lower goal accuracy. Unfortunately there is no data available on goal attempts that fall short or are kicked out on the full. Strangely (?), the number of behinds scored in games (including rushed behinds) is not statistically distinguishable across conditions.

Moving the pill

Disposal efficiency is crap in the wet. It drops dramatically and is one of the key differences in wet-weather football. It’s more a measure of performance than of tactics. Something that is more a measure of tactical change is whether players choose to kick or handball.

[Figures: Disposal efficiency and kick-to-handball ratio by weather condition]

Perhaps the two stats are related: kicking is less efficient than handballing, but the prevailing thought in the wet is that you should boot it long rather than dish it around with the hands.

Nevertheless, in the modern game of many stoppages and flooding the contest there’s a lot of contested ball. In fact, in the wet there is much more contested ball than in the dry.

[Figures: Contested/uncontested possession ratio and tackles by weather condition]

Interestingly, tackles per contested possession (a team measure I use called “Tackling Pressure”) is almost unchanged in the wet. I would have expected tackles not to “stick” as much, but the definition of a tackle requires it to affect the efficiency of the disposal, and with many disposals inefficient by default in the wet there may naturally be more tackles recorded.

Picking up the soap

There are a lot of inefficient disposals in the wet, a lot of dropped marks and a lot of stoppages. Picking the ball up and having clean skills is going to be a boon in the wet. Without numbers on things like “loose ball gets” (I know they’re recorded, just not publicly available!), I have to rely on other stats to infer these things.

[Figures: Clangers and intercepts by weather condition]

A clanger accounts for many different errors, including unforced dropped marks, turnovers, free kicks conceded, etc. Also note that intercepts are the consequence of turnovers. More evidence that wet-weather footy is a scrappy affair.

It’s the little things that count, or not?

[Figure: One percenters by weather condition]

Spoils, tap-ons, shepherds and smothers all come under the “One Percenter” stat. On average there are about 25 more per game in the wet. More one percenters fits the narrative of less clean possession. Unfortunately, the correlation of One Percenters with winning a game of footy is very poor.

What about the wind?

In all of the plots above I have plotted distributions for “windy” games as well. I take these with a grain of salt, really. Most of the “windy” distributions are bimodal and could probably be further split into wind-only games and rain-and-wind-affected games, but then the sample sizes would be too small to be useful. I would wager most games that are wind-affected, but not obviously so, are just blown over (yes, I did) in the match reports, so I don’t have a record of them.

What about the heat?

There’s just not enough data to make any meaningful observations.

How do you win wet weather football?

Well, I haven’t answered that, and I don’t think anyone can with the available stats. What I can say is that almost all facets of the game are affected. The reduced ability to execute skills properly is a clear result of wet conditions. Being more efficient correlates strongly with winning in the AFL, so having the skills to handle the wet ball and dispose of it smartly is surely going to be effective. But that’s no revelation.

Scoring accuracy is strongly affected too; the obvious recommendations are kicking straighter and setting up easier shots at goal (introducing more possibilities for turnovers). The publicly available stats are just not good enough to evaluate things like this.

What would be really interesting is to look at stats like loose ball gets. Being able to capitalise on the natural inefficiency of disposals in the wet should be a good predictor of the desired outcome.

I would also think that player positioning plays a key role. Having players in the right zones, close enough to pick up loose balls out of a contest and ~60 metres back to intercept long bombs forward, seems like the way to go. With lower disposal efficiency it should be less about covering a player and more about covering a probable landing zone.

Aside from analysing player GPS data (which I don’t have and am not good enough to work with anyway), an easier measure may be total distance run by a team. I don’t have this data either.

The first few plots are the most interesting to me. In “Damp” games (including dew-affected games, wet grounds, etc.) there are, counter-intuitively, more Inside 50s than in both “Dry” and “Rain” games, and they are less efficient. Do teams neglect to switch into “wet-weather mode” when they should?

I intended to use the weather data in my models to better predict things such as upcoming game totals and margins, and I shall, but with a bit of uncertainty regarding how many of the actually weather-affected games I’ve recorded.

Round 16 Review

Pretty close to a rare 9/9, but the Giants didn’t quite get up in the end.

[Image: Round 16 tipping results]

I barely lost out on the Bits, and I seem to be getting some better-looking probabilities now (closer to other models). It’s going to be difficult to catch up.

[Image: Round 16 prediction tables]

Some devastating losses for Sydney, GWS, Essendon and Adelaide this week really impact their finals chances. Sydney are still odds-on to make it, but they drop back to the pack of six fighting for five spots in the eight. GWS are a bit of a smoky, but injuries (and a pretty key suspension) will probably see them fade in the next couple of weeks once their expected “form” catches up.

Out of 10,000 simulations, Richmond made the Top 8 in every single one. It’d take something seriously dramatic (probably involving multiple key player outs) to change that significantly.

[Image: Simulated 2018 ladder after Round 16]

 

Hopefully by next week I’ll have a game-total model so I can simulate actual results (and thus, percentages) because it’s getting pretty bloody tight!

-Adam

Environmental factors affecting AFL outcomes – the weather

As part of my previous piece that began to explore elements of home ground advantage (HGA), I identified that in order to be able to isolate the effects of HGA, one would need to first account for player performance, team performance and other environmental factors.

[Diagram: A model to predict the outcome of a sporting match]

As discussed in the previous piece, the environmental factors consist of many measurable and immeasurable modifiers that affect the outcome of a game. These could include home ground advantage, the weather, whether a team is coming off a short break, whether a team has travelled a lot lately, whether players are carrying injuries, whether there’s a player milestone, whether it’s a “rivalry” game, and the list goes on. Some of these are easier to look at than others. In time I hope to investigate and, if necessary, account for all of these factors.

In this piece, I’m going to focus on the weather conditions and how they affect the outcome. Like my previous piece, there are more questions than answers at the moment. Nevertheless, I hope you find this thought-provoking!

Wet Weather Football

Playing in the wet is a different game altogether. The physics of the game changes. The ball is slippery, making slick handballs too slick, contested marks rare and ground-ball pickups difficult. The players are slippery, so tackles don’t stick as well. The slippery ground, however, makes the ball bounce a little straighter, so there are benefits to exploit, but I digress. It’s easy to argue that wet weather affects the game, but how can we measure it?

“Now over to Tony Greig at the weather wall”

The lads who manage fitzRoy (a magnificent package for those interested in a leg-up start in probing AFL stats) include some BoM rainfall data for each match in the 2017 season, which seemed like a good place to start. From the BoM I scraped historical rainfall data from the nearest weather station to each AFL ground and attached the game day’s rainfall to each game from 2011 onwards. There were plenty of gaps in the data; from 1563 games I ended up with 953 records. The standard stereotype of wet-weather football is low scores. Below is a set of histograms of the total points scored in matches with different daily rainfalls.
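For the curious, the venue-to-station matching can be done with a simple great-circle distance calculation; a sketch assuming venue and station tables with lat/lon columns in degrees (the schema is invented for the example):

```python
import numpy as np
import pandas as pd

def nearest_station(venues: pd.DataFrame, stations: pd.DataFrame) -> pd.Series:
    """Return, for each venue, the index label of the closest weather station."""
    v_lat, v_lon = np.radians(venues["lat"].to_numpy()), np.radians(venues["lon"].to_numpy())
    s_lat, s_lon = np.radians(stations["lat"].to_numpy()), np.radians(stations["lon"].to_numpy())
    # Haversine distance (km) between every venue and every station.
    dlat, dlon = v_lat[:, None] - s_lat[None, :], v_lon[:, None] - s_lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(v_lat)[:, None] * np.cos(s_lat)[None, :] * np.sin(dlon / 2) ** 2
    dist_km = 2 * 6371 * np.arcsin(np.sqrt(a))
    return pd.Series(stations.index[dist_km.argmin(axis=1)], index=venues.index)

# Daily rainfall can then be joined onto each game via the matched station and the game date.
```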

[Figure: Total points scored vs daily rainfall]

Wow! Either rain has little effect on the total points scored, or these daily rainfall figures do not represent the conditions at the football ground during the match.

In Round 5, 2015, the Gold Coast local weather station recorded a daily rainfall aggregate of 132mm after 9am. By 4:35pm, when the Suns took on the Lions, the ground was seemingly dry, there was no report of wet or slippery conditions, and scoring certainly wasn’t affected. Anomalies like this (I’m sure there are many) make the daily rainfall near the ground a poor measure of the actual effect of rain on the game.

While this annoyed me a bit (quite a bit of time was spent scraping and organising the rainfall data), it did illuminate a possible next step. As mentioned above, the match report did not mention the weather or the conditions being a problem. Maybe parsing match reports for keywords could work! This year in Round 10, Geelong took on Carlton at Kardinia Park. The game was low scoring, the daily recorded rainfall was nil, but the match report mentioned the “dewy” conditions.

So this seems like a sensible option: if the conditions affected the game and are going to be mentioned anywhere, it will be in the match report. I doubt this will be absolutely error-free, but I’ll wager it will be more accurate than rainfall data.

How Loquacious are Footy Journalists?

The actual task of parsing every match report for keywords is formidable. It can be done, I’m sure, with a big enough vocabulary and a suitable processing strategy. I’m currently at the stage of sampling match reports and manually finding a suitable set of keywords for different conditions. As this task sucks, it’s happening very slowly. While I work up the enthusiasm to tackle it, another question popped into my thought process.

How do you quantify the conditions?

One could be as descriptive as they wanted with the conditions, specifying how much rain there was, whether it was windy (and whether the wind was prevailing or swirly), whether it was dewy, hot, humid, etc. More information will likely lead to a better fit of the model to the existing data. This presents problems:

  • for many combinations of conditions there will be a paucity of data.
  • if your parsing of match reports is wrong (or the journo was exaggerating to forgive their team’s performance!), you’re trying to fit a sophisticated model with rubbish data.
  • if predicting future results is your goal, you need to know exactly what the conditions are going to be to get a good prediction using your model.

At the other extreme, the most simple way to quantify conditions would be to attach a binary variable to each match: Is it weather-affected? Yes/No. This is as fool-proof as you can get, any keyword showing up in the match report will trigger it, and you can be fairly certain a day or two in advance whether weather will affect a game.

As I like to do in almost all areas of my life, I look at the extremes and always end up somewhere between them. In this case, I plan to parse match reports separately for keywords relating to rain/dew, wind, and heat/humidity. I will give each game a nominal score from, say, 0-10 as a measure of the strength of each condition. This will give me the option of implementing each weather type as a binary (yes/no) or ordinal (0-10) variable, or as just a single “is it weather-affected?” binary variable.

It’s more than just a number

While I think even the simplest approach outlined above will give a decent idea of how the weather will affect the margin/total points — certainly better than the rainfall data I hope! — it’s about more than just that. While the current aim of my modelling is to improve my understanding of the available stats through predicting future results, there are also more interesting questions I hope to be in a position to answer in the future.

Wet weather football, what is it all about? What type of team does it best? My model uses a number of different variables to measure each player/team’s performance:

  • scoring,
  • uncontested play,
  • contested play,
  • ball movement/delivery,
  • defence,
  • experience,
  • air (ruck and contested marking).

Including all of these measures, along with weather measures, has the potential to elucidate what skills, team balance and game plans work in different conditions.

Cheers for now.

-Adam

 

Round 15 Review

A horror round for almost everyone it seems.

[Image: Round 15 tipping results]

A couple of things I’m thankful for:

  • my model’s chronic underestimation of the margin came good this week!
  • Breust the late out for Hawthorn pushed the game in GWS’s favour, yielding me the right tip. (go player models!)

[Image: Round 15 prediction tables]

Meanwhile, in the actual footy, things are getting tight around 8th-10th!

[Image: Simulated 2018 ladder after Round 15]

You’d have to say percentage is going to play a big part (unfortunately I can’t really simulate that yet)

-Adam

Home Ground Advantage – A Mess

Everyone has done a piece on home ground advantage, and now it’s my turn. This will hopefully be the first of a series of posts; the next one or two should complete this module of my model and, with any luck, not be a complete waste of time.

In the development of my model, figuring out how best to quantify home ground advantage was difficult to approach. At the moment, I use a very simple measure to account for “team travel”, and adjust each team and player according to how they play at home or away given their upcoming fixture (e.g. Scott Pendlebury would be expected to contribute less to a Collingwood away game as his recent away form is poor).

I have identified seven possible predictors of home ground advantage, and how each of them may be quantified:

  1. The actual venue itself
  2. “Morale” from playing to a home crowd (?)
  3. “Favouritism” from the umpires (free kick differential)
  4. Familiarity with the ground/facilities (count of previous games played for each team)
  5. Not having to travel far (travel time for each team)
  6. Players sleeping at normal home (boolean for each team)
  7. How often they travel (interstate games per season)

Most of these are measurable from available data on past games, and predictable through the fixture.

Other models deal with HGA by applying a correction to the margin in the form of a flat number (Matter of Stats), or a percentage (possibly different for each venue?), or consideration of some of the above to get a HGA variable into their model (i.e. FiguringFooty, The Arc). Some just ignore it altogether and do pretty well (HPN).

In this post I will investigate the first three of these identified predictors and their usefulness (or lack thereof). Following this is a general discussion of the difficulties of distilling HGA out of existing data.

***

First, let’s have a look at some of the available data to explore some of the elements of HGA. Here I am using data from 2011 onwards. I could use data from further back but I like to keep things modern.

A broad viewing of game result data shows distinct differences between many of the common AFL venues. For each ground, the distribution of the margin and total points is presented in the following figure.

[Figure: Margin and total points distributions by venue]

There’s a lot to unpack here. I’ve only included venues with more than 25 games played in the period or you get some real outliers (Jiangwan Stadium, for example). For clarification, a positive margin indicates a home victory.

While not a huge focus of mine at this stage, the total points scored does show variation, indicating it may be better to consider a percentage HGA bonus rather than a flat points bonus.

On the surface, the ‘Gabba is often a disadvantage to the home team, but that home team is Brisbane, who haven’t cracked the finals since 2009. York Park provides a median 42-point advantage, but Hawthorn mainly play there and they’ve been rather good. Without discounting individual margins by the strengths of the teams on the day, it’s difficult to tell whether each ground has an independent HGA, a common HGA, or no HGA at all! I’m keeping (1) as a possible predictor for the moment until more analysis can be done.

The more interesting data, perhaps, is that of the Melbourne venues MCG and Docklands. Firstly, the large number of games played there gives a better set of data to examine. Secondly, all Melbourne teams play home games there so on average, there should be less bias towards “how good” the home team is. If we filter games to Melbourne teams vs Melbourne teams (i.e. not Geelong) at the MCG and Docklands, things look very even!

[Figure: Margin distribution for Melbourne-vs-Melbourne games at the MCG and Docklands]

For this data (360 games), the mean margin is -0.825 points and the median is -1. There is no perceptible skewness in the distribution. From this sample, it cannot be said that there is an advantage (p\approx 0.71). But is this actually important? The only difference for the home and away teams in this set of games is the change rooms they use (I think?). I suspect there may be a larger proportion of home fans in attendance, but given the capacity of the grounds, not many fans would be locked out. Either way, it makes no perceptible difference. At least for moderate differences in crowd it’s probably acceptable to dismiss (2) as a possible predictor.

***

Let’s now consider another common gripe about home ground advantage: the perceived favouritism of umpiring decisions. My personal view is that the free kick differential is not indicative of favouritism, and more indicative of player indiscipline; possibly this is a mental effect of playing away from home! Without reviewing every decision and classifying each as a “justified” free kick or an “umpiring error”, it is not possible to comment on favouritism as a concept. Nevertheless, let us look at whether teams get more free kicks at home, and whether this results in more wins.

[Figure: Free kick differential vs margin]

This is the data from all games since 2011. In the central plot, a darker colour means a higher frequency of data. On the right-hand side is the distribution of margins (positive means a home victory) and on the top is the distribution of free kick differential (positive means more home free kicks).

Firstly, home teams DO get more free kicks (p<10^{-12}). From the 1554 samples, home sides get on average 1.70 more free kicks. And of course, home teams score more than their opposition (p<10^{-10}), by 7.97 points on average.

On the face of it you could easily make the connection that free kick differential correlates with the margin. The central plot tells the story that this is simply not true: the free kick differential is not a good predictor of the margin. There are many games where the free kick differential and the margin have opposite signs, almost as many as where they have the same sign. Just because I’m playing around with visualisations at the moment, here is a plot of the Inside 50 differential vs. the margin:

[Figure: Inside 50 differential vs margin]

This is a much better predictor.

I aim to look at some of the other predictors (4-7) in a later piece after I have done some more work on them. For the moment I’m just going to offer some thoughts on how to proceed from here!

***

The Scale of the Problem

There are a number of challenges facing this analysis. Firstly, let us assume the following model for predicting the outcome of a match:

[Diagram: A model to predict the outcome of a sporting match]

The team performance and player performance of each team may be predicted using their form. Environmental factors include things such as HGA, the weather, and other possibilities such as whether a team is coming off a short break or the bye.

To get a good measure of HGA one would need to dial out, for each past game, the effect of team performance, player performance and non-HGA environmental factors to work out an adjusted “game HGA”. To this measure, a model built on each of the identified HGA “predictors” could then be fitted.

Without doing any of the quantitative measurements, it’s easy to argue why this is going to be very difficult at the least. HGA is already baked into the team and player performance measures; although these can be predicted from past data, it means the full effect of HGA will be difficult to sum up. Furthermore, after removing player and team performance bias, the question remains of how to account for other environmental factors. It will likely be necessary to fit all environmental predictors (HGA, weather, etc.) simultaneously.

Then there are other problems. Is it possible that each venue has its own HGA independent of other factors? Does this change over time, i.e. how does stadium development affect this?

While I have a decent grasp on team and player performance, my model currently neglects to take weather into account (more on this in a future post I hope) and already includes HGA bias for the team and player performance. I am not in a position to attempt this quantitatively at this stage.

Nevertheless, I have some better ideas of how to proceed with this difficult problem. Firstly, I need to use player and team performance to quantify a residual “environmental” margin for each game (encompassing HGA, weather effects and noise), then examine the effects of venue, travel time and days between matches, and determine a way of describing the effect of weather.
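In code, that two-stage idea might look something like the sketch below: a performance-only model is used to strip out what the players and teams explain, and what is left over is regressed on candidate HGA predictors. The column names and the choice of a plain linear regression are placeholders, not a settled design.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def environmental_coefficients(games: pd.DataFrame, perf_model, perf_cols, hga_cols) -> pd.Series:
    """Fit HGA predictors to the margin left over after performance is accounted for.

    `perf_model` is a margin model fitted on player/team performance inputs only;
    `hga_cols` are candidate predictors (venue familiarity, travel time, etc.).
    """
    residual = games["margin"] - perf_model.predict(games[perf_cols])
    hga_fit = LinearRegression().fit(games[hga_cols], residual)
    return pd.Series(hga_fit.coef_, index=hga_cols)
```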

It’s easy to see why a simple measure of HGA is attractive.

To be continued.

-Adam