Win Probability vs. Margin

Tony from Matter of Stats recently posted a great visualisation with a comparison of the Squiggle models’ predictions for the first six rounds of 2019. The visualisation shows how each model related the projected margin to the percentage change of winning the game.

Source: Twitter @MatterOfStats

Everything looks fairly nice and smooth apart from the AFL Lab, which looks very noisy! Is this ideal, problematic, or fine? Either way, it’s not necessarily a conclusion you can arrive at just from this image. What is definitely clear is the AFL Lab model does things a little differently. I thought it would be a nice idea to clarify what is causing this “abnormal” behaviour and that it is somewhat intentional.

The SOLDIER model is fitted by correlating player performances, team performances, and game conditions (weather and venue adjustments) from past AFL games to the resulting margin of that game – the output. Like all models, is it not deterministic; that is the chosen inputs to the model are not all of the variables that contribute to the outcome of a game. There is always some uncertainty based on what the model doesn’t know. In total, there are fourteen input variables that go in to the model to produce the single output of the game margin.

The structure of the SOLDIER model

If, for a game previously unseen by the model, all of the player and team performances are known, the predicted model by the margin rarely differs from the actual margin by more than a goal. This method of cross-validation is a demonstration that the model is not over-fitted and is suitable for fresh observances.

When predicting the outcome of a future game one does not know what the inputs to the model will be! So, there needs to be a way to predict what the team and player performances will be in that game in order to determine the inputs; in order to determine the output. In examples, will Lance Franklin kick 0.4 or 6.1? Will Scott Pendlebury have 30 disposals at 90% efficiency, or 30 disposals at 60% efficiency? Naturally it’s impossible to predict this directly, but it is possible to forecast a distribution of expected performances, based on the players’ and teams’ past performances. 

A player can perform better in some areas, and worse in others.

The SOLDIER model assumes all player and team inputs to a game to be predicted are normally distributed. The distribution measures (mean, variance) for each player and team input are calculated from their past performances. Because of this, as the inputs have a distribution of possible values, the output (predicted margin) also has a distribution – but due to the nonlinearity of the model the distribution of predicted margins is not necessarily normal (it may be skewed, bimodal, etc.)

As a brief aside, why is the model nonlinear and what does that mean? Consider a hypothetical case study game where Team A beats Team B by 20 points. If the same game was played again, and Team A performed 10% better (in some sort of overall measure), would the margin be exactly 10% higher, for 22 points? Maybe that performance increase resulted in only a single behind (for a margin of 21 points), or two goals (32 points). Even with a single performance measure, there is not necessarily a linear relation between the inputs (team and player performance) and outputs (margin). The SOLDIER model takes fourteen different performance measure inputs, and relates them to a single output. The combinations of categories over-, under- and par-performing is exhaustive and any such combination could lead to a different, or no change in the outcome.

Anyway, back to the main narrative of the post. Now that there are distributions of player and team performance, and a model relating performance to margin, how do the predictions come about? For a particular future game, it is simple to randomly sample a performance for each team and player and put this into the model to predict a margin. One such realisation is just that, one possibility of what could happen. To gauge an broader overview of the possibile outcomes, a large number (say 50,000) realisations can be rapidly calculated to get a distribution of margins. This is called a Monte-Carlo simulation.

From this distribution of margins, a number of predictions can be pulled out. Firstly, the median of this distribution is chosen as the predicted margin for the game. The proportion of realisations where the home team wins represents the home win probability. Even though the margin distribution may not be a normal distribution, the standard deviation of the distribution can be calculated and represents the margin standard deviation. These three predictions are sufficient to adequately describe what the model believes could occur – and are the predictions that advanced tipping competitions like the Monash competitions take.

The margin standard deviation calculated by the model is the key driver behind AFL Lab’s anomalous probability-margin “curve” demonstrated by MatterOfStats. The standard deviations the model produces are generally between 18 and 50 points, and mostly on the lower end. This far lower than many other models (usually between 30 and 40). I would argue that it makes sense that the standard deviation SHOULD vary between games – an expected blowout could be a modest or huge victory (large standard deviation), but a closely-matched game in wet conditions suggests a smaller total score and a lower standard deviation is appropriate. Due to this variable standard deviation, two games with the same predicted margin can have a vastly different home win probability.

For a game with a predicted margin of 20, the standard deviation changes the probability significantly.

It is this reason why my probability-margin “curve” is not a curve at all – the probability is a function of both the predicted margin and the margin standard deviation.

I do have some reservations on the low standard deviations produced by the model – the random sampling methodology currently used is flawed and still very much under construction. Hopefully by the end of the season I will have a large enough sample to work on improvements from.

Until next time, which hopefully won’t be as long.


There’s No 2019 Preview, Why?

In the past 10 months of coding, I’ve built up a number of tools and modules that navigate the data I have collected and ratings I have calculated. I had lofty ambitions of using these, and developing more, to provide a comprehensive preview of the upcoming season from the SOLDIER perspective.

It took a while to properly elucidate, but it was soon blindingly obvious that although such a prediction was possible, it would be pointless and inaccurate! The reason is easy to explain. The key outcomes I have chosen to focus on with the SOLDIER model thus far are on form-based predictions with the purpose of predicting upcoming games. The game-prediction model uses recent (5-game) and longer-term (20-game) form of player and team performances to predict an outcome using the players selected on the team sheets. Predicting 24 rounds ahead (plus finals) is a very long bow to draw when the ammunition is a set of darts.

The Problem

Later in the 2018 season, I started producing some weekly predictions simulating the rest of the season to establish end-of-season predictions based on current form. Naturally, my first port of call in predicting 2019 was to use this method to pit the teams against each other on both level playing fields (round-robin, each team playing each other home and away) to rate each team, and simulating the actual 2019 fixture as a more practical measure.

The results were surprising at first.

Wow, are Geelong that good? Are Sydney that bad? There’s a lot to unpack here, but something doesn’t quite add up. With such a broad prediction, the first sanity check for me is to compare to the bookies. After all, they’re the professionals in this caper. Just looking at the top 8 percentages, the ladder simulation is a lot more certain of things than the odds suggest*. Sydney, in particular, are about a 50% chance with the bookies to get into the top 8.

While this serves as a strong argument for the uselessness of such a long-range prediction, it serves as a reminder of the strengths and disadvantages of what I’m modelling. Furthermore, it provides direction for what could be done to improve such predictions in the future

The hurdles to overcome are plentiful if I were to predict a season with the model as it is:

  1. Which players will play each game?
  2. What effect does the off-season have?
  3. How do you account for natural evolution of players?
  4. Will a team’s game plan change?
  5. Will rule changes or “in-vogue” tactics change what statistical measures win games?

*mainly because the player/team form distribution is fixed in the above simulation rather than using a Brownian-motion inspired model

1. Which players will play each game?

The SOLDIER model encompasses player statistics and form. The chosen players for each team have a small but noticeable effect on the predicted outcome of the game. For the above simulations I used the “first choice team” as opined on for each club, and adjusted it for known injuries. But of course, no team is going unchanged all season; injuries play a part, younger players get tried out, and selection can be based on the opposition. A more sensible way would be to look at a squad (say, of 30) and average out to 22 players, which would be fairly easy to do.

2. What effect does the off-season have?

It’s simple to argue that form will not necessarily carry over the off-season. There are too many immeasurables and unknowables to look at the individual off-season and pre-seasons of all 500+ players and adjust “form” accordingly. Are there any rules of thumb? To probe this question, I browsed player data for the last five home-and-away rounds of 2017, and the first five home-and-away rounds of 2018, and looked at the difference of each players’ average performances in these two periods. While there were mostly drops in stats across the board going to a new season, it was not statistically significant. As an easily relatable example, out of the 293 qualifying players for this study, the players scored on average -0.5 less Supercoach points after the season break (p=0.7)*.

A further thought was probed that less- and more-experienced players may be affected differently by the summer break. Further splitting the already-filtered data into players with less than 50 games at end of 2017 (N=92) and more than 200 games (N=28) also proved fruitless with no statistically significant differences across the break. There’s more that could possibly be looked at here but I strongly suspect little progress.

*This isn’t the best measure to use here as Supercoach points are scaled per game, but the p-values are similar for other unscaled measures.

3. How do you account for the natural evolution of players?

Sam Walsh will definitely play this year, barring tragedy. So, how does one predict what Sam Walsh will produce this year? There is no data on how he performs against other AFL-quality teams in games for premiership points. How could I handle him and every other rookie that may or may not play this year?

Currently, if a debutante is playing the model will assign the player’s performance to be a plain old average of every first-game player’s historical outputs, regardless of the draft pick, playing role, team, etc. By the player’s second game, and subsequent games, it uses their personal recorded data. This is a decent trade-off for simplicity in hanging debutantes on a week-by-week basis, but does not hold for long-term predictions.

HPNFooty have done some magnificent work with player-value projection based on analysing a current player’s output and comparing with other players on a similar trajectory, discounting for other factors such as player age. On a slightly different arc, The Arc has used clustering algorithms to classify players into particular roles. By implementing similar concepts and merging the two together, it could be possible to (manually) assign a debutante a playing role (say, key defender or small forward) to project a more meaningful prediction for a season ahead.

Other thoughts this question brought up is how to handle players undergoing positional change (i.e. James Sicily, Tom McDonald) and old fogies put out to stud in the forward line (GAJ), but these are more one-offs that are probably not worth trying to manually override.

4. Will a team’s game plan change?

While player performance is a focus of this model, as important (if not more so) are the team measures that input into the predictions. Each team itself gets a rating in 6/7 SOLDIER categories based on team form. These measures incorporate team-aggregate statistics that cannot be allocated to individual players (say, tackles per opposition contested possessions). These could, conceivably, be a function of both team performance as a whole and the team’s game plan.

How well does this team form carry over to a new season? Do big personnel changes effect a noticeable change in a team’s output? These are questions I planned to have answered before this moment but they’ll have to wait.

5. Will rule changes affect game balance?

The AFL is an evolving competition, there are very frequent rule changes that never really allow the game to settle to a point where all strategies with a given set of rules are explored. Having said that, assuming the player and team performances are projected as well as possible over a whole season, will the model’s prediction be accurate when the effect of rule changes is unknown?

The fit of the model is updated every round when there is fresh data. It compares the player and team SOLDIER scores, as calculated from the published statistics, and fits them against the game results. More recent games are weighted more strongly to reflect the prevailing style of football – and what combinations and strategies beat what combinations and strategies. Significantly better results have been obtained using this approach rather than using all historic data equally weighted to fit the model.

The effects the new 2019 rules will have is very much unknown, and the response in tactics from teams will naturally evolve over the season.


I had planned on presenting more data to back up the above points but time got the better of me and hopefully I’ll expand on this throughout the season.

Without a number of pending and unplanned improvement to my processes, a long-term prediction covering a whole season is not going to be purposefully indicative of reality. Sure, Geelong could top the ladder, but for the above reasons I wouldn’t bank on it!

The SOLDIER model

In this first post for 2019 I will give an outline of the AFL Lab model to be used for its debut full season. I originally started this project from my love of sport and desire to learn about machine learning and data science in general. It also coincided with a career move, which left me with a bit of free time for a while! It is by no means complete, professional and optimised and never will be.



AFL is a team sport that, like many sports, relies on a combination of individual and team efforts. A number of freely-available statistics are accessible to the public, recorded for each player in each game. For many of these statistics, the difference between the teams’ aggregates correlates well with the game outcome. These key statistics are selected from the data and assigned one of seven categories (SOLDIER); and each player in each past game are given a rating in these seven areas. In addition to these “player” ratings, certain team features (derived statistics) that strongly correlate with match outcomes have been identified and this affords a “team” rating in these seven areas. These team statistics are not attributable to particular players and could be considered a descriptor of an overall game plan, or just team performance. A machine learning model was trained using standard supervised learning techniques and parameter tuning. The inputs to the model are the difference in the teams’ aggregate player ratings (seven variables), and the difference in the teams’ team ratings (seven variables), with the output being the game margin. Future games are predicted by using recent data for the selected players and teams to predict each the model inputs for the game, with appropriate error tolerances for variation in form. This allows Monte Carlo simulations of each game, producing a distribution of outcomes. Simulations of past seasons produce accuracy similar to other published AFL model results. The model has the potential to bring deeper insight into many facets of the sport including team tactics, the impact of individual players, and


The aim of building this model is to implement machine-learning techniques to predict and understand the outcomes of AFL games.

Raw Data

There are a number of sources available that compile and store statistics for AFL matches, without which projects like these just can’t go ahead. AFLTables provides a comprehensive coverage of historical matches in an easy-to-handle manner. Footywire provides additional statistics for more recent games, and fills the gaps and provides some text describing games. All three of these sources are implemented and scraped responsibly to maintain a database.

The statistics recorded, and the availability of statistics has changed in the past decade. The full gamut of statistics that is used in the current model have been available since 2014 and so earlier data is not used. While it would be possible to adjust the following analysis to account for missing statistics in past data, a key focus of this work is to consider the changing nature and tactics of AFL football and as such, including earlier games may be counterproductive to understanding the game as it is played today.


The raw statistics were analysed against game outcomes to understand which have the strongest correlation: In each game, for each raw statistic, the sum of away player contributions was subtracted from the sum of home player contributions to obtain a “margin” statistic (if the margin statistic is positive, the home team accrued more of the statistic). The margin statistics were tested against the score margin and a simple Pearson’s correlation coefficient was calculated.

Following this, rather than naively looking at the raw statistics, a number of features (derived statistics) were calculated and tested in the same manner. Features have the potential of providing better context for the raw statistics, for example, a team recording many Rebound 50s (defensive act where the ball is moved out of defensive 50) is not that impressive if their opponent has many more Inside 50s; their success at defending the opponent’s Inside 50s is important, not the raw number.

The relevant raw statistics and features were then allocated different categories depending on what aspect of gameplay they represent. Seven different categories were identified:

  • Scoring – Directly scoring goals/behind, setting up others who do the same.
  • Outside Play – Also called “Uncontested”. Staying out of the contest and being efficient at it.
  • Long Play – Moving the ball quickly with marks and kicks. Getting the ball in the forward 50.
  • Discipline – Also called “Defence”. Doing the tough stuff. Spoils, intercepts, winning free kicks (and not giving away free kicks)
  • Inside Play – Also called “Contested”. Getting the ball in the contest, clearing it from the contest, and tackling. Efficiency important but less so than uncontested play.
  • Experience – How experienced are the players? The number of games played, finals played, Brownlow votes received.
  • aeRial Play – Commanding the ball overhead. Contested marks, hit outs, raw height.

The raw statistics and some of the features can be directly attributed to individual players, but most of the features are representative of the team itself rather than the individuals. These team measures could be considered a way to quantify teamwork and/or game plan. Each chosen statistic and feature has been distributed to each of the above seven categories, each split up by whether they are player-specific or team-specific.

Category Player Examples Team Measure Example
S Scoring Goals, Goal Assists, Points/I50, Marks I50 Percentage
O Outside Metres Gained, Uncontested Possessions Cont. Pos. Ratio
L Long Play Marks, Kicks, Inside 50s Inside 50 Efficiency
D Discipline One Percenters, Intercepts, Free Kicks Rebound 50 Efficiency
I Inside Contested Possessions, Clearances, Tackles Cont. Pos. Margin
E Experience Games Played, Past Brownlow Votes HGA adjustment*
R aeRial Height, Contested Marks, Hit Outs Cont. Marks conceded
*The Team Experience measure is currently taken to be a completely deterministic variable that depends on how far each team has travelled to get to the venue.

For each game, each player and each team get a rating in these seven categories based on the above statistics and features. From this, it is natural to consider extensions such as overall ratings, analysis of form, and determination of a player’s role in a team. However, for the moment, the focus will be on development of the model for predicting match outcomes.

Model Construction

Match outcomes are to be predicted using a machine-learning model. The large number of input variables chosen in this project favours machine-learning models over other models widely (and very successfully) adopted in the sports modelling space.

Machine-learning models, in particular supervised-learning models, are designed to learn from known results and determine non-linear relationships that relate the inputs to particular outcomes. The variety and complexity of machine-learning models is vast, each with their advantages and disadvantages. This project implements techniques in the libraries, allowing many models to be tested side-by-side.

The model has fourteen inputs, and a single output:


  • Margin of player SOLDIER scores (7 variables)
  • Margin of team SOLDIR scores (6 variables)
  • Venue/HGA adjustment (1 variable)


  • Points margin of the game

Fitting models is very simple once player and team SOLDIER scores are calculated and rescaled. A common measure for selecting a model and tuning its parameters is a train-and-test model, where a proportion (say, 70%) of the data is used to fit the model and is tested against the remaining proportion. However, predicting an unplayed game is quite different; the player and team SOLDIER scores are not known a priori. It is necessary make predictions as to how each player and each team will perform in a given game; in order to predict the outcome using the proposed model.

In a previous piece, I examined how one could measure a player’s form, and what other mitigating factors can affect a player’s output. For the game to be predicted, the form of the involved teams and their players are calculated (mean and variation) to determine probable distributions for the inputs to the model. As predicting unplayed games is the goal, simulating games using no foreknowledge (i.e. only considering the past) is the only appropriate way to test the model. The only exception is that the Team Experience (aka Home Ground Advantage) is known as this is determined from the fixture.

Results and Discussion

I have performed full simulations of the 2017 and 2018 seasons to test a variety of models and tune parameters. The testing procedure is as follows, using 2017 as an example:

  1. Train model using pre-2017 data.
  2. Predict round one performances using pre-2017 data.
  3. Predict round one results a large number of times (N=10,000) and record.
  4. Retrain model with real round one data.
  5. Predict round two performances using pre-2017 data and real round one data.
  6. Predict round two results a large number of times (N=10,000) and record.
  7. Repeat 4-6 for remaining rounds.

The large number of predictions gives a distribution that allows a win probability and a median margin to be recorded for comparison against the actual results. In the following tables, the results from four models are presented with the number of tips they got correct, the number of “Bits” (higher is better) and the average error in the margin (lower is better). The “BEST” row is the best performances in each measure from


Model Tips Bits Av Margin
SVR1 120 12.06 31.08
SVR2 125 12.72 30.27
XGBoost 128 11.19 30.18
KNR 121 1.73 30.48
BEST 137 20.57 29.18


Model Tips Bits Av Margin
SVR1 141 35.68 28.42
SVR2 143 34.98 28.11
XGBoost 141 29.55 28.57
KNR 150 33.39 27.80
BEST 147 39.76 26.55


What is immediately noticeable is not only that different models are better at different prediction types, but also performance is season-dependent. On that second point, if these machine-learning models are picking up gameplay and tactics patterns, doesn’t it make sense that this would change from season to season? In training the models, more recent data is given stronger weighting to reflect this and small improvements (consistent but not necessarily statistically significant) have been observed.

The actual performance of the SVR2 model appears to be the most consistent over many seasons and in 2018 was comparable in success to other models with published results. This model, with a few additional tweaks, is the one that will be adopted for the 2019 season.

A deeper investigation into individual games reveals that with all the models, there is a tendency to under-predict the target. Games expected to be blowouts are predicted to be merely comfortable victories. While this does not affect the tip for the game, it evidently does have affect the margin, and to a lesser extent Bits. One example is the 2018, Round 18 game between Carlton and Hawthorn. Hawthorn were expected to win by over 10 goals, and they did. The SVR2 model predicted a median margin of -25 points (away team’s favour).

2018 Round 18, Carlton vs Hawthorn predicted margin distribution (SVR2). Actual margin -72.


That this happens with all models tested suggests an issue with how the inputs to the model are calculated rather than the models themselves. Recall that each player and team performance is simulated based on samples from a normal distribution along with their individual means and variances. This infers that in a given game it’s equally probable that each player will perform better or worse than their average. This doesn’t really make sense! One would assume that against a very strong team, player outputs would be less than a normal distribution would suggest. Of course, against a very weak team, player outputs would be higher. The best way to adjust for this is not obvious and is a focus of ongoing work.

Conclusions and Further Work

The model as presented today is in working order, has the capacity of predicting results in the ballpark of other models, and still has many avenues to improve. In particular, the following have been of interest:

  • Home Ground Advantage: The model still uses a flat score based on where the teams are from and where the game is played. There is clearly a lot more to Home Ground Advantage than that.
  • Team Experience score: Currently this is where Home Ground Advantage lives, originally it was planned to be a measure of how experienced the team is playing together; are there a lot of list changes? Coaching staff changes? This is difficult to quantify, and difficult to account for without manual intervention so it has been shelved for the moment.
  • Weather Effects: Wet weather affects the outcome of AFL matches, especially with regards to the expected scores and efficiency (see Part 1 and Part 2)

The game prediction model is just one arm of this project but is definitely the most technical one. By learning about and improving this model it is hoped that further insights into the sport can be uncovered.

Environmental factors affecting AFL outcomes – the weather, part 2

Today I continue my focus on the weather. In particular, I will look at some key statistics that differentiate dry-weather football from wet-weather football. Unfortunately, for the most part, the results are completely self-evident. However, there is a nugget or two in there that I think are interesting.

In the first part, I discussed how the prevailing weather and conditions of the game affect the outcome, as part of the larger overview of how all environmental factors affect all aspects of the game. There are a few obvious independent variables within the weather space that could affect the game; precipitation, wind and heat.

Precipitation anecdotally affects the game in a number of ways. Rain falling during the game keeps the ball and the ground wet, impacting on the efficiency of skills and even the choice of skills (“wet weather football”). The lasting effect of rain, perhaps after it stops, and other effects such as dew also causes these impacts perhaps to a lesser extent. Wet weather games are anecdotally characterised by “scrappy football”; less handballing, more kicking, and low scores.

Wind comes in a couple of flavours. In all cases the main effect is expected to be on long kicking and consequently goal kicking accuracy. Prevailing winds down the length of the ground provide a bias towards scoring at one end of the ground (i.e. a “five goal breeze”). A prevailing cross wind is a bit more of an unknown. Swirling winds can result from changeable conditions or heavy weather conditions, but also from the geometry of the venue; large grandstands particularly at the goal ends can produce some erratic conditions. It’s possible that this can be somewhat predictable based on knowledge of the venue.

Footy is a winter game but heat sometimes plays a part, especially near the front and back ends of the season. I don’t expect heat to be a huge factor, perhaps it affects player fatigue and creates a more open, high-scoring game.

Evaluating the conditions in past games

It is a relatively straightforward procedure to watch a game of football (or merely some highlights) and, with some knowledge of the game, evaluate the effect of certain environmental conditions on the outcome. An avid football watcher could easily do this on a week-by-week basis and keep a database. However, lacking an ongoing database, it would be very time consuming to individually to do this for a past season’s games, let alone multiple seasons. How can one efficiently and accurately record the conditions at past matches?

In the previous piece I scraped daily rainfall data from the Bureau of Meteorology at the closest weather station to each AFL ground and attached the data to each game. Then I examined the distribution of total points scored for a few different rainfall ranges. The hope was that games with more rainfall would be lower scoring.


Unfortunately this was quite unsuccessful. It did, however, elucidate that what really matters is the conditions at the ground at the time of the game. The “daily rainfall” numbers reset at 9am, there’s a good chance that game-day rain could fall well before or after the game has been played and not affect conditions at all.

I then moved on to looking at published match reports for past games. The main idea is that if the conditions affected the game significantly, it would be discussed in the match report.

Methodology for match report scraping

I chose to use the match reports published on for the simple reason that the URL is formulaic and therefore easy to scrape large quantities of data. In example, is from 2018, Round 17, is an Adelaide home game against Geelong. For all games from 2014 onwards, I scraped the match report text into a database for ease of handling.

I then used Microsoft Excel to flag match reports that contain certain weather-related keywords. The keywords I chose (i.e. rain, slippery, windy, storm, etc.) were borne through a brainstorm and through reading samples of match reports. This allowed me to pass by a vast majority of match reports where weather wasn’t (seemingly) a factor. I also, as a matter of curiosity, flagged some reports where the total points was particularly low.

For the flagged match reports, I pasted the report text into Notepad++ and defined a custom syntax to highlight the list of keywords. This allowed me to efficiently and selectively read match reports to summarise the conditions described.

False flags

If journalists could stop using the following cliches, that would be marvellous:

  • <TEAM> stormed into contention…
  • <TEAM> stormed home…
  • It was raining goals…
  • <TEAM> came home with a wet sail…
  • <PLAYER> put the heat on…
  • etc.

How do you represent the conditions quantitatively?

Now, we have a good summary of the weather for weather-affected games. How do we quantify this in a meaningful way so that it may be used in a model? As mentioned in the first piece, one could be as specific as they like in describing the conditions of a game that’s already happened. This would give a very good measure of the effect on past games. However, my interest (at the moment, at least) is modelling games that haven’t happened yet. Having sophisticated measures for conditions is useless if you can’t predict the conditions with the same accuracy you can measure it with. After looking at the summaries of conditions I recorded, I decided to record weather with four binary (yes or no) variables:

  • If there was mention of wind (or inferred through description of “sideways rain”, etc.) the game would be classified as “windy”.
  • If there was mention of heat it would be a “hot” game.
  • If conditions were slippery (wet ground, actual rain, dew, humidity, etc.) it would be “damp”.
  • If rain fell for a significant portion of the match if would be “rainy” (and of course, also “damp”)

These variables should be very easy to measure in the future, and also relatively predictable from looking at weather forecasts.

Some initial results

The first thing to do is to see if the data passes a sniff test. When looking at the rainfall data I looked at total points scored as a measure. Generally reports mention the rain/damp conditions moreso than wind or heat, so let’s start with this.


This was a relief, the time spent was worth it! The samples are statistically different (t-test: p(Dry~Damp)<10^{-7}, p(Dry~Rain)<10^{-16}, p(Damp~Rain)\approx 0.0015) and are logical in that a dry game is expected to be higher scoring than a damp game, and damp game higher scoring than a rainy game.

For what it’s worth, the “Rain” mode (peak of the curve) is approximately 132 points, “Damp” is 145. The median total score is probably a better measure though:

  • Dry: 178 points
  • Damp: 151.5 points
  • Rain: 136 points

Some problems

While these results look good, they must be scrutinised. The AFL Data Twitterati suggested a number of things to look into when I tweeted the above plot.

  1. What if the conditions just aren’t mentioned in the match report?
  2. Are certain match report authors more likely to mention the conditions?
  3. Is there an agenda in the reporting that might affect exaggerating/understating of the conditions?

These are excellent points. The first was also a prime concern of mine when doing this. To alleviate this going forward, bow that I have a database of past games, for subsequent games I plan to record conditions week-by-week based on my own observations.

Over round 17, when watching games/highlights I kept some notes about the conditions. I noted two games where the conditions were present. Fremantle v Port was affected by rain (and atrocious skills, mind :/) and this was noted in the match report. Hawthorn v Brisbane in Tasmania was beset by dew (as mentioned by the commentators regularly) and conditions were slippery. There was no mention of the slick conditions in the match report. Arguably the conditions were on the minor side and scoring wasn’t hugely affected, but nevertheless I would want this to be recorded for my database.

It is conceivable that my match-report parsing process is mainly flagging games where the adverse conditions had a noticeable effect, or the reporter mentioned it in passing (“skills were good despite the tricky conditions” and the like appeared sometimes). The consequence is that the distributions plotted above are most likely biased. This is not good for predictive purposes; I can predict whether a game will be damp but not whether the teams will perform/score well despite the conditions.

What I can say for sure is the games marked as wet, damp or windy are affected. So let’s see what sets these games apart. Today I’m just going to look at the distributions of certain key statistics that are considerably affected by the weather. Most of it is really self-evident, but it’s always good to have some quantitative confirmation of well-known theories.

Wet Weather Football

What changes? Everything! Well, almost. Let’s start with something that shouldn’t be affected too much as a bit of a control measure. An inside 50 is the movement of the ball (by carrying or disposal) into the forward 50 from outside the 50. I would argue the numbers should be largely independent of the weather; the efficiency will be the main difference.


There’s a noticeable increase in Inside 50s in “Damp” games and it is a statistically significant difference. Speaking of efficiency, let’s look at how “Inside 50 Efficiency” is affected. I define this as:

\text{I50 Efficiency}=\frac{\text{Inside 50s} - \text{Rebound 50s}}{\text{Inside 50s}}\times 100\%

weather-I50efficiency.pngThere is less efficiency in “Rain” games, as expected, but even less in “Damp” games! Perhaps this can be explained by teams not respecting slightly difficult conditions and trying to play a normal game style. While we’re on Inside 50s, Marks Inside 50 are a strong predictor of AFL success.

weather-marksi50Indeed there are less Marks Inside 50s in weather-affected games. Not surprising at all, it’s harder to mark in the wet and harder to hit targets.

Scoring in the wet

Sticking with scoring still, goal accuracy is strongly affected by the weather, not just rain, the wind too.



There are less scoring shots, and a lower goal accuracy. Unfortunately there is no data available on goal attempts that fall short or are kicked out-on-the-full. Strangely (?), the number of behinds scored in games (including rushed behinds) are not distinguishable statistically in different conditions.

Moving the pill

Disposal efficiency is crap in the wet. It drops dramatically and is one of the key differences in wet-weather football. It’s more of a measure of performance rather than tactics. Something that is more of a measure of tactical changes is players choosing to kick or handball.


Perhaps the two stats are related, kicking is more inefficient than handballing but there is the prevailing thought in the wet that you should boot it long rather than dish it around with the hands.

Nevertheless, in the modern game of many stoppages and flooding the contest there’s a lot of contested ball. In fact, in the wet there is much more contested ball than in the dry.


Interestingly, tackles per contested possession (a team measure I used called “Tackling Pressure”) is almost unchanged in the wet. I would have expected tackles to not “stick” as much but the definition of a tackle requires one to affect the efficiency of the disposal, and with many disposals inefficient by default in the wet there may naturally be more tackles recorded.

Picking up the soap

There are a lot of inefficient disposals in the wet, a lot of dropped marks and a lot of stoppages. Picking the ball up and having clean skills is going to be a boon in the wet. Without having numbers on things like “loose ball gets” (I know they’re recorded, just not publicly available!), I have to rely on looking at other stats to infer these things.


A clanger accounts for many different errors including unforced dropped marks, turnovers, free kicks conceded, etc. Also note that intercepts are the consequence of a turnovers. More evidence that wet-weather footy is a scrappy affair.

It’s the little things that count, or not?


Spoils, tap-ons, shepherds, smothers all come under the “One Percenter” stat. On average there are about 25 more per game in the wet. More one percenters fits the narrative of less clean possession. Unfortunately,  the correlation of One Percenters with the outcome of winning a game of footy is very poor.

What about the wind?

In all of the plots above I have plotted distributions for “windy” games as well. I take these with a grain of salt, really. Most of the “windy” distributions are bimodal and probably could be further split into just wind-affected games and rain-and-wind-affected games, but then the sample size would be irrelevant. I would wager most games that are wind-affected but aren’t really obvious are just blown over (yes, I did) in the match reports, so I don’t have a record of it.

What about the heat?

There’s just not enough data to make any meaningful observations.

How do you win wet weather football?

Well I haven’t answered that, and I don’t think anyone can with the available stats. What I can say is that almost all facets of the game are affected. The reduced ability to execute skills properly is a clear result of wet conditions. Being more efficient correlates strongly with winning a game in the AFL so having those skills to handle the wet ball and dispose of it smartly is surely going to be effective. But that’s no revelation.

Scoring accuracy is strongly affected too, the obvious recommendations are kicking straighter and setting up easier shots at goal (introducing more possibilities of turnovers). The publicly available stats are just not good enough to evaluate things like this.

What would be real interesting is to look at stats like loose ball gets. Being able to capitalise on the natural inefficiency of disposals in the wet should be a good predictor of the desired outcome.

I would also think that player positioning would play a key role. Having players in the right zones; close to both pick up loose balls out of a contest and ~60 metres back to intercept long bombs forward seems like the way to go. With lower disposal efficiency it should be less about covering a player and more about covering a probable landing zone.

Aside from analysing player GPS data (which I don’t have and am not good enough to do anyway) a easier measure may be total distance run by a team. I don’t have this data either.

The first few plots are the most interesting to me. In “Damp” games (including things such as dew-affected games, wet ground, etc.) there are counter-intuitively more Inside 50s, and these are less efficient, than both “Dry” and “Rain” games. Do teams neglect to switch into “wet-weather mode” when they should?

I intended to use the weather data in my models to better predict things such as upcoming game totals and margins, and I shall, but with a bit of uncertainty regarding how many of the actually weather-affected games I’ve recorded.

Round 16 Review

Pretty close to a very unique 9/9 but the Giants didn’t quite get up in the end.


I barely lost out on the bits and I seem to be getting some better looking probabilities now (close to other models). It’s going to be difficult to catch up.


Some devastating losses for Sydney, GWS, Essendon, Adelaide this week to really impact their finals chances. Sydney still odds on to make it, but they drop back to the pack of 6 fighting for 5 spots in the 8. GWS a bit of a smoky but injuries (and a pretty key suspension) will probably see them fade in the next couple once their expected “form” catches up.

Out of 10,000 simulations, Richmond made the Top 8 in every single one. It’d take something seriously dramatic (probably involving multiple key player outs) to change that significantly.



Hopefully by next week I’ll have a game-total model so I can simulate actual results (and thus, percentages) because it’s getting pretty bloody tight!


Round 15 Review

A horror round for almost everyone it seems.


A couple of things I’m thankful for:

  • my model’s chronic underestimation of the margin came good this week!
  • Breust the late out for Hawthorn pushed the game in GWS’s favour, yielding me the right tip. (go player models!)


Meanwhile, in the actual footy, things are getting tight around 8th-10th!


You’d have to say percentage is going to play a big part (unfortunately I can’t really simulate that yet)


Round 14 Review


Not a brilliant week with the Melbourne tip but I was happy with it at the time.


The “exotic” tip brings me back to the pack a bit but it’s still a very good pack!


Working on a few more things, including a simulation of the rest of the season.


A lot of changes since last week, West Coast lost out a lot. These simulations are based off no changes to the latest team list, so with JJK/Darling back that’d probably change. The order is based on the mean ladder position from 10,000 season simulations. My percentages are a bit more definitive than some other (better) models’ ladder simulations but isn’t too far off!

I’m also trialing a new measure of the “best team”; I simulate a home-and-away round-robin fixture, so each team plays each other twice at each of their home grounds. I’m still working on a nicer way to present this, but for the moment I’m using the same ladder presentation.


I expected this to be a bit more definitive but it’s much more spread than I thought! Note that again, the percentage is chance of getting in the (hypothetical) top 8. There’s a tremendous divide from 14th and under and it’s VERY tight at the top. It’s a shame the draw isn’t even!


Round 13 Review

I had made some changes to my model on Friday and screwed things up a little bit, after finding the mistake late Friday evening (post-bounce) I noticed my simulation then predicted a Sydney victory but I was happy enough to take the gamble on the Eagles! Some tough games went my way in the tipping (thanks Saints!) but it didn’t reward me with too many bits.


I’m currently drafting some changes to my model to include a predictor variable for weather once I nail down a process for scraping and categorising weather for past matches. I’m sure this will have a positive effect on reducing my MAE; which is unacceptable.

I’m hanging on to second place in my table since I’ve gone “live”… assuming bits are more important 😛



Round 12 Review

A bit of a tough round to tip with plenty of close games, a few upsets and a few blowouts really damaging my MAE, especially with the model’s margin underestimation. Oh well, it’s a beta!


Still measuring up well overall over the generous sample size of two live rounds. The relative indecisiveness in closer games is helping me salvage some Bits.


Onwards and upwards… after this man-flu leaves!


An Introduction to the Model

Absolutely no-one has asked me for details on my methodology yet, and I’m happy to provide answers for these non-existent questions.

I guess the main idea behind the model is the concept of a team being more than just a sum of its players.

\displaystyle P = P_t + \sum_i P_i

Overall performance P is a sum of team-related performance P_t and the total contribution from each of the players P_i.

I arrived at this idea from observing the freely available statistics that are published by invaluable sites such as AFLTables and Footywire. Individual player contributions are easy to see and understand, but there are other features in the stats I was interested in; for example, does a Rebound 50 reflect on the performance of the player awarded the stat, or is it more closely related to the defensive structure of the team as a whole? Is five Rebound 50s worth as much if the opposition have had 80 inside 50s, as opposed to 40?

I divided the relevant statistics (almost all of them?) into different categories of team and player performance. I arrived at seven different categories. As is the go in footy data analysis circles I came up with a snappy acronym; SOLDIER. For each of the categories, I painstakingly weighted each relevant statistic to favour the statistics that better correlate with the outcome (winning the game). A team performance in a game is described by FOURTEEN (!) variables; seven for the sum of player performance and seven for the team performance. For each game, the difference in these fourteen variables is hypothesised to relate to the difference in the final scores, i.e. the margin.

I recognise that this model is considerably more complicated than other footy models I’ve read about online, but footy is a complicated game!

I began this project after spending some time learning about data analysis; in particular applying machine learning techniques. After doing a few beginner projects through sites like Kaggle, I figured I had enough of the basics to give this project a crack. Unlike the rest of my mathematical life where I use techniques that I have a strong base of understanding in, I have no more than a basic understanding of how machine learning actually works.

Once I have a better grasp on machine learning and refine my model, and the many parameters embedded within, I may publish more details on the categories and the statistics important to each.

I hope to be in a position to be able to predict results, rank players, make ladder predictions, but also to see if the machine learning models can give any insights into concepts such as team balance, matching up of teams with different strengths and weaknesses, etc.

This is primarily a learning exercise for me but I believe (please correct me!) that no other well-discussed footy model is using machine learning techniques, so I hope this is of interest.