Absolutely no-one has asked me for details on my methodology yet, but I’m happy to answer these non-existent questions anyway.

The main idea behind the model is that a team is more than just the sum of its players.

Overall performance is the sum of team-related performance and the total contribution from each of the players.

I arrived at this idea from observing the freely available statistics that are published by invaluable sites such as AFLTables and Footywire. Individual player contributions are easy to see and understand, but there are other features in the stats I was interested in. For example, does a Rebound 50 reflect on the performance of the player awarded the stat, or is it more closely related to the defensive structure of the team as a whole? Are five Rebound 50s worth as much if the opposition have had 80 Inside 50s, as opposed to 40?
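One way to frame that last question is to look at Rebound 50s as a rate rather than a raw count. The snippet below is only an illustration of the arithmetic, not how the model actually treats the stat; the function name `rebound_rate` is made up for this example.

```python
def rebound_rate(rebound_50s: int, opposition_inside_50s: int) -> float:
    """Rebound 50s per opposition Inside 50 (illustrative only)."""
    if opposition_inside_50s <= 0:
        raise ValueError("opposition_inside_50s must be positive")
    return rebound_50s / opposition_inside_50s


# The same five Rebound 50s against two different workloads.
print(rebound_rate(5, 80))  # 0.0625
print(rebound_rate(5, 40))  # 0.125
```

On this view, five Rebound 50s against 40 opposition Inside 50s is twice the rate of five against 80. Whether that rate says more about the player or the team's defensive structure is exactly the open question.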

I divided the relevant statistics (almost all of them?) into different categories of team and player performance, and arrived at seven categories. As is the go in footy data analysis circles, I came up with a snappy acronym: SOLDIER. Within each category, I painstakingly weighted each relevant statistic to favour those that correlate better with the outcome (winning the game). A team's performance in a game is described by FOURTEEN (!) variables: seven for the sum of player performance and seven for the team performance. For each game, the difference between the two teams in these fourteen variables is hypothesised to relate to the difference in the final scores, i.e. the margin.
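To make the shape of the data concrete, here is a minimal sketch of the fourteen-variable difference. Everything in it is an assumption for illustration: the class and function names are hypothetical, the numbers are invented, the category labels are just the letters of the acronym, and a plain weighted sum stands in for whatever machine learning model is actually fitted.

```python
from dataclasses import dataclass
from typing import List

# Placeholder labels only: the letters of the acronym, not category names.
CATEGORIES = ("S", "O", "L", "D", "I", "E", "R")


@dataclass
class TeamGame:
    """One team's performance in one game (hypothetical structure)."""
    player_scores: List[float]  # seven values: summed player performance
    team_scores: List[float]    # seven values: team-level performance

    def features(self) -> List[float]:
        n = len(CATEGORIES)
        if len(self.player_scores) != n or len(self.team_scores) != n:
            raise ValueError("expected seven player and seven team scores")
        return self.player_scores + self.team_scores  # fourteen variables


def feature_difference(home: TeamGame, away: TeamGame) -> List[float]:
    """Element-wise difference of the two teams' fourteen variables."""
    return [h - a for h, a in zip(home.features(), away.features())]


def predict_margin(diff: List[float], weights: List[float],
                   intercept: float = 0.0) -> float:
    """Stand-in linear model: the margin as a weighted sum of differences."""
    if len(diff) != len(weights):
        raise ValueError("diff and weights must have the same length")
    return intercept + sum(w * d for w, d in zip(weights, diff))


# Invented numbers, purely to show the flow from two teams to one margin.
home = TeamGame(player_scores=[1.0] * 7, team_scores=[2.0] * 7)
away = TeamGame(player_scores=[0.5] * 7, team_scores=[1.0] * 7)
diff = feature_difference(home, away)
print(len(diff))                          # 14
print(predict_margin(diff, [2.0] * 14))   # 21.0
```

In practice the weights would be learned from past games rather than set by hand, and the model need not be linear at all.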

I recognise that this model is considerably more complicated than other footy models I’ve read about online, but footy is a complicated game!

I began this project after spending some time learning about data analysis, in particular applying machine learning techniques. After doing a few beginner projects through sites like Kaggle, I figured I had enough of the basics to give this project a crack. Unlike the rest of my mathematical life, where I use techniques I understand well, I have no more than a basic understanding of how machine learning actually works.

Once I have a better grasp of machine learning and have refined my model, and the many parameters embedded within, I may publish more details on the categories and the statistics important to each.

I hope to be in a position to predict results, rank players and make ladder predictions, but also to see if the machine learning models can give any insights into concepts such as team balance, how teams with different strengths and weaknesses match up, etc.

This is primarily a learning exercise for me, but I believe (please correct me!) that no other well-discussed footy model uses machine learning techniques, so I hope this is of interest.

-Adam