Saturday, May 8, 2010

World Cup Statistics

From Javier

We are frequently asked what the most "probable" path through the World Cup is, according to the Isthmus Partners World Cup Simulator. People want us to boldly predict the pairings in the first round of the knockout stage, and then in the quarterfinals, semifinals and final.

What needs to be realised is how difficult it is to predict the World Cup. Let's take just the Group Stage games. There are 8 groups and each group plays 6 games, so there are 8 x 6 = 48 games in the Group Stage alone. How easy would it be to predict the result of every game (Team A wins, draw, or Team B wins)? Well, the number of possible combinations of results is huge, beyond comprehension. It is 3^48, which is roughly 79,766,443,076,872,500,000,000. That is 79.77 billion trillion. I know that each game has a favourite and that you fancy your chances of getting each game right are better than 33.3%. But even if you are 75% confident in every pick, the probability that you get all 48 games right is 0.000101% (about one in a million, calculated from 0.75^48). Do one thing: look at the 48 games now and write down your prediction for each game, just in terms of which team wins or whether it is a draw. Then on June 26, after the Group Stage, compare the actual results against your predictions (it may be a humbling experience). And we have not even considered the different possible scorelines in those 48 games. The combinations are simply staggering.
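For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The constant and function names are my own, and it assumes, as the text does, that every game is an independent pick with the same chance of being right.

```python
GROUPS = 8
GAMES_PER_GROUP = 6      # 4 teams, each pair meets once: 4 * 3 / 2
OUTCOMES_PER_GAME = 3    # Team A wins, draw, Team B wins

group_games = GROUPS * GAMES_PER_GROUP
total_outcomes = OUTCOMES_PER_GAME ** group_games


def prob_all_correct(p_per_game, n_games=group_games):
    """Chance of calling every game right, assuming independent games."""
    return p_per_game ** n_games


print(group_games)                         # number of Group Stage games
print(f"{total_outcomes:,}")               # exact value of 3^48
print(f"{prob_all_correct(1 / 3):.3e}")    # pure guessing
print(f"{prob_all_correct(0.75):.3e}")     # 75% confident in every pick
```

Python integers are exact, so the first large number printed is the full value of 3^48 rather than a rounded one.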

Some have told us that the number of possible combinations in the knockout rounds is much smaller. Well, yes, it is much smaller, but the numbers are still huge. Let's look at the potential pairings in the first round of the knockout stage. The tournament organizers have already determined the following pairings:
Winner Group A - Second Group B
Winner Group B - Second Group A
Winner Group C - Second Group D
Winner Group D - Second Group C
Winner Group E - Second Group F
Winner Group F - Second Group E
Winner Group H - Second Group G
Winner Group G - Second Group H

How many pairings of Groups A & B are possible? In each Group there are 12 different combinations of First/Second (any of 4 teams can finish first, and any of the remaining 3 can finish second). So for the two Groups together there are 144 combinations (= 12 x 12). What about across all Groups? There are 4 such sets in total (A/B, C/D, E/F, G/H), which gives 144^4 = 429,981,696 possible sets of first round pairings. And then in the knockout stage there are 15 games up to and including the final (we exclude the game for 3rd/4th place, as it is the only one not relevant to determining which team wins the World Cup). That is 2^15 = 32,768 combinations in the knockout stage once we know the first round pairings. So we multiply 32,768 x 429,981,696, which equals 14.09 trillion combinations. That is a lot of combinations.
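The counting above can be reproduced in a few lines of Python. This is only a sketch of the same arithmetic, with variable names of my own choosing; it counts first/second orderings by listing them rather than by formula.

```python
from itertools import permutations

TEAMS_PER_GROUP = 4
GROUP_PAIRS = 4        # A/B, C/D, E/F, G/H
KNOCKOUT_GAMES = 15    # 8 + 4 + 2 + 1, third-place game excluded

# Ordered (first, second) finishes within a single group.
first_second = len(list(permutations(range(TEAMS_PER_GROUP), 2)))

# Two groups feed each pair of first round knockout games.
per_group_pair = first_second ** 2

# All four group pairs together fix the first round pairings.
first_round_pairings = per_group_pair ** GROUP_PAIRS

# Each knockout game has two possible winners.
knockout_paths = 2 ** KNOCKOUT_GAMES

total = first_round_pairings * knockout_paths

print(first_second, per_group_pair)
print(f"{first_round_pairings:,}")
print(f"{knockout_paths:,}")
print(f"{total:,}")
```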

So how dare we claim that the World Cup Simulator has any degree of accuracy with a maximum of only 9999 iterations? Once we have modelled the probability distributions of the head-to-head pairings of all the teams and the structure of the World Cup, a few hundred or a few thousand iterations (each iteration being a whole tournament played out by the model) give us a good estimate of what the model is predicting. Big teams will have higher probabilities of winning, but from time to time the simulator plays out a World Cup where a big surprise happens (Algeria or Korea Republic wins). But the probability of a Honduras - Korea DPR final is zero; it just will not happen. This is the strength of Monte Carlo simulation: with the brute force of computers running simulations, we can get an idea of the distribution of outcomes. It is also the weakness, as the human tendency to have blind faith in models could lead to models that predict too narrow a range of outcomes. More about this in future posts.
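To make the idea of "one iteration = one whole tournament" concrete, here is a toy Monte Carlo sketch in Python. It is not the Isthmus Partners model: the eight teams, their ratings, and the rule that a team's win probability is proportional to its rating are all invented for illustration, and it plays only a simple knockout bracket with no Group Stage.

```python
import random
from collections import Counter

# Hypothetical strength ratings, invented for illustration only.
RATINGS = {
    "Team A": 9.0, "Team B": 7.0, "Team C": 6.0, "Team D": 5.0,
    "Team E": 4.0, "Team F": 3.0, "Team G": 2.0, "Team H": 1.0,
}


def play_match(team1, team2, rng):
    """Pick a winner; win chance proportional to rating (an assumption)."""
    p1 = RATINGS[team1] / (RATINGS[team1] + RATINGS[team2])
    return team1 if rng.random() < p1 else team2


def play_tournament(teams, rng):
    """Play one single-elimination bracket and return the champion."""
    remaining = list(teams)
    while len(remaining) > 1:
        # Neighbours in the list meet; winners advance to the next round.
        remaining = [
            play_match(remaining[i], remaining[i + 1], rng)
            for i in range(0, len(remaining), 2)
        ]
    return remaining[0]


def estimate_title_odds(iterations, seed=0):
    """Simulate many tournaments and report each team's share of titles."""
    rng = random.Random(seed)
    wins = Counter(play_tournament(RATINGS, rng) for _ in range(iterations))
    return {team: wins[team] / iterations for team in RATINGS}


odds = estimate_title_odds(9999)
for team, p in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {p:.1%}")
```

Even in this toy version the favourites win most simulated tournaments while weaker teams win only occasionally, and the estimates change little from run to run once there are a few thousand iterations. Anything the model gives no real chance simply never appears in the tally, which is also why a model that is too confident will report too narrow a range of outcomes.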


