



British Premier League I

Code location: PremierLeague3



This example shows an application of DDR to predicting the outcomes of football matches. Since most readers have no experience with sports betting, we start this article with a short introduction. The picture below shows a snapshot of bookmakers' offers from oddsportal.com.



The label 1X2 in the upper right corner denotes the bet type HOME-DRAW-AWAY. This means a bettor can bet money on a home win, an away win or a draw. The left team in each row plays at home and the right team plays away. In the Chelsea-Bournemouth game, Chelsea is the favorite with a chance to win of over 50%. The negative number -323 means that a bettor has to stake 323 in order to win 100 if Chelsea actually wins. Bournemouth is the underdog with a chance to win below 50%, and the positive number 907 means that a bettor wins 907 by staking 100 on an away win. Likewise, a bettor can stake 100 on a draw and win 461 if the match ends in a draw. A winning bettor receives the published amount plus the stake back.
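The numbers in the snapshot are American (moneyline) odds. Below is a minimal Python sketch of how a stake turns into winnings under the convention just described. The function names are hypothetical, and the default stake of 100 is only for illustration.

```python
def payout(american_odds: int, stake: float = 100.0) -> float:
    """Net winnings of a winning bet, not counting the returned stake."""
    if american_odds < 0:
        # Favorite: staking |odds| wins 100, so winnings scale as 100 / |odds|.
        return stake * 100.0 / -american_odds
    # Underdog: staking 100 wins `odds`.
    return stake * american_odds / 100.0


def total_return(american_odds: int, stake: float = 100.0) -> float:
    """Amount handed back to a winning bettor: winnings plus the stake."""
    return payout(american_odds, stake) + stake


# Chelsea-Bournemouth prices from the snapshot.
odds = {"HOME": -323, "DRAW": 461, "AWAY": 907}
for outcome, price in odds.items():
    print(outcome, round(payout(price), 2), round(total_return(price), 2))
```

Consider a stake of 323 on HOME. `payout(-323, 323)` gives exactly 100, which matches the reading of the negative number above.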

Obviously, the odds are calculated from estimated probabilities of all outcomes, under the condition that they provide some profit for the bookmaker. That profit is typically near 5% of the total amount bet, which means that about 95% of all money simply moves from one bettor to another. For that reason, random betting leads to a small loss of about 5% of the average bet.
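The bookmaker's cut can be read directly from the odds. A minimal sketch follows, assuming the standard conversion of American odds to break-even ("implied") probabilities. The proportional normalization used to get margin-free probabilities is one simple choice among several and is an assumption, not something stated in this article.

```python
def implied_probability(american_odds: int) -> float:
    """Probability at which a bet at these odds would exactly break even."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


odds = {"HOME": -323, "DRAW": 461, "AWAY": 907}
implied = {k: implied_probability(v) for k, v in odds.items()}

book_sum = sum(implied.values())   # exceeds 1 because of the bookmaker's cut
margin = 1.0 - 1.0 / book_sum      # share of a balanced book kept by the bookmaker
# Assumed normalization: scale the implied probabilities so they sum to 1.
fair = {k: p / book_sum for k, p in implied.items()}

print(f"book sum {book_sum:.4f}, margin {margin:.2%}")
print({k: round(p, 3) for k, p in fair.items()})
```

For the Chelsea-Bournemouth prices, the implied probabilities add up to about 1.041. That corresponds to a margin of roughly 4%, in line with the "near 5%" figure.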

The explanation of how and why it is possible to win money by mathematical modelling is very simple. Assume that the bookmakers' models are very accurate, but the public has biases, prejudices and favorite teams, so people bet on HOME in disproportionate numbers. In this case, bookmakers make the other bets more attractive and raise the AWAY winning amount. Now assume that the actual chance of the AWAY team winning is 1/8 and not 1/10. Then, statistically, in every 8 virtual repetitions of a 100 bet the bettor loses 700 and wins 907. The software only needs to identify such slightly underestimated outcomes. They exist not because bookmakers don't use AI models, but because bookmakers have to adjust their odds to a biased public.
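The arithmetic of this argument is an expected-value calculation. Here is a minimal sketch, with hypothetical function names and a stake of 100 per bet:

```python
def payout(american_odds: int, stake: float = 100.0) -> float:
    """Net winnings of a winning bet (stake not included)."""
    if american_odds < 0:
        return stake * 100.0 / -american_odds
    return stake * american_odds / 100.0


def expected_profit(p_win: float, american_odds: int, stake: float = 100.0) -> float:
    """Average profit per bet when the true chance of winning is p_win."""
    return p_win * payout(american_odds, stake) - (1.0 - p_win) * stake


# Break-even probability implied by +907, then 1/10, then the assumed true 1/8.
for p in (100 / 1007, 1 / 10, 1 / 8):
    print(f"p = {p:.4f}: expected profit per 100 bet = {expected_profit(p, 907):.2f}")

# Eight bets at p = 1/8: on average 7 losses of 100 and one win of 907.
print(8 * expected_profit(1 / 8, 907))
```

At the bookmaker's implied probability of 100/1007, the AWAY bet breaks even. At a true probability of 1/8 it earns about 25.9 per 100 staked, which is 207 over eight bets (907 - 700).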

Inputs and outputs

The inputs are the names of the teams and the outputs are the classes HOME, DRAW and AWAY. However, we use a regression model, so we need to convert everything to real-valued inputs and outputs. We train the model on the goal difference, so the predicted difference becomes a real number. We then round it to an integer and determine the outcome from its sign. The names of the teams are converted to positions in the current standings. The standings are the list of teams sorted by their results at the end of the season; here is an example:

Standings for 2019-2020, British Premier League


We can assign the integers 0 through 19 to the teams, so for the match Liverpool-Chelsea the inputs for the next season, 2020-2021, become $[0,3]$. There are, however, two problems. First, the bottom 3 teams are relegated and 3 new teams are promoted, so we simply replace the names of the relegated teams with the names of the new ones. Second, we have to account for new results. After each match, we replace the old result of that fixture with the new one. We then recompute the positions according to the British Premier League scoring rules, which award 3 points for a win and 1 for a draw and break ties on goal difference and goals scored. So after their match, Liverpool and Chelsea may change positions in the standings and be assigned different input values in the next predictions.
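Below is a minimal Python sketch of the input encoding and output decoding described in this section. It assumes the results are kept in a dictionary keyed by the (home, away) fixture. That layout, all function names and the alphabetical final tie-break are hypothetical. Any further official tie-break rules are not modelled.

```python
from collections import defaultdict


def standings(results: dict) -> dict:
    """results maps (home, away) -> (home_goals, away_goals).
    Returns team -> position (0 = top) using 3/1/0 points,
    then goal difference, then goals scored."""
    points, scored, conceded = defaultdict(int), defaultdict(int), defaultdict(int)
    for (home, away), (hg, ag) in results.items():
        scored[home] += hg
        conceded[home] += ag
        scored[away] += ag
        conceded[away] += hg
        if hg > ag:
            points[home] += 3
        elif hg < ag:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Sort descending by points, goal difference, goals scored; then by name.
    order = sorted(scored, key=lambda t: (-points[t], conceded[t] - scored[t], -scored[t], t))
    return {team: pos for pos, team in enumerate(order)}


def promote(results: dict, replacements: dict) -> dict:
    """Give each promoted team the old results of the relegated team it replaces."""
    return {
        (replacements.get(h, h), replacements.get(a, a)): score
        for (h, a), score in results.items()
    }


def record(results: dict, home: str, away: str, hg: int, ag: int) -> None:
    """Overwrite the previous result of this fixture with the new one."""
    results[(home, away)] = (hg, ag)


def encode(results: dict, home: str, away: str) -> list:
    """Model inputs for a match: current positions of both teams."""
    pos = standings(results)
    return [pos[home], pos[away]]


def decode(predicted_goal_difference: float) -> str:
    """Round the regression output and read the outcome from its sign."""
    gd = round(predicted_goal_difference)  # Python rounds exact halves to even
    return "HOME" if gd > 0 else "AWAY" if gd < 0 else "DRAW"


# Toy three-team league.
results = {
    ("A", "B"): (2, 0), ("B", "A"): (1, 1),
    ("A", "C"): (0, 1), ("C", "A"): (0, 0),
    ("B", "C"): (3, 0), ("C", "B"): (0, 2),
}
print(standings(results))          # B first, then A, then C
record(results, "C", "A", 3, 0)    # this season C beats A at home
print(encode(results, "C", "A"), decode(0.7))
```

Recomputing the whole table after every match is simple and cheap for 20 teams. An incremental update would only matter for much larger leagues.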

Training

Using historical records of all scores and standings from 2004 through 2019, we trained a DDR model with the Kolmogorov-Arnold representation as its deterministic component. The historical data is publicly available in multiple places; we used oddsportal.com.
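For reference, here is the classical Kolmogorov-Arnold representation of a function of two inputs. It is written with the home and away positions $x_1$, $x_2$ as inputs and the expected goal difference $\hat{y}$ as output. This is the textbook form with $2n+1 = 5$ outer terms for $n = 2$ inputs. The number of terms and the kind of basis functions used in the actual code are not stated in this article and may differ.

$$\hat{y} = \sum_{q=0}^{4} \Phi_q\left(\phi_{q,1}(x_1) + \phi_{q,2}(x_2)\right)$$

Here $\Phi_q$ and $\phi_{q,p}$ are continuous functions of one variable whose shapes are identified during training.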

Prediction

The predicted season is 2019-2020. As other authors have noted, and as our tests confirmed, it is very hard to predict a DRAW accurately. For that reason, we excluded DRAW and predict either HOME or AWAY depending on the sign of the predicted goal difference. The summary printed by the code is below:



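The article does not show how the outcome probabilities are obtained from the model. As a hypothetical sketch only, assume that the DDR model returns a sample of possible goal differences for a match. The probabilities can then be estimated as the shares of rounded samples above, at and below zero. The two-way prediction rule described above uses only the sign of the mean predicted goal difference. The function names and the sample values are illustrative.

```python
from statistics import mean


def outcome_probabilities(gd_samples: list) -> tuple:
    """Hypothetical: estimate (P(HOME), P(DRAW), P(AWAY)) from sampled goal differences."""
    n = len(gd_samples)
    rounded = [round(g) for g in gd_samples]
    p_home = sum(1 for g in rounded if g > 0) / n
    p_away = sum(1 for g in rounded if g < 0) / n
    return p_home, 1.0 - p_home - p_away, p_away


def predict_two_way(gd_samples: list) -> str:
    """DRAW is never predicted: the sign of the mean goal difference decides."""
    return "HOME" if mean(gd_samples) >= 0 else "AWAY"


samples = [1.2, 0.4, -0.8, 2.0]  # made-up model output for one match
print(outcome_probabilities(samples), predict_two_way(samples))
```

DRAW is never predicted, but its probability still matters for betting, because a draw makes both a HOME bet and an AWAY bet lose.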
Given the bookmakers' odds and the estimated probabilities, the program selects the outcome with the highest expected profit, not the most probable outcome. After placing the bet, the code reads the actual result and updates the balance. There are 20 teams, and each plays every other team once at home and once away, so there are 380 matches in total. The program bets on every match, although some could be skipped for a higher profit. Interestingly, our AI decides to bet on underdogs, and it makes 124 correct predictions out of 380 (about 33%). Always betting on favorites gives about 50% correct predictions, but the balance becomes negative, with a loss of 5% to 15% of the money. The code is not specifically written to choose underdogs; this is the result of training. Betting on underdogs in all matches also leads to a loss, so the underdog bets have to be selected case by case. The amount $5553.46 is the profit that could be obtained by following this logic. The profit varies between runs of the code due to random starting points and may range from 5% to 25%.
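The betting loop described above can be sketched as follows. This sketch assumes a fixed stake of 100 per match, American odds, and unconditional probabilities P(HOME) and P(AWAY), where a draw loses either bet. The function names and the tuple layout are hypothetical. The example shows how an underdog bet can be chosen even when a home win is judged far more likely.

```python
def payout(american_odds: int, stake: float = 100.0) -> float:
    """Net winnings of a winning bet (stake not included)."""
    if american_odds < 0:
        return stake * 100.0 / -american_odds
    return stake * american_odds / 100.0


def choose_bet(p_home, p_away, odds_home, odds_away, stake=100.0):
    """Pick the side with the higher expected profit, not the likelier side."""
    ev_home = p_home * payout(odds_home, stake) - (1.0 - p_home) * stake
    ev_away = p_away * payout(odds_away, stake) - (1.0 - p_away) * stake
    return ("HOME", ev_home) if ev_home >= ev_away else ("AWAY", ev_away)


def run_season(matches, balance=0.0, stake=100.0):
    """matches: iterable of (p_home, p_away, odds_home, odds_away, actual_result).
    Bets on every match and returns (final balance, number of correct picks)."""
    correct = 0
    for p_home, p_away, odds_home, odds_away, actual in matches:
        pick, _ = choose_bet(p_home, p_away, odds_home, odds_away, stake)
        odds = odds_home if pick == "HOME" else odds_away
        if pick == actual:
            balance += payout(odds, stake)  # stake is returned, winnings added
            correct += 1
        else:
            balance -= stake                # stake is lost
    return balance, correct


# Home win judged at 60%, away win at 15%: the underdog still has the higher expected profit.
print(choose_bet(0.60, 0.15, -323, 907))
print(run_season([(0.60, 0.15, -323, 907, "HOME"), (0.60, 0.15, -323, 907, "AWAY")]))
```

In this example, the AWAY bet wins because the assumed 15% chance exceeds the roughly 10% implied by +907. That is the kind of underestimated outcome the introduction describes.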

Don't try this at home

The answer to the question 'Do you play?' is 'No'. There are several reasons for that, but I will refrain from explaining them.