British Premier League I
This example shows an application of DDR to predicting the outcomes of football matches.
Since most readers have no experience in sports betting, we start this article
with a short introduction. The picture below shows a snapshot of bookmakers' offers from oddsportal.com.
The symbol $1X2$ in the upper right corner is a label denoting the bet type HOME-DRAW-AWAY, which means the bettor can bet money on
a home win, a draw or an away win. The team listed on the left plays at home and the team on the right plays away. In the game
Chelsea-Bournemouth, Chelsea is the favorite: its chance to win is over 50%, and the negative number -323 means that the bettor has
to stake 323 in order to win 100 if Chelsea actually wins. Bournemouth is the underdog: its chance to win is below 50%,
and the positive number 907 means that the bettor can win 907 by betting 100 on the away win. Likewise, the bettor can bet 100 on a draw
and win 461 if the match is drawn. A winning bettor receives the published amount plus the original stake back.
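The reading of the numbers -323, 461 and 907 above can be written down in a few lines. This is a minimal sketch, not code from the project: the function names `payout` and `implied_probability` are ours, and the implied probability is simply the break-even win probability for the quoted number, so it still contains the bookmaker's margin.

```python
def payout(american_odds: int, stake: float) -> float:
    """Net winnings for a winning bet (the stake is returned on top of this)."""
    if american_odds < 0:
        # Negative number: stake |odds| to win 100.
        return stake * 100.0 / -american_odds
    # Positive number: stake 100 to win `odds`.
    return stake * american_odds / 100.0


def implied_probability(american_odds: int) -> float:
    """Win probability at which a bet at these odds breaks even."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


# Chelsea-Bournemouth example from the text.
for label, odds in [("HOME", -323), ("DRAW", 461), ("AWAY", 907)]:
    print(label, round(payout(odds, 100.0), 2), round(implied_probability(odds), 3))
```

For Chelsea the implied probability is 323/423, about 76%, which agrees with the statement that the favorite's chance is over 50%.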
The odds are, of course, calculated from the estimated probabilities of all outcomes, with a margin that provides some profit for the bookmaker,
typically near 5% of the total amount bet. That means that about 95% of all money moves from one bettor to another.
For that reason random betting leads to a small loss, near 5% of the average bet.
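The bookmaker's margin can be estimated from the published numbers themselves: the implied probabilities of the three outcomes add up to more than 1, and the excess is the margin. The sketch below is our illustration with hypothetical function names. Rescaling the probabilities proportionally so they sum to 1 is one common convention, not necessarily the one bookmakers use.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability for American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def overround(odds_list):
    """Excess of the summed implied probabilities over 1 (the margin)."""
    return sum(implied_probability(o) for o in odds_list) - 1.0


def fair_probabilities(odds_list):
    """Implied probabilities rescaled to sum to 1 (proportional convention)."""
    raw = [implied_probability(o) for o in odds_list]
    total = sum(raw)
    return [p / total for p in raw]


# HOME, DRAW, AWAY numbers of the Chelsea-Bournemouth example.
odds = [-323, 461, 907]
print(f"margin: {overround(odds):.3f}")
print([round(p, 3) for p in fair_probabilities(odds)])
```

For the Chelsea-Bournemouth numbers the excess is roughly 4%, in line with the typical figure quoted above.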
The explanation of how and why it is possible to win money by mathematical modelling is very simple. Assume that the bookmakers'
models are very accurate, but the public has biases, prejudices and favorite teams, and people bet on HOME in disproportionate
numbers. In this case bookmakers make the other bets more attractive and raise the AWAY winning amount. Assume that the actual chance of winning
for the AWAY team is 1/8 and not 1/10. Statistically, out of every 8 such bets of 100 the bettor loses 7 (700 in total) and wins one (907), a net gain of 207.
The software only needs to identify slightly underestimated outcomes, which are present not because bookmakers don't use
AI models, but because they have to adjust their odds to a biased public.
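The arithmetic of this argument is an expected-value calculation, which can be checked with a short sketch. The function `expected_profit` is a hypothetical helper of ours. It assumes a fixed stake of 100 and that the stake is returned on a win.

```python
def expected_profit(win_probability: float, american_odds: int,
                    stake: float = 100.0) -> float:
    """Average profit per bet: p * winnings - (1 - p) * stake."""
    if american_odds < 0:
        winnings = stake * 100.0 / -american_odds
    else:
        winnings = stake * american_odds / 100.0
    return win_probability * winnings - (1.0 - win_probability) * stake


# AWAY at 907 with the "true" chance 1/8 versus the chance 1/10.
per_bet_true = expected_profit(1 / 8, 907)   # 113.375 - 87.5 = 25.875
per_bet_book = expected_profit(1 / 10, 907)  # 90.7 - 90.0, about 0.7

print(per_bet_true, 8 * per_bet_true, per_bet_book)
```

At a chance of 1/10 the bet is close to break-even, while at 1/8 it gains about 25.9 per bet, or 207 over eight bets, which is the same as 907 - 700.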
Inputs and outputs
The inputs are the names of the teams and the outputs are the classes HOME, DRAW, AWAY, but we use a regression model, so we need to convert
all of them to real-valued inputs and outputs. We train the model on goal difference, so the predicted difference
is a real number. We then round it to an integer and determine the outcome. The names of the teams are converted to positions in the current standings.
The standings are a list sorted as of the end of the season; here is an example:
Standings for 2019-2020, British Premier League
We can assign the integers 0 through 19 to the teams, so for the match Liverpool-Chelsea the inputs become $[0,3]$ for the next
season, 2020-2021. There are, however, two problems. First, the bottom 3 teams are relegated and 3 new teams are promoted, so
we simply replace the relegated team names with the new names. Second, we have to account for new
results. After each match, we replace the old result with the new one and recompute the positions according to the
Premier League ranking rules, which use the number of wins, the number of draws and goals. So after the
match, Liverpool and Chelsea may change positions in the standings and be assigned different input values in the next match.
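The conversion of team names to standings positions, and the update after each match, can be sketched as follows. This is our illustration, not the project's code. It assumes the usual ranking of 3 points for a win and 1 for a draw, with ties broken by goal difference and then by goals scored. The data layout (one stored result per home-away fixture) and the team names A, B, C are hypothetical.

```python
from collections import defaultdict


def standings(results):
    """results maps (home, away) -> (home_goals, away_goals).
    Returns a dict team -> 0-based position in the table."""
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    goals_for = defaultdict(int)
    for (home, away), (hg, ag) in results.items():
        goals_for[home] += hg
        goals_for[away] += ag
        goal_diff[home] += hg - ag
        goal_diff[away] += ag - hg
        if hg > ag:
            points[home] += 3
        elif hg < ag:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Sort by points, then goal difference, then goals scored (name as last tie-break).
    order = sorted(goals_for,
                   key=lambda t: (-points[t], -goal_diff[t], -goals_for[t], t))
    return {team: pos for pos, team in enumerate(order)}


def encode_match(results, home, away):
    """Model inputs for a match: [home position, away position]."""
    table = standings(results)
    return [table[home], table[away]]


# Toy league with three hypothetical teams.
results = {
    ("A", "B"): (2, 0), ("B", "A"): (1, 1),
    ("A", "C"): (3, 1), ("C", "A"): (0, 0),
    ("B", "C"): (1, 0), ("C", "B"): (2, 2),
}
before = encode_match(results, "B", "C")

# A new B-C result replaces the old one; positions are recomputed.
results[("B", "C")] = (0, 4)
after = encode_match(results, "B", "C")
print(before, after)
```

In the toy data the teams B and C swap places after the new result, so the same fixture is encoded with different inputs the next time it is needed. Replacing a relegated team amounts to renaming its keys in `results`.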
Having historical records of all scores and standings from 2004 through 2019, we trained a DDR model with the Kolmogorov-Arnold
representation as the deterministic component. The historical data is publicly available in multiple places; we used
The predicted season is 2019-2020.
As noticed by some other authors and confirmed in
our tests, it is very hard to predict a DRAW accurately. For that reason we excluded DRAW and predict either HOME or AWAY depending on the sign
of the predicted goal difference. The summary printed by the code is below:
Given the bookmakers' odds and winning amounts and the estimated probabilities, the program selects the outcome with the highest expected profit, not the
most probable outcome. After placing the bet, the code reads the actual result and updates the balance.
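The selection and settlement steps can be sketched as below. This is a simplified illustration under our own assumptions, not the program itself: a fixed stake of 100, only HOME and AWAY considered (DRAW is excluded, as described above), and made-up probabilities standing in for the model's estimates. All function names are hypothetical.

```python
def winnings(american_odds: int, stake: float) -> float:
    """Net winnings for a winning bet at American odds."""
    if american_odds < 0:
        return stake * 100.0 / -american_odds
    return stake * american_odds / 100.0


def choose_bet(probabilities, odds, stake=100.0):
    """Return (outcome, expected profit) with the highest expected profit."""
    best = None
    for outcome, p in probabilities.items():
        ev = p * winnings(odds[outcome], stake) - (1.0 - p) * stake
        if best is None or ev > best[1]:
            best = (outcome, ev)
    return best


def settle(balance, bet_outcome, actual_outcome, odds, stake=100.0):
    """Update the balance once the actual result is known."""
    if bet_outcome == actual_outcome:
        return balance + winnings(odds[bet_outcome], stake)
    return balance - stake


odds = {"HOME": -323, "AWAY": 907}
# Hypothetical model estimates; HOME is more probable, AWAY is underpriced.
probabilities = {"HOME": 0.72, "AWAY": 0.125}

outcome, ev = choose_bet(probabilities, odds)
print(outcome, round(ev, 3))
print(settle(1000.0, outcome, "HOME", odds), settle(1000.0, outcome, "AWAY", odds))
```

With these numbers the sketch bets on AWAY even though HOME is far more probable, which shows how a rule based on expected profit can end up preferring underdogs.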
The number of teams is 20, and each plays every other at home and away, so there are 380 matches in total. The program
bets on every one, although some could be skipped for higher profit. Interestingly, our AI decides to bet on underdogs: it makes 124
correct predictions out of 380. If one always bets on favorites, the share of correct predictions is near 50%, but the balance
is negative, a loss of 5% to 15% of the money. The code is not specifically written to choose underdogs; this is a result of the training.
Betting on underdogs in all matches leads to a loss, so the cases have to be selected. The amount $5553.46 is the profit that could be obtained
by following this logic. The profit varies between runs of the code due to random starting points; it may be from 5% to 25%.
Don't try that at home
The answer to the question 'Do you play?' is 'No'. There are several reasons for that, but I will refrain from explaining them.