Two-class problem in machine learning
Among all classification problems, the case of two classes is special. It is the only case where class labels may be replaced by two numbers,
such as -1.0 and 1.0, and a regression model can be used to estimate probabilities. For three or more classes this approach fails
miserably, but for two classes it works remarkably well.
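The trick can be sketched in a few lines. This is a minimal illustration with an assumed toy data set and an ordinary least-squares regressor, not the model used later in the text: labels are coded as -1.0/+1.0, the regression output is fit to them, and the result is read as a probability with 0 mapping to 50%.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1.0, -1.0)   # two classes as numbers

# least-squares fit with an intercept column
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

f = A @ w                                  # raw regression output, roughly in [-1, 1]
p = np.clip((f + 1.0) / 2.0, 0.0, 1.0)     # read it as a probability: 0 -> 50%
print(p[:5])
```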
For data generation we consider two random circles given by coordinates $x_1, y_1, x_2, y_2$ and radii $r_1, r_2$.
The observed coordinates $x_1, y_1, x_2, y_2$ are uniformly distributed random values from $[0, R]$, and the observed radii are uniformly
distributed on $[0, R/2]$, where $R$ is the range (simply a constant). The output is assigned either -1.0 or 1.0 depending on whether the circles overlap.
Aleatoric uncertainty is modelled by adding unobserved random addends with zero mean to each input. The range of the unobserved addends
is $\delta R$ for coordinates and $\delta R/2$ for radii. In the provided example $\delta = 0.7$.
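The generation procedure can be sketched as follows. The value $R = 100$ and the uniform shape of the zero-mean noise are assumptions for illustration; the text only fixes the noise ranges $\delta R$ and $\delta R/2$ with $\delta = 0.7$.

```python
import numpy as np

R, DELTA = 100.0, 0.7          # assumed range and the noise level from the text
rng = np.random.default_rng(1)

def overlapped(x1, y1, x2, y2, r1, r2):
    # two circles overlap when the distance between centres is below r1 + r2
    return np.hypot(x1 - x2, y1 - y2) < r1 + r2

def make_record():
    x1, y1, x2, y2 = rng.uniform(0.0, R, size=4)    # observed centres
    r1, r2 = rng.uniform(0.0, R / 2.0, size=2)      # observed radii
    # zero-mean unobserved addends: range DELTA*R for coordinates,
    # DELTA*R/2 for radii
    nx = rng.uniform(-DELTA * R / 2, DELTA * R / 2, size=4)
    nr = rng.uniform(-DELTA * R / 4, DELTA * R / 4, size=2)
    target = 1.0 if overlapped(x1 + nx[0], y1 + nx[1],
                               x2 + nx[2], y2 + nx[3],
                               r1 + nr[0], r2 + nr[1]) else -1.0
    return [x1, y1, x2, y2, r1, r2, target]

data = np.array([make_record() for _ in range(4000)])
print(data.shape)
```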
The probabilities for the targets are computed by repeatedly adding random noise and
counting the cases in which the circles overlap. This Monte Carlo simulation is used to estimate the
input-dependent probabilities for the targets, which are conventionally regarded as the true values.
The format of the generated data is shown below. All inputs are different, and the targets (in the last position)
are not necessarily accurate for each individual record, since they are computed for the unobserved (noisy)
values.
48.00, 70.00, 43.50, 87.00, 38.00, 21.50, 1.00
98.00, 32.00, 28.50, 33.00, 33.00, 9.50, 1.00
83.00, 44.00, 48.00, 86.00, 65.00, 25.00, 1.00
0.00, 72.00, 1.00, 25.00, 37.00, 19.00, 1.00
41.00, 49.00, 2.50, 39.00, 78.00, 11.50, 1.00
The data set contains $4000$ records.
Here is a typical printout of the program:
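The Monte Carlo estimate of the target probability for one record can be sketched as below, under the same assumed $R = 100$, $\delta = 0.7$ and uniform zero-mean noise; the trial count is an arbitrary choice.

```python
import numpy as np

R, DELTA = 100.0, 0.7          # assumed range and noise level from the text
rng = np.random.default_rng(2)

def overlap_probability(x1, y1, x2, y2, r1, r2, n_trials=2000):
    """Monte Carlo estimate of P(overlap) under the unobserved noise."""
    # noisy addends for the four coordinates and the two radii, per trial
    nc = rng.uniform(-DELTA * R / 2, DELTA * R / 2, size=(n_trials, 4))
    nr = rng.uniform(-DELTA * R / 4, DELTA * R / 4, size=(n_trials, 2))
    d = np.hypot((x1 + nc[:, 0]) - (x2 + nc[:, 2]),
                 (y1 + nc[:, 1]) - (y2 + nc[:, 3]))
    # fraction of trials in which the noisy circles overlap
    return np.mean(d < (r1 + nr[:, 0]) + (r2 + nr[:, 1]))

# first record from the sample data above
p = overlap_probability(48.0, 70.0, 43.5, 87.0, 38.0, 21.5)
print(round(p, 2))
```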
Data is generated.
Building models ...
Training time 1.95 seconds
Probabilities for validation sample
0.72 0.38 0.71 0.86 0.61 0.80 0.21 0.43 0.18 0.48
0.75 0.90 0.94 0.75 0.97 0.05 0.10 0.33 0.41 0.35
0.97 0.96 0.46 0.26 0.69 0.45 0.08 0.26 0.43 0.99
0.26 0.30 0.28 0.83 0.76 0.93 0.98 0.61 0.87 0.88
0.98 0.92 0.74 0.59 0.98 0.36 0.92 0.83 0.29 0.35
0.84 0.54 0.97 0.61 0.95 0.76 0.30 0.72 0.59 0.96
0.34 0.12 0.57 0.41 0.76 0.74 0.38 0.94 0.77 0.72
0.22 0.44 0.75 0.00 0.47 0.21 0.87 0.88 0.95 0.34
0.53 0.71 0.86 0.22 0.32 0.25 0.94 0.73 0.05 0.61
0.91 0.79 0.91 0.44 0.94 0.94 1.00 0.16 0.90 0.96
Right predictions 91, out of 100
Pearson correlation for Monte Carlo and model probabilities 0.98
The validation set consists of 100 inputs not used in training. The actual probabilities were estimated by Monte Carlo with generated noise.
The printout shows the overlap probabilities $p$ for the 100 validation inputs (the probability of the other outcome is $1.0 - p$). They were compared
to the so-called actual values by the Pearson correlation coefficient, which was $0.98$ for this execution. The number of correct predictions
is $91$, but this result is not as informative as the probabilities. Since the data is approximate and the system is
stochastic, perfectly accurate prediction of outcomes is simply impossible, but accurate prediction of probabilities is possible and even
more critical than the accuracy of outcome prediction.
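The two reported figures can be computed as sketched below. The arrays here are hypothetical stand-ins for the model probabilities, the Monte Carlo probabilities, and the observed $\pm 1$ targets of the 100 validation inputs; only the evaluation logic is illustrated.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = rng.uniform(0.0, 1.0, size=100)                       # Monte Carlo "actual"
p_model = np.clip(p_true + rng.normal(0.0, 0.05, 100), 0, 1)   # model estimates
targets = np.where(rng.uniform(size=100) < p_true, 1.0, -1.0)  # observed outcomes

corr = np.corrcoef(p_model, p_true)[0, 1]       # Pearson correlation of probabilities
pred = np.where(p_model > 0.5, 1.0, -1.0)       # thresholded outcome predictions
right = int(np.sum(pred == targets))            # number of correct predictions
print(f"right {right} / 100, correlation {corr:.2f}")
```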
Why prediction of probabilities is more important than outcomes
Consider day trading as an example: buying and selling stocks within the same day to avoid paying commission. For successful trading,
a trader needs only an accurate estimate of the probabilities that the stock price rises or declines over the next few
hours.
Comparison to expert software
For comparison we chose Infer.NET, a large collection of
libraries and methods for machine learning. The closest example to our task turned out to be the
Bayes Point Machine.
We customized this example to process our data and passed it the same data set whose printout is shown above.
The accuracy of Infer.NET was $81\%$, versus $98\%$ in our implementation.
The number of correct predictions was $82$ for Infer.NET and $91$ in our test.
Implementation details
The class labels were assigned -1.0 or 1.0, and a Kolmogorov-Arnold representation was used as the regression model. The modelled
output value was interpreted as a probability, with 0 mapping to 50%. For stability of the result, elementary bagging was used:
the output is the average of the predictions of four models built concurrently, each initialized from a random
state.
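The bagging scheme can be sketched as follows. The actual regressor is a Kolmogorov-Arnold representation; here a random-feature least-squares model is a stand-in, since only the averaging of four independently seeded models and the output-to-probability mapping are illustrated.

```python
import numpy as np

rng = np.random.default_rng(4)

# toy circle data in the six-input format used above
X = rng.uniform(0.0, 100.0, size=(400, 6))
y = np.where(np.hypot(X[:, 0] - X[:, 2], X[:, 1] - X[:, 3])
             < X[:, 4] + X[:, 5], 1.0, -1.0)

def fit_one(seed):
    r = np.random.default_rng(seed)       # each model starts from its own random state
    W = r.normal(0.0, 0.02, size=(6, 50))
    b = r.uniform(0.0, 2 * np.pi, size=50)
    H = np.cos(X @ W + b)                 # random cosine features (stand-in model)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, w

models = [fit_one(s) for s in range(4)]   # four models, as in the text

def predict_proba(Xn):
    outs = [np.cos(Xn @ W + b) @ w for W, b, w in models]
    f = np.mean(outs, axis=0)                  # average of the four predictions
    return np.clip((f + 1.0) / 2.0, 0.0, 1.0)  # 0 maps to 50%

pr = predict_proba(X[:5])
print(pr)
```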

