Second failed benchmark
Published benchmark examples usually have two components: code and data. Ideally, the data should be challenging enough
to expose the strengths of the code, but it is not always chosen that way. This example is taken from
Keras,
another large and well-maintained collection of source code for deep machine learning.
The data used in this Keras example is Wine Quality.
It has 4,898 records with 11 observed features; the target is a quality score from 0 to 10.
Since this data is experimental and only one output value is available for each record, the predicted
distribution cannot be compared to the actual one and accuracy cannot be assessed. So, in order to assess accuracy, I
replaced the Wine Quality data with synthetic
data:

where $C_j$ are uniformly distributed random values on $[0,1]$, $X_j$ are the observed values, $X^*_j$
are the values used in computing the outputs $y$, and the parameter $\delta$ is the error level. When $\delta=0$, the system becomes deterministic
and can be modelled with near 100% accuracy by a neural network, so we are dealing with aleatoric uncertainty only.
The generated data set size was 10,000 records.
This formula was designed by the mathematician Mike Poluektov, so I call it Mike's benchmark data set. Recomputing the outputs
with different random terms $C_j$
allows estimating the probability density of $y$ for each input; these densities are complex enough and depend on the observed inputs $X_j$.
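As a minimal sketch of this recomputation step: Mike's formula itself is not reproduced here, so `target` is a placeholder function and `noisy_inputs` a hypothetical perturbation scheme; only the Monte Carlo procedure, recomputing $y$ many times with fresh $C_j$, follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
DELTA = 0.8  # error level delta used in the text

def target(x_star):
    # Placeholder for Mike's formula (not reproduced here); any smooth
    # deterministic function of the perturbed inputs illustrates the idea.
    return np.sin(x_star.sum(axis=-1)) + x_star.prod(axis=-1)

def noisy_inputs(x, rng):
    # Hypothetical perturbation: the text only says X* depends on the
    # observed X, uniform random terms C_j on [0,1], and the level delta.
    c = rng.uniform(0.0, 1.0, size=x.shape)
    return x * (1.0 + DELTA * (c - 0.5))

def mc_moments(x, n_draws=1000, rng=rng):
    # Monte Carlo 'actual' values: recompute y with fresh C_j each draw,
    # then take the per-record mean and standard deviation.
    ys = np.stack([target(noisy_inputs(x, rng)) for _ in range(n_draws)])
    return ys.mean(axis=0), ys.std(axis=0)

X = rng.uniform(0.0, 1.0, size=(10_000, 11))  # 10 000 records, 11 features
mu_actual, sd_actual = mc_moments(X)
```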
The code from Keras provides only
expectations and standard deviations. So I replaced the experimental
Wine Quality data and compared the expectations and
standard deviations returned by the model to the so-called 'actual', i.e.
Monte Carlo simulated, values.
The error level was $\delta = 0.8$. A single deterministic model for such data gives accuracy near $75\%$.
The accuracy for the returned expectations was near $98\%$ and for the standard deviations near $92\%$.
The accuracy metric used was the Pearson correlation coefficient. The modified code can be found in my repository
(backup location);
it is almost the same as the original, only the data is replaced and an accuracy assessment is added.
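The accuracy assessment itself reduces to two correlations. In this sketch, `mu_pred` and `sd_pred` are assumed to be the expectations and standard deviations returned by the model, and `mu_actual`, `sd_actual` the Monte Carlo values from the sketch above.

```python
from scipy.stats import pearsonr

# Pearson correlation between model outputs and Monte Carlo 'actual' values
acc_mean, _ = pearsonr(mu_pred, mu_actual)  # accuracy for expectations
acc_std, _ = pearsonr(sd_pred, sd_actual)   # accuracy for standard deviations
print(f"expectations: {acc_mean:.2%}, standard deviations: {acc_std:.2%}")
```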
Is it a success or a failure?
It may look like a good result: the data is very complex, a single deterministic model is very inaccurate (the correlation
between the estimated outputs $\hat{y}$ and the given outputs $y$ for a single model is only $75\%$), yet the BNN returns expectations with $98\%$ accuracy
and standard deviations with $92\%$ accuracy.
OK, in the next section I will explain why this result is in fact very weak.
'Vandalization' of the beautiful picture
My idea for a counterexample was to model the variance with another deterministic model and compare its accuracy to the BNN. Assume we built a single deterministic
expectation model $M_E$ by minimizing residual errors; it provides estimated outputs $\hat{y}$. Now we can compute the squared error for each
individual input, $e_i = (M_E(X_i) - y_i)^2$, and build another model $M_V$ using the $e_i$ as new targets, as sketched below.
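A minimal sketch of this two-model procedure, assuming `X_train`, `y_train`, `X_test` are already prepared; `GradientBoostingRegressor` is just a stand-in regressor, not the Kolmogorov-Arnold model used in the actual experiment.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# M_E: deterministic expectation model, fit by minimizing residual errors
M_E = GradientBoostingRegressor()
M_E.fit(X_train, y_train)

# Squared errors e_i = (M_E(X_i) - y_i)^2 become targets for M_V
e = (M_E.predict(X_train) - y_train) ** 2

# M_V: deterministic variance model trained on the e_i
M_V = GradientBoostingRegressor()
M_V.fit(X_train, e)

mu_hat = M_E.predict(X_test)                                # expectation
sd_hat = np.sqrt(np.clip(M_V.predict(X_test), 0.0, None))   # standard deviation
```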
The choice of the models $M_E, M_V$ is not restricted to neural networks. I chose here the one I have been using for several years in research,
$$ M(x_1, x_2, x_3, ... , x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right), $$
which is the Kolmogorov-Arnold representation. This model will be explained in detail elsewhere on this site. For
the moment I only state that in many comparison tests its accuracy turned out to be near that of neural networks,
while training takes much less time. The functions $\Phi_q, \phi_{q,p}$ are not specified prior to training; their
shapes are fully determined during the training process.
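To make the structure of this formula concrete, here is a sketch of its forward evaluation only; the piecewise-linear parametrization via `np.interp` and the fixed `grid` are illustrative assumptions, and the training that shapes the functions is not shown.

```python
import numpy as np

def ka_forward(x, inner, outer):
    """Evaluate M(x) = sum_{q=0}^{2n} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) ).

    x     : array of shape (n,), one input record
    inner : inner[q][p] is the callable phi_{q,p}
    outer : outer[q] is the callable Phi_q
    """
    n = x.shape[0]
    return sum(outer[q](sum(inner[q][p](x[p]) for p in range(n)))
               for q in range(2 * n + 1))

# Illustrative parametrization: each function is a table of node values on a
# fixed grid, evaluated by linear interpolation; training would adjust the
# node values (the outer functions would need a grid covering the inner sums).
grid = np.linspace(0.0, 1.0, 8)
def pwl(node_values):
    return lambda t: np.interp(t, grid, node_values)
```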
Now I simply announce the end result for the Kolmogorov-Arnold model: the accuracy for the expectation is 98%
and for the variance 92%. Two deterministic models did the same job. The code can be found
in my storage; the project name is residual.
Using two deterministic models for uncertainty estimation is, obviously, a much quicker and more reliable process. When a technology
is significantly more complex, it should bring certain advantages, and I did not notice any in this example.
Needless to say, training the two deterministic models took a few seconds, while the BNN needed nearly two minutes on the
same machine.