DDR vs. Bagging
In this experiment we compared a deep ensemble built by bagging with one built by
DDR.
In the original article the deep ensemble is built from neural networks, while our DDR tests use the Kolmogorov-Arnold model.
Since the latter showed the same or better accuracy in all test cases, we use the Kolmogorov-Arnold model for both ensembles.
We built 8 models by bagging and 8 models by DDR, then generated 100 inputs, passed them to both ensembles,
and compared their accuracy.
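The bagging side of this setup can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the base learner (a Kolmogorov-Arnold model in the experiment) and the DDR construction are not reproduced here, so `fit` is a hypothetical placeholder for whatever training routine is used.

```python
import numpy as np

def bagging_ensemble(X, y, n_models=8, fit=None, rng=None):
    """Train n_models base learners on bootstrap resamples of (X, y).

    `fit` is a placeholder for the base learner (a Kolmogorov-Arnold
    model in the experiment); it must return a callable predictor.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    models = []
    for _ in range(n_models):
        # bootstrap: resample the training set with replacement
        idx = rng.integers(0, n, size=n)
        models.append(fit(X[idx], y[idx]))
    return models

def ensemble_outputs(models, x):
    """Array of per-model estimates for one input (8 values here)."""
    return np.array([m(x) for m in models])
```

Passing one input through `ensemble_outputs` yields the array of 8 estimates that is later compared against the Monte Carlo reference sample.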
The data were generated by Mike's formula
with error level $\delta = 0.8$, with $[X_j, y]$ observed and $C_j$ random values uniformly distributed on $[0,1]$.
The observed values were also randomly generated, uniformly distributed on $[0,1]$.
The data set size was $10\,000$ records. When a single deterministic model is built for the entire data set, the Pearson correlation
coefficient between its estimated outputs and the provided outputs is approximately $0.75$, which is close to a real-life case.
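A data set with these properties can be sketched as below. Mike's formula itself is not reproduced in this section, so the `target` function here is an assumed placeholder standing in for it; only the uniform $[0,1]$ inputs, the error level $\delta = 0.8$, and the $10\,000$-record size come from the text.

```python
import numpy as np

def make_dataset(n=10_000, n_features=4, delta=0.8, rng=None):
    """Generate records with uniform [0,1] inputs and noisy outputs.

    Mike's formula is not reproduced here; `target` below is a
    placeholder (an assumption) standing in for it.
    """
    rng = np.random.default_rng(rng)
    X = rng.uniform(0.0, 1.0, size=(n, n_features))  # observed inputs on [0,1]
    C = rng.uniform(0.0, 1.0, size=n)                # random values C_j on [0,1]
    target = X.mean(axis=1)                          # placeholder for Mike's formula
    noise = delta * (C - 0.5)                        # error level delta = 0.8
    y = target + noise
    return X, y
```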
In the accuracy test we generated 100 random inputs, passed them to the models, and obtained three samples per input:
- an array of 8 estimated outputs from the bagging ensemble,
- an array of 8 estimated outputs from the DDR ensemble,
- an array of 1024 possible outputs generated by Monte Carlo.
The Monte Carlo sample is treated as the TRUE output, to which the other two are compared. The first metric was the Pearson correlation coefficient,
applied to the expectations and the standard deviations. The second metric was the Kolmogorov-Smirnov goodness-of-fit test, which shows
whether two samples are drawn from the same population. The results are shown in the program printout below:
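The two metrics can be computed as sketched below, using only numpy: the Pearson correlation is taken across the 100 inputs between the per-input ensemble statistics and the Monte Carlo statistics, and a two-sample Kolmogorov-Smirnov test is run per input against the asymptotic 5% critical value. The shapes (100, 8) and (100, 1024) follow the text; the significance level is an assumption.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D samples."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

def ks_two_sample_pass(sample, reference, alpha_coeff=1.358):
    """Two-sample Kolmogorov-Smirnov test: True if the samples are
    consistent with one population (asymptotic critical value at the
    5% level, c(0.05) = 1.358)."""
    sample, reference = np.sort(sample), np.sort(reference)
    n, m = len(sample), len(reference)
    grid = np.concatenate([sample, reference])
    f1 = np.searchsorted(sample, grid, side="right") / n   # empirical CDF 1
    f2 = np.searchsorted(reference, grid, side="right") / m  # empirical CDF 2
    d = np.max(np.abs(f1 - f2))                            # KS statistic
    return d <= alpha_coeff * np.sqrt((n + m) / (n * m))

def compare_ensembles(ens_outputs, mc_outputs):
    """ens_outputs: (100, 8) ensemble estimates per input;
    mc_outputs: (100, 1024) Monte Carlo reference samples per input.
    Returns correlations of expectations and standard deviations,
    plus the fraction of inputs passing the KS test."""
    r_mean = pearson(ens_outputs.mean(axis=1), mc_outputs.mean(axis=1))
    r_std = pearson(ens_outputs.std(axis=1), mc_outputs.std(axis=1))
    ks_rate = float(np.mean([ks_two_sample_pass(e, m)
                             for e, m in zip(ens_outputs, mc_outputs)]))
    return r_mean, r_std, ks_rate
```

The KS statistic is implemented directly (maximum gap between the two empirical CDFs) so the sketch has no dependencies beyond numpy; a library routine such as SciPy's two-sample KS test would serve equally well.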
The accuracy for the expectation is very high, $0.99$, for both bagging and DDR; however, the standard deviation is not accurately estimated
by bagging: it is $0.29$, whereas DDR gives $0.96$. Also, the Kolmogorov-Smirnov test passed in $84\%$ of cases for DDR, but for bagging
only in $53\%$ of cases.
The general conclusion is that bagging does not return samples accurate enough to support judgments about
input-dependent distributions and
can be used only for estimating expectations.

