Multimodal posterior distribution

When models such as Bayesian Neural Networks (BNN) and Divisive Data Resorting (DDR) return distribution parameters or output samples for inputs not used in training, we wish to know how accurate they are. In experimental data or recorded observations of physical systems, the inputs are usually all different, so each input comes with only a single observed output. Such data cannot be used to assess the accuracy of the returned expectations, let alone of the distributions. For comparison we need the so-called true or actual distributions, and the only way to get them is to use synthetic data. The generated data must be challenging enough to expose the weaknesses and strengths of the models. Challenging means that the outputs should have non-normal, multimodal distributions that vary significantly across inputs.

The simulating formula that meets these needs was derived by the authors of this site:

where $C_j$ are random values uniformly distributed on $[0,1]$, $X_j$ are the observed values, $X^*_j$ are the values used in computing the outputs $y$, and the parameter $\delta$ is the error level (we use $\delta = 0.8$).

Below are two example histograms for different inputs:

Bayesian Neural Network test

For the BNN test we used a published code sample, referred to here as the Keras benchmark. The original Wine Quality data used in the published example was replaced by 10,000 records generated by the above formula. The slightly modified Python code, with the new data and the assessed results, can be found in the author's repository.

After training is completed, the test program generates 100 new inputs and passes them to the model, which returns 100 output samples, each an array of 1024 possible output values. They are compared to arrays of the same size generated by the formula. We can then compute and compare expectations, variances and even histograms.
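As a rough illustration of this comparison step, the sketch below computes the expectation, variance and a 4-bar histogram for one predicted and one actual sample of 1024 values. The stand-in arrays, the helper name `sample_statistics` and the choice of shared bin edges spanning both samples are assumptions made for this example and are not taken from the published code.

```python
import numpy as np

def sample_statistics(samples, bin_edges):
    """Return mean, variance and normalized histogram of one sample array."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    return samples.mean(), samples.var(), counts / counts.sum()

rng = np.random.default_rng(0)

# Stand-in data (assumption): a bimodal "actual" sample and a
# near-normal "predicted" sample, 1024 values each.
actual = np.concatenate([rng.normal(-2.0, 0.5, 512),
                         rng.normal(2.0, 0.5, 512)])
predicted = rng.normal(0.0, 2.0, 1024)

# Shared bin edges (assumption): 4 equal-width bars covering both samples,
# so that the two histograms are directly comparable.
lo = min(actual.min(), predicted.min())
hi = max(actual.max(), predicted.max())
edges = np.linspace(lo, hi, 5)

mean_a, var_a, hist_a = sample_statistics(actual, edges)
mean_p, var_p, hist_p = sample_statistics(predicted, edges)

print("actual:   ", mean_a, var_a, hist_a)
print("predicted:", mean_p, var_p, hist_p)
```

Repeating this for each of the 100 test inputs gives 100 pairs of expectations, variances and histograms to compare.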

The metric for expectations was the Pearson correlation coefficient computed over the 100 predicted and actual values. The same metric was used for the variances. The histograms were compared by Kullback-Leibler (KL) divergence and, in order to have a single value as an accuracy measure, we report the average KL divergence over the best 90% of the 100 tested samples. The histograms are computed with 4 bars. Although 4 bars give a rather rough estimate of a continuous distribution, they are enough to tell a unimodal distribution from a multimodal one, or a symmetric one from a skewed one.
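The three metrics can be sketched as follows. This is a minimal illustration, not the code from the repository: the function names, the direction of the divergence (actual relative to predicted) and the small smoothing constant `eps` that protects against empty bars are all assumptions.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def kl_divergence(p, q, eps=1e-9):
    """KL divergence of histogram p relative to histogram q.

    eps (assumption) is added to every bar to avoid log(0) and
    division by zero; both histograms are then renormalized.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def mean_of_best(values, keep_fraction=0.9):
    """Average of the smallest keep_fraction share of the values."""
    values = np.sort(np.asarray(values, dtype=float))
    n_keep = max(1, int(round(keep_fraction * len(values))))
    return float(values[:n_keep].mean())

# Small checks on hand-made 4-bar histograms.
bimodal = [0.5, 0.0, 0.0, 0.5]
unimodal = [0.1, 0.4, 0.4, 0.1]
kl_same = kl_divergence(unimodal, unimodal)
kl_diff = kl_divergence(bimodal, unimodal)
print(kl_same, kl_diff)
```

With 100 tested inputs, `mean_of_best` would be applied to the 100 per-input KL values, and `pearson` to the 100 predicted and actual expectations (and likewise to the variances).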

For visualization we provide two pairs of 4-bar histograms, with KL divergences of 0.11 and 1.24.

Two histograms with KL-divergence = 0.11

Two histograms with KL-divergence = 1.24

Test results for 8 executions of Python benchmark example
Expectation    0.99  0.99  0.99  0.99  0.99  0.99  0.99  0.99
Variance       0.93  0.92  0.86  0.91  0.95  0.92  0.94  0.87
KL divergence  1.33  1.34  1.35  1.16  1.51  1.34  1.26  1.31

All distribution samples returned by this Keras benchmark model for the 100 test inputs are near normal; one of them is shown below as an example. They do not even remotely resemble the actual distributions.

Divisive Data Resorting test

The deterministic component of the model was the Kolmogorov-Arnold representation (details of the training of this model can be found in the published paper)

$$ M(x_1, x_2, x_3, ... , x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right).$$
The expectation values are obtained from a single deterministic model $M_E$ by minimizing the residual errors $e_i$

$$e_i = [y_i - M_E(X^i)]^2.$$
The variances were also computed by a single deterministic model $M_V$, trained on the new output values $v_i$

$$v_i = [y_i - M_E(X^i)]^2$$
by minimizing the residuals

$$e_i = [v_i - M_V(X^i)]^2.$$
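The two-stage procedure can be illustrated with a small sketch. Note the substitution: instead of the Kolmogorov-Arnold model, a least-squares polynomial fit is used for both $M_E$ and $M_V$, purely to keep the example short. The one-dimensional synthetic data, the polynomial degrees and the variable names are likewise assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic heteroscedastic data (assumption): the noise level grows with x.
x = rng.uniform(0.0, 1.0, 5000)
true_mean = np.sin(3.0 * x)
true_sd = 0.1 + 0.4 * x
y = true_mean + true_sd * rng.normal(size=x.size)

# Stage 1: fit M_E by least squares on the outputs y_i.
coef_e = np.polyfit(x, y, deg=5)
m_e = np.polyval(coef_e, x)

# Stage 2: form new outputs v_i = [y_i - M_E(X^i)]^2 and fit M_V to them,
# again by least squares.
v = (y - m_e) ** 2
coef_v = np.polyfit(x, v, deg=3)
m_v = np.polyval(coef_v, x)

# Compare with the known expectation and variance of the generator.
corr_mean = float(np.corrcoef(m_e, true_mean)[0, 1])
corr_var = float(np.corrcoef(m_v, true_sd ** 2)[0, 1])
print(corr_mean, corr_var)
```

Because the expected value of $v_i$ is the conditional variance of $y$ when $M_E$ is accurate, a least-squares fit to $v_i$ estimates the variance as a function of the input.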
The KL divergence was estimated by the DDR model with a sliding window (details in the archive article). The source code can be found in the author's repository. The test results are shown in the table below.

Test results for 8 executions of DDR code
Expectation    0.98  0.98  0.99  0.99  0.99  0.99  0.99  0.99
Variance       0.97  0.97  0.97  0.96  0.97  0.96  0.97  0.98
KL divergence  0.


The samples returned by the BNN accurately reproduce only the expectations. The variances are even less accurate than those obtained by the two deterministic models. The values in the returned samples do not reproduce the actual distributions even approximately.

We must repeat at this point that the data was specifically designed to make accurate results as hard to obtain as possible. Most datasets are not that challenging: the outputs usually have unimodal, bell-shaped, nearly symmetric distributions and, if we pass such data to a BNN, the results for both expectations and variances will be near 99% accurate. But the same is true for the two deterministic models.