~ Nonlinear Adaptive Filtering as a Form of Artificial Intelligence ~

Challenging Data Test

Demo project, code and data
This is reusable code that tests the model on a challenging, mathematically generated data sample. Generating the data from a known formula makes it possible to test the descriptive capacity of the model. The computational formula is the following:

$$y = \frac{\left|{\sin(x_2)}^{x_1} - 1/e^{x_3}\right|}{x_4} + x_5 \cdot \cos(x_5)$$

The number of records is 4000. The ranges are:

x1[0.00, 0.99]
x2[0.00, 1.55]
x3[1.00, 1.49]
x4[0.39, 1.39]
x5[0.00, 0.49]
y[0.00, 2.15]
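
A data set of this kind can be reproduced with a minimal sketch like the one below. The formula and the input ranges are taken from the text above. The uniform sampling, the fixed seed, and the names `RANGES`, `target` and `generate` are assumptions made for illustration and are not taken from the demo project.

```python
import math
import random

# Input ranges for x1..x5 as listed above.
RANGES = [(0.00, 0.99), (0.00, 1.55), (1.00, 1.49), (0.39, 1.39), (0.00, 0.49)]

def target(x1, x2, x3, x4, x5):
    """Evaluate y = |sin(x2)^x1 - 1/e^x3| / x4 + x5 * cos(x5)."""
    return abs(math.sin(x2) ** x1 - 1.0 / math.exp(x3)) / x4 + x5 * math.cos(x5)

def generate(n_records=4000, seed=0):
    """Draw inputs uniformly from RANGES (an assumption) and compute the targets."""
    rng = random.Random(seed)
    inputs = [[rng.uniform(lo, hi) for lo, hi in RANGES] for _ in range(n_records)]
    outputs = [target(*row) for row in inputs]
    return inputs, outputs

inputs, outputs = generate()
print(len(inputs), min(outputs), max(outputs))
```

The observed minimum and maximum of `outputs` depend on the random draw, so they need not match the range quoted for y exactly.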

A linear regression built on the entire data set gives a Pearson correlation coefficient of 0.88 between modelled and actual data, which is expected.
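
The Pearson correlation coefficient used as the accuracy measure throughout this test can be computed as in the following sketch. The function name `pearson` and its argument names are hypothetical. The demo project may compute the coefficient differently.

```python
import math

def pearson(actual, modelled):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_m = sum(modelled) / n
    # Sums of centered products and centered squares.
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(actual, modelled))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov / math.sqrt(var_a * var_m)

# A perfectly linear relation gives a coefficient of 1 (up to rounding).
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A value close to 1 means the modelled values follow the actual values almost linearly, which is why the coefficient is a convenient single-number summary here.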

A single Urysohn operator works much better: its Pearson correlation coefficient is 0.93.

For the Kolmogorov-Arnold model, the data were shuffled and split into training and validation sets in a proportion of 0.9 to 0.1. The experiment was repeated 10 times with rotating validation records, so that every record falls 9 times into the training set and once into the validation set. The modelled data were therefore unseen during training. The Pearson correlation coefficient computed for validation data only is 0.995. So the Kolmogorov-Arnold model is capable of identifying the nonlinear properties of the above equation as exposed via the data, and it works significantly better than even a single Urysohn model, which is itself considered a strong nonlinear model.
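
The rotation of validation records described above can be sketched as follows. This is only an illustration of the splitting scheme. The function name `rotating_splits`, the seed, and the way records are assigned to folds are assumptions, and the model training step itself is left out.

```python
import random

def rotating_splits(n_records, n_folds=10, seed=0):
    """Yield (training, validation) index lists, rotating the validation fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes into the same fold.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, validation

for training, validation in rotating_splits(4000):
    # A model would be fitted on `training` and evaluated on `validation` here.
    print(len(training), len(validation))
```

With 4000 records and 10 folds, each pass uses 3600 records for training and 400 for validation, and the predictions collected over all passes cover every record exactly once.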

According to the Kolmogorov-Arnold model, the number of terms must be 11 (2n + 1 with n = 5 inputs).

$$ f(x_1, x_2, x_3, ... , x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right), $$

however, the necessary accuracy is obtained with fewer terms. Here is the table of Pearson correlation coefficients for different numbers of terms: