Challenging Data Test

Demo project, code and data

This is reusable code that tests the model on a challenging, mathematically generated data sample. The data are generated mathematically in order to test the descriptive capacity of the model. The computational formula is the following:

$$y = \frac{{\sin(x_2)}^{x_1} 1/e^{x_3}}{x_4} + x_5 \cdot \cos(x_5)$$

The number of records is 4000. The ranges are:
A single Urysohn operator works much better. Its Pearson correlation coefficient is 0.93.

For the Kolmogorov-Arnold model the data were shuffled and split into training and validation sets in a proportion of 0.9 to 0.1. The experiment was repeated 10 times with rotating validation records, so that every record falls 9 times into the training set and 1 time into the validation set. The modelled data were therefore so-called unseen data, never used in training. The Pearson correlation coefficient computed on the validation data only is 0.995.

So we can see that the Kolmogorov-Arnold model is capable of identifying the nonlinear properties of the above equation as exposed via the data, and that it works significantly better even than a single Urysohn model, which is itself considered a strong nonlinear model.

According to the Kolmogorov-Arnold representation, the number of terms must be 11, that is, $2n + 1$ outer terms for the $n = 5$ inputs used here:

$$ f(x_1, x_2, x_3, ... , x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right), $$

However, the necessary accuracy is obtained with fewer terms. Here is the table with Pearson correlation coefficients for different numbers of terms:


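The rotating validation scheme described above (10 repetitions, with each record validated exactly once and used for training 9 times) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's code: the names `pearson`, `rotating_folds`, `cross_validated_pearson` and `fit_line` are hypothetical, the stand-in model is a plain one-input least-squares line rather than a Kolmogorov-Arnold or Urysohn model, the demo data are a made-up noisy line rather than the formula above, and pooling all out-of-fold predictions into one Pearson coefficient is an assumption about how the validation score is computed.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rotating_folds(n_records, n_folds=10, seed=0):
    """Shuffle indices once, then yield (train, validation) index lists.

    Every record appears in the validation list exactly once and in the
    training list n_folds - 1 times.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        validation = indices[k::n_folds]
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


def cross_validated_pearson(inputs, targets, fit, n_folds=10, seed=0):
    """Pearson coefficient between targets and out-of-fold predictions.

    `fit(train_inputs, train_targets)` must return a callable model.
    Assumption: predictions from all folds are pooled before scoring.
    """
    predicted = [0.0] * len(targets)
    for train, validation in rotating_folds(len(targets), n_folds, seed):
        model = fit([inputs[i] for i in train], [targets[i] for i in train])
        for i in validation:
            predicted[i] = model(inputs[i])  # record i was unseen by this model
    return pearson(targets, predicted)


def fit_line(xs, ys):
    """Stand-in model (NOT Kolmogorov-Arnold): least-squares line, one input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Made-up demo data: a noisy straight line, only to exercise the procedure.
rng = random.Random(1)
demo_x = [i / 100 for i in range(200)]
demo_y = [3.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in demo_x]
print(cross_validated_pearson(demo_x, demo_y, fit_line))
```

To try the same procedure on another model, pass a different `fit` function that returns a callable predictor; the fold rotation and the scoring stay unchanged.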