Kolmogorov-Arnold network with cubic spline interpolation
The univariate functions $U_i$, $f_{i,j}$, used in the KA model as an adequate replacement of the multivariate function $M$,
$$ M(x_1, x_2, x_3, ... , x_n) = \sum_{i=1}^{2n+1} U_i\left(\sum_{j=1}^{n} f_{i,j}(x_{j})\right) $$
can be represented in different forms. When a dataset is the result of observation of a physical system, it has uncertainties
due to unaccounted factors (called epistemic), and the features and targets may also be recorded with errors. For that reason
a piecewise linear representation is sufficient, since the modeling errors are smaller than the data inaccuracies. But there are some
cases when the data represent an unknown but very accurate model and we wish to reduce the modeling errors. This may be needed, for example,
when using KA for solving partial differential equations.
The piecewise linear version
was published by the authors of this page in March 2021, and there is also a friendlier explanation in a
video.
Here I explain how to increase the accuracy of modeling by using cubic splines for all involved functions $U_i$, $f_{i,j}$. The
explanation has 3 sections. The first is the identification of a single univariate function using an elementary dataset, which is a sequence of
values $x_i$, $y_i$, implemented in the same way as it is used in KA. The second is the identification of a single generalized additive model (GAM)
or discrete Urysohn operator, presented by
$$ z(x_1, x_2, ..., x_n) = \sum_{j=1}^n f_j (x_j) $$
and also implemented in the way used in the KA model. The third is the method of combining multiple GAMs into a tree, known as the KA model. The methods
of all 3 sections are used in the form of nested dolls: a single $f$ is used for each $f_j$ in a GAM, and GAMs are used as the nested objects in KA. Programmatically,
they are 3 classes, used as objects in variable-length lists at runtime.
Univariate function
The dataset is a list of values $x_i$, $y_i$. It is assumed that at least an approximate transformation $f(x_i) = y_i$ exists. We divide the entire
interval $[x_{min}, x_{max}]$ into even segments. The more segments, the higher the accuracy, but the slower the identification, and there
is a risk of overparameterization. A typical number is 5 segments. I will use 5 in the explanation, not $N$, to make everything as clear as possible.
We introduce 5 unit vectors $e_1^T = [1 \: 0 \: 0 \: 0 \: 0], e_2^T = [0 \: 1 \: 0 \: 0 \: 0], ..., e_5^T = [0 \: 0 \: 0 \: 0 \: 1]$ and build a spline for each of them.
It is a very old and well-known method, and there is plenty of documentation, but I recommend
James Keesling. My implementation in code sits in
SplineGenerator. Two of these spline approximations are shown in the image below.
I denote these spline approximations as $S_1, S_2, ..., S_5$. Each $S_k$ is built for $e_k$. Now we consider these $S_k$ as basis functions and
build an approximation to $y_i$ as the linear combination
$$ \hat y_i = \sum_{k=1}^5 c_k S_k (x_{i}) $$
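For readers who want to see the mechanics, here is a minimal sketch of building such basis splines in Python. It is not the authors' SplineGenerator; the 5 evenly spaced knots on $[0, 1]$ and the natural boundary conditions are assumptions of the sketch.

```python
# A sketch, not the published SplineGenerator: natural cubic spline through
# evenly spaced knots; S_k interpolates the unit vector e_k at the knots.

def natural_cubic_spline(knots, values):
    """Return a callable evaluating the natural cubic spline that passes
    through (knots[i], values[i]); knots must be evenly spaced, increasing."""
    n = len(knots) - 1                       # number of segments
    h = knots[1] - knots[0]                  # uniform knot spacing
    # Interior second derivatives m[1..n-1] solve the tridiagonal system
    #   m[i-1] + 4 m[i] + m[i+1] = 6 (y[i+1] - 2 y[i] + y[i-1]) / h^2
    # with natural boundary conditions m[0] = m[n] = 0 (Thomas algorithm).
    rhs = [6.0 * (values[i + 1] - 2.0 * values[i] + values[i - 1]) / h ** 2
           for i in range(1, n)]
    m = [0.0] * (n + 1)
    if n >= 2:
        cp = [0.0] * (n - 1)                 # modified superdiagonal
        dp = [0.0] * (n - 1)                 # modified right-hand side
        cp[0], dp[0] = 0.25, rhs[0] / 4.0
        for i in range(1, n - 1):
            denom = 4.0 - cp[i - 1]
            cp[i] = 1.0 / denom
            dp[i] = (rhs[i] - dp[i - 1]) / denom
        m[n - 1] = dp[n - 2]
        for i in range(n - 3, -1, -1):
            m[i + 1] = dp[i] - cp[i] * m[i + 2]

    def evaluate(x):
        i = min(max(int((x - knots[0]) / h), 0), n - 1)   # clamp to a segment
        a, b = knots[i], knots[i + 1]
        return (m[i] * (b - x) ** 3 / (6.0 * h)
                + m[i + 1] * (x - a) ** 3 / (6.0 * h)
                + (values[i] / h - m[i] * h / 6.0) * (b - x)
                + (values[i + 1] / h - m[i + 1] * h / 6.0) * (x - a))

    return evaluate

# Basis S_1..S_5: one spline per unit vector (5 knots assumed on [0, 1]).
knots = [0.0, 0.25, 0.5, 0.75, 1.0]
basis = [natural_cubic_spline(knots, [1.0 if j == k else 0.0 for j in range(5)])
         for k in range(5)]
```

The natural boundary conditions make the second derivative vanish at the interval ends; the published SplineGenerator may use different boundary conditions.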
The vector of coefficients $C^T = [c_1 \: c_2 \: c_3 \: c_4 \: c_5]$ is updated for each new randomly selected pair $x_i$, $y_i$. The method is called
Projection Descent.
It is very quick and effective for nearly orthogonal data vectors $S^T = [S_1(x_i), S_2(x_i), S_3(x_i), S_4(x_i), S_5(x_i)]$. In the code it is the
Univariate.Update method.
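A sketch of what such an update could look like, reusing the `basis` list from the previous sketch. It is a Kaczmarz-style projection step, which is my reading of Projection Descent; the relaxation factor `mu` and the class layout are assumptions, not the published code.

```python
import math
import random

# A sketch of the univariate model with a projection-style update;
# not the published Univariate class.
class Univariate:
    def __init__(self, basis):
        self.basis = basis                     # spline basis functions S_k
        self.c = [0.0] * len(basis)            # coefficients c_1..c_5

    def predict(self, x):
        return sum(ck * Sk(x) for ck, Sk in zip(self.c, self.basis))

    def update(self, x, y, mu=0.5):
        # Project C toward the hyperplane C . S(x) = y; mu in (0, 2) is an
        # assumed relaxation factor that damps the effect of noisy samples.
        s = [Sk(x) for Sk in self.basis]
        residual = y - sum(ck * sk for ck, sk in zip(self.c, s))
        norm2 = sum(sk * sk for sk in s)
        if norm2 > 0.0:
            step = mu * residual / norm2
            self.c = [ck + step * sk for ck, sk in zip(self.c, s)]

# Demo: fit f(x) = sin(pi x) on [0, 1] from 100 sampled points.
data = [(i / 99.0, math.sin(math.pi * i / 99.0)) for i in range(100)]
f = Univariate(basis)
for _ in range(10000):
    x, y = random.choice(data)
    f.update(x, y)
```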
This part can be researched and tested independently. The coding sample is published.
GAM or discrete Urysohn
The inputs or features are vectors ${X^i}^T = [x_{1,i} \: x_{2,i} \: x_{3,i} \: x_{4,i} \: x_{5,i}]$; the model is below:
$$ \hat z_i = \sum_{k=1}^n f_k (x_{k, i}). $$
The identification is an iterative update of every function for each difference between the model and the dataset value, $\Delta = z_i - \hat z_i$.
The difference $\Delta$ is evenly divided between all the above univariate functions, and each of them is updated according to the previous
section. It is also a very quick and effective algorithm, published in an article by
the authors of this page. In the code it is the Urysohn object.
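A sketch of this even-split update, stacked on the Univariate class above; it is my reading of the described algorithm, not the published Urysohn code.

```python
# A sketch of the GAM / discrete Urysohn update: the residual of each record
# is divided evenly between the univariate addends.
class Urysohn:
    def __init__(self, univariates):
        self.fs = univariates                  # one Univariate f_j per feature

    def predict(self, x):
        return sum(f.predict(xj) for f, xj in zip(self.fs, x))

    def update(self, x, z):
        delta = z - self.predict(x)            # model-vs-data difference
        share = delta / len(self.fs)           # even split of the residual
        for f, xj in zip(self.fs, x):
            # Nudge f_j at x_j toward its current value plus its share,
            # via the projection step from the previous section.
            f.update(xj, f.predict(xj) + share)
```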
This part can also be researched and tested independently. The coding sample is published.
Kolmogorov-Arnold
We introduce intermediate variables $\theta_i$ and split the Kolmogorov-Arnold representation into a set of equations:
$$ M = \sum_{i=1}^{2n+1} U_i\left(\theta_i\right) $$
$$ \theta_i = \sum_{j=1}^{n} f_{i,j}(x_{j}) $$
We initialize the model randomly. For each record we have the difference between the computed and the actual value, $\Delta M$. We divide this
difference according to the derivatives of the outer operator
$$ D_i = \Delta M \frac{dU_i}{d\theta_i} $$
and pass these differences to the inner operators, which are updated according to the previous section. The outer operator is then also updated
in the same way, using the $D_i$ as new inputs.
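Putting the pieces together, here is a sketch of one update step as I read the description above; it is not the published code. The central-difference derivative and the reuse of the GAM update for the outer operator, with the recomputed $\theta_i$ as its inputs, are assumptions of the sketch.

```python
# A sketch of the Kolmogorov-Arnold update built from the previous sketches.
class KolmogorovArnold:
    def __init__(self, inners, outer):
        self.inners = inners                   # 2n+1 inner Urysohn operators
        self.outer = outer                     # outer GAM: M = sum U_i(theta_i)

    def predict(self, x):
        thetas = [g.predict(x) for g in self.inners]
        return self.outer.predict(thetas), thetas

    def update(self, x, m_target, eps=1e-4):
        m_hat, thetas = self.predict(x)
        delta_m = m_target - m_hat
        for g, U, t in zip(self.inners, self.outer.fs, thetas):
            # dU_i/dtheta_i via central finite difference (an assumption; the
            # derivative of a cubic spline could also be taken analytically).
            dU = (U.predict(t + eps) - U.predict(t - eps)) / (2.0 * eps)
            g.update(x, t + delta_m * dU)      # pass D_i to the inner operator
        thetas = [g.predict(x) for g in self.inners]
        self.outer.update(thetas, m_target)    # then update the outer operator
```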
The theoretical part, with a proof of local convergence, has also been published by the authors.
Some notes on research
In addition to what has already been cited, we can provide more links: KASAM: Spline Additive Models for Function Approximation
and KAN: Kolmogorov-Arnold Networks.
Comparison test
The coding example at the top has a self test, which uses built-in data generation, similar to the latter example. Two formulas are used for generating the datasets:
$$ z = \exp[\sin(\pi x) + y^2] $$ and
$$ z = \exp\left[\left(\sin(\pi (x_1^2 + x_2^2)) + \sin(\pi (x_3^2 + x_4^2))\right) / 2 \right] $$
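For reference, a short sketch of how such records could be generated; the uniform sampling of inputs on $[0, 1]$ is an assumption, not quoted from the published test.

```python
import math
import random

# Sketch of the two dataset generators used in the comparison test.
def sample_1():
    x, y = random.random(), random.random()
    return (x, y), math.exp(math.sin(math.pi * x) + y ** 2)

def sample_2():
    x = [random.random() for _ in range(4)]
    z = math.exp((math.sin(math.pi * (x[0] ** 2 + x[1] ** 2))
                  + math.sin(math.pi * (x[2] ** 2 + x[3] ** 2))) / 2.0)
    return x, z
```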
The concept published in KAN: Kolmogorov-Arnold Networks is different, but the accuracy and performance
for these two examples are about the same.
The entire code is very simple. It does not use any additional libraries or programming tricks and is 710 lines long, including the building of splines, which involves
tridiagonal matrix inversion. It has room for runtime optimization, but it is written in the simplest way possible for people to learn and practice
the concept.
 
