Urysohn model

Reference to published paper Urysohn

It starts from original integral equation $$ \phi ( t) = \lambda \int\limits _ \Omega U ( x ( s), s, t ) d s + \psi ( t) ,\ \ x \in \Omega , $$ where model is defined by the kernel $U$. For the stationary case, variable $t$ drops out of picture $$ y = \int\limits _ {s_1}^{s_2} U(x(s), s)ds .$$ If we represent continuous $x(s)$ as a sequence of adjacent values $x_1, x_2, ...$ and surface $U$ as slices of adjacent functions $f_1(x_1), f_2(x_1), ...$, than we get a sum, which is, actually, a result of numerical quadrature $$y=\sum\limits_{j=1}^{n}f_j(x_j).$$ It is a regression model converting components $x_1, x_2, ... $ of vector $X$ into scalar $y$ by functions $f_1, f_2, ...$

Now I can show absolutely elementary and extreamly quick way of obtaining all these functions, which needs literally few lines of computer program. These functions are identified from the data, their shapes are determined in the training process.

In the image below I show two piecewise linear functions $f_1$ and $f_2$ with ordinates $f_{1,1}, f_{1,2}, ..., f_{2,1}, f_{2,2}, ...$

We make a block vector with unknown ordinates:
$$F^T = [f_{1,1} \;\; f_{1,2} \;\; f_{1,3} \;\; f_{1,4} \;\; f_{1,5}] \;\;\; [f_{2,1} \;\; f_{2,2} \;\; f_{2,3} \;\; f_{2,4} \;\; f_{2,5}],$$ and block vector with known abscissas: $$X^T = [0.0 \;\; 0.1 \;\; 0.9 \;\; 0.0 \;\; 0.0] \;\;\; [0.0 \;\; 0.9 \;\; 0.1 \;\; 0.0 \;\; 0.0].$$ Since values $x_1$ and $x_2$ are given, than we know relative abscissa distances within the segments that they belong to, and we can use them in a linear interpolation formula. I use numbers $0.1$ and $0.9$ to make this example as clear as possible. The inner product $X^T F$ must be equal to $y$, and having training data $X^i, y_i$, we can build system of linear algebraic equations and solve it, using some regularization technique for stability and filtering of observation errors or other types of uncertainty.

Note: The system is linear, but the model is not. Only the locations of projected to abscissa node points have to be selected, the actual shapes of all functions are determined in the training process.

Projection descent

Mathematically each equation $X^T_i F = y_i$ in training data set is a hyperplane. Dominated zeros in equations make most of them pairwise orthogonal. For such type of data there is very effective way of iterative solution by projecting point $F_k$ to given hyperplanes squentially in any order. This method was published by Stefan Kaczmarz in 1937. We can easily visualize the advantage of this method for pairwise orthogonal or near orthogonal hyperplanes.

When data is noisy and single common point for all hyperplanes not exists, the method navigates iterative solution to the area where most hyperplanes cross and holds it there with the small changes at each step. The iterative formula is very simple:
$$F_{k+1} = F_k + \alpha \frac{y_i-{X^T}_i F_k}{||X_i||^2} X_i, $$ where $\alpha$ is a regularization parameter, subscript $k$ denotes iteration, subscript $i$ denotes data record.

Obviously, it is not necessary to build all these vectors with dominated zeros and compute inner products in the code. The computations use elementary linear interpolation steps for each involved segment, the code can be written by a computer school student for an hour in the class room, there is no need even show the code.