~ Nonlinear Adaptive Filtering as a Form of Artificial Intelligence ~

DEMO code

Single Input, Single Output (SISO) object

The discrete Urysohn filter can be derived by an elementary modification of the finite impulse response (FIR) filter $$y_i = \sum_{j=0}^m h_j \cdot x_{i-j}.$$ We only need to replace each linear term in the above equation by a function: $$y_i = \sum_{j=0}^m f_j (x_{i-j}).$$ The only limitation on the introduced functions at this point is piecewise linearity. That means the functions are continuous and consist of connected straight-line segments sharing their end points. Each $x$ value in the equation falls into a single linear block, and two weights, left and right, are computed for it: $L = 1 - x$ and $R = x$, where $x$ here stands for the relative position within the block, as explained in basic. For example, an input landing 30% of the way into its block gets $L = 0.7$ and $R = 0.3$, and the function value is interpolated as $0.7$ times the left grid value plus $0.3$ times the right one; a code sketch of this weighting is given after the next paragraph. If we build a table with the data, it may look as follows:

| $f_0$ values | ... | $f_m$ values |
| --- | --- | --- |
| $f_{0,0}$ $f_{0,1}$ ... $f_{0,n}$ | ... other functions ... | $f_{m,0}$ $f_{m,1}$ ... $f_{m,n}$ |
| $L_{i}$ $R_{i}$ ... $0$ | ... other functions ... | $L_{i-m}$ $R_{i-m}$ ... $0$ |
| ... | ... | ... |


The second row in the table contains the estimated function values; all other rows contain known weights built from the raw inputs, as explained in basic. All weights are in the range $[0,1]$, since they are relative distances within the block, and $L + R = 1$. Obviously, the inner product of the second row with any other row must give the corresponding output $y_i$. If we build a system of linear algebraic equations on these data, the matrix will contain blocks of columns, as shown in the table. Each block will have only two values different from 0, with their sum equal to 1. So the matrix is sparse, its rows are either orthogonal or nearly orthogonal, and applying LMS or projection descent leads to quick convergence. When the initial function values are all zeros, the solution has the minimum norm.
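Below is a minimal, self-contained sketch of such a filter with projection-descent (Kaczmarz) updates. It illustrates the scheme described above and is not the author's DEMO code; all names (`PiecewiseLinearUrysohn`, `_weights`, `update`, the toy hidden object, the grid sizes, the step parameter `mu`) are assumptions chosen for the example.

```python
import numpy as np

class PiecewiseLinearUrysohn:
    """Sketch of a discrete Urysohn filter y_i = sum_j f_j(x_{i-j}).

    Each f_j is stored as its values at n+1 equidistant grid points;
    an input x falling into block k contributes the weights L = 1 - r
    and R = r to grid points k and k+1, where r is its relative
    position inside the block.
    """

    def __init__(self, m, n, xmin, xmax):
        self.m, self.n = m, n               # memory depth, blocks per function
        self.xmin, self.xmax = xmin, xmax
        # (m+1) functions with n+1 grid values each; starting from zeros
        # makes projection descent converge to the minimum-norm solution
        self.F = np.zeros((m + 1, n + 1))

    def _weights(self, x):
        """Block index k and weights (L, R) for a single input x."""
        t = (x - self.xmin) / (self.xmax - self.xmin) * self.n
        k = min(int(t), self.n - 1)         # left grid point of the block
        r = t - k                           # relative position, 0 <= r <= 1
        return k, 1.0 - r, r

    def predict(self, window):
        """Output for one input window [x_i, x_{i-1}, ..., x_{i-m}]."""
        y = 0.0
        for j, x in enumerate(window):
            k, L, R = self._weights(x)
            y += L * self.F[j, k] + R * self.F[j, k + 1]
        return y

    def update(self, window, y_target, mu=1.0):
        """One projection-descent (Kaczmarz) step for a single equation."""
        rows = [self._weights(x) for x in window]
        error = y_target - self.predict(window)
        norm2 = sum(L * L + R * R for _, L, R in rows)  # squared row norm
        step = mu * error / norm2
        for j, (k, L, R) in enumerate(rows):
            self.F[j, k] += step * L
            self.F[j, k + 1] += step * R


# Toy identification run: learn a hidden nonlinear SISO object from samples.
rng = np.random.default_rng(0)
m, n = 3, 10
model = PiecewiseLinearUrysohn(m, n, xmin=-1.0, xmax=1.0)
x = rng.uniform(-1.0, 1.0, 5000)

def hidden(w):
    """Hypothetical separable object, representable by the Urysohn model."""
    return sum(np.sin((j + 1) * v) for j, v in enumerate(w))

for i in range(m, len(x)):
    window = x[i - m:i + 1][::-1]           # [x_i, x_{i-1}, ..., x_{i-m}]
    model.update(window, hidden(window))
```

In this sketch each equation touches only $2(m+1)$ table entries, which is why a projection step is cheap as well as quickly convergent.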

We can also introduce a very simplified particular case, where the functions are constant within each block. Such a model is not continuous, but, assuming that adjacent function values are close and change gradually over the domain of definition, the model can be built with a few lines of code, as already shown in the introduction (a sketch follows below). For the continuous case above, the DEMO code can be obtained from the link at the top.
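As an illustration of how few lines this takes, here is a hedged sketch of one update step for the piecewise-constant case; the function name `update_constant`, the array layout, and the step size `mu` are assumptions, not the code from the introduction.

```python
import numpy as np

def update_constant(F, window, y_target, xmin, xmax, mu=0.1):
    """One LMS-style step for the piecewise-constant model.

    F has shape (m+1, n): each function f_j is n constant blocks, and
    each input selects exactly one cell of its function.
    """
    n = F.shape[1]
    cells = [min(int((x - xmin) / (xmax - xmin) * n), n - 1) for x in window]
    error = y_target - sum(F[j, k] for j, k in enumerate(cells))
    for j, k in enumerate(cells):
        F[j, k] += mu * error               # spread the residual over cells
    return error
```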

Since the sum of all columns within each block gives a vector with all elements equal to 1.0, and there are $m+1$ blocks, we have at least $m$ linear dependencies among the columns, so the rank of the matrix is always smaller than the number of estimated parameters. That means the models are always redundant: differences in some elements can be compensated by others. For example, adding a constant to every value of $f_0$ and subtracting the same constant from every value of $f_1$ leaves all outputs $y_i$ unchanged.
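This redundancy is easy to verify numerically. Continuing the piecewise-linear sketch above (so `model`, `m`, `n`, and `x` are the names introduced there), shifting one function up and another down by the same constant leaves the predictions intact:

```python
# Shift f_0 up and f_1 down by the same constant: since L + R = 1 in every
# block, each output changes by +c - c = 0, so predictions are unchanged.
shifted = PiecewiseLinearUrysohn(m, n, xmin=-1.0, xmax=1.0)
shifted.F = model.F.copy()
c = 0.37                                    # arbitrary constant
shifted.F[0] += c
shifted.F[1] -= c
w = x[:m + 1][::-1]                         # any input window
print(np.isclose(model.predict(w), shifted.predict(w)))   # True
```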