|
Idiots Neural Networks in application to image compression The name of the topic can be explained by two reasons. First: Neural Network is only non-linear parametric model and can be easy understood. Second: I noticed that when any topic precedes with word Idiots it is better recognized by Google. Try to Google, for example, my other topic: Idiots Adaptive Arithmetic Coding. So Idiots is Google magic word, before they correct it. Neural network is non-linear model with hundreds (or even thousands) unknown parameters. When these parameters are assigned proper values the network can produce reasonable output for certain combination of input parameters. The process of estimation of all these parameters for given sample is called training. The classic example of successful training of neural network in making decisions close to a human logic is approval of mortgage application. The training sample is the list of data that applicants provided in order to obtain mortgage and the decision of an expert. These data were used for adjustment of network parameters and, when training was completed, the approval or denying of mortgage was passed from human expert to a program. Possibly we all now facing the consequences of that idea in the form of world financial crisis, but this article is not about it. The developers of neural networks made a bold claim that by training (estimation of parameters) a machine can approach human logic. If we try to apply such logic to image compression, how the program can replace human brain? One solution is to reproduce the mind of an artist who looks at the unfinished image and put the missing pixel to the lower right corner. If prediction works well we have to save trained network and differences between predicted and actual pixel, which may provide certain level of compression. How it works you can see in the article. Another idea that is also considered in the article is the replacement of basic brick of neural network a single neuron by discrete Urysohn operator. It appeared that this replacement works same or better than original neuron, which destroys the beautiful theory of modeling the brain activity with neural networks because discrete Urysohn operator have nothing in common with biological neuron and represents simply powerful non-linear parametric model that is very easy to identify numerically and on that reason training needs seconds instead of hours. The simple predictor that is hard to beatIn the beginning of 90th the median edge detector was introduced and successfully applied to image compression. It makes logical prediction of current point P based on left L, top T and diagonal D points surrounding the predicted.
The logic is defined by formula
The difference between predicted and actual value is saved and subjected to arithmetic encoding. In decoding it is used in the same way. The value is predicted and added to preserved difference. In spite of all efforts the progress in image compression did not move very much further. The best known algorithms for the moment achieve compression about 15 percent better than MED. Some algorithms, however, achieve compression up to 30 percent better but for such unreasonable time, that makes its usage impossible. The time could be as long as 40 seconds per 500 * 500 picture. Amazingly MED performs compression for difference between colors, that means it can be applied with success to difference between Red and Green and between Blue and Green. Typically the entropy of a difference after applying MED is 2.5 bits/pixel and when MED is applied to Green it makes entropy of result near 5 bits/pixel. Usual ratio is 10 bits per 24 bits that defines lossless compression of high quality photographic picture between 2 and 3 times. Experiment with ideal memory modelIn experiments with neural network I tried to replace MED by parametric model following the usual logic of artificial intelligence, which tells that the program can be trained and replace an expert. In this case the expert is MED. If we presume that we do not know it but know that every point P somehow depends on L, T, D we apply mechanism of NN to find parametric model in the blind. Before we do it we may wonder about the limits. What if we have ideal memory model, which compression ratio we can achieve? As a model I chose simple array of 16,777,216 elements, which is capable to remember every possible 3 bytes combination. The predicted byte P was passed to an array at the address that was a combination of L|T|D. On the second run I used this filled array for prediction. Due to repetition of L|T|D combinations the prediction was not 100 percent accurate. Obviously, MED can not be better this ideal model and so the other static mechanisms of memorizing the bytes combinations. The result can be seen in the output of simulation program
The upper number 7.22 is entropy of original data (Green color only). The next number 3.94 is entropy after applying MED. The fourth from the top 3.39 is result obtained for ideal memory mechanism. This is what effective parametric model can hopefully achieve. Obviously we can not use this memory model for anything other than experimenting because it needs 16,777,216 bytes to save the data. The other two numbers in the shown output are results for neural network and Urysohn model that will be considered further. Neural NetworkThe foundation brick of NN is single neuron that can be explained as combination of linear and non-linear elements![]() The linear part is linear combination with weight coefficients. It is passed to non-linear function (that can be optional). The neurons are connected to a traditional network as shown below ![]() Every circle in the picture is a single neuron that has the same structure. The training is identification of 30 weight coefficients using image data. In the training process the classical back propagation technique was used. The matter of this technique is passing the portion of the error back to internal elements for adjustment of every weight coefficient on every step. There is no need to repeat here a concept of back propagation, I only can say that this technique is highly debatable and not necessarily the best possible, but widely accepted. In order to avoid writing the codes for NN I took already available program from here, which appeared to have convenient interface and happened to be good enough for experimenting. First experiment showed that even long enough training did not provide reduction in entropy close to MED. However, in adaptive training combined with prediction the technology appeared to be effective. In experiment NN is used in prediction step first and than in upgrading the model in the same way it supposed to be used in decompression of the image: prediction, reading the difference between predicted and actual value, using new value for upgrading model, next prediction and so on (similar to MED). The result is at satisfied level. It showed that NN actually works as desired. It reduced entropy to 4.43 bits/pixel. I can point out that NN is used in case when we have no idea about nature of the object but can identify inputs that make principal contribution to the outcome. Discrete Urysohn operatorThe generic identification technique for discrete urysohn operator can be found in my other article. I called it non-demanded idea because it is known for at least 30 years but not used by engineers and not familiar to many researchers. The result of identification is matrix of elements U[256][2] that can be used using L and T bytes as addresses. (I found that diagonal elements D make no significant contribution with this type of model and can be ignored). For example, if for particular point we have L=124 and T=38 we compute predicted value as U[124][0] + U[38][1]. In the identification process we have matrix U originally unknown but we know which combinations of matrix elements must match predicted points. That is enough to create an optimized matrix U[256][2] in a fraction of a second, that means much faster than usage of NN. The discrete Urysohn operator is used as a stationary model that means it is not adapted at run time. It is identified once and has to be saved along with compressed data. For all images that I tried this parametric model was better than NN and always very close to MED. In the shown result it reduce entropy to 3.91 while NN reduced it to 4.43.Discrete Urysohn networks (DUN)Warning: according to US Patent Law office accepts applications during one year after first publication of the patentable concept.Similar to NN discrete urysohn operators can be combined into networks. The ways of back propagation of the error is subject to future research. I can suggest one as jumping to a quick solution in order to show that such technique exists and is not very complicated. Presume we identified operator using original Left and Top values and have model shown below ![]() After identification step is completed we correct input arrays L and T to provide best possible match of Prediction with real value. After correction is applied we have slightly different inputs Lcorrected and Tcorrected, which can be used in the next level of identification as shown in the next picture ![]() After correction of the signals we create two new operators that model the corrected signals using original inputs. Needless to say that this procedure can be repeated multiple times iteratively until satisfied level is achieved. While compression may take noticeable time the decompression is straight forward usage of look up tables that are saved operators. |