Mushroom classification

Code download link: M-class.zip



This example applies the quantized Urysohn model to the Mushroom Data Set. The data has 8124 records with 22 observed mushroom features. The output label is either Edible or Poisonous. Below is an example of the data format (first few lines):
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
The output labels are in the first column. Symbolic features were converted to sequential integers 1,2,3,... and the Urysohn model became quantized $$y=\sum\limits_{j=1}^{n}f_j(x_j),$$ which means that the arguments $x_j$ take only integer values and the functions $f_j$ are defined only at several points. In the code, a single two-dimensional array $U$ is used instead of multiple quantized functions $$y=\sum\limits_{j=1}^{n}U[j, x_j].$$

The test is conducted as 10-fold validation, which means that we use 90% of the records for training and 10% for validation and switch them 10 times, so that each record serves as a validation record exactly once. Total execution time for the 10 training runs is about 0.6 seconds, that is, about 0.06 seconds per training run and 0.0006 seconds per epoch.

The number of validation errors is typically 0, but it depends on random initialization, and sometimes a few validation errors are reported, such as 3 wrong predictions out of 8124. The error probability is so small that the authors of the code agree to eat every mushroom that the program qualifies as edible.
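The encoding, the table lookup $y=\sum_j U[j, x_j]$ and the fold switching described above can be sketched in a few lines of Python. This is only an illustration, not the code from M-class.zip: the function names are hypothetical, the training rule (an error-sharing update that spreads the residual equally over the 22 table entries used by a record), the learning rate `mu`, the epoch count, the 0/1 targets with a 0.5 threshold, and the assignment of records to folds by index are all assumptions that the text above does not specify. Indices are 0-based here rather than 1,2,3,... to match Python lists.

```python
import random

def encode_columns(rows):
    """Map each symbolic value to an integer index, independently per column."""
    maps = [{} for _ in rows[0]]
    encoded = [[maps[j].setdefault(v, len(maps[j])) for j, v in enumerate(row)]
               for row in rows]
    sizes = [len(m) for m in maps]  # number of distinct values in each column
    return encoded, sizes

def predict(U, x):
    """Quantized model output: y = sum over j of U[j][x_j]."""
    return sum(U[j][xj] for j, xj in enumerate(x))

def train(X, y, sizes, epochs=100, mu=0.1, seed=0):
    """Assumed training rule: share each record's residual over its table entries."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.01, 0.01) for _ in range(s)] for s in sizes]
    n = len(sizes)
    for _ in range(epochs):
        for x, target in zip(X, y):
            step = mu * (target - predict(U, x)) / n
            for j, xj in enumerate(x):
                U[j][xj] += step
    return U

def fold_indices(n_records, k=10):
    """Assumed fold assignment: record i is validated in fold i % k."""
    folds = []
    for fold in range(k):
        val = [i for i in range(n_records) if i % k == fold]
        trn = [i for i in range(n_records) if i % k != fold]
        folds.append((trn, val))
    return folds

def cross_validate(X, y, sizes, k=10):
    """Count wrong validation predictions over all k folds."""
    errors = 0
    for trn, val in fold_indices(len(X), k):
        U = train([X[i] for i in trn], [y[i] for i in trn], sizes)
        for i in val:
            label = 1.0 if predict(U, X[i]) >= 0.5 else 0.0
            errors += int(label != y[i])
    return errors

# The four sample records shown above; the first column is the label.
sample = """p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u"""
rows = [line.split(",") for line in sample.splitlines()]
y = [1.0 if r[0] == "p" else 0.0 for r in rows]    # poisonous = 1, edible = 0
X, sizes = encode_columns([r[1:] for r in rows])   # 22 integer features per record
U = train(X, y, sizes)                             # U[j] holds the quantized f_j
```

On the full data set one would read all 8124 lines instead of `sample` and call `cross_validate(X, y, sizes)`. The encoding is built over all records before the folds are formed, so every value seen in validation already has an entry in `U`; if the file is ordered in some way, shuffling the records before splitting may be needed.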

The data set was published in 1987 and is frequently used by students. The reported accuracy is always near 100%, so this result is not a surprise, but the performance of the quantized Urysohn model is significantly higher than that of all other examples that we managed to find online.

Another implementation, for comparison, is published on C#Corner. It reports an accuracy of 0.992, which corresponds to about 64 errors.