Bank churn

Code download link: bankchurn.zip



This example applies a Kolmogorov-Arnold model to bank churn data. The goal is to predict which clients will leave, based on basic demographic and credit parameters. The data is available from multiple sources; we used the data set published on the Neural Designer site. It has 10 000 records with 10 features. The first column is the customer id, and the last column is the output: a label stored as the integer 0 or 1. Below are the first few lines of the data:
15634602;619;France;Female;42;2;0;1;1;1;101348.88;1
15647311;608;Spain;Female;41;1;83807.86;1;0;1;112542.58;0
15619304;502;France;Female;42;8;159660.8;3;1;0;113931.57;1
15701354;699;France;Female;39;1;0;2;0;0;93826.63;0
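The semicolon-separated format shown above can be read with a few lines of code. The following is a minimal sketch, not the code from bankchurn.zip: the function names `parse_record` and `load_records` are hypothetical, and the categorical columns (country, gender) are kept as text because the encoding used by the actual program is not stated here.

```python
import csv
import io

# The four sample records quoted above, used here instead of the full file.
SAMPLE = """\
15634602;619;France;Female;42;2;0;1;1;1;101348.88;1
15647311;608;Spain;Female;41;1;83807.86;1;0;1;112542.58;0
15619304;502;France;Female;42;8;159660.8;3;1;0;113931.57;1
15701354;699;France;Female;39;1;0;2;0;0;93826.63;0
"""


def parse_record(fields):
    """Split one record into (customer_id, features, label)."""
    customer_id = int(fields[0])   # first column: customer id
    label = int(fields[-1])        # last column: class label, 0 or 1
    features = []
    for value in fields[1:-1]:     # the 10 feature columns in between
        try:
            features.append(float(value))
        except ValueError:
            # Categorical text is kept as-is (assumption: encoding is
            # chosen later by whoever builds the model inputs).
            features.append(value)
    return customer_id, features, label


def load_records(text):
    """Parse all non-empty lines of semicolon-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return [parse_record(row) for row in reader if row]


records = load_records(SAMPLE)
for customer_id, features, label in records:
    print(customer_id, len(features), label)
```

Before training, the text columns would still have to be converted to numbers, for example by assigning an integer or a one-hot vector to each country.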
This is a classification problem rather than a regression problem, but when the targets have only two classes it can be solved as regression. The classes are assigned the real values -1.0 and +1.0, and the predicted output can be treated as a probability, with an output of 0.0 corresponding to 50%. To repeat, this works only with two classes; with more than two classes the problem needs a completely different approach. Here is an example of the program printout:



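The -1.0/+1.0 target encoding described above can be sketched as follows. The function names are hypothetical, and the linear rescaling of the output to a probability is an assumption: the text only fixes the point where an output of 0.0 means 50%.

```python
def encode_label(label):
    """Map a 0/1 class label to a regression target of -1.0 or +1.0."""
    return 2.0 * label - 1.0


def decode_prediction(output):
    """Return (predicted_class, pseudo_probability) for a real-valued output.

    Assumption: the output is rescaled linearly so that -1.0 -> 0.0,
    0.0 -> 0.5 and +1.0 -> 1.0, then clipped to the range [0, 1].
    """
    probability = min(1.0, max(0.0, (output + 1.0) / 2.0))
    predicted_class = 1 if output > 0.0 else 0
    return predicted_class, probability


print(encode_label(0), encode_label(1))
print(decode_prediction(0.5))
```

The class decision depends only on the sign of the output, so the accuracy does not change if a different monotone rescaling is used for the probability.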
First, we trained and tested a linear model on the entire data set. This does not qualify as a proper AI model, since such a model has to be tested on unseen data, but it gives some idea of the data quality.
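A linear baseline of this kind can be sketched in plain Python. This is an illustration under stated assumptions, not the program from the download: the weights are found by ordinary least squares through the normal equations, the data is a small synthetic stand-in rather than the bank records, and all function names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Least-squares weights; the last weight is the bias term."""
    x = [row + [1.0] for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    return sum(w * v for w, v in zip(weights, row + [1.0]))


def accuracy(weights, rows, targets):
    """Fraction of records whose predicted sign matches the -1/+1 target."""
    hits = sum((predict(weights, r) > 0.0) == (t > 0.0)
               for r, t in zip(rows, targets))
    return hits / len(rows)


# Synthetic stand-in data (NOT the bank churn records).
rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
targets = [1.0 if r[0] + 0.5 * r[1] > 0.1 else -1.0 for r in rows]

weights = fit_linear(rows, targets)
# Trained and tested on the same records, so this is not a fair estimate.
print(accuracy(weights, rows, targets))
```

Because the same records are used for fitting and scoring, the printed number is optimistic by construction; it is useful only as a reference point for the nonlinear models.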

The next model was a single Urysohn model, also trained and tested on the entire data set. This result does not qualify either, but it indicates that nonlinearity is present. The purpose of this test is comparison with the linear model: sometimes the errors in the data are so significant that nonlinear properties simply cannot be detected. Here the comparison shows that a nonlinear model should be applied.
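For orientation, the three model types compared on this page can be written side by side. This is the commonly used notation, given here as an assumption about the forms meant; the symbols $w_j$, $b$, $f_j$, $\phi_{q,j}$, $\Phi_q$ and the counts $n$, $m$ are introduced only for this illustration.

```latex
% Linear model: weighted sum of the n inputs plus a bias
\hat{y} = \sum_{j=1}^{n} w_j x_j + b

% Single (discrete) Urysohn model: each input passes through its own
% univariate function; the linear model is the special case f_j(x_j) = w_j x_j
\hat{y} = \sum_{j=1}^{n} f_j(x_j)

% Kolmogorov-Arnold model: m inner Urysohn-type sums, each passed
% through an outer univariate function
\hat{y} = \sum_{q=1}^{m} \Phi_q\!\left( \sum_{j=1}^{n} \phi_{q,j}(x_j) \right)
```

Written this way, a gain of the Urysohn model over the linear one on the same data can only come from the functions $f_j$ being nonlinear, which is why the comparison serves as a test for nonlinearity.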

The last result is the Kolmogorov-Arnold model with 10-fold cross-validation, so the accuracy is estimated on unseen data. This result can be compared with the Neural Designer benchmark, which can be found on their help page; their reported accuracy is 78.8%.
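The 10-fold procedure can be sketched independently of the model. This is a minimal illustration, assuming a random shuffle into folds and accuracy pooled over all held-out records; how the actual program splits the data is not stated here. The names `k_fold_indices` and `cross_validate`, and the `fit`/`predict` callables, are hypothetical.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each record is tested exactly once."""
    order = list(range(n_records))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(rows, targets, fit, predict, k=10):
    """Held-out accuracy of a -1/+1 regression classifier, pooled over folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx],
                    [targets[i] for i in train_idx])
        # Score only on records the model has not seen during fitting.
        correct += sum((predict(model, rows[i]) > 0.0) == (targets[i] > 0.0)
                       for i in test_idx)
    return correct / len(rows)


# Stand-in model for demonstration: no training, output is the first feature.
demo_rows = [[-1.0], [1.0]] * 10
demo_targets = [-1.0, 1.0] * 10
demo_accuracy = cross_validate(demo_rows, demo_targets,
                               fit=lambda rows, targets: None,
                               predict=lambda model, row: row[0])
print(demo_accuracy)
```

Any model that exposes a fit step and a predict step can be passed in, so the same loop gives comparable held-out accuracy for a linear, Urysohn, or Kolmogorov-Arnold model.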