While looking for information about brain wave sensors, I started wondering how these waves could be interpreted. So I built a prototype of an Artificial Neural Network and tried to make it learn. This article explains the steps I followed and the results I got.
20/10/2012

Let's imagine the following exercise

Please imagine that, using some magic (emotiv.com, neurosky.com, interaxon.ca, ...), I could read my brain and get 8 values in the range [0..32767]:

  • Delta : 32546
  • Theta : 31020
  • Low Alpha : 15074
  • High Alpha : 4932
  • Low Beta : 5102
  • High Beta : 2563
  • Low Gamma : 876
  • High Gamma : 1053

Because I can read my brain fluently, I know that this spectrum of waves belongs to a class of spectra that means "I want to close the window".

A little later, I read my brain again and get another set of wave values, which falls into the category of spectra meaning "Next page please".

The exercise is: lay the foundations of a piece of software that takes wave values as input and classifies them into actions.

Neural networks

I'm far from being an expert in neural networks. At the beginning of this study, I had only heard that they could be used for classification. Well, let's see...

What is needed

Some minimal bibliography

First of all, you should already have read some general material about Artificial Neural Networks (ANNs). You should roughly know that an ANN is composed of layers of neurons that are able to learn. As we will see, the structure of the layers affects the behavior of the ANN:

  • number of neurons (input layer = number of input values = 8 in this example; output layer = number of categories + 1 = 5, for example, for 4 categories plus one "unclassified" output; hidden layer = number of neurons to be defined...)
  • activation function of each layer. The activation function defines how a neuron transforms its input into its output. Common activation functions are "linear", "sine" and "sigmoid"; I have also heard about "softmax" and some others.
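
To make these names a little more concrete, here is a minimal sketch of the textbook formulas behind them. The class and method names (Activations, Linear, Sigmoid, Sine, Softmax) are made up for this illustration; this is not NeuronDotNet's code, and the library's own layers may differ in detail.

```csharp
using System;

// Illustrative only: textbook definitions, not NeuronDotNet's implementation.
public static class Activations
{
    // Linear: the input is passed through unchanged.
    public static double Linear(double x) => x;

    // Sigmoid: squashes any input into the open interval (0, 1).
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Sine: periodic, output in [-1, 1].
    public static double Sine(double x) => Math.Sin(x);

    // Softmax works on a whole layer at once:
    // the outputs are positive and sum to 1.
    public static double[] Softmax(double[] xs)
    {
        double max = double.NegativeInfinity;
        foreach (double x in xs) max = Math.Max(max, x);

        double sum = 0.0;
        double[] result = new double[xs.Length];
        for (int i = 0; i < xs.Length; i++)
        {
            // Subtracting the maximum avoids overflow in Math.Exp.
            result[i] = Math.Exp(xs[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < xs.Length; i++) result[i] /= sum;
        return result;
    }
}
```

For example, Activations.Sigmoid(0) returns 0.5, and the values returned by Activations.Softmax always sum to 1 (up to rounding).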

Once the structure of the ANN is defined, the network is assigned a "learning rate" in the range ]0..1]. This is a kind of "step size" used when the network adjusts its weights during backpropagation. The lower this value is, the more precise the learning will be, but the more time it will need and the higher the risk of getting stuck in a local minimum (explained just below). The higher this value is, the quicker the learning will be.

What is a "local minimum"? You can consider the ANN as a black box that, during training, tries to *minimize* the global error between its outputs and the desired output values provided by the training set. This global error is measured as a Mean Squared Error (MSE). The lower the MSE is, the better the network has learned the training set. But when one looks for a minimum, one can fall into a local minimum: if the "step" (here, the learning rate) is not big enough to get out of it, the search settles there as if it were the global minimum, although a lower one exists elsewhere. This is the reason why the learning rate can be adjusted.
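
The trap can be shown on a toy problem. The sketch below is not a neural network: it runs plain gradient descent on a made-up one-dimensional "error curve" with two valleys, so every name in it (LocalMinimumDemo, Error, Slope, Descend) and the curve itself are assumptions chosen for the illustration.

```csharp
using System;

public static class LocalMinimumDemo
{
    // Hypothetical 1-D "error curve" with two valleys:
    // a shallow one near x = 1.13 and a deeper one near x = -1.30.
    public static double Error(double x) => x * x * x * x - 3 * x * x + x;

    // Derivative of Error, i.e. the slope to descend.
    public static double Slope(double x) => 4 * x * x * x - 6 * x + 1;

    // Plain gradient descent: repeatedly step against the slope.
    public static double Descend(double start, double learningRate, int steps)
    {
        double x = start;
        for (int i = 0; i < steps; i++)
        {
            x -= learningRate * Slope(x);
        }
        return x;
    }

    public static void Main()
    {
        double fromRight = Descend(2.0, 0.01, 1000);
        double fromLeft = Descend(-2.0, 0.01, 1000);
        Console.WriteLine($"start  2.0: x = {fromRight:F3}, error = {Error(fromRight):F3}");
        Console.WriteLine($"start -2.0: x = {fromLeft:F3}, error = {Error(fromLeft):F3}");
    }
}
```

With this small step, the search started at 2.0 settles in the shallow valley near x = 1.13 and never sees the deeper one near x = -1.30, which the search started at -2.0 does reach. A real network has many weights instead of a single x, but the principle is the same.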

One last important element: the number of "training epochs". What is an "epoch"? The network propagates the input values through its neurons, calculates the error, backpropagates it to adjust its weights, and so on. An epoch is one such cycle over the whole training set. The more epochs you assign to your training process, the more chances the network has to reach a minimal MSE. However, the higher "trainingEpochs" is, the longer the training takes.
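
Here is a minimal sketch of how epochs, learning rate and MSE fit together, using a "network" reduced to a single weight. The data (targets equal to twice the inputs), the names (EpochDemo, Train, MeanSquaredError) and the choice of updating the weight once per epoch are assumptions made for simplicity; NeuronDotNet's own training loop may well proceed differently.

```csharp
using System;

public static class EpochDemo
{
    // Toy training set (made up for this sketch): the target is y = 2 * x.
    static readonly double[] Inputs = { 1, 2, 3, 4 };
    static readonly double[] Targets = { 2, 4, 6, 8 };

    // Mean squared error of the one-weight model y = weight * x.
    public static double MeanSquaredError(double weight)
    {
        double sum = 0.0;
        for (int i = 0; i < Inputs.Length; i++)
        {
            double diff = weight * Inputs[i] - Targets[i];
            sum += diff * diff;
        }
        return sum / Inputs.Length;
    }

    // Trains for the given number of epochs and returns the final MSE.
    public static double Train(double learningRate, int epochs)
    {
        double weight = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            // One epoch = one pass over every training sample.
            double gradient = 0.0;
            for (int i = 0; i < Inputs.Length; i++)
            {
                gradient += 2 * (weight * Inputs[i] - Targets[i]) * Inputs[i];
            }
            gradient /= Inputs.Length;

            // The learning rate scales the correction applied to the weight.
            weight -= learningRate * gradient;
        }
        return MeanSquaredError(weight);
    }
}
```

With a learning rate of 0.01, EpochDemo.Train(0.01, 100) returns a much lower MSE than EpochDemo.Train(0.01, 10): more epochs give the weight more time to approach 2.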

A Neural Network Engine

Maybe you already have your own NN engine, in which case you're probably not reading these lines. Maybe you have MATLAB: it seems to include ANN modules, but I don't know any more than that. If you have neither, where can you find a NN engine?

The only one I found is a dead open-source project in C# named NeuronDotNet. Most of the links about NeuronDotNet lead to invalid URLs on the freehostia domain. The only working link I found is: sourceforge.net/projects/neurondotnet/

Quick start with NeuronDotNet

The main steps for setting up a NN are the following.

Build the network

Something like this:

    // Three layers: input, hidden and output
    ActivationLayer inputLayer = new SigmoidLayer(this.nbWaves);
    ActivationLayer hiddenLayer = new SigmoidLayer(16);
    ActivationLayer outputLayer = new LinearLayer(this.nbCateg);

    // Connect the layers to each other
    new BackpropagationConnector(inputLayer, hiddenLayer);
    new BackpropagationConnector(hiddenLayer, outputLayer);

    // The network is defined by its first and last layers
    BackpropagationNetwork network = new BackpropagationNetwork(inputLayer, outputLayer);

where :
  • this.nbWaves = the number of input neurons = the number of inputs
  • 16 = the number of neurons in the hidden layer
  • this.nbCateg = the number of output neurons = the number of categories

Build a training set

Something like this:

    // The whole training set
    TrainingSet trainingSet = new TrainingSet(this.nbWaves, this.nbCateg);

    // One training sample: the 8 wave values (nbWaves = 8)...
    double[] waves = { 32546, 31020, 15074, 4932, 5102, 2563, 876, 1053 };
    // ...and the 5 expected outputs (nbCateg = 5): this sample belongs to category 1
    double[] outputs = { 0, 1, 0, 0, 0 };
    trainingSet.Add(new TrainingSample(waves, outputs));

  • Note 1: of course, you won't enter the input values manually. Either you write a small algorithm that fills the inputs with random values, in ranges that depend on the desired output for the sample, or you already have actual input values to classify!
  • Note 2: output index 0 (zero) should have no corresponding training sample. As we will see further on, this output is dedicated to the "unclassified" answers of the network.
  • Note 3: I ran my experiments with 100 training samples for each output category (therefore with training sets of 400 samples).
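
The "small algorithm" of Note 1 could look like the sketch below. It is only one possible approach: the names (SampleGenerator, OneHot, RandomWaves, Generate) are hypothetical, the samples are drawn uniformly inside per-wave bounds that you have to choose yourself, and the result is a plain list of pairs rather than a NeuronDotNet TrainingSet.

```csharp
using System;
using System.Collections.Generic;

public static class SampleGenerator
{
    // Expected output vector: all zeroes except a 1 at the category index.
    public static double[] OneHot(int category, int nbCateg)
    {
        double[] outputs = new double[nbCateg];
        outputs[category] = 1.0;
        return outputs;
    }

    // One input vector, each wave drawn uniformly in [low[i], high[i]).
    public static double[] RandomWaves(Random rng, double[] low, double[] high)
    {
        double[] waves = new double[low.Length];
        for (int i = 0; i < waves.Length; i++)
        {
            waves[i] = low[i] + rng.NextDouble() * (high[i] - low[i]);
        }
        return waves;
    }

    // Generates `count` (waves, outputs) pairs for one category.
    public static List<(double[] Waves, double[] Outputs)> Generate(
        Random rng, int category, int nbCateg, double[] low, double[] high, int count)
    {
        var samples = new List<(double[] Waves, double[] Outputs)>();
        for (int n = 0; n < count; n++)
        {
            samples.Add((RandomWaves(rng, low, high), OneHot(category, nbCateg)));
        }
        return samples;
    }
}
```

Calling SampleGenerator.Generate(new Random(), 1, 5, low, high, 100) with two arrays of 8 bounds gives 100 samples for category 1. Each pair can then be wrapped in a TrainingSample and added to the trainingSet as in the snippet above.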

Train the network

Core of the training

    network.SetLearningRate(this.learningRate);
    network.Initialize();
    network.Learn(trainingSet, this.nbTrainingEpochs);
    network.StopLearning();
    trainingMSE = network.MeanSquaredError;

Best number of training epochs

Depending on many factors, the above code snippet can return a trainingMSE value of 0.0001 (10^-4) as well as 10^-1 or, in fact, any other magnitude. Of course, the lower the error is, the better the network has learnt. You might want to find the minimum number of epochs that gives a good enough MSE, by incrementing nbTrainingEpochs. Something like this:

    int nbTrainingEpochs = 100;   // starting value
    int incTrainingEpochs = 50;   // increment while the MSE is not low enough
    int maxTrainingEpochs = 400;
    bool correct = false;
    bool error = false;
    BackpropagationNetwork bestNetwork = null;
    double bestMeanSquaredError = double.MaxValue;
    while (!correct && !error && nbTrainingEpochs <= maxTrainingEpochs)
    {
        Application.DoEvents(); // keep the UI responsive

        // Rebuild the layers for every trial (assuming they are not already
        // declared in the enclosing scope), so that bestNetwork does not share
        // its layers and connectors with the networks trained after it.
        ActivationLayer inputLayer = new SigmoidLayer(this.nbWaves);
        ActivationLayer hiddenLayer = new SigmoidLayer(16);
        ActivationLayer outputLayer = new LinearLayer(this.nbCateg);
        new BackpropagationConnector(inputLayer, hiddenLayer);
        new BackpropagationConnector(hiddenLayer, outputLayer);
        BackpropagationNetwork network = new BackpropagationNetwork(inputLayer, outputLayer);

        network.SetLearningRate(this.learningRate);
        network.Initialize();
        network.Learn(trainingSet, nbTrainingEpochs);
        network.StopLearning();

        double meanSquaredError = network.MeanSquaredError;
        if (meanSquaredError < bestMeanSquaredError)
        {
            bestMeanSquaredError = meanSquaredError;
            bestNetwork = network;
        }
        correct = (meanSquaredError <= 0.00005);
        error = (meanSquaredError > 10000); // error case
        lstTrain.Items.Add(nbTrainingEpochs.ToString() + " epochs, Error = " + meanSquaredError.ToString() + " " + correct.ToString());
        if (!correct) nbTrainingEpochs += incTrainingEpochs;
    }
    if (error)
    {
        lstTrain.Items.Add("Abandon");
        this.network = null;
        btnTest.Visible = false;
        return null;
    }
    btnTest.Visible = true;
    lstTrain.Items.Add("bestMeanSquaredError = " + bestMeanSquaredError.ToString());
    this.network = bestNetwork;

Well, let's imagine that you have generated a trainingSet, defined the structure of your network, selected a learning rate and run this code: if this.network is not null, you now have a trained network and you know, from its MSE, whether it is supposed to be "good enough".

You might want to repeat your trials, changing the activation function of the layers (classes SigmoidLayer, LinearLayer, SineLayer, LogarithmLayer, TanhLayer) and the number of hidden neurons, and selecting different learning rates until you get a satisfactory MSE. I tried to implement a SoftMax layer (en.wikipedia.org/wiki/Softmax_activation_function), but it does not seem to have the kind of simple per-neuron derivative function that backpropagation uses here (the derivative of each softmax output depends on all the outputs of the layer), so I could not use it.

Run the network on the training set

A good way of knowing whether our network is "good" or not is to "run" it on the input values of the training set and compare its output values with those expected in the training set. We will run the network against the "input vector" of each training sample:

    int errCount = 0;
    lstLog.Items.Add(errCount.ToString() + " err"); // summary line, updated after the loop
    for (int sampleIndex = 0; sampleIndex < trainingSet.TrainingSampleCount; sampleIndex++)
    {
        var sample = trainingSet[sampleIndex];
        double[] output = bestNetwork.Run(sample.InputVector);
        bool isCorrect = isCorrectAnswer(sample.OutputVector, output);
        if (!isCorrect || true) lstLog.Items.Add(dump(sample.InputVector) + " : " + dump(sample.OutputVector) + " = " + dump(output));
        if (!isCorrect) errCount++;
    }
    lstLog.Items[0] = errCount.ToString() + " err/" + trainingSet.TrainingSampleCount.ToString();

Notes:
  • the dump(double[] values) function simply returns the array of doubles as a string.
  • the isCorrectAnswer(double[], double[]) function is discussed below.
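
For completeness, a helper in the spirit of dump could be as simple as the sketch below. The class name, the separator and the 3-decimal format are arbitrary choices made for this illustration, not the ones used in my prototype.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class Formatting
{
    // Formats an array of doubles as "[v0; v1; ...]" with 3 decimals.
    // The invariant culture keeps the decimal point independent of the
    // regional settings of the machine.
    public static string Dump(double[] values)
    {
        return "[" + string.Join("; ",
            values.Select(v => v.ToString("F3", CultureInfo.InvariantCulture))) + "]";
    }
}
```

Rounding to a few decimals keeps each line of the listbox readable when the outputs are values such as 0.99999987.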

Here, you should get 0 errors. By changing "|| true" to "|| false", you can choose between displaying all the results and displaying only the wrong ones in the UI listbox.

If you get errors, you can conclude that your network is not effective: you should change one of the parameters described above.

Run the network

Now comes the time to apply the network to actual, or pseudo-actual, data that the network has not learnt. If you have actual data that has already been categorized manually, use it to verify the network's behavior. If you haven't, I suggest you generate a new trainingSet as above, but adding samples that do not belong to any category, with expected output 0 (zero).

Then, simply run the network on this set of "pseudo-actual samples" and, with code similar to the above, count and display the errors.

If your network is effective, every output should correspond to the expected category, and the unclassifiable samples should fall into category 0.

Determining the "signal classification" from the network output

Above, I used a function isCorrectAnswer(double[] expectedOutput, double[] networkOutput) that compares the expected output with the actual output of the network. Basically, this function is quite simple:

    return getClassification(expectedOutput) == getClassification(networkOutput);

The getClassification(double[] output) function is straightforward when applied to a training sample output, which comes from the trainingSet and therefore contains only zeroes, except for the expected category, which is set to 1: simply return the index of the "1" value.

But when applied to the network output, zeroes might be "0 +/- 10^-1" and ones might be "1 +/- 10^-1", so you first have to round the output values. Moreover, if you submit to the network an input signal that does not correspond to any learnt category, the output will look strange:

  • the sum of the output values might not be "approximately 1"
  • you might get significant negative values
  • you might get more than one "most significant" value

One can't just apply a simple rule such as "return the index of the maximum output value" or "return the index of the output value 1". One has to apply some more logic (summing, comparing the highest values, ...) to reliably return either a category index, or 0 (meaning "unclassified") for messy outputs.
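
As an illustration, here is one possible set of rules, written as a minimal sketch. It is not the exact logic of my prototype: the class name, the PascalCase method names and the tolerance of 0.2 are assumptions. An output is accepted only if exactly one value is close to 1 and all the others are close to 0; anything else is reported as category 0.

```csharp
using System;

public static class WaveClassifier
{
    // Returns the index of the recognized category, or 0 ("unclassified")
    // when the output vector is messy. The tolerance is a hypothetical value.
    public static int GetClassification(double[] output, double tolerance = 0.2)
    {
        int candidate = -1;
        for (int i = 0; i < output.Length; i++)
        {
            bool nearZero = Math.Abs(output[i]) <= tolerance;
            bool nearOne = Math.Abs(output[i] - 1.0) <= tolerance;
            if (nearOne)
            {
                // A second "1" makes the answer ambiguous.
                if (candidate != -1) return 0;
                candidate = i;
            }
            else if (!nearZero)
            {
                // Neither 0 nor 1 (e.g. 0.5 or -0.8): messy output.
                return 0;
            }
        }
        // No "1" at all also means "unclassified".
        return candidate == -1 ? 0 : candidate;
    }

    public static bool IsCorrectAnswer(double[] expectedOutput, double[] networkOutput)
    {
        return GetClassification(expectedOutput) == GetClassification(networkOutput);
    }
}
```

With these rules, an output such as { 0.05, 0.02, 0.97, -0.04, 0.01 } is classified as category 2, while { 0, 0.9, 0.95, 0, 0 } (two candidates) and { 0, -0.8, 1, 0, 0 } (a significant negative value) both fall back to 0.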

Concrete conclusions

By trying different network structures, learning rates, and input and output ranges again and again, I found a particularly effective network organization:

Input values in the range [-32000..32000]
Expected output values are either 0 or 1 (of course, 1 for the expected category of the training sample, 0 for the other outputs)
Learning rate = 0.01
Structure of the network :

  • Input layer : Sigmoid (8 neurons, corresponding to my 8 input values)
  • Hidden layer : Sigmoid (8 neurons)
  • Output layer : Linear (5 neurons, corresponding to my 4 classes of waves and the "unclassifiable sample" output)

With this structure, the network learns 400 samples and reaches an MSE between 10^-5 and 10^-30 with only 100 training epochs! And of course, when running the network, it makes no classification errors.

Starting from this structure, I played a little with the different parameters.

  • Surprisingly, setting the learning rate to 0.1 instead of 0.01 gives a lower MSE (consistently less than 10^-25)
  • Increasing the number of hidden neurons (to 16, for example) gives a higher MSE (10^-5 .. 10^-7) and increases the learning time.
  • Decreasing the number of hidden neurons (to 4, for example) also gives a higher MSE (10^-5 .. 10^-7)

Article conclusion

Please remember that I only knew the words "neural network" before starting this study, without any particular knowledge of the concept. I chose a concrete, pragmatic approach rather than trying to understand all the theoretical principles. Therefore, my experiments may have overlooked some parameters (e.g. I heard about "momentum"), my conclusions may be side effects of my own implementation or of NeuronDotNet's, and I do not attempt to prove anything scientifically...


 

Jean-Christophe BURNEAU

Independent IT consultant specializing in software design, serving startups, small and medium-sized businesses without an IT department, and IT departments seeking to get closer to their users.
