Multi Layer Perceptron (MLP)
MLP User Manual download
A Neural Network (NN) is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected and the various types and architectures are identified both by the different topologies adopted for the connections and by the choice of the activation function. Such networks are generally called Multi Layer Perceptron (MLP; Bishop 1995) when the activation functions are sigmoidal or linear.
The MLP network, one of the most popular NNs, is suited to a wide range of applications, such as pattern recognition, prediction and process modelling. An MLP network comprises a number of identical units organized in layers, with the units on one layer connected to those on the next layer (except for the last layer, i.e. the output layer).
The output of the j-th hidden unit is obtained first by forming a weighted linear combination of the d input values, and then by adding a bias to give:

$$z_j = f\left(\sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)}\right)$$

where $d$ is the number of inputs, $x_i$ is the i-th input value, $w_{ji}^{(1)}$ denotes a weight in the first layer (from input i to hidden unit j), $b_j^{(1)}$ denotes the bias for the hidden unit j, and $f$ is an activation function such as the continuous sigmoidal function:

$$f(a) = \frac{1}{1 + e^{-a}}$$

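As a minimal sketch of the hidden-layer computation described above, the following Python snippet evaluates every hidden unit for one input pattern. The names `sigmoid`, `hidden_layer`, `w1` and `b1`, and the numerical values, are illustrative assumptions and are not taken from the MLP User Manual; `w1[j][i]` plays the role of the first-layer weight from input i to hidden unit j.

```python
import math


def sigmoid(a):
    """Logistic sigmoid f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def hidden_layer(x, w1, b1):
    """Return the output of every hidden unit j:
    f(sum_i w1[j][i] * x[i] + b1[j])."""
    return [
        sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j)
        for w_j, b_j in zip(w1, b1)
    ]


# Hypothetical example: d = 2 inputs, 3 hidden units.
x = [0.5, -1.0]
w1 = [[0.1, 0.2], [0.0, 0.0], [-0.3, 0.4]]
b1 = [0.0, 0.0, 0.1]
z = hidden_layer(x, w1, b1)
```

The second hidden unit has zero weights and zero bias, so its output is the sigmoid of 0, i.e. 0.5; all outputs lie strictly between 0 and 1.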
The outputs of the network are obtained by transforming the activations of the hidden units using a second layer of processing elements:

$$y_k = g\left(\sum_{j=1}^{M} w_{kj}^{(2)} z_j + b_k^{(2)}\right)$$

where $M$ is the number of hidden units, $z_j$ is the output of hidden unit j, $w_{kj}^{(2)}$ denotes a weight in the second layer (from hidden unit j to output unit k), $b_k^{(2)}$ denotes the bias for the output unit k, and $g$ is the activation function of the output units, which does not need to be the same function as for the hidden units.

MLPs are typically
trained using a supervised training algorithm known as 'back propagation', which works as follows: the first pattern is presented to the input neurons and the net produces an output. If this is not equal to the desired output, the difference (error) between the two values is computed and the weights are changed in order to minimize it. These operations are repeated for each input pattern until the mean square error of the system is minimized. Given the p-th input pattern, a classical error function (called sum-of-squares) is:

$$E_p = \frac{1}{2} \sum_{k} \left(t_k^{p} - y_k^{p}\right)^2$$

where $t_k^{p}$ is the desired value of the k-th output for the p-th pattern and $y_k^{p}$ is the output of the corresponding neuron.
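The training procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation documented in the MLP User Manual: the output activation is taken to be linear, the sum-of-squares error carries the conventional factor 1/2, the weights are updated after every pattern (on-line gradient descent), and the toy data set, the learning rate `lr`, the number of hidden units and all function names are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w1, b1, w2, b2):
    """Sigmoidal hidden layer followed by linear output units (assumption)."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = [sum(w * zj for w, zj in zip(row, z)) + b
         for row, b in zip(w2, b2)]
    return z, y


def pattern_error(y, t):
    """Sum-of-squares error for one pattern, with a factor 1/2."""
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def train_step(x, t, w1, b1, w2, b2, lr):
    """One back-propagation update for a single pattern (modifies in place)."""
    z, y = forward(x, w1, b1, w2, b2)
    # Output deltas: derivative of the error w.r.t. each linear output.
    delta_out = [yk - tk for yk, tk in zip(y, t)]
    # Hidden deltas: propagate the output deltas back through w2 and
    # multiply by the sigmoid derivative z * (1 - z).
    delta_hid = [
        zj * (1.0 - zj) * sum(delta_out[k] * w2[k][j] for k in range(len(w2)))
        for j, zj in enumerate(z)
    ]
    # Gradient-descent updates for the second and first layers.
    for k, dk in enumerate(delta_out):
        for j, zj in enumerate(z):
            w2[k][j] -= lr * dk * zj
        b2[k] -= lr * dk
    for j, dj in enumerate(delta_hid):
        for i, xi in enumerate(x):
            w1[j][i] -= lr * dj * xi
        b1[j] -= lr * dj


def total_error(data, w1, b1, w2, b2):
    return sum(pattern_error(forward(x, w1, b1, w2, b2)[1], t)
               for x, t in data)


# Hypothetical toy problem: approximate y = x^2 on nine points in [-1, 1].
rng = random.Random(0)
data = [([x / 4.0], [(x / 4.0) ** 2]) for x in range(-4, 5)]
n_hidden = 4
w1 = [[rng.uniform(-1, 1)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
b2 = [0.0]

error_before = total_error(data, w1, b1, w2, b2)
for epoch in range(2000):
    for x, t in data:
        train_step(x, t, w1, b1, w2, b2, lr=0.1)
error_after = total_error(data, w1, b1, w2, b2)
```

Comparing `error_before` with `error_after` shows the effect of the repeated weight updates on the total error. A batch variant would instead accumulate the gradients over all patterns before changing the weights.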
Due to its interpolation capabilities, the MLP is one of the most widely used neural architectures. The MLP can also be trained using probabilistic techniques such as the Bayesian learning framework, which offers several advantages over classical ones (Bishop 1995):
- it cannot overfit the data;
- it is automatically regularized;
- the uncertainty in the prediction can be estimated.
A few additional points are worth stressing.
The universal approximation theorem (Haykin 1999) states that the two-layer architecture is capable of universal approximation, and a considerable number of papers discussing this property have appeared in the literature (cf. Bishop 1995). An important corollary of this result is that, in the context of a classification problem, networks with sigmoidal non-linearities and two layers of weights can approximate any decision boundary to arbitrary accuracy. Thus, such networks also provide universal non-linear discriminant functions. More generally, the capability of such networks to approximate general smooth functions allows them to model posterior probabilities of class membership. Since two layers of weights suffice to implement any arbitrary function, one would need special problem conditions (Duda 2001) or requirements to recommend the use of more than two layers. Furthermore, it is found empirically that networks with multiple hidden layers are more prone to getting caught in undesirable local minima. Astronomical data do not seem to require such a level of complexity, and therefore it is enough to use just two layers of weights, i.e. a single hidden layer.