Abstract
Backward error propagation is a widely used procedure for computing the gradient of the error of a feed-forward network, and thus allows the error to be minimized (learning). Simple gradient descent is ineffective unless the step size is very small, in which case convergence is unacceptably slow. Conjugate gradient methods are increasingly used because they exploit second-derivative information, thereby improving learning. Two different implementations are described: one uses an exact line search to find the minimum of the error along the current search direction; the other avoids the line search by controlling the positive definiteness of the Hessian matrix. The two implementations are compared and evaluated on an image recognition problem using input bit-maps with a resolution of 128 by 128 pixels.
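To make the first approach concrete, the following is a minimal sketch (not the paper's code) of conjugate-gradient minimization with a line search, in the spirit of the line-search implementation described above. The functions `loss` and `grad` stand in for the network error and its gradient as obtained by back-propagation; the toy quadratic at the end is purely illustrative.

```python
import numpy as np

def conjugate_gradient_descent(loss, grad, w, iters=100, tol=1e-8):
    """Illustrative Polak-Ribiere conjugate gradient with a backtracking line search."""
    g = grad(w)
    d = -g                          # initial search direction: steepest descent
    for _ in range(iters):
        # Backtracking line search along d using an Armijo condition.
        # (An exact line search would minimize loss(w + alpha * d) over alpha.)
        alpha, f0, slope = 1.0, loss(w), g @ d
        for _ in range(50):
            if loss(w + alpha * d) <= f0 + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        w = w + alpha * d
        g_new = grad(w)
        if np.linalg.norm(g_new) < tol:
            break
        # Polak-Ribiere coefficient mixes in the previous direction so that
        # successive search directions are (approximately) conjugate.
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        g = g_new
    return w

# Example use on a toy quadratic error surface:
A = np.array([[3.0, 0.5], [0.5, 1.0]])
f = lambda w: 0.5 * w @ A @ w
df = lambda w: A @ w
print(conjugate_gradient_descent(f, df, np.array([4.0, -2.0])))
```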