Abstract:
Second-order optimization methods have long been less widely used for training neural networks than first-order methods such as Stochastic Gradient Descent. This is mainly due to the complexity of second-order methods and their high cost in both compute and memory. In recent years, more work has been done to adapt these methods and make them better suited to training neural networks. In this paper we demonstrate how trust region methods can be used to improve the convergence and cost-effectiveness of second-order optimization. This is achieved by using cheap first-order information only when, judged by the relative size of the trust region, it is an adequate approximation to the expensive second-order information. We also present techniques to automatically tune the hyperparameters these methods introduce, including a novel approach to adaptive regularization. We demonstrate these methods on autoencoders and image classifiers, comparing them against first-order methods.