The empirical successes of deep learning still lack a rigorous explanation, and the effectiveness of gradient descent in particular remains largely a mystery. We focus on training neural networks with a single hidden layer of unbiased activation units, i.e., units with no additive bias terms, and give a surprisingly general polynomial-time analysis of gradient descent, exploiting new connections with tools from spherical harmonics.
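Concretely (as a schematic illustration, not the precise hypotheses of the result), an unbiased one-hidden-layer network computes a function of the form $f(x) = \sum_{i=1}^{k} a_i \, \sigma(\langle w_i, x \rangle)$, with no bias term inside the activation $\sigma$; when the inputs are normalized to the unit sphere, expanding each ridge function $\sigma(\langle w_i, \cdot \rangle)$ in spherical harmonics is the natural way such tools enter the analysis.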
On the other hand, when the target function is generated by a network with arbitrary biases, the problem becomes intractable. We construct a family of simple neural networks, each with a single hidden layer and a smooth activation function, paired with benign input distributions, that is hard to learn in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent, with any loss function and any model architecture) requires an exponential number of queries. We will also discuss a few open problems.
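For concreteness (again only schematically), the hard targets have the same one-hidden-layer form but with arbitrary biases, $f(x) = \sum_{i=1}^{k} a_i \, \sigma(\langle w_i, x \rangle + b_i)$. Recall that a statistical query algorithm accesses the data only through approximate expectations $\mathbb{E}_{x}[q(x, f(x))]$ of query functions $q$, each answered to within an additive tolerance; this is, roughly, why the lower bound also covers gradient-based methods, which estimate such expectations from batches.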
Joint work with John Wilmes (both parts), and with Le Song and Bo Xie (the lower bound).