Towards Theoretical Understanding of Overparametrization in Deep Learning - Jason D Lee
Next, we analyze the implicit regularization effects of various optimization algorithms. In particular, we prove that for least squares with mirror descent, the algorithm converges to the solution closest to the initialization in Bregman divergence. For linearly separable classification problems, we prove that steepest descent with respect to a norm converges to the maximum-margin (SVM) solution with respect to the same norm. For overparametrized non-convex problems such as matrix sensing or neural networks with quadratic activations, we prove that gradient descent converges to the minimum nuclear norm solution, which yields both meaningful optimization and generalization guarantees.
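As a minimal numerical sketch (not part of the talk), the simplest instance of the mirror descent claim can be checked directly: gradient descent is mirror descent with the squared Euclidean potential, so on an underdetermined least-squares problem started from zero it converges to the interpolating solution closest to the initialization in the corresponding Bregman divergence, i.e. the minimum l2-norm solution. The problem sizes and step size below are arbitrary choices for illustration.

# Gradient descent on an underdetermined least-squares problem converges to
# the minimum-norm interpolator (the special case of the mirror descent result
# with the squared Euclidean potential). Sizes and iteration count are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # more unknowns than equations: many exact solutions
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                     # initialization w_0 = 0
step = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(20000):              # plain gradient descent on 0.5 * ||A w - y||^2
    w -= step * A.T @ (A @ w - y)

w_min_norm = np.linalg.pinv(A) @ y  # closed-form minimum-norm interpolating solution

print("residual ||A w - y||        :", np.linalg.norm(A @ w - y))
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))

Both printed quantities should be close to zero: the iterate interpolates the data and coincides with the minimum-norm solution, even though nothing in the objective penalizes the norm; the bias comes entirely from the algorithm.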
This is joint work with Suriya Gunasekar, Mor Shpigel, Daniel Soudry, Nati Srebro, and Simon Du.