Statistical theory for deep ReLU networks - Johannes Schmidt-Hieber
From Scott Jacobson on October 19th, 2018
We derive risk bounds for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the key tuning parameter is the sparsity of the network. Specifically, we consider large networks in which the number of potential parameters is much larger than the sample size. We also discuss theoretical results that compare the performance with that of other methods, such as wavelet and spline-type estimators. This is joint work with K. Eckle (Leiden).
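
For context, here is a minimal sketch in LaTeX of the setup, following the notation of the paper this talk is presumably based on (Schmidt-Hieber, "Nonparametric regression using deep neural networks with ReLU activation function"); the symbols f, g_j, t_j, and beta_j below are taken from that paper rather than from this announcement.

% Data are generated from the d-variate nonparametric regression model
\[
  Y_i = f(\mathbf{X}_i) + \varepsilon_i, \qquad i = 1, \dots, n,
\]
% with covariates X_i in [0,1]^d and standard normal noise eps_i.
% The composition assumption: the regression function factors as
\[
  f = g_q \circ g_{q-1} \circ \cdots \circ g_0,
\]
% where each component of g_j depends on at most t_j of its arguments
% and is beta_j-Hoelder smooth. The effective smoothness at level j,
% after accounting for the later compositions, is
\[
  \beta_j^{*} = \beta_j \prod_{\ell=j+1}^{q} \min(\beta_\ell, 1),
\]
% and the convergence rate attained (up to logarithmic factors) by
% suitably sparse ReLU networks, matching the minimax lower bound, is
\[
  \phi_n = \max_{0 \le j \le q} n^{-2\beta_j^{*}/(2\beta_j^{*} + t_j)}.
\]

As a sanity check, an additive model f(x) = sum_j f_j(x_j) fits this framework with q = 1: g_0 collects the univariate components (so t_0 = 1) and g_1 is the sum, which is arbitrarily smooth, so the rate reduces to the classical one-dimensional rate n^{-2beta/(2beta+1)} whenever each f_j is beta-Hoelder.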