Understanding Best Subset Selection
Sparsity plays a key role in linear statistical modeling and beyond. In
this talk I will discuss the best subset selection problem, a central
problem in statistics, wherein the task is to select a set of k relevant
features from among p variables, given n samples. I will discuss recent
computational techniques, relying on integer optimization and
first-order optimization methods, that enable us to obtain high-quality,
near-optimal solutions for best-subsets regression at problem sizes well
beyond what was previously considered possible. This yields new
insights into the statistical behavior of subset selection problems
compared with popular, computationally friendlier methods such as L1
regularization, thereby motivating the design of new statistical
estimators with better statistical and computational properties. If
time permits, I will also discuss a closely related, extremely
effective, but less well understood sparse regularization scheme:
forward stage-wise regression (also known as boosting) in linear models.
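
For concreteness, best subset selection in the least-squares setting is usually written as: minimize ||y - X b||_2^2 over b, subject to b having at most k nonzero entries. The sketch below solves a tiny instance by exhaustive search over all size-k supports. It is only an illustration of the problem statement under assumed toy data, not the integer-optimization or first-order techniques discussed in the talk; the function name `best_subset` and the simulated X, y are hypothetical. Enumeration costs (p choose k) least-squares fits, which is why it is hopeless beyond small p.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k):
    """Exhaustive best subset selection (illustrative, hypothetical helper).

    Fits ordinary least squares on every size-k subset of columns of X
    and returns the support with the smallest residual sum of squares.
    """
    n, p = X.shape
    best_rss, best_support, best_coef = np.inf, None, None
    for support in combinations(range(p), k):
        cols = list(support)
        # Least-squares fit restricted to the candidate columns.
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        resid = y - X[:, cols] @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss, best_support, best_coef = rss, support, coef
    # Embed the fitted coefficients in a length-p sparse vector.
    beta = np.zeros(p)
    beta[list(best_support)] = best_coef
    return best_support, beta, best_rss


# Assumed toy data: n = 50 samples, p = 8 variables, k = 2 relevant features.
rng = np.random.default_rng(0)
n, p, k = 50, 8, 2
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[1, 5]] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

support, beta_hat, rss = best_subset(X, y, k)
print("selected support:", support)
print("estimated coefficients:", np.round(beta_hat, 3))
```

With p = 8 and k = 2 this checks only 28 supports; at p = 1000 and k = 10 the count exceeds 10^23, which is the gap the computational techniques in the talk are meant to address.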