Csaba Szepesvari - Compressed computation of good policies in large MDPs
From Katie Gentilello on March 11th, 2021
Markov decision processes (MDPs) is a framework that captures keys aspect of many real-world decision making problems with the few assumptions. Unfortunately, as a result MDPs lack structure and as such planning and learning in large MDPs is strongly intractable and the only hope is that adding additional structure can help. One such possibility is to study MDPs where the optimal value function can be well approximated by using only a few basis functions. As early as in the advent of the field in the 1950s, it was recognized that many problems of practical interest have this structure. As knowing the optimal value function is sufficient for optimal behavior, one hopes that this compressibility of the optimal value function will lead to efficient algorithms. While our field has witnessed a healthy explosion of algorithmic ideas, until recently not much has been known about whether these compressed computations are possible and when. In this talk, I will discuss a few recent results (some positive, some negative) concerned with compressed computations and conclude with some open problems. As we shall see, even though progress during the last year have tremendously accelerated, still today, there are more questions left open than questions that have been satisfactorily answered.