Welcome to the homepage of our reading group for reinforcement learning.
Host: Chao Tao
Time: bi-weekly, 1:00pm - 3:00pm
Place: Luddy Hall 3069
Markov Decision Process (MDP); Value Functions; Bellman Equations; State Occupancy; Q-learning;
Tabular Episodic MDP; Model-based; Optimistic Algorithm; Frequentist Regret; $\widetilde{\mathcal{O}}(H\sqrt{SAT})$;
2/14, Chao Tao on "Is Q-learning Provably Efficient?" (notes)
Tabular Episodic MDP; Model-free; Optimistic Algorithm; Frequentist Regret; $\widetilde{\mathcal{O}}(H^2\sqrt{SAT})$;
Tabular Episodic MDP; Thompson Sampling; Bayesian Regret; $\widetilde{\mathcal{O}}(HS\sqrt{AT})$;
Tabular Infinite-Horizon Undiscounted MDP; Weakly Communicating; Thompson Sampling; Bayesian Regret; $\widetilde{\mathcal{O}}(D'S\sqrt{AT})$;
5/1, Nikolai Karpov on Concurrent PAC RL