Bibliography
- AAGZ17
Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, and Assaf Zeevi. Thompson sampling for the MNL-bandit. In COLT. 2017.
- AAGZ19
Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, and Assaf Zeevi. MNL-bandit: a dynamic learning approach to assortment selection. Operations Research, 67(5):1453–1485, 2019.
- AG17
Shipra Agrawal and Navin Goyal. Near-optimal regret bounds for Thompson sampling. Journal of the ACM, 64(5):1–24, 2017.
- AB09
Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In COLT. 2009.
- AB10
Jean-Yves Audibert and Sébastien Bubeck. Best arm identification in multi-armed bandits. In COLT. 2010.
- AMSzepesvari09
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
- ACBF02
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3):235–256, 2002.
- ACesaBianchiFS02
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
- JMNB14
Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil' UCB: an optimal exploration algorithm for multi-armed bandits. In COLT. 2014.
- KKS13
Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In ICML. 2013.
- LGC16
Andrea Locatelli, Maurilio Gutzeit, and Alexandra Carpentier. An optimal algorithm for the thresholding bandit problem. In ICML, volume 48, pages 1690–1698. 2016.
- TZZ19
Chao Tao, Qin Zhang, and Yuan Zhou. Collaborative learning with limited interaction: tight bounds for distributed exploration in multi-armed bandits. In FOCS. 2019.