TSEC: A Framework for Online Experimentation under Experimental Constraints

Pages 513-523 | Received 16 Jan 2021, Accepted 26 Aug 2022, Published online: 08 Nov 2022
