Volume 17, no. 1Pages 27 - 36

Invariant Description of Control in a Gaussian One-Armed Bandit Problem

A.V. Kolnogorov

We consider the one-armed bandit problem in application to batch data processing if there are two alternative processing methods with different efficiencies and the efficiency of the second method is a priori unknown. During the processing, it is necessary to determine the most effective method and ensure its preferential use. Processing is performed in batches, so the distributions of incomes are Gaussian. We consider the case of a priori unknown mathematical expectation and the variance of income corresponding to the second action. This case describes a situation when the batches themselves and their number have moderate or small volumes. We obtain recursive equations for computing the Bayesian risk and regret, which we then present in an invariant form with a control horizon equal to one. This makes it possible to obtain the estimates of Bayesian and minimax risk that are valid for all control horizons multiples to the number of processed batches.

Full text

Keywords: one-armed bandit; batch processing; Bayesian and minimax approaches; invariant description.
References: 1. Berry D.A., Fristedt B. Bandit Problems: Sequential Allocation of Experiments. London, New York, Chapman and Hall, 1985.
2. Presman E.L., Sonin I.M. Sequential Control with Incomplete Information. New York, Academic Press, 1990.
3. Tsetlin M.L. Automaton Theory and Modeling of Biological Systems. New York, Academic Press, 1973.
4. Sragovich V.G. Mathematical Theory of Adaptive Control. Singapore, World Scientific, 2006.
5. Gittins J.C. Multi-Armed Bandit Allocation Indices. Chichester, John Wiley and Sons, 1989.
6. Lattimore T., Szepesvari C. Bandit Algorithms. Cambridge, Cambridge University Press, 2020.
7. Kolnogorov A.V. One-Armed Bandit Problem for Parallel Data Processing Systems. Problems of Information Transmission, 2015, vol. 51, no. 2, pp. 177-191. DOI: 10.1134/S0032946015020088
8. Perchet V., Rigollet P., Chassang S., Snowberg E. Batched Bandit Problems. The Annals of Statistics, 2016, vol. 44, no. 2, pp. 660-681. DOI: 10.1214/15-AOS1381
9. Vogel W. An Asymptotic Minimax Theorem for the Two-Armed Bandit Problem. The Annals of Mathematical Statistics, 1960, vol. 31, no. 2, pp. 444-451.
10. Kolnogorov A. Gaussian One-Armed Bandit Problem. 2021 XVII International Symposium "Problems of Redundancy in Information and Control Systems". Moscow, Institute of Electrical and Electronics Engineers, 2021, pp. 74-79. DOI: 10.1109/REDUNDANCY52534.2021.9606464
11. Bradt R.N., Johnson S.M., Karlin S. On Sequential Designs for Maximizing the Sum of n Observations. The Annals of Mathematical Statistics, 1956, vol. 27, pp. 1060-1074. DOI: 10.1214/aoms/1177728073
12. Chernoff H., Ray S.N. A Bayes Sequential Sampling Inspection Plan. The Annals of Mathematical Statistics, 1965, vol. 36, pp. 1387-1407. DOI: 10.1214/aoms/1177699898
13. Kolnogorov A.V. Gaussian One-Armed Bandit with Both Unknown Parameters. Siberian Electronic Mathematical Reports, 2022, vol. 19, no. 2, pp. 639-650. Available at: http://semr.math.nsc.ru/v19n2ru.html