Theory and Methods Special Issue on Precision Medicine and Individualized Policy Discovery, Part II

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Haoyu Chen, Department of Statistics, North Carolina State University, Raleigh, NC

Wenbin Lu, Department of Statistics, North Carolina State University, Raleigh, NC

Rui Song (corresponding author), Department of Statistics, North Carolina State University, Raleigh, NC

Abstract

Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. Big data has made this easier than before, but it also brings new challenges. Because the decision rule should be updated once per step, an offline update that uses all the historical data is inefficient in both computation and storage. To this end, we propose a completely online algorithm that makes decisions and updates the decision rule online via stochastic gradient descent. It is not only efficient but also supports general parametric reward models. Focusing on the statistical inference of online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator we use to estimate the optimal value. Online plugin estimators for the variances of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are tested by simulations and a real data application to news article recommendation.
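The loop described in the abstract (decide, observe a reward, take one stochastic gradient step, update a running value estimate) can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the two-action linear reward model, the epsilon-greedy exploration scheme, the step size `c * t ** (-alpha)`, the averaging of iterates, and the function name `run_online_sgd` are all illustrative choices that do not come from the article.

```python
import numpy as np


def run_online_sgd(T=20000, eps=0.2, c=0.5, alpha=0.6, seed=0):
    """Toy online decision loop with SGD updates and an IPW value estimate."""
    rng = np.random.default_rng(seed)
    # Assumed true parameters of a linear reward model, one row per action.
    beta_true = np.array([[1.0, -1.0], [1.0, 1.0]])
    beta = np.zeros_like(beta_true)      # current SGD iterate
    beta_bar = np.zeros_like(beta_true)  # running average of the iterates
    value = 0.0                          # running IPW value estimate
    # Under epsilon-greedy with uniform exploration over two actions,
    # the greedy action is taken with probability 1 - eps / 2.
    p_greedy = 1.0 - eps / 2.0

    for t in range(1, T + 1):
        x = np.array([1.0, rng.normal()])  # intercept plus one covariate
        greedy = int(x @ beta[1] > x @ beta[0])
        a = int(rng.integers(2)) if rng.random() < eps else greedy
        y = x @ beta_true[a] + 0.5 * rng.normal()  # observed reward

        # One SGD step on the squared loss for the chosen action only.
        beta[a] += c * t ** (-alpha) * (y - x @ beta[a]) * x
        # Online updates: no historical data is stored.
        beta_bar += (beta - beta_bar) / t
        value += ((a == greedy) * y / p_greedy - value) / t

    return beta_bar, value
```

Each step touches only the current observation, so memory and per-step cost stay constant in `T`, which is the efficiency argument the abstract makes against offline updates. Calling `run_online_sgd()` returns the averaged parameter estimate and the value estimate for this toy model.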

Correction statement

This article was originally published with errors, which have now been corrected in the online version. Please see Correction https://doi.org/10.1080/01621459.2020.1915023.

Notes

1 For example, $\beta_{[1:p]}$ is not used in $\mu(1, X; \beta)$. The same rule applies to the true parameter $\beta_0$ and the estimators $\hat{\beta}$, $\hat{\beta}_t$, and $\bar{\beta}_t$ that are introduced below.

2 To distinguish between the i.i.d. setting and the online decision making setting, we use the tilde symbol to mark the data, the conditional mean response model, and the loss functions from the i.i.d. setting, and use $b$ to denote the parameters.

3 Code for the numerical studies is at https://github.com/ideechy/Online-Decision-Making.
