1,578
Views
16
CrossRef citations to date
0
Altmetric
Applications and Case Studies

Penalized and Constrained Optimization: An Application to High-Dimensional Website Advertising

, &
Pages 107-122 | Received 09 Jun 2018, Accepted 07 Apr 2019, Published online: 19 Jun 2019
 

Abstract

Firms are increasingly transitioning advertising budgets to Internet display campaigns, but this transition poses new challenges. These campaigns use numerous potential metrics for success (e.g., reach or click rate), and because each website represents a separate advertising opportunity, this is also an inherently high-dimensional problem. Further, advertisers often have constraints they wish to place on their campaign, such as targeting specific sub-populations or websites. These challenges require a method flexible enough to accommodate thousands of websites, as well as numerous metrics and campaign constraints. Motivated by this application, we consider the general constrained high-dimensional problem, where the parameters satisfy linear constraints. We develop the Penalized and Constrained optimization method (PaC) to compute the solution path for high-dimensional, linearly constrained criteria. PaC is extremely general; in addition to internet advertising, we show it encompasses many other potential applications, such as portfolio estimation, monotone curve estimation, and the generalized lasso. Computing the PaC coefficient path poses technical challenges, but we develop an efficient algorithm over a grid of tuning parameters. Through extensive simulations, we show PaC performs well. Finally, we apply PaC to a proprietary dataset in an exemplar Internet advertising case study and demonstrate its superiority over existing methods in this practical setting. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

Acknowledgments

The authors would like to thank the associate editor and the review team of the paper for their insightful comments, which greatly improved this article for publication. We are grateful for the anonymous review team’s time and effort toward enhancing and clarifying our article.

Notes

1 Internet display advertising refers to static advertisements shown on specific websites, while search advertising refers to ads shown on the search result pages. This article focuses on Internet display advertising.

2 Note that we distinguish “click rate” from the more common “clickthrough rate” here since clickthrough applies to users who click on an ad after being exposed to it, and we instead consider the overall probability of clicking on an ad given a user may or may not have been exposed to it.

3 Note that using this formulation, while βjγj0, it is not a requirement that βjγj1. In practice, we ensure this condition by adding βjγj1 as a constraint. However, this constraint is rarely invoked, only when a single website has very low cost but a large reach of viewership relative to the other websites under consideration, or the problem under consideration has very large budget.

4 For further details on the relationships among categories, see Table B.1 in Appendix B for an overview of viewership correlations within and across each of the 16 website categories during January 2011.

5 To simplify the presentation of our algorithm we have assumed equality constraints in (5). However, by introducing slack variables, the same basic approach can be used to optimize over inequality constraints. See Appendix C for further details.

6 To reduce notation, we assume without loss of generality that the elements of β are ordered so that the first m correspond to A.

7 sign(a) is a vector of the same dimension as a with the ith element equal to 1 or –1 depending on the sign of ai.

8 To measure computational efficiency between PaC and quadratic programming, both were implemented in R on a personal laptop computer using a 2.59 GHz i7 processor. The solve.QP function from the quadprog library was used to perform quadratic programming, while PaC used the lars function.

9 We use a larger value for n in the binomial setting because these distributions provide less information for estimating the regression coefficients.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 343.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.