Abstract
Headlines play an important role both in news audiences’ attention decisions online and in news organizations’ efforts to attract that attention. A large body of research focuses on developing generally applicable heuristics for more effective headline writing. In this work, we measure the importance of a number of theoretically motivated textual features to headline performance. Using a corpus of hundreds of thousands of headline A/B tests run by hundreds of news publishers, we develop and evaluate a machine-learned model to predict headline testing outcomes. We find that the model exhibits modest performance above baseline and further estimate an empirical upper bound for such content-based prediction in this domain, indicating an important role for non-content-based factors in test outcomes. Together, these results suggest that any particular headline writing approach has only a marginal impact, and that understanding reader behavior and headline context are key to predicting news attention decisions.
Acknowledgements
The authors thank Christopher Breaux, Josh Schwartz, and the Chartbeat organization, as well as the reviewers of prior versions of this article, for their valuable feedback.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
3 Chartbeat’s testing system distinguishes between hard convergence, in which the system is 95% confident that one headline is more successful, and soft convergence, in which the system selects the variant that it is confident no other headline beats by more than 25%. Because of this relaxed criterion for selecting a winner, soft-converged tests convey a less certain and clear-cut signal of performance for predictive modeling and are therefore excluded.
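To make the distinction between the two convergence criteria concrete, the following is a minimal sketch of how a test outcome could be classified under them. Chartbeat’s actual procedure is not described in this note, so everything here is an assumption made for illustration: the function name, the Beta(1, 1) priors on each variant’s click-through rate, the Monte Carlo approximation, the 5% tolerance used for the soft criterion, and the reading of “beats by more than 25%” as a relative difference in click-through rate.

```python
import numpy as np

def convergence_status(clicks, impressions, n_samples=100_000, seed=0):
    """Classify a headline test as 'hard', 'soft', or not converged.

    Hypothetical sketch of the two criteria described in note 3, not
    Chartbeat's implementation. Assumes Beta(1, 1) priors on each
    variant's click-through rate (CTR).
    """
    rng = np.random.default_rng(seed)

    # Posterior CTR samples per variant: Beta(1 + clicks, 1 + non-clicks).
    samples = np.stack([
        rng.beta(1 + c, 1 + (n - c), size=n_samples)
        for c, n in zip(clicks, impressions)
    ])

    # Hard convergence: one variant has the highest CTR in at least 95%
    # of posterior draws.
    winners = samples.argmax(axis=0)
    p_best = np.bincount(winners, minlength=len(clicks)) / n_samples
    leader = int(p_best.argmax())
    if p_best[leader] >= 0.95:
        return "hard", leader

    # Soft convergence (assumed reading): we are confident no rival beats
    # the leader's CTR by more than 25%, i.e. the posterior probability
    # that any rival exceeds 1.25x the leader's CTR is at most 5%.
    rivals = np.delete(samples, leader, axis=0)
    p_beaten = (rivals.max(axis=0) > 1.25 * samples[leader]).mean()
    if p_beaten <= 0.05:
        return "soft", leader

    return "none", None

# Example: two variants with 120/2000 and 95/2000 clicks/impressions.
print(convergence_status([120, 95], [2000, 2000]))
```

Under these assumptions, the sketch also makes the note’s exclusion rationale visible: a soft-converged test can name a winner even when the posterior probability that the winner is strictly best falls well below 95%, so its outcome is a noisier training signal.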