Research Article

Linking asset prices to news without direct asset mentions


ABSTRACT

Advances in Natural Language Processing (NLP), computing power and data availability are driving an explosion in research about the impact of news on asset prices. However, when relating news to individual assets, this research is based on mentions of specific assets or related terms in the news stories. Such an approach has two shortcomings. First, it requires a substantial time investment in a specific NLP technology. Second, and more importantly, it ignores news articles that do not directly mention a given asset or a pre-defined asset-related term, even if these articles are logically related to the asset in question. Our approach relies instead on a novel NLP technology called ‘semantic fingerprinting’, which projects any text onto a binary vector representing its meaning. The greater the overlap between the semantic fingerprint of a news article and a given asset description, the more relevant we expect the article to be, whether or not the given asset is mentioned in the news directly. We show that this approach successfully picks up the positive impact of news on prices of commonly traded commodities using a dataset of general news published by The Guardian. We include the needed data and instructions for implementing this approach.

JEL CLASSIFICATION:

Acknowledgement

We are grateful to Jasper Ginn, Søren Tjagvad Madsen, Francisco Webber, Fang Xu and Maxim Zagonov for many insightful conversations. Special thanks to Jasper Ginn for extensive assistance with the data. All errors are ours. We gratefully acknowledge research funding from Europlace Institute of Finance, from the Romanian Ministry of Education (CNCS - UEFISCDI, project number PN-II-ID-PCE-2012-4-0631) and from the Israeli Science Foundation (project number 1957/19).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 For more detailed illustrations of the semantic fingerprinting process, see also Pungulescu (2022).

2 The semantic space of the Retina engine has K = 128 × 128 = 16,384 positions. The term fingerprint corresponding to a commodity identifies a subset of 328 positions, whereas the text fingerprint corresponding to a news article highlights a subset of 984 related “contexts” from the semantic space.

3 Our news dataset is available at https://osf.io/edky5.

4 The Retina engine can be accessed at http://languages.cortical.io. Note that, consistent with a convention used by many programming languages, Cortical.io numbers the positions starting from 0 rather than from 1.

5 Sample Python code for semantic fingerprinting is available from https://tinyurl.com/fpcodepython.
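As a rough illustration of the overlap idea described in the abstract and in notes 2 and 4, the sketch below treats a fingerprint as a set of 0-based positions in a semantic space of K = 128 × 128 = 16,384 positions and counts the positions two fingerprints share. It does not call the Retina engine and is not the authors' code: the function names, the toy position values, and the normalisation of the overlap by the size of the asset fingerprint are all assumptions made for this example.

```python
# Minimal sketch of fingerprint overlap, assuming fingerprints are given
# as collections of 0-based positions (see note 4). All names here are
# hypothetical and are not part of the Retina engine.

GRID_SIDE = 128
K = GRID_SIDE * GRID_SIDE  # 16,384 positions in the semantic space (note 2)


def to_binary_vector(positions, size=K):
    """Expand a collection of active positions into a 0/1 list of length `size`."""
    vector = [0] * size
    for p in positions:
        if not 0 <= p < size:
            raise ValueError(f"position {p} is outside 0..{size - 1}")
        vector[p] = 1
    return vector


def overlap(article_positions, asset_positions):
    """Number of positions active in both fingerprints."""
    return len(set(article_positions) & set(asset_positions))


def overlap_share(article_positions, asset_positions):
    """Overlap divided by the size of the asset fingerprint.

    This normalisation is an assumption for illustration only.
    """
    asset = set(asset_positions)
    if not asset:
        return 0.0
    return len(asset & set(article_positions)) / len(asset)


# Toy fingerprints with made-up positions; real ones would be far larger.
toy_asset = {0, 5, 130, 16383}
toy_article = {5, 130, 200}

print(overlap(toy_article, toy_asset))        # 2
print(overlap_share(toy_article, toy_asset))  # 0.5
```

Working with sets of positions rather than full binary vectors keeps the comparison cheap, since only a small fraction of the 16,384 positions is active in any one fingerprint.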

Additional information

Funding

The work was supported by the Israeli Science Foundation [1957/19]; Romanian Ministry of Education [PN-II-ID-PCE-2012-4-0631]; Europlace Institute of Finance.
