Blue chips, black ink: How newspaper coverage predicts stock performance
Many forecasters use economic indicators such as inflation or interest rates to predict future equity premia, that is, the excess return of a stock index over the risk-free rate. However, investors rely on a much wider universe of information when making decisions, and Big Data allows us to build models that take this wider information into account. Using machine-learning techniques, Adämmer and Schüssler analyse hundreds of thousands of newspaper articles from 1980 to 2018, isolating 100 major news topics. They find that the prevalence of topics in news coverage – including non-economic topics – predicts future stock performance better than models relying solely on economic indicators.
Bankers, brokers and investors build various models to forecast future stock values. They construct these from a host of different indicators, some relating to the wider economy (like inflation), and others to the characteristics of specific firms (like an earnings-to-price ratio). But investors examine more than tables of economic data; they also monitor wider political and international developments, and the politico-economic contexts that affect measures like inflation and taxation. In short, they read the news.
Relying on this insight, Philipp Adämmer, an economist at Helmut Schmidt University in Hamburg, and Rainer A. Schüssler, assistant professor at the University of Rostock, used Big Data methods to demonstrate how newspaper coverage predicts aggregate US stock returns (S&P 500 index).
In their 2020 article, Forecasting the Equity Premium: Mind the News!, published in the Review of Finance, the two researchers used machine learning to analyse close to 700,000 newspaper articles from two leading US newspapers. Using topic modelling, they identified 100 news topics and incorporated them into a forecasting model, testing whether the topics’ prevalence could predict excess stock returns better than standard models. They found that patterns of news coverage foretell stock performance better than models using only economic indicators.
The correlated topic model
Though the idea that news affects the economy is not novel, scholars have only recently started using text mining to analyse news coverage, allowing the amount or tone of coverage to be converted into quantitative data. Text can carry a vast number of meanings, and computer algorithms have made it much easier to use mass quantities of text as data.
Adämmer and Schüssler chose two US newspapers generally considered the most prestigious general-interest papers in the country: The New York Times and The Washington Post. Using the LexisNexis database, the two researchers retrieved all articles that either contained the character string ‘econom’ in their text or had a proportion of economic relevance greater than zero, covering June 1980 to December 2018. After screening out the longest and shortest articles (often corrections or dossiers), and those written in Spanish, they had assembled 694,506 articles.
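The screening step can be sketched as a simple filter. The word-count thresholds and the language check below are hypothetical stand-ins, not the paper’s actual cut-offs, and the economic-relevance score is omitted for simplicity:

```python
# Illustrative sketch of the article-screening step. The length thresholds
# and the language check are hypothetical; the paper's exact filtering rules
# (and its economic-relevance score) are not reproduced here.

def keep_article(text, language, min_words=100, max_words=3000):
    """Return True if an article passes the screening filters."""
    words = text.split()
    if language == "es":                             # drop Spanish-language articles
        return False
    if not (min_words <= len(words) <= max_words):   # drop very short/long pieces
        return False
    return "econom" in text.lower()                  # must mention the 'econom' string

articles = [
    ("The economy grew strongly this quarter. " * 30, "en"),  # passes
    ("Corrección breve.", "es"),                              # wrong language
    ("Short note.", "en"),                                    # too short
]
kept = [text for text, lang in articles if keep_article(text, lang)]
print(len(kept))  # only the first article survives the filters
```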
Once these articles were collated, Adämmer and Schüssler used a “correlated topic model” to identify themes within the text. Treating each article as a “bag of words”, the algorithm clusters these words into coherent topics. For example, Topic 62 includes the following words: “iraq,” “iran,” “gulf,” “war” and “iraqi”. The researchers could then conclude that Topic 62 likely referred to the Iran-Iraq War or the First Gulf War (1990-91).
The correlated topic model is an extension of the most prominent topic model, “latent Dirichlet allocation”. Topic models assume that documents are probability distributions over topics, while topics are themselves probability distributions over words. The correlated topic model additionally captures correlations between the topics within a document. For example, a topic focusing on oil may be more likely to appear in the same piece as one on Iraq than one on cooking. Adämmer and Schüssler used the correlated topic model to select 100 topics from which to build their forecasting model.
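A minimal sketch of this generative view, with invented numbers rather than estimates from the paper: topics are distributions over words, documents are distributions over topics, and the two combine to give the probability of seeing a given word in a document.

```python
# Toy illustration of the generative view behind topic models. The topics,
# words and probabilities are invented, not estimates from the paper.

# Two toy topics as distributions over words (each sums to 1).
topics = {
    "gulf_war": {"iraq": 0.4, "war": 0.3, "oil": 0.2, "recipe": 0.1},
    "cooking":  {"iraq": 0.05, "war": 0.05, "oil": 0.3, "recipe": 0.6},
}

# One document as a distribution over topics (its "topic proportions").
doc_topic_mix = {"gulf_war": 0.8, "cooking": 0.2}

def word_probability(word):
    """P(word | document) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(doc_topic_mix[t] * topics[t].get(word, 0.0) for t in topics)

print(round(word_probability("iraq"), 3))    # 0.8*0.4 + 0.2*0.05 = 0.33
print(round(word_probability("recipe"), 3))  # 0.8*0.1 + 0.2*0.6 = 0.2
```

The correlated topic model adds one ingredient this sketch omits: it also estimates how the topic proportions co-move across documents, so that, for instance, “gulf_war” and an oil topic could be modelled as tending to appear together.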
The forecasting model
Adämmer and Schüssler’s forecasting model uses the proportion of each selected topic in the body of news articles, averaged over a month, as a measure of the topic’s influence in the media. The dependent variable is the equity premium, measured as the return of the S&P 500 index in excess of the Treasury bill rate. Although the authors downloaded articles dated between 1980 and 2018, not all of these entered the final analysis: articles from June 1980 to December 1995 formed a “training sample” used to refine the algorithm, while the actual statistical analysis covered the articles published between January 1996 and December 2018.
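The construction of the monthly predictor and the dependent variable can be sketched as follows, using toy numbers in place of the paper’s estimates:

```python
# Sketch of building the monthly inputs (toy data): average each topic's
# proportion across the articles published in a month, and pair it with
# that month's excess return (index return minus the T-bill rate).

# Estimated topic proportions for each article in one month (hypothetical).
articles_jan = [
    {"topic_20": 0.10, "topic_62": 0.30},
    {"topic_20": 0.20, "topic_62": 0.10},
]

def monthly_topic_share(articles, topic):
    """Average a topic's proportion over all articles in the month."""
    return sum(a[topic] for a in articles) / len(articles)

sp500_return = 0.021   # toy monthly S&P 500 return
tbill_rate = 0.004     # toy monthly Treasury bill rate
excess_return = sp500_return - tbill_rate  # the dependent variable

print(round(monthly_topic_share(articles_jan, "topic_20"), 3))  # 0.15
print(round(excess_return, 3))                                  # 0.017
```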
The predictive variables are the monthly proportions of the news topics. First, the model tests the effect of each topic individually, ranking the topics by their out-of-sample mean squared prediction error (MSPE) to determine which best predicts changes in stock returns. The smaller the error, the more predictive power a single topic has. This strategy is called CTMSel. Secondly, the model averages the forecasts across all 100 topics, approximating a general effect of “news” on returns, a strategy labelled CTMAvg.
Rather than choose one strategy over the other, Adämmer and Schüssler use both to maximise the sensitivity of their statistical model. “If a particular predictor [individual topic] shows temporarily stronger forecasting power [eg, a lower MSPE] compared with the model average, then the model selection strategy [CTMSel] is preferred over the model averaging strategy. If no single predictor emerges as more powerful, then the model averaging strategy [CTMAvg] is selected.”
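The switching logic can be illustrated with a toy example. The topic forecasts and realised returns below are invented; only the comparison of strategies by MSPE mirrors the rule described above:

```python
# Hedged sketch of the CTMSel / CTMAvg choice (toy numbers, not the paper's
# estimation pipeline): each topic yields its own return forecast, and the
# two strategies are compared by mean squared prediction error (MSPE).

def mspe(forecasts, realized):
    """Mean squared prediction error of a forecast sequence."""
    return sum((f - r) ** 2 for f, r in zip(forecasts, realized)) / len(realized)

# Toy out-of-sample forecasts from three individual topics (hypothetical).
topic_forecasts = {
    "topic_20": [0.010, -0.020, 0.015],
    "topic_62": [0.030,  0.010, 0.040],
    "topic_05": [0.000,  0.000, 0.000],
}
realized = [0.012, -0.018, 0.020]  # toy realised excess returns

# CTMSel: the single topic with the lowest MSPE so far.
best_topic = min(topic_forecasts, key=lambda t: mspe(topic_forecasts[t], realized))

# CTMAvg: the average of all topic forecasts at each date.
avg_forecast = [sum(f[i] for f in topic_forecasts.values()) / len(topic_forecasts)
                for i in range(len(realized))]

# Switch rule: use CTMSel when its best single topic beats the average.
sel_error = mspe(topic_forecasts[best_topic], realized)
avg_error = mspe(avg_forecast, realized)
strategy = "CTMSel" if sel_error < avg_error else "CTMAvg"
print(best_topic, strategy)  # topic_20 tracks the returns closely here
```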
The goal of the model is to anticipate changes in future equity returns, so Adämmer and Schüssler are most interested in whether their model has predictive power over equity returns in the “out-of-sample” portion of the dataset.
The model leads to substantially higher forecast accuracy than the historical average of excess returns and other benchmark models using economic predictors. From the viewpoint of an investor, the model generated high risk-adjusted returns.
The model tended to switch from using CTMAvg to CTMSel during periods of recession. One interesting finding was that a specific topic, Topic 20, “is always included as the single predictor when the algorithm switches to the model selection approach” (that is, when it picks a single topic to act as the independent variable), which means it tends to become relevant during recessionary periods. Topic 20 seems to relate to Germany or German reunification, as it contains the words “german,” “east” and “west.” However, it is not entirely clear whether this topic refers to Germany itself, or whether Germany simply appears in articles about other meaningful topics.
Adämmer and Schüssler then analysed how similar all 100 topics were to one another in terms of shared keywords. Among the six topics most similar to Topic 20, they found that “five out of six topics are related to geopolitical events, especially to Russia, China, and Israel”. They then chose three economic and three geopolitical topics, and examined how well these predicted stock-return volatility during the period between January 1999 and December 2006 (a period marked by 9/11 and the Iraq War), and the span between January 2007 and December 2018 (an era defined in large part by the global financial crisis). They also tested a combination of both. They found that geopolitical topics had more explanatory power than economic ones, and that the geopolitical predictors were especially strong in 1999-2006. Thus, non-economic factors can have a powerful effect on stock markets and equity returns, calling into question more narrowly focused predictive systems.
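The paper’s exact similarity measure is not reproduced here, but Jaccard overlap of each topic’s top keywords is one simple way to make “sharing of keywords” concrete. The word lists for the comparison topics below are hypothetical:

```python
# Illustrative sketch of comparing topics by shared keywords. Jaccard overlap
# of top-word lists is a stand-in, not the paper's actual similarity measure;
# the comparison topics' word lists are hypothetical.

def jaccard(a, b):
    """Share of keywords two topics have in common (intersection over union)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

top_words = {
    "topic_20": ["german", "east", "west", "wall", "soviet"],
    "topic_xx": ["soviet", "russia", "east", "cold", "moscow"],  # hypothetical
    "topic_yy": ["recipe", "cook", "oil", "food", "dish"],       # hypothetical
}

# Similarity of every other topic to Topic 20.
sims = {t: jaccard(top_words["topic_20"], words)
        for t, words in top_words.items() if t != "topic_20"}
most_similar = max(sims, key=sims.get)
print(most_similar, round(sims[most_similar], 3))  # shares "soviet" and "east"
```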
There are few things more critical in finance than the ability to calculate expected returns on investment. Accurate projections are necessary to calculate profits, plan investments, estimate needed financial reserves and detect market trends. However, financial models often fail, partly because they focus on too narrow a set of indicators or rely solely on conventional economic and financial signals. In their new article, Adämmer and Schüssler demonstrate how using a wider array of variables can produce a more robust model for predicting equity returns. By incorporating one of the major sources of investor information – the main US newspapers of record – they produce a more accurate understanding of market dynamics, and so a better forecast of returns on equity. Hopefully, this will influence more innovative approaches towards equity forecasting across the discipline.
- Adämmer, P. and Schüssler, R.A. (2020). Forecasting the Equity Premium: Mind the News! Review of Finance 24(6): 1313–1355. https://doi.org/10.1093/rof/rfaa007
Professor Schüssler and Dr Adämmer combine statistical machine learning techniques with time-series econometrics to understand complex data structures in finance and economics.
Both researchers hold a PhD in Economics from the University of Münster. Dr Schüssler is an assistant professor at the University of Rostock. Dr Adämmer is a postdoctoral research associate at Helmut Schmidt University.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). You may copy and redistribute the material in any medium or format.