Effect of Real Estate News Sentiments on the Stock Returns of Swedbank and SEB Bank

DOI: 10.2478/erfin-2021-0005 © 2021 The Author(s). Licensed under the Creative Commons Attribution 4.0 International License. ABSTRACT: This paper explores the effect of real estate news sentiment on the stock returns of Swedbank and SEB Bank, which are leading banks in Sweden and the Baltic region. For this purpose, we collected sentiments from news about real estate in these banks' markets in Sweden, Estonia, Latvia and Lithuania between 4 January 2016 and 19 February 2019. Estimation results showed that sentiments about the housing market affect the stock returns of both banks, and that the effect differs for positive and negative news. We also found that the two banks' stock returns differ in when and to what extent they react to news coming from the Baltic States and Sweden. Moreover, we found that the number of negative news items affects the banks' stock returns more than the strength of the news. We also apply several GARCH specifications to explore whether negative and positive news affect the volatility processes. We found that the volatilities are explained better by the GJR-GARCH and NAGARCH models. Overall, the volatility of SEB stock returns depends more on news sentiment than the volatility of Swedbank stock returns does.


Introduction
The Nordic and the Baltic banking sectors are closely connected. Two banks that are systemically important to the Baltic financial system, namely Swedbank and SEB, are highly exposed to risks in their home real estate market in Sweden. Moreover, the risks coming from their other home markets in the Baltic States cannot be set aside. Real estate news appearing in any of these markets can influence the banks' lending activities and affect mortgage volume growth. As a result, we can expect an impact on the profitability of these banks and on their stock returns.
News regarding a housing shock and a future real estate crash is attracting more and more attention, both in Sweden and in the Baltics. Although the grounds for such discussions about possible market exposure differ across countries, the outcome might be the same.
The Swedish economy is characterised by fast growth, declining unemployment, population increase and low interest rates. On the other hand, household debt is rising faster than household incomes (Statistics Sweden, 2019). The combination of these factors fuels active discussion on social media and (loud) headlines in news articles. In addition, even after the Swedish Finansinspektionen introduced measures to handle the risks associated with rising household debt, tension in society and among investors remains.
Baltic news channels also raise the topic of real estate market vulnerabilities. However, the factors driving such discussions differ from those in Sweden. Solid economic growth in the Baltic countries is accompanied by increased activity in the housing market, as capital cities, large cities and resorts become hot locations for real estate. The Estonian economy does not show any signs of overheating, as there is no structural economic imbalance. The Lithuanian market is being actively boosted by rising investments. In the case of Latvia, the residential market is growing because of an increase in the average salary (Ober Haus Report, 2018). Nevertheless, there is concern in the media about a repeat of the housing bubble that occurred in the Baltic States in 2005-2010. Therefore, the aim of this paper is to study the extent to which this media tension over the real estate markets in Sweden and the Baltics can influence the stock returns of the Swedish banks. To this end, we enhance traditional econometric models with a new component, namely news sentiment. Such a combined use of text analysis and econometric analysis can potentially benefit the financial world and draw attention to the importance of tracking market mood. In particular, we use autoregressive moving average (ARMA) models with news sentiment variables to estimate the conditional mean of stock returns. The ARMA framework makes it possible to identify the best model for the conditional mean equation, and its residuals can then be used to estimate GARCH specifications.
To gather sentiment data, we used open-source tools and libraries, such as the Python library "Beautiful Soup" for web scraping of news pages and the VADER model for sentiment analysis (Hutto and Gilbert, 2014). That is, we wrote our own algorithm using open-source libraries and tools instead of relying on news analytics and data gathered by commercial providers, as was done, for instance, by Sidorov et al. (2014a), Verma and Soydemir (2009) and Yu (2014).
This study contributes to the literature by investigating real estate news sentiment for the period between 4 January 2016 and 19 February 2019. We consider news related to the real estate market in the countries that are the home markets of Swedbank and SEB, and its effect on the returns of these banks. In addition, we consider asymmetric effects of news sentiment in our models. To the best of our knowledge, no other papers investigate real estate news sentiment in the Swedish and Baltic markets and its effect on these banks' stock returns. Most studies consider company news, other generalised financial news columns and investor sentiment as explanatory factors for stock price movements (Tumarkin and Whitelaw, 2001; Groß-Klußmann and Hautsch, 2011; Li et al., 2014; Arik, 2011). Housing market sentiment is mostly treated as a factor for real estate price prediction (Soo, 2018).
The paper is further divided into five sections. Section 2 reviews and discusses the literature on the impact of news sentiment on stock prices, the linkages between banks and the real estate market, and approaches to stock price modelling. Section 3 describes the methods used for data collection, the procedure for news sentiment estimation and the econometric models used. Section 4 presents the data used for modelling and its main characteristics, which are considered later in the modelling part. Section 5 presents the results and their interpretation. Finally, Section 6 concludes the paper.
Literature Review
News Sentiment and its Impact on Stock Prices
Nowadays, huge and growing volumes of news are publicly accessible and quickly distributed. This makes news one of the main sources of information that helps form opinions and support decision-making. In addition, news both forms and reflects market behaviour. This is why behavioural economists believe that understanding the whole market requires understanding the behaviour of market players (Arik, 2011). News analytics is a highly popular research topic because of its effective application in predicting market volatility, prices and trading volumes (Sidorov et al., 2014a). In finance, news sentiment is considered an event and a quantitative reflection of information. Simply put, it measures the emotional tone of available news, and its possible values are positive, negative or neutral. News sentiment expressed numerically can be used as a component of mathematical and statistical models (Sidorov et al., 2014a).
Several authors, such as Kothari and Shanken (1997) and De Long et al. (1990), analysed the relationship between news sentiment and stock returns. Evidence shows that positive and negative sentiment affect market volatility differently, and the effect is substantial in the case of negative sentiment (Engle and Ng, 1993; Tetlock, 2007). As for the use of news sentiment as a factor, Tetlock (2007) was the first to show its significance in a predictive model. Later, Tetlock et al. (2008) applied the "Bag-of-Words" model (Harris, 1954) for news sentiment analysis and obtained better prediction results than forecasts prepared by analysts.
Using news sentiment to predict stock prices is not new, but sentiment is still an elusive concept (Yu, 2014). The classical theory of stock price formation holds that asset prices reflect all available market information (Fama, 1965). However, Fisher and Statman (2000) argued and showed that sentiment plays a considerable part in asset price formation.
Before the internet spread widely, there were many studies of the influence of macroeconomic news (Ederington and Lee, 1993) and of the impact of messages coming from the stock market (Mitchell and Mulherin, 1994). As internet coverage increased, information on the World Wide Web came to be actively used to explain stock price changes, for instance by Antweiler and Frank (2004), Tetlock (2007) and Engelberg and Parsons (2011). Today, news sources include not only official news agencies and television outlets but also company reports, publicly available statistics and Securities and Exchange Commission reports; all of these are known as "pre-news" and are the first to influence the public mood (Sidorov et al., 2014a). Moreover, social media (e.g. social network posts, blogs, tweets) wield growing influence in terms of the number of people reached. For example, anonymised Facebook data is commonly used to evaluate people's beliefs, as was done by Bailey et al. (2017) to investigate home buyers' beliefs about future price movements and how these could affect their decisions on mortgage leverage.
The analysis of news releases and their impact on the financial world was particularly important during the euro-zone sovereign debt crisis. Starting with Greece in early 2010, Ireland, Portugal, Spain and Cyprus required financial bailouts or assistance over the next few years. This was followed by the Brexit vote in 2016, which prompted speculation about the EU economy. Afterwards, news and speculation were further fuelled by the worsening situation of Italian banks. Acharya et al. (2018) discuss the timeline, causes and effects of the euro-zone debt crisis, while Beetsma et al. (2013, 2017) analyse the effect of news sentiment on euro-zone sovereign yields during the crisis period. Official news channels such as Thomson Reuters, Bloomberg, Dow Jones and the Wall Street Journal provide reliable information, which investors use in forming their opinions about future stock price movements. But blogs, forums and social media form the opinions of the general public, and their popularity is growing (Yu, 2014). In many cases, these resources create topics for the 'reliable' sources mentioned above. The impact and predictive power of blogging platforms such as Twitter were studied by Bollen et al. (2011), Zhang et al. (2011), Ranco et al. (2015) and others. Deng et al. (2018) used a large microblog data set to analyse the effect of positive and negative microblog sentiments on stock returns. However, sentiment taken from social media differs in some respects from overall news sentiment; it can be described more precisely as public mood, but it is still a valuable factor for predicting daily stock price movements (Yu, 2014).
The impact of news sentiment has been demonstrated using different types of econometric models. Arik (2011), using GARCH-in-mean models with a combination of 17 external variables in the mean equation, found a positive and significant relationship between changes in sentiment and S&P 500 excess returns. Earlier, Verma and Soydemir (2009) investigated the relationship between stock market returns and investor sentiment using a Value at Risk model. However, GARCH models have been shown to better capture fat-tailed financial data with excess kurtosis. Furthermore, Sidorov et al. (2014a) considered a GARCH-Jumps model augmented with news intensity and showed that it performs better than the traditional GARCH model with autoregressive conditional jump intensity described by Maheu and McCurdy (2004).
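To make the bag-of-words idea concrete, the sketch below scores a text by counting dictionary words while ignoring word order. It is only an illustration under stated assumptions: the two word lists are tiny and hypothetical (studies in this line of work rely on large curated dictionaries), and the net-tone formula (positive minus negative counts over total words) is one simple choice among several.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries, for illustration only.
POSITIVE = {"growth", "gain", "rise", "strong"}
NEGATIVE = {"crash", "bubble", "fall", "risk"}


def bag_of_words_sentiment(text: str) -> float:
    """Net tone: (positive - negative word count) / total word count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # word order is discarded: the text becomes a "bag"
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return (positive - negative) / len(tokens)


score = bag_of_words_sentiment("Housing bubble fears grow as prices fall")
print(round(score, 3))  # 2 negative words out of 7 tokens
```

Because only counts survive, "prices fall" and "fall prices" score identically; lexicon-and-rule models such as VADER, used later in this paper, add context rules on top of this idea.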

Real Estate Market and Bank Stock Returns
Banks are highly exposed to the real estate market (Igan and Pinheiro, 2010; Martins et al., 2016; Kwan, 2019). The mechanism works as follows: when the housing market is in a downtrend, banks have less capital available, their expansion slows, and the most significant consequence for the population may be a reduction in credit.
The role of the real estate market in the pricing of bank stocks has been studied by a number of authors. Carmichael and Coën (2018) estimated real estate market risk and its influence on US bank stocks, and He et al. (1996) showed the high sensitivity of stock returns to changes in real estate returns. Moreover, the vast majority of studies have explored the effects of real estate crises on stock price dynamics. Their main finding is that financial institutions' stock price movements and the level of their real estate exposure are significantly connected (Ghosh et al., 1997; Martins et al., 2016; Igan and Pinheiro, 2010). However, bank size matters: as discussed by Mei and Lee (1994), Mei and Saunders (1995) and Ding et al. (2017), greater sensitivity to changes in the housing market is mostly experienced by small banks.
Numerous studies have been carried out to understand the influence of financial news on stock returns, and their main conclusion is that sentiment factors should be included in prediction models (Kelly, 2016; Gupta and Banerjee, 2019). Soroka (2006) finds strong evidence of asymmetric responses, stating that negative economic news has a bigger impact on individuals' attitudes. Cepoi (2020) considers the asymmetric dependence between stock market returns and news during the Covid-19 crisis.
Following the previous research, we used the news sentiment as an explanatory factor for banks' stock returns and considered the asymmetric effect of news on the stock return behaviour.

Data Collection and Aggregation
To collect and structure the text, we chose Python as the main programming language because of the availability of the necessary libraries and working web scrapers, which allow us to obtain article text from online sources automatically. All web page URLs with relevant news were stored in a separate .csv file. To read the page links, we used the Python Data Analysis Library "Pandas" (NumFOCUS) and its function "pandas.read_csv".
Every web page is written in HyperText Markup Language (HTML), and special tags help to identify the beginning and end of the text placed on the page as well as other parts of the article, such as the publishing date or heading. Web scrapers navigate through the provided URLs, look for the requested information using HTML tags and then download the text at the user's request. We use the Python "Beautiful Soup" (Richardson) library to parse web pages. Its functions and attributes such as "soup.find", "soup.title" and "soup.get_text" are used to extract text from the page.
One of the difficulties in web scraping is that pages written in the same markup language do not necessarily use the same tags for formatting. This makes it inconvenient to write a universal algorithm that recognises the necessary tags and downloads the text. For instance, when inspecting the structure of the web pages, we found that The Local SE pages contain the article body under the tag <div id="article-body">, while for Reuters this tag is <div class="StandardArticle-Body body">. This is why we decided to create an algorithm that only requires a couple of tags as input, pointing to the part of the web page that contains the necessary text.
The resulting algorithm can extract the main part of the article, its heading and its publishing date from different websites. The only manual step is to inspect the web page and identify the tags that lead to the necessary part of the page. Moreover, the algorithm removes unnecessary tags (for example, those used to format the text), merges the heading and the main part of the article into one data object and saves it with the date on which the article was published. These data are used to construct the sentiment time series. The news structured and stored in this way, together with the dates, is called the text corpus.
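Reading the link list is a one-line call. The sketch below assumes, hypothetically, a single column named "url" (the file layout is not described in the text) and uses an in-memory string in place of the real .csv file so that it runs as is.

```python
import io

import pandas as pd

# Stand-in for the links file; in practice this would be pd.read_csv("links.csv").
# The file name and the "url" column are hypothetical.
csv_text = """url
https://www.thelocal.se/example-article-1
https://www.reuters.com/example-article-2
"""

links = pd.read_csv(io.StringIO(csv_text))
urls = links["url"].dropna().tolist()  # skip empty rows, keep the order of the file
print(urls)
```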
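The tag-driven extractor described above can be sketched with Beautiful Soup as follows. This is a minimal illustration, not the authors' code: the function name, the use of the page's <title> element as the heading and the (tag name, attributes) input format are assumptions. For Reuters, attrs={"class": "StandardArticle-Body"} would also match, because Beautiful Soup matches an element when any one of its CSS classes equals the requested value.

```python
from bs4 import BeautifulSoup


def extract_article(html, body_tag, date_tag):
    """Return (publication date, heading + body text) from one web page.

    body_tag and date_tag are hypothetical (name, attrs) pairs supplied per
    website, e.g. ("div", {"id": "article-body"}) for The Local SE.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.title.get_text(strip=True) if soup.title else ""
    body = soup.find(body_tag[0], attrs=body_tag[1])
    date = soup.find(date_tag[0], attrs=date_tag[1])
    # get_text() drops formatting tags such as <b> or <a> but keeps their text.
    body_text = body.get_text(" ", strip=True) if body else ""
    date_text = date.get_text(strip=True) if date else None
    text = ". ".join(part for part in (heading, body_text) if part)
    return date_text, text


page = """<html><head><title>Stockholm prices slip</title></head><body>
<time class="published">2018-03-01</time>
<div id="article-body"><p>Prices <b>fell</b> again.</p></div>
</body></html>"""

date, text = extract_article(page, ("div", {"id": "article-body"}), ("time", {"class": "published"}))
print(date, "|", text)
```

Only the two tag descriptions change from site to site, which mirrors the single manual step described in the text.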

Text Analysis and Getting the Sentiment Time Series Data
Sentiment analysis is an area of Natural Language Processing (NLP) that aims to identify the sentiment of human-written text, that is, the emotions and attitudes the author conveys through it. For computers, reading and understanding language is a highly complex process involving complicated algorithms with thousands of lines of code.
There are several NLP libraries for Python such as spaCy (Explosion AI, 2019), NLTK (NLTK Project, 2019) and TextBlob (Loria, 2018), which provide plenty of useful functions to facilitate the text analysis for researchers and interested parties from fields other than NLP.
Most sentiment analysers are based on a sentiment lexicon, that is, a list of words labelled as positive, negative or neutral. There are several widely used lexicons. LIWC (Linguistic Inquiry and Word Count) is mostly used for social media texts, where it is possible to estimate the intensity of words. The Hu and Liu (2004) lexicon describes the sentiment of text at a high level but cannot recognise emoticons or acronyms; ANEW (Affective Norms for English Words) (Bradley and Lang, 1999) is more advanced because it provides emotional ratings for the words in its list; SenticNet 1 also contains estimated sentiment polarities in the range from -1 to 1. The latter is used for the VADER model (Hutto and Gilbert, 2014), which is applied for sentiment analysis in our research. VADER (Valence Aware Dictionary and sEntiment Reasoner) is an open-source, rule-based model for general sentiment analysis. The lexicon and rules used by VADER are open and easily accessible, which makes it well suited for research purposes (Hutto and Gilbert, 2014).
The main task in evaluating text sentiment is to calculate sentiment polarity. In most cases, data pre-processing is needed before the sentiment analyser is applied; in our case, this was carried out after web scraping. Afterwards, the "polarity_scores" method is used to obtain polarity indices for the text data.
Sentiment values are obtained with the "polarity_scores" function in combination with the lexicon of the VADER model (Hutto et al., 2019). Each word in the lexicon carries a precalculated valence score; for a given text, the function returns the proportions of negative, positive and neutral content together with a compound score. The compound value is calculated via Equation 1 by summing the valence scores of the words in the text (after VADER's rule-based adjustments) and normalising the sum, which places the value in the range [-1, 1]. A positive sentiment corresponds to a compound score greater than or equal to 0.05, a neutral sentiment lies in the range (-0.05, 0.05), and a negative sentiment corresponds to values less than or equal to -0.05. This indicator is used further in the econometric models. The function has the form:

$$C = \frac{S}{\sqrt{S^2 + 15}} \qquad (1)$$

where C is the compound sentiment score and S is the sum of the valence scores of the words in the text. The value 15 is the default normalisation constant mentioned in Hutto and Gilbert (2014).
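Equation 1 and the ±0.05 classification rule can be checked with a few lines of plain Python. In practice the whole computation is done by the vaderSentiment package (SentimentIntensityAnalyzer().polarity_scores(text) returns a dictionary with "neg", "neu", "pos" and "compound" keys); the sketch below reproduces only the final normalisation step, and the word valences in it are hypothetical.

```python
import math


def normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Equation 1: C = S / sqrt(S^2 + alpha), which lies strictly inside (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


def classify(compound: float) -> str:
    """Label a compound score with the +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Hypothetical word valences (VADER's lexicon rates words roughly from -4 to +4).
valences = [1.9, -2.5, 3.1]
compound = normalize(sum(valences))
print(round(compound, 4), classify(compound))
```

Because the normalisation saturates, a long text with many strongly positive words approaches but never reaches 1, which is consistent with the clustering of scores near the extremes reported in the data section.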
The main advantage of the VADER model is that it takes into account, for example, exclamation marks, which show the intensity of the expressed emotion, the capitalization of words, conjunctions signalling that there is a change in sentiment polarity, and even emojis, slang words and emoticons (e.g. ":D", ":)").
The VADER model is then integrated into the algorithm implemented in Python. As soon as the text is extracted from a web page, VADER evaluates its sentiment, and the calculated sentiments are saved as time series that serve as input for the econometric models. The algorithm is summarised in the flowchart in Figure 1.
Figure 1: The algorithm used to extract news and estimate its sentiment
Note: This flowchart represents the algorithm used to extract news from web pages, calculate the sentiment of the text and save it as a time series with the date the article was published. Source: Authors' depiction of the steps in the algorithm.
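The final step of the pipeline, turning dated article scores into time series, can be sketched with pandas. The records are hypothetical, and two handling rules are our assumptions rather than the authors' stated choices: same-day articles from one source are averaged, and days without news are left as missing values.

```python
import pandas as pd

# Hypothetical scored articles: (publication date, source, compound score).
records = [
    ("2016-01-04", "Reuters", -0.62),
    ("2016-01-04", "TheLocal", 0.91),
    ("2016-01-05", "TheLocal", 0.35),
    ("2016-01-05", "TheLocal", 0.75),
    ("2016-01-07", "Reuters", -0.88),
]
articles = pd.DataFrame(records, columns=["date", "source", "compound"])
articles["date"] = pd.to_datetime(articles["date"])

# One column per source and one row per publication date. Averaging same-day
# articles is an assumption; days without news stay NaN.
sentiment = articles.pivot_table(
    index="date", columns="source", values="compound", aggfunc="mean"
).sort_index()
print(sentiment)
```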

Overview of Models Used
The base model we consider is an ARMA(p, q) model, where p and q are the autoregressive and moving average orders, respectively. In all the models that follow, the ARMA parameters satisfy the usual stationarity and invertibility conditions. We call the base model Model 1.0; it is nested in Model 1.1, since setting the coefficients of the news components in Equation 2 of Model 1.1 equal to zero yields the Model 1.0 equation. The choice of the orders p and q is discussed in Section 5.
Model 1.1. ARMA(p, q) model with news component
The news sources we consider are from Sweden and the Baltic States. Moreover, we include news from Reuters and other international news sources to take international news into account. News travels very fast in the financial world, so we would not expect many lags of the news variables to matter. In addition, since we consider many news variables, the number of lags should be kept low for a parsimonious model. Preliminary estimations with two lags of the news variables gave the most sensible model.
In Equation 2, the news variables are named after their sources (as explained in Section 4.1) and, by construction, lie in the range [−1, 1], where positive news ranks higher than negative news. Therefore, any increase in a news variable indicates less bad news or more good news and hence should have a positive effect on returns. This positive effect may come on the same day or with a delay. We therefore apply the following restrictions on the news sentiment coefficients: $\delta_{ji} \geq 0$ for all $i = 1, \ldots, 4$ and $j = 0, 1, 2$.
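For reference, a standard way to write the ARMA(p, q) conditional mean that serves as Model 1.0 is given below. The symbols $r_t$ (daily stock return), $\mu$, $\phi_k$ and $\theta_k$ are our notation and may differ from the paper's own.

$$r_t = \mu + \sum_{k=1}^{p} \phi_k\, r_{t-k} + \sum_{k=1}^{q} \theta_k\, \varepsilon_{t-k} + \varepsilon_t$$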
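One specification consistent with the description of Equation 2 (four news sources indexed by $i$, the same-day value and two lags indexed by $j$, coefficients $\delta_{ji}$) is sketched below. It is our reading in our own notation ($r_t$, $\mu$, $\phi_k$, $\theta_k$, $News_{i,t-j}$), not necessarily the authors' exact equation; setting every $\delta_{ji}$ to zero recovers a plain ARMA(p, q) model.

$$r_t = \mu + \sum_{k=1}^{p} \phi_k\, r_{t-k} + \sum_{k=1}^{q} \theta_k\, \varepsilon_{t-k} + \sum_{i=1}^{4} \sum_{j=0}^{2} \delta_{ji}\, News_{i,t-j} + \varepsilon_t$$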
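The regressors $News_{i,t-j}$ for $j = 0, 1, 2$ can be built by shifting each source's daily sentiment series. The sketch assumes hypothetical data for two sources and, as an illustrative choice not stated in the text, sets days without news to 0.

```python
import pandas as pd


def news_lag_matrix(news: pd.DataFrame, max_lag: int = 2) -> pd.DataFrame:
    """Columns News_{i,t-j} for every source i and every lag j = 0..max_lag."""
    lagged = {
        f"{source}_lag{j}": news[source].shift(j)
        for source in news.columns
        for j in range(max_lag + 1)
    }
    return pd.DataFrame(lagged, index=news.index)


# Hypothetical daily sentiment for two sources; days without news set to 0.
news = pd.DataFrame(
    {"Reuters": [0.1, 0.0, -0.9, 0.4], "TheLocal": [0.8, 0.6, 0.0, -0.2]},
    index=pd.date_range("2016-01-04", periods=4, freq="B"),
)
X = news_lag_matrix(news).dropna()  # the first max_lag rows lack lagged values
print(X)
```

Such a matrix, aligned with the return series, could be passed as the exog argument of the statsmodels ARIMA class with order=(p, 0, q). That routine does not impose the sign restrictions $\delta_{ji} \geq 0$, so enforcing them would require a constrained optimiser.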
For the next models, we distinguish negative and positive news in the equations in order to take into account their asymmetric effects.
Model 1.2. ARMA(p, q) model with asymmetric effect of news, neutral news discarded
In Equation 3, a news sentiment is considered positive (superscript "+") if its value is higher than 0.05, and negative (superscript "−") if its value is less than −0.05 2. Otherwise, the news sentiment is neutral. Neutral news is discarded because the TheLocal and Others news sources do not contain neutral news. Considering negative and positive sentiments separately is in line with Soroka (2006), who pointed out the larger impact of negative economic news on individuals' attitudes.
As discussed for Model 1.1, an increase in a news variable means less bad news or more good news. Naturally, more good news should increase returns. An increase in the negative news sentiment means less bad news, so it should also increase returns. The effect on returns may appear on the same day or with a delay. Therefore, the coefficients on both the negative and the positive news sentiment variables are restricted to be non-negative.
Model 1.3. ARMA(p, q) model with merged news data (positive and negative news distinguished)
In Equation 4, the $News^-_t$ and $News^+_t$ variables are the sums of negative and positive news sentiments at time t, respectively. Given that on some dates there may be more than one news item, we also consider the number of negative and positive news items at time t through the variables $N^-_t$ and $N^+_t$, respectively. A related approach was taken by Sidorov et al. (2014a), who used the number of all relevant news items as an exogenous variable in a GARCH model and referred to this number as news intensity.
As discussed above, an increase in news sentiment affects returns positively. Likewise, the number of positive news items should affect returns positively, whereas an increase in the number of negative news items should decrease returns. These effects may appear on the same day or with a delay. Therefore, the coefficients on the sentiment sums $News^-_t$ and $News^+_t$ and on the number of positive news items $N^+_t$ are restricted to be non-negative, while the coefficients on the number of negative news items $N^-_t$ are restricted to be non-positive.
To proceed with conditional volatility modelling, we first identify the model that best fits the conditional mean of returns. For this purpose, we use the Akaike (AIC) and Bayesian (BIC) information criteria. From the chosen conditional mean model, we obtain the estimated residuals, $\hat{\varepsilon}_t$, which we use to estimate the following GARCH specifications. For simplicity, we assume the GARCH(1, 1) order and keep the focus on the extensions of GARCH models 3.
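The split into positive and negative news variables can be sketched as follows. Two points are our assumptions rather than the authors' stated construction: the negative variable keeps its negative sign (so that an increase means less bad news, as the text requires), and neutral or missing values are set to 0 in both variables.

```python
import pandas as pd


def split_by_sign(sentiment: pd.Series, threshold: float = 0.05) -> pd.DataFrame:
    """News+ keeps values above +threshold, News- keeps values below -threshold.

    Neutral values (and missing days) become 0 in both columns.
    """
    news_pos = sentiment.where(sentiment > threshold, 0.0)
    news_neg = sentiment.where(sentiment < -threshold, 0.0)
    return pd.DataFrame({"news_pos": news_pos, "news_neg": news_neg})


parts = split_by_sign(pd.Series([0.9, -0.7, 0.02, -0.3]))
print(parts)
```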
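The Model 1.3 variables (daily sums and counts of negative and positive sentiments) can be sketched with a pandas group-by. The article scores are hypothetical, and the strict ±0.05 cut-offs follow the Model 1.2 definition.

```python
import pandas as pd


def daily_news_aggregates(articles: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Per-day sums (News-, News+) and counts (N-, N+) of non-neutral sentiments."""
    scores = articles["compound"]
    tagged = articles.assign(
        neg=scores.where(scores < -threshold),  # NaN unless negative
        pos=scores.where(scores > threshold),   # NaN unless positive
    )
    by_day = tagged.groupby("date")
    return pd.DataFrame({
        "news_neg": by_day["neg"].sum(),  # NaNs are skipped; 0 if no negative news
        "news_pos": by_day["pos"].sum(),
        "n_neg": by_day["neg"].count(),   # count() ignores NaN
        "n_pos": by_day["pos"].count(),
    })


articles = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-02", "2017-05-02", "2017-05-02", "2017-05-03"]),
    "compound": [-0.81, -0.64, 0.93, 0.97],
})
agg = daily_news_aggregates(articles)
print(agg)
```

Keeping sums and counts separate lets the model distinguish many mildly negative articles from one strongly negative article on the same day.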
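The selection step compares candidate mean models by AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The log-likelihoods and sample size below are hypothetical and were chosen to show that the two criteria can disagree.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


# Hypothetical fitted candidates: (model, maximised log-likelihood, parameters).
candidates = [("ARMA(1,0)", -1480.2, 3), ("ARMA(1,1)", -1478.9, 4), ("ARMA(2,2)", -1477.5, 6)]
n_obs = 780  # hypothetical number of daily returns

best_by_aic = min(candidates, key=lambda c: aic(c[1], c[2]))[0]
best_by_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]
print(best_by_aic, best_by_bic)
```

Since ln n exceeds 2 for any sample larger than 7 observations, BIC penalises extra parameters more heavily than AIC and tends to favour the more parsimonious model.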
When modelling volatility, we focus specifically on asymmetric effects. In the literature, asymmetry is a stylised fact observed in the volatility of financial returns: volatility increases much more after a negative return shock than after a positive return shock of the same magnitude (Ghysels et al., 1996; Asai and McAleer, 2006). In the models below, we also explore the possible asymmetric effect of positive and negative news sentiments on conditional volatility.
In the following models, $\hat{\varepsilon}_t \mid \Omega_{t-1} \sim N(0, h_t)$, where $\Omega_{t-1}$ is the information available up to $t-1$ and $h_t$ is referred to as the conditional variance or volatility, which is modelled via GARCH(1, 1) and its extensions.
Model 2.1. GJR-GARCH(1, 1) model with asymmetric news, neutral news discarded
A commonly used model that takes asymmetric effects into account is the Glosten, Jagannathan and Runkle (GJR) GARCH model (Glosten et al., 1993). We present the GJR-GARCH model with the news sentiment variables in Equation 5.
Model 2.2. NAGARCH(1, 1) model with asymmetric news, neutral news discarded
We also consider that the asymmetric effect of return shocks could be nonlinear. For this reason, we employ the nonlinear asymmetric GARCH (NAGARCH) model (Engle and Ng, 1993; León et al., 2005). We present the NAGARCH model with news sentiment variables in Equation 6.
In Equations 5 and 6, the effects of negative and positive news sentiments on volatility are distinguished. Braun et al. (1995) and Malik (2011) state that volatility increases strongly in response to bad news and decreases in response to good news. Following this reasoning, in Equations 5 and 6 an increase in the negative news sentiment means less bad news, which should decrease volatility; hence $\delta^-_{i,j} \leq 0$ for all $i = 1, 2, 3, 4$ and $j = 1, 2, 3$. Similarly, an increase in the positive news sentiment is expected to decrease volatility, which implies $\delta^+_{i,j} \leq 0$ for all $i = 1, 2, 3, 4$ and $j = 1, 2, 3$. Incorporating this reasoning into the positivity restrictions, in Equations 5 and 6 the volatilities $h_t$ are positive if $\gamma > 0$ and $\alpha_1, \alpha_2 \geq 0$. Moreover, one could impose the restriction $\gamma - \sum_i \sum_j \delta^-_{i,j} + \sum_i \sum_j \delta^+_{i,j} > 0$. This condition comes from the fact that the news sentiment variables take values between −1 and 1, so that even in the extreme case where the squared residuals and the volatility are very close to zero and the news sentiments take extreme values, the volatility stays positive. We did not impose this restriction, but we made sure that volatility stays positive in the estimations.
In all the ARMA models above, we assume that the error term is normally distributed, although financial returns typically have thick-tailed distributions; Ghysels et al. (1996) discuss the stylised facts of financial returns. Assuming normality has some statistical advantages, including easier estimation (Engle and Sheppard, 2001). In particular, when the errors are normally distributed, the conditional mean and variance structures can be estimated separately. The resulting estimator is a quasi-maximum likelihood estimator, which is consistent and asymptotically normal but not efficient (Bollerslev and Wooldridge, 1992; Engle and Sheppard, 2001; Carnero and Eratalay, 2014). Since estimating the mean and variance models separately brings inefficiency due to misspecification 4, we use the Bollerslev-Wooldridge robust covariance matrix. As Bollerslev and Wooldridge (1992) point out, this is a consistent estimator of the White (1980) robust asymptotic covariance matrix, from which heteroscedasticity-consistent standard errors can be obtained.
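The variance recursion of a GJR-GARCH(1, 1) with news terms can be sketched as below. Equation 5 itself is not reproduced here, so the sketch makes simplifying assumptions: a single news source with one lag of each news variable (the paper uses several sources and lags), $\alpha_1$ on the squared residual, $\alpha_2$ on its negative-shock component and a hypothetical $\beta$ on lagged variance. Parameter values are illustrative only.

```python
import numpy as np


def gjr_garch_news(eps, news_neg, news_pos, gamma, alpha1, alpha2, beta, d_neg, d_pos):
    """Conditional variances of a GJR-GARCH(1,1) with one lag of each news variable.

    h_t = gamma + alpha1*e_{t-1}^2 + alpha2*e_{t-1}^2*1[e_{t-1} < 0]
          + beta*h_{t-1} + d_neg*news_neg_{t-1} + d_pos*news_pos_{t-1}
    """
    eps = np.asarray(eps, dtype=float)
    news_neg = np.asarray(news_neg, dtype=float)
    news_pos = np.asarray(news_pos, dtype=float)
    h = np.empty_like(eps)
    h[0] = eps.var()  # start the recursion at the sample variance
    for t in range(1, len(eps)):
        e2 = eps[t - 1] ** 2
        leverage = alpha2 * e2 if eps[t - 1] < 0 else 0.0  # extra weight on negative shocks
        h[t] = (gamma + alpha1 * e2 + leverage + beta * h[t - 1]
                + d_neg * news_neg[t - 1] + d_pos * news_pos[t - 1])
    return h


zeros = np.zeros(3)
params = dict(gamma=0.1, alpha1=0.05, alpha2=0.10, beta=0.8, d_neg=-0.02, d_pos=-0.02)
up = gjr_garch_news([1.0, 0.0, 0.0], zeros, zeros, **params)
down = gjr_garch_news([-1.0, 0.0, 0.0], zeros, zeros, **params)
good = gjr_garch_news([1.0, 0.0, 0.0], zeros, [1.0, 0.0, 0.0], **params)
print(up[1], down[1], good[1])  # negative shock raises h more; good news lowers it
```

With $\delta^+ \leq 0$, a day of strongly positive news lowers next-day variance, in line with the sign restrictions discussed for Equations 5 and 6.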
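For NAGARCH, the asymmetry enters through a shift of the shock by $\theta\sqrt{h_{t-1}}$, following the Engle and Ng (1993) form $h_t = \gamma + \alpha(\varepsilon_{t-1} - \theta\sqrt{h_{t-1}})^2 + \beta h_{t-1}$. As with the GJR sketch, adding one lag of a single news source is our simplifying assumption, not the exact Equation 6, and the parameter values are illustrative.

```python
import numpy as np


def nagarch_news(eps, news_neg, news_pos, gamma, alpha, theta, beta, d_neg, d_pos):
    """Conditional variances of a NAGARCH(1,1) with one lag of each news variable.

    h_t = gamma + alpha*(e_{t-1} - theta*sqrt(h_{t-1}))^2 + beta*h_{t-1}
          + d_neg*news_neg_{t-1} + d_pos*news_pos_{t-1}
    """
    eps = np.asarray(eps, dtype=float)
    news_neg = np.asarray(news_neg, dtype=float)
    news_pos = np.asarray(news_pos, dtype=float)
    h = np.empty_like(eps)
    h[0] = eps.var()
    for t in range(1, len(eps)):
        # theta > 0 shifts the news-impact curve so negative shocks raise h_t more.
        # h must stay positive for the square root; see the positivity discussion.
        shock = (eps[t - 1] - theta * np.sqrt(h[t - 1])) ** 2
        h[t] = (gamma + alpha * shock + beta * h[t - 1]
                + d_neg * news_neg[t - 1] + d_pos * news_pos[t - 1])
    return h


zeros = np.zeros(3)
up = nagarch_news([1.0, 0.0, 0.0], zeros, zeros, 0.1, 0.05, 0.5, 0.8, -0.02, -0.02)
down = nagarch_news([-1.0, 0.0, 0.0], zeros, zeros, 0.1, 0.05, 0.5, 0.8, -0.02, -0.02)
print(up[1], down[1])
```

Setting $\theta = 0$ removes the asymmetry and reduces the recursion to a standard GARCH(1, 1) with news terms.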
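The two ingredients of this estimation strategy can be sketched as follows: the Gaussian quasi-log-likelihood that is maximised for given residuals and variances, and the sandwich form $H^{-1}(S^\top S)H^{-1}$ behind Bollerslev-Wooldridge robust standard errors, where $H$ is the Hessian of the total log-likelihood and $S$ stacks the per-observation score vectors. How the scores and Hessian are obtained (analytically or numerically) is left out; the inputs in the usage lines are hypothetical.

```python
import numpy as np


def gaussian_quasi_loglik(eps, h):
    """Gaussian log-likelihood of residuals eps given conditional variances h."""
    eps = np.asarray(eps, dtype=float)
    h = np.asarray(h, dtype=float)
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(h) + eps ** 2 / h)


def sandwich_covariance(scores, hessian):
    """Robust covariance H^{-1} (S'S) H^{-1} of a quasi-maximum likelihood estimate.

    scores: (n, k) per-observation score vectors at the estimate;
    hessian: (k, k) Hessian of the total log-likelihood at the same point.
    """
    scores = np.asarray(scores, dtype=float)
    h_inv = np.linalg.inv(np.asarray(hessian, dtype=float))
    outer = scores.T @ scores
    return h_inv @ outer @ h_inv


print(gaussian_quasi_loglik([0.0], [1.0]))
print(sandwich_covariance([[1.0], [-1.0]], [[-2.0]]))
```

When the normality assumption holds, $S^\top S$ is close to $-H$ and the sandwich collapses to the usual inverse-Hessian covariance; under thick tails the two differ, which is why the robust version is used.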

Data
We utilize two data sets for this research. The first contains all news sentiments gathered for the selected topics, countries and times when the news was published. The second data set is the daily adjusted closing stock prices of Swedbank and SEB Bank, both operating in Sweden and the Baltic countries.

News Sentiments Data
The main difficulty in gathering data is choosing the most suitable sources, that is, those able to provide reliable and relevant information on a given topic. Our selection criteria target news about the housing market in Sweden and the Baltic countries (Estonia, Latvia, Lithuania), which we then use for the sentiment analysis.
To this end, two main sources were chosen for news related to the Swedish real estate market: "The Local SE", a portal that posts Swedish news in English, and "Reuters", an international news provider. Posts about the Swedish real estate market appear more frequently on The Local news portal, and they are necessary to capture possible interactions between daily stock price changes and news. Articles about real estate appear more rarely on international news portals and only when target-reader interest is high, in other words, when the news is highly important for international audiences. Such news can signal upcoming upturns or downturns to the market. In addition, this type of news might influence market stability to a greater degree.
To cover periods when the main selected sources do not provide any articles, we refer to other sources, found through the news aggregator "Google News" and hereinafter referred to as "Other Sweden". This group also includes news providers such as "Financial Times", "Business Insider Nordic", "Bloomberg", "The Wall Street Journal" and others 5. For further modelling, it was highly desirable to reduce the number of gaps in the news sentiment data set, that is, to have as many days as possible with at least one news sentiment value.
Regarding news from the Baltic countries, the main sources of information are "ERR News" (the English-language service of Estonian Public Broadcasting), the English-language monthly newspaper "The Baltic Times", "Baltic News Network" and others, hereinafter referred to as "Other Baltics". To find these, we again used the news aggregator "Google News".
In order to find relevant content, we had to select the most precise keywords to direct search engines towards news that might contain content useful for this analysis. We use the following combinations to find articles about the Swedish market: "Swedish real estate", "Real estate Sweden", "Swedish housing market", "Sweden housing", "Sweden property", "Construction Sweden", "Stockholm real estate", "Stockholm housing", "Stockholm flats", "Real estate bubble Sweden". For the Baltic market, we use "Baltics real estate", "Dwelling prices in Baltic countries", "Housing market Baltics" and "Baltic property prices".
The best way to see what data we have is to plot it. Figure 2 presents the news sentiment values by country (Sweden, Latvia, Lithuania and Estonia), together with news published about the Baltic market in general. Two horizontal lines (y = 0.05 and y = -0.05) show how many news sentiment values are positive (above the y = 0.05 line), negative (below the y = -0.05 line) and neutral (between the two lines). A more detailed explanation of sentiment scores is provided in Section 3.
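The ±0.05 band corresponds to the cut-offs commonly used with VADER compound scores. A minimal Python sketch of the classification, assuming each article already has a compound score in [-1, 1]; treating values exactly on a cut-off as neutral is an assumption here, and the scores below are made up for illustration:

```python
from collections import Counter

POS_CUTOFF = 0.05   # upper horizontal line in Figure 2
NEG_CUTOFF = -0.05  # lower horizontal line in Figure 2


def classify(score: float) -> str:
    """Label a compound sentiment score in [-1, 1]; values inside the band are neutral."""
    if score > POS_CUTOFF:
        return "positive"
    if score < NEG_CUTOFF:
        return "negative"
    return "neutral"


# Hypothetical compound scores for illustration only, not the paper's data.
scores = [0.93, -0.86, 0.02, 0.97, -0.04, 0.45]
counts = Counter(classify(s) for s in scores)
print(counts)
```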
More positive Swedish news comes from portals other than "Reuters" and "The Local SE", but the difference is not considerable: positive news is distributed almost equally across sources. More negative Swedish news comes from "Reuters", and negative news is less evenly distributed across sources than positive news.
For Baltic news, we cannot break the results down by news portal, because it was difficult to find portals with sufficient coverage of the real estate market in these countries. Taking all the sentiments together, however, we can see that they are mostly positive and concentrated near the maximum sentiment score. Negative sentiments mostly relate to real estate news about the Baltic countries in general ("Baltics" in Figure 2) and about Estonia and Latvia.
It is also interesting to compare the sentiment distribution of Swedish news with that of all Baltic news pooled across countries (Figure 3). Overall, there is a considerable difference in the amount of negative news about the real estate market. Furthermore, for all markets the number of neutral news items is extremely low, which was one reason why neutral sentiments were discarded from the estimation models that distinguish negative and positive sentiments.
The histogram of news sentiments (Figure 4) shows the overall distribution. Positive sentiments clearly prevail; most negative sentiments lie in the range (-1, -0.8) and most positive ones in the range (0.9, 1).
To give a better overview of the scraped data, Table 1 provides descriptive statistics. They confirm the conclusions drawn from the visual analysis above.
The extreme values of the sentiments are very close to the maximum possible positive and negative values. The mean news sentiment ranges from 0.43 to 0.75 for the Baltic states. For Sweden, the mean is much lower, indicating that Baltic news is more optimistic about the region's real estate market. However, the standard deviation is large for all countries, meaning that sentiments fluctuate considerably. Applying the usual rule of thumb to the skewness, the Baltic data is highly negatively skewed: most sentiment values for these countries lie above the mean, i.e. they are mostly positive. Only the Swedish data can be described as fairly symmetrical, with a skewness of -0.53, close to the conventional ±0.5 threshold. The second statistic to note is excess kurtosis, a way to detect outliers in the distribution. A positive value (Estonia, Latvia and in particular Lithuania) indicates leptokurtic, or heavy-tailed, news sentiment data, so outliers are likely. The remaining excess kurtosis values are negative, i.e. the data is platykurtic (light-tailed), and any outliers present are less extreme than those expected under a normal distribution.
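The statistics discussed above can be computed with SciPy. A minimal sketch, assuming the sentiment values of one country are in a plain list; the skewness labels follow one common rule of thumb (|s| < 0.5 roughly symmetric, 0.5 to 1 moderate, above 1 high), and the sample values are hypothetical:

```python
import numpy as np
from scipy import stats


def describe(x):
    """Mean, sample standard deviation, skewness and EXCESS kurtosis of a sample."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(),
        "std": x.std(ddof=1),
        "skewness": stats.skew(x),
        # fisher=True (the default) subtracts 3, so a normal distribution gives 0
        "excess_kurtosis": stats.kurtosis(x, fisher=True),
    }


def skew_label(s):
    """Rule-of-thumb description of a skewness value."""
    a = abs(s)
    if a < 0.5:
        return "roughly symmetric"
    if a <= 1.0:
        return "moderately skewed"
    return "highly skewed"


# Hypothetical scores: mostly strongly positive with one strongly negative item.
sample = [0.95, 0.90, 0.97, 0.88, 0.92, -0.85, 0.96, 0.91]
summary = describe(sample)
print(summary, skew_label(summary["skewness"]))
```

A single strongly negative item among many positive ones is enough to produce the long left tail and positive excess kurtosis described above.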
The 10 most positive news sentiments are listed in Table 2. For example, an article of 27.08.2018 gives a brief overview of the Estonian real estate market and what to expect in the future; the author states that the market is stable and that low risks make it a good investment area⁶. Another article, of 19.12.2018, describes the attractive real estate market in Lithuania, specifically in Vilnius and Kaunas, and potential investor interest in Klaipėda⁷. Table 2 also shows that half of these 10 most positive news items were published in 2018. Among them, the numbers of items about the Baltic and the Swedish real estate markets are equal. Since the Swedish news sentiment data set is much larger than that of the Baltic countries, this suggests that news in the Baltic countries is more positive than in Sweden. The mean sentiment values support the same conclusion: the Swedish mean is the lowest among all listed countries (Table 1).
The most negative news sentiments are given in Table 3. Most of them concern Sweden, across different years. For example, an article of 04.07.2016 discusses the housing shortage in Sweden and its impact on immigrants and startups⁸. Another article, of 03.08.2016, describes Baltic concerns that Swedish banks in a crisis could withdraw money from their Baltic subsidiaries to cover losses in their domestic market, which would hurt the Baltic economies⁹. A few Baltic news items also appear among the most negative, but they relate only to the Latvian market or to the Baltic market in general.
To visualise the most frequent words in the news and give an overall picture of the main topics in positive and negative news, we use the word clouds in Figure 5. More frequent words are shown in a larger font and placed closer to the centre of the word cloud¹⁰. The most positive news contains words related to housing, shopping, city and center, as well as macroeconomic terms such as growth, investment and prices, while the most negative news contains words such as housing, banks, Sweden, Swedish and prices. This suggests that negative news in the Baltics and Sweden concentrates on real estate and banking in Sweden, which supports the relevance of the aim of this paper stated in Section 1.
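The tool used to draw Figure 5 is not named here, but a word cloud scales words by their frequency, so the underlying step is a word count. A minimal sketch under that assumption; the tiny stop-word list and the headlines are made up for illustration:

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; word-cloud tools ship larger lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "on", "as"}


def top_words(texts, n=10):
    """Return the n most frequent lower-cased words across texts, excluding stop words."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters, so words such as "Klaipėda" stay whole.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up headlines for illustration only.
negative_headlines = [
    "Swedish banks face falling housing prices",
    "Housing prices in Sweden fall as banks tighten lending",
    "Swedish housing market cools",
]
print(top_words(negative_headlines, n=4))
```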

Stock Price Data
To analyse the stock returns, we start with the daily adjusted closing prices of Swedbank and SEB Bank from 04.01.2016 to 19.02.2019. The adjusted closing price is the price of the last trade on a given day, adjusted for stock splits and dividends paid out to investors (Balasubramaniam, 2018). Historical price data is downloaded directly from "Yahoo! Finance", and data quality is assured because the data is not stored locally but loaded directly from the web source into the R software¹¹.
Figure 6 presents the adjusted daily closing prices of Swedbank and SEB Bank for 04.01.2016 - 19.02.2019 and shows some similarities in the behaviour of the two banks' stock prices. Swedbank prices show an uptrend after a considerable downturn at the beginning of 2016, although this trend is not clear-cut. For SEB Bank, prices declined mostly from March 2016. For both banks, prices climbed steadily from July 2016 till March 2017 and then stayed roughly level, with moderate fluctuations, till October 2017. After that date, Swedbank prices fell to slightly above 160 SEK per share and remained within 160-175 SEK till July 2018. SEB Bank fared worse, with prices falling from autumn 2017 till the beginning of June 2018. However, both banks' stocks rose considerably around summer and mid-autumn 2018. The two stock prices do not always move in the same direction: around February 2018, SEB Bank's price was trending down while Swedbank's was rising. Also, around March-April 2017, SEB Bank's price entered a downtrend after Swedbank's had, which may suggest a delayed reaction. This is in line with some findings in Section 5.
We calculate the daily log returns of the banks' stocks as

$$r_t = \ln P_t - \ln P_{t-1},$$

where $P_t$ is the adjusted closing price on day $t$. Figure 7 shows the daily stock returns of Swedbank and SEB Bank. The returns fluctuate around their mean (close to zero), and the spread of these fluctuations changes over time.
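A minimal NumPy sketch of the log-return calculation, assuming the adjusted closes are already in a list in date order. Returns are often also scaled by 100 to express them in percent; whether the paper does so is not stated here, so the sketch leaves them unscaled. The prices are hypothetical:

```python
import numpy as np


def log_returns(prices):
    """Daily log returns r_t = ln(P_t) - ln(P_{t-1}) from adjusted closing prices."""
    p = np.asarray(prices, dtype=float)
    if np.any(p <= 0):
        raise ValueError("prices must be positive")
    return np.diff(np.log(p))


# Hypothetical adjusted closes in SEK, for illustration only.
adj_close = [170.0, 171.7, 169.2, 169.2]
r = log_returns(adj_close)
print(np.round(r, 5))
```

Because log returns add up over time, their sum equals the log of the total price change over the period.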
Table 4 gives descriptive statistics of the log returns of both stocks. The skewness and excess kurtosis show that the distributions are far from normal: skewness is -0.67 and -0.68 and excess kurtosis is 3.42 and 6.49 for Swedbank and SEB Bank, respectively. The returns are negatively skewed, i.e. the left tail is longer than the right, which points to some extreme negative returns. The high excess kurtosis indicates fat tails, particularly for SEB Bank returns.
Our analysis starts with preliminary tests. We first check whether the return series are stationary using the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, i.e. is non-stationary. Table 5 shows that this null hypothesis is rejected for both Swedbank and SEB Bank returns, so we conclude that both return series are stationary.
We also examine the autocorrelation and partial autocorrelation functions of the returns, which help determine the orders of the MA(q) and AR(p) terms, respectively. For Swedbank, Figure 8a shows significant autocorrelations at lags 4 and 10, and the partial autocorrelation plot in Figure 8b shows significant spikes at lags 4, 10 and 13. For SEB Bank, there are significant autocorrelations at lags 1, 4, 10 and 16 (Figure 9a), and Figure 9b shows significant partial autocorrelations at the same lags. These lags were taken into account when choosing the ARMA orders for Swedbank and SEB Bank returns.
Figure 8 notes: (a) autocorrelation function (ACF) and (b) partial autocorrelation function (PACF) of Swedbank's daily stock returns. The ACF indicates that the returns are not highly correlated with their lagged values, as most spikes are not statistically significant. The PACF likewise shows no strong correlation between the returns and their lagged values.
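The significance judgements in Figures 8 and 9 compare each sample autocorrelation with an approximate 95% band of ±1.96/√T. A minimal NumPy sketch, computing the PACF at lag k as the last coefficient of an OLS AR(k) regression (one of several standard estimators; which one the figures use is not stated). The series is a simulated AR(1), not bank data:

```python
import numpy as np


def acf(x, nlags):
    """Sample autocorrelations at lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[k:] @ x[:len(x) - k] / denom for k in range(1, nlags + 1)])


def pacf_ols(x, nlags):
    """PACF at lag k = coefficient on x_{t-k} in an OLS AR(k) regression with intercept."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = []
    for k in range(1, nlags + 1):
        y = x[k:]
        lags = [x[k - j:n - j] for j in range(1, k + 1)]  # columns x_{t-1}, ..., x_{t-k}
        X = np.column_stack([np.ones(n - k)] + lags)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(beta[-1])
    return np.array(out)


def significant_lags(values, nobs):
    """Lags whose estimate lies outside the approximate 95% band +/- 1.96/sqrt(nobs)."""
    band = 1.96 / np.sqrt(nobs)
    return [lag for lag, v in enumerate(values, start=1) if abs(v) > band]


# Simulated AR(1) series with coefficient 0.5 (illustration only).
rng = np.random.default_rng(1)
e = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.5 * x[t - 1] + e[t]

print(significant_lags(acf(x, 10), len(x)))       # ACF decays gradually
print(significant_lags(pacf_ols(x, 10), len(x)))  # in theory only lag 1; chance crossings possible
```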

Results
The news vectors contain many zeros and relatively few non-zero values. Moreover, they are not autocorrelated¹² and therefore cannot explain or capture the autocorrelation structure in the return series. For this reason, we do not take the news sentiment variables into account when choosing the order of the AR model. Based on the AIC of the models that yielded serially uncorrelated residuals¹³, we found that the AR(4) model is the most adequate for both Swedbank and SEB Bank returns, and the models with news sentiment variables are built on it. To avoid getting trapped in local optima during estimation, we tried many starting values for the optimization procedure. The estimation results we focus on below are presented in tables showing the coefficients and their statistical significance for each model.
Figure 9 notes: (a) autocorrelation function (ACF) and (b) partial autocorrelation function (PACF) of SEB Bank's daily stock returns. The ACF indicates that the returns are not highly correlated with their lagged values, as most spikes are not statistically significant. The PACF shows no considerable correlation between the returns and their lagged values. Source: Authors' calculations.
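The selection rule above (lowest AIC among models whose residuals are serially uncorrelated) can be sketched as follows. This is an illustration, not the authors' code: it assumes AR models fitted by OLS on a common sample, a Gaussian AIC, and a Ljung-Box test with 20 lags as the residual check. The data is a simulated AR(2):

```python
import numpy as np
from scipy import stats


def fit_ar_ols(r, p, max_p):
    """OLS AR(p) with intercept on the common sample r[max_p:], so AICs are comparable."""
    n = len(r)
    y = r[max_p:]
    X = np.column_stack([np.ones(n - max_p)] + [r[max_p - j:n - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def gaussian_aic(resid, k):
    """AIC = 2k - 2 logL for a Gaussian model with the ML variance estimate."""
    n = len(resid)
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return 2 * k - 2 * loglik


def ljung_box_pvalue(resid, lags, fitted_params):
    """Ljung-Box Q test on residuals; degrees of freedom reduced by the fitted AR terms."""
    n = len(resid)
    e = resid - resid.mean()
    denom = e @ e
    rho = np.array([e[k:] @ e[:n - k] / denom for k in range(1, lags + 1)])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, lags + 1)))
    return stats.chi2.sf(q, df=lags - fitted_params)


def select_ar_order(r, max_p=10, lb_lags=20, alpha=0.05):
    """Lowest-AIC AR order among those whose residuals pass the Ljung-Box test."""
    r = np.asarray(r, dtype=float)
    best = None
    for p in range(1, max_p + 1):
        _, resid = fit_ar_ols(r, p, max_p)
        if ljung_box_pvalue(resid, lb_lags, p) < alpha:
            continue  # residuals still autocorrelated: discard this order
        aic = gaussian_aic(resid, p + 2)  # p AR terms + intercept + variance
        if best is None or aic < best[1]:
            best = (p, aic)
    return best


rng = np.random.default_rng(2)
e = rng.normal(size=3000)
x = np.zeros(3000)
for t in range(2, 3000):
    x[t] = 0.4 * x[t - 1] + 0.3 * x[t - 2] + e[t]
print(select_ar_order(x, max_p=6))
```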

Estimation Results
To save space, we note here that the AR and GARCH parameter estimates of all the models below satisfy the corresponding stationarity conditions. The restrictions on the coefficients of the news sentiment variables are discussed in detail in Section 3.3.

Model 1.0 AR(4) model (base-line model)
For both Swedbank and SEB Bank stock returns, the appropriate ARMA model is an AR(4). The estimates are given in Table 10 in Appendix B. The AR coefficients for both banks' stocks are significant at the 0.01 level, so stock returns are correlated with the returns of the last four periods.
For the subsequent models, which extend the base model with news sentiment variables, we perform likelihood ratio tests against the nested smaller models. Although these tests suggest that adding the news sentiment variables does not significantly improve the fit, some of the individual coefficients of interest are significant¹⁴.
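The likelihood ratio test compares the maximised log-likelihoods of the nested and the extended model. A minimal SciPy sketch; the log-likelihood values and the count of 12 extra coefficients are hypothetical:

```python
from scipy import stats


def lr_test(loglik_restricted, loglik_full, n_restrictions):
    """LR = 2 (LL_full - LL_restricted), asymptotically chi2(n_restrictions) under H0."""
    lr = 2.0 * (loglik_full - loglik_restricted)
    return lr, stats.chi2.sf(lr, df=n_restrictions)


# Hypothetical log-likelihoods: base AR(4) vs. AR(4) plus 12 news coefficients.
lr, pval = lr_test(2000.0, 2006.0, n_restrictions=12)
print(f"LR = {lr:.1f}, p-value = {pval:.3f}")
```

Here a gain of 6 log-likelihood points spread over 12 extra coefficients is jointly insignificant, even though some of those coefficients could still be individually significant. When the restricted coefficients are also sign-constrained, so that the null lies on the boundary of the parameter space, the plain χ² reference distribution is only an approximation.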

Model 1.1 AR(4) model with news component
The AR(4) model with news sentiment variables follows Equation 2, and the estimation results are given in Table 11 in Appendix B. For both Swedbank and SEB Bank stock returns, the autoregressive parameter estimates change slightly compared to the base model, and the intercept is smaller.
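Equation 2 is not reproduced in this section. As a rough illustration of how such a regressor set can be built, the pandas sketch below assumes that each source's daily sentiment enters at lags 0, 1 and 2 (matching the same-day, next-day and two-day effects discussed below) next to four lagged returns. The column names and data are hypothetical:

```python
import numpy as np
import pandas as pd


def build_design(returns, news, ar_lags=4, news_lags=(0, 1, 2)):
    """Regressors for r_t: intercept, r_{t-1..t-ar_lags}, and each news column at the given lags.

    `news` shares the index of `returns`, one column per source, holding the daily
    sentiment value and 0 on days without news (an assumption of this sketch).
    """
    cols = {"const": pd.Series(1.0, index=returns.index)}
    for j in range(1, ar_lags + 1):
        cols[f"r_lag{j}"] = returns.shift(j)
    for src in news.columns:
        for k in news_lags:
            cols[f"{src}_lag{k}"] = news[src].shift(k)
    X = pd.DataFrame(cols)
    data = pd.concat([returns.rename("r"), X], axis=1).dropna()  # drop start-up rows
    return data["r"], data.drop(columns="r")


idx = pd.bdate_range("2016-01-04", periods=10)
rets = pd.Series(np.linspace(-0.01, 0.01, 10), index=idx)
news = pd.DataFrame({"TheLocal": [0, 0.9, 0, 0, -0.8, 0, 0, 0.7, 0, 0],
                     "Reuters": [0.0] * 10}, index=idx)
y, X = build_design(rets, news)
print(X.shape)
```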
The coefficient estimates suggest that the location of the news (Baltic States or Sweden) matters. The Local and Others are sources of news about the Swedish real estate market, so it is not surprising that they increase Swedbank returns on the same day, with the effect of The Local continuing two days later. Reuters and Baltics news affect Swedbank returns positively on the next day and the day after. For SEB Bank, Swedish news from The Local and Others increases returns on the same day, and the positive effect of The Local continues for two more days. Reuters news affects SEB Bank returns positively over the next two days, and Baltic news affects them positively two days later. Compared with Swedbank returns, (1) SEB Bank returns react to Baltic news more slowly, by one day, (2) SEB Bank returns depend comparatively more on the Others news sources, and (3) overall, SEB Bank returns are less exposed to news from Reuters and The Local.
Model 1.2 AR(4) model with asymmetric effect of news, neutral news discarded
The AR(4) model with an asymmetric effect of news follows from Equation 3, and the estimation results are given in Table 6. Interesting results emerge once news is distinguished by negative and positive sentiment. For both banks' stock returns, all the news sentiment coefficients are positive, as the restrictions require, although some are not significant even at the 0.1 level. We focus on the significant coefficients.
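One way to obtain separate positive and negative news variables with neutral news discarded is sketched below. It assumes the ±0.05 neutral band used for Figure 2 and keeps the negative variable in its original (negative) units, so "less bad news" is an increase in it, as footnote 15 describes. Equation 3 itself is not reproduced here, and the scores are hypothetical:

```python
import numpy as np


def split_sentiment(s, cutoff=0.05):
    """Split daily sentiment into a positive-news and a negative-news variable.

    Neutral values (|s| <= cutoff) and no-news days (s = 0) become 0 in both.
    The negative variable keeps its sign, so less bad news is an increase in it.
    """
    s = np.asarray(s, dtype=float)
    pos = np.where(s > cutoff, s, 0.0)
    neg = np.where(s < -cutoff, s, 0.0)
    return pos, neg


pos, neg = split_sentiment([0.9, -0.8, 0.02, 0.0, -0.03, 0.6])
print(pos, neg)
```

With this coding, a positive coefficient on the negative variable means that bad news lowers returns, which is consistent with restricting all news coefficients to be positive.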
For Swedbank stock returns, less bad news¹⁵ from the Baltics means higher returns on the same day, but the main influence comes with a two-day delay, a result that Model 1.1 could not capture. Less bad news from Reuters affects returns positively with delays of one and two days. The largest effect of less bad news comes from The Local and Others on the same day, and in the case of The Local the impact continues for two more days. For The Local, Others and Baltics news, the total effect of negative news is larger than that of positive news. Finally, the positive impact of positive news from Reuters is relatively large compared to the other news sources.
¹⁴ In a model where likelihood ratio tests (or, equivalently, F-tests) suggest that several coefficients in the larger model are jointly insignificant, any of those coefficients that are individually significant should not be ignored by the researcher. See Wooldridge (2015), pp. 149-150, for a detailed discussion.
¹⁵ Less bad news means an increase in the negative news sentiment variable. Alternatively, the results can be read in terms of more bad news, which means a decrease in the same variable.
For SEB Bank stock returns, less bad news from Reuters, The Local and Baltic sources significantly increases returns on the same day, and for the first two sources the effect continues two days later. The effect of bad news from the Baltics is relatively small and comes on the first day. Among the positive news sentiments, news from The Local and Others has a positive impact, with the effect of the latter continuing into the next day.
As with Swedbank returns, SEB Bank returns are strongly, significantly and positively affected by news from The Local, and somewhat less by news from the Baltics, on the same day. Interestingly, SEB Bank returns are positively influenced by less bad news from Reuters on the same day, which is not the case for Swedbank returns. News from the Others sources affects SEB Bank returns more than Swedbank returns, but this coefficient is insignificant. Overall, many coefficients of the negative news sentiment variables are significant, but far fewer of the positive ones.

Model 1.3 AR(4) model with merged news data (positive and negative news distinguished)
The AR(4) model with merged news follows from Equation 4, and the estimation results are given in Table 7. For Swedbank stock returns, the results interestingly suggest that the content of the news matters only for negative news, and only with a two-day delay. The return process seems more sensitive to the number of news items than to their content. One additional piece of negative news affects the returns strongly and negatively on the same day, and to some extent on the next day, even if the news is of little importance. Thus, when many negative news items appear in different sources, returns are pulled down considerably on the same day. In contrast, when only one source provides bad news, even if its sentiment is strongly negative (which may indicate speculative or fake news), the effect on returns is relatively smaller and delayed. Speculative or fake news typically does not appear in many sources on the same day, so one possible interpretation is that the return process accounts for the probability that news appearing in a single source might be fake¹⁶. Finally, the number of positive news items affects returns with delays of one and two days, but not on the same day.
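Equation 4 is not reproduced here, so the exact definitions of news "size" and "number" are not shown. A minimal pandas sketch under the assumption that, per day and pooled across all sources, size is the mean positive or negative score and number is a count of items outside the neutral band; both definitions and the data are hypothetical:

```python
import pandas as pd


def merge_daily(articles, cutoff=0.05):
    """Per-day content (mean score) and number of positive and negative news items.

    `articles` has one row per article from any source, with 'date' and 'score' columns.
    Days with no positive (negative) item get size 0 and count 0.
    """
    a = articles.copy()
    a["pos"] = a["score"].where(a["score"] > cutoff)   # NaN unless positive
    a["neg"] = a["score"].where(a["score"] < -cutoff)  # NaN unless negative
    daily = a.groupby("date").agg(
        pos_size=("pos", "mean"), neg_size=("neg", "mean"),
        n_pos=("pos", "count"), n_neg=("neg", "count"),
    )
    return daily.fillna({"pos_size": 0.0, "neg_size": 0.0})


articles = pd.DataFrame({
    "date": ["2018-01-02", "2018-01-02", "2018-01-02", "2018-01-03"],
    "score": [-0.9, -0.5, 0.8, 0.01],
})
print(merge_daily(articles))
```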
For SEB Bank stock returns, the findings are similar. The size of negative news affects the returns only two days later, whereas the number of negative news items in the media affects the returns negatively on the same day. The effect of the number of positive news items is spread over the lags and is smaller than that of negative news. Overall, SEB Bank returns are more exposed to negative news, and to the number of negative news items, than Swedbank returns.
In all these conditional mean estimations, the effects of news sentiments were restricted to be positive, as explained in Section 3.3. This restriction and the resulting findings are in line with Arik (2011), who finds a positive relationship between news sentiment and S&P 500 excess returns.
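With i.i.d. Gaussian errors, maximising the likelihood of the conditional mean under a sign restriction is equivalent to least squares with bounds on the restricted coefficients. A minimal sketch using SciPy's `lsq_linear`; this is an illustration of the restriction on simulated data, not the authors' estimation code:

```python
import numpy as np
from scipy.optimize import lsq_linear


def fit_restricted(y, X, nonneg):
    """Least squares with the coefficients flagged in `nonneg` constrained to be >= 0."""
    lb = np.where(nonneg, 0.0, -np.inf)
    ub = np.full(X.shape[1], np.inf)
    return lsq_linear(X, y, bounds=(lb, ub)).x


rng = np.random.default_rng(3)
n = 2000
x1 = rng.normal(size=n)    # unrestricted regressor (e.g. a lagged return)
news = rng.normal(size=n)  # restricted regressor whose true effect is negative
y = 0.1 + 0.3 * x1 - 0.5 * news + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1, news])
beta = fit_restricted(y, X, nonneg=[False, False, True])
print(np.round(beta, 3))  # the news coefficient ends up at (approximately) the bound 0
```

When the data push a restricted coefficient against its bound, the estimate sits at zero rather than taking the opposite sign, which is one reason such coefficients can appear as exact zeros in the result tables.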
Among the conditional mean models we estimated, the AR(4) model without news sentiment variables has the lowest AIC and BIC values. We therefore use its residuals in the GARCH estimations. As with the conditional mean models, we focus on the extended models rather than the base GARCH model.
Comparing the AIC and BIC values in Tables 8 and 9, the GJR-GARCH model with asymmetric news sentiments performs better for Swedbank volatilities, while the NAGARCH model with asymmetric news sentiments fits SEB volatilities better. For Swedbank volatilities, the GJR and NAGARCH models give similar coefficients for the news sentiment variables, whereas for SEB volatilities the NAGARCH model with news sentiment variables has several more significant coefficients.
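The exact specifications of Models 2.1 and 2.2 are given in Section 3.3 and are not reproduced here. As an illustration only, the sketch below assumes a standard GJR-GARCH(1,1) with asymmetry parameter θ, an Engle-Ng style NAGARCH(1,1) with asymmetry parameter δ (the sign convention for δ is an assumption), and news regressors entering the variance linearly with coefficients γ:

```python
import numpy as np


def gjr_variance(eps, news, omega, alpha, theta, beta, gamma):
    """GJR-GARCH(1,1) with news:
    h_t = omega + (alpha + theta*1[eps_{t-1} < 0]) eps_{t-1}^2 + beta h_{t-1} + news[t] @ gamma."""
    h = np.empty(len(eps))
    h[0] = np.var(eps)  # initialise at the sample variance (a common choice)
    for t in range(1, len(eps)):
        arch = (alpha + theta * (eps[t - 1] < 0)) * eps[t - 1] ** 2
        h[t] = omega + arch + beta * h[t - 1] + news[t] @ gamma
        if h[t] <= 0:
            raise ValueError(f"non-positive variance at t={t}")
    return h


def nagarch_variance(eps, news, omega, alpha, delta, beta, gamma):
    """NAGARCH(1,1) with news:
    h_t = omega + alpha (eps_{t-1} - delta sqrt(h_{t-1}))^2 + beta h_{t-1} + news[t] @ gamma."""
    h = np.empty(len(eps))
    h[0] = np.var(eps)
    for t in range(1, len(eps)):
        shock = eps[t - 1] - delta * np.sqrt(h[t - 1])
        h[t] = omega + alpha * shock ** 2 + beta * h[t - 1] + news[t] @ gamma
        if h[t] <= 0:
            raise ValueError(f"non-positive variance at t={t}")
    return h


# Same-size negative vs. positive shock, no news (illustration only).
no_news, gamma = np.zeros((3, 1)), np.zeros(1)
neg_shock = np.array([0.0, -0.02, 0.0])
pos_shock = np.array([0.0, 0.02, 0.0])
h_neg = gjr_variance(neg_shock, no_news, 1e-6, 0.05, 0.1, 0.9, gamma)
h_pos = gjr_variance(pos_shock, no_news, 1e-6, 0.05, 0.1, 0.9, gamma)
print(h_neg[2] > h_pos[2])  # with theta > 0 a negative shock raises next-day variance more
```

A negative γ on a news variable lowers the variance when that variable rises. Because negative news is coded with negative values, "less bad news decreases volatility" corresponds to a negative coefficient.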
For Swedbank volatilities, the estimates of Model 2.1 in Table 8 show a significant asymmetric effect of return shocks on volatility, as indicated by the value of θ. However, only a few news sentiment coefficients are significant. According to Table 8, less bad news from Reuters and the Baltics decreases volatility two days later, and an increase in positive news sentiment from the Others news sources decreases volatility on the same day and the next day. This result is not in line with the GARCH model with news sentiment variables but without asymmetry, in which an increase in negative news sentiment from The Local decreases volatility on the same day. Since that model is clearly outperformed by its GJR-GARCH extension, we do not present its results in this paper, but they are available from the authors upon request. The p-value of the likelihood ratio test of the GJR-GARCH model with news sentiment variables against the GARCH model with news sentiment variables is below 1%. The other p-values show that the news sentiment variables are jointly insignificant compared with a GJR-GARCH model.
For SEB volatilities, we focus on the results of Model 2.2 in Table 9. Interestingly, the nonlinear specification of the asymmetric effect of return shocks (NAGARCH) works better for SEB volatilities than its linear counterpart, GJR-GARCH; in fact, the asymmetry coefficient δ is quite large. The results indicate that less bad news from The Local and Others decreases volatility significantly on the same day, with the effect continuing two days later for Others. Less bad news from Reuters also decreases volatility two days later.
Among the positive news sentiments, more good news from The Local decreases volatility on the same day. A similar effect is seen for the Others news sources on the next day and the third day. The effect of Reuters news arrives two days later, as for the negative news sentiments. The NAGARCH coefficient and the news sentiment coefficients are jointly significant at 5% compared with a GARCH specification.
The results for Swedbank and SEB Bank have two features in common: both volatility series show asymmetric effects of return shocks, and the effect of Swedish news from Reuters on the volatility of returns arrives with a two-day lag.
The finding that negative and positive news sentiments affect volatility differently is in line with Engle and Ng (1993) and Tetlock (2007). However, the likelihood ratio tests give no clear evidence that news sentiments jointly improve the fit beyond the asymmetric GJR and NAGARCH models; only a few coefficients are individually significant. On the other hand, the GJR and NAGARCH models clearly outperform a GARCH(1, 1) model.

Further Discussion
We considered some other models to see whether different ways of introducing news sentiments would make a difference. In particular, we estimated smaller models nested in the ones discussed in this paper, which allowed us to calculate likelihood ratio tests. The estimation results of these other models are not included in this paper but are available upon request.
For the conditional returns, we extended the AR(4) model with news sentiments in Equation 2 by including dummy variables for extreme news. These dummies took the value 1 when a news sentiment was more than 3 standard deviations away from its mean, and they were distinguished for positive and negative news. We discarded this model because of high multicollinearity between the news sentiment variables and the dummy variables; in addition, most of the dummy coefficient estimates were equal to zero. We also extended the AR(4) model with news sentiments in Equation 2 by adding the same news sentiment variables multiplied by an indicator that takes the value 1 for negative news sentiments, a construction similar to the GJR-GARCH model. We do not include this model because the interpretation of its results was similar to that of Model 1.2, which considers negative and positive news sentiments separately.
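A minimal sketch of the extreme-news dummies. Whether the mean and standard deviation were computed over all days or only over days with news is not stated here, so the function uses whatever series it receives; the values are made up:

```python
import numpy as np


def extreme_dummies(s, k=3.0):
    """Dummies for values more than k sample standard deviations above or below the mean."""
    s = np.asarray(s, dtype=float)
    mu, sd = s.mean(), s.std(ddof=1)
    extreme_pos = (s > mu + k * sd).astype(int)
    extreme_neg = (s < mu - k * sd).astype(int)
    return extreme_pos, extreme_neg


# Made-up daily series: mostly no-news days (0) and three news days.
s = np.array([0.0] * 48 + [0.3, 0.9, -0.85])
hi, lo = extreme_dummies(s)
print(hi.sum(), lo.sum())
```

In a sparse series like this one, the dummies switch on for most strong news days, so they move closely with the sentiment variable itself. This is one plausible source of the multicollinearity mentioned above.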
For modelling the conditional volatility, we extended the GJR-GARCH and NAGARCH models to include merged news as in Equation 4. All coefficients except that of the number of negative news items were restricted to be negative. Interestingly, almost all news sentiment coefficients were estimated to be zero for both banks' return volatilities, so we do not include this model in the paper. The only exception was the GJR-GARCH model for Swedbank volatilities, where the coefficient of the number of negative news items at a two-day lag was estimated at 0.0751 and was significant.
Another possible question concerns the distributional assumption on the return shocks. Financial returns usually have leptokurtic distributions, and the descriptive statistics of our returns data confirm this. In this paper we assumed normally distributed shocks so that we could use a two-step estimation procedure: we first estimated the conditional mean parameters and then the conditional volatility parameters. This idea was mentioned in Bauwens et al. (2006) and, with that motivation, analysed numerically by Carnero and Eratalay (2014). De Almeida et al. (2018) also note that the returns, or the residuals of a VARMA model, can be used in the two-step estimation procedure (volatility, then correlation) of a DCC-GARCH model, which corresponds to the same estimation method. We followed this procedure to avoid convergence problems in numerical optimization, in particular for models with many parameters. Had we assumed a Student-t distribution for the return shocks, we would have had to estimate the conditional mean and variance parameters jointly, which would have caused this problem.
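The two-step idea can be sketched as follows: fit the conditional mean by OLS, then maximise a Gaussian GARCH likelihood on the residuals, trying several starting values to reduce the risk of a local optimum. This sketch uses a plain GARCH(1,1) rather than the paper's GJR and NAGARCH models with news, and the data are simulated:

```python
import numpy as np
from scipy.optimize import minimize


def garch11_negloglik(params, eps):
    """Gaussian negative log-likelihood of a GARCH(1,1) for mean-zero residuals eps."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf  # outside the positivity/stationarity region
    h = np.empty_like(eps)
    h[0] = eps.var()
    for t in range(1, len(eps)):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * h) + eps ** 2 / h)


def two_step(r, p=4, starts=((0.05, 0.05, 0.90), (0.10, 0.10, 0.80), (0.30, 0.15, 0.55))):
    """Step 1: AR(p) mean by OLS. Step 2: GARCH(1,1) on the residuals from several starts."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    y = r[p:]
    X = np.column_stack([np.ones(n - p)] + [r[p - j:n - j] for j in range(1, p + 1)])
    mean_params, *_ = np.linalg.lstsq(X, y, rcond=None)
    eps = y - X @ mean_params
    v = eps.var()
    best = None
    for w, a, b in starts:  # omega starts at w * residual variance
        res = minimize(garch11_negloglik, x0=[w * v, a, b], args=(eps,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return mean_params, best.x


# Simulated GARCH(1,1) returns (omega=0.1, alpha=0.1, beta=0.8), illustration only.
rng = np.random.default_rng(4)
T = 1000
r, h = np.zeros(T), np.ones(T)
for t in range(1, T):
    h[t] = 0.1 + 0.1 * r[t - 1] ** 2 + 0.8 * h[t - 1]
    r[t] = np.sqrt(h[t]) * rng.normal()
mean_params, (omega, alpha, beta) = two_step(r)
print(np.round(mean_params, 3), round(omega, 3), round(alpha, 3), round(beta, 3))
```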

Conclusions
In this paper, we analysed the effect of the sentiment of news related to the real estate market in Sweden, Estonia, Latvia and Lithuania on the stock returns and volatilities of Swedbank and SEB Bank over the period 04.01.2016 - 19.02.2019. First, we applied open-source Python tools and libraries for web scraping to obtain text from news web pages, and then used a rule-based sentiment analysis tool, the VADER model, to compile a news sentiment time series. We then used these data to estimate four ARMA models covering aspects such as extreme news, the asymmetric effect of news, and the content versus the number of news items. We also investigated the effect of news sentiment on volatility, to see whether positive and negative news have different effects on it.
We found that, for both banks, local Swedish news has a more immediate and stronger effect than Baltic and Reuters news. Moreover, positive and negative news sentiments affect the returns and volatilities of Swedbank and SEB Bank stocks differently. The size of negative news affects the returns only two days later, whereas the number of negative news items affects the returns on the same day. One possible interpretation is that bank stock returns may not react strongly to a single large piece of negative news but may react much more sharply to many small pieces of negative news. Taking time lags into account, we also found that the two banks differ in how soon they react to news. SEB Bank stock returns respond more slowly to Baltic news than Swedbank stock returns; on the other hand, SEB Bank stock returns are affected by negative Reuters news on the same day, while Swedbank stock returns are affected with a delay. Finally, we found a significant asymmetric effect of return shocks on the volatilities of both banks' returns. With this asymmetry taken into account, the volatility of Swedbank stock returns was influenced by negative news from Reuters and the Baltics with a two-day delay, and by positive news from local news sources on the same day and the next day. Similarly, the volatility of SEB returns depends on news from Reuters with a two-day lag. The volatility of SEB stock returns was influenced more by both negative and positive news from local news sources, on the same day and with a two-day lag. Surprisingly, the volatility of SEB stock returns does not depend on Baltic news sentiments, and the volatility of Swedbank returns depends on them only to a small extent.
This paper can be extended in several ways. It would be interesting to see how Swedbank and SEB Bank stock returns and volatilities react to fake or speculative news; investors may be sensitive to such news, which could explain some of the volatility behaviour. Another extension could be to analyse posts shared on social media such as Facebook, VK or Twitter to gauge investor mood. Although such posts may be more emotional than actual news, less well-informed investors may well be influenced by them. News in local languages (Swedish, Latvian, Lithuanian and Estonian) could also be incorporated into the sentiment analysis. This would be challenging because it requires constructing the related libraries in Python, but sentiment data extracted from local news would be highly valuable for this analysis.

B Estimation results presented in detail by equations
In what follows we present the estimation results of other models. In each table, LL is the log-likelihood value, AIC and BIC are the Akaike and Bayesian Information Criteria, respectively, and SSE is the sum of squared residuals.
Model 1.0. AR(4) model (base-line model)
Estimation results of the AR(4) base-line model for Swedbank and SEB Bank stock returns.