THE APPLICATION OF SPATIAL AUTOREGRESSIVE MODELS FOR ANALYZING THE INFLUENCE OF SPATIAL FACTORS ON REAL ESTATE PRICES AND VALUES

The spatial distribution of real estate in specific geographic locations, real estate transactions, and the prices and values of properties are highly complex spatial phenomena that should be analyzed with the use of multidimensional methods. Spatial factors are taken into account in the modeling process to increase the reliability of real estate market analyses, and spatial autoregressive models are applied to determine the effect of spatial factors on real estate prices and values. The present study relies on a review of the literature and the results of an experiment. The concept and principles of market analysis were designed with the use of spatial autoregressive models, and the influence of selected spatial factors on real estate prices was presented on maps. Analyses involving autoregressive models enable reliable modeling and support correct interpretation of the observed processes.


Introduction
Spatial factors play a very important role in the analyses of the real estate market, but they are not easy to incorporate in models of the evaluated space. In the literature, most market analyses have been conducted with the use of multiple regression models and their derivatives (Isakson, 1998; Czaja, 2001; Benjamin et al., 2004; Sirmans et al., 2005; Adamczewski, 2006; Bitner, 2007; Czaja & Dąbrowski, 2008; Barańska, 2010; Sawiłow, 2010; Dąbrowski, 2011). Multiple regression models are a highly useful tool for analyzing transaction prices, but they are not frequently applied in practice, mainly due to problems with meeting formal requirements during model design (Hozer, 2001). The effectiveness and statistical reliability of regression models have to be verified before these models are used to predict transaction prices on the real estate market (Bitner, 2007). The problems associated with the application of multiple regression models in real estate market analyses have been extensively described in the literature (Mark & Goldberg, 1988; Czaja, 2001; Hozer, 2001; Wang & Wolverton, 2002; Lis, 2005; Adamczewski, 2006; Bitner, 2007; Barańska, 2010; Sawiłow, 2010).
In classical regression models, the significance of parameters is assessed without regard to the spatial structure of the investigated phenomenon, which can lead to an incorrect interpretation of the results (Charlton & Fotheringham, 2009), in particular when real estate markets are assumed to be spatially heterogeneous. Models that do not account for spatial autocorrelation can produce erroneous results (Anselin, 1998; Tu et al., 2007), which is why many researchers have relied on spatial models to analyze real estate markets and predict transaction prices (Can & Megbolugbe, 1997; Bowen et al., 2001; James et al., 2005; Bourassa et al., 2010). If spatial correlations exist between real estate prices in market datasets, classical regression models produce similarly erroneous results because the assumption of independent observations is not met; the sample therefore carries less information than an independent sample of the same size. These autocorrelations are taken into account in spatial autoregressive (SAR) models, where the prices of neighboring real estate constitute additional explanatory variables (Ligas, 2006).
www.degruyter.com/view/j/remav vol. 29, no. 4, 2021

Spatial autocorrelation
Spatial autocorrelation denotes the presence of correlations between observations in geographic space (Can & Megbolugbe, 1997). Spatial autocorrelation is expressed by a quantitative relationship between locations and relates to continuous spatial changes which are visualized by clusters of similar values or consistent spatial patterns (Kopczewska, 2006). Basu and Thibodeau (1998) have identified two main causes of spatial autocorrelation on the real estate market: structural similarity of real estate in a given location, and the influence of the same location factors in a given area. These factors can be associated with municipal infrastructure or environmental factors. The specificity of the real estate market, namely the presence of spatial interactions between property prices in selected locations, is affected by the behavior of market actors. Location is a factor that initiates autocorrelation effects (spatial correlations), which is consistent with Tobler's first law of geography. The parties to real estate transactions base their decisions on information about similar properties that were traded in the neighborhood, including their individual attributes. These phenomena are responsible for spatial correlations between real estate prices, and these relationships should be considered in the developed model. An autoregressive component that represents the influence of real estate prices in the neighborhood on the price of the traded property should be taken into consideration in the process of modeling real estate prices and values (Kobylińska et al., 2017). For this reason, spatial autoregressive models were applied in the present study.
Spatial autoregressive models can be used to investigate spatial variations based on the mutual interactions between variables describing the studied objects (Goodchild & Haining, 2003). Correlations or interactions can involve explained (endogenous) variables, explanatory variables as well as random effects. Interactions that involve the endogenous variable are indicative of spatial autoregression: the values of the endogenous variable in other locations influence the value of the endogenous variable in the studied location. Interactions involving random effects indicate that spatial autocorrelation exists between random effects in the model. Interactions can also involve explanatory variables, when the explained variable in a given location is influenced by the values of exogenous variables in other locations. This situation is described by a spatial cross-regressive model (Suchecki, 2010).
A spatial weight matrix is a prerequisite for and the first step in a spatial autocorrelation analysis. Moran's I is most widely used to test spatial autocorrelation (Longley et al., 2005; Kisiała, 2016). This test statistic is calculated with the use of formula (1):

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{n} (z_i - \bar{z})^2} \tag{1}$$

where: $n$ - number of observations, $z_i$ - value of variable $z$ in the $i$-th location, $\bar{z}$ - mean value of variable $z$ for all observations (locations), $w_{ij}$ - weight of spatial interactions between observations $i$ and $j$.

If the null hypothesis states that variable $z$ has a random distribution (absence of spatial autocorrelation), the significance of spatial autocorrelation tested with Moran's I is verified based on statistic $Z_I$ with a normal distribution (expected value 0, variance 1) according to formula (2):

$$Z_I = \frac{I - E(I)}{\sqrt{\mathrm{Var}(I)}} \tag{2}$$

where: $E(I) = -1/(n-1)$ - expected value of Moran's I under the null hypothesis, $\mathrm{Var}(I)$ - variance of Moran's I under the null hypothesis.

Residual spatial autocorrelation in a linear model, verified by diagnostic tests for linear models, requires the application of spatial estimation methods (Kopczewska, 2006). Depending on the type of spatial interactions, two main types of spatial regression models are generally used: the spatial lag model (SLM) and the spatial error model (SEM) (Anselin, 1998; Wilhelmsson, 2002; Arbia, 2006).
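The calculation of global Moran's I from formula (1) can be illustrated with a minimal sketch in plain Python. The function names `morans_i` and `expected_i`, the binary neighborhood matrix for four plots in a row, and the price vectors are hypothetical illustrations, not data from the study.

```python
def morans_i(z, w):
    """Global Moran's I for values z and a spatial weight matrix w (list of rows)."""
    n = len(z)
    z_bar = sum(z) / n
    dev = [z_i - z_bar for z_i in z]            # deviations from the mean
    s0 = sum(sum(row) for row in w)             # sum of all spatial weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def expected_i(n):
    """Expected value of Moran's I under the null hypothesis of no autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical example: four plots in a row, adjacent plots are neighbors
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
prices_clustered = [100.0, 110.0, 200.0, 210.0]     # similar prices side by side
prices_alternating = [100.0, 200.0, 100.0, 200.0]   # high next to low

print(morans_i(prices_clustered, w), expected_i(4))
print(morans_i(prices_alternating, w))
```

In this toy setting, the clustered prices yield I above E(I), and the alternating prices yield a negative I, which mirrors the interpretation of positive and negative spatial autocorrelation.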

Spatial lag model
The neighborhood matrix W exerts different effects on the endogenous variable in the spatial lag model and the spatial error model. The spatial lag model contains a spatially lagged endogenous variable Wy; it is therefore an autoregressive model. The spatial error model is developed on the assumption that spatial autocorrelation exists between the residuals from the model. The general form of the SLM is given by Formula (3) (Anselin, 1998; Wall, 2004; Arbia, 2006; Kopczewska, 2006; Ligas, 2006):

$$y = \rho W y + X\beta + u \tag{3}$$

which can be reduced to Formula (4):

$$y = (I - \rho W)^{-1} X\beta + e, \qquad e = (I - \rho W)^{-1} u \tag{4}$$

where: $y$ - vector of the observed explained variable (real estate price, value) ($n \times 1$); $X$ - matrix of independent variables (real estate attributes) ($n \times (k+1)$); $\beta$ - vector of coefficients ($(k+1) \times 1$); $W$ - spatial weight matrix ($n \times n$); $e$ - error vector of the reduced model ($n \times 1$); $\rho$ - spatial autoregressive coefficient; $u$ - model error, vector of the random effect, $u \sim N(0, \sigma^2 I)$ ($n \times 1$).

When spatial autocorrelation does not occur ($\rho = 0$), the result is a classical model of multiple linear regression. The model can be reduced only when matrix $(I - \rho W)$ is invertible (Cellmer, 2014).
In most cases, the autoregressive coefficient takes on values in the range of (-1, 1). In this case, matrix $(I - \rho W)^{-1}$ can be expanded (Won Kim et al., 2003) with the use of Formula (5):

$$(I - \rho W)^{-1} = I + \rho W + \rho^2 W^2 + \rho^3 W^3 + \ldots \tag{5}$$

The attributes of the above decomposition can be used to define the elements of the variance-covariance matrix of the random effect in reduced form (Formula (6)) (Anselin, 1998; Suchecki, 2010):

$$\mathrm{Var}\left[(I - \rho W)^{-1} u\right] = \sigma^2 \left[(I - \rho W)'(I - \rho W)\right]^{-1} \tag{6}$$

Spatial lag models cannot be estimated by the least squares method because the resulting estimator is biased, and they have to be estimated by the maximum likelihood method (Anselin, 2009). Spatial lag Wy is interpreted as the value of dependent variable y in neighboring regions. If the value of Wy is significant, the value of variable y in region i is influenced by the level of the analyzed phenomenon in the neighboring regions j and by the remaining explanatory variables (Kopczewska, 2006).
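The series expansion of the spatial multiplier in Formula (5) can be checked numerically. The sketch below is a plain-Python illustration under stated assumptions: the row-standardized weight matrix `W` for three plots, the value `RHO = 0.5`, and the helper names are hypothetical, and the series is truncated after a fixed number of terms.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """Identity matrix of size n."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(w, rho, terms=200):
    """Approximate (I - rho*W)^-1 by the truncated sum I + rho*W + (rho*W)^2 + ..."""
    n = len(w)
    rho_w = [[rho * x for x in row] for row in w]
    total = identity(n)   # running sum, starts with the k = 0 term (I)
    power = identity(n)   # current term (rho*W)^k
    for _ in range(terms):
        power = mat_mul(power, rho_w)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical row-standardized weights for three plots along a street
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
RHO = 0.5

approx_inv = neumann_inverse(W, RHO)
i_minus_rho_w = [[(1.0 if i == j else 0.0) - RHO * W[i][j] for j in range(3)]
                 for i in range(3)]
check = mat_mul(i_minus_rho_w, approx_inv)   # should be close to the identity matrix
max_error = max(abs(check[i][j] - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
print(max_error)
```

The diagonal elements of the approximated inverse exceed 1 in this example, which reflects the feedback of a shock through neighboring locations back to its origin.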

Spatial error model
The general form of the spatial error model is given by Formulas (7) and (8) (Anselin, 1998; Wall, 2004; Arbia, 2006; Kopczewska, 2006; Ligas, 2008):

$$y = X\beta + u \tag{7}$$

$$u = \lambda W u + e \tag{8}$$

where: $y$ - vector of the observed explained variable (real estate price, value) ($n \times 1$), $X$ - matrix of independent variables (real estate attributes) ($n \times (k+1)$), $\beta$ - vector of coefficients ($(k+1) \times 1$), $W$ - spatial weight matrix ($n \times n$), $u$ - model error, vector of the spatially autocorrelated random effect ($n \times 1$), $\lambda$ - spatial autocorrelation parameter, $Wu$ - spatial lag of the error, mean error from the neighboring regions, $e$ - independent error in the model, $e \sim N(0, \sigma^2 I)$.

The original random effect $u$ is a function of a "true" random effect $e$ that meets standard assumptions, $e \sim N(0, \sigma^2 I)$ (Anselin, 1998), and is described by Formulas (9) and (10):

$$(I - \lambda W)\,u = e \tag{9}$$

$$u = (I - \lambda W)^{-1} e \tag{10}$$

If matrix $(I - \lambda W)$ is invertible, the SEM can be reduced with the use of Formula (11) (Anselin, 1998; Osland, 2010; Cellmer, 2014):

$$y = X\beta + (I - \lambda W)^{-1} e \tag{11}$$

The value of the explained variable in every location is influenced by error $e$ and the multiplication factor $(I - \lambda W)^{-1}$. The lower the value of the spatial autocorrelation coefficient, the weaker the effect of the multiplication factor. The above formulas can be used to derive Formula (12):

$$y = \lambda W y + X\beta - \lambda W X\beta + e \tag{12}$$

Unlike the spatial lag model, the spatial error model accounts for the effects of spatial lag $\lambda W X\beta$ of exogenous variables (Cellmer, 2014).
In a spatial error model, parameter β is estimated by the generalized least squares (GLS) method, and parameter λ is estimated by optimization. The spatial autocorrelation of the residuals is also determined (Kopczewska, 2006). Both types of spatial correlations can be presented in a single model. A general spatial process model is given by Formulas (13) and (14):

$$y = \rho W_1 y + X\beta + u \tag{13}$$

$$u = \lambda W_2 u + e \tag{14}$$

The above model accounts for the effects of the spatial lag of variable $y$ and the autocorrelation of random effect $u$. Spatial weight matrices $W_1$ and $W_2$ correspond to the neighborhood matrices of the autoregressive process and the random effects covariance matrix (Arbia, 2006). If matrix $(I - \lambda W_2)$ is invertible, the model is described by Formula (15):

$$y = \rho W_1 y + X\beta + (I - \lambda W_2)^{-1} e \tag{15}$$

The model can be reduced with the use of Formula (16):

$$(I - \rho W_1)\,y = X\beta + (I - \lambda W_2)^{-1} e \tag{16}$$

The model is transformed similarly to the spatial lag model, and it is expressed by Formula (17) (Anselin, 1998):

$$y = (I - \rho W_1)^{-1} X\beta + (I - \rho W_1)^{-1} (I - \lambda W_2)^{-1} e \tag{17}$$
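The structure of the spatial error model in Formulas (7)-(12) can be illustrated with a small numerical sketch. It assumes a hypothetical row-standardized weight matrix `W`, an arbitrary value `LAM = 0.6`, and made-up vectors for the independent error and for Xβ; none of these come from the study. The autocorrelated error u is obtained by iterating u = λWu + e, which converges here because |λ| < 1 and W is row-standardized. Filtering both y and Xβ with (I - λW) then recovers the independent error, which is Formula (12) rearranged.

```python
def mat_vec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def sar_errors(w, lam, e, iterations=500):
    """Solve u = lam*W*u + e by fixed-point iteration (assumes |lam| < 1)."""
    u = list(e)
    for _ in range(iterations):
        wu = mat_vec(w, u)
        u = [lam * wu_i + e_i for wu_i, e_i in zip(wu, e)]
    return u


def spatial_filter(w, lam, v):
    """Return (I - lam*W) v, the spatially filtered vector."""
    wv = mat_vec(w, v)
    return [v_i - lam * wv_i for v_i, wv_i in zip(v, wv)]


# Hypothetical inputs: four plots in a row, row-standardized weights
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
LAM = 0.6
e = [1.0, -0.5, 0.25, -1.0]      # independent error
xb = [10.0, 12.0, 11.0, 9.0]     # systematic part X*beta

u = sar_errors(W, LAM, e)                       # Formula (10)
y = [xb_i + u_i for xb_i, u_i in zip(xb, u)]    # Formula (7)

# (I - lam*W) y - (I - lam*W) X*beta should equal e, cf. Formula (12)
recovered_e = [fy - fx for fy, fx in zip(spatial_filter(W, LAM, y),
                                         spatial_filter(W, LAM, xb))]
print(recovered_e)
```

The same filtering idea underlies estimation on transformed variables once λ is known or estimated; the sketch only demonstrates the algebra and does not estimate any parameters.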

Data and Methods
The analysis involved spatial vector data and a database of real estate transactions for the city of Olsztyn (north-eastern Poland, Region of Warmia and Mazury, Olsztyn county, Olsztyn municipality) that were acquired from the Department of Geodesy and Real Estate Management of the Olsztyn City Office. The factors influencing the prices and values of real estate on the Olsztyn market were analyzed, and variables were selected and described. The collected spatial vector data described the attributes of vacant land plots (zoned for residential construction) that are most often included in real estate appraisals and can potentially influence the prices and values of real estate: distribution of utility networks (power network, water supply network, gas network, telecommunications network, sewer network), urban planning attributes (designation in the local zoning plan), public roads, railway lines, development density, forests, water bodies (lakes, main rivers), distribution of public transportation stops, location of public utility services, potential supply (vacant land that can be potentially traded on the local market), distance from the city center, and noise levels (acoustic map of Olsztyn). Real estate that was not traded on the free market was excluded from the analysis, including land plots for public road projects, land occupied by ditches, ponds, trees, shrubs and forests. Only real estate supporting an objective evaluation of the local market was selected for the study. Data concerning 932 residential property transactions were ultimately included in the analysis. The collected data were integrated and spatial analyses were conducted with the use of ArcGIS 10.0 software. Calculations were performed in Microsoft Excel 2007. OpenStreetMap and Geoportal.gov.pl services were also used to obtain additional information about the analyzed market.
Fifteen variables describing each traded real estate were adopted for the analysis: noise levels ("noise", in dB), distance from public transportation stops ("stops", Euclidean distance in meters), distance from the city center ("center", junction of Pieniężnego Street and Piłsudskiego Avenue; Euclidean distance in meters), distance from a forest ("forest", Euclidean distance in meters), as well as variables described by kernel density estimation: gas network ("gas"), water supply network ("water"), telecommunications network ("telecom"), district heating network ("heat"), development density ("density"), potential supply, vacant land that can be potentially traded on the market ("vacant"), power network ("power"), distance from a railway line ("railway"), distance from public roads ("roads") and the availability of public services ("services").
A reference layer containing information about real estate prices (PLN/m²) and attributes (attribute table) was generated in ArcGIS. A fragment of the reference layer is presented in Figure 1. The influence of location and other spatial attributes on the prices and values of real estate was determined with the use of a classical multiple regression model and spatial autoregressive models. The applicability of spatial autoregressive models, also referred to as spatial regression models, is determined by the presence of spatial autocorrelations.

Empirical results
Additive models have been most widely used in real estate valuation. However, models that sum up the analyzed attributes do not always produce optimal results on all real estate markets (Barańska, 2005). Non-linear multiplicative models can estimate the results with greater accuracy. A multiplicative model, where the estimated coefficients are raised to the power of the represented attributes, always generates positive market values of real estate for each attribute and describes attribute variability with the use of a monotonic function (constantly increasing or constantly decreasing). One of the main advantages of a multiplicative model is that it presents the percent variation in real estate value relative to the value estimated based on real estate attributes (Barańska, 2005).
The model applied in the study is described by formula (18):

$$c = B_0 \cdot B_1^{x_1} \cdot B_2^{x_2} \cdot \ldots \cdot B_m^{x_m} \tag{18}$$

where: $c$ - unitary price or value of real estate in the database, $x_1, x_2, \ldots, x_m$ - real estate attributes in the database, $B_0$ - unitary value of real estate for zero value of all attributes, $B_j$ - estimated model coefficients.

To estimate the value of coefficients $B_j$, the above function has to be transformed to a linear function by taking the natural logarithm of both sides of equation (18), with the use of formulas (19), (20) and (21):

$$\ln c = \ln B_0 + x_1 \ln B_1 + x_2 \ln B_2 + \ldots + x_m \ln B_m \tag{19}$$

$$y = \ln c, \qquad b_j = \ln B_j, \quad j = 0, 1, \ldots, m \tag{20}$$

$$y = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_m x_m \tag{21}$$

These operations generate the parameters of the model (22):

$$B_j = e^{b_j}, \quad j = 0, 1, \ldots, m \tag{22}$$

The above transformation produces a classical multiple regression model, where the explained variable is the logarithm of real estate price in PLN/m². The selected spatial attributes are the explanatory variables (Table 1). The results generated by the multiple regression model are presented in Table 1. The percentage influence of selected attributes on real estate prices was estimated in the "Influence % (multiplication factor)" column.
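The logarithmic transformation of the multiplicative model can be sketched for the simplest case of a single attribute. The function names, the synthetic prices and the attribute values below are hypothetical, and expressing the percentage influence as (B - 1) · 100 is an assumption about how such a column could be derived, not a description of the calculations behind Table 1.

```python
import math


def fit_multiplicative_one_attribute(x, c):
    """Fit c = B0 * B1**x by ordinary least squares on ln(c)."""
    y = [math.log(c_i) for c_i in c]            # explained variable: log price
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((x_i - x_bar) ** 2 for x_i in x)
    sxy = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    b1 = sxy / sxx                              # slope of the linearized model
    b0 = y_bar - b1 * x_bar                     # intercept of the linearized model
    return math.exp(b0), math.exp(b1)           # back-transform to B0 and B1


def percent_influence(b_coefficient):
    """Percent change in price per unit of the attribute (assumed definition)."""
    return (b_coefficient - 1.0) * 100.0


# Synthetic, noise-free example: price falls by 2% per unit of the attribute
attribute = [0.0, 1.0, 2.0, 3.0, 4.0]
prices = [200.0 * 0.98 ** x_i for x_i in attribute]

B0, B1 = fit_multiplicative_one_attribute(attribute, prices)
print(B0, B1, percent_influence(B1))
```

With several attributes, the same back-transformation applies to each coefficient of the linearized multiple regression.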
The coefficient of determination measures how well the model fits the analyzed data. In the estimated multiple regression model, the coefficient of determination R² reached only 0.167, and the standard error of the estimate was determined at 46.06. For six variables (noise, gas, telecom, density, roads and services), there are no grounds at the significance level of 0.05 for rejecting the null hypothesis that the given independent variable does not influence the explained variable. The sign of each estimated parameter indicates whether the attribute exerts a positive or a negative effect on the explained variable, and the observed signs are not always consistent with expectations (which is the case for variables such as noise, water, telecom or heat). The relatively poor results of the estimated model can be attributed to the spatial structure of the analyzed data which, in turn, can affect the accuracy of parameter estimation. The analyzed model does not support reliable predictions of the explained variable, and it should be used only as a reference point for comparison with spatial models. The presence of spatial correlations expressed by spatial autocorrelation was investigated in the next step of the analysis.
Spatial autocorrelation is tested with global Moran's I, which measures the strength of correlations between the value of the analyzed variable in a given location and the value of the same variable in other locations. The presence of such correlations indicates that the variables are spatially clustered. A positive autocorrelation occurs when the values of the analyzed variable are similar (high or low) and form clusters (groups). In contrast, a negative autocorrelation occurs when high values of the observed variable are located in the proximity of low values, and when low values of the analyzed variable are situated in the proximity of high values (Suchecki, 2010).
The calculated values of Moran's I, including the expected value and variance, are presented in Table 2. Source: own elaboration.
The calculated probability density function of Z(I) is presented in Figure 2, where the critical values of the test statistic at various levels of significance (0.1; 0.05; 0.01) are marked with different colors. The null hypothesis should be rejected when the test statistic falls within the critical region, and there are no grounds for rejecting it when the test statistic lies outside the critical region. Based on the results, the hypothesis postulating the absence of global spatial autocorrelation should be rejected (p < α; α = 0.05). The relations I > E(I) and Z(I) > 0 indicate that a positive autocorrelation exists between real estate prices: high prices occur in the proximity of high prices, and low prices occur in the proximity of low prices. The probability that the observed spatial correlation is accidental is low (Z(I) = 18.8655, p < 0.001).
The examined phenomenon was also analyzed based on the spatial distribution of real estate prices (Fig. 3). The points in the first (I) and third (III) quadrant of the scatterplot represent objects/real estate surrounded by similar neighbors (similar in price and attributes). The objects in quadrant I have high prices and are surrounded by high-priced real estate, whereas the objects in quadrant III have low prices and are surrounded by low-priced real estate. The points in the second (II) and fourth (IV) quadrant denote real estate located in the proximity of non-similar neighbors. The objects in quadrant II have low prices and are surrounded by high-priced real estate, whereas the objects in quadrant IV have high prices and are surrounded by low-priced real estate.
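The assignment of observations to the four quadrants of the Moran scatterplot described above can be sketched as follows. The function name `moran_quadrants`, the row-standardized weight matrix and the price vectors are hypothetical; deviations of exactly zero are treated as "high" here, which is an arbitrary convention of this sketch.

```python
def moran_quadrants(z, w):
    """Label each observation with its Moran scatterplot quadrant (I-IV)."""
    n = len(z)
    mean = sum(z) / n
    dev = [z_i - mean for z_i in z]                       # own price vs. the mean
    lag = [sum(w[i][j] * dev[j] for j in range(n))        # neighbors vs. the mean
           for i in range(n)]
    labels = []
    for d, l in zip(dev, lag):
        if d >= 0 and l >= 0:
            labels.append("I")      # high price, high-priced neighbors
        elif d < 0 and l >= 0:
            labels.append("II")     # low price, high-priced neighbors
        elif d < 0 and l < 0:
            labels.append("III")    # low price, low-priced neighbors
        else:
            labels.append("IV")     # high price, low-priced neighbors
    return labels


# Hypothetical row-standardized weights for four plots in a row
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]

clustered = moran_quadrants([210.0, 200.0, 110.0, 100.0], W)
alternating = moran_quadrants([100.0, 200.0, 100.0, 200.0], W)
print(clustered)     # ['I', 'I', 'III', 'III']
print(alternating)   # ['II', 'IV', 'II', 'IV']
```

A dominance of quadrants I and III, as in the first toy vector, corresponds to positive spatial autocorrelation.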
The results of the analysis indicate that transaction prices are spatially autocorrelated; therefore, spatial regression models can be applied. The modeled parameters were estimated in the GeoDa program (developed by the Center for Geospatial Analysis and Computation at Arizona State University; https://geoplan.asu.edu/geodacenter-redirect). The results of spatial autoregressive model estimation are presented in Table 3. Source: own elaboration.
The results of spatial error model estimation are presented in Table 4. Both the spatial error model and the spatial lag model contain explanatory variables that are not statistically significant. These variables differ from the variables in the multiple regression model. Development density, railways and roads were not significant, and they exerted a weak influence on transaction prices in all models, which could be attributed to the specificity of market data.
The statistical verification of the results of spatial regression model estimation is a complex process. These results were verified with the use of the appropriate statistical tests. Models estimated by the least squares method are compared with spatial error models and spatial lag models based on the maximized value of the log-likelihood function (log L), the Akaike information criterion (AIC) or the Bayesian information criterion (BIC, Schwarz information criterion), rather than the coefficient of determination.
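The applied criteria are described by formula (23):

$$AIC = -2\log L + 2K, \qquad BIC = -2\log L + K\ln(n) \tag{23}$$

where: $\log L$ - maximized value of the log-likelihood function, $K$ - number of estimated parameters, $n$ - number of observations.

The estimated models were analyzed based on the selected criteria, and the results are presented in Table 5. Source: own elaboration.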
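The comparison of models with information criteria can be sketched in a few lines, assuming the standard definitions AIC = -2 log L + 2K and BIC = -2 log L + K ln(n). The log-likelihood values and parameter counts below are invented for illustration and are not the values reported in Table 5; only the number of observations (932) is taken from the study.

```python
import math


def aic(log_l, k):
    """Akaike information criterion for log-likelihood log_l and k parameters."""
    return -2.0 * log_l + 2.0 * k


def bic(log_l, k, n):
    """Bayesian (Schwarz) information criterion for n observations."""
    return -2.0 * log_l + k * math.log(n)


N_OBS = 932  # number of transactions in the study

# Hypothetical (log L, number of parameters) pairs for three candidate models
candidates = {
    "OLS": (-4900.0, 15),
    "SLM": (-4880.0, 16),
    "SEM": (-4870.0, 16),
}

scores = {name: (aic(ll, k), bic(ll, k, N_OBS)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
print(scores)
print(best_by_aic)
```

Lower AIC and BIC values indicate a better trade-off between fit and model complexity, whereas a higher log L indicates a better fit.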
The results of the log-likelihood test indicate that the spatial error model was characterized by the highest goodness of fit. Similar observations were made based on AIC and BIC values. The spatial error model also best fits the data in the analysis based on the values of the coefficient of determination. Despite the above, all models were characterized by relatively poor goodness of fit, and only minor differences were observed between the results of statistical tests.
The analyzed variables exerted relatively varied effects on real estate prices. The spatial distribution of the varied effects exerted by selected real estate attributes on real estate prices is presented graphically in maps in Figure 4.

[Fig. 4 panels: constant β; effect of noise β; effect of distance from the city center β; effect of distance from railway lines β; effect of distance from the road network β; effect of access to public services β]

Discussion and Conclusions
In the present study, autoregressive models were used to determine spatial variations in the effects exerted by real estate attributes on real estate prices. Not all of the examined attributes exerted statistically significant effects (α = 0.05). Real estate market analyses usually involve real estate attributes that are most often considered in the valuation process, but the market of vacant land is characterized by numerous and highly varied attributes, including random factors, behavioral factors as well as uncertainty which is an immanent