Econometric Support of a Mass Valuation Process

Abstract
Research background: The paper addresses the specification of an econometric model for real estate mass appraisal. The advantages and disadvantages of using econometric models in real estate mass appraisal are discussed. Purpose: The paper discusses how the valuation process can be aided by an econometric model based on the Szczecin algorithm of real estate mass appraisal. Problems such as multicollinearity, lack of coincidence and the non-monotonic influence of attributes are pointed out, and potential solutions to these problems are mentioned. Moreover, the paper discusses cases in which econometric appraisal is not sufficient. Research methodology: The econometric model is based on the so-called Szczecin algorithm of real estate mass appraisal and makes it possible to determine the impact of real estate attributes and location on real estate value. Results: Problems related to the specification, estimation and verification of the real estate mass appraisal econometric model are discussed using an empirical example. Novelty: A non-linear model is proposed in which explanatory variables are introduced in a way that takes into account the scale on which they are measured. By introducing dummy variables, the proposed model also accounts for the impact of location, which significantly improves the fit to empirical values.


Introduction
The process of real estate mass appraisal usually requires econometric thinking. Mass appraisal involves large sets of real estate, which are valued according to a uniform methodological approach in a relatively short time. Econometric methods may significantly facilitate that process, but they ought to supplement the expert approach, not replace it. Moreover, econometric methods cannot always be applied, usually because statistical information is insufficient or of low quality.
Obtaining reliable and complete databases on the real estate market is not always feasible. For instance, certain attribute states may not appear in a database.
In such a case, their impact needs to be determined with an expert approach. When the market features a low number of transactions, econometric determination of the impact of attributes on real estate value is also impossible. Real estate attributes are often collinear and have low variability, which can produce estimates with wrong signs or make them insignificant. If a given attribute is positively correlated with real estate value, better states of the attribute should have a higher impact; multicollinearity may cause this assumption to be violated by the estimates. Problems of this kind, which frequently result from the nature of real estate data or from incomplete or low-quality databases, are numerous.
The objective of the paper is to discuss the problems of using econometric models in real estate mass appraisal. In the empirical example, the model is estimated with an existing database for residential land use.

Literature review
The literature on applying econometric methods in appraisal is fairly extensive. Multiple regression models are usually discussed in the context of appraisal or mass appraisal, e.g. (Barańska, 2010; Benjamin, Randall, Guttery, Sirmans, 2004; Isakson, 1998; Dell, 2017; Parzych, Czaja, 2017; Doszyń, 2012). A review of quantitative methods used in mass appraisal can be found in (Pagourtzi, Assimakopoulos, Hatzichristos, French, 2003). Many quantitative methods useful in mass valuation are presented in (Kauko, d'Amato, 2008), where appraisal methods are classified into model-driven methods, data-driven methods, methods based on machine learning and expert methods.
Model-driven methods include mostly econometric models, hedonic regression models and spatial econometric models. Spatial econometric models are now often applied in mass valuation (Cellmer, 2013; Fik, Ling, Mulligan, 2003; Widłak, Waszczuk, Olszewski, 2015), and many authors assume that spatial effects can be treated as a proxy for location. Data-driven methods include non-parametric models, such as GWR (Geographically Weighted Regression).
Methods based on machine learning are also often applied nowadays. This class of tools includes ANN (Artificial Neural Networks), rough set theory, fuzzy logic, genetic algorithms, etc. Results of applying machine learning methods can be found e.g. in (Zurada, Levitan, Guan, 2011).
If the quality of databases is low, expert methods such as AHP (Analytic Hierarchy Process), Conjoint Analysis and CV (Contingent Valuation) can efficiently support the mass appraisal process.
If the explanatory variables are seriously multicollinear, econometric models with restrictions in the form of inequalities can be estimated. Such models are analysed in the context of mass valuation in (Pace, Gilley, 1990). General descriptions of inequality-restricted least squares models can be found in (Judge, Takayama, 1966).
To compare the results of different methods, appropriate measures of prediction accuracy should be used. Many useful proposals for such measures can be found in (McCluskey, McCord, Davis, Haran, McIllhatton, 2013).
In this paper, the econometric model is based on the so-called Szczecin algorithm of real estate mass appraisal. The algorithm is described, for instance, in (Hozer, Foryś, Zwolankowska, Kokot, Kuźmiński, 1999; Hozer, Kokot, Kuźmiński, 2002). Problems related to the econometric specification of this algorithm are discussed in (Doszyń, Hozer, 2017), and the issue of accounting for the measurement scales of attributes is discussed in (Doszyń, 2017).
General problems of estimating cadastral values are described e.g. in (Czaja, 2001). A broad description of econometric methods used in the paper can be found in (Greene, 2003).

Methods
The econometric model was constructed on the basis of the so-called Szczecin algorithm of real estate mass appraisal. For real estate having an identical purpose, the algorithm can be presented as follows (Hozer et al., 1999; Hozer et al., 2002):

$$w_{ji} = pow_i \cdot w_{baz} \cdot wwr_j \cdot \prod_{k=1}^{K} \left(1 + A_{kpi}\right) \qquad (1)$$

where:
$w_{ji}$ - value of the i-th real estate in the j-th location attractiveness zone,
$pow_i$ - surface area of the i-th real estate,
$w_{baz}$ - base value (per unit of surface area),
$wwr_j$ - market value coefficient for the j-th location attractiveness zone,
$A_{kpi}$ - impact of the p-th state of the k-th attribute of the i-th real estate,
$K$ - number of real estate attributes.

Algorithm (1) has a multiplicative form. The base value ($w_{baz}$) constitutes a point of reference for real estate value, and the algorithm adjusts it by the impact of the attributes of the appraised real estate. The impact of the attributes ($A_{kpi}$) can be defined with an expert approach; however, econometric methods may help in the process. Important elements of algorithm (1) are the market value coefficients ($wwr_j$), defined for each location attractiveness zone, which reflect the impact of a broadly understood location.
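A minimal Python sketch of the valuation step, assuming that algorithm (1) has the product form $w_{ji} = pow_i \cdot w_{baz} \cdot wwr_j \cdot \prod_{k=1}^{K}(1 + A_{kpi})$. The function name `szczecin_value` and all numbers are hypothetical, chosen only to show how the multiplicative adjustments combine.

```python
from math import prod


def szczecin_value(pow_i, w_baz, wwr_j, impacts):
    """Value of one property under the product form assumed here:
    w_ji = pow_i * w_baz * wwr_j * prod(1 + A_kpi).

    impacts -- the A_kpi of the states this property has, one per attribute,
               expressed as fractions (e.g. 0.10 for +10%)."""
    return pow_i * w_baz * wwr_j * prod(1.0 + a for a in impacts)


# Hypothetical plot: 500 m^2, base value 100 PLN per m^2, zone coefficient 1.2,
# attribute impacts +10%, 0% and +5% (illustrative numbers only).
value = szczecin_value(500, 100.0, 1.2, [0.10, 0.0, 0.05])
print(round(value, 2))  # approx. 69300 PLN
```

Because the adjustments multiply, a +10% and a +5% attribute effect combine to +15.5% rather than +15%.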
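Algorithm (1) may serve as the basis for the specification of an econometric model with which the impact of real estate attributes on real estate value can be determined. After moving the surface area ($pow_i$) to the left-hand side of equation (1), taking logarithms and adding an error term, we arrive at the following model hypothesis:

$$\ln\frac{w_{ji}}{pow_i} = \alpha_0 + \sum_{k=1}^{K}\sum_{p=2}^{k_p} \alpha_{kp}\, x_{kpi} + \sum_{j=1}^{J} \alpha_j\, sal_{ji} + u_i \qquad (2)$$

where:
$\alpha_0$ - constant term (logarithm of the base value, $\ln w_{baz}$),
$K$ - number of real estate attributes,
$k_p$ - number of states of attribute k,
$\alpha_{kp}$ - parameter for the p-th state of attribute k,
$x_{kpi}$ - zero-one variable for the p-th state of attribute k of the i-th real estate,
$J$ - number of location attractiveness zones other than the cheapest (reference) zone,
$\alpha_j$ - parameter for the j-th location attractiveness zone (logarithm of its market value coefficient),
$sal_{ji}$ - dummy variable equal to one for the j-th location attractiveness zone,
$u_i$ - error term.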
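Under the assumption that algorithm (1) has the product form $w_{ji} = pow_i \cdot w_{baz} \cdot wwr_j \cdot \prod_{k=1}^{K}(1 + A_{kpi})$, the passage to model (2) can be written out term by term. Matching the terms additionally assumes that the omitted worst attribute states and the omitted cheapest zone have no impact ($A = 0$, $wwr = 1$); this is a sketch of the reasoning, not a formula quoted from the algorithm's authors.

$$\ln\frac{w_{ji}}{pow_i} = \ln w_{baz} + \ln wwr_j + \sum_{k=1}^{K}\ln\left(1 + A_{kpi}\right)$$

$$\alpha_0 = \ln w_{baz}, \qquad \alpha_j = \ln wwr_j, \qquad \alpha_{kp} = \ln\left(1 + A_{kp}\right)$$

$$w_{baz} = e^{\alpha_0}, \qquad wwr_j = e^{\alpha_j}, \qquad A_{kp} = e^{\alpha_{kp}} - 1$$

For example, an estimate $\alpha_{kp} = 0.10$ would correspond to $A_{kp} = e^{0.10} - 1 \approx 0.105$, i.e. an impact of about 10.5%.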

Model (2) constitutes an econometric version of algorithm (1). The dependent variable is the natural logarithm of the real estate unit value. Real estate attributes are typically qualitative characteristics measured on weak scales, such as a nominal or ordinal scale. If a variable is measured on an ordinal scale, it is introduced into the econometric model through dummy variables for each state of the attribute.
Because the model contains a constant term, the dummy variable for the worst state of each attribute is omitted to avoid strict collinearity of the explanatory variables; hence the summation runs over $p = 2, \ldots, k_p$ in formula (2). The omitted state of an attribute serves as the point of reference for interpreting the impact of the remaining states.
Let us assume, for instance, that the considered attribute is the neighbourhood of the real estate, which may take three states: unfavourable, average and favourable. One dummy variable, equal to one for a given state and zero otherwise, could be defined for each state; the first one (unfavourable) is omitted, so adding neighbourhood to the set of explanatory variables involves adding two dummy variables. The second dummy variable equals one if the neighbourhood is average and zero otherwise; the third equals one if the neighbourhood is favourable and zero otherwise. Other attributes measured on an ordinal scale are introduced in a similar manner. A variable measured on a nominal scale is introduced with one dummy variable. If real estate value is also affected by variables measured on strong scales (interval or ratio scale), formula (2) may be supplemented with them.
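The neighbourhood example can be sketched in Python. The function name `ordinal_dummies` and the sample observations are hypothetical; the point is only that an attribute with three states enters the model through two zero-one columns, with the worst state as the reference.

```python
def ordinal_dummies(values, states):
    """Encode an ordinal attribute as zero-one variables, one per state,
    omitting the first (worst) state, which becomes the reference category.

    values -- observed state of the attribute for each property
    states -- all possible states, ordered from worst to best
    Returns a dict mapping each non-reference state to its 0/1 column."""
    columns = {}
    for state in states[1:]:  # skip states[0], the worst state
        columns[state] = [1 if v == state else 0 for v in values]
    return columns


# Hypothetical observations for four properties.
neighbourhood = ["unfavourable", "average", "favourable", "average"]
dummies = ordinal_dummies(neighbourhood, ["unfavourable", "average", "favourable"])
for state, column in dummies.items():
    print(state, column)
# average [0, 1, 0, 1]
# favourable [0, 0, 1, 0]
```

A property in the unfavourable (reference) state has zeros in both columns, so its impact is absorbed by the constant term.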
The market value coefficients in algorithm (1) can be estimated by introducing a dummy variable for each location attractiveness zone except "the cheapest" one, which is skipped because of the strict collinearity of the explanatory variables. The skipped zone constitutes the point of reference for interpreting the market value coefficients of the remaining zones. The market value coefficient for the j-th location attractiveness zone is obtained from the estimate of parameter $\alpha_j$ in model (2). If the "cheapest" zone is omitted, the constant term is an estimate of the logarithm of the base value (assuming that all attribute states are present in the database).
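Since model (2) is in logarithms, the estimates map back to the multiplicative quantities of algorithm (1) through the exponential function, assuming $\alpha_j = \ln wwr_j$ and $\alpha_0 = \ln w_{baz}$. The sketch below uses hypothetical estimates (they are not the paper's results); the reference (cheapest) zone implicitly has coefficient $e^0 = 1$.

```python
from math import exp


def market_value_coefficients(alpha_zones):
    """Turn estimated zone parameters alpha_j of the log model into
    multiplicative market value coefficients wwr_j = exp(alpha_j).
    The omitted ('cheapest') zone is the reference, with coefficient 1."""
    return {zone: exp(alpha) for zone, alpha in alpha_zones.items()}


# Hypothetical estimates for three zones and for the constant term.
alpha_hat = {"sal_1": 0.10, "sal_2": 0.25, "sal_3": 0.40}
alpha_0_hat = 4.5  # ln of the base value, valid only if all attribute states occur

wwr = market_value_coefficients(alpha_hat)
base_value = exp(alpha_0_hat)
for zone, coef in wwr.items():
    print(zone, round(coef, 3))
print("base value:", round(base_value, 2))
```

A coefficient of about 1.105 for `sal_1` reads as "about 10.5% more per unit of area than in the cheapest zone, other attributes being equal".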
Individual states of attributes in model (2) are taken into account through separate dummy variables. This may lead to a large number of explanatory variables, which increases the probability that some of them are exact linear combinations of others. In such a case, the impact of certain attribute states cannot be estimated.
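One way to detect exact linear combinations among the dummy variables is to compare the rank of the design matrix with its number of columns. The plain-Python sketch below uses Gaussian elimination on hypothetical data; in practice a library routine such as `numpy.linalg.matrix_rank` would normally be used instead. The helper names are illustrative.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Row (at or below `rank`) with the largest absolute entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


def has_exact_collinearity(rows):
    """True if some column is an exact linear combination of the others."""
    return matrix_rank(rows) < len(rows[0])


# Columns: constant, dummy "average", dummy "favourable" (hypothetical data).
# Without any unfavourable plot, average + favourable = constant.
no_unfavourable = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
with_unfavourable = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]
print(has_exact_collinearity(no_unfavourable))    # True
print(has_exact_collinearity(with_unfavourable))  # False
```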

Data and empirical results
In the study, econometric model (2) was estimated mostly for residential land use, although recreational and public land uses and garages were also taken into account. The database comprises the values of land real estate determined by property appraisers for the purpose of updating perpetual usufruct fees. Assessed values, rather than transaction prices, were used because such values enter the Szczecin algorithm of real estate mass appraisal, and the estimated model is intended to support this algorithm.
Attributes and their states are presented in Table 1; the designations (symbols) from Table 1 are used throughout the paper. Attributes and their states were chosen by an appraiser, taking local market conditions into account. All attributes except surface area are qualitative variables, so they were introduced into the econometric model as dummy variables for each state of an attribute (excluding the first, worst, state). Surface area is a quantitative variable, but in the appraiser's opinion it should be treated as a qualitative feature with three states: large, average and small, which is how market participants often treat it. A small surface area was assumed to be "better" than a large one in terms of real estate unit value. All attributes were measured on an ordinal scale, so only positional descriptive statistics were calculated to describe the variables (Table 2). Real estate unit value (the dependent variable) was measured in Polish zloty. Attribute states were coded as 0, 1, 2, etc. (from the worst to the best state).
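The recoding of surface area into three ordered states can be sketched as follows. The paper does not report the thresholds separating small, average and large plots, so `small_max` and `large_min` are hypothetical parameters, as are the sample areas. Following the coding convention above (0 = worst state), a large plot is coded 0 and a small plot 2. For ordinal codes, `statistics.median_low` is used because it always returns an observed category rather than an average of two categories.

```python
import statistics


def code_surface_area(area, small_max, large_min):
    """Code surface area as an ordinal attribute, from worst to best:
    0 = large, 1 = average, 2 = small (a small plot is treated as 'better'
    for unit value). Assumes small_max < large_min; both are hypothetical."""
    if area >= large_min:
        return 0
    if area <= small_max:
        return 2
    return 1


areas = [350, 620, 1500, 900, 2400, 480]  # hypothetical plot areas in m^2
codes = [code_surface_area(a, small_max=500, large_min=1200) for a in areas]
print(codes)

# Only positional statistics are meaningful for ordinal codes.
print("median:", statistics.median_low(codes), "min:", min(codes), "max:", max(codes))
```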
In general, real estate unit values and attributes in the database have low variability. This is especially true of plot utilities and neighbourhood, whose variability was almost negligible. All analyses were conducted for the model in logarithmic form, with the logarithm of the real estate unit value as the dependent variable. The model was estimated on 524 observations. On the basis of the estimated model, an attempt was made to assess the impact of individual attribute states on real estate value (Table 3). In the Breusch-Pagan-Godfrey test, in which squared residuals were regressed on all regressors, the null hypothesis of homoskedasticity was rejected ($p_{emp} < 0.05$). The Jarque-Bera test also led to rejecting the null hypothesis of normally distributed residuals ($p_{emp} < 0.05$).
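The diagnostics mentioned above can be sketched in plain Python. This is a hypothetical illustration on a toy sample (one dummy regressor and made-up log unit values), not a re-estimation of the paper's model. The heteroskedasticity statistic is computed in its $n \cdot R^2$ form from regressing squared residuals on all regressors, and the Jarque-Bera p-value uses the fact that a $\chi^2$ variable with 2 degrees of freedom has survival function $e^{-x/2}$.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with
    partial pivoting (adequate for a small, well-conditioned example)."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """OLS of y on the rows of X (X must contain a constant column).
    Returns coefficients, residuals and the centered R^2."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(X, y)]
    mean_y = sum(y) / len(y)
    r2 = 1.0 - sum(e * e for e in resid) / sum((yi - mean_y) ** 2 for yi in y)
    return beta, resid, r2


def breusch_pagan_lm(X, resid):
    """n * R^2 from regressing squared residuals on all regressors;
    approx. chi^2 with (columns of X - 1) df under homoskedasticity."""
    _, _, r2 = ols(X, [e * e for e in resid])
    return len(resid) * r2


def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) and its chi^2(2) p-value exp(-JB/2)."""
    n = len(resid)
    m = sum(resid) / n
    m2, m3, m4 = (sum((e - m) ** p for e in resid) / n for p in (2, 3, 4))
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Hypothetical toy sample: one attribute-state dummy and log unit values.
d = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
ln_unit_value = [4.50, 4.62, 4.41, 4.55, 4.90, 4.71, 5.02, 4.85, 4.48, 4.95]
X = [[1.0, di] for di in d]

beta, resid, r2 = ols(X, ln_unit_value)
lm = breusch_pagan_lm(X, resid)
jb, p_jb = jarque_bera(resid)
print("coefficients:", [round(b, 3) for b in beta], "R^2:", round(r2, 3))
print("BP LM statistic:", round(lm, 3), "(compare with chi^2(1), 5% critical value 3.841)")
print("JB statistic:", round(jb, 3), "p-value:", round(p_jb, 3))
```

With a single dummy regressor, the OLS constant equals the mean log unit value of the reference group (4.512 here) and the slope equals the difference of group means (0.374).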
To check the multicollinearity of the explanatory variables, variance inflation factors (VIF) were analysed (see footnote 3). The VIFs were quite high for physical plot properties ($cf_1$, $cf_2$) and surface area ($pw_1$, $pw_2$), but the low variability of the variables seems to matter more than collinearity. The econometric version of algorithm (1), i.e. model (2), does not change the fact that algorithm (1) itself ought to be the basis for mass appraisal, because the results of econometric analyses typically require adjustments by experts. This concerns the following cases:
- the number of observations is too low to estimate a model,
- the database on which the model is estimated does not feature certain states of attributes,
- parameters are not significant,
- parameter estimates have negative signs.
A database may not feature certain attribute states, in which case their impact cannot be assessed econometrically and needs to be defined with an expert approach.
Furthermore, the parameters may not differ significantly from zero. This does not have to mean that a given attribute state does not influence real estate value; it may arise from statistical reasons, e.g. an insufficient number of observations, low variability of the variables or multicollinearity of the attributes.
It may happen, for instance, that all real estate in a database has the same level of a certain attribute. Its impact will then be insignificant (or, if the attribute does not vary at all, impossible to estimate, since its dummy variables would be constant), although this need not be the case for other databases. Such cases also require appraisal with an expert approach.
3 The centered VIFs were calculated as $1/(1 - R^2_x)$, where $R^2_x$ is the coefficient of determination from the regression of variable x on all other regressors in the equation. Only the VIFs for attributes are presented. VIFs were also estimated for $sal_1, sal_2, \ldots, sal_8$, but these results are of minor importance here.
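The centered VIF from footnote 3 can be computed with an auxiliary regression of each regressor on a constant and all other regressors. The sketch below is plain Python with two hypothetical zero-one columns named `x1` and `x2`; with only two regressors, $R^2_x$ equals their squared correlation (0.25 here), so both VIFs are $1/0.75 \approx 1.333$.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(X, y):
    """Centered R^2 of an OLS regression of y on the rows of X (X has a constant)."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def centered_vif(columns):
    """VIF_x = 1 / (1 - R^2_x) for every regressor in `columns` (name -> values)."""
    names = list(columns)
    n = len(columns[names[0]])
    result = {}
    for name in names:
        others = [m for m in names if m != name]
        X = [[1.0] + [columns[m][i] for m in others] for i in range(n)]
        result[name] = 1.0 / (1.0 - r_squared(X, columns[name]))
    return result


vifs = centered_vif({"x1": [0, 1, 0, 1, 1, 0, 1, 0],
                     "x2": [0, 1, 0, 1, 0, 0, 1, 1]})
print({name: round(v, 3) for name, v in vifs.items()})  # both about 1.333
```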
If the attributes are positively correlated with the unit value of real estate, the parameter estimates should be positive (the rule of coincidence). Negative parameter estimates therefore also require adjustment with an expert approach.
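The screening step that precedes expert correction can be sketched as a simple filter: flag any attribute-state parameter that violates the rule of coincidence (negative sign) or is not significant. The function `needs_expert_adjustment`, the 5% level and the estimates and p-values below are hypothetical; the parameter names only mimic the paper's symbols.

```python
def needs_expert_adjustment(estimates, p_values, alpha=0.05):
    """Return the attribute-state parameters an expert should revisit, with reasons:
    negative estimates (rule of coincidence violated) and estimates that are
    not significant at level `alpha`."""
    flagged = {}
    for name, est in estimates.items():
        reasons = []
        if est < 0:
            reasons.append("negative sign")
        if p_values[name] >= alpha:
            reasons.append("insignificant")
        if reasons:
            flagged[name] = reasons
    return flagged


# Hypothetical estimates and p-values (not the paper's results).
est = {"cf_1": 0.08, "cf_2": -0.03, "pw_1": 0.02}
pv = {"cf_1": 0.001, "cf_2": 0.04, "pw_1": 0.31}
result = needs_expert_adjustment(est, pv)
print(result)
# {'cf_2': ['negative sign'], 'pw_1': ['insignificant']}
```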
In the case of real estate, attributes are often collinear. Collinearity inflates the variances of the estimators, which may make them insignificant or give them wrong signs. Expert corrections can cope with this problem to some extent, but there is also an econometric solution: models with restrictions in the form of inequalities. Such models, in the context of mass valuation, will be analysed in future research.
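The simplest instance of an inequality restriction is a single sign constraint, e.g. requiring the parameter of a "better" attribute state to be non-negative. The sketch below handles only that one-regressor, one-restriction case and is an illustration of the idea, not the models announced for future research; several restrictions at once would require a quadratic-programming or constrained least-squares solver. Function names and data are hypothetical.

```python
def ols_simple(x, y):
    """Intercept a and slope b of y = a + b*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def ols_nonnegative_slope(x, y):
    """Least squares subject to the single restriction b >= 0.
    The sum of squares is convex in (a, b), so when the unrestricted slope
    is negative the restricted optimum lies on the boundary b = 0, where
    the best intercept is simply the mean of y."""
    a, b = ols_simple(x, y)
    if b >= 0:
        return a, b
    return sum(y) / len(y), 0.0


# Hypothetical data: x is the dummy for a "better" attribute state, y is the
# log unit value; by chance the better state looks cheaper in this sample.
x = [0, 0, 1, 1]
y = [4.6, 4.7, 4.5, 4.6]
print(ols_simple(x, y))             # unrestricted slope is negative (about -0.1)
print(ols_nonnegative_slope(x, y))  # restricted: slope 0.0, intercept = mean of y
```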
The cases discussed above occurred in the estimated model. In each of them, an expert opinion is required to define the impact of an attribute state on the basis of algorithm (1). The econometric model is thus a first step in applying algorithm (1), after which experts can make corrections.
It should be noted that the worst state of a given attribute may not occur in the database. In such a case, to avoid collinearity, the next state is omitted (the weakest state of the attribute that does appear in the database), and it is assumed that the impact of that state is contained in the constant term. The worst state, which does not occur in the database, then has to be assessed with an expert approach. For instance, let us assume that the neighbourhood features only two states in the database, average and favourable, whereas the attribute 'neighbourhood' may potentially assume three states: unfavourable, average and favourable. The econometric model then contains only one dummy variable, for the "favourable" state.
The dummy variable for the "average" state is omitted to avoid collinearity, and its impact is contained in the constant term. The influence of the "unfavourable" state (which does not occur in the database) must be assessed with an expert approach.
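The choice of reference state when the worst state is absent can be sketched as follows. The function `reference_and_missing` and the observations are hypothetical; the function also reports any other possible state missing from the data, since such states likewise cannot be estimated and need expert valuation. It assumes at least one observation.

```python
def reference_and_missing(observed, states):
    """Split the possible states of an attribute (ordered worst to best) into:
    - the reference state: the weakest state actually present in the data,
      whose impact ends up in the constant term,
    - the states that receive dummy variables,
    - the states absent from the data, whose impact an expert must assess."""
    seen = set(observed)
    present = [s for s in states if s in seen]
    missing = [s for s in states if s not in seen]
    return present[0], present[1:], missing


obs = ["average", "favourable", "average", "favourable", "favourable"]
ref, dummy_states, missing = reference_and_missing(
    obs, ["unfavourable", "average", "favourable"])
print(ref, dummy_states, missing)
# average ['favourable'] ['unfavourable']
```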

Discussion and conclusions
The main aim of the paper was to present an econometric model that might support the real estate mass appraisal process. The model was constructed on the basis of the Szczecin algorithm of real estate mass appraisal. If real estate databases are suitable, the impact of attributes can be estimated with econometric models, and the Szczecin algorithm may serve as a starting point for constructing such a model. Although the estimated econometric model predicts real estate value quite well, some problems require attention.
Firstly, databases may not contain certain attribute states, so their impact cannot be assessed with an econometric model. Secondly, the parameters may not differ significantly from zero. This may result from a low number of observations, low variability of the explanatory variables or their multicollinearity.
If attributes are positively correlated with real estate unit value, the parameter estimates should be positive (the rule of coincidence). If the explanatory variables are collinear, the estimators have high variances, which may make them insignificant or give them wrong signs.
Expert corrections can be made to deal with these problems, but there is also an econometric solution: models with restrictions in the form of inequalities. Such models will be analysed in future research.