Land Price Regression Model and Land Value Region Map to Support Residential Land Price Management: A Study in Nghe An Province, Vietnam


 The real estate market in areas with many socio-economic activities needs to be strictly managed due to the difference between the market price of urban land and the price of land set by the state. This study identifies and analyzes the influence of some factors on land prices in peri-urban areas to develop land pricing standards consistent with the price level in Nghe An province. The study surveyed 362 land users and 200 samples of successfully transferred properties in the study area. Based on the multivariate regression method, the study builds a residential land price model and calculates the market price of residential land. The authors also established a map of land value areas to help State agencies manage land prices effectively. The research serves as a basis for State agencies to study the formation and development of the real estate market to develop appropriate land price management measures.


Introduction
Since implementing the Doi Moi policy, Vietnam has enjoyed strong economic growth, reducing poverty from 58% in 1993 to 2.8% in 2020 (GSO, 2021). However, most of Vietnam's population still lives in rural areas, two-thirds of the population works in the agricultural sector, and labor productivity remains low (Nguyen, 2021a). In some countries, urbanization has been used as a tool to promote economic growth and reduce poverty (Arouri et al., 2014; Kuddus et al., 2020). If Vietnam is to maintain a high growth rate, supporting urbanization, in which cities contribute significantly to job creation and Gross Domestic Product, will be an essential measure (World Bank, 2020). This structural change will increase population and housing demand in cities, so good-quality affordable housing in reasonably serviced settlements will be essential (World Bank, 2015). Land price links land, the market, and state management; it is an economic tool through which land managers and users access the market mechanism (Tran Tuan, 2021a). It is also the basis for assessing fairness in land distribution.
www.degruyter.com/view/j/remav vol. 30, no.1, 2022
Vietnam has also gone through various phases of housing policy. Before 1988, Vietnam's formal housing sector was managed through a centralized planning regime, but the Doi Moi policy shifted it to a market-oriented one (Shanks et al., 2004; Tran & Yip, 2019). This change caused the housing sector to grow rapidly. However, this market orientation offers almost no solution to facilitate access to housing for the poor and near-poor (Thanh et al., 2013). Growth driven by foreign direct investment and speculation has pushed house prices up (Khanh, 2021). One important reason for these shortcomings is that, although Vietnam has a complete land information system and land price database, the recorded prices are not close to market prices (Vietnam has two land price systems: the state land price and the market land price). This system is not yet reliable enough to serve market management and society's development needs. To overcome these limitations, it is essential to determine land prices reasonably and close to market value (Tran Tuan, 2021b). In other words, it is necessary to develop a suitable land price list using a land valuation tool. Land valuation is intended not only for individual valuation cases but also for mass valuation on a large scale (Demetriou, 2016; Cauto et al., 2021).
According to the International Association of Assessing Officers (IAAO, 2013), mass valuation is the process of valuing a group of properties at a given date using common data, standardized methods, and statistical testing. Land pricing models apply three approaches to value: a cost approach, a price comparison approach, and an income approach (Wincott, 2001). In land valuation, multivariate linear regression analysis (MRA) is one of the best-known statistical approaches with many applications, especially forecasting land prices from regression models (Benjamin et al., 2004). The land price regression model uses factors affecting land price as independent variables to calculate the valuation (Alimudin et al., 2017; Karakayaci, 2018). The fundamental factors affecting land prices that are commonly used in studies include (1) the location of the parcel of land, (2) the distance to important sites, (3) the characteristics of the land plot, and (4) the land plot's environmental and security conditions. Some macro factors, such as the economic and financial situation of countries, also affect real estate prices (Renigier-Bilozoz and Wiśniewski, 2013).
Many countries apply computational models to determine land value. In 1963, Bailey et al. (1963) introduced a method to determine the value of a property based on a linear regression function. In 1964, Alonso, a famous researcher in real estate, suggested that the value of a property depends on its location (Alonso, 1964). In 1966, in consumer theory, Lancaster proposed a model in which an asset's value depends on characteristics of that asset (Lancaster, 1966). In 1974, Rosen applied the ideas of Bailey and Lancaster to develop a mathematical model, called the hedonic model, to determine the value of a product and analyze the equilibrium value in the market for that product (Rosen, 1974). In the following period, researchers in different countries applied the hedonic model to the determination of real estate prices, for example in Sweden (Englund et al., 1998), America (Zhou & Sornette, 2008), and France (Gouriéroux & Laferrère, 2009).
Meanwhile, a study on land price forecasting in India using neural network techniques and multiple regression sheds light on the spike in land prices in the southern and western regions of Chennai Urban Area (Sampathkum et al., 2015). In Indonesia, a study predicting house prices in Malang city found that combining regression analysis with Particle Swarm Optimization (PSO) gave the lowest prediction error (Alfiyatin et al., 2017). In Guatemala, a multivariable regression model has shown that land value in the studied city differs by up to 253% as parcels get closer to the central area (Morales et al., 2019). In Kuwait, land prices were determined using traditional spatial regression methods, showing the effects of population density, educational facilities, and air pollution levels (Mostafa, 2018). Regression analysis, in which independent variables were used to predict real estate sales prices in the Belmont and Eastside neighborhoods of Pueblo, proved a success because it accurately estimated the neighborhoods' selling prices in 2017 (Mize, 2017). Kolebe et al. (2019) also estimated land prices by regression in Germany. They concluded that the method could generate land value estimates consistent with the estimates of experts. From the above analysis, it can be seen that a regression model is a feasible way to establish a better estimate of the actual value of land (Mize, 2017). Therefore, this article develops a land price regression model and a land value region map to assist authorities in managing residential land prices in a locality in Nghe An province.

Data and Methodology
Nghe An is a province located in the center of the North Central region, with a 419 km border with the Lao People's Democratic Republic in the west and an 82 km coastline in the east. Nghe An also lies on the economic corridor connecting Myanmar - Thailand - Laos - Vietnam - East Sea through Cua Lo port. This position gives Nghe An a vital role in domestic and international economic exchanges. It also creates favorable conditions for Nghe An to attract investment in socio-economic development.
The main purpose of the study is to build a regression model of residential land prices in Nghi Tan ward, Cua Lo town, Nghe An province, and to create a map of residential land value areas. The main research methods used include:

Secondary data collection method
Secondary data such as land conditions, land use and management status, and land prices in Cua Lo Town were collected from 2015 to 2020. Some data on the socio-economic development report was provided by the Department of Natural Resources & Environment of Nghe An province.

Primary data collection method
Data were collected from land users in the study area to record land prices traded on the market and their influence on people's psychology. To evaluate the factors affecting the price of residential land in urban areas in Nghi Tan ward, the study calculates the number of samples following Tabachnick and Fidell (2013), with the formula n ≥ 8 × 39 + 50 = 362 samples (39 is the number of independent variables in the model shown in Table 1). In addition, to build a forecasting model for urban land prices in the area, the study surveyed 200 successfully transferred properties in Nghi Tan ward, a sample size that satisfies the same formula, n ≥ 8 × 18 + 50 = 194 samples (18 is the number of independent variables shown in Table 5). The surveys combined two sampling methods, cluster and random. The cluster method was applied to select properties located on main roads and evenly distributed over the study area. For properties on the same route, samples were selected based on factors affecting land prices, such as road width and land width. After the cluster classification, the authors randomly selected households for the survey.
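The sample-size rule quoted above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic n ≥ 8m + 50; the function name `min_sample_size` is an illustrative assumption and not part of the study.

```python
def min_sample_size(n_predictors: int) -> int:
    """Rule of thumb n >= 8m + 50, where m is the number of independent variables."""
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return 8 * n_predictors + 50


# 39 observed variables (Table 1) and 18 independent variables (Table 5)
factor_survey_min = min_sample_size(39)
price_model_min = min_sample_size(18)

print(factor_survey_min, price_model_min)  # 362 194
print(200 >= price_model_min)              # the 200 surveyed transfers meet the rule
```

The 200 transferred properties surveyed for the price model exceed the minimum of 194 given by the rule.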
Meanwhile, to assess the factors affecting the price of residential land, the research team conducted a staff interview survey using the "semi-structured" method. These interviews aimed to assess factors (39 factors in Table 1) on residential land prices of 32 land lots traded in the Nghi Tan ward. An additional five interviews were conducted with staff from the Department of Natural Resources and Environment, the Department of Natural Resources and Environment, the land registration office, a real estate investor, and a land broker.

Data analysis methods - Factor analysis
Factor analysis is carried out in two stages.
Stage 1: Building and verifying the quality of the scale.
Stage 2: Exploratory Factor Analysis (EFA), which includes checking the appropriateness of the model, extracting factors, rotating factors, and deciding which factors to keep and how to name them.
The suitability of the data for factor analysis is tested with the following criteria. The KMO (Kaiser-Meyer-Olkin) criterion indicates whether the sample size is sufficient and the variables are sufficiently correlated; if 0.5 < KMO < 1, factor analysis is appropriate. The Bartlett test examines the hypothesis that the variables are uncorrelated in the population; factor analysis is used only when this hypothesis is rejected (p < 0.05), that is, when the variables are correlated. Factors are extracted based on eigenvalues: the study retains only factors with an eigenvalue > 1.
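The three criteria can be illustrated with a short NumPy sketch. It uses the standard textbook definitions of the KMO measure, Bartlett's sphericity statistic, and the eigenvalue > 1 rule. The synthetic survey data, the loadings, and all function names are assumptions made for illustration; they are not the study's data or software.

```python
import numpy as np


def kmo(corr: np.ndarray) -> float:
    """Overall KMO: squared correlations vs. squared partial correlations."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlation matrix
    off = ~np.eye(corr.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(corr: np.ndarray, n_obs: int):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return float(chi2), dof


def kaiser_count(corr: np.ndarray) -> int:
    """Number of factors with eigenvalue > 1."""
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))


# Synthetic answers: 362 respondents, 6 items driven by 2 latent factors (assumed)
rng = np.random.default_rng(0)
n_obs = 362
latent = rng.normal(size=(n_obs, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
answers = latent @ loadings.T + 0.6 * rng.normal(size=(n_obs, 6))
corr = np.corrcoef(answers, rowvar=False)

kmo_value = kmo(corr)
chi2, dof = bartlett_sphericity(corr, n_obs)
n_factors = kaiser_count(corr)
print(f"KMO={kmo_value:.3f}, chi2={chi2:.1f} (dof={dof}), factors kept={n_factors}")
```

The p-value of Bartlett's test is obtained by comparing the statistic with a chi-square distribution with `dof` degrees of freedom; that step is omitted here to keep the sketch dependent on NumPy only.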

Identifying some factors affecting the price of residential land
A survey of market land prices in Nghi Tan ward shows that the price of residential land in this area is affected by many factors, which cause the market transaction price to diverge greatly from the price regulated by the State. The impacts of these factors differ in scale and level, and each factor affects a different aspect. Most of the factors in each group influence the land price level in the ward. Although the degree of impact varies among factors, together they create the difference between the market price and the State-regulated price in the locality. Based on the factors affecting land prices in the town, this study presents a model consisting of 7 scales representing the factors affecting land prices, with 39 observed variables, as shown in Table 1.

Checking the reliability of the scale (Cronbach's Alpha coefficient)
According to the scale test, the overall Cronbach's Alpha value of every scale meets the set standard (Cronbach's Alpha > 0.6). In addition to Cronbach's Alpha, we also consider the corrected item-total correlation of each observed variable; by convention, any variable with a coefficient < 0.3 is discarded. Ten observed variables fail this criterion (corrected item-total correlation < 0.3): VT1, VT4, VT5, VT7, VT10, VT11, VT13, KT3, XH1, and XH7. The remaining 24 variables all have corrected item-total correlations > 0.3. Thus, the scales retained after this analysis are of sufficient quality for the research.
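The two reliability criteria used above (Cronbach's Alpha > 0.6 and corrected item-total correlation ≥ 0.3) can be sketched as follows. The item names `q1` to `q4` and their scores are invented for illustration and are not the study's survey responses; the formulas are the standard ones.

```python
from statistics import mean, variance


def cronbach_alpha(columns):
    """columns: one list of respondent scores per item (equal lengths)."""
    k = len(columns)
    totals = [sum(row) for row in zip(*columns)]
    item_var = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(columns):
    """Correlation of each item with the sum of the *other* items."""
    result = []
    for i, col in enumerate(columns):
        rest = [sum(row) - row[i] for row in zip(*columns)]
        result.append(pearson(col, rest))
    return result


# Hypothetical 5-point answers from 8 respondents
items = {
    "q1": [4, 5, 3, 4, 2, 5, 3, 4],
    "q2": [4, 4, 3, 5, 2, 5, 3, 4],
    "q3": [5, 5, 3, 4, 1, 4, 2, 4],
    "q4": [3, 1, 4, 2, 5, 3, 2, 4],  # deliberately inconsistent item
}
names = list(items)
columns = [items[n] for n in names]

alpha_all = cronbach_alpha(columns)
r_it = dict(zip(names, corrected_item_total(columns)))
kept = [n for n in names if r_it[n] >= 0.3]          # discard items below 0.3
alpha_kept = cronbach_alpha([items[n] for n in kept])

print(f"alpha (all items) = {alpha_all:.3f}")
print(f"kept items = {kept}, alpha (kept) = {alpha_kept:.3f}")
```

In this toy data the inconsistent item `q4` is discarded, and the Alpha of the remaining items rises above the 0.6 threshold.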

Exploratory Factor Analysis (EFA)
After testing the scale's reliability, the residential land value scale with 24 observed variables in the previous section continued to be analyzed for factors to determine the relevant factors and variables. By analyzing and verifying the scale's quality and the EFA model's tests, we identified seven groups of factors representing 24 measurement variables for factors affecting land prices in the ward. The results are summarized in Table 2.

Evaluating the influence of factors by regression model
The variables that are statistically significant in the regression model tests are X1, X2, X3, X4, X5, X6, and X7. These variables have the theoretical ability to influence the determination of land prices. Based on the standardized regression coefficients, the contributions of these variables can be converted to percentages and ranked from highest to lowest, as shown in Table 4. The tests therefore confirm that seven groups of factors significantly affect the land price in Nghi Tan ward, and these are regarded as the main influencing factors.
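One common way to turn standardized coefficients into percentages is to divide the absolute value of each coefficient by the sum of all absolute values. The sketch below assumes that this is the conversion intended; the coefficient values are placeholders, not the values reported in Table 4.

```python
def contribution_percent(betas):
    """Share of each |standardized beta| in the total, sorted high to low."""
    total = sum(abs(b) for b in betas.values())
    shares = {name: 100 * abs(b) / total for name, b in betas.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder standardized coefficients for the seven factor groups (hypothetical)
betas = {"X1": 0.31, "X2": 0.12, "X3": 0.25, "X4": 0.08,
         "X5": 0.18, "X6": 0.05, "X7": 0.21}

ranking = contribution_percent(betas)
for name, share in ranking:
    print(f"{name}: {share:.1f}%")
```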

Defining variables for the model
In general, the price of a land parcel (which may include buildings on the parcel) in the regression model depends on the characteristics of the land plot (such as its location relative to the center and its proximity to utility areas) and the value of the buildings on the parcel (such as house area, number of bedrooms, and number of floors). The goal of the model is to find an equation that, based on these characteristics, gives a land plot price as close as possible to the market price. Price models can be simple, such as linear models, or more complex, such as exponential and logarithmic models. The choice among these models is evaluated and applied according to each data set. In this study, the authors used a multivariable linear regression model.
By surveying the current situation and by researching and evaluating the groups of factors affecting urban land prices in Nghi Tan ward, the authors divided the 24 factors into seven groups. The study evaluates the influence of the selected groups of factors on land prices and excludes the variables that have a minor influence. The study then recodes the variables as shown in Table 5.

Model building
It is possible to build many models to determine the dependent variable Y for each model with different independent variables Xi. The question is which of the proposed models is the best. Usually, choosing the best model is based on the coefficient of determination R2. The higher this index, the better the model. However, it should also be noted that each regression model has many attributes. In order to evaluate model quality, it is necessary to consider those attributes simultaneously.
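As a small illustration of the selection criterion, the sketch below computes R2 from observed and fitted values. It also computes the adjusted R2, which is not mentioned in the text and is added here only as an assumption about one of the "attributes" that penalize models with many variables. The numbers are toy values.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R2 penalized for the number of independent variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Toy observed and fitted prices (hypothetical, million VND/m2)
observed = [5.0, 6.5, 8.0, 9.5, 7.0]
fitted = [5.2, 6.4, 7.7, 9.6, 7.1]

r2 = r_squared(observed, fitted)
r2_adj = adjusted_r_squared(r2, n_obs=len(observed), n_predictors=2)
print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}")
```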
a. Land price regression model for the first time
Running the regression model for the first time with all 18 independent variables gives the results in Table 6. The model has a coefficient of determination R2 = 0.893, and most of the observed variables have the expected signs; the exceptions are TT_LL and P_LY, whose negative signs are contrary to expectations. The significance values of the variables KCTI_2, TT_LL, DIEN_NUOC, D_TICH, CR_MT, AN_TOT, and AN_BT are greater than the 5% significance level. Therefore, these variables are not statistically significant, and the study excludes them from the model.

b. Land price regression model for the second time
After removing the variables P_LY, KCTI_2, TT_LL, DIEN_NUOC, D_TICH, CR_MT, AN_TOT, and AN_BT from the model and rerunning it with the remaining ten variables, we obtain the results in Table 7.

Model comment and verification
a. Check the multicollinearity phenomenon
First, the authors test for multicollinearity using the variance inflation factor (VIF). In Table 7, every value in the Variance Inflation Factor column satisfies VIF < 10. Alternatively, VIF = 1/(1 - R2) = 1/(1 - 0.896) ≈ 9.62 < 10. Thus, the model exhibits no multicollinearity. Then, the authors test for multicollinearity by running a sub-regression of each independent variable on the remaining independent variables; the results are shown in Table 8. These results show that the coefficient of determination R2 = 0.894 of the land price regression model is larger than the R2 of each sub-regression model. This means that the independent variables are not strongly explained by one another, that is, there is no strong correlation among the independent variables. Both ways of testing conclude that multicollinearity does not exist in the model.
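Both multicollinearity checks described above reduce to simple arithmetic on R2 values. In the sketch below, the variable names `x1` to `x3` and their sub-regression R2 values are hypothetical; only the VIF formula, the VIF < 10 threshold, and the overall R2 = 0.894 come from the text.

```python
def vif(aux_r2: float) -> float:
    """Variance inflation factor from the R2 of a sub-regression."""
    if not 0.0 <= aux_r2 < 1.0:
        raise ValueError("R2 must be in [0, 1)")
    return 1.0 / (1.0 - aux_r2)


def collinearity_flags(aux_r2_by_var, overall_r2, vif_limit=10.0):
    """Flag a variable if VIF >= limit or its sub-regression R2 exceeds the overall R2."""
    return {
        name: vif(r2) >= vif_limit or r2 > overall_r2
        for name, r2 in aux_r2_by_var.items()
    }


# Hypothetical sub-regression R2 values for three independent variables
aux_r2 = {"x1": 0.35, "x2": 0.62, "x3": 0.93}
flags = collinearity_flags(aux_r2, overall_r2=0.894)

print(round(vif(0.896), 2))  # 9.62, the value computed in the text
print(flags)                 # only x3 is flagged in this toy example
```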

b. Check for autocorrelation
According to the regression results, the Durbin-Watson statistic is d = 1.878, which lies in the range 1 < d < 3, so autocorrelation does not occur in the above regression model.
c. Check for the phenomenon of variance change
According to the test results, all significance levels of the independent variables are > 0.05, which means that the residual variance does not change (there is no heteroscedasticity). The model is therefore stable, and the data are reasonable. The variables that are statistically significant in the regression model tests are KCTT, KCTI_1, CR_DUONG, CL_DUONG, H_THE, MTST_TOT, MTST_BT, MTKD_TOT, MTKD_KHA, and QH.
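The autocorrelation check can be sketched as follows, assuming that d denotes the Durbin-Watson statistic. The residuals are toy values, not the residuals of the study's model; the 1 < d < 3 decision rule is the one stated in the text.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def no_autocorrelation(d, low=1.0, high=3.0):
    """Rule of thumb used in the text: accept when low < d < high."""
    return low < d < high


# Toy regression residuals (hypothetical)
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
d = durbin_watson(residuals)

print(f"d = {d:.3f}, no autocorrelation: {no_autocorrelation(d)}")
print(no_autocorrelation(1.878))  # the value reported in the text passes the rule
```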
From the above analysis, the authors conclude that the regression model satisfies the assumptions required for unbiased linear estimation: there is no autocorrelation, multicollinearity, or heteroscedasticity. In addition, the variables in the model are all statistically significant. This means that the proposed model is suitable and can be applied in practice. The selected model is therefore the second regression model, whose results are given in Table 7.
Using this regression model to calculate land prices with a tool in ArcGIS software gives the land prices shown in the column "Results calculated by the regression model" in Table 9. The authors then compared surveyed land prices (for properties that were not among the points used to build the land price regression model) with the land prices calculated by the regression model. The two sets of results are not significantly different and differ by about 10%.
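The comparison between surveyed and modeled prices can be expressed as a percentage deviation per property. The sketch below uses invented price pairs rather than the values in Table 9, and the 10% tolerance is taken from the text.

```python
def percent_deviation(actual: float, predicted: float) -> float:
    """Absolute deviation of the modeled price from the surveyed price, in percent."""
    return 100 * abs(predicted - actual) / actual


# Hypothetical hold-out properties: (surveyed price, modeled price), million VND/m2
holdout = [(5.0, 5.4), (7.5, 7.0), (9.0, 9.6), (4.2, 4.0)]

deviations = [percent_deviation(a, p) for a, p in holdout]
mean_deviation = sum(deviations) / len(deviations)
within_tolerance = all(dev <= 10.0 for dev in deviations)

print([round(dev, 1) for dev in deviations])
print(f"mean deviation = {mean_deviation:.1f}%, all within 10%: {within_tolerance}")
```

Using properties that were not part of the model-building sample, as the authors did, is what makes this comparison a meaningful check of predictive accuracy.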

Map of residential land value area
After building the land price regression model by multivariate regression analysis, the authors used the model to calculate mass-valuation prices for land plots in Nghi Tan ward. The land prices were interpolated over the study area, and the price range for residential land was divided according to the rules of natural zoning, resulting in 5 sub-regions of residential land value in Nghi Tan ward, shown in Figure 1. Based on this value zone map, Nghi Tan ward is divided into five sub-regions of land value as follows. Sub-region 1 has the lowest land value, below 4 million VND/m² (<173.24 USD/m²). Sub-region 2 has a land price of 4-6 million VND/m² (173.24 - 259.85 USD/m²), sub-region 3 has a land price of 6-8 million VND/m² (259.85 - 346.47 USD/m²), sub-region 4 has a land price of 8-10 million VND/m² (346.47 - 433.09 USD/m²), and sub-region 5 has a land price of more than 10 million VND/m² (>433.09 USD/m²). The sub-regions with the highest land value lie along National Highway 46, which runs through the ward. This is also the main traffic route of Nghi Tan ward.
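Assigning a modeled price to one of the five sub-regions is a simple threshold lookup. The sketch below uses the class limits stated above (4, 6, 8, and 10 million VND/m²); treating a price exactly on a limit as belonging to the higher sub-region is an assumption, and the sample prices are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Class limits in million VND/m2, taken from the sub-region definitions
LIMITS = [4.0, 6.0, 8.0, 10.0]


def sub_region(price: float) -> int:
    """Return the sub-region (1-5) for a price in million VND/m2."""
    if price < 0:
        raise ValueError("price must be non-negative")
    return bisect_right(LIMITS, price) + 1


# Hypothetical modeled prices for a few land plots
prices = [3.5, 5.0, 7.9, 8.0, 12.0, 4.4]
zones = [sub_region(p) for p in prices]

print(zones)           # [1, 2, 3, 4, 5, 2]
print(Counter(zones))  # number of plots per sub-region
```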

Conclusions
The land price model in the study area consists of 10 independent variables, with the main influencing factors being the business environment, road width, and road quality. The study used 200 survey sample points in the Nghi Tan ward and obtained R2 = 0.89. With spatial data, the land price model may give more reliable results because the database can quantify several socio-economic and environmental factors. The results show that the modeled residential land prices in the ward correctly reflect the significant difference between the price of land plots in problematic street areas and the price of land plots in streets far from the center. The price of land plots on the same street in a location favorable for business and trade is higher than in less convenient locations. The price of land on a wide road is much higher than in other locations.
In short, mass valuation helps the state agencies in charge of land management set prices close to market prices, based on the factors affecting land prices in each specific area. The biggest challenge and impediment to the mass valuation approach lies in the complexity of the regression analysis technique. In addition, the data must be up-to-date and sufficiently large. Therefore, it is necessary to build a database that stores complete information about the price and characteristics of each traded property.