The Application of Random Noise Reduction by the Nearest Neighbor Method to the Forecasting of Economic Time Series

Abstract. Since the concept of deterministic chaos appeared in the literature, interest in the theory of nonlinear dynamical systems has grown considerably among researchers, which has led to the creation of new methods of time series prediction, e.g. the largest Lyapunov exponent method and the nearest neighbor method. Real time series are usually disturbed by random noise, which can complicate the problem of time series forecasting. Since the presence of noise in the data can significantly affect the quality of forecasts, the aim of the paper is to evaluate the accuracy of predictions for time series filtered with the nearest neighbor method. The evaluation is conducted on selected financial time series.


Introduction
The nearest neighbor method originated from the theory of nonlinear dynamical systems and was developed to predict the future values of a time series, but it can also be used to reduce random noise in a time series. A real time series (s_t) consists of a deterministic part (y_t) and a stochastic part (ε_t), which describes the level of random noise in the series. The reduction of random noise makes it possible to determine the properties of the series (y_t) based on the analysis of the observed series (s_t). The literature offers a number of methods for reducing the level of random noise in dynamical systems, and the main benefit of using them seems to be the improvement of time series forecasting capabilities.
In this article we verify the hypothesis that time series filtered by the nearest neighbor method give more accurate predictions than unfiltered time series. The aim of the paper is to assess the effect of random noise reduction by the nearest neighbors method on the accuracy of predictions obtained with the largest Lyapunov exponent method and the nearest neighbor method. The empirical research was based on actual economic data: financial time series of the logarithms of daily returns on the closing prices of selected stock exchange indices, equity prices, foreign exchange rates and commodity prices. The data cover the period from 3.01.2000 to 26.08.2013. To carry out the necessary calculations, the author wrote programs in the Delphi programming language and used an Excel spreadsheet.

The random noise reduction by the nearest neighbor method
The real time series can be described as a dynamical system (X, f) with the following equations¹:

x_{t+1} = f(x_t) + η_t,
s_{t+1} = h(x_{t+1}) + ξ_t,

where: f – the function describing the real dynamics of the system, h – the measuring function generating the observations s_t of the dynamical system, x_t, x_{t+1} – the states of the unknown original multidimensional system at the moments t and t+1 respectively, s_{t+1} – an observation of the time series at the moment t+1, η_t – dynamic noise inside the system, ξ_t – measurement noise.
In short, the real time series can be written in an additive form:

s_t = y_t + ε_t,

where: s_t – an observation of the time series at the moment t, y_t – the deterministic part of the time series, ε_t – the stochastic part of the time series (random noise consisting of observation noise, system noise or their combination).
The main causes of observation noise in a time series are measurement errors and rounding errors, while the causes of system noise are exogenous factors affecting the dynamics of the system that are impossible to identify².
The basis of the nearest neighbor method used for noise reduction is the reconstruction of the state space³. This reconstruction makes it possible to restore the state space of the dynamical system from a one-dimensional series of observations. The elements of the reconstructed state space are delay vectors, so-called d-histories, of the following form:

s_t^d = (s_t, s_{t−τ}, …, s_{t−(d−1)τ}),

where s_t is an observation of the time series at the moment t, d is the embedding dimension and τ is the delay time. The algorithm for determining the value y_n, 1 < n < N, of the time series (s_1, s_2, …, s_N) using the nearest neighbor method is as follows:
1. For the estimated embedding dimension d⁴ and the delay time τ = 1 we create the delay vector

s_n^d = (s_{n−⌊d/2⌋}, …, s_n, …, s_{n+⌈d/2⌉−1}),   (5)

so that the filtered observation s_n is one of the central coordinates of the vector s_n^d.
2. We determine the k nearest neighbors (in the Euclidean distance sense) of the vector s_n^d, denoted s_{t_1}^d, …, s_{t_k}^d.
3. Based on the designated nearest neighbors we estimate the value y_n as the arithmetic average of the first coordinates of the nearest neighbors:

y_n = (1/k) Σ_{i=1}^{k} s_{t_i}.

In the nearest neighbor method, the forecast of the (N+1)-th element of the series is obtained as a weighted average of the successors of the nearest neighbors of the last delay vector. The weights are chosen so that the closer neighbors have a greater impact on the obtained forecast. Accordingly, the weight of the i-th neighbor is estimated by one of several formulas⁶; the four variants used in this study are denoted NNM_A–NNM_D in the empirical part.
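The filtering and forecasting steps described above can be sketched in code. This is a minimal illustration, not the author's Delphi implementation: it assumes τ = 1, Euclidean distance, averaging over the coordinate aligned with the filtered observation, and inverse-distance weights as one possible weighting scheme (the paper's exact weight formulas are not restated here).

```python
import numpy as np

def nn_filter(s, d=3, k=5):
    """Noise reduction sketch: each observation is replaced by the mean of
    the corresponding coordinates of its k nearest delay vectors.
    Assumes delay time tau = 1; boundary observations are left unchanged."""
    s = np.asarray(s, dtype=float)
    m = len(s) - d + 1                      # number of delay vectors
    vecs = np.array([s[j:j + d] for j in range(m)])
    half = d // 2                           # position of the central coordinate
    y = s.copy()
    for j in range(m):
        dist = np.linalg.norm(vecs - vecs[j], axis=1)
        dist[j] = np.inf                    # exclude the vector itself
        nbrs = np.argsort(dist)[:k]         # k nearest neighbors (Euclidean)
        y[j + half] = vecs[nbrs, half].mean()
    return y

def nn_forecast(s, d=3, k=5):
    """One-step forecast: weighted mean of the successors of the k nearest
    neighbors of the last delay vector (inverse-distance weights assumed)."""
    s = np.asarray(s, dtype=float)
    last = s[-d:]
    # candidate vectors must have a successor, so the last vector is excluded
    vecs = np.array([s[j:j + d] for j in range(len(s) - d)])
    dist = np.linalg.norm(vecs - last, axis=1)
    nbrs = np.argsort(dist)[:k]
    w = 1.0 / (dist[nbrs] + 1e-12)          # one possible weighting scheme
    w /= w.sum()
    successors = s[nbrs + d]                # the value following each neighbor
    return float(np.dot(w, successors))
```

For a smooth deterministic series the forecast should lie close to the true continuation; on noisy data, filtering with `nn_filter` before forecasting is the procedure evaluated in this paper.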

The largest Lyapunov exponent method (LEM)
Lyapunov exponents are defined as the limits⁷:

λ_i = lim_{n→∞} (1/n) ln |μ_i(n, x_0)|,

where μ_i(n, x_0) are the eigenvalues of the Jacobi matrix of the mapping f^n, f^n is the n-fold composition of the function f, and f is the function that generates the dynamical system.
The Lyapunov exponents measure the rate of divergence or convergence of neighboring trajectories, i.e. the level of chaos in a dynamical system. The largest Lyapunov exponent makes it possible to specify the extent of a change (an increase or a decrease) in the distance between the current state x_N of the system and its nearest neighbor x_i during the evolution of the system, and also to estimate the distance between the vectors x_{N+1} and x_{i+1}. Based on this distance the value of the forecast can be determined. The growth of the distance is described by the relation

Δ_n = Δ_0 e^{n λ_max},   (15)

where Δ_0 is the initial distance between two initially close (in the Euclidean distance sense) points of the reconstructed state space, Δ_n is the distance between these points after n iterations, and λ_max is the largest Lyapunov exponent.
For n = 1 the predicted value s_{N+1} can be determined from the relation (15) as the solution to the following equation:

‖x_{N+1} − x_{i+1}‖ = ‖x_N − x_i‖ e^{λ_max},

in which s_{N+1} is the unknown. The equation admits two solutions, which yield an overestimated (LEM+) and an underestimated (LEM−) forecast.
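The two-solution structure can be illustrated with a deliberately simplified one-dimensional sketch, assuming the distance after one iteration reduces to the absolute difference between the forecast and the successor of the nearest neighbor (the paper works with full delay vectors; the numbers below are purely hypothetical):

```python
import math

def lem_forecasts(s_next_neighbor, delta0, lam_max):
    """After one iteration the distance between the trajectory and its
    nearest neighbor grows from delta0 to delta0 * exp(lam_max).
    Solving |s_hat - s_next_neighbor| = delta0 * exp(lam_max) for the
    forecast s_hat gives an overestimated (LEM+) and an underestimated
    (LEM-) candidate."""
    spread = delta0 * math.exp(lam_max)
    return s_next_neighbor + spread, s_next_neighbor - spread

# hypothetical illustrative values, not taken from the paper's data
plus, minus = lem_forecasts(s_next_neighbor=1.25, delta0=0.04, lam_max=0.05)
```

The empirical part of the paper reports both candidates separately, as the LEM+ and LEM− variants.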

The empirical research
The author investigated the logarithms of daily returns on selected financial instruments, including the equity prices of Vistula (VST) and Wawel (WWL) and the prices of the following commodities: crude oil (SC), silver (XAG) and gold (XAU). The returns were computed with the following formula:

r_t = ln(s_t / s_{t−1}),

where s_t is an observation of the original price series. The data cover the period 3.01.2000–26.08.2013 and come from the archive files of the website stooq.com.
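The transformation of closing prices into log-returns is straightforward; a minimal sketch (the sample prices are illustrative, not data from the study):

```python
import numpy as np

def log_returns(prices):
    """Logarithms of daily returns: r_t = ln(s_t / s_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

# illustrative closing prices
r = log_returns([100.0, 102.0, 101.0])
```

Working with log-returns rather than raw prices removes the trend in the level of the series, which is the standard preprocessing step before state space reconstruction.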
In the first stage of the study, we estimated the parameters of the state space reconstruction for the selected time series using the method of delays: the time delay τ was estimated by means of the autocorrelation function (ACF) and the embedding dimension d with the false nearest neighbor method (FNN) (Table 1). Then the analyzed time series underwent random noise reduction by the nearest neighbors method for the estimated embedding dimension d and the delay time τ = 1. The filtered time series were designated with the symbol NameofTimeSeries_red, and for those series we also carried out the reconstruction of the state space. Table 1 gives the reconstruction parameters d and τ for the analyzed time series before and after filtration. Source: own work.
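A common heuristic for the ACF-based estimate of τ is the first lag at which the autocorrelation drops below a threshold such as 1/e; the paper states that the ACF was used but does not restate the exact criterion, so the threshold below is an assumption:

```python
import numpy as np

def delay_from_acf(s, threshold=1 / np.e):
    """Estimate the delay time tau as the first lag at which the sample
    autocorrelation function falls below a threshold (common heuristic;
    the paper's exact criterion is not restated here)."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    var = np.dot(s, s)
    for lag in range(1, len(s) // 2):
        acf = np.dot(s[:-lag], s[lag:]) / var
        if acf < threshold:
            return lag
    return 1  # fall back to tau = 1 if the ACF never crosses the threshold
```

Note that in this study the reconstruction used the estimated τ from Table 1, while the noise-reduction step itself was performed with τ = 1.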
In the next stage of the research we estimated forecasts for the selected time series using the nearest neighbor method (NNM) and the method based on the value of the largest Lyapunov exponent (LEM). In order to determine the forecasts with the NNM method, we considered four ways of estimating the weights of the nearest neighbors, denoted NNM_A, NNM_B, NNM_C and NNM_D. The assessment of the designated forecasts was made with the following measures: d – the average forecast error ME, q – the average absolute forecast error MAE, σ – the root mean square error RMSE, and I – the Theil coefficient. Tables 2 and 3 show the prediction errors over the entire verification range for the forecast horizon equal to 10, obtained by the NNM method in the four above cases of estimating the weights of the nearest neighbors. Table 4 shows the prediction errors over the entire verification range for the forecast horizon equal to 10, obtained by the LEM method for overestimated (LEM+) and underestimated (LEM−) forecasts.
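The four accuracy measures can be sketched as follows; the Theil coefficient has several definitions in the literature, and since the paper does not restate its formula, the variant below (errors normalized by the actual values) is an assumption:

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Ex-post accuracy measures: ME (d), MAE (q), RMSE (sigma) and one
    common variant of the Theil coefficient I."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    e = a - p
    return {
        "ME": e.mean(),                               # average forecast error d
        "MAE": np.abs(e).mean(),                      # average absolute error q
        "RMSE": np.sqrt((e ** 2).mean()),             # root mean square error
        "Theil": np.sqrt((e ** 2).sum() / (a ** 2).sum()),  # assumed variant of I
    }
```

ME close to zero indicates unbiased forecasts, while MAE, RMSE and the Theil coefficient measure the overall size of the ex-post errors compared in Tables 2–4.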
Analyzing the results obtained using the methods NNM_A, NNM_B, NNM_C and NNM_D (Tables 2 and 3), it can be seen that forecasts using a weighted average of the first coordinates of the nearest neighbors (NNM_B, NNM_C and NNM_D) proved to be more accurate than forecasts based on the arithmetic mean of the first coordinates (method NNM_A) for most of the analyzed time series.
Based on the data in Table 4 it can be concluded, as in the case of the forecasts obtained by the nearest neighbors method, that the reduction of random noise improved the accuracy of the obtained predictions: the ex-post errors for the time series filtered by the nearest neighbor method are lower than those obtained for the unfiltered series. The exceptions are the time series JPY, SC, SPX and XAG for the method LEM+ and JPY, NKX and SPX for the method LEM−, for which the reduction increased the mean absolute forecast error q and the root mean square error σ. Comparing the results obtained with the largest Lyapunov exponent method (Table 4), it can also be seen that over the entire verification range the overestimated forecasts (LEM+) proved to be more accurate for most of the analyzed series.

Conclusions
In the paper we studied the effect of random noise reduction by the nearest neighbors method on the accuracy of the forecasts of selected financial time series. The research results show that for the majority of the analyzed time series the ex-post forecast errors obtained for the series after noise reduction are much lower than those obtained for the unfiltered series.
In addition, based on the selected financial time series, we compared two methods of forecasting: the nearest neighbor method (in four versions) and the largest Lyapunov exponent method (in two versions). The research showed that the three versions (B, C, D) of the nearest neighbor method that use a weighted average to estimate the forecasts were the most effective: for the majority of the analyzed financial time series these forecasts were characterized by the smallest values of the forecast errors.
It should be noted that the values of the forecasts determined by these methods depend to a large extent on the adopted metric, the weights of the nearest neighbors, the values of the parameters of the reconstructed state space and the number of nearest neighbors. Thus, it seems that in order to improve the quality of the forecasts, additional studies with modified parameters should be performed.