Price Analysis and Forecasting for Bitcoin Using Auto Regressive Integrated Moving Average Model

This paper investigated the Bitcoin daily closing price using a time series approach to predict future values for financial managers and investors. Daily data were sourced from CoinDesk, with the Bitcoin Price Index (BPI) extracted for a period of over 5 years (January 1, 2016 to May 31, 2021). Data analysis and modelling of the price trend using the Autoregressive Integrated Moving Average (ARIMA) model were carried out, and a suitable model for forecasting was proposed. Results showed that the ARIMA(6,1,12) model was the most suitable based on a combination of the number of significant coefficients and the values of volatility, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). A two-month test window was used for forecasting and prediction. Results showed a decline in prediction accuracy as the number of days in the test period increased: from 99.94% for the first 7 days, to 99.59% for 14 days and 95.84% for 30 days. For the two-month test period, percentage accuracy was 84.75%. The study confirms that the ARIMA model is a veritable planning tool for financial managers, investors and other stakeholders, especially for short-term forecasting. It is however imperative that the influence of external factors that may affect forecasting, such as investors’/influencers’ comments and government intervention, be taken into consideration.


Introduction
Rapid advancement in digital technology and the increasing capacity of computer systems (in terms of speed and data storage) have created an opportunity for Digital Signal Processing (DSP) techniques in engineering to be applied to the finance industry. One application of DSP in finance is the prediction of the future market value of a business using historical financial data, whose quantity is usually massive and which requires absolute objectivity in its calculations (Nepal, 2015). Thus, financial managers can make decisions based on statistical analysis of financial time series and the modeling of their behavior, the aim being to perform predictions and systematically optimize investment strategies, which have become fundamental to successful investing (Feng and Palomar, 2016).
Bitcoin (BTC) is the world's largest cryptocurrency, as measured by market capitalization and the amount of data stored on its blockchain (Shen et al, 2018), and its emergence in just over a decade as a veritable digital currency that has captured global attention has been unexpected. It is a form of peer-to-peer electronic cash system in which transactions happen without the need to reveal one's identity and without a middle man (Nakamoto, 2008). Despite its modest beginning in 2009, when it was launched at $1.00 value, it has grown to tens of thousands of dollars in value. It also offers lower transaction fees than traditional online payment mechanisms.
As with all businesses and trades, the COVID-19 pandemic has had an impact on the trading of Bitcoin and its price. On March 11, 2020, the World Health Organization (WHO) declared COVID-19, a disease caused by a strain of Coronavirus, a global pandemic (Ghebreyesus, 2020). Data from CoinDesk (Coindesk, 2021), an organization involved in the monitoring and publishing of Bitcoin data, were used to observe Bitcoin price behaviour before and during the pandemic. The Bitcoin price index from January 2016 to May 2021 is shown in Figure 1.
It was observed that the increase in the price of Bitcoin was gradual from inception to the beginning of 2017, when its price was about $1,000.00. From then, it witnessed steady increases until December 2017, when it increased sharply and peaked at a unit price of $19,116.979 on December 17, 2017. Thereafter, the price declined to a minimum of $3,952.448 on November 30, 2018, but reversed the downward movement and increased steadily to $5,800.209 on March 13, 2020 (two days after declaration of the pandemic). Despite the pandemic, the price of Bitcoin rose steeply and peaked at $57,128.643 on February 22, 2021. This is almost a tenfold rise (an increase of about 885%) within a year. On the one-year anniversary of the COVID-19 declaration, March 11, 2021, the closing price of Bitcoin was $56,915.170. The increased interest in Bitcoin and the subsequent price surge can be attributed to investors using it as a hedge, that is, protection against financial loss (Demir et al, 2020), due to uncertainties raised by the pandemic and the subsequent national restrictions and lockdowns, which led to the suppression of major world economies and a global recession. Other factors (CNBC, 2021; Tepper, 2021) that coincidentally contributed to the rise of the Bitcoin price during the period include:
i. Institutional Adoption of Cryptocurrencies: Increasing adoption of cryptocurrencies by some traditional financial institutions (e.g., BNY Mellon, Fidelity, Mastercard), which was seen as an acknowledgement of the future viability of digital assets.
ii. Halving of Bitcoin: The 'halving' (Masters, 2019) of Bitcoin in May 2020, an event that happens every four years in which the reward that bitcoin "miners" receive for mining is cut in half, as a built-in mechanism to slow the creation of new bitcoins and limit bitcoin's supply. The event reminds investors of bitcoin's scarcity, thus leading to increased demand.
iii. Adjustment of View: Revision of criticism and softening of the views of major Wall Street investors/players about cryptocurrencies.
iv. Acceptance by Major Payment Platforms: Acceptance of cryptocurrencies by major payment platforms (PayPal and Square), including an announcement that buying, holding and trading of bitcoin and other cryptocurrencies would soon be allowed on the platform, which contributed to the surge.
v. Pandemic-related Stimulus Programs: Stimulus programs by governments around the world created fear of inflation, with investors looking for alternative assets to invest in, thereby leading to high demand for Bitcoin. It is believed that government monetary aid strengthens the appeal of Bitcoin.
However, by the middle of May 2021, there was a dramatic drop in the Bitcoin price, and this continues to date. The rapid growth in the Bitcoin price and its volatility continue to pique the interest of researchers (Demir et al, 2020; Amjad and Shah, 2017; Roche and McNally, 2018; Jang and Lee, 2018; Baur and Dimpfl, 2020; Fauzi et al, 2020).
Various methods have been developed and applied in time series analysis. These include the ARIMA model (Box and Jenkins, 1976; Brockwell and Davis, 2002), which expresses the current value of a stationary time series in terms of its values and errors at previous time periods; the Artificial Neural Network (ANN) model, which has the ability to learn patterns from time series data and uses these to model the problem and deduce solutions (Zamani et al, 2012; Selvamuthu et al, 2019); and hybrid models, which combine the strengths of the ARIMA and ANN models (Merh et al, 2010; Wang et al, 2012). While models based on neural networks have been found to present higher accuracy in some cases, the ARIMA model was selected for its robustness, simplicity, ease of application and high accuracy for short-term forecasting.
The Auto Regressive Integrated Moving Average (ARIMA) model, also known as the Box-Jenkins methodology (Box and Jenkins, 1976; Brockwell and Davis, 2002) in financial analysis, was used in analyzing and forecasting the Bitcoin time series data. The ARIMA model is a combination of the autoregressive (AR) model and the moving average (MA) model, with the differencing (integration) needed to make the time series stationary taken into account. Stationarity (Feng and Palomar, 2016) is an important characteristic in time series analysis: it describes the time-invariant behavior of a time series, and a stationary series is much easier to model, estimate and analyze. Stationarity of a time series is a major assumption in ARIMA modeling and, since market prices are by nature non-stationary, stationarity must be ensured by differencing the time series (Brockwell and Davis, 2002) before forecasting can be done. The ARIMA model is simple but nonetheless powerful, and it aims to describe autocorrelations in time series data (Brockwell and Davis, 2002; Ariyo et al, 2014). Essentially, the future value of a variable is based on a linear combination of past observations (lags) and past errors. Lags are very useful in time series analysis because they indicate the tendency of values to be correlated with earlier values of the same series. The ARIMA model can be represented as an ARIMA(p, d, q) model, as in Equations (1) and (2).
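For orientation, a common textbook statement of the model is sketched below. The symbols (c for the constant, \phi_i for the AR coefficients, \theta_j for the MA coefficients, \varepsilon_t for the error term and B for the backshift operator) are assumptions about notation and are not necessarily those of the paper's own Equations (1) and (2).

```latex
% ARMA(p, q) for a stationary series y_t (assumed notation)
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% ARIMA(p, d, q): the same structure applied to the d-times differenced series,
% written with the backshift operator B, where B y_t = y_{t-1}
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr) (1 - B)^{d} \, y_t
  = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr) \varepsilon_t
```

With d = 1, as selected later in the paper, the factor (1 - B) y_t is simply the day-to-day price change y_t - y_{t-1}.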
In this paper, the Bitcoin daily closing price time series spanning January 2016 to May 2021 (represented graphically in Figure 1) was analyzed using MATLAB (R2018a), and forecasts were made. This is of particular importance due to the popularity of Bitcoin and the volatility of its price. Forecast values can be useful to investors in developing profitable trading strategies. For government regulators and policy makers, they help in formulating appropriate policies. Overall, they assist relevant stakeholders in making informed decisions.

Experimental
In this section, the methodology used for this work is described. This includes steps such as data collection and data analysis.

Data Collection
Bitcoin daily closing price time series data from January 2016 to May 2021 (as represented in Figure 1) were obtained from CoinDesk (Coindesk, 2021). The Bitcoin data comprise four variables: Closing Price, 24h Open, 24h High and 24h Low, all in USD. The daily closing price (USD) was chosen to represent the price of the index to be predicted, since it reflects all the activities of the index on a trading day.

Data Analysis
To determine a suitable model, the following steps, described in subsequent paragraphs, were carried out on the Bitcoin price time series:
1. Series inspection for determination of stationarity
2. Differencing to ensure stationarity
3. Modeling through the 4-step process of i) Model Identification ii) Parameter Estimation iii) Diagnostics iv) Forecasting
Annals of Science and Technology 2021 Vol. 6(2) 47-56. This journal is © The Nigerian Young Academy 2021
Inspection of the time series must confirm whether or not it is stationary. This is done by visual inspection and by plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the series. Autocorrelation measures the relationship between a variable's current value and its past values, that is, between values of the same series at different times; its plot by lag is called the autocorrelation function, ACF. Partial autocorrelation summarizes the relationship between an observation and observations at prior time steps with the relationships of intervening observations removed; its plot by lag is called the partial autocorrelation function, PACF (Brockwell and Davis, 2002).
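The sample ACF described above can be sketched in a few lines. This is an illustrative pure-Python version, not the MATLAB routine used in the study; the function name `sample_acf` and the toy trending series are assumptions made for the example.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    c0 = sum(d * d for d in deviations)  # lag-0 sum of squares
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k steps apart
        ck = sum(deviations[t] * deviations[t - k] for t in range(k, n))
        acf.append(ck / c0)
    return acf


# Toy trending (non-stationary) series: 1, 2, ..., 100 (illustrative data only)
trend = [float(t) for t in range(1, 101)]
acf_values = sample_acf(trend, 5)
```

For a trending series such as this one, the autocorrelations stay close to 1 and fall off only slowly with lag, which is the pattern the paper reads as a sign of non-stationarity.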
Stationarity is further checked with the Augmented Dickey-Fuller test, which is based on a null hypothesis that there is a unit root in the data (Brockwell and Davis, 2002). In general, a probability value (p-value) of less than 5% leads to rejection of the null hypothesis and indicates stationarity, while a p-value greater than 5% means the null hypothesis cannot be rejected, indicating non-stationarity. Non-stationary data can, as a rule, be unpredictable and are therefore not suitable for direct modelling or forecasting. Such a series must be converted through differencing, in which each observation is replaced by its change from the previous observation; the order of differencing is the number of times the raw observations are differenced. If a time series is made stationary, any model inferred from it can be taken to be stationary, therefore providing a valid basis for forecasting (Al-Shiab, 2006).
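Differencing itself is a simple operation, sketched below in pure Python. The function name `difference` and the five price values are hypothetical and serve only to illustrate the step; the study performed it in MATLAB.

```python
def difference(series, order=1):
    """Difference a series `order` times; each pass shortens it by one value."""
    result = list(series)
    for _ in range(order):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


# Hypothetical closing prices, for illustration only
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
first_diff = difference(prices, order=1)   # day-to-day changes
second_diff = difference(prices, order=2)  # changes of the changes
```

The order passed here corresponds to the middle term of an ARIMA specification, so a model with first differencing works on `first_diff` and not on the raw prices.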
Model identification involves using the ACF and the PACF (as explained above) of the differenced time series to plot correlograms, from which the coefficients (p, q) that give the best fit are determined. The number of times the time series was differenced to ensure stationarity, d (Brockwell and Davis, 2002), is also taken into consideration. Hence the coefficients (p, d, q) are determined.
Parameter estimation involves determining, for each model being considered, the number of significant coefficients, the volatility (variance) value, the Akaike Information Criterion (AIC) value, the Bayesian Information Criterion (BIC) value and the Ljung-Box test value. The AIC is an estimator of prediction error: it evaluates how well a model fits the data it was generated from and the relative amount of information lost; the less the loss, the higher the quality of the model. The Bayesian Information Criterion is another criterion for model selection among a finite set of models, and it penalizes additional parameters more heavily than the AIC. The model with the lowest values of AIC, BIC and volatility is considered the most suitable (Anderson, 2008). The Ljung-Box test is a portmanteau test that checks whether the autocorrelations of the residuals are jointly zero.
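The two criteria have standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of observations and L the maximized likelihood. The sketch below applies them to two hypothetical fits; the model labels, log-likelihoods and parameter counts are invented for illustration and are not values from Table 3.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return num_params * math.log(num_obs) - 2 * log_likelihood


# Hypothetical fits: (label, maximized log-likelihood, number of parameters)
candidates = [("model_A", -1520.0, 5), ("model_B", -1515.0, 9)]
n_obs = 1000  # hypothetical sample size

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, ll, k in candidates}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

In this made-up case the larger model wins on AIC while the smaller one wins on BIC, which illustrates why the paper weighs several criteria together instead of relying on one.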
Model diagnostics involves running the residual ACF to ensure that all the information in the time series is captured by the selected model. This is indicated by all residual autocorrelation coefficients being within the significance bounds. If this is not the case, parameters must be re-estimated. However, in re-estimating, parsimony must be taken into consideration, because parsimonious models give better forecasts than over-parameterized models. Thus, in choosing the most suitable ARIMA model, it is important to keep parsimony in view.
Once the model had been confirmed as suitable with the best coefficients, forecasting of future prices of Bitcoin from April 2021 to May 2021 was done using MATLAB Econometrics Tool and was validated by plotting forecasted values against the actual series for comparison. Prediction accuracy (based on MAPE) was also plotted.
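The residual check can be sketched as follows, assuming the usual approximate 95% bounds of ±1.96/√n for the residual ACF and the standard Ljung-Box statistic Q = n(n+2) Σ r_k²/(n-k). The function names and the alternating toy residuals are illustrative assumptions; the study ran its diagnostics in MATLAB.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def significant_lags(residuals, max_lag):
    """Lags whose residual autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    r = sample_acf(residuals, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (no p-value computed here)."""
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# Toy residuals with an obvious leftover pattern (alternating sign)
residuals = [1.0, -1.0] * 50
flagged = significant_lags(residuals, 5)
q_stat = ljung_box_q(residuals, 5)
```

A well-specified model would ideally leave `flagged` empty and Q small; the deliberately patterned residuals here are flagged at every lag, which is the situation that calls for re-estimation.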

Results
By visual inspection (Figure 1), the Bitcoin closing price time series is not stationary. Non-stationarity is further confirmed by the sharp drop-off of the Partial Autocorrelation Function (PACF) plot at lag 1 (Figure 2a) and the very slow decline of the Autocorrelation Function (ACF) plot (Figure 2b).
The Augmented Dickey-Fuller (ADF) test (Table 1) was applied to the Bitcoin daily closing price time series. The ADF test did not reject the null hypothesis, with a p-value of 0.7756, which is greater than the significance level of 0.05, thus indicating non-stationarity. Therefore, it was necessary to difference the series to obtain stationarity (Brockwell and Davis, 2012). After first differencing, the ADF test (Table 2) rejected the null hypothesis with a p-value of 1.0000e-03, which is less than the significance level of 0.05. This indicates stationarity; the series therefore became stationary with the first difference.
With stationarity confirmed, the process for ARIMA modelling of the Bitcoin daily closing price time series was carried out. The following likely models were identified and investigated: ARIMA(2,1,2), ARIMA(2,1,3), ARIMA(2,1,6), ARIMA(3,1,2), ARIMA(3,1,3), ARIMA(3,1,6), ARIMA(6,1,2), ARIMA(6,1,3) and ARIMA(6,1,6). Each model had its parameter values and goodness of fit determined using the combination of the number of significant coefficients, volatility, Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values (see Table 3). As a starting point, ARIMA(6,1,6) was conditionally selected based on the highest number of significant coefficients and the lowest values of volatility and AIC; this selection had to be confirmed by running residual diagnostics to ensure that all residual autocorrelation coefficients were within the significance interval.
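The nine candidate models listed above coincide with every pairing of AR and MA orders drawn from {2, 3, 6} at first differencing. The sketch below enumerates that grid; treating the candidates as a full grid is an assumption made for illustration, since the paper does not state how the list was assembled.

```python
from itertools import product

ar_orders = [2, 3, 6]   # candidate AR orders p
ma_orders = [2, 3, 6]   # candidate MA orders q
diff_order = 1          # first differencing, as found by the ADF test

# Every (p, d, q) combination, in the same order as the list in the text
candidates = [(p, diff_order, q) for p, q in product(ar_orders, ma_orders)]
labels = [f"ARIMA({p},{d},{q})" for p, d, q in candidates]
```

Each entry in `candidates` would then be fitted and scored on significant coefficients, volatility, AIC and BIC, as summarized in Table 3.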
Running the residual ACF (Figure 5a) on ARIMA(6,1,6) showed outliers at lags 10, 12 and 14, which indicates that not all information in the time series had been captured by the model; model re-estimation was therefore needed. Re-estimation involved taking these outliers into consideration and re-running residual diagnostics. The ARIMA(6,1,12) model was found to perform better, and its residual diagnostics showed all coefficients located within the confidence interval (Figure 5b). In addition, it has the lowest values of volatility and AIC (Table 3). Thus, of all the models considered, ARIMA(6,1,12) is the most appropriate model for this time series.
The Ljung-Box test for residual correlation and the squared residual ACF (Figure 6) show all coefficients within the 95% confidence interval, signifying that there is no significant correlation among the residuals and thus that ARIMA(6,1,12) is a good model for forecasting.
Table 4 shows actual versus forecast values for the Bitcoin daily closing price in April 2021, from which prediction accuracy (based on MAPE) was derived. It can be observed that ARIMA(6,1,12) gives very close forecast values for the first seven days (April 1-7, 2021), with a prediction accuracy of 99.94%. Prediction accuracy however decreases for longer forecast periods, dropping to 99.59% for the 14-day forecast (April 1-14) and 95.84% for the 30-day forecast (April 1-30). These are all considered good results, being above 95% accuracy. In other words, shorter prediction periods gave closer predictions and higher accuracy values. This was the case until after April 18, when a significant dip was experienced, followed by a continuous decline.
In addition, the forecast for a two-month (April-May 2021) window (Figure 7a) and the corresponding prediction accuracy (Figure 7b) were obtained. It was observed that as the number of forecast days increased, prediction accuracy decreased, reaching 84.75% at the end of the period. This confirmed that ARIMA modelling is better suited to short-term predictions and less so to longer-term ones.
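The relationship between MAPE and the reported accuracy percentages can be sketched as below, under the assumption that prediction accuracy is taken as 100% minus MAPE; the paper does not spell out its formula, and the three price pairs are invented for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def prediction_accuracy(actual, forecast):
    """Accuracy in percent, assumed here to be 100 - MAPE."""
    return 100.0 - mape(actual, forecast)


# Hypothetical actual and forecast prices, for illustration only
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

error_pct = mape(actual, forecast)
accuracy_pct = prediction_accuracy(actual, forecast)
```

Evaluating the same two functions over the first 7, 14, 30 and 61 forecast days would give a horizon-by-horizon accuracy series comparable in form to the one reported above.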
Bitcoin daily closing price time series with forecast values for April-May 2021 and April-June 2021 are shown in Figures 8a and 8b, respectively. The model predicted an upward movement of the Bitcoin price for the selected time periods. This is in agreement with the predictions of some market analysts (McGlone, 2021; Bambysheva et al, 2021; White, 2021). A series of events in May 2021, however, led to an unexpected decline in the fortunes of Bitcoin. Specifically, Bitcoin plummeted to nearly $30k after reaching a record high of more than $64k in April 2021. This can be ascribed to external factors, which have been broadly categorized as follows:

Influencers' Comments
Comments of influential persons/investors can directly impact prices. For example, in a tweet on May 12, 2021, Elon Musk said Tesla would no longer accept Bitcoin as a payment method due to concerns over its energy usage, leading to a loss of billions of dollars in the value of the crypto market. Another tweet on June 4, 2021, suggesting a 'breakup' with Bitcoin, led to a 4.3% decline in price (Browne, 2021).

Government Intervention
Government action can also move prices. For instance, on May 18, 2021, the Chinese Government banned domestic banks and financial institutions from supporting Bitcoin mining and transactions due to energy and money laundering concerns (BBC, 2021; CBS, 2021).

Other influences
Bitcoin price fluctuations occurred for various other reasons, including but not limited to media coverage, actions of speculators and availability of Bitcoin.
While most of these factors would have been reflected in the historical data, not all influences can be captured, and unforeseen events can lead to variances between forecasted and actual values of Bitcoin. This highlights the importance of including external factors in forecast models.

Conclusion
Rapid advancement in digital technology has created an opportunity for DSP techniques in engineering to be applied to the finance industry, for example in price forecasting of financial products using financial time series. Various methods have been developed and applied in time series analysis, including ARIMA, ANN and hybrid models that combine the strengths of the ARIMA and ANN models. While models based on neural networks have been found to present higher accuracy in some cases, the ARIMA model was selected for its robustness, simplicity, ease of application and high accuracy for short-term forecasting.
In this paper, we have forecast the Bitcoin daily closing price using the ARIMA model in order to assist investors in their investment decisions. Price forecasting of Bitcoin constantly attracts attention due to its direct monetary advantage. MATLAB was used for model identification, parameter estimation, diagnostics and forecasting, and the ARIMA(6,1,12) model was selected as the most suitable based on the number of significant coefficients, the values of volatility, AIC and BIC, and having all coefficients within the significance interval for residual diagnostics. Prediction accuracy, based on the mean absolute percentage error (MAPE), was obtained for a two-month (April-May 2021) test window. The ARIMA(6,1,12) model gave very close forecast values for the first seven days of forecast (April 1-7, 2021), with a prediction accuracy of 99.94%. This however decreased for longer forecast periods, dropping to 99.59% for the 14-day forecast period (April 1-14) and 95.84% for the 30-day forecast period (April 1-30). Despite the reduction, these are considered good results, being above 95% accuracy. This reinforces the ease of application of ARIMA models, as against more complex models such as artificial neural network models, and their suitability for short-term forecasting only. The study confirms that the effect of the global pandemic on the Bitcoin price was positive, with a surge in its value which can be attributed to investors using it as a hedge against uncertainties raised by the pandemic and the subsequent national restrictions and lockdowns, which led to the suppression of major world economies and a global recession. The time series for Bitcoin prices with forecasted values showed an upward trend in the daily closing price, but this is contrary to the actual market value. This variance can be attributed to the effect of external factors on Bitcoin prices, such as tweets/comments of influential persons (e.g. Elon Musk), government intervention (e.g. China's ban on institutional support for Bitcoin mining and transactions) and other factors (e.g. media coverage and activities of speculators), which all combined to weaken prediction.
In conclusion, even though the ARIMA model has been shown to be capable of generating efficient short-term forecasts, the external factors and influences stated above must also be taken into consideration for a more robust forecast.