The Influence of the Accuracy of Statistical Data on the Results of a Classification of EU Countries in Terms of Innovation

Abstract
Research background: The article attempts to include the accuracy of statistical data in a synthetic evaluation and classification of EU countries in terms of innovation.
Purpose: The aim of the article is to evaluate the influence of the accuracy of statistical data on the classification of EU countries in terms of innovation.
Research methodology: The research employed diagnostic variables describing the innovation of EU countries and the methodology proposed by the European Commission in the European Innovation Scoreboard 2019. The influence of the measurement uncertainty of the diagnostic variables on the Summary Innovation Index of EU countries was evaluated. For this purpose, a procedure employing the Monte Carlo method was proposed.
Results: Taking into account the measurement uncertainty of the variables in the evaluation of the innovation of EU countries resulted in one of the countries qualifying for a different innovation group.
Novelty: The article draws attention to an important but often neglected problem: the accuracy of statistical data used in research, and its influence on the calculated value of a synthetic measure (illustrated with the innovation of EU countries).


Introduction
Nowadays, innovation is one of the most important factors of economic development in the world. The ability of national economies and economic entities to create, implement and adopt innovations affects their competitiveness. That is why the European Commission pays a lot of attention to innovation policy, treating it as a tool for strengthening the EU economies. Measuring and evaluating the innovation of EU countries is difficult and complex. The main difficulty lies in obtaining credible, comparable and accurate innovation indicators for individual countries.
The aim of the article is to evaluate the influence of the accuracy of statistical data on the classification of EU countries in terms of innovation. The evaluation of innovation of EU countries was based on the Summary Innovation Index (SII) which was calculated with 27 indicators presented in the European Innovation Scoreboard 2019.
The evaluation of the influence of the uncertainty of measurement of diagnostic variables on SII values for EU countries and their classification was made using the Monte Carlo method.

Innovations - review of definitions
The word 'innovation' is derived from the Latin word 'innovare', which means 'into new' (Costello, Prohaska, 2013). Many definitions of innovation can be found in the literature; they pertain to a product, entrepreneurship, an organization, the whole economy, etc.
Innovation is production or adoption, assimilation, and exploitation of a value-added novelty in economic and social spheres; renewal and enlargement of products, services, and markets; development of new methods of production; and the establishment of new management systems. It is both a process and an outcome (Edison, Bin Ali, Torkar, 2013).
According to G.D. Sardana (2016), innovation is more than just gaining knowledge: it is continuous learning, and the knowledge also has to be translated into actions.
The most popular definition of innovation is presented in the Oslo Manual.
The Oslo Manual distinguishes between innovation as an outcome (an innovation) and the activities by which innovations come about (innovation activities). This edition defines an innovation as "a new or improved product or process (or combination thereof) that differs significantly from the unit's previous products or processes and that has been made available to potential users (product) or brought into use by the unit (process)" (Manual, 2018).
The innovation of an economy, in turn, is the ability and willingness of economic entities to constantly search for, and use in practice, the results of scientific research, research and development works, new concepts, ideas and inventions; the ability to improve and develop the technologies of material and non-material (services) production; to implement new methods and techniques in organization and management; and to improve and develop infrastructure and the body of knowledge (Frankowski, Skubiak, 2012).

Indicators and method of research of the innovation of countries
For 18 years, the European Commission has been conducting research on the innovation of EU countries and presenting the results in the form of the European Innovation Scoreboard (EIS).
The EIS distinguishes between four main types of indicators (Framework conditions, Investments, Innovation activities and Impacts) and ten innovation dimensions, capturing in total 27 indicators (Table 1).
Among the steps of the EIS methodology for calculating the index are:
5. Transforming data that have highly skewed distributions across countries.
6. Calculating re-scaled scores. Re-scaled scores (after correcting for outliers and a possible transformation of the data) for all years are calculated by first subtracting the Minimum score and then dividing by the difference between the Maximum and Minimum score. The maximum re-scaled score is thus equal to 1, and the minimum re-scaled score is equal to 0. For positive and negative outliers, the re-scaled score is equal to 1 or 0, respectively.
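The re-scaling in step 6 can be illustrated with a minimal Python sketch. The function name `rescale_min_max` and the sample values are hypothetical and introduced only for illustration; the Minimum and Maximum scores are assumed to have been fixed beforehand (after the outlier correction), so values outside that range are clipped to 0 or 1.

```python
def rescale_min_max(values, minimum, maximum):
    """Re-scale indicator values to the range [0, 1].

    `minimum` and `maximum` are the reference Minimum and Maximum scores
    (assumed here to be determined after the outlier correction).
    """
    if maximum <= minimum:
        raise ValueError("Maximum score must be greater than Minimum score")
    span = maximum - minimum
    # Subtract the Minimum, divide by (Maximum - Minimum), then clip so that
    # negative outliers get 0 and positive outliers get 1.
    return [min(1.0, max(0.0, (v - minimum) / span)) for v in values]


# Hypothetical indicator values for four countries; 0.5 and 3.5 lie outside
# the reference range [1.0, 3.0] and are treated as outliers.
rescaled = rescale_min_max([0.5, 1.0, 2.0, 3.5], minimum=1.0, maximum=3.0)
print(rescaled)
```

Clipping is one simple way to reproduce the rule that outliers receive a score of exactly 0 or 1; the EIS itself handles outliers in a separate earlier step.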

Quality of statistical data - theoretical basis
There is no doubt that the statistical data used in the research should be of high quality.
In the literature, three characteristics of statistical data quality are mentioned: usefulness of data for users' needs, validity, and accuracy. The most important is the accuracy of data, expressed by the closeness of statistical information to the true value, i.e. the value that would be obtained if the data for all units of the researched group were collected and processed without errors (Kordos, 1987; Kordos, 1988).
At present, the influence of errors resulting from the inaccuracy of statistical data is overlooked in the related literature, and data taken from the Statistical Yearbook or appropriate databases are treated by users as accurate. To limit data errors, statistical offices perform corrective calculations, but such data still cannot be deemed accurate. This follows from the way statistical information is obtained. The sources of errors in statistical data are random errors (connected mainly with sample selection) and non-random errors (e.g. related to data processing, or discretization errors, i.e. truncating significant digits after the decimal point).
In engineering sciences, the issue of measurements and their accuracy has been precisely specified in the relevant documents (JCGM/WG 1, 2008; International Vocabulary…).
According to the definition, measurement error is the arithmetical difference between a measured value and a true value. The error can be written as an absolute error (a) or a relative error (b):

$$\Delta_X = X_m - X_t \quad \text{(a)} \qquad\qquad \delta_X = \frac{\Delta_X}{X_t} \quad \text{(b)}$$

where: $\Delta_X$ - absolute measurement error, $\delta_X$ - relative measurement error, $X_m$ - measured value, $X_t$ - true value.

The true values are known in the case of a model study or a simulation test, where the variable's value is set and its change results from conversions or other mathematical calculations, or from interactions of simulated external factors.
If the true value is unknown (as is the case for all measurements) and only its estimate is known, the uncertainty of the measured value is analyzed instead, expressed by the dependency below (JCGM/WG 1, 2008):

$$X_R = X \pm u_{Bc}$$

which can also be written as:

$$X_R = \left[ X - u_{Bc},\; X + u_{Bc} \right]$$

where: $X_R$ - limits of the range in which the real value of the variable lies, $u_{Bc}$ - estimated value of the variable's uncertainty, $X$ - measured (nominal) value of the variable.
Uncertainty of measurement is a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
This parameter may be, for example, a standard deviation called the standard measurement uncertainty (or a specified multiple of it), or the half-width of an interval having a stated coverage probability (Balazs, 2008).
Figure 1 presents the essence of the uncertainty of a variable or of a synthetic measure. When a quantity y is determined from several input quantities x_i, its combined standard uncertainty is obtained from the uncertainties of those inputs, where: $u(x_i)$ - standard uncertainties of measuring the input quantities, $u_c(y)$ - combined standard uncertainty.
In economic research, the most frequently occurring dependencies are the product or quotient of two quantities. For the product or quotient of two quantities, the uncertainty of the determined variable reduces to the sum of the uncertainties of these two variables.
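For reference, a commonly used form of the relationship between $u(x_i)$ and $u_c(y)$ is the law of propagation of uncertainty from the Guide to the Expression of Uncertainty in Measurement (JCGM 100:2008) for uncorrelated input quantities. Whether the article applies exactly this form is an assumption; the measurement function $f$ and the count $N$ are symbols introduced here for illustration.

```latex
% Assumption: y = f(x_1, ..., x_N) with uncorrelated inputs.
u_c^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^{2} u^2(x_i)

% Special case: y = x_1 x_2 or y = x_1 / x_2, written with relative uncertainties.
\left( \frac{u_c(y)}{y} \right)^{2}
  = \left( \frac{u(x_1)}{x_1} \right)^{2} + \left( \frac{u(x_2)}{x_2} \right)^{2}
```

In the product or quotient case the relative uncertainties add in quadrature. A plain (linear) sum of the two relative uncertainties is never smaller than the quadrature result, so it can be read as a conservative upper bound.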

Results
During the investigation of the level of innovation in EU countries, several questions were posed. Can the statistical data that form the basis of the Summary Innovation Index (SII) for individual EU countries be deemed accurate, i.e. devoid of any errors and inaccuracies?
And if the data is not ideal, how do these errors influence the final results? Might there be a case that due to the inaccuracy of data, one of the objects could be incorrectly allocated to a group of a certain level of innovation?
In order to determine how accurately the value of a variable has been established, it is necessary to estimate the uncertainty range in which it lies. The uncertainty of each of the variables making up the synthetic measure (SII) was calculated (or estimated) on the basis of knowledge about the manner in which the given variable was measured. The most accurate statistical data are those that come from registers maintained by government institutions, hence relatively low uncertainty values were adopted for such variables. When estimating the total uncertainty, however, the uncertainty related to the sample selection error and the uncertainty resulting from discretization or truncation of significant digits (e.g. to two decimal places) were taken into account and calculated from the relationship (4).
For variables expressed as a ratio of two quantities, the uncertainty consists of two component uncertainties. For example, the uncertainty of the variable 2.2.1 R&D expenditure in the business sector (% of GDP) consists of the uncertainty of determining R&D expenditure in the business sector (1%) and the uncertainty of determining GDP (1.5%). The total uncertainty value is determined from the dependency (5). The following values of total uncertainty were adopted for individual variables:
It follows from the essence of measurement uncertainty that the actual values of the innovation indicators for the analyzed objects (countries) lie within a range around the nominal value of the variable (read from the statistical yearbook). This makes it uncertain whether a given object has been correctly classified. The true value of a variable (or of the synthetic measure) may differ so much from the nominal value that a given country should be assigned to a different innovation group than the one indicated by the data from the statistical yearbook (without taking into account the measurement uncertainty of the variable).
Using the relationship (6), it is possible to calculate the probability that, because the true values of the variables differ from the estimated ones, a country will be assigned to a different group. Based on the estimated total uncertainty of each of the 27 variables, 1,000 values were drawn at random for each object (country) so that they followed a normal distribution. From these data, 1,000 synthetic measures were calculated for each object and used to compute the standard deviation. This standard deviation was taken as the uncertainty of the synthetic measure, $u_c$; multiplying it by the coverage factor of 1.96 gives the uncertainty at the significance level of 0.05.
The nominal value of the synthetic measure was taken to be the measure calculated in the "classical" way, i.e. based on the nominal values of the variables.
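The R&D expenditure example (1% and 1.5%) can be worked through numerically. The sketch below is illustrative only: the dependency (5) is not reproduced here, so two readings are computed side by side, a quadrature sum (the usual rule for uncorrelated relative uncertainties of a quotient) and a plain linear sum. The function names are hypothetical.

```python
import math


def combine_quadrature(*relative_uncertainties):
    """Root-sum-of-squares of relative uncertainties (uncorrelated inputs)."""
    return math.sqrt(sum(u ** 2 for u in relative_uncertainties))


def combine_linear(*relative_uncertainties):
    """Plain sum of relative uncertainties (conservative upper bound)."""
    return sum(relative_uncertainties)


# Component uncertainties quoted in the text for indicator 2.2.1.
u_rd = 0.01    # R&D expenditure in the business sector: 1%
u_gdp = 0.015  # GDP: 1.5%

u_quadrature = combine_quadrature(u_rd, u_gdp)
u_linear = combine_linear(u_rd, u_gdp)

print(f"quadrature: {u_quadrature:.2%}, linear: {u_linear:.2%}")
```

Under these assumptions the total relative uncertainty of the indicator is about 1.8% in quadrature and 2.5% as a linear sum; which of the two the study adopted cannot be determined from the text.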
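The Monte Carlo procedure described above can be sketched in Python for a single country. Several simplifications are assumptions rather than the article's method: the synthetic measure is taken as the unweighted mean of 27 already re-scaled indicator scores, each indicator is perturbed independently with a standard deviation equal to its relative uncertainty times its nominal value, and the input scores, the 3% uncertainty and the group boundary `group_threshold` are hypothetical.

```python
import random
import statistics


def simulate_sii(nominal_scores, rel_uncertainties, n_draws=1000, seed=0):
    """Return n_draws simulated synthetic measures for one country."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    simulated = []
    for _ in range(n_draws):
        # Draw every indicator from a normal distribution centred on its
        # nominal value, with sigma = relative uncertainty * |nominal value|.
        perturbed = [
            rng.gauss(x, u * abs(x))
            for x, u in zip(nominal_scores, rel_uncertainties)
        ]
        # Assumption: synthetic measure = unweighted mean of the indicators.
        simulated.append(statistics.mean(perturbed))
    return simulated


# Hypothetical inputs: 27 re-scaled scores and a 3% uncertainty for each.
nominal_scores = [0.45 + 0.01 * (i % 5) for i in range(27)]
rel_uncertainties = [0.03] * 27

nominal_sii = statistics.mean(nominal_scores)  # "classical" nominal value
draws = simulate_sii(nominal_scores, rel_uncertainties)

u_c = statistics.stdev(draws)  # uncertainty of the synthetic measure
expanded = 1.96 * u_c          # coverage factor 1.96
lower, upper = nominal_sii - expanded, nominal_sii + expanded

# Hypothetical boundary between two innovation groups.
group_threshold = 0.47
share_above = sum(d >= group_threshold for d in draws) / len(draws)
boundary_inside_interval = lower < group_threshold < upper

print(f"SII = {nominal_sii:.4f}, u_c = {u_c:.4f}, "
      f"interval = [{lower:.4f}, {upper:.4f}]")
print(f"share of draws at or above the boundary: {share_above:.3f}")
```

A country whose uncertainty interval contains a group boundary is a candidate for reclassification; the share of simulated measures falling on the other side of the boundary gives a simple empirical estimate of how likely that is.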
A problematic situation was found in one case (Figure 2).
Luxembourg, Belgium, the United Kingdom, Germany, Austria, Ireland, France and Estonia were included in the Strong Innovators group, whose situation in terms of innovation was also very good. Fourteen EU countries formed the Moderate Innovators group, whereas Bulgaria and Romania, with a low level of innovation, formed the Modest Innovators group.
In the study of complex phenomena, which include the innovation of countries, it is worth paying attention to the accuracy of the variables underlying the construction of the Summary Innovation Index. Because of the way statistical data are obtained, they are burdened with uncertainty as to their true values. Taking into account the uncertainty of the variables used for the SII does not change the ordering of objects, but allows one to specify the confidence interval for the obtained results.
Analyzing the SII values and uncertainty intervals, it was found that there was only one case of a country moving between the innovation groups. This resulted from the use of data with low uncertainty values and from the large number of variables (27), which averaged out the aggregate uncertainty. For a smaller number of variables (studies for 10 dimensions of innovation) there were more collision cases.
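The averaging effect can be made concrete with a back-of-the-envelope calculation. It rests on strong assumptions that are not taken from the article: all components have the same independent standard uncertainty `u`, and the synthetic measure is their unweighted mean, so its uncertainty scales as `u / sqrt(n)`. The function name is hypothetical.

```python
import math


def uncertainty_of_mean(u, n):
    """Standard uncertainty of the mean of n independent components,
    each with the same standard uncertainty u (simplifying assumption)."""
    return u / math.sqrt(n)


u = 1.0  # arbitrary common component uncertainty (only the ratio matters)
ratio = uncertainty_of_mean(u, 10) / uncertainty_of_mean(u, 27)
print(f"uncertainty with 10 components is {ratio:.2f} times that with 27")
```

Under these assumptions a measure built from 10 components carries roughly 1.6 times the uncertainty of one built from 27, which is consistent with more countries ending up close enough to a group boundary to collide.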
It seems that it is worth being aware of the impact of variable uncertainty on the final value of the synthetic measure, which quite often influences decision making.