Competing Risks Models for an Enterprises Duration on the Market

Abstract
Research background: Enterprises are an important element of the economy, which explains why the analysis of their duration on the market is an important and frequently undertaken research topic. For complex problems like this one, considering only one type of event that ends the duration is often insufficient for a full understanding. Purpose: This paper analyses the duration of enterprises on the market, taking into account the various reasons for the termination of their business activity as well as their characteristics. Research methodology: Survival analysis can be used to study duration on the market. However, an important limitation is that it considers the waiting time for only one type of event. One solution is to use competing risks. Various competing risks approaches (the naive Kaplan-Meier estimator, the subdistribution model, and the subhazard and cause-specific hazard models) are presented and compared, with an indication of their advantages and weaknesses. Results: The competing risks models are estimated to investigate how the cause of an enterprise's liquidation affects the duration distribution. The greatest risk concerns enterprises owned by a natural person (regardless of the reason for failure). For each of the competing risks, a section of activity that adversely affects the ability of firms to survive on the market is also indicated. Novelty: A valuable result is the inclusion of the reasons for activity termination in the duration analysis for enterprises from the Mazowieckie Voivodeship.


Introduction
Enterprises are an important element of the economy, and their condition greatly influences development at both the regional and the global level. The importance of the condition of companies (especially those from the SME sector) explains the broad interest in their ability to survive on the market (Boratyńska, Włoczewska, 2013; Kisielińska, 2016; Markowicz, 2018, 2019; Mikulec, 2017; Ptak-Chmielewska, 2013). Studying this phenomenon requires, on the one hand, identifying the factors that affect an enterprise's duration on the market. On the other hand, a complete understanding of enterprise survival also requires considering the various reasons for the termination of business operations.
Survival analysis, a set of statistical methods for determining the distribution of non-negative random variables, can be successfully used to study the duration of enterprises on the market. However, the basic methods of survival analysis consider the waiting time for only one type of event. For complex phenomena, which include economic processes such as the survival of firms, this is a significant limitation. The solution is to extend survival analysis by applying competing risks models, in which the particular forms of business activity termination are treated as separate risks.
The aim of this work is to explain the impact of the characteristics of firms (such as place of residence, form of ownership and sector of activity) on the distribution of their duration on the market, and to analyze the duration distributions for different types of business activity termination (such as legal bankruptcy or the suspension of business). Competing risks models make this kind of study possible. In addition, the results of various approaches to competing risks are compared, with an indication of their main advantages and weaknesses.

Analysis methods
This section outlines the applied methodology. First, the basics of survival analysis are presented. Then, the idea of competing risks models is explained. These models allow the analysis of differences in duration distributions between particular realizations of the final event (the various risks).

Basics of a survival analysis
Survival analysis is a set of statistical methods dedicated to the study of non-negative random variables, especially duration distributions. Let T be a random variable describing the waiting time for the occurrence of the event of interest. Then the survival function S, describing the probability that the duration is longer than t, is defined as follows (Klein, Moeschberger, 2003):
$$S(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(u)\,du, \tag{1}$$
where F is the cumulative distribution function and f is the density function of the duration distribution.
In turn, the risk of occurrence of the event along the entire duration distribution can be studied by analysing the hazard function, which is expressed in the following form (Klein, Moeschberger, 2003):
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}. \tag{2}$$
The Kaplan-Meier estimator is a non-parametric maximum likelihood estimator of the survival function. Let $t_i$ be a time at which at least one event occurred, $n_i$ the number of observations at risk at $t_i$ and $d_i$ the number of events that occurred at $t_i$. Then the estimator is given by (Kaplan, Meier, 1958):
$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right). \tag{3}$$
Basic methods of survival analysis such as the Kaplan-Meier estimator can be applied to the duration of enterprises on the market. However, they consider the waiting time for only one type of event. For complex phenomena such as enterprise survival, it is necessary to take into account several forms in which the event can occur. This requires extending the basic survival analysis methods by applying competing risks models (Satagopan et al., 2004).
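The product in the Kaplan-Meier estimator can be traced step by step with a minimal sketch. The function name `kaplan_meier` and the toy sample are hypothetical and serve only as an illustration; they are not the data or the software used in this study.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t_i, S_hat(t_i))] at each distinct event time.

    durations: observed times; events: 1 if the event occurred, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        n_at_risk = sum(1 for d in durations if d >= t)  # n_i
        surv *= 1.0 - deaths[t] / n_at_risk              # factor (1 - d_i / n_i)
        curve.append((t, surv))
    return curve

# Hypothetical toy sample: durations in years, 0 marks a right-censored firm.
durations = [1, 2, 2, 3, 4, 5]
events    = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(durations, events)
for t, s in km_curve:
    print(t, round(s, 4))
```

Censored firms never contribute an event, but they stay in the risk set until their censoring time, which is what distinguishes the estimator from a simple empirical proportion.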

The extension of the Kaplan-Meier estimator (naive KM estimator)
The basic and intuitive approach to competing risks in survival analysis is to construct the Kaplan-Meier estimator separately for each cause of the event. This estimator is called the naive Kaplan-Meier estimator and, for each reason j belonging to the set of competing risks J, takes the following form (Austin, Lee, Fine, 2016):
$$\hat{S}_j^{KM}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_{ij}}{n_i}\right), \tag{4}$$
where $d_{ij}$ is the number of events of type j that occurred at time $t_i$. The estimator is determined separately for each event j, with the other competing risks excluded (events due to other causes are treated as censored observations). For this reason, $1 - \hat{S}_j^{KM}(t)$ often overestimates the probability of the occurrence of event j (Austin et al., 2016). A further significant limitation is that the independence of the competing risks must be assumed in order to obtain the total distribution (including all of the competing risks):
$$S(t) = \prod_{j \in J} S_j(t). \tag{5}$$
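The overestimation mentioned above can be seen on a small example. The sketch below is only an illustration: the helper `naive_km`, the cause codes 1 and 2 and the six observations are hypothetical. Exits due to the other cause are recoded as censored, and the naive event probabilities for both causes are then added up.

```python
def naive_km(durations, causes, cause):
    """Naive KM survival for one cause; causes[i] == 0 means right-censored.

    Exits due to any other cause are treated as censored observations.
    """
    surv, curve = 1.0, []
    for t in sorted({d for d, c in zip(durations, causes) if c == cause}):
        n_i = sum(1 for d in durations if d >= t)
        d_ij = sum(1 for d, c in zip(durations, causes) if d == t and c == cause)
        surv *= 1.0 - d_ij / n_i
        curve.append((t, surv))
    return curve

# Hypothetical sample; cause codes 1 and 2 are illustrative labels only.
durations = [1, 2, 3, 4, 5, 6]
causes    = [1, 2, 1, 2, 0, 1]

km1 = naive_km(durations, causes, 1)
km2 = naive_km(durations, causes, 2)

# Sum of the naive event probabilities at the end of follow-up.
naive_incidence_sum = (1 - km1[-1][1]) + (1 - km2[-1][1])
print(naive_incidence_sum)
```

In this toy sample the two naive probabilities add up to more than 1, which cannot happen for probabilities of mutually exclusive outcomes.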

The idea of a subdistribution
Due to the limitations of the naive Kaplan-Meier estimator, other methods of handling competing risks are needed. The idea of a subdistribution (Fine, Gray, 1999), also known as the Fine-Gray model, is one of them.
Let us assume that T is the waiting time for any of the competing events to occur and ε defines the type of the event. Then for each event j the density function $f_j$ may be defined as follows:
$$f_j(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ \varepsilon = j)}{\Delta t}. \tag{6}$$
The function $f_j$ is called the subdensity function and it determines the subdistribution.
If a finite number of competing risks is considered, the density function of the distribution including all of the competing events can be obtained as follows:
$$f(t) = \sum_{j \in J} f_j(t). \tag{7}$$
Importantly, in this case it is not necessary to assume the independence of the competing risks when defining the total distribution.
As for the density function, for the cumulative distribution function F we have:
$$F(t) = \sum_{j \in J} F_j(t), \tag{8}$$
where $F_j(t) = P(T \le t,\ \varepsilon = j)$ is the cumulative subdistribution function, also called the cumulative incidence function (CIF).
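As a worked illustration of the subdensity and the cumulative subdistribution function, consider a sketch with two competing risks and exponential subdensities. The rates $\lambda_1$, $\lambda_2$ and the exponential form are assumptions made only for this example and do not come from the analysed data.

```latex
% Assumed for illustration: two risks with constant rates \lambda_1, \lambda_2 > 0
% and \lambda = \lambda_1 + \lambda_2.
\begin{align*}
f_j(t) &= \lambda_j e^{-\lambda t}, \qquad j \in \{1, 2\}, \\
F_j(t) &= \int_0^t f_j(u)\,du = \frac{\lambda_j}{\lambda}\left(1 - e^{-\lambda t}\right), \\
F(t) &= F_1(t) + F_2(t) = 1 - e^{-\lambda t}, \\
\lim_{t \to \infty} F_j(t) &= \frac{\lambda_j}{\lambda} < 1 .
\end{align*}
```

The last line shows why $F_j$ is called a subdistribution function: it does not reach 1, because some units end their duration through the other risk. The total distribution is obtained by simple summation, with no independence assumption involved.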

The non-parametric estimator of CIF
When subdistributions for competing risks are considered, the non-parametric estimator of the CIF is often used (Pintilie, 2006):
$$\hat{F}_j(t) = \sum_{i:\, t_i \le t} \hat{h}_j(t_i)\, \hat{S}(t_{i-1}), \tag{9}$$
where $\hat{h}_j(t_i) = d_{ij} / n_i$ is an estimator of the hazard for the risk j ($d_{ij}$ is the number of entities for which the event j occurred at time $t_i$) and $\hat{S}$ is the Kaplan-Meier estimator.
An important advantage of this estimator is that no assumption about the independence of the competing risks is required.
Moreover, for every j:
$$\hat{F}_j(t) \le 1 - \hat{S}_j^{KM}(t), \tag{10}$$
where the equality occurs in the absence of competing risks or in general in the case of the independence of events. However, for a censored sample (as in this case) the independence of the events cannot be assumed. For this reason, identifying the marginal distributions does not allow the distribution of a multidimensional variable to be identified for censored samples (Peterson, 1976).
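A minimal sketch of the non-parametric CIF estimator, computed next to the naive Kaplan-Meier product for the same causes, may make the comparison concrete. The function `cif_and_naive_km`, the cause codes and the toy sample are hypothetical and only illustrate the formulas; they are not the software or the data used in this study.

```python
def cif_and_naive_km(durations, causes, cause_labels):
    """Return (CIF per cause, naive KM per cause, all-cause KM) at the last event time.

    causes[i] == 0 marks right-censoring. The CIF adds (d_ij / n_i) * S(t_{i-1})
    at every event time, with S the all-cause Kaplan-Meier estimator.
    """
    event_times = sorted({d for d, c in zip(durations, causes) if c != 0})
    surv_prev = 1.0                          # all-cause KM just before t_i
    cif = {j: 0.0 for j in cause_labels}
    km = {j: 1.0 for j in cause_labels}      # naive KM, other causes censored
    for t in event_times:
        n_i = sum(1 for d in durations if d >= t)
        d_all = 0
        for j in cause_labels:
            d_ij = sum(1 for d, c in zip(durations, causes) if d == t and c == j)
            cif[j] += (d_ij / n_i) * surv_prev
            km[j] *= 1.0 - d_ij / n_i
            d_all += d_ij
        surv_prev *= 1.0 - d_all / n_i
    return cif, km, surv_prev

# Hypothetical sample with two causes (1, 2) and one censored unit (0).
durations = [1, 2, 3, 4, 5, 6]
causes    = [1, 2, 1, 2, 0, 1]

cif, km, surv = cif_and_naive_km(durations, causes, [1, 2])
for j in (1, 2):
    print(j, round(cif[j], 4), round(1 - km[j], 4))
```

In this sample the CIF values for the two causes add up to one minus the all-cause survival estimate, and each of them stays below the corresponding naive value.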

The two concepts of the hazard function
The hazard function can also be defined for the subdistribution. There are two concepts for determining the hazard for competing risks. The main difference between the two hazard models lies in how they treat the competing risks other than the one currently being investigated, j.
When determining a cause-specific hazard, events that occur due to j are considered, provided that the entity has survived until time t. If the duration ends for a reason other than j, the duration time for this entity is considered as censored. This is partly inconsistent with the treatment of events occurring for reasons other than j that is used for the cumulative subdistribution function. The hazard function in this case is as follows (Pintilie, 2006):
$$h_j^{cs}(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ \varepsilon = j \mid T \ge t)}{\Delta t}. \tag{11}$$
In turn, in the case of the subdistribution hazard, we assume that if the event occurs for a reason other than the currently investigated j, the entity's duration is not censored in the usual way. The entity remains under consideration. As a result, the number of events occurring for reason j is the same, but more entities at risk are considered. Hence the hazard values in this approach may be lower than the cause-specific hazard.
It is significant that in this model the hazard function and the subdistribution CIF share the same approach and assumptions regarding explanatory variables and competing risks (Fine, Gray, 1999).
The hazard function can then be given by (Gray, 1988):
$$h_j^{sd}(t) = \lim_{\Delta t \to 0} \frac{P\left(t \le T < t + \Delta t,\ \varepsilon = j \mid T \ge t \,\cup\, (T \le t \cap \varepsilon \ne j)\right)}{\Delta t} = \frac{f_j(t)}{1 - F_j(t)}. \tag{12}$$
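The difference between the two risk sets can be sketched in discrete time. The example below assumes a sample without censoring, so that no censoring weights are needed (with censored data the subdistribution approach requires additional weighting, which is not shown here). The function name and the data are hypothetical.

```python
def discrete_hazards(durations, causes, cause, t):
    """Return (cause-specific, subdistribution) hazard estimates at time t.

    Assumes a sample without censoring, so no censoring weights are needed.
    """
    d_j = sum(1 for d, c in zip(durations, causes) if d == t and c == cause)
    # Cause-specific risk set: units that have not yet experienced any event.
    at_risk_cs = sum(1 for d in durations if d >= t)
    # Subdistribution risk set also keeps units that already failed from another cause.
    at_risk_sd = at_risk_cs + sum(
        1 for d, c in zip(durations, causes) if d < t and c != cause
    )
    return d_j / at_risk_cs, d_j / at_risk_sd

# Hypothetical uncensored sample with two causes.
durations = [1, 2, 3, 4, 5, 6]
causes    = [1, 2, 1, 2, 2, 1]

h_cs, h_sd = discrete_hazards(durations, causes, cause=1, t=3)
print(h_cs, h_sd)
```

The numerator is the same in both cases; only the denominator grows in the subdistribution version, which is why its values cannot exceed the cause-specific ones in this setting.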

The conditional probability (CP)
The conditional probability (CP) function determines the probability that event j occurs by time t, given that no other event has occurred by t (Pepe, Mori, 1993). In the case of two competing risks ε = 1 and ε = 2, it is defined as follows:
$$CP_1(t) = \frac{F_1(t)}{1 - F_2(t)}, \qquad CP_2(t) = \frac{F_2(t)}{1 - F_1(t)}. \tag{13}$$
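For two competing risks, the CP at a given time follows directly from the two CIF values. The sketch below uses hypothetical CIF values (0.30 and 0.20) chosen only to show the arithmetic.

```python
def conditional_probability(cif_j, cif_other):
    """CP_j(t) = F_j(t) / (1 - F_other(t)) for two competing risks."""
    if cif_other >= 1.0:
        raise ValueError("CP is undefined when the competing CIF equals 1")
    return cif_j / (1.0 - cif_other)

# Hypothetical CIF values at some fixed time t.
cif_1, cif_2 = 0.30, 0.20

cp_1 = conditional_probability(cif_1, cif_2)
cp_2 = conditional_probability(cif_2, cif_1)
print(round(cp_1, 4), round(cp_2, 4))
```

Because the denominator is at most 1, the CP for a given risk is never smaller than the corresponding CIF.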

Results of the empirical analysis
This section presents the results of the empirical analysis applying competing risks models. First, the database is described. Then, the cumulative distribution functions and conditional probabilities are estimated for each of the considered reasons for the termination of an enterprise's business activity. Finally, the results of the regression subhazard and cause-specific hazard models are provided and compared.

Database
This work uses data from the National Official Business Register (REGON) for enterprises from the Mazowieckie Voivodeship (the state of the database at the end of 2017).
The analysis relates to a cohort of firms that started their business in 2007. The database consists of a sample of 32,788 enterprises. The applied variables with their descriptions are presented in Table 1.

PKD2007: section of activity by PKD2007 (Polish Classification of Activities):
A - agriculture, forestry, hunting and fishing
B - mining and quarrying
C - manufacturing
D - electricity, gas, steam, hot water and air conditioning manufacturing and supply
E - water supply, sewerage, waste management and remediation activities
F - construction
G - wholesale and retail trade, repair of motor vehicles including motorcycles
H - transportation and storage
I - accommodation and food service activities
J - information and communication
K - financial and insurance activities
L - real estate activities
M - professional, scientific and technical activities
N - administrative and support service activities
O - public administration and defence, compulsory social security
P - education
Q - human health and social work activities
R - arts, entertainment and recreation activities
S - other service activities
T - households as employers, goods- and services-producing activities of households for own use
U - extraterritorial organisations and bodies
Source: own elaboration.
In addition, the data from REGON were supplemented with macroeconomic indicators such as the GDP growth rate, inflation and unemployment in individual years (data source: Statistics Poland). In order to include macroeconomic indicators for particular periods of time in the REGON data, the episode splitting method was used. In this method, if a variable changes its value during an episode, the episode is split at that point of time $t_k$ (k = 1, ..., s), creating two new sub-episodes (with constant values of that variable). The episode can be split several times (from 1 to s) if required. All sub-episodes except the last one are treated as right-censored. The last sub-episode ends in the same state as the original episode (Blossfeld, 2014). As a result, we obtain a data set that can be used to build models for time-constant variables (such as parametric hazard models).
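The splitting rule described above can be sketched as follows. The function `split_episode`, the yearly change points and the 3.5-year episode are hypothetical and serve only to illustrate the mechanics; attaching the indicator values to each sub-episode is left out.

```python
def split_episode(start, end, event, change_points):
    """Split one episode (start, end] at the given change points.

    Returns a list of (sub_start, sub_end, status); every sub-episode except
    the last is right-censored (status 0), the last keeps the original status.
    """
    cuts = sorted(c for c in change_points if start < c < end)
    bounds = [start] + cuts + [end]
    pieces = []
    for left, right in zip(bounds[:-1], bounds[1:]):
        status = event if right == end else 0
        pieces.append((left, right, status))
    return pieces

# Hypothetical firm observed for 3.5 years, ending in an event coded 1;
# yearly change points stand in for the dates when macro indicators change.
pieces = split_episode(0.0, 3.5, 1, [1.0, 2.0, 3.0, 4.0])
for piece in pieces:
    print(piece)
```

Each resulting row can then carry the indicator values valid in its own interval, so the covariates are constant within every sub-episode.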

Comparison of CIFs, CPs and S functions
In the first stage of the research, the estimators of the survival function S, the CIF and the CP are computed for each of the considered competing risks (termination of the activity without a known reason, suspension of the activity and legal bankruptcy). The naive Kaplan-Meier estimator (4) for [...] Based on the results, we can conclude that the most frequent cause of enterprise liquidation observed in the REGON database is the termination of a business without a known reason. Suspension is less frequent, whereas bankruptcy in legal terms is the least frequent.
The estimators of the CP, the CIF and the survival function satisfy the following:
$$\hat{F}_j(t) \le 1 - \hat{S}_j^{KM}(t) \le \widehat{CP}_j(t), \tag{14}$$
where it is known that equality occurs in the absence of competing risks or when the considered risks are independent (which cannot be assumed in the case of a censored sample).
The estimators obtained for the empirical data satisfy the theoretical inequality (14). Moreover, the values of the estimators are not equal for any of the considered causes of termination of the activity (see Figure 1). Therefore, we cannot assume the mutual independence of the competing risks in this case. Basic models such as the naive Kaplan-Meier estimator are therefore insufficient to study the duration of enterprises on the market. It seems reasonable to look for more complex competing risks models that do not require the events to be independent. [...] (Figures 3-5).

Empirical CIFs with the characteristics of the enterprises
For explanatory variables such as headquarters (Figure 4) and section of activity (Figure 5), we can observe a clear separation of the curves for particular competing risks. The differences between the distributions were verified with a statistical test for the equality of distributions designed to investigate differences between CIF curves (Gray, 1988). In all cases the p-value is much lower than the significance level α = 0.01, so we reject the null hypothesis of equal distributions in favour of the alternative (the duration distributions differ for the studied risks). The highest probability of an event occurring is for termination without a known reason, followed by suspension and then bankruptcy. For the form of ownership (Figure 3), termination of activity without a known reason still plays the most significant role, but we cannot conclude that suspension is always more frequent than bankruptcy in legal terms. It is worth noting that suspension is used mainly by natural persons.
Comparing the results for particular competing risks with those obtained without taking them into account, it can be seen that the shape of the curves for the dominant risk (Figures 3-5) is similar to that of the total duration distribution (Figure 2). It is noteworthy that an intersection of the curves is observed only for the termination of the activity without a known reason, together with an increase in the probability of this event for enterprises located outside the administrative borders of Warsaw between the third and fourth year of a firm's existence. This phenomenon also occurs for the total distribution, but for smaller duration values, which may result from the impact of the other competing risks.

Cause-specific and subdistribution hazard regression models
Two different hazard regression models are estimated to investigate the effect of the explanatory variables: the cause-specific hazard model and the subdistribution hazard model.
The two approaches differ in how they treat the competing risks (Zhang, 2017).
In Tables 2 and 3 [...] As a technical comment on the results, the following categories were treated as the reference: section A for the PKD2007 section, an organizational entity without legal personality for the form of ownership, and outside the administrative boundaries of Warsaw for the place of residence. It should also be noted that in the cause-specific hazard model the coefficients for some sections could not be estimated at the expected level of significance. This is due to the low number of enterprises representing these sections in the database. The p-values for the coefficients of these variables are much higher than the level of significance, so no conclusions can be drawn from them. As for the macroeconomic indicators, it is also worth emphasizing that the time dependence of these variables was taken into account by applying the episode splitting method (briefly described in subsection 3.1, Database), which made it possible to meet the assumption of the proportionality of hazard.
For both estimated hazard models, model verification was carried out using the convergence criterion of the Newton-Raphson iterative algorithm. In addition, Schoenfeld residuals were analyzed. The residuals oscillate around zero, and no systematic time dependence of the residuals is observed for any of the variables. This indicates that the assumption of the proportionality of hazard is met.
Analysing the results, we can see that the estimated values of the coefficients are slightly smaller in the case of the subdistribution hazard model (Table 3). This results from the different treatment of the competing risks (other risks are treated as censored observations in the cause-specific hazard model) (Haller, Schmidt, Ulm, 2013). However, it is worth emphasizing that the results of both estimated models are consistent, i.e. for particular causes of termination of the activity the same firm characteristics are identified as increasing the hazard to the greatest extent (i.e. as dangerous to the enterprise).
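When reading such coefficients, it may help to recall how a coefficient translates into a hazard ratio. The sketch below assumes the usual proportional hazards form. The baseline hazard $h_{0j}$, the binary covariate $x$ and the coefficient $\beta$ are notation introduced here for illustration, and the value $\beta = 0.5$ is purely hypothetical and is not taken from Tables 2 or 3.

```latex
% Assumed proportional hazards form for risk j with a single binary covariate x.
\begin{align*}
h_j(t \mid x) &= h_{0j}(t)\, e^{\beta x}, \\
\frac{h_j(t \mid x = 1)}{h_j(t \mid x = 0)} &= e^{\beta}, \\
\beta = 0.5 \;&\Rightarrow\; e^{\beta} \approx 1.65 .
\end{align*}
```

Under this assumption a positive coefficient means a higher hazard than in the reference category, and a slightly smaller coefficient corresponds to a slightly smaller hazard ratio. In the subdistribution model the ratio refers to the subdistribution hazard rather than to the cause-specific hazard.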
We can conclude that for activity termination without a known reason the most dangerous PKD2007 section is I (accommodation and food service activities), for suspension it is section K (financial and insurance activities) and for legal bankruptcy it is section O (public administration and defence). However, one must be aware of the differences in the specifics of doing business in these sectors. The surprisingly high coefficient for section O (for legal bankruptcy) does not necessarily mean that these enterprises are particularly at risk in general (for the other risks the coefficients are smaller). It may mean, for example, that in this sector full bankruptcy proceedings must be carried out whenever an enterprise terminates its operations. This does not change the fact that for companies from other sectors a legal declaration of bankruptcy is less common. Moreover, for each of the considered causes of termination, a natural person is the form of ownership that most clearly increases the hazard. This may be related to the fact that for small enterprises (which most often have this form of ownership) the dynamics of market activities are generally greater. Based on the regression results, the place of residence of the enterprise headquarters only slightly affects the hazard in the case of termination of activity without a known reason (Warsaw slightly increases the risk). For the other two risks this variable is not statistically significant.
The results regarding macroeconomic indicators are similar for termination of activity without a known reason and suspension of activity (for both the subdistribution and the cause-specific hazard models). Higher GDP growth reduces the risk of termination of operations, and rising inflation has an adverse effect. The effect of unemployment is small in magnitude, although the variable is statistically significant. On the other hand, in the case of legal bankruptcy, GDP growth increases the hazard and increasing inflation lowers it (although the effect is small in both of these cases). Unemployment has a considerable effect here: its growth increases the hazard. It should be noted that legal bankruptcy is represented by the least numerous group in the sample and, on the basis of the other results, it behaves differently from the other two considered competing risks.

Conclusions
This paper presents competing risks models and applies a selection of them to investigate the survival of enterprises on the market. The most common competing risks models are discussed, with an indication of their advantages and weaknesses. It is noteworthy that for empirical data, when we do not know enough about the investigated phenomenon to make a priori assumptions, an essential advantage of a model is that it does not require the assumption of mutual independence of the competing risks. The naive Kaplan-Meier estimator requires this assumption, while the subdistribution concept proposed by Fine and Gray avoids it.
This approach is used to estimate the CIFs. The obtained results show a large diversity in the distribution of the duration of enterprises on the market, both with respect to the reason for termination of activity and with respect to the characteristics of the firms. It is worth noting that the distribution without competing risks is similar to the distribution of the dominant risk, which agrees with intuition. The use of competing risks also allows the distributions for the other causes of liquidation to be examined. In each of the considered cases, the form of ownership associated with the shortest duration of enterprises is a natural person. The dominant risk is the termination of activity without a known reason. However, the CIF for the suspension of business operations by enterprises owned by a natural person takes higher values than the CIF for termination of activity without a known reason for the other forms of ownership.
In addition, applying the regression models makes it possible to indicate the differences between them and to analyse particular causes of termination of business activity in more detail. The results are consistent with those obtained earlier in the study. For each of the considered risks, the features of enterprises that have a negative impact on their duration on the market are indicated. It was emphasised that interpreting the results requires a broader view and an awareness of the specifics of business activity for various business entities.
The termination of business activity without a known reason and the suspension of activity behave differently from legal bankruptcy, which underlines the distinct nature of these forms of enterprise failure on the market.
In summary, it is important to emphasize that competing risks models allow for more accurate results and a more comprehensive understanding of the phenomenon. Moreover, if models that do not require the assumption of mutual independence of the competing risks are chosen, it is possible to estimate the distribution of the duration of enterprises on the market (both for particular reasons and in total) without making a priori assumptions. This is extremely valuable for the complex problems encountered in microeconomic research, for example research on the situation of enterprises on the market.
The obtained results are valuable because analyses of the duration on the market of firms from the Mazowieckie Voivodeship that consider the various reasons for termination of activity have been lacking. It is worth performing an analogous study for cohorts of firms that started their business activity in years other than 2007 and then comparing the results.