Modeling the pathway of breast cancer in the Middle East

This paper proposes an approach for identifying the mutation mechanisms of breast cancer in women in four member countries of the Middle East Cancer Consortium, i.e., Egypt, Jordan, Cyprus and Israel (Arabs and Jews). We set up multistage models that include both gene mutation and the clonal expansion of intermediate cells, and we fit them to the data set on the incidence of female breast cancer in the four member countries. Our simulation results show that the maximum number of driver mutations in breast epithelium stem cells of Egyptian women is 13, whereas it is 14 for female patients in Jordan, Cyprus and Israel (Arabs and Jews). In addition, the 3, 10, 5, 5 and 4 stage models are the optimal ones for the tumorigenesis of females in Egypt, Jordan, Cyprus, Israel (Arabs) and Israel (Jews), respectively. Genomic instability is caused by the first three driver mutations.


Introduction
Cancer is the uncontrolled growth of abnormal cells in the human body. It has become one of the major causes of death all over the world. Breast cancer is the most common type of cancer affecting women. It constitutes 10% of all new cancer cases worldwide and 1-2% of mortality among women throughout the world [1,2]. Recently, studies on female breast cancer have attracted the attention of biologists and geneticists [3-17]. Breast cancer emerges when an unregulated growth of abnormal cells begins in different parts of a tissue, where a tissue is a group of similar cells that share the same general function. The disease may develop in the milk ducts and glands of the breast. Every year, over one million new cases of breast cancer are diagnosed. Recently, however, many breast cancer deaths have been prevented, owing to improvements in treatment and to early detection through mammography. It is estimated that at least 400,000 patients worldwide die of this disease every year [18-21].
Breast cancer is considered a major public health problem in both developing and developed countries. The incidence of breast cancer differs between regions. It is lower in developing countries than in developed countries; rates range between 22 and 71 per 100,000 women, but mortality rates are considerably higher [1,2,22-26]. Nevertheless, breast cancer is the most commonly diagnosed cancer among women in the Middle East, where many women do not seek medical care promptly [22-24, 27, 28]. Therefore, the increase in the incidence rate is slower in this region. The average age of women affected by breast cancer in the Middle East is around a decade lower than in the West. Comprehensive studies show that the incidence rates of breast cancer are increasing in four countries of the Middle East Cancer Consortium (MECC).
The Middle East Cancer Consortium (MECC) is an example of regional cooperation in cancer registration. Comparing the countries in the region helps to identify common risk factors and to share cancer prevention and control strategies. Najla et al. [2] reported age-standardized and age-specific incidence rates of breast cancer among Lebanese women and compared them with those of regional and western countries. Sherko et al. [29] demonstrated that the incidence of breast cancer among Iraqi Kurdish women is lower than in some other countries of the Middle East and the West. Hanahan et al. [30] suggested eight biological capabilities that are acquired during the multistep development of human tumors, namely, self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, resistance to programmed cell death (apoptosis), limitless replicative potential, sustained angiogenesis, tissue invasion and metastasis, reprogramming of energy metabolism, and evading immune destruction.
The process of carcinogenesis is regarded as the degeneration of a cell from a normal to a malignant state through a finite number of intermediate stages. The deterioration of a normal cell leads to the proliferation of transformed cells whose descendants generate a tumor. A normal cell may generate a tumor after it has gone through two mutations. A question relevant to tumor diagnosis and therapy is how many driver mutations are needed for human breast cancer. Wood et al. [31] obtained the important result that fewer than 15 driver mutations are likely to be responsible for the genomic landscapes of human breast and colorectal cancers. The earliest two-stage model with clonal expansion of intermediate cells was developed by Moolgavkar et al. for the age-specific incidence rates of breast and colorectal cancers [32-34]. In 2005, Zhang and Simon [35] extended the two-stage model to a six-stage model with clonal expansion of intermediate cells in each compartment. The two to six stage models with clonal expansion of intermediate cells fit the age-specific incidence rates of breast cancer very well. In 2014, Zhang et al. [36] used statistical inference (the Chi-square test) to test both the two to six stage models with clonal expansion of intermediate cells and the selection between mutation and clonal expansion of intermediate cells. Using the same method of statistical inference, Li et al. [37] obtained the maximum number of driver mutations for the breast tumorigenesis of females in the United States of America.
The data pattern for the age-specific incidence rates of breast cancer in the USA [38] is different from that in the Middle East countries. In this work, our main objective is to investigate the main differences in the mutation mechanism of breast cancer in women among four member countries (Egypt, Jordan, Cyprus and Israel) of the Middle East Cancer Consortium (MECC) and to compare them with the results obtained by Li et al. [37] for the United States of America (USA). In section 2, we briefly describe the data resources. In section 3, we present the mathematical model. Section 4 describes the inference method. Section 5 provides the simulation results. Finally, section 6 gives a brief discussion of the results.

Data Resources
The data used in this work are the age-specific incidence rates of breast cancer for female patients in four member countries of the Middle East Cancer Consortium (MECC), i.e., Egypt, Jordan, Cyprus and Israel (Arabs and Jews) [1]. The source of the Egypt data is the National Cancer Registry Program (NCRP) for the period 2008-2011 [39]. The Jordan Cancer Registry covers the population from 1996 to 2001 [1]. The Cyprus National Cancer Registry reports the Cyprus data from 1998 to 2001 [1]. The Israel data for Arabs and Jews are extracted from the Israel National Cancer Registry for the years 1996 to 2001 [1]. The collection of data by the MECC registries is guided by the Manual of Standards for Cancer Registration [40]. For our analysis, we used breast cancer incidence rates by age at diagnosis, date of birth and sex. Incidence rates are expressed as cases per 100,000 females. Figure 1 shows the age-specific incidence rates of female breast cancer in Egypt, Jordan, Cyprus, Israel (Arabs and Jews) and the USA. Most women diagnosed with breast cancer are over 45, and female breast cancer is rare under the age of 25. In the Middle East countries, the age-specific rate is lowest in the age groups from 0-4 to 20-24; it then increases gradually until it reaches its highest value and decreases afterwards. In the USA, by contrast, the incidence rate increases gradually until it reaches its highest value, without a subsequent decrease. The majority of female breast cancer cases were diagnosed in the USA and Israel (Jews). The data pattern for the age-specific incidence of female breast cancer differs among the Middle East countries, and it also differs from that in the USA. The incidence rate peaks in the 60-64 age group for Egypt, Jordan and Israel (Arabs) and in the 70-74 age group for Cyprus and Israel (Jews). For the USA data, however, the incidence rate peaks in the 75-79 age group.

Mathematical Model
According to the findings of Hanahan et al. [30], the oncogenic process should be a sequence of genetic events. Figure 2 shows a schematic representation of the multi-stage model with clonal expansion in each compartment of intermediate cells. We suppose that the growth of normal stem cells follows a logistic curve from age 0 to age 20. Our assumptions follow refs. [35,36]. There are ten cells in each breast tissue at birth (age 0) and $10^7$ cells at the age of 20. There are $10^6$ stem cells at age 80. Since $10^7 \times e^{-0.0667(80-45)} \approx 10^6$, we assume in this work that the number of stem cells decreases after age 45 at a rate of $-0.0667$ per cell per year because of changes in hormone expression levels and menopause.
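The stem-cell assumptions above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the logistic curve uses the growth rate 1.0074 quoted with the model equations and stays at its plateau between ages 20 and 45; the function name `stem_cells` is hypothetical.

```python
import math

K = 1e7            # plateau number of stem cells per breast (from the text)
N0 = 10.0          # number of cells at birth (from the text)
GAMMA_1 = 1.0074   # logistic growth rate quoted with the model equations
DECAY = 0.0667     # per-cell yearly loss rate after age 45 (from the text)


def stem_cells(age: float) -> float:
    """Expected number of normal stem cells per breast at a given age in years.

    Assumption: the closed-form logistic solution is used up to age 45
    (it is already at its plateau by age 20), followed by exponential decline.
    """
    if age <= 45.0:
        return K / (1.0 + (K / N0 - 1.0) * math.exp(-GAMMA_1 * age))
    return stem_cells(45.0) * math.exp(-DECAY * (age - 45.0))


for age in (0, 20, 45, 80):
    print(age, stem_cells(age))
```

Under these assumptions the count at age 80 comes out slightly below $10^6$, which is the sense in which the decay rate reproduces the stated value.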
For a $k$-stage model ($k = 2, 3, \dots$), let $Y_1(t)$, $Y_i(t)$ ($i = 2, 3, \dots, k$), and $Y_{k+1}(t)$ represent, respectively, the numbers of normal progenitor cells, intermediate cells in compartment $I_i$, and fully malignant cells per breast at time $t$. $\Psi(y_1, y_2, \dots, y_{k+1}; t)$ denotes the probability generating function at time $t$, starting with a single normal cell at time 0 [34,41-43]. $\Psi$ satisfies the Kolmogorov forward differential equation [33,36,43], in which $\alpha_i(t)$, $\beta_i(t)$ and $\mu_i(t)$ are, respectively, the growth rate, the death or differentiation rate, and the mutation rate per cell per year in compartment $I_i$ at time $t$. The hazard function (incidence function) is the instantaneous rate of appearance of a malignant tumor in a previously tumor-free tissue [33,35]; it is defined in terms of $T$, the time of appearance of the first tumor. The hazard and the probability are related by equation (4) [33,35], where $P(t)$ and $h(s)$ denote the probability of breast cancer occurring at age $t$ and the incidence rate of breast cancer at age $s$, respectively. According to ref. [44], the hazard function can be written in terms of $\Psi$ and $\Psi'$, as in equation (6), where $\Psi(1, 1, \dots, 1, 0; t)$ is the survival function and $\Psi'(1, 1, \dots, 1, 0; t)$ is the derivative of the survival function. Using equation (1), we obtain equation (7), and the conditional expectation is given by equation (8). Substituting equations (7) and (8) into (6) gives the hazard function. Since cancer is a rare disease, the probability $P(t)$ of the presence of a malignant cell at time $t$ is close to zero.
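For orientation, the relations described in this paragraph take the following form in the standard multistage (Moolgavkar-type) setting. This is a sketch under that assumption: the expressions are reconstructed from the definitions of $\alpha_i$, $\beta_i$, $\mu_i$, $\Psi$, $P(t)$ and $h(s)$ given in the text, not copied from the original displays, and they are left unnumbered on purpose.

```latex
% Assumed standard forms; each intermediate cell divides (alpha_i), dies or
% differentiates (beta_i), or produces one cell of the next stage (mu_i).
\begin{align*}
\Psi(y_1,\dots,y_{k+1};t)
  &= \sum_{j_1,\dots,j_{k+1}}
     \Pr\{Y_1(t)=j_1,\dots,Y_{k+1}(t)=j_{k+1}\}\,
     y_1^{j_1}\cdots y_{k+1}^{j_{k+1}}, \\
\frac{\partial \Psi}{\partial t}
  &= \sum_{i=1}^{k}\Big[\mu_i(t)\,y_i y_{i+1} + \alpha_i(t)\,y_i^{2} + \beta_i(t)
     - \big(\alpha_i(t)+\beta_i(t)+\mu_i(t)\big)\,y_i\Big]
     \frac{\partial \Psi}{\partial y_i}, \\
P(t) &= 1 - \exp\Big(-\int_0^{t} h(s)\,ds\Big), \\
h(t) &= -\,\frac{\Psi'(1,\dots,1,0;t)}{\Psi(1,\dots,1,0;t)}, \\
\Psi'(1,\dots,1,0;t)
  &= -\,\mu_k(t)\,\frac{\partial \Psi}{\partial y_k}(1,\dots,1,0;t), \\
E\big[Y_k(t)\mid Y_{k+1}(t)=0\big]
  &= \frac{\dfrac{\partial \Psi}{\partial y_k}(1,\dots,1,0;t)}{\Psi(1,\dots,1,0;t)}, \\
h(t) &= \mu_k(t)\,E\big[Y_k(t)\mid Y_{k+1}(t)=0\big].
\end{align*}
```

Evaluating the bracket in the second line at $(1,\dots,1,0)$ gives zero for every $i < k$ and $-\mu_k(t)$ for $i = k$, which is how the fifth line follows from the second.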
This yields an approximation of the hazard for one breast tissue. In this section, we present the multi-stage system of ordinary differential equations [35,36] used in our calculations to estimate the expectations $E[Y_i(t)]$ ($i = 1, 2, \dots, k$), that is, the numbers $Y_i(t)$ before and after age 45.
The $Y_i(t)$ ($i = 1, 2, \dots, k$) satisfy one system of ordinary differential equations for $t \le 45$ and another for $t > 45$. In these systems, $K$ is the maximal number of normal progenitor cells, so $K = 10^7$, $\gamma_1 = 1.0074$, and $\gamma'_1 = -0.0667$. In addition, $\gamma_i$ and $\gamma'_i$ denote the growth rates of the intermediate compartments $I_i$ ($i = 2, 3, \dots, k$) before and after age 45, respectively. The growth rates change after age 45 in response to the changing hormonal environment of a woman.
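A numerical sketch of such a system is given below. The right-hand sides are an assumption consistent with the quantities named in the text (logistic growth of normal cells with capacity $K$ before age 45, net growth rates $\gamma_i$ and $\gamma'_i$, mutation inflow $\mu_{i-1} Y_{i-1}$, and hazard approximated by $\mu_k E[Y_k(t)]$); they are not the authors' equations. The function names and the parameter values in the example are hypothetical.

```python
K = 1e7
GAMMA_1, GAMMA_1_POST = 1.0074, -0.0667  # values quoted in the text


def rhs(t, y, mu, gamma, gamma_post):
    """Assumed right-hand side for the expectations E[Y_1], ..., E[Y_k].

    gamma[i-1] and gamma_post[i-1] are the net growth rates of compartment
    i+1 (the first intermediate compartment is index 0) before and after 45.
    """
    post = t > 45.0
    dy = [0.0] * len(y)
    # Normal cells: logistic growth up to age 45, exponential decline after.
    dy[0] = GAMMA_1_POST * y[0] if post else GAMMA_1 * y[0] * (1.0 - y[0] / K)
    for i in range(1, len(y)):
        g = gamma_post[i - 1] if post else gamma[i - 1]
        dy[i] = mu[i - 1] * y[i - 1] + g * y[i]  # mutation inflow + net growth
    return dy


def rk4_step(t, y, h, *params):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(t, y, *params)
    k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], *params)
    k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], *params)
    k4 = rhs(t + h, [a + h * b for a, b in zip(y, k3)], *params)
    return [a + h / 6 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]


def hazard_curve(mu, gamma, gamma_post, max_age=80, steps_per_year=100):
    """Return a list whose entry a approximates h(a) = mu_k * E[Y_k(a)]."""
    k = len(mu)
    y = [10.0] + [0.0] * (k - 1)  # ten normal cells at birth, no mutants
    h = 1.0 / steps_per_year
    hazards = [mu[-1] * y[-1]]
    for n in range(max_age * steps_per_year):
        y = rk4_step(n * h, y, h, mu, gamma, gamma_post)
        if (n + 1) % steps_per_year == 0:
            hazards.append(mu[-1] * y[-1])
    return hazards


# Hypothetical two-stage example; these values are illustrative, not fitted.
hazards = hazard_curve(mu=[1e-7, 1e-7], gamma=[0.1], gamma_post=[0.05])
print(hazards[60] * 1e5)  # hazard at age 60, expressed per 100,000
```

A $k$-stage model in this form has $k$ mutation rates and $k-1$ growth rates on each side of age 45, which matches the 43 unknown parameters reported for the 15-stage model.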

Chi-Square Test
In order to study biological mutation mechanisms of female breast cancer, such as the impact of genetic mutation and clonal expansion of cells on cancer evolution, we consider the following two hypotheses. Hypothesis one: There is a mutator phenotype in the progression of breast cancer for multistage models, that is, the mutation rates per cell per year in the $k$-stage model ($k = 2, 3, \dots$) satisfy the relation $\mu_1 \le \mu_2 \le \dots \le \mu_k$.
Hypothesis two: There is no mutator phenotype in the tumorigenesis of breast cancer for multi-stage models, that is, the mutation rates per cell per year in the $k$-stage model ($k = 2, 3, \dots$) satisfy $\mu_1 = \mu_2 = \dots = \mu_k$.
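The text does not say how these two constraints were imposed during fitting. One common device, shown here purely as an assumption, is to reparameterize the mutation rates so that an unconstrained optimizer can only produce admissible values; the function names below are hypothetical.

```python
import math


def mutation_rates_h1(theta):
    """Hypothesis one: map unconstrained reals to 0 < mu_1 <= ... <= mu_k.

    Each rate is the previous one plus a strictly positive increment, so the
    ordering holds for any input vector theta.
    """
    rates, total = [], 0.0
    for x in theta:
        total += math.exp(x)
        rates.append(total)
    return rates


def mutation_rates_h2(theta0, k):
    """Hypothesis two: a single free rate shared by all k stages."""
    return [math.exp(theta0)] * k


print(mutation_rates_h1([-16.0, -14.0, -2.0]))
print(mutation_rates_h2(-14.0, 3))
```

With this choice, hypothesis two uses one mutation parameter instead of $k$, so the two hypotheses differ in the number of free parameters as well as in the ordering constraint.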
To study the mutation mechanism of female breast cancer, we use statistical inference to test whether the optimal fittings of the 2-15 stage models are acceptable for each country. In this work, the Chi-square test is employed to test those optimal fittings. For these countries, insufficient data are available beyond the age of 79 years, so there are only 16 values of the age-specific incidence rates of female breast cancer patients in Egypt, Jordan, Cyprus and Israel. For the 15-stage model, there are 43 unknown parameters. Therefore, it is impossible to examine the 15-stage model directly, because the amount of data is too small. To produce enough data to implement the Chi-square test, we use formula (4) to convert the incidence data set into a data set for the probability of female breast cancer. Since the probability values are always small, we amplify $P(t)$ by a suitable multiple (i.e., $10^3$); we get 49, 47, 48, 46 and 49 data points for Egypt, Jordan, Cyprus, Israel (Arabs) and Israel (Jews), respectively, which exceed the number required for performing the Chi-square test. The goodness of fit is measured by the Chi-square statistic of equation (14), where $P_i$ and $P^*_i$ are the probability of breast cancer derived from the real data and that derived from numerical simulation, respectively. The Chi-square goodness-of-fit test is used to accept or reject the null hypothesis. Our calculated values are compared with the critical values of the Chi-square distribution at the 5% significance level.
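The two computational steps in this section, converting incidence rates to probabilities and scoring a fit, can be sketched as follows. The sketch assumes that formula (4) is the usual relation $P(t) = 1 - \exp(-\int_0^t h(s)\,ds)$ with the hazard held constant within each five-year age group, and that the statistic is the Pearson form $\sum_i (P_i - P^*_i)^2 / P^*_i$ applied to the amplified values. Both are assumptions, as are the function names and the rates in the example.

```python
import math


def cumulative_probability(rates_per_100k, width=5.0):
    """P(t) at the end of each age group from age-specific incidence rates.

    Assumption: the hazard is constant within each age group of the given
    width (in years), so the integral of h is a running sum.
    """
    cumulative_hazard, probs = 0.0, []
    for rate in rates_per_100k:
        cumulative_hazard += (rate / 1e5) * width
        probs.append(1.0 - math.exp(-cumulative_hazard))
    return probs


def chi_square(observed, simulated, scale=1e3):
    """Assumed Pearson statistic on probabilities amplified by `scale`."""
    return sum((scale * o - scale * s) ** 2 / (scale * s)
               for o, s in zip(observed, simulated))


def n_parameters(k):
    """k mutation rates plus (k - 1) growth rates before and after age 45."""
    return 3 * k - 2


# Hypothetical rates per 100,000 for three consecutive age groups.
data = cumulative_probability([10.0, 50.0, 100.0])
model = [p * 1.05 for p in data]  # a stand-in for simulated probabilities
print(chi_square(data, model), n_parameters(15))
```

The parameter count reproduces the 43 unknowns quoted for the 15-stage model, and it shows why the 16 raw incidence values alone cannot support models with many stages.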

Results
The key question in this study is how many driver mutations are needed in the oncogenic process for female patients in the four countries of the MECC, i.e., Egypt, Jordan, Cyprus and Israel. Statistical inference is a useful tool for answering this question. Mortality may decrease if breast cancer is diagnosed at early stages through proper awareness and screening programmes. In this paper, the Chi-square test is employed to test whether the 2-15 stage models are acceptable. The calculations were carried out using the numerical optimization routine fminsearch in MATLAB. Table 1 presents the Chi-square values for the four countries, calculated using equation (14), and compares them with the critical values at the 5% significance level for the 2-15 stage models under the first hypothesis. Models with more than 15 stages give worse fits than the 15-stage model.
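MATLAB's fminsearch is a derivative-free search based on the Nelder-Mead simplex method. For readers without MATLAB, the following is a minimal pure-Python sketch of that method; it is a simplified variant for illustration, not the authors' code and not a reimplementation of fminsearch itself. The function name `nelder_mead` and the toy objective, which stands in for the Chi-square value as a function of the model parameters, are hypothetical.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-12, max_iter=5000):
    """Minimize f from the starting point x0 with a basic simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate.
    simplex = [list(x0)] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)]
        for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        spread = max(abs(a - b) for p in simplex[1:] for a, b in zip(p, best))
        if f(worst) - f(best) < tol and spread < 1e-7:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            # Reflection is the new best point: try going further.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def toy_objective(p):
    """Stand-in for the Chi-square value; its minimum is at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


print(nelder_mead(toy_objective, [0.0, 0.0]))
```

Because the search is local, the result can depend on the starting point, so a fit of a many-parameter model would normally be repeated from several initial guesses.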
For the Egypt data, the Chi-square values of the multi-stage models with 2 to 13 stages are less than the critical values at the 5% significance level, whereas the 14 and 15 stage models give Chi-square values greater than the critical values at the 5% significance level. Therefore, we reject the 14 and 15 stage models and accept the 2-13 stage models. For Jordan, Cyprus, and Israel, however, we reject only the 15 stage model and accept the 2-14 stage models, which is similar to the result for the USA [37]. The 2-14 stage models fit the data very well and give a very small error, where the error is the difference between the real data and the simulated data. The optimal parameters show that the net proliferation rates of premalignant cells in the 2-14 stage models are larger before age 45 than after age 45, which is similar to the result for the USA [37]. The net proliferation rates of intermediate cells with one mutation approach zero. This confirms that normal stem cells need two mutations in tumor suppressor genes to become intermediate cells that expand clonally and destroy the homeostasis of the cells, or else the first mutation occurs in genetic instability genes. From the optimal parameters for the mutation rates, we find that only the first three mutation rates are small, whereas the mutation rates approach one after three stages. Therefore, genomic instability is caused by the first three driver mutations (see supplementary materials). Our results are consistent with those obtained by Li et al. for the USA [37], who used stochastic multi-stage models to determine how many mutations are needed for the breast cancer process. In addition, the 3, 10, 5, 5 and 4-stage models are the optimal models for Egypt, Jordan, Cyprus, Israel (Arabs) and Israel (Jews), respectively. This means that only Egypt shares with the USA the three-stage model as its optimal model for tumorigenesis.
The optimal fittings for the 2-15 stage models are given in Figures 3-7. Models with 2 to 14 stages fit all data very well except the data from Egypt, which cannot be fitted by models with more than 13 stages. Figure 3 displays the optimal fitting of the 2-15 stage models for Egypt for the period from 2008 to 2011. The 14 and 15 stage models cannot fit the data, so we can only accept the mathematical models with 2-13 mutations at the 5% significance level. The optimal fittings of the 2-15 stage models for Jordan from 1996 to 2001 are shown in Figure 4. We accept the 2-14 stage models and reject the 15 stage model at the 5% significance level. The age-specific incidence rates per 100,000 females for breast cancer in Cyprus during 1998-2001 are given in Figure 5. The two to fourteen stage models fit the data very well, while the 15 stage model gives a poor fit with a large error. The age-specific incidence rates for Israel (Arabs) during 1996-2001 are plotted in Figure 6. The 15 stage model leads to a very large error, so we reject it and accept the 2-14 stage models. Figure 7 shows the age-specific breast cancer incidence for Israel (Jews) in 1996-2001. The 2-14 stage models fit the data very well, and we reject the 15 stage model.
We applied hypothesis two, a constant mutation rate per cell per year, to check the genetic instability. This hypothesis means that there is no mutator phenotype but only the clonal expansion of intermediate cells in the process of breast carcinogenesis. The mutator phenotype refers to the increase in the mutation rates of breast cancer cells [45]. Table 2 shows the minimum Chi-square values of the goodness of fit to the data for the 2-6 stage models. Models with more than 6 stages give worse fits than the 6-stage model. We found that models with two to four stages can fit the Egypt, Jordan and Cyprus data, and we reject all models with more than four stages under the selection of clonal expansion of intermediate cells in the process of breast carcinogenesis, because they give a large error. Models with two to five stages can fit the data from Israel. However, models with more than three stages cannot fit the SEER data [37]. We conclude that the first two or three stages lead to genetic instability before a cancer cell is generated. The probabilities under hypothesis two for models with two to six stages are plotted for the four countries in Figures 8-12. In Figures 8-10, the two, three, and four stage models reproduce the data well, while the five and six stage models cannot fit the data well. Therefore, we accept the two to four stage models and reject the models with more than four stages. In Figures 11 and 12, the 2-5 stage models fit the data very well, so we reject all other models under the selection of clonal expansion of intermediate cells in the process of breast carcinogenesis. This hypothesis gives its best results for the data from Israel (Arabs and Jews), because the two to five stage models reproduce these data well.

Discussion
The purpose of this work is to investigate the main differences in the mutation mechanism of breast cancer in women in the four member countries (Egypt, Jordan, Cyprus and Israel) of the Middle East Cancer Consortium (MECC) and to compare them with the results obtained by Li et al. [37] for the United States of America (USA). Li et al. [37] used stochastic multistage models to fit the American data set on age-specific incidence rates of breast cancer, using several coupled ordinary differential equations derived from the Kolmogorov backward equations. We used different mathematical models to estimate the age-specific incidence rates of breast cancer for women in the four MECC member countries. In addition, we used statistical inference to test the number of mutation stages required for female breast cancer. We estimated the optimal parameters using the numerical optimization routine fminsearch in MATLAB to calculate the minimum Chi-square value. We compared the four countries to identify differences in the process of breast carcinogenesis. Such regional comparisons can shed light on health care and population factors, and help to identify common risk factors and to share cancer prevention and control strategies. The differences between countries reflect variation in lifestyle, genetic influences, environmental exposure, and the availability of mammography screening. The maximum number of driver mutations is 13 in Egypt. However, from the goodness of fit to the probabilities of female breast cancer occurrence in Jordan, Cyprus, and Israel (Arabs and Jews), the maximum number of driver mutations for these three countries is found to be 14. These results support the finding of Wood et al. [31] that fewer than 15 mutations are required to develop breast cancer and colorectal cancer.
The optimal models are the 3, 10, 5, 5 and 4-stage models for Egypt, Jordan, Cyprus, Israel (Arabs) and Israel (Jews), respectively. The maximum number of driver mutations is 14 in the USA, and the three-stage model is the optimal model [37]. Therefore, Egypt has the same optimal model as the United States of America (USA), but differs in the maximum number of driver mutations for breast cancer. Genomic instability is a characteristic of almost all human cancers. From the optimal parameter values for the mutation rates of the multi-stage models, we find that only the first three mutation rates are small. Therefore, the mutation of the instability gene occurs within the first three mutations (see supplementary materials). This confirms that normal stem cells need two mutations to become intermediate cells that expand clonally and destroy the homeostasis of the cells, or else the first mutation occurs in genetic instability genes. The mutation rates and the clonal expansion of intermediate cells depend on menopause. Therefore, hormone therapy such as tamoxifen has a good effect on female patients before menopause. By testing hypothesis two, we have checked the selection between the mutator phenotype and the clonal expansion of intermediate cells. For three countries (Egypt, Jordan and Cyprus), we conclude that if there are fewer than five driver-mutation stages, we cannot reject the selection of both the mutator phenotype and the clonal expansion of intermediate cells. If there are more than four stages, we should reject the selection of the clonal expansion of intermediate cells. In fact, if there are more than four stages, carcinogenesis does not need the mutator phenotype, because all mutation rates after four stages approach one. For Israel (Arabs and Jews), however, we should accept the two to five stage models at the 5% significance level and reject models with more than five stages.
According to our results, hypothesis one gives better results than hypothesis two for the four countries, because the 2 to 13 or 14 stage models are acceptable and give a good simulation with a small error.

Fig. 7 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Israel (Jews) from 1996 to 2001 under hypothesis one. The blue line represents the model and red stars represent data from Israel (Jews).
Fig. 8 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Egypt under hypothesis two that all mutation rates are equal. The blue line represents the model and red stars represent data from Egypt.
Fig. 9 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Jordan under hypothesis two that all mutation rates are equal. The blue line represents the model and red stars represent data from Jordan.
Fig. 10 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Cyprus under hypothesis two that all mutation rates are equal. The blue line represents the model and red stars represent data from Cyprus.
Fig. 11 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Israel (Arabs) under hypothesis two that all mutation rates are equal. The blue line represents the model and red stars represent data from Israel (Arabs).
Fig. 12 The simulation of the probability of age-specific incidence rates of all races per 100,000 females for breast cancer in Israel (Jews) under hypothesis two that all mutation rates are equal. The blue line represents the model and red stars represent data from Israel (Jews).