Evidence for Phase Transitions in Replication Fidelity and Survival Probability at the Origin of Life


Highly accurate self-replication of cellular phenotype is a requirement for biological evolution. I previously investigated the degree of self-replication fidelity needed in a viable, evolving population of living cells. Here I present a phase transition approach from non-living chemical complexity to evolving living creatures and illustrate the necessary non-continuity of whatever process led to the origin of evolution. A theoretical approach to the relationship between replication fidelity, survival probability, and the capacity to grow and evolve is presented, consistent with previous data from experimental simulations. The implications for the origin of life, including explanations for non-continuity, are discussed.


Evolution and Inheritance
Inheritance is a crucial component of the evolutionary process, as stressed by Darwin: "But if variations useful to any organic being do occur, assuredly such individuals thus characterized will have the best chance of being preserved in the struggle for life; and from the strong principle of inheritance they will tend to produce offspring similarly characterized" (1 [emphasis added]). In many modern discussions the underlying mechanisms of evolution seem to be thoroughly explained by mutation, which is how new alleles are produced; and by evolutionary processes such as natural selection, which is how the frequencies of the various alleles change over time in a population. In contrast, inheritance is often assumed or taken for granted. However, "Along with metabolism, life is based on another equally-important fundamental principle - inheritance, also described as information that copies itself" (2). Evolution requires that alleles be inherited in order for the "over time" part of the definition to hold. It is the inherited alleles (the variations in specific genes) that determine all the characteristics of the organism (the phenotype), which in turn are the targets of natural selection.

Evolution and Phenotype Self-Replication
In all modern life, there is a threshold of replication fidelity below which cells cannot survive. At high mutation rates modern organisms undergo an error catastrophe from which they cannot recover, as has been shown for viruses (3,4), aging (5), evolution (6), and macromolecular replication in early life (7,8). The existence of this upper limit for mutation rate has been used as an antiviral strategy (3). It is also part of the relationship between mutation rate and evolution (9). The idea that evolution can occur at any starting level of replication fidelity and survival probability is illustrated in Figure 1A. The parameter P can be defined as the summation of all the ingredients of evolutionary fitness (such as energy capture and usage, metabolic efficacy, homeostasis, etc.) into a single parameter: the probability of survival of each cell in a population between cell divisions. Replication fidelity (F) is defined as the probability that the phenotype of a cell is replicated in both daughter cells with complete accuracy. This allows for a quantitative measure of replication fidelity from 0 to 1. The mutation rate is a function of 1 - F. The arrows in Figure 1 indicate the evolutionary process leading to steady increases in the values of replication fidelity (F) and survival probability (P). The literature on origin of life generally assumes that the situation depicted in Figure 1A is correct, since it is often assumed that at some point during abiogenesis, evolution by natural selection emerged spontaneously from protolife chemistry, regardless of inherent fitness for survival and replication (10-12). However, given the necessity for a minimal replication accuracy in modern life and a principle of uniformity, it appears legitimate to presume a minimal threshold of replication fidelity in early life as well; the burden of proof seems to be on those who would deny this assumption.
One way of addressing this issue is to consider that this threshold represents a sort of phase transition, in the sense that values below it cannot be increased by any known evolutionary process. At the phase transition continued growth of a population and evolution become possible. A simple version of such a phase transition is shown in Figure 1B, with a threshold of 50% probability of perfect replication. Below that value no population growth or evolution is possible. Since the other major determinant of overall fitness is the probability that an early cell will survive long enough to reproduce (P), it is worth asking whether such thresholds and phase transitions might also apply to this parameter. For example, in Figure 1C, we see an illustration of thresholds in both F and P at 0.5, leaving a relatively small area from which future evolution is possible.

The Continuity Principle
The possibility of thresholds and phase transitions in biological evolution appears to contradict what is known as the continuity principle, or the "…general Darwinian principle… [that] evolution must proceed via consecutive, manageable steps, each one associated with a demonstrable increase in fitness" (13,14). The same idea is sometimes referred to as gradualism. Figure 1A can be said to reflect perfect continuity, while Figures 1B and 1C violate that principle. While Darwin envisaged a strict requirement for very small steps of increasing fitness, we now know that there are many exceptions to this rule. Major evolutionary changes can occur that lead to saltation, a sudden and dramatic increase in fitness (15-18), violating the continuity concept. An example of saltation in evolution is the incorporation or endosymbiosis of small, highly energy efficient bacteria into larger cells (19,20). There is strong evidence that there were several whole genome duplication events at critical junctures in evolutionary history, including the origin of the vertebrates (21,22). The rapid radiation of the Cambrian explosion (23-25) also seems to have diverged from the strict continuous process of gradual increases in fitness. Mechanisms for these violations of the continuity principle are now part of standard evolutionary biological theory (26-28).

Continuity During the Origin of Life
When we try to understand the origin of life, evolutionary continuity is an important issue (29). There are many required steps to reaching a living cell that have remained mysterious for many decades (30). One of the most difficult of these is the origin of biological evolution, which allows for the transition from chemical to biological complexity (18) and the ensuing development of biological processes such as replication, metabolism, and energy usage. The obvious conundrum is how evolution could have evolved before evolution existed. As Eugene Koonin states: "The crucial question, then, is how was the minimal complexity attained that is required to achieve the threshold replication fidelity" (31). The idea that a continuity exists in the growing complexity of chemical assemblies leading to the emergence of fully biological features, including the biochemical mechanisms required for evolution, has not been tested or rigorously investigated.
[Figure 1 caption, partially recovered: "…Below that threshold, no growth or further evolution is possible, independent of P. C. Phase transition at F = 0.5, P = 0.5. The threshold for life is a function of both P and F, at a fixed value."]

Review of Previous Work: The Continuity Principle and the Evolution of Replication Fidelity
In a recent paper (32), I investigated one aspect of continuity in the evolution of life, using a stochastic statistical model to try to understand whether continuity is a reasonable hypothesis for the development of replication accuracy. The model produced simulations of cell division and population growth with variations of the two key parameters, P and F (defined above). It is important to note that the model is based on results starting from a single cell among a population. The model confirms that if either P or F for this ancestral cell is < 1, each succeeding generation will contain cells with diverse values for each parameter, and this diversity will continue to expand with each generation. In the paper, the analysis followed every possible starting value of P and F to determine the fate of these diverse populations with time. This investigation confirmed the idealized scenario portrayed in Figure 1C as correct in principle and provided details on quantitative aspects of the thresholds in the values of P and F necessary for the onset of biological evolution. Here I present the data from that earlier study in the form of a phase transition diagram as shown in Figure 2. An illustration of the model used for the simulations in ref. 32 as well as for the present theoretical investigation is shown in Figure 3. If the probability of perfect replication (F) is < 1, then the model assumes a probability of 1 - F that a change in the value of P will occur in the daughter cells. See below for more details. The averaged cell trajectory data obtained from simulation experiments (32) were fitted to a growth curve to allow for determination of the growth rate constant K, a measure of overall fitness, as a function of P and F.
The curve allowed for the determination of an empirical formula (eq. 1). As discussed in reference 32, the second term in this equation is related to the probability of death (1 - P) of one of the 4 descendants of any pair of cells in the previous generation.
The minimum values of F and P to allow for K > 1 (or survival of the population) can be calculated from eq. 1. The lower the value of F, the higher the initial survival probability P must be to maintain a positive growth rate. For low initial P values (< 0.4), the evolution of increased replication fidelity requires large jumps in F that suggest a saltational evolutionary path. Furthermore, the existence of continuity from low P to higher values depends on the simultaneous value of F. Even at perfect replication fidelity (F = 1), the minimal level of P must be at least 0.55 (consistent with the simulation results reported in ref. 32) in order to allow for any possibility of growth, survival, and evolution. The present communication seeks to expand the original study and provide a more theoretical basis for the conclusions, as well as to investigate the role of the mutational effect magnitude (M) and the deleterious to beneficial mutation ratio (D). A broader analysis of the implications of these results is presented in the Discussion section.

Methods
The following assumptions were made for the model:
1. Cells divide into two new daughter cells, whose degree of accurate inheritance of phenotype from the parent cell is determined by the parameter F.
2. There is a period of cell growth between each division, during which cells either perish or survive, determined by the parameter P.
3. The two probabilities of survival and fidelity of replication (P and F) can be varied independently and each one of them can take on any value between 0 and 1.0.
For modern life based on double-stranded DNA, each daughter cell has an equal chance of mutation, as each strand is replicated. The model applied in this paper, however, is neutral with respect to the molecular basis of cell replication. For example, if the genetic material were single stranded, or some other process (such as budding rather than cell division) occurred at early stages of protolife, then it is possible that only one daughter cell would use a copy of the genetic information, while the other daughter cell would be identical to the parent. This is a conservative assumption that leads to a much higher level of replication fidelity than the more common double-stranded replication model. Therefore, the results using this mechanism-neutral model overrepresent the degree of replication fidelity and underrepresent the level of cell death from deleterious mutations. Each cell divides into two cells (A and B cells). The B cells arise from the copying of information and phenotype of the parent. A cells have a probability P of surviving to the next generation, so that the value of P will either be the same as the parent or 0 (if the cell dies). B cells will have the same value of P as the parent or a different mutated value, depending on the initial value of F, as well as D and M. See Figure 3 for an illustration of this model, and Table 1 for definitions of all symbols used.
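The A/B division scheme described above can be sketched as a small stochastic simulation. This is only an illustration under stated assumptions, not the model code used in ref. 32: the function names `mutate` and `simulate` are hypothetical, and the mutation rule (a step of size M that is deleterious with probability D/(1 + D), clamped to the range 0 to 1) is an assumed reading of the parameters D and M.

```python
import random


def mutate(p, d, m, rng):
    """Return a mutated survival probability (assumed rule, not from the paper)."""
    # With D deleterious mutations per beneficial one, a mutation is taken
    # to be deleterious with probability D / (1 + D).
    step = -m if rng.random() < d / (1.0 + d) else m
    # Keep the result a valid probability.
    return min(1.0, max(0.0, p + step))


def simulate(p0, f, d, m, generations, seed=0):
    """Return the population size after each generation, starting from one cell."""
    rng = random.Random(seed)
    population = [p0]  # each cell is represented by its own P value
    sizes = []
    for _ in range(generations):
        next_population = []
        for p in population:
            # A daughter: inherits the parent's P without error.
            daughters = [p]
            # B daughter: copied with fidelity F, otherwise mutated.
            daughters.append(p if rng.random() < f else mutate(p, d, m, rng))
            # Each daughter survives the growth period with its own P.
            next_population.extend(q for q in daughters if rng.random() < q)
        population = next_population
        sizes.append(len(population))
    return sizes


if __name__ == "__main__":
    print(simulate(p0=0.7, f=0.8, d=20.0, m=0.1, generations=8))
```

With P = 1 and F = 1 the population simply doubles each generation, and with P = 0 it dies out immediately; intermediate values give seed-dependent trajectories that would need averaging over many runs before any growth constant could be estimated.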

Phase Transition Diagrams
Phase transition curves are plots of the minimal values of P and F that produce a value of K > 1.0. Regions below and to the left of these curves are shaded to indicate a non-evolving, non-growing phase, while those above and to the right of the curve include arrows indicating the possibility of further evolution of these parameters to higher levels.
As in all phase diagrams, the curve represents a sharp boundary between the two phases, in this case between pre-living chemical systems and modern, evolving, growing life.

Derivation of the Theoretical Model
The results presented here, unlike those in my previous paper (32), are not based on data from simulation experiments but on purely theoretical calculations of cell numbers given the values of the parameters P, F, D, and M. Each cell divides in two, with one cell retaining the genetic information of the parent, and the other cell acquiring a copy of that information. The two cell types are labeled A and B, respectively. If the replication of the parent cell is perfect (F = 1.0), then both A and B cells will be identical clones of each other and the parent cell. In the next generation, both A and B cells divide, each also giving rise to A and B cells. In every generation there will be 4 kinds of cells: AA are A cells that came from an A cell parent; AB cells are A cells from a B cell parent; BA cells are B cells from an A cell parent; and BB are B cells from a B cell parent. The total number of A cells in each generation is the sum of AA and AB cells, and the total B cell count is the sum of BA and BB cells. In the next generation the number of AA cells is C_t = P(A_{t-1}), or the probability of survival times the number of all A cells in the previous generation. For the number of AB cells at generation t, C_t = P(B_{t-1}). Since all A cells (both AA and AB cells) inherit the P value from their parents without error, the parameter F is not included. For B cells, the cells whose phenotype depends on the fidelity of replication, the formula for cell numbers involves all the factors involved in accurate replication. These include the degree of replication fidelity, F; and the phenotypic consequences of errors, approximated by the parameters M (the magnitude of the effects of a mutation) and D (the ratio of deleterious to beneficial mutations). Any phenotypic change due to a mutation is incorporated as a change in the value of P.
The new value of P (P_m) is also subject to mutation in further generations of B cells. The range of possible values of P_m assuming a mutation occurs is determined by the magnitude of the mutation effect as well as the ratio of the number of deleterious to beneficial mutations that are possible. Values for D in modern cells are on the order of 20-50 to 1 (33), but that figure is highly variable. It could have been much lower during early life; that is, there might have been a higher frequency of beneficial mutations at that time.
The maximum (max) value for the range of P_m is given by: and the minimum (min) value by The value of P_m for the mutated B cells is the midpoint of the range given by Then, by substitution, we have: The cell number (C) for the mutated BA cells is given by: and similarly for the total number of BB cells at time t; the total cell count at time t (C_t) is: Combining the equations for all 4 cell types and rearranging: The growth constant K is given by: where K ≤ 1 signifies no growth of the population.
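The displayed equations for this derivation are not reproduced in the text above, so the following is only a hedged reconstruction of the bookkeeping, not the author's equations. It assumes that a fraction F of B daughters keep the parental P, that the remaining fraction 1 - F all take the single midpoint value P_m, and that P_m is shifted from P by M(1 - D)/(1 + D). That shift is an assumption, chosen because it reproduces the slope and intercept quoted for the C_t vs. M relationship in the section on mutation parameters. Clamping of P_m to the range 0 to 1 and further mutation in later generations are ignored.

```latex
\begin{align*}
AA_t &= P\,A_{t-1}, \qquad AB_t = P\,B_{t-1}
  && \text{(stated in the text)} \\
BA_t &= \bigl[F P + (1-F)\,P_m\bigr]\,A_{t-1}, \qquad
BB_t = \bigl[F P + (1-F)\,P_m\bigr]\,B_{t-1}
  && \text{(assumed)} \\
C_t &= AA_t + AB_t + BA_t + BB_t
     = \bigl[P(1+F) + (1-F)\,P_m\bigr]\,C_{t-1},
  && C_{t-1} = A_{t-1} + B_{t-1} \\
P_m &= P + M\,\frac{1-D}{1+D}
  && \text{(assumed)} \\
\frac{C_t}{C_{t-1}} &= 2P + M\,\frac{(1-F)(1-D)}{1+D}
\end{align*}
```

Under these assumptions the M-dependent term vanishes at F = 1 or D = 1 and changes sign at D = 1, which is in line with the reversal of the effect of M described for values of D below 1.0.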

Comparison of Theoretical vs Simulated Stochastic Models
The theoretical equation for the prediction of growth rate derived from the dynamics of cell division was found to be very similar to the previously reported empirical equation based on simulated experimental data using a stochastic statistical model (32). In both cases the central component of the relationship between survival probability (P) and replication fidelity (F) is given by C = P(1+F), confirming a dominant role for survival compared to replication fidelity. The theoretical model allows for a more in-depth examination of the role of extended generation time and phenotypic parameters such as M and D in determining the existence and nature of any phase transitions in the origin of modern life.
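As a rough illustration of the central term only, the sketch below computes the smallest P for which P(1 + F) exceeds 1 at a given F. It assumes that P(1 + F) acts as a per-generation multiplication factor and that growth requires this factor to exceed 1; the function names are hypothetical, and the remaining terms of the empirical and theoretical equations are deliberately left out.

```python
def min_survival_probability(f):
    """Smallest P at which the central term P * (1 + F) reaches 1."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("F must lie in [0, 1]")
    return 1.0 / (1.0 + f)


def can_grow(p, f):
    """True if the central term alone predicts a growing population."""
    return p * (1.0 + f) > 1.0


if __name__ == "__main__":
    # Tabulate the boundary for a few fidelity values.
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"F = {f:.2f}  minimal P = {min_survival_probability(f):.3f}")
```

At F = 1 this simplified boundary sits at P = 0.5, slightly below the minimum of 0.55 reported for the empirical formula, so the central term on its own should be read as a lower bound on the required P rather than as the full transition curve.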

Effects of Mutation Parameters
Confirming the limited data obtained from the simulation experiments (32), no effect of generation time or cell population size was seen on cell counts or growth constants. Both D and M showed moderate effects on population growth rates, and slight changes in the shape of the transition curve were observed with values of D below 20 as seen in Figure 4. No effect of M on cell counts was seen at D > 20, and while M had some minor influence on cell counts, especially at low values of D, in no case did M have any influence on the phase transition curve. Figure 4B shows no differences in the phase transition curve between D values of 0.5 and 0.2 (i.e., 2 and 5 times as many beneficial as deleterious mutations, respectively). At values of D below 1.0 (more beneficial than deleterious mutations), the effect of the M metric reversed (see eq. 7 below), and higher values of mutational impact led to slightly higher cell counts; but, again, M played no role in the fairly sharp phase transition curves at these low levels of D. The effects of D and M can be understood by rearranging eq. 5 to illustrate the influence of these two parameters on cell counts as shown in eq. 7.
A plot of C_t vs. M shows a linear relationship with slope = (1 − F)(1 − D)/(1 + D), and intercept = 2P. The data shown in Figure 4 suggest that while the phase transition leading to the possibility of further evolution and stable life forms is slightly pushed back at lower values of D (indicating a greater proportion of beneficial to harmful mutations than ever seen in modern life), there is still a hard limit at a value of P around 0.5, below which no smooth evolutionary pathway is possible. While lower D appears to straighten the transition curve, thus slightly decreasing the effects of replication fidelity, both P and F play a role throughout the range of all these parameters in determining the point at which the transition from no possibility of stable cell growth to ongoing continued evolutionary progress can occur. These theoretically derived results confirm the more limited data on the effects of these phenotypic parameters from simulation experiments reported previously (32).
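The linear relationship just described can be written out directly. The sketch below uses only the slope and intercept quoted in the text; the function names are hypothetical, and the returned value is in whatever units the cell count C_t carries in eq. 7.

```python
def mutation_slope(f, d):
    """Slope of C_t against M: (1 - F)(1 - D) / (1 + D)."""
    return (1.0 - f) * (1.0 - d) / (1.0 + d)


def cell_count(p, f, d, m):
    """Linear form with intercept 2P and the slope above."""
    return 2.0 * p + mutation_slope(f, d) * m


if __name__ == "__main__":
    # Compare a modern-like ratio, the neutral case, and a beneficial-rich case.
    for d in (20.0, 1.0, 0.5):
        print(f"D = {d:>4}: slope = {mutation_slope(0.8, d):+.4f}")
```

The slope is negative for D above 1, zero at D = 1, and positive for D below 1, which matches the reversal of the effect of M reported at low D. It also shrinks toward zero as F approaches 1, so M matters least when replication is most accurate.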

Discussion
Among the unique characteristics of life that had to arise early in protolife is self-replication with some degree of accuracy. Only living cells replicate themselves. We do not know how that amazing ability arose within collections of molecules encased in membranous sacs, but we know it had to happen, since the existence of any kind of stable life as well as evolution requires accurate self-replication. The origin of self-replication has been studied using statistical and mathematical approaches (34). Self-replicating molecules (genes) are not enough for evolution to occur. It is the phenotype of an organism (all of its characteristics) that interacts with the environment to determine how well a cell will thrive, and the probability that it will survive long enough to reproduce. Since the cell phenotype is the target of natural selection, a mechanism to replicate the phenotype is critical to allow for evolution by natural selection. If a creature undergoes a change that improves its fitness, it will survive longer and perhaps reproduce more successfully. But if that improvement in fitness is not inherited by its offspring, no evolution is possible. And for inheritance to happen, replication of the phenotype must be accurate enough. Therefore, replicating informational molecules (such as RNA or DNA) must have the capacity to direct phenotype replication. This is achieved in all modern life by a linkage between the genotype and phenotype by use of the genetic code and a complex and elaborate process known as translation or protein synthesis. Both genotype (DNA) replication and phenotype replication by translation are extremely accurate in all modern cells. The genes and proteins of cells are replicated with over 99.9999% accuracy (35). What is unclear is how such high levels of replication fidelity could have evolved at the beginning of life, before the existence of modern highly accurate self-replication systems.
Some theories related to error-prone evolutionary mechanisms or chemical selection processes have been proposed to explain how early, pre-biological evolutionary cells could have evolved (36,37), but there is little understanding of what the minimum level of fidelity might be. The requirement for minimal replication fidelity to allow for survival and evolution in early cells with a minimal size of replicating informational nucleic acid has been estimated to be equal to 1-(1/L), where L is the length of the information molecule polymer. For a short RNA strand of 50 nucleotides (assumed to be the minimal size for a functional molecule), this would imply a minimal fidelity of 98% (38). This communication follows previous work (32) trying to assess the minimum level of both replication fidelity and survivability needed to allow for a smooth continuous evolutionary process leading to modern levels of complexity. The working hypothesis was that there is a threshold or phase transition of each parameter, below which evolution to improvement in these two characteristics of early cells is either difficult or requires saltation events that violate the continuity principle. The earlier work supported this hypothesis and found that there is a quantitative relationship between values of P and F that define that threshold (eq. 1). However, the earlier work did not explore the roles of mutational effect magnitude or the likelihood of an increased proportion of beneficial to deleterious mutations during early life. The results were also based entirely on empirical data from a simulated cell division model and were not theoretically derived from the basic facts of cell division. In this report, data are presented in terms of a phase transition between the survival/fidelity space that allows for smooth evolution to continuous improvement and the space from which only a saltation event can bridge the transition from discontinuity to evolutionary continuity. 
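The length-based estimate quoted above, a minimal fidelity of 1 - (1/L), is easy to check numerically. The helper below is a hypothetical illustration of that arithmetic only; it says nothing about how L itself should be chosen.

```python
def minimal_fidelity(length):
    """Minimal replication fidelity 1 - 1/L for an information polymer of length L."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 1.0 - 1.0 / length


if __name__ == "__main__":
    # Tabulate the estimate for a few polymer lengths.
    for length in (50, 100, 1000):
        print(f"L = {length:>4}: minimal fidelity = {minimal_fidelity(length):.4f}")
```

For L = 50 this gives the 98% figure cited in the text, and the required fidelity rises toward 1 as the informational polymer gets longer.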
The scenario depicted in Figure 1A, which implies that no such phase transitions exist, and evolution can occur smoothly from any starting point, has not been supported by either the simulation data (32) or by derived theory. The results presented here from theoretical extrapolations of the basic principles of binary cell division and phenotype inheritance demonstrate that even at highly unrealistic levels of beneficial mutations, a clear phase transition is seen. The position and shape of the transition curve are modified only slightly by changes in D. Values of D below 0.2 (5 times as many beneficial as deleterious mutations) seem unrealistic for any early life scenario, even if more beneficial mutations might have been possible at that time than in the present. Furthermore, the mutational effect magnitude, which only has influence on cell growth at very low values of D, produced no effects on the transition boundary under any condition. The model used for both simulation and theoretical results is independent of the specific biochemical mechanisms leading to cell survival and replication and is therefore applicable to any cellular replicative system, including RNA or RNA-peptide worlds or other as yet unknown primitive systems for the replication of cellular components and characteristics. No assumptions were made regarding the molecular mechanism of replication or even the nature of the informational molecule that allows for replication of a cell's phenotype, and it is possible that a much more error-prone system may have begun cell self-replication at the dawn of life. For protocells or early life, it is also plausible to assume that survival probability and replication fidelity were both far from the very high values they hold today, likely closer to 0 than 1. 
The idea that the earliest living protocells could have naturally or spontaneously possessed a relatively high degree of survival probability and replication fidelity without any evolutionary improvements of starting conditions is generally considered impossible and has never been seriously considered. The evidence presented here strongly supports the idea that evolution to greater replication fidelity and survivability required some saltational mechanisms in order to overcome phase transition boundaries. This in turn prevents the application of the evolutionary continuity principle to at least some crucial aspects of the origin of life, consistent with other saltation events known to occur during evolution, such as endosymbiosis (39) or whole genome duplication. Standard discussions of phase transitions in chemistry generally involve the term emergence to describe the sudden, non-continuous appearance of new properties that accompany the transition from one phase to another. The example of the three phases of water (vapor, liquid, and solid ice) is often used when illustrating the concept of emergence at phase transitions due to changes in pressure and temperature. The current work describes a phase transition leading to the emergence of at least one set of new properties, namely the ability of a population of cells (collections of complex chemicals enclosed in a membrane) to begin to grow and evolve, as a natural event related to changes in survival probability and replication fidelity. The mechanisms by which such changes can occur in the absence of biological evolutionary mechanisms are unknown. It seems likely that during the origin of life, there were numerous such phase transitions related to a host of biological properties such as efficient energy harnessing, the emergence of an informational genetic code (the first example of symbolic information in the universe), the emergence of metabolic regulation, and so on.
This reality could at least partially explain the apparently enormous challenges facing researchers trying to decipher the origin of life. To quote Paul Davies: "Asked whether physics can explain life, most physicists would answer yes. The more pertinent question, however, is whether known physics is up to the job, or whether something fundamentally new is required" (40). While the concept of emergence is a useful phenomenological description of what happens at phase transitions, it is quite likely that further progress in elucidating the emergence of biology from chemistry might require the use of radically new perspectives on possible biological mechanisms, including teleology (41,42), which are beyond the scope of this report.