Microsatellite analysis of genetic diversity in Czech populations of European beech (Fagus sylvatica L.)


 European beech (Fagus sylvatica L.) is one of the most important broadleaved tree species in Europe both ecologically and economically. Nowadays, in the Czech Republic, beech is underrepresented in forest tree species composition, and there are tendencies to increase its proportion. When reintroducing beech, genetic variability, along with other factors, play a key role. The main aim of this study was to evaluate the genetic diversity of ten selected indigenous beech populations across the Czech Republic. Two hundred and fifty individuals were genotyped on 21 polymorphic nuclear microsatellite markers, which were amplified using two newly assembled multiplexes. According to the results, observed heterozygosity (Ho
 ) among populations ranged from 0.595 to 0.654 and expected heterozygosity (He
 ) from 0.650 to 0.678. That is comparable with the findings in other European studies. The high discriminatory power of the assembled multiplexes was confirmed by calculating the Probability of Identity among both unrelated and related individuals. Principal Coordinate Analysis (PCoA) based on Nei's genetic distances revealed that there are genetic differences among populations resulting in three approximate clusters (geographically north, south-east, and south-west). Nevertheless, the results implicate that on a geographical scale of the Czech Republic, the distance is unlikely to be the primary driver of genetic differentiation.


Introduction
European beech (Fagus sylvatica L.) is one of the most widespread broad-leaved tree species in Europe, covering circa 12 million ha of land (Buitelveld et al., 2007). It is characterized by an extensive ecological niche, high competitiveness and physiological tolerance against a wide range of environmental factors, making beech a vital tree species both ecologically and economically (Dittmar et al., 2003;Pastorelli et al., 2003;Leuschner et al., 2006;Lefèvre et al., 2012). In many regions of Central Europe, including the Czech Republic, beech covered naturally large areas. However, due to medieval deforestation and revenue-oriented forest management, many of the natural beech stands have been replaced by spruce (Picea) or pine (Pinus) species (Paule, 1997;Hasenkamp et al., 2011). In the Czech Republic, the natural representation of beech Forestry Studies | Metsanduslikud Uurimused,Vol. 73, Research paper

Microsatellite analysis of genetic diversity in Czech populations of European beech (Fagus sylvatica L.)
was 40.2% compared to its current deficient proportion of just 8.6% which represents 223,611 ha (Fulín et al., 2016;Ministry of Agriculture, 2019). Nowadays, however, there are trends to put beech back into the forests, and it is frequently used together with other broadleaves for reforestation (Tomášková, 2004). The recommended proportion of beech in the tree species composition makes 18% (e.g. approximately 470,000 ha), which means a future increase of about 9.5% (Ministry of Agriculture, 2019). The preferred seed sources should provide the genetic pre-conditions for the best possible adaptation to the respective local conditions and presumed climatic shifts (Hasenkamp et al., 2011;Gömöry et al., 2015). Mainly because of ongoing climate change, the study of genetic variation and differentiation of beech populations is crucial for determining the adaptive potential of the species (Jump et al., 2006a(Jump et al., , 2006bSjölund & Jump, 2015;Kempf & Konnert, 2016). In the Czech Republic, supposedly native populations are often located in the border regions of the country. Moreover, these sources are usually under a particular degree of forest protection. Knowledge of the genetic structure of indigenous populations might reveal the significant indicator of the best-adapted populations for a given site. Therefore, efficient methods for reliable identification of potentially well-preadapted seeds are needed. In the past, beech has been studied by various genetic markers such as isozymes (Thiebaut et al., 1982;Comps et al., 1991;Degen & Scholz, 1998;Božič & Kutnar, 2012), Random Amplification of Polymorphic DNA (RAPD) and Amplified Fragment Length Polymorphism (AFLP) (Scalfi et al., 2004), nuclear, chloroplast and mitochondrial microsatellites (Pastorelli et al., 2003;Asuka et al., 2004;Vettori et al., 2004;Vornam et al., 2004;Magri et al., 2006;Hasenkamp et al., 2011;Lefèvre et al., 2012, Pluess & Määttänen, 2013 and Single Nucleotide Polymorphism (SNP) (Seifert et al., 2012). Out of all the markers mentioned above, nu-clear microsatellite markers (SSRs, Simple Sequence Repeats) represent an ideal tool for studying genetic diversity, considering their high abundance in genomes, a high degree of polymorphism, neutrality and locus-specificity (Pastorelli et al., 2003). In the past, several studies used multiplexed microsatellite loci to determine genetic variability on a population level (Tanaka et al., 1999;Pastorelli et al., 2003;Asuka et al., 2004;Lefèvre et al., 2012;Plues & Määttänen, 2013). This paper presents the results of nuclear-microsatellite analyses of beech populations from ten regions in the Czech Republic. To evaluate their population genetics parameters, we used two newly modified multiplexes (13-and 8-plex). We focused on the following questions: (i) Are there significant differences between local beech populations across the Czech Republic? (ii) Can we eventually identify substantial differences among the targeted population? (iii) What is the discrimination power of our marker assemblage?

Study site and sampling
We identified ten beech stands across the Czech Republic (Table 1, Figure 3) aiming at populations that are considered to be locally adapted and superior under local conditions. Each of the selected populations was represented either by 20 or 30 mature individuals according to the site characteristics, which turned into 250 tested individuals in total. Sampling was carried out in 2017. The minimal distance between sampled trees was 30 m to avoid sampling among closely related individuals (von Wühlisch, 2008). For each individual, we collected unburst buds and extracted the vascular cambium with a hole punch. Plant tissues were labeled appropriately and placed in plastic bags with silica gel. Then they were long-term stored in a freezer.

DNA extraction
Initially, both the vascular cambium and leaf buds were tested for DNA extraction. Even though both types of tissues provided identical genetic profiles, we decided to extract the DNA from the buds, because of its higher purity and yield. From each individual, approximately 2 or 3 unsterilized fresh buds were cut into small pieces and placed into 2 mL safe-lock tubes (Eppendorf, Germany). Two 3 mm tungsten beads were added in each tube and frozen in liquid nitrogen before grinding (1 min, 30 Hz) using a mixer mill MM400 (Retsch, Haan, Germany
** Final concentration in each primer premix forward + reverse.

PCR
Amplification was performed in two multiplex polymerase chain reactions (Mas-tercycler® nexus, Eppendorf, Germany). The PCR mixture for multiplex 1 was performed in a total volume of 11.4 μL containing 1 μL template DNA (c = 20 ng/μL), 5.7 μL of Type-it® solution (Qiagen, Hilden, Germany), 0.26 μL sterile water and a total volume of 4.44 μL primer premix. The PCR for multiplex 2 was performed in a total volume of 6 μL, consisting of 1 μL template DNA (c = 20 ng/μL), 3 μL of Type-it® solution (Qiagen, Hilden, Germany), 0.14 μL of sterile water and 1.86 μL of primer premix. Concentrations and label dyes for each primer pair in the final primer premix are shown in Table 2. Both multiplexes had the same PCR conditions as follows: an initial incubation (5 min at 95 °C) was followed by 33 cycles consisting of the denaturation (45 s at 95 °C), annealing (45 s at 60 °C) and extension phase (60 s at 72 °C). Amplification was terminated with the final extension step (30 min at 72 °C). The fluorescently labelled PCR products, together with an internal size standard (GMC-GT500 LIZ) were processed on a 3500 Series Genetic Analyzer® (Applied Biosystems, Foster City, CA, USA).

Data analysis
Allele identification and genotyping were done using the GeneMarker® Fragment Analysis software (SoftGenetics, State College, PA, USA). All alleles were subsequently checked manually to minimize genotyping errors. Allele frequency analysis was performed using CERVUS v. 3.0.7 (Kalinowski et al., 2007) in order to identify the number of alleles per locus. Also, summary statistics were calculated for each locus, such as observed heterozygosity (H o ), expected heterozygosity (H e ), polymorphic information content (PIC) and fixation index (F IS ).
For the analysis of population genetic parameters, we used software GenAIEx v. 6.5 (Peakall & Smouse, 2006. To group populations according to their pairwise distances in a distance matrix, a Principal Coordinate Analysis (PCoA) was carried out. The distribution of total variance at different hierarchical levels was quantified with an Analysis of Molecular Variance (AMOVA). We also evaluated the discrimination power of assembled multiplexes. Therefore, the Probability of Identity (PI) for unrelated and related (PI sibs ) individuals was calculated, which represents a measure for the statistical power of individual identification with the chosen marker combination (Hartl & Clark, 2007).

Allele frequency analysis
All samples (n = 250) were genotyped on 21 SSR loci grouped into two multiplexes ( Table 3). The number of genotyped individuals on each locus (N) varies because some of the samples did not show any apparent alleles on a particular locus. We did not detect vegetative copies among the samples. We observed the most significant dropout of samples (56 individuals) on locus Fagsyl_003093.
The total amount of 208 alleles was detected. The number of alleles per locus (k) ranged from 4 to 22 with a mean value of 9.905 ± 4.098. The highest count of alleles per locus (22) was found at mfc5 where we also observed significant deviation from Hardy-Weinberg equilibrium. The expected heterozygosity (H e ) ranged from 0.436 (csolfagus_29) to 0.906 (mfc5), with an average of 0.685 ± 0.134. The observed heterozygosity (H o ) ranged from 0.216 (Fagsyl_003093) to 0.859 (csolfagus_06) with an average of 0.625 ± 0.183. The mean polymorphic information content (PIC) across all loci was 0.649 ± 0.146. Most loci did not show a significant deviation from the expected Hardy-Weinberg frequencies, except 5 loci (mfc11, FS1-03, mfc5, Fagsyl_001018, Fagsyl_003093). We also detected 6 loci with a significant occurrence of null alleles (α = 0.05, marked bold in Table 3).
Private alleles were found in all analyzed populations except populations 5 and 10. The frequencies of rare alleles were generally low and ranged between 0.017 and 0.077, with an average of 0.033 ± 0.019. The highest amount of private alleles was observed in populations 1 and 9. Private alleles with the highest frequencies were found in population 2.

The discriminatory power of multiplexed SSRs markers
In order to evaluate the discriminatory power of used multiplexes, the Probability of Identity (PI, GenAlEx v. 6.5) was carried out. Figure 1 shows the decreasing probability of identity with an increasing number of markers used (e.g., the effect of cumulative markers).
In our study, we also evaluated a modified version PI sib for estimating the probability of identity when related individuals are included in the sample. When using 5 randomly selected loci, the probability of statistically significant distinction was for unrelated individuals PI = 7.2 x 10 -4 and for related individuals PI sib = 0.04. With 8 loci used the probability decreases to PI = 3.13 x 10 -7 and PI sib = 2.2 x 10 -2 . With 15 loci the probability further declines to PI = 4.97 x 10 -15 and PI sib = 3.78 x 10 -6 . For an entire set of 21 loci, the observed PI for un-related individuals was PI = 1.85 x 10 -20 and for related individuals PI sib = 2.77 x 10 -8 . These values confirm a high discriminatory power of used multiplexes even when family structures are expected among samples and the discriminatory power of loci is therefore significantly lower.
The Principal Coordinate Analysis was performed based on genetic distances between the populations (e.g. the pairwise population matrix of Nei's genetic distance, Table 5). The result of PCoA is presented in Figure 2. The first two coordinates explain 35% of the total variation. PCoA sorted the populations in three groups, which can be simplified as south-west (Pops. 5, 4, 2, 3 and 7), north (Pops. 1 and 10), and southeast (Pops. 6, 9 and 8). Population 9 was the most distant from the others. The approximate regionalization of the populations is shown in Figure 3. However, in the scale of the Czech Republic, the geographical distance cannot be seen as the main driver of the genetic variation between the tested indigenous populations. AMOVA indicated that 98% of total molecular variance occurs within populations. Only 2%of the variation can be observed on an interpopulation level. It corresponds with the previous findings that a significant part of the genetic variability of forest trees is attributable to the intrapopulation level.     Table 1).  (Nei, 1972(Nei, , 1987 (Vornam et al., 2004;Hasenkamp et al., 2011). However, we also observed a deviation from Hardy-Weinberg equilibrium on this locus as well as a higher rate of null alleles occurrence (F(Null) = 0.1504). Null alleles above the threshold α = 0.05 were also observed on loci mfc11, FS1-03, mfc7, Fagsyl_001018 and Fagsyl_003093. As confirmed in a review assembled by Janković et al. (2019), in many studies, null alleles were commonly observed on both mfc5 and FS1-03. Besides those, one study observed null alleles also on loci csolfagus_06, csolfagus_19 and csolfagus_31 where we failed to do so. Loci mfc11, FS1-03, mfc5, Fagsyl_001018 and Fagsyl_003093 exhibited high F IS values. During scoring locus csolfagus_05 exhibited split peaks and locus Fagsyl_003093 created an artefact allele. Loci sfc0036 (Asuka et al., 2004), mfc5, mfc7 and mfc11 (Tanaka et al., 1999) were originally developed for related species Fagus crenata Fagus Blume and successfully used also for F. japonica Maxim. and F. sylvatica L.. It can be due to the fact that species within the genus Fagus are very closely related (Tanaka et al., 1999).
PI and PI sib analysis confirmed the high discriminatory power of both newly assembled multiplexes, and their ability to identify particular individuals. In analogous studies mentioned above, between 4 and 18 loci were used to genotype the individuals. In their study, Lefèvre et al. (2012) estimated PI for 16 loci 2.1 x 10 -8 and stated that this probability of finding two randomly matching genotypes is extremely low. In our study, we reached this level of PI by using nine loci for unrelated and 21 loci for related individuals.
The PCoA analysis clustered studied indigenous populations into three approximate regions, geographically roughly corresponding to north, south-east and southwest. However, on the scale of the Czech Republic, distance cannot be considered as a significant criterion of population clustering. Long-term local adaptive processes are more likely to cause it.
Significant differences in observed heterozygosity within populations have not been detected. In general, the genetic diversity of the investigated populations was similar to populations across Central Europe. In our study H o range was 0.595-0.654 and H e range was 0.650-0.678 which is comparable with the results of Dounavi et al. (2016)  In all these studies, differences between the observed and expected values of heterozygosity of studied populations were rather small. Across the studied populations, the highest number of private alleles was found on locus Fagsyl_003093, followed by loci Fagsyl_003849, mfc5 and csolfagus_29. Cvrčková et al. (2017) observed the highest number of private alleles in a population which was geographically very close to population 9 in our study and which also exhibited the highest number of private alleles. This may indicate a bigger genetic differentiation of beech individuals in this region which is also shown in the PCoA analysis ( Figure 2).
AMOVA showed that 98% of total variability is observed on the intrapopulation level, only 2% occurs among the populations. This result is in agreement with previous studies as well, for example, Kempf & Konnert (2016) observed 93% of total variation within populations and 5% among them and 2% between observed regions. As previous findings suggest, little differentiation on interpopulation level might be due to an extensive gene flow between beech populations over long distances (Wang, 2004;Piotti et al., 2012).

Conclusions
The results presented in this paper offer an insight into the genetic diversity of indigenous populations of European beech on the territory of the Czech Republic where only a few such studies have been done so far. Although the majority of detected variance is not attributable to a particular population, the evaluation of allelic parameters (heterozygosity, allelic diversity) is crucial and should not be omitted. Genetic variability, especially during artificial reforestation and the ongoing effort to increase the percentage of beech in the forests, should be carefully considered. It gains even more importance in the context of the continuing climate change, which is highly challenging for the adaptive and survival potential of species. In general, maximizing genetic diversity during artificial reforestation might prevent gene pool reduction and thus indirectly maintain allelic forms with higher fitness levels under shifted climatic conditions. Moreover, we confirmed that a well-optimized set of nuclear microsatellite markers is a useful and robust tool for the examination of genetic structure.