Ecological and Morphological Differentiation Among COI Haplotype Groups in the Plant Parasitic Nematode Species Mesocriconema Xenoplax

Abstract DNA barcoding with the mitochondrial COI gene reveals distinct haplotype subgroups within the monophyletic and parthenogenetic nematode species, Mesocriconema xenoplax. Biological attributes of these haplotype groups (HG) have not been explored. An analysis of M. xenoplax from 40 North American sites representing both native plant communities and agroecosystems was conducted to identify possible subgroup associations with ecological, physiological, or geographic factors. A dataset of 132 M. xenoplax specimens was used to generate sequences of a 712 bp region of the cytochrome oxidase subunit I gene. Maximum-likelihood and Bayesian phylogenies recognized seven COI HG (≥99/0.99 posterior probability/bootstrap value). Species delimitation metrics largely supported the genetic integrity of the HG. Discriminant function analysis of HG morphological traits identified stylet length, total body length, and stylet knob width as the strongest distinguishing features among the seven groups, with stylet length as the strongest single distinguishing morphological feature. Multivariate analysis identified land cover, ecoregion, and maximum temperature as predictors of 53.6% of the total variation (P = 0.001). Within land cover, HG categorized under “herbaceous,” “woody wetlands,” and “deciduous forest” were distinct in DAPC and RDA analyses and were significantly different (analysis of molecular variance P = 0.001). These results provide empirical evidence for molecular, morphological, and ecological differentiation associated with HG within the monophyletic clade that represents the species Mesocriconema xenoplax.

DNA barcoding with the COI mitochondrial gene has been widely applied for species identification and biodiversity assessment (Avó et al., 2017;Andújar et al., 2018;Curry et al., 2018;Ge et al., 2021;Martoni et al., 2021). Both applications require a broader context in which to interpret the nucleotide sequence data. That context is provided by species delimitation methodology, which provides the tools necessary to establish species boundaries (Janssen et al., 2017;Olson et al., 2017;Kieneke et al., 2021).
Mesocriconema xenoplax, when defined solely based on morphology, includes host associations of more than 25 plant families and has a geographic distribution that includes every continent except Antarctica. This wide range of hosts and habitats suggests that this species is either a remarkable generalist in feeding behavior and tolerance to environmental conditions, or that unrecognized physiological subgroups have evolved within M. xenoplax. A recent study of multiple criconematid species, including M. xenoplax, using the COI marker and nuclear 18S gene, identified genetic support for subdividing M. xenoplax into seven haplotype groups (HG), labeled HG 8 to HG 14 . However, no additional information was provided about the nature of the differentiated groups. In this study, we use multivariate analysis of 132 M. xenoplax specimens to search for nematode traits associated with the seven HG.

Sample selection
A large-scale survey of nematode diversity across diverse ecosystems was conducted previously (Neher et al., 1995;Powers et al., 2016) to determine if ecoregion boundaries that structure plant and animal communities also structure communities of plant parasitic nematodes. Most of these nematode specimens exist within a curated database that has been characterized both morphologically and molecularly (Powers et al., 2020). Mesocriconema xenoplax specimens selected from this and subsequent collections were analyzed in the current study. Collection protocols are described in Olson et al. (2017). Specimens were assigned a nematode identification number (NID), and DNA sequences for most were submitted to GenBank. Images of selected specimens were also deposited in the Barcode of Life Database (www.barcodinglife.org/). The degenerate COI primer sequences used for amplification were COI-F5 (5´-AATWTWGGTGTTGGAACTTCTTGAAC-3´) and COI-R9 (5´-CTTAAAACATAATGRAAATGW-GCWACWACATAATAAGTATC-3´) resulting in an amplicon approximately 790 bp in length and an edited 721-bp sequence for analysis once primer sequences were removed.

Environmental data
Metadata associated with each site were obtained from the United States Department of Agriculture's Natural Resource Conservation Service geospatial database (USDA-NRCS, 2017) and assembled into a matrix that included the annual average minimum and maximum temperature and precipitation over 40 years (1981 to 2010). These three continuous variables were categorized by grouping them in increments of 5 years. Associated land cover and elevation for each individual nematode were obtained and viewed using ArcMap software (ESRI, 2011). Associated ecoregion name and biome for each site were identified and recorded into the matrix dataset (Ricketts et al., 1999). Estimates of annual, monthly, and event-based climatic parameters were obtained from the USDA-NRCS geospatial database, which used ~800-m 2 (30 arc-seconds) grids and the PRISM analytical model to generate these estimates. The USDA-NRCS database also provided land cover in a classification system that included 16 categories estimated at a 30-m 2 resolution: deciduous forest, developed/open space, evergreen forest, woody wetlands, cultivated crops, herbaceous, scrub/shrub, mixed forest, and hay/pasture. Ecoregion and major habitat type were obtained from the World Wildlife Fund based on GPS coordinates (Olson et al., 2001). Elevation at each site was recorded at the time of soil sampling using a handheld GPS tracking device. Additionally, specific host associations identified at the time of sample collection, when present, were evaluated for each HG.

Morphological data
Twenty-four standard morphological measurements previously recorded were used in this study of M. xenoplax and included: length, number of body annuli, number of annuli from vulva to tail terminus, number of annuli anterior to excretory pore, pharynx length, stylet length, stylet knob width, mid-body width, vulva position from anterior, vulva position as a percentage of body length, vulval body width, body annulus width, number of anastomoses, width of 10 annuli at mid-body, and width of first labial annuli (Table 1) (Geraert, 2010). No males were encountered in sampling. Only adult females were used in the discriminate function analysis.

Sequence alignment and basic statistics
Forward and reverse COI sequences were edited using CodonCode Aligner software version 4.2 (CodonCode Corporation, Centerville, MA; codoncode.com/aligner/), with manual adjustment. Sequences were aligned using ClustalW in MEGA6 (Tamura et al., 2013) andGeneious 10.1.3 (Kearse et al., 2012). DNA sequences were submitted to GenBank, with associated GPS coordinates when available. Individual and group genetic statistics were calculated in DNAsp (Librado and Rozas, 2009) and MEGA, including: number of polymorphic sites, number of parsimony informative sites, haplotype diversity (Hd), average number of nucleotide differences, number of haplotypes, and average number of base pair differences between sequences.

Datasets used
Three variations of the original dataset were created for use in our analyses. The original dataset included all M. xenoplax sequences analyzed. For sequences from locations without GPS coordinates, but with known county of origin, spatial location was imputed using the county centroid from Google Maps (Fresno, CA). The second dataset excluded individuals with a unique gene sequence (singletons) and retained only those with ≥2 individuals per gene sequence. The third dataset excluded individuals collected outside of the United States, due to the limited metadata resources. The fourth dataset excluded individuals that were both genetically and environmentally identical to one another (e.g. clone-corrected dataset), and used in analyses affected by overrepresentation.

Phylogenetic analysis
Three types of phylogenetic trees were constructed, with Mesocriconema inaratum (Hoffman, 1974) Powers et al., 2014 included as an outgroup. The neighbor-joining and maximum-likelihood trees were reconstructed in MEGA, and the Bayesian tree in Geneious. Two evolutionary models were applied within the neighbor-joining tree, Jukes-Cantor (JC69) and Kimura 2-Parameter (K2P). Neighborjoining and maximum-likelihood trees were run with 2,000-bootstrap replications (Shao and Tu, 2012). Maximum-likelihood and Bayesian analysis used jModelTest for model selection and Akaike information criterion (Akaike, 1998) and Akaike information criterion corrected values (Hurvich and Tsai, 1989). For both maximum-likelihood and Bayesian trees, an HKY+G+I model was the best fit with the lowest Akaike information criterion and Akaike information criterion corrected values. The Bayesian inference tree was supported with 2,000-posterior probabilities rankings, run for two million generations (ngen), a heating scheme (temp = 0.06), sampling each 5,000 generations (sampled frequency), with burn in of 250,000, and remaining samples were used to compute the consensus tree. Convergence was confirmed when the average standard deviation (SD) of split frequencies was less than 0.01.

Species delimitation
Each species delimitation method utilized the original dataset, with the exception of Birky's K/q analysis, which censored singletons (i.e., first dataset variation). Cladistic statistics were calculated using the species delimitation plugin within the Geneious software package (Masters et al., 2011). The plugin included assessments of overall monophyly and intra-or interdistance ratio. This ratio, together with the known number of taxa in the reference group, was used to estimate the probability of correct identification under the conservative P ID(Strict) criteria based on variation within as compared to between. The relaxed, P ID(Liberal) criteria is based on a-priori groupings with a cut off for each group at >80% (Ross et al., 2008;Hamilton et al., 2014). Rosenberg's P(AB) (Rosenberg, 2007) and Rodrigo's P(AD) (Rodrigo et al., 2008) measured the probability that the observed patterns identifying reciprocal monophyly were due to random coalescent processes (Prévot et al., 2013). Statistical parsimony analysis was conducted on individual HG using the program TCS 1.21 (Clement et al., 2000) with a 90-connection limit. Automatic Barcode Gap Discovery (ABGD) analysis is an online platform (http://wwwabi.snv.jussieu.fr/ public/abgd/) that applies a clustering algorithm to determine the optimal threshold (Puillandre et al., 2012). Three models of sequence evolution were applied -JC69, K2P, and simple distance -along with the default settings, consisting of relative gap width (X = 1.50) and intraspecific divergence values (P min < 0.001 and P max = 0.10), where correct species estimates are projected to correspond to P = 0.01. To test for parthenogenetic speciation, the 4× rule, Birky's K/q > 4, was applied, which utilized the threshold between clades to delimit reciprocal monophyly in asexual parthenogenetic species (Birky et al., 2005). The estimator for interclade divergence K was calculated as observed average base pair distance between clades corrected for multiple hits. The intraclade variation estimator q is calculated as p(1-4p/3), with p representing the relativized nucleotide diversity that is corrected for sample size. Calculations were done in an Excel spreadsheet. The ratio rule indicates that if the ratio K/q is greater than four, it can be stated with 95% confidence that these two clades have arisen solely by neutral genetic drift (Birky, 2013).

Phenotypic variation
Under the assumption that two HG can diverge morphologically over time, the association of genetic structure was compared with the morphological variation described above, using a discriminant function analysis conducted in JMPÒ, Version 7 (SAS Institute Inc., Cary, NC, 1989. The discriminant function analysis uses an algorithm that classifies cases into previously determined groups and derives a model that best discriminates groups, maximizing intragroup variation relative to inter-group variation (Friedman, 1989;McClure et al., 2003;Härdle and Simar, 2007). Prior to analysis, juvenile specimens, singletons, and specimens lacking all morphological measurements were omitted. A quadratic discriminant function was computed using the withingroup covariance matrices that fit the data best, and a backward stepwise procedure was used to assess the discriminatory power of variable combinations. Results were interpreted by a canonical plot with ellipses representing the confidence intervals of groups determined a priori. The amount of overlap between the ellipses correlates with the degree to which there is morphological distinction or separation between the groups. If there was no overlap, then there was significant difference between the two groups morphologically. The canonical plot also contains bi-plot rays that indicate which ones, among the variables tested, were the strongest predictors of the data, as indicated by the length of the ray and the direction the ray was pointing to with respect to the other plotted rays.

Spatial genetic structure
The association between geographical distance within and between HG and the intra-genetic distance of the respective HG was assessed using the Mantel test (Mantel, 1967) in R software version 1.1.383 (R Core Team, 2018) using the mantel() function in the R package "ecodist" 2.0.1 (Goslee and Urban, 2007). This analysis assumes that a consequence of dispersal limited by geographic distance, isolation by distance, is correlated (positively or negatively) with the organisms' genetic distance. This relationship can be positive, genetically similar, and geographically close together, or genetically dissimilar and geographically far apart. It can also be negative and genetically similar, but geographically far apart, or genetically dissimilar but geographically close together. The detection of a correlation indicates an effect of dispersal and/or migration probabilities within the individual HG. A pairwise distance matrix of genetic similarity between individual sequences within each HG was created using the JC69 substitution model on the dataset without singletons and without nematodes collected outside the United States. The JC69 assumes equal base pair frequencies and equal mutation rates. Geographic distance between individuals was also calculated in a pairwise fashion, using latitudinal and longitudinal coordinates and calculated using the geodesic() function in the R package "geosphere" 1.5-7 (Hijmans, 2017). This function calculates the shortest distance between two points following an ellipsoid (Hijmans, 2015). To determine whether there was increasing genetic similarity with decreasing distance, the two matrices were tested for structure along a spatial gradient with 10,000 permutations, 500-bootstrap iterations, a 0.90-resampling level, and a Pearson correlation coefficient with a 0.95-confidence level. The power and false-positive rates were estimated based on the significance threshold value of a = 0.05, the Mantel coefficient, MantleR (r) with a two-tailed test, and a null hypothesis r = 0. The Mantel coefficient was calculated for each HG individually, with P-value, and lower and upper limits.

Population structure
To assess for associations between population structure and environmental parameters, a distance-based redundancy analysis (dbRDA) was applied (Legendre and Anderson, 1999). This analysis does not require groups to be defined a priori and relies on a forwardbackward selection process to identify the model of environmental parameters that best predicts the genetic variation. This method detects linear relationships based on similarities and dissimilarities, generated by constrained ordination on a distance matrix representing the response variable, to describe the relative contribution of multiple independent explanatory variables (McClure et al., 2013;Kamvar et al., 2017). The analysis was first performed using the censored singletons (i.e., first dataset variation), and then using the clone-corrected dataset (i.e., third dataset variation). Input data for the dbRDA included a pairwise genetic distance matrix that was calculated using the JC69 model and environmental variables corresponding to each individual, supplied as a data frame. The dbRDA was computed in R using the package "vegan" 2.4-3 (Oksanen et al., 2013), and the function capscale() was used to eliminate collinear environmental variables with a correlation ³0.75. Among correlated environmental variables, the variable that described the most genetic variation was retained in further analysis. A forwardbackward selection process was applied with the "vegan" function ordistep() to identify the model that best described the genetic variability between HG using an automatic stepwise building and selection method on each possible model combination via permutation tests (Oksanen, 2015). An analysis of variance was performed to test for the significance of the reduced model and marginal effects using 999 permutations. The environmental variables with the strongest associations were selected for further analysis.
The strongest association was assigned by the visualization within the bi-plots. Principal components are indicated by an axis, with each variable or HG indicated a priori and projected onto the axis (Jongman, 1995). The strength of a variable in association with the HG's genetic score is conveyed by the length of the bi-plot rays and the relative distance of the ray in relation to the HG's location on the bi-plot figure. This plot indicates the presence of a correlation between HG and the bi-plot axes. The eigenvalues for these axes also indicate the importance of the subsequent variables associated with the axes, as well as other HG, in order to explain the relationship within the data matrices (Fiscus and Neher, 2002).
A discriminant analysis of principal components was implemented using the R package "adegenet" 2.1.0 to assess the per-sample posterior group assignment probability (Jombart et al., 2010). Clone-corrected environmental and sequence data (i.e., third dataset variation) were used in this analysis, in addition to censoring groups with fewer than five individuals per variable category. This analysis first analyzes the transformed data in a principal component analysis, followed by an analysis of the resulting principal components in a linear discriminant analysis to optimize variation between HG while minimizing variation within each HG (Jombart et al., 2010, Jombart andCollins, 2015). The optimal number of principal components was assessed and selected by the cross-validation function xvalDapc(). This function implements a procedure to iterate over an increasing number of principal components on a subset (90%) of the data, while maximizing the lowest mean squared error. Percent reassignment of each individual to its corresponding land cover was visualized using stacked bar plots in R.
HG were defined according to the strongest environmental variables, and those containing fewer than five individuals were compared using an analysis of molecular variance (AMOVA) in the R package "pegas" version 0.10 (Excoffier et al., 1992;Paradis, 2010). The AMOVA utilized a JC69 pairwise genetic distance matrix and evaluated population differentiation within and between HG, estimated with the phi statistic (φ). Significant differences were determined using the function randtest() from R package "ade4" version 1.7-10, using 9,999-bootstrap replicates (Dray and Dufour, 2007). For environmental variables that resulted in more than two HG being defined, pairwise comparison between individuals within a HG were made using AMOVA to determine which HG were significantly different.

Results
A total of 132 specimens were identified as M. xenoplax and selected for the present study, representing 40 sites within 16 states and 14 ecoregions, with 33 specimens associated with agroecosystems (Table 2). Of the 132 individuals, 92 were adult females, 31 were juvenile, and the developmental stage was unidentifiable for 9 specimens. There were 54 unique haplotypes among the 132 M. xenoplax COI sequences.

Species delimitation
The seven HG supported by phylogenetic analyses served as primary species hypotheses for testing by other species delimitation approaches. HG 8, 11, 13, and 14 had significant support from each of the species delimitation methods (Fig. 1). These four groups were characterized by (i) the statistical parsimony network analysis indicating 90% confidence level resulting in a single interconnected network, (ii) distinct recursive partitioning (P = 0.012) for the ABGD analysis, (iii) statistical significance for reciprocal monophyly for K/q (Table 3), and (iv) statistical significance (<0.001) from the Rosenberg's P(AB), P ID(Liberal), and P ID(Strict) (£0.90) analyses calculated within the Geneious species delimitation plugin software (Table 4). Statistics calculated for each HG using DNAsp software identified HG 8 as having low genetic variability as indicated by scoring the lowest among the seven HG on seven parameters (Table 5). A significant isolation by distance relationship was identified for HG 8, 9, 10, and 11 (P ≤ 0.050; Table 6).

Geographic distribution of HG 11 to 13
Nematodes included in HG 12 (n = 16) were collected from sites in Nebraska, South Dakota, Wisconsin, Tennessee, and Vermont, all from native plant communities (Fig. 2). Specimens comprising the sister clade to HG 12 and HG 13 (n = 28) were also predominantly collected from native plant communities, including forests in the Canadian province of British Columbia and five sites within Great Smoky Mountains National Park. Representatives of HG 13 were also collected from a vineyard in California near the type locality of M. xenoplax in the vicinity of Fresno. Both HG 12 and HG 13 are present in the Great Smoky Mountains and northern temperate forests, but not found south of the Appalachian Mountains. The sister group to HG 12 and 13 is HG 11. Like its sister clades, HG 11 was collected from multiple sites within Great Smoky Mountains National Park and northern Virginia, with one western sample obtained from a walnut farm in central California.

Geographic distribution of HG 8 and 10
Two HG were predominantly comprised of specimens associated with peach orchards. HG 8 (n = 8) consisted of specimens exclusively collected from peach orchards in South Carolina, Georgia, and Alabama. There were only two haplotypes in HG 8, one representing all the specimens from South Carolina, and the other found in orchards in Georgia and Alabama. Nematodes in HG 10 (n = 23) were primarily collected from peach orchards in Georgia Figure 1: A condensed maximum-likelihood tree of 721-bp sequences isolated from the COI gene for each individual in the morphospecies Mesocriconema xenoplax. Generated in the program MEGA, the tree displays seven partitions resulting in the HG label for further analysis. Bootstrap values from neighbor-joining and maximum-likelihood trees and posterior probability from Bayesian trees showing support for each HG are given at the terminal node, neighborjoining, maximum-likelihood, and Bayesian, respectively. Each analysis is indicated by shading in the legend. The absence of the bar indicates the absence of support. The spanning tree network of each HG constructed using statistical parsimony is given within each HG bracket. Larger circles represent a unique haplotype. Smaller circles represent the number of mutational steps between haplotypes. Networks are disconnected when they exceed the 90% connection limit. HG, haplotype group. Interclade variation calculated as p(1-4p/3), with p equal to the relativized nucleotide diversity that is corrected for sample size (Birky, 2013). c Degree of monophyly within the group, where values greater than 4.0 indicate reciprocal monophyly. *95% confidence of reciprocal monophyly. HG, haplotype group. Table 4. Species delimitation statistics corresponding to the neighbor-joining tree HG built from the COI gene sequence for the morphospecies Mesocriconema xenoplax. Statistics include the intra-distance of each HG, inter-distance of each HG compared to its sister clade, ratio of intra-and inter-distance, P ID(strict), P ID(liberal), Av(MRCA-tips), Rodrigo's P(AD), and Rosenberg's P(AB). HG are organized in the and Alabama. Three specimens from a nursery in Nebraska belonged to this group. Haplotype 10 also included specimens from two state parks in Florida. Specimens from Ichetucknee State Park in northern Florida shared a haplotype that was common in the Georgia peach orchards, whereas Torreya State Park located in the Florida panhandle contained a specimen differing by one nucleotide from haplotypes found in Alabama peach orchards.

Geographic distribution of HG 14
HG 14 was the only HG not associated with North American forests. All specimens in this group were collected from remnant tallgrass prairie sites in Iowa, Kansas, and Nebraska. Precise host relationships have not been determined for this group, but collection sites were in the proximity of native woody shrubs that commonly invade prairie habitats, such as roughleaf dogwood (Cornus drummondii C.A. Mey.) and smooth sumac (Rhus glabra L.).

Geographic distribution of HG 9
HG 9 has the greatest Hd, intragroup nucleotide diversity (Pi), and average number of nucleotide differences (k) ( Table 5). It is also unique among the seven HG in that

Population Structure
Results within the dbRDA analysis identified an autocorrelation (0.964) between minimum and maximum temperatures. Minimum temperature displayed less of the variation within the global model, a model that includes all possible variables; therefore, it was omitted from further analysis. Multivariate dbRDA analysis of the pairwise genetic distances and environmental metadata identified that the global model accounted for 35.7% (F = 3.990, P = 0.001) of the initial variation, with an adjusted R 2 = 68.2% (P < 0.001). The function ordistep() identified the significant predictors of genetic variation from the global model. This reduced model included: land cover, ecoregion, and maximum temperature, accounting for 71.0% of the total variation. The reduced model was applied to the bi-plot analysis and explained a significant portion of the initial variation (F = 3.418, P = 0.001), with an adjusted R 2 = 50.263 (P = 0.001; Fig. 3). The "herbaceous land cover" category was the strongest predictor, followed by maximum temperature, Flint Hills Tall Grassland ecoregion, deciduous forest and woody wetland land cover, and Piney Woods Forest and Central Tall Grassland ecoregions. The first axis accounted for 34.2% of the variation, and the second axis accounted for 19.5%. HG 14 associations included herbaceous land cover, the Flint Hill Tall Grassland ecoregion, and the Central Tall Grassland ecoregion. HG 11, 12, and 13 are associated with deciduous forest land cover. HG 9 is associated with a maximum temperature range of 23.88 o C to 26.11 o C, woody wetland land cover, and Piney Woods Forest ecoregion. Based on the dbRDA results, land cover was influential for the separation of three HG and was the strongest predictor variable. Therefore, land cover was analyzed further using the discriminant analysis of principal components. The first seven principal components were analyzed, saving four discriminant functions with 92.9% of the variation accounted for. The first discriminant axis represented 11.0% of the variation, and the second axis represented 4.2% (Fig. 4). In general, individuals were clustered loosely near herbaceous land cover, which was the most isolated variable on the bi-plot. The pairwise AMOVA indicated differentiation with significant differences between nematodes from herbaceous and woody wetlands (P ≤ 0.01; Table 7). Visualization of individual membership to each of the land cover categories indicated an average reassignment of 52.4% in the category of deciduous forest (min = 17.9%, max = 80.2%, SD = 30.3%; Fig. 5A-E), followed by an average reassignment of 35.4% for woody wetlands (min = 1.5%, max = 67.6%, SD = 30.3%), 10.3% for cultivated crops (min = 0.8%, max = 24.6%, SD = 10.1%), 0.9% for developed open space (min < 0.1%, max = 2.6%, SD = 1.1%), and 0.9% for herbaceous land cover (min < 0.1%, max = 3.6%, SD = 1.0%).

Phenotypic variation
Morphological analysis of each HG in the discriminant function analysis identified that stylet length, total body length, and stylet knob width are the strongest distinguishing features among the seven HG as defined a priori (Fig. 6), and stylet length to be the strongest distinguishing feature. The analysis correctly classified 61 out of 87 (70.1%) of the sequences belonging to their respective groups (misclassifying 26). Canonical discriminant function for the first dimension indicated that stylet length and total body length accounted for most of the variation between groups (85.8% variation; Wilk's l = 0.189, F 18 = 13.059, P < 0.001). The eigenvector for stylet length extended longer than the eigenvectors for knob width, which were positioned at an angle close to 90 o C, suggesting little to no relationship between the two morphological features in their ability to differentiate between groups. Canonical discriminant function for the second dimension indicated that stylet length and knob width are inversely related and account for less variation than the first dimension (13.4%; Wilk's l = 0.606, F 10 = 4.426, P < 0.001). The third dimension of canonical correlations was not significant. Body length and knob width were slightly related as they are on a similar angle, with less than a 90 o angle and knob width having a slightly longer ray. HG 9 and 10, and HG 11, 12, and 13 formed two well-separated clusters. HG 8 overlapped both of these clusters. HG 14 was isolated, with no overlap with any other HG. HG 14 had the highest mean body length (641.2 mm) and knob width (13.8 mm), and the smallest stylet length (70.4 mm). HG 9 had the second largest mean body length (634.8 mm), knob width (13.5 mm), and stylet length (87.6 mm). HG 13 had the smallest body length (average 578 mm) and knob width (11.5 mm), and the second-smallest stylet length (72.3 mm).

Discussion
In this study, we sought to investigate the ecological and morphological groupings within M. xenoplax using multiple approaches to determine if the resulting groupings support one large, widespread, generalist species or multiple, genetically distinct, sub-populations, occupying specific niches. To test this, we focused on 132 samples that originated from a broad range of environments, ecosystems, habitats, and land cover patterns throughout the USA and Canada. Using COI DNA Sanger sequencing of individual specimens, we applied an Figure 3: An ordination plot using dbRDA bi-plot showing the seven most influential explanatory environmental variables (arrows) overlain on the first two eigenvectors of the analysis of Mesocriconema xenoplax HG. The length of the arrows is directly proportional to the strength of the correlation between the explanatory variable and the genetic variation calculated between individuals within each HG. Circles represent HG in ordination space, with the corresponding HG number indicated in the respective circle. Each axis indicates the amount of total variation each respective axis accounts for. dbRDA, distance-based redundancy analysis; HG, haplotype group. integrated approach to clarify haplotype groupings for M. xenoplax using three categories of genetic analyses: phylogenetic, species delimitation, and population structure. In this study, phylogenetic and species delimitation analyses provided ample support for seven independently evolving lineages (Powers Figure 4: Scatterplots of discriminant analysis of principal components, calculated on the category of land cover that distinguishes the sequential data the most. Points represent observed individual Mesocriconema xenoplax nematodes identified at their respective land cover type, connected to the population centroids. The center of each component is represented as black grid lines. Land cover types with fewer than five individual nematodes were removed from the analysis to reduce the amount of noise. The lower triangle displays the statistic φ, and the value with the upper triangle displays the statistical significance of the φ statistic. Land cover types with fewer than five individual nematodes were removed from the analysis. AMOVA, analysis of molecular variance. et al., 2014). Also in this study, population structure analyses identified differentiation for three populations by environmental variables, land cover, and ecoregion (Porazinska et al., 2003;Darby et al., 2011, Duyck et al., 2012Tsiafouli et al., 2017). Morphological analysis identified distinction for groups based on total body length, stylet length, and stylet knob width. Support for HG 9 and 14, and for the combination of HG 11, 12, and 13, were recognized based on environmental variables. HG 14 stood out as a group found only in native prairies and associated with midwestern prairie ecoregions and herbaceous land cover type. This distribution pattern has previously been observed in other criconematid nematode species (Powers et al., 2016;Olson et al., 2017). Of the seven land cover types, HG 11, 12, and 13 were distinguished by an association with deciduous forests, particularly within the GSMNP region and expanding northward. Members of these groups were not observed south of GSMNP. The AMOVA indicated that only deciduous forest and herbaceous land cover types were significantly different from all other land cover types. Morphological measurements distinguished HG 14, HG 9 and 10, and HG 11, 12, and 13. Future studies could explore the relationship between morphology and environment further, focusing on host-parasite relationships, particularly the degree to which species of plant parasites are generalists or host specialists. We were unable to find consistent support in our measurement of isolation by distance within HG, due to the lack of sufficient sample size (Castellano and Balletto, 2002;Lycett et al., 2011).
Our study was the first to analyze the population structure of M. xenoplax, combining genetic variation with environmental and morphological variations to identify predictive patterns of HG distribution and to highlight possible new species boundaries within M. xenoplax. Based on these results, we believe that there is sufficient support for the recognition of Figure 5A-E: Stacked bar plot of the probability of each Mesocriconema xenoplax nematode to be reassigned to its respective population, defined according to land cover type; A. deciduous forest, B. developed open space, C. woody wetlands, D. cultivated crops, and E. herbaceous. Only those land cover types with ³5 individuals were included in the analysis. The vertical axis is the probability (y-axis) of each individual (x-axis) being reassigned to its respective population, and the width of the bars depends on population size. one new species, comprised of HG 14, based on genetic, morphological, and ecological distinction. Other haplotype groupings might be supported as separate species, but we feel that additional evidence is necessary to fully assess species status. We also believe that the composition of the plant community at the scale of land cover and ecoregion is a potential indicator for subdivisions with M. xenoplax as observed in our population structure analysis. These conclusions parallel similar studies that recognize species boundaries of an organism by combining biological, physiological, and genetic data for the establishment of species boundaries within otherwise cryptic species (Cairns et al., 1993;Porazinska et al., 1999;Barbercheck et al., 2009;Shao and Tu, 2012). However, it was not the objective of this paper to describe new species in Mesocriconema xenoplax and await the clarification of host relationship analyses.
A surprising result of this study is the association of specific HG with agroecosystems, which has the potential to influence management strategies for nematodes feeding on agroeconomic hosts. In particular, the results have implications for the process of breeding for resistance in agronomic hosts, as different genetic populations may exhibit different physiological responses.
Collectively, our study provides multiple lines of evidence that the morphospecies named M. xenoplax contains separately evolving metapopulations that display ecological differentiation. In a broader sense, these results allow us to better understand the interactions that plant parasitic nematodes have historically maintained within specific environments and hosts, shedding light on how these associations may serve to predict nematode distributions in a changing environment.