Genome Announcement: The Draft Genome of the Carrot Cyst Nematode Heterodera Carotae

Abstract Heterodera carotae, the carrot cyst nematode, is a significant pest affecting carrot globally. Here we present the draft genome of H. carotae, which was generated from short read libraries from Illumina HiSeq technology, and the corresponding genome annotation.

The carrot cyst nematode, Heterodera carotae, is a considerable pest affecting carrot growing regions around the world, including much of Europe, South Africa, Mexico, Cyprus, Chile, Ontario, Canada, and Michigan in the United States (Berney and Bird, 1992;Subbotin et al., 2010;Yu et al., 2017;Escobar-Avila et al., 2018;Madani et al., 2018;CABI, 2021). Although H. carotae has a host range limited to species in the Apiaceae family, mainly wild and cultivated carrot (Daucus carota), infestation of commercial production fields by this nematode can be devastating with yield losses between 12% and 80% (Greco et al., 1993;Subbotin et al., 2010). Despite the significance of this pest, there are no genomic or transcriptomic resources publically available for this nematode. To date, research utilizing molecular tools in H. carotae has been limited to the development of a molecular diagnostic for populations of H. carotae from Ontario, Canada, and Northern Italy (Madani et al., 2018) and microsatellite genotyping of H. carotae to examine the gene flow in French populations (Esquibet et al., 2020). The H. carotae genome will provide a valuable resource to researchers tackling pest management of H. carotae and to the wider cyst nematode research community.
In 2019, 1 kg of soil was collected from a carrot field in Calama, Antofagasta region, Chile. The soil was placed in a paper bag. At the Nematology Laboratory of Agriculture and Livestock Service (Santiago, Chile) the soil samples were placed on trays and allowed to dry for at least a week. Cysts were then extracted from 250 g of soil using a modified Fenwick can (Fenwick, 1940). The cysts were picked from the samples and placed in DESS (Yoder et al., 2006). The samples were shipped to the USDA-ARS Horticultural Crops Research Unit in Corvallis, OR for DNA extraction and sequencing. From the picked cysts, 25 were hand selected, broken open in sterilized water using a scalpel, and the eggs were collected using a Pasteur pipette. Genomic DNA was extracted from the eggs using the QIAmp DNA Micro Kit (Qiagen, Hilden, Germany) following manufacturer's instructions. Sequencing and library preparation were done at the Center for Qualitative Life Sciences at Oregon State University (Corvallis, OR). The NEBNext Ultra II DNA Library Prep Kit for Illumina (San Diego, CA) was used to prepare a whole genome shotgun library from egg DNA and sequencing was performed on the Illumina HiSeq 3000 platform. The cysts were also sent to the USDA-ARS Mycology and Nematology Genetic Diversity and Biology Laboratory (Beltsville, MD) to confirm species identification through morphometrics.
To assemble the genome, raw sequencing data was first subjected to adaptor removal and quality control (Q = 20) using Trim Galore! v. 0.6.6 (Krueger, 2020) resulting in 33,127,853 150-bp paired-end reads. Using the filtered reads, a de novo assembly was generated using metaSPAdes v. 3.15.3 (Nurk et al., 2017). The de novo meta-assembly was visualized using the Blob Tools workflow (Kumar et al., 2013) to identify and remove contaminating contigs. Each contig in the meta-assembly was assigned a phylum ID based on BLAST similarity (E-value < 10e −25 ) to sequences in the NCBI "nt" database or other plant-parasitic nematode genomes ( . The read coverage and GC content of each contig were used to visualize the assembly. Reads that belonged to contigs that were identified as Nematoda or had no assigned identity were retained and used to assemble the H. carotae genome using de novo assembler SPAdes v. 3.15.3 (Bankevich et al., 2012). This second assembly was again subjected to the Blob Tools BLASTbased filtering of reads to remove any remaining contamination before reassembly using SPAdes to achieve the final assembly of the H. carotae genome. The final assembly was visualized using each contig's coverage, GC content, and phylum identity ( Supplementary Fig. 1). Using Pilon v. 1.22 (Walker et al., 2014) gap-filling, mis-assembly correction, base correction, and scaffolding were performed on the final assembly. The genome statistics of the corrected version of the final assembly were calculated using QUAST (Gurevich et al., 2013).
The goal of this genome sequencing effort was to rapidly provide usable data that was of sufficient quality to the nematology community to expand genomic resources available for cyst nematodes. The scaffolded assembly of the H. carotae genome was 95,118,078 bp in 17,839 scaffolds (Table 1). Two other Heterodera species have publicly available genomes, H. glycines and H. schachtii, which are 1.66 and 1.88X larger than the H. carotae genome, respectively (Table 1). Unlike H. carotae, the genomes for H. glycines and H. schachtii were assembled in fewer segments and consisted of 9 and 395 scaffolds, respectively. Using the raw data as input and a kmer size of 21, GenomeScope (Vurture et al., 2017) was used to determine the potential genome size and repeat length for the H. carotae genome which were estimated at 120 Mb and 81 Mb, respectively. The level of duplication and the number of repetitive regions is unclear in the H. schachtii genome. The H. glycines genome contains 34% repeated regions and 18.7 Mb of tandem duplicates (Masonbrink et al., 2019), indicating that the fragmented assembly of H. carotae may be due in part to the inability of Illumina sequencing to resolve repetitive and low complexity portions of the genome. Additionally, some repetitive regions of the genome may have been lost due to the high stringency standards used when BLAST filtering for contamination.  (Table 1).
To determine the completeness of the H. carotae genome BUSCO v.5.2.2 was used (Simão et al., 2015). BUSCO was also run on the H. glycines and H. schachtii genomes for comparison (Table 1). The H. carotae genome was ~5% less complete than the H. glycines and H. schachtii genomes. All three genomes shared 927 complete genes and 1,391 missing BUSCO genes. Across all three genomes, there are 1,613 complete BUSCO genes. In Supplementary Fig. 2, a Venn diagram can be found depicting the overlap of complete BUSCO genes across the three Heterodera species. Although all three genomes had complete BUSCO scores below 50% they are in line with other plant-parasitic nematode genomes that range in completeness from 40% to 60% (Howe et al., 2016(Howe et al., , 2017. In addition to confirming specimen identity using morphometrics, coxI and hsp90 were used to place the specimen in a phylogenetic context. All available sequences of coxI for H. carotae and Heterodera cruciferae were obtained from NCBI, along with a random selection of coxI accessions from 31 other Heterodera species, and five accessions of coxI from Meloidogyne species as an outgroup. All NCBI accessions used are denoted in Supplementary  Fig. 3 (Price et al., 2010), and trees were visualized using TreeGraph 2 (Stöver and Müller, 2010). As shown in Supplementary  Fig. 3 Supplementary Fig. 4. Similar to the coxI results, the H. carotae from Chile fell into a clade separate from other Heterodera species but with other H. carotae isolates, supported by a bootstrap value of 0.943 ( Supplementary Fig. 4).
The H. carotae genome was annotated using BRAKER v. 2.1.5 (Hoff et al., 2016) with protein hints from H. glycines (Masonbrink et al., 2019). In order to compare across other Heterodera species genomes, H. schachtii was also annotated with the same parameters as H. carotae. H. schachtii and H. glycines have 9,556 and 12,467 more genes, respectively, than H. carotae, which has 17,212 protein coding genes identified in the genome (Table 1). In the H. glycines genome there is a strong colocalization of effector genes in highly duplicated scaffolds (Masonbrink et al., 2019). Some protein coding genes could be lost in H. carotae genome due to the collapse of repeated and duplicated regions in the current assembly. Additionally, H. glycines was annotated with the addition of RNAseq data, which is not available for H. carotae. The addition of RNAseq data could both improve the predicted proteins calls in the H. carotae annotation and increase the number of predicted proteins.
The H. carotae genome assembly, annotation, and raw data can be found under the NCBI BioProject PRJNA774818. This version of the H. carotae genome provides a starting point for further investigation into the basic biology of H. carotae and a resource for the greater cyst nematode genomics community. Further sequencing and improvements to the H. carotae genome would increase its usefulness; however, in its current state this genome offers researchers a resource for discovering novel H. carotae effectors, developing diagnostic markers, or examining genomic similarities across cyst species. Figure S2: Venn diagram of complete BUSCO genes across Heterodera glycines, H. carotae, and H. schachtii. A BUSCO gene was counted as complete if denoted in the BUSCO analysis results as either complete or duplicated. Figure S3: Phylogenetic tree of Chilean Heterodera carotae Cyclooxygenase 1 (coxI) gene in relation to other Heterodera species. The Chilean H. carotae coxI gene was extracted from the final assembly and aligned with all available coxI sequences on NCBI for H. carotae and H. cruciferae, a random selection of coxI accessions from thirty-one other Heterodera species, and five accessions of coxI from Meloidogyne species as an outgroup. A Newick phylogenetic tree was generated from the alignment. Bootstrap values are indicated on branch points in this tree and the placement within the tree of H. carotae coxI from this study highlighted in red. NCBI accessions for the sequence used are listed next to each species in the tree. Figure S4: Phylogenetic tree of Chilean Heterodera carotae heat shock protein 90 (hsp90) gene in relation to other Heterodera species. The Chilean H. carotae hsp90 gene was extracted from the final assembly and aligned with hsp90 sequences from NCBI of four Globodera species, six Heterodera species, all available H. carotae accessions, two Cactodera species, and six Meloidogyne species as an outgroup. A Newick phylogenetic tree was generated from the alignment. Bootstrap values are indicated on branch points in this tree and the placement within the tree of H. carotae hsp90 from this study is highlighted in red. NCBI accessions for the sequence used are listed next to each species in the tree.