Ortiz, Edgardo M. [1], Vargas, Oscar M. [2], Schaefer, Hanno [1].

An integrated pipeline for ortholog discovery, bait design, and targeted capture data assembly and the development of a bait set for the Cucurbitaceae.

We developed a bioinformatics pipeline for bait design and for the assembly of targeted capture sequencing datasets. The design modules streamline the discovery of orthologous markers from the coding sequences of annotated genomes, from transcriptomes, or from non-coding regions via sequence clustering. Marker selection takes into account overall similarity, percentage of gaps, sequence length and, if complete gene annotations are available, the proportion of capturable sequence (e.g. exons shorter than the baits are excluded from the design). Once markers have been selected for enrichment, the software creates a set of baits that does not target exons shorter than the bait length, does not span two adjacent exons, meets the chemistry requirements of the hybridization (i.e. the baits must have an appropriate GC content and melting temperature while minimizing the inclusion of homopolymers and low-complexity regions), and follows a proper tiling design by controlling the percentage of overlap between baits. The assembly modules separate the reads per marker and perform de novo assembly on each subset of reads; heterozygous sites are then called with freebayes, which works at any ploidy level, after mapping the original reads back to their own assemblies. Within the framework of the Taxon-Omics project (DFG, SPP-1991) we designed a bait set for the Cucurbitaceae that allows the capture of 1,132 markers: 854 orthologs selected by our pipeline, 101 selected by their function, and 177 non-coding regions. We applied this bait set to type material from the genus Cucumis. We present preliminary results of our phylogenetic analyses, which include plastomes assembled from off-target reads.
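
The bait design constraints described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names, the 120 nt bait length, the 50% overlap, the GC range and the homopolymer limit are all assumptions chosen for illustration, and the melting temperature and low-complexity checks are left out for brevity. Each exon is tiled on its own, so no bait can span two adjacent exons.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def tile_baits(exon, bait_len=120, overlap=0.5):
    """Tile one exon with baits of fixed length and fractional overlap.

    Exons shorter than the bait length yield no baits. A final bait flush
    with the exon end is added so that the last bases are covered.
    """
    if len(exon) < bait_len:
        return []
    step = max(1, round(bait_len * (1 - overlap)))
    starts = list(range(0, len(exon) - bait_len + 1, step))
    last = len(exon) - bait_len
    if starts[-1] != last:
        starts.append(last)
    return [exon[s:s + bait_len] for s in starts]


def design_baits(exons, bait_len=120, overlap=0.5,
                 gc_range=(0.3, 0.7), max_homopolymer=8):
    """Tile every exon separately and keep baits passing the filters.

    All thresholds are hypothetical defaults, not values from the abstract.
    """
    baits = []
    for exon in exons:
        for bait in tile_baits(exon, bait_len, overlap):
            gc_ok = gc_range[0] <= gc_fraction(bait) <= gc_range[1]
            if gc_ok and longest_homopolymer(bait) <= max_homopolymer:
                baits.append(bait)
    return baits
```

Calling `design_baits(exons)` with a list of exon sequences returns the retained baits as a flat list; raising `overlap` increases tiling density at the cost of a larger bait set.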

1 - Technical University of Munich, Ecology & Ecosystem Management, Plant Biodiversity Research, Emil-Ramann Strasse 2, Freising, D-85354, Germany
2 - University of California, Santa Cruz, Ecology and Evolutionary Biology, 130 McAllister Way, Santa Cruz, CA, 95060, USA

targeted capture
target enrichment
nuclear genes
marker selection
bait design

