Abstract Detail



Conference Wide

Gruenstaeudl, Michael [1].

Assessing sequence coverage and inverted repeat annotations among complete plastid genomes.

The sequencing and comparative analysis of complete plastid genomes has become a common, almost routine procedure in contemporary botanical research. Researchers can now choose from a plethora of user-friendly software tools for genome assembly and annotation that enable them to generate dozens, if not hundreds, of complete plastid genomes per investigation. Understandably, this ease of data generation has prompted some researchers to consider the assembly and annotation of plastid genomes trivial and to implicitly assume that plastid genomes archived in public sequence databases are correct. However, a growing number of studies report potential quality issues with publicly available plastid genomes, and many researchers can cite anecdotal evidence of incorrect genome assembly or annotation despite the use of state-of-the-art tools. The systematic detection of suboptimal plastid genome assemblies or annotations is challenging, and no single method can identify such anomalies comprehensively. However, several bioinformatic strategies have been reported that seem to provide quality indicators for complete plastid genome sequences. In this workshop, we will discuss two of these indicators: sequence coverage and inverted repeat annotation. First, we will discuss the application of the R package PACVr (https://doi.org/10.1186/s12859-020-3475-0), which can visualize sequencing depth and evenness across complete plastid genomes to highlight regions of reduced coverage depth. Second, we will discuss the application of the Python package airpg (https://pypi.org/project/airpg/), which can survey thousands of archived plastid genomes and automatically parse sequence annotations to identify the presence or absence of inverted repeat annotations. Both tools were designed to assist in the quality control of complete plastid genomes, and their application to land plant plastid genomes will be demonstrated.
Participants will be guided through both tools in a step-by-step process. While prepared datasets will be provided, I encourage attendees to bring their own land plant plastid genomes so that they can be evaluated during the workshop. Specifically, attendees should bring at least one plastid genome (in GenBank flatfile format) as well as the underlying sequence reads of that genome (in FASTQ format). Please note: This workshop is intended for researchers with prior experience in plastid genome assembly and annotation. Participants should join with a UNIX-compatible operating system (macOS or Linux) and should have a basic understanding of the UNIX command line.
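
For attendees who want to check their own GenBank flatfile before the workshop, the idea of testing for the presence or absence of inverted repeat annotations can be sketched in a few lines of Python. This is only a minimal illustration and not the implementation used by airpg: the function name, the list of feature keys, and the search pattern are assumptions made for this example, and the embedded record is invented. For real data, an established parser such as Biopython's SeqIO would be the more robust choice.

```python
import re

# Assumption: IR annotations usually appear under these feature keys.
IR_FEATURE_KEYS = {"repeat_region", "misc_feature"}
# Assumption: qualifiers mention "inverted" or a label such as IR, IRa, IRb.
IR_PATTERN = re.compile(r"(?i:inverted)|\bIR[abAB]?\b")


def find_ir_features(genbank_text):
    """Return (feature_key, location) pairs that look like IR annotations.

    Reads the feature table of the first record only. Continuation lines
    (qualifiers and wrapped locations) are attached to the preceding feature.
    """
    features = []  # each entry: [key, location, qualifier_text]
    in_table = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line[0].isspace():
            break  # next top-level section (e.g. ORIGIN) ends the table
        if line.startswith("     ") and len(line) > 5 and not line[5].isspace():
            # A feature key starts in column 6 of the flatfile.
            parts = line.split(None, 1)
            location = parts[1].strip() if len(parts) > 1 else ""
            features.append([parts[0], location, ""])
        elif features:
            features[-1][2] += " " + line.strip()
    return [
        (key, location)
        for key, location, qualifiers in features
        if key in IR_FEATURE_KEYS and IR_PATTERN.search(qualifiers)
    ]


# Invented toy record, for illustration only.
SAMPLE_RECORD = """\
LOCUS       TOY_PLASTID              200 bp    DNA     circular PLN
FEATURES             Location/Qualifiers
     source          1..200
                     /organism="Hypothetical plant"
     gene            10..50
                     /gene="psbA"
     repeat_region   100..150
                     /rpt_type=inverted
                     /note="inverted repeat B"
     misc_feature    complement(151..200)
                     /note="IRa"
ORIGIN
//
"""

if __name__ == "__main__":
    hits = find_ir_features(SAMPLE_RECORD)
    print("IR annotations found:" if hits else "No IR annotations found.")
    for key, location in hits:
        print(f"  {key}\t{location}")
```

An empty result does not show that the genome lacks inverted repeats; it only shows that none are annotated in a form this simple pattern recognizes.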


Related Links:
https://pypi.org/project/airpg/
https://doi.org/10.1186/s12859-020-3475-0


1 - Freie Universitaet Berlin, Institute of Biology, Altensteinstr. 6, Berlin, 14195, Germany

Keywords:
none specified

Presentation Type: Workshop
Number:
Abstract ID:374
Candidate for Awards:None


Copyright © 2000-2021, Botanical Society of America. All rights reserved
