Identification of structural variation in Zea mays: use of paired-end mapping and development of a novel algorithm based on split reads / Ettore Zapparoli. Udine, 2017 Mar 17. 29th cycle.

Identification of structural variation in Zea mays: use of paired-end mapping and development of a novel algorithm based on split reads

ZAPPAROLI, Ettore
2017-03-17

Abstract

The present work is part of the ERC-funded project NOVABREED, whose objective is to characterize the pan-genome of Vitis vinifera and Zea mays through the application and development of in silico methods for the analysis of Next Generation Sequencing (NGS) data. The concept of the pan-genome arises from the observation that some DNA sequences are not shared by all individuals of a species, so a single genome is not enough to describe the species. The DNA segments shared by all individuals of a species constitute the core genome, while those absent from at least one individual compose the dispensable genome. Here, we focused on Zea mays, whose genome is complex, highly repetitive, and approximately 2.5 Gb in size (Schnable et al., Science 2009). Structural variants (SVs) are an important source of genetic variation in plants, mostly due to large (>1000 bp) insertions and deletions of transposable elements (TEs), and are an important component of the dispensable genome. The dispensable fraction of the maize genome was characterized through the analysis of SVs in 7 inbred lines selected from the parental lines of the MAGIC maize population. As part of the project, a new algorithm (Walle) for the detection of insertions, relying on split-read (SR) mapping, was developed, and its performance was compared with that of existing tools; in this comparison, Walle performed better than the existing tools. Deletions were detected using publicly available tools, while insertions were detected using tools previously developed in our lab together with the tool developed in the present project. A total of 48,904 deletions and 75,370 insertions were identified, accounting respectively for 0.56 Gb of sequence present in the B73 reference genome and absent in at least one other line, and 0.81 Gb of sequence present in at least one other line but absent in B73.
Taken together, these results confirm previous pan-genome estimates (Morgante et al., Curr Opin Plant Biol 2007), in which the authors estimated the relative size of the dispensable genome at 50% of the pan-genome, compared with our estimate of 48%. The composition of the dispensable genome was investigated, confirming that a large fraction of extant variation in maize is due to LTR retrotransposon insertions and that most of these insertions occurred relatively recently. Although most SVs are located in intergenic regions, some are located in genes and may disrupt exons, with potential evolutionary consequences. We therefore assessed the function of genes affected by deletions and insertions. Nested elements were investigated in greater detail, and we confirmed that LTR retrotransposons form nesting structures more often than expected by chance alone, as previously reported (Jiang and Wessler, Plant Cell 2001). Moreover, the analysis of nesting patterns showed that most nesting events occur within a few families of LTR retrotransposons. The main results of the present work are a) a software tool for the accurate identification of insertions in the genome, which has been shown to outperform existing tools, has been used for the identification of insertions in Zea mays, and can be used on the genome of any species, and b) the characterization of the dispensable genome of Zea mays, which yielded important information on the patterns of movement of transposable elements, on their nesting patterns, and on the function of genes affected by the movement of TEs.
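
The core/dispensable distinction defined in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function name `partition_pan_genome`, the line names `lineA` and `lineB`, and the segment identifiers are hypothetical, and the sketch assumes each line is already described by the set of DNA segments detected in it. Under that assumption, the core genome is the intersection of the per-line sets and the dispensable genome is everything else in their union.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split segments into core (present in every line) and
    dispensable (absent from at least one line).

    `presence` maps a line name to the set of segment IDs found in it.
    This input format is an assumption made for illustration only.
    """
    if not presence:
        return set(), set()
    segment_sets = list(presence.values())
    pan_genome = set().union(*segment_sets)   # every segment seen in any line
    core = set.intersection(*segment_sets)    # segments shared by all lines
    dispensable = pan_genome - core           # segments missing somewhere
    return core, dispensable


# Hypothetical toy data: three lines and four segments.
presence = {
    "B73": {"seg1", "seg2", "seg3"},
    "lineA": {"seg1", "seg2", "seg4"},
    "lineB": {"seg1", "seg3", "seg4"},
}

core, dispensable = partition_pan_genome(presence)
print(sorted(core))         # ['seg1']
print(sorted(dispensable))  # ['seg2', 'seg3', 'seg4']
```

In this toy example only `seg1` is shared by all three lines, so it alone forms the core. Weighting each segment by its length in base pairs would turn the same partition into a size estimate, which is the kind of quantity the 48% figure above expresses.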
17-mar-2017
Zea mays; Maize; Structural variants; Sequencing NGS; Pan-genome; Dispensable genome; Transposable elements; Bioinformatics
Type: Doctoral thesis
Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1132184