Alignment and reconciliation strategies for large-scale de novo assembly / Riccardo Vicedomini. Udine, 2016 Apr 04. 27th cycle.

Alignment and reconciliation strategies for large-scale de novo assembly

Vicedomini, Riccardo
2016-04-04

Abstract

This thesis addresses the sequencing and assembly of large genomes, an area at the intersection of algorithmics and technology. The advent of next-generation sequencing (NGS) and third-generation sequencing (TGS) platforms reduced the cost of genome analysis by orders of magnitude compared to the older Sanger method. It also led to a steadily growing number of genome sequencing projects and to the need to redesign several algorithms and data structures in order to cope with the computational challenges introduced by the latest technologies. In this dissertation we explore two major problems: de novo assembly and long-sequence alignment. The former was tackled first with a global approach and then with a hierarchical scheme, which is more natural for the type of dataset at our disposal. More precisely, we propose a novel assembly reconciliation tool that proved competitive with state-of-the-art tools and was the only one able to scale to large datasets. The second problem was studied in order to extend and speed up a computationally critical phase of the first. Specifically, it consists of aligning and merging pools of long assembled sequences, each representing a small fraction of the genome and independently assembled from NGS data. We devised a hierarchical framework (HAM) and a fingerprint-based algorithm (DFP) for merging long, accurate sequences and detecting overlaps between them. In this case too, the tools we developed achieved results comparable to those of state-of-the-art software while using considerably fewer computational resources.
Files in this item:

File: 10990_684_thesis_final_pdfa.pdf
Access: open access
Type: Doctoral thesis
License: Not specified
Size: 2.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1132931