Alignment and reconciliation strategies for large-scale de novo assembly / Riccardo Vicedomini. Udine, 2016 Apr 04. 27th cycle.
Alignment and reconciliation strategies for large-scale de novo assembly
Vicedomini, Riccardo
2016-04-04
Abstract
The theme of this thesis is the sequencing and assembly of (large) genomes, an area at the intersection of algorithmics and technology. The advent of next-generation sequencing (NGS) and third-generation sequencing (TGS) platforms reduced the cost of genome analysis by orders of magnitude compared to the older Sanger method. These developments also led to a steadily growing number of genome sequencing projects and to the need to redesign several algorithms (as well as data structures) in order to cope with the computational challenges introduced by the latest technologies. In this dissertation we explore two major problems: de novo assembly and long-sequence alignment. We tackled the former first with a global approach and then with a hierarchical scheme (a more natural fit for the type of dataset at our disposal). More precisely, we proposed a novel assembly reconciliation tool, which proved competitive with state-of-the-art tools and was the only one able to scale to large datasets. We studied the second problem in order to extend and speed up a computationally critical phase of the first. Specifically, it consists of aligning and merging pools of long assembled sequences, each representing a small fraction of the genome and independently assembled from NGS data. We devised a hierarchical framework (HAM) and a fingerprint-based algorithm (DFP) for merging and detecting overlaps between long, accurate sequences. In this case too, the tools we developed achieved results comparable to those of state-of-the-art software while using considerably fewer computational resources.

File | Size | Format |
---|---|---|
10990_684_thesis_final_pdfa.pdf (open access; type: doctoral thesis; license: not specified) | 2.32 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.