With the advent of new sequencing technologies able to produce an enormous quantity of short genomic sequences, new tools able to search for them inside a references sequence genome have emerged. Because of chemical reading errors or of the variability between organisms, one is interested in finding not only exact occurrences, but also occurrences with up to k mismatches. The contribution of this paper is twofold. On one hand, we present a generalization of the classical Rabin-Karp string matching algorithm to solve the k-mismatch problem, with average complexity . On the other hand, we show how to employ this idea in conjunction with an index over the text, allowing to search a pattern, with up to k mismatches, in time proportional to its length. This novel tool-rNA (randomized Numerical Aligner)-outperforms available tools like SOAP2, BWA, and BOWTIE, processing up to 10 times more patterns per second on texts of (practically) significant lengths.

A Randomized Numerical Aligner (rNA)

POLICRITI, Alberto;
2010-01-01

Abstract

With the advent of new sequencing technologies able to produce an enormous quantity of short genomic sequences, new tools able to search for them inside a references sequence genome have emerged. Because of chemical reading errors or of the variability between organisms, one is interested in finding not only exact occurrences, but also occurrences with up to k mismatches. The contribution of this paper is twofold. On one hand, we present a generalization of the classical Rabin-Karp string matching algorithm to solve the k-mismatch problem, with average complexity . On the other hand, we show how to employ this idea in conjunction with an index over the text, allowing to search a pattern, with up to k mismatches, in time proportional to its length. This novel tool-rNA (randomized Numerical Aligner)-outperforms available tools like SOAP2, BWA, and BOWTIE, processing up to 10 times more patterns per second on texts of (practically) significant lengths.
2010
9783642130885
File in questo prodotto:
File Dimensione Formato  
rNA.pdf

non disponibili

Tipologia: Documento in Pre-print
Licenza: Non pubblico
Dimensione 388.59 kB
Formato Adobe PDF
388.59 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11390/882207
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 4
social impact