The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=min{Hk+1,⌈logσ⌉} . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

POLICRITI, Alberto;GIGANTE, Nicola;PREZZA, Nicola
2015-01-01

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=min{Hk+1,⌈logσ⌉} . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
2015
978-3-319-15578-4
978-3-319-15579-1
978-3-319-15578-4
978-3-319-15579-1
File in questo prodotto:
File Dimensione Formato  
lata2015.pdf

solo utenti autorizzati

Descrizione: Articolo principale
Tipologia: Documento in Pre-print
Dimensione 320.96 kB
Formato Adobe PDF
320.96 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11390/1063735
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact