Compressed representations of sequences and full-text indexes (Articolo in rivista)

Type
Label
  • Compressed representations of sequences and full-text indexes (Articolo in rivista) (literal)
Anno
  • 2007-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1145/1240233.1240243 (literal)
Alternative label
  • [1] Manzini G., [2] Ferragina P., [3] Mäkinen V., [4] Navarro G. (2007)
    Compressed representations of sequences and full-text indexes
    in ACM transactions on algorithms; ACM Press, New York (Stati Uniti d'America)
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • [1] Manzini G., [2] Ferragina P., [3] Mäkinen V., [4] Navarro G. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 3 (literal)
Rivista
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • DOI: http://doi.acm.org/10.1145/1240233.1240243 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 20 (literal)
Note
  • Scopu (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • [1] CNR-IIT, Pisa, Italy; [2] Univ. di Pisa, Pisa, Italy; [3] University of Helsinki, Finland; [4] University of Chile, Santiago, Cile (literal)
Titolo
  • Compressed representations of sequences and full-text indexes (literal)
Abstract
  • Given a sequence S = s_1 s_2 ... s_n of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any s_q, as well as answer rank and select queries on S, in constant time. H0(S) is the zero-order empirical entropy of S and nH0(S) provides an information-theoretic lower bound to the bit storage of any sequence S via a fixed encoding of its symbols. This extends previous results on binary sequences, and improves previous results on general sequences where those queries are answered in O(log r) time. For larger r, we can still represent S in nH_0(S) + o(n log r) bits and answer queries in O(log r/log log n) time. Another contribution of this article is to show how to combine our compressed representation of integer sequences with a compression boosting technique to design compressed full-text indexes that scale well with the size of the input alphabet. Specifically, we design a variant of the FM-index that indexes a string T[1, n] within nHk(T) + o(n) bits of storage, where Hk(T) is the kth-order empirical entropy of T. Compared to all previous works, our index is the first that removes the alphabet-size dependance from all query times, in particular, counting time is linear in the pattern length. Still, our index uses essentially the same space of the kth-order entropy of the text T, which is the best space obtained in previous work. (literal)
Editore
Prodotto di
Autore CNR
Insieme di parole chiave

Incoming links:


Prodotto
Autore CNR di
Editore di
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
Insieme di parole chiave di
data.CNR.it