CLEANUP: A fast computer program for removing redundancies from nucleotide sequence databases (Journal article)

Type
Label
  • CLEANUP: A fast computer program for removing redundancies from nucleotide sequence databases (Journal article) (literal)
Year
  • 1996-01-01T00:00:00+01:00 (literal)
Alternative label
  • Grillo, G.; Attimonelli, M.; Liuni, S.; Pesole, G. (1996)
    CLEANUP: A fast computer program for removing redundancies from nucleotide sequence databases
    in Computer applications in the biosciences (Print)
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Grillo, G.; Attimonelli, M.; Liuni, S.; Pesole, G. (literal)
Start page
  • 1 (literal)
End page
  • 8 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 12 (literal)
Journal
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroFascicolo
  • 1 (literal)
Note
  • ISI Web of Science (WOS) (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • Università della Basilicata, Potenza; Centro di Studio sui Mitocondri e Metabolismo Energetico, Bari; Università di Bari, Bari, Italy (literal)
Title
  • CLEANUP: A fast computer program for removing redundancies from nucleotide sequence databases (literal)
Abstract
  • A key concept in comparing sequence collections is the issue of redundancy. The production of sequence collections free from redundancy is undoubtedly very useful, both in performing statistical analyses and in accelerating extensive database searching on nucleotide sequences. Indeed, publicly available databases contain multiple entries of identical or almost identical sequences. Performing statistical analysis on such biased data carries a high risk of assigning high significance to non-significant patterns. In order to carry out unbiased statistical analysis as well as more efficient database searching, it is thus necessary to analyse sequence data that have been purged of redundancy. Given that an unambiguous definition of redundancy is impracticable for biological sequence data, the present program uses a quantitative description of redundancy based on a measure of sequence similarity. A sequence is considered redundant if it shows a degree of similarity and overlapping with a longer sequence in the database greater than a threshold fixed by the user. In this paper we present a new algorithm based on an approximate string matching procedure, which is able to determine the overall degree of similarity between each pair of sequences contained in a nucleotide sequence database and to generate automatically nucleotide sequence collections free from redundancies. (literal)
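
The redundancy criterion stated in the abstract (a sequence is redundant if its similarity to, and overlap with, a longer sequence both exceed user-fixed thresholds) can be illustrated with a minimal Python sketch. This is not the CLEANUP algorithm: the record does not describe the approximate string matching procedure, so Python's standard `difflib.SequenceMatcher` is swapped in as a stand-in. The function names, the way similarity and overlap are computed, and the default thresholds of 0.9 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity_and_overlap(shorter, longer):
    """Return (similarity, overlap) of `shorter` measured against `longer`.

    Assumed definitions (not taken from the paper):
      overlap    = fraction of `shorter` spanned by the matched region
      similarity = fraction of matching characters inside that region
    """
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    # The final block returned by get_matching_blocks() is a zero-size sentinel.
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    if not blocks:
        return 0.0, 0.0
    start = blocks[0].a
    end = blocks[-1].a + blocks[-1].size
    aligned = end - start
    matched = sum(m.size for m in blocks)
    return matched / aligned, aligned / len(shorter)


def remove_redundant(sequences, min_similarity=0.9, min_overlap=0.9):
    """Keep only sequences that are not redundant with a longer kept sequence.

    Sequences are visited from longest to shortest; a sequence is dropped when
    both its similarity and its overlap with some kept sequence exceed the
    thresholds. The pairwise comparison is quadratic in the number of
    sequences, so this only suits small collections.
    """
    kept = []
    for seq in sorted(sequences, key=len, reverse=True):
        redundant = False
        for longer in kept:
            sim, ovl = similarity_and_overlap(seq, longer)
            if sim > min_similarity and ovl > min_overlap:
                redundant = True
                break
        if not redundant:
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Toy data: an exact duplicate, a contained fragment, an unrelated sequence.
    toy = ["ACGTACGTAC", "ACGTACGTAC", "CGTACGT", "TTTTGGGGCC"]
    print(remove_redundant(toy))
```

In this toy run the duplicate and the contained fragment are treated as redundant with the first sequence, while the unrelated sequence is kept. A practical tool would need a much faster comparison step than all-against-all matching.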