Mining top-K patterns from binary datasets in presence of noise (Contribution to conference proceedings)

Label
  • Mining top-K patterns from binary datasets in presence of noise (Contribution to conference proceedings)
Year
  • 2010
Alternative label
  • Lucchese C.; Orlando S.; Perego R. (2010)
    Mining top-K patterns from binary datasets in presence of noise
    in Tenth SIAM International Conference on Data Mining, Columbus, Ohio, US
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Lucchese C.; Orlando S.; Perego R.
Start page
  • 165
End page
  • 176
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://www.siam.org/proceedings/datamining/2010/dm10_015_lucchesec.pdf
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • In: SDM10 - Tenth SIAM International Conference on Data Mining (Columbus, Ohio, US, April 29 - May 1, 2010). Proceedings, pp. 165-176. SIAM, 2010.
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • CNR-ISTI, Pisa; Dipartimento di Informatica (Department of Computer Science), Università Ca' Foscari di Venezia; CNR-ISTI, Pisa
Title
  • Mining top-K patterns from binary datasets in presence of noise
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#isbn
  • 978-0-898717-03-7
Abstract
  • The discovery of patterns in binary datasets has many applications, e.g., in electronic commerce, TCP/IP networking, and Web usage logging. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, and extraction of only the most important patterns. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise as the minimization of a novel cost function. In accordance with the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
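The abstract describes the approach only at a high level. As a rough illustration of the idea of trading pattern succinctness against noise, the sketch below assumes a simplified MDL-flavoured cost: one unit per row and per column of each pattern, plus one unit per cell where the union of the patterns disagrees with the data. This cost is an assumption and is not necessarily the paper's exact formula. The greedy loop here only picks from a hypothetical, pre-supplied candidate list; PaNDa also generates its own candidates, which this sketch does not attempt. The names `noise_cells`, `cost` and `greedy_top_k` are illustrative only.

```python
# Hypothetical representation (not from the paper): a pattern is (rows, cols),
# i.e. a set of transactions and a set of items whose cross product it claims are 1s.
Pattern = tuple[frozenset[int], frozenset[int]]


def noise_cells(data: list[list[int]], patterns: list[Pattern]) -> int:
    """Count cells where the union of pattern rectangles disagrees with the data
    (false positives plus false negatives)."""
    n_rows = len(data)
    n_cols = len(data[0]) if data else 0
    covered = [[False] * n_cols for _ in range(n_rows)]
    for rows, cols in patterns:
        for r in rows:
            for c in cols:
                covered[r][c] = True
    return sum(
        1
        for r in range(n_rows)
        for c in range(n_cols)
        if bool(data[r][c]) != covered[r][c]
    )


def cost(data: list[list[int]], patterns: list[Pattern]) -> int:
    """Assumed simplified cost: total pattern size (rows + cols) plus noisy cells."""
    model_size = sum(len(rows) + len(cols) for rows, cols in patterns)
    return model_size + noise_cells(data, patterns)


def greedy_top_k(data: list[list[int]], candidates: list[Pattern], k: int):
    """Add at most k candidates, each time the one that lowers the cost most;
    stop early when no remaining candidate lowers it."""
    selected: list[Pattern] = []
    remaining = list(candidates)
    current = cost(data, selected)
    for _ in range(k):
        best, best_cost = None, current
        for cand in remaining:
            c = cost(data, selected + [cand])
            if c < best_cost:
                best, best_cost = cand, c
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_cost
    return selected, current


# Usage: a 4x4 block of 1s with one missing cell (1, 2), plus one isolated 1.
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
block = (frozenset({0, 1, 2, 3}), frozenset({0, 1, 2, 3}))
single = (frozenset({4}), frozenset({4}))
exact = (frozenset({0, 2, 3}), frozenset({0, 1, 2, 3}))
result, total = greedy_top_k(data, [block, single, exact], k=2)
print(result, total)
```

Under this assumed cost, the full 4x4 block (cost 10) beats the noise-free 3x4 pattern (cost 11): absorbing one missing cell as noise is cheaper than describing the data exactly. The isolated 1 is left as noise because a pattern for it costs 2 units while removing only 1 noisy cell.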
data.CNR.it