A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution)

Type
Label
  • A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution) (literal)
Year
  • 2006-01-01T00:00:00+01:00 (literal)
Alternative label
  • [1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (2006)
    A scalable algorithm for high-quality clustering of Web snippets
    in 2006 ACM Symposium on Applied Computing (SAC 2006), Dijon (France), 23-27 April 2006
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • [1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (literal)
Start page
  • 1058 (literal)
End page
  • 1062 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni
  • Puma code: cnr.iit/2006-A2-012 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 2 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • In: SAC-06. 21st ACM Symposium on Applied Computing (Dijon, FR, April 23-27). Proceedings, pp. 1058-1062. ACM Press, 2006. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 5 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#descrizioneSinteticaDelProdotto
  • ABSTRACT: We consider the problem of partitioning, in a highly accurate \emph{and} highly efficient way, a set of $n$ documents lying in a metric space into $k$ non-overlapping clusters. We augment the well-known \emph{furthest-point-first} algorithm for $k$-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical $k$-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)
Note
  • Scopu (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • [1] CNR-IIT, Pisa, Italy; [2] CNR-ISTI, Pisa, Italy (literal)
Title
  • A scalable algorithm for high-quality clustering of Web snippets (literal)
Abstract
  • We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)
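
Illustrative sketch (not part of the original record): the abstract names furthest-point-first k-center clustering with a triangle-inequality filter but does not give the details. The Python below shows one common way such a filter can work, under stated assumptions: points are coordinate tuples, the metric is Euclidean (`math.dist`), and the first center is simply the first point. The function name `fpf_with_filtering` and the `skipped` counter are hypothetical, introduced here only for illustration; the paper's actual scheme may differ.

```python
import math


def fpf_with_filtering(points, k, dist=math.dist):
    """Furthest-point-first k-center clustering with a triangle-inequality filter.

    Returns (centers, assignment, skipped):
      centers    - indices into `points` of the chosen centers
      assignment - for each point, the position of its nearest center in `centers`
      skipped    - number of distance computations avoided by the filter
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    centers = [0]                                  # assumption: start from the first point
    assignment = [0] * n                           # every point starts in cluster 0
    d_near = [dist(points[0], p) for p in points]  # distance to the nearest center so far
    skipped = 0

    while len(centers) < k:
        # The next center is the point furthest from its nearest center.
        new = max(range(n), key=d_near.__getitem__)
        # Distances from the new center to every existing center.
        to_center = [dist(points[new], points[c]) for c in centers]
        new_id = len(centers)
        centers.append(new)

        for i in range(n):
            # Triangle inequality: d(p, new) >= d(c, new) - d(p, c).
            # If d(c, new) >= 2 * d(p, c), then d(p, new) >= d(p, c),
            # so p cannot move closer and d(p, new) need not be computed.
            if to_center[assignment[i]] >= 2 * d_near[i]:
                skipped += 1
                continue
            d = dist(points[new], points[i])
            if d < d_near[i]:
                d_near[i] = d
                assignment[i] = new_id

    return centers, assignment, skipped
```

Calling `fpf_with_filtering(points, k)` on a list of coordinate tuples returns the center indices, the cluster of each point, and how many distance evaluations the filter saved. The filter never changes the result of plain furthest-point-first; it only avoids distance computations whose outcome is already determined.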