http://www.cnr.it/ontology/cnr/individuo/prodotto/ID172145
A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution)
- Type
- Label
- A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution) (literal)
- Year
- 2006-01-01T00:00:00+01:00 (literal)
- Alternative label
[1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (2006)
A scalable algorithm for high-quality clustering of Web snippets
in 2006 ACM Symposium on Applied Computing (SAC 2006), Dijon (France), 23-27 April 2006
(literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
- [1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (literal)
- Start page
- End page
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni
- Puma code: cnr.iit/2006-A2-012 (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
- In: SAC-06. 21st ACM Symposium on Applied Computing (Dijon, FR, April 23-27). Proceedings, pp. 1058-1062. ACM Press, 2006. (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#descrizioneSinteticaDelProdotto
- ABSTRACT: We consider the problem of partitioning, in a highly accurate \emph{and} highly efficient way, a set of $n$ documents lying in a metric space into $k$ non-overlapping clusters. We augment the well-known \emph{furthest-point-first} algorithm for $k$-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical $k$-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)
- Note
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
- [1] CNR-IIT, Pisa, Italy; [2] CNR-ISTI, Pisa, Italy (literal)
- Title
- A scalable algorithm for high-quality clustering of Web snippets (literal)
- Abstract
- We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)
- Publisher
- Product of
- CNR author
- Keyword set
Incoming links:
- Product
- CNR author of
- Publisher of
- Keyword set of
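
Illustrative sketch (not part of the record): the abstract describes augmenting the furthest-point-first heuristic for k-center clustering with a filter based on the triangle inequality. The Python below is a minimal sketch of one common way to do that, not the authors' implementation. It assumes Euclidean points, and the names `fpf_filtered`, `near`, `assign`, and `skipped` are hypothetical. The filter relies on the following fact: if a point p is at distance d(p, c) from its current center c, and the new center c' satisfies d(c, c') >= 2 d(p, c), then d(p, c') >= d(c, c') - d(p, c) >= d(p, c), so p cannot move to c' and the distance d(p, c') need not be computed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any metric could be passed in instead."""
    return math.dist(a, b)


def fpf_filtered(
    points: List[Point],
    k: int,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[List[int], List[int], int]:
    """Furthest-point-first k-center with a triangle-inequality filter.

    Returns (indices of the chosen centers, for each point the position
    in that list of its nearest center, number of distance computations
    skipped by the filter).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]                 # arbitrary choice of the first center
    assign = [0] * n              # nearest center of each point
    near = [dist(p, points[0]) for p in points]  # distance to that center
    skipped = 0

    while len(centers) < k:
        # The next center is the point furthest from its nearest center.
        new_idx = max(range(n), key=near.__getitem__)
        new_pos = len(centers)
        # Distances from every existing center to the new one.
        to_new = [dist(points[c], points[new_idx]) for c in centers]
        centers.append(new_idx)

        for i in range(n):
            # Filter: d(c, c') >= 2 d(p, c) implies d(p, c') >= d(p, c).
            if to_new[assign[i]] >= 2.0 * near[i]:
                skipped += 1
                continue
            d = dist(points[i], points[new_idx])
            if d < near[i]:
                near[i] = d
                assign[i] = new_pos

    return centers, assign, skipped
```

Calling `fpf_filtered(points, k)` on a list of coordinate tuples returns the same assignment an unfiltered furthest-point-first pass would produce; the third return value only reports how many distance evaluations the filter avoided.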