http://www.cnr.it/ontology/cnr/individuo/prodotto/ID172145

A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution)

Type

Research product (Class)
Conference proceedings contribution (Class)

Label

A scalable algorithm for high-quality clustering of Web snippets (Conference proceedings contribution) (literal)

Year

2006-01-01T00:00:00+01:00 (literal)

Alternative label

[1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (2006)
A scalable algorithm for high-quality clustering of Web snippets
in 2006 ACM Symposium on Applied Computing (SAC 2006), Dijon (France), 23-27 April 2006
(literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori

[1] Geraci F., [1] Pellegrini M., [1] Pisati P., [2] Sebastiani F. (literal)

Start page

1058 (literal)

End page

1062 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni

Puma code: cnr.iit/2006-A2-012 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume

2 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note

In: SAC-06. 21st ACM Symposium on Applied Computing (Dijon, FR, April 23-27). Proceedings, pp. 1058-1062. ACM Press, 2006. (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali

5 (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#descrizioneSinteticaDelProdotto

ABSTRACT: We consider the problem of partitioning, in a highly accurate \emph{and} highly efficient way, a set of $n$ documents lying in a metric space into $k$ non-overlapping clusters. We augment the well-known \emph{furthest-point-first} algorithm for $k$-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical $k$-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)

Note

Scopu (literal)

http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni

[1] CNR-IIT, Pisa, Italy; [2] CNR-ISTI, Pisa, Italy (literal)

Title

A scalable algorithm for high-quality clustering of Web snippets (literal)

Abstract

We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data make the poorly scalable, traditional clustering methods unsuitable. (literal)
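
Illustrative sketch (not part of the record, and not the authors' implementation): the abstract names two ingredients, furthest-point-first selection of k centers and a filter based on the triangular inequality. The Python below shows one common way to combine them. It assumes that points are numeric tuples compared with the Euclidean metric (math.dist), that the first center is simply the first point, and that the filter rule is "skip a point p currently assigned to center c when d(c, c_new) >= 2 * d(p, c)", which is safe because the triangular inequality then gives d(p, c_new) >= d(p, c). The names fpf_kcenter and skipped are hypothetical.

```python
import math


def fpf_kcenter(points, k, dist=math.dist):
    """Furthest-point-first k-center clustering with a triangle-inequality filter.

    Returns (centers, assign, skipped):
      centers - indices into `points` of the chosen centers
      assign  - for each point, the position in `centers` of its nearest center
      skipped - number of distance computations avoided by the filter
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]                                   # assumption: start from point 0
    assign = [0] * n
    d_near = [dist(p, points[0]) for p in points]   # distance to nearest center
    skipped = 0

    while len(centers) < k:
        # The next center is the point furthest from its current center.
        new = max(range(n), key=d_near.__getitem__)
        new_pos = len(centers)
        # Distances from the new center to every existing center.
        d_cc = [dist(points[new], points[c]) for c in centers]
        centers.append(new)

        for i in range(n):
            # Filter: d(p, new) >= d(c, new) - d(p, c) >= d(p, c),
            # so the new center cannot be strictly closer; skip the computation.
            if d_cc[assign[i]] >= 2 * d_near[i]:
                skipped += 1
                continue
            d = dist(points[i], points[new])
            if d < d_near[i]:
                d_near[i] = d
                assign[i] = new_pos

    return centers, assign, skipped
```

Calling fpf_kcenter(points, k) on a list of coordinate tuples returns the center indices, the cluster assignment of every point, and the count of filtered comparisons. The filter never changes the result compared with the unfiltered procedure; it only avoids distance computations whose outcome is already determined.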

Publisher

ACM (Publisher)

Product of

CNR author

FABRIZIO SEBASTIANI (Internal staff member)
FILIPPO GERACI (External staff member)
PAOLO PISATI (External staff member)
MARCO PELLEGRINI (Internal staff member)

Keyword set

Keywords of "A scalable algorithm for high-quality clustering of Web snippets" (Keyword set)

Incoming links:

Product

CNR author of

MARCO PELLEGRINI (Internal staff member)
FABRIZIO SEBASTIANI (Internal staff member)
FILIPPO GERACI (External staff member)
PAOLO PISATI (External staff member)

Publisher of

ACM (Publisher)

Keyword set of

Keywords of "A scalable algorithm for high-quality clustering of Web snippets" (Keyword set)
