A Scalable Algorithm for Metric High-Quality Clustering in Information Retrieval Tasks (Technical reports, manuals, geological and thematic maps, and multimedia products)

Type
Label
  • A Scalable Algorithm for Metric High-Quality Clustering in Information Retrieval Tasks (Technical reports, manuals, geological and thematic maps, and multimedia products) (literal)
Year
  • 2005-01-01T00:00:00+01:00 (literal)
Alternative label
  • Geraci F., Pellegrini M., Pisati P. (2005)
    A Scalable Algorithm for Metric High-Quality Clustering in Information Retrieval Tasks
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Geraci F., Pellegrini M., Pisati P. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • Technical Report IIT TR-08/2005 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#descrizioneSinteticaDelProdotto
  • We consider the problem of efficiently finding a high-quality k-clustering of n points in a (possibly discrete) metric space. Many methods are known when the points are vectors in a real vector space and the distance function is a standard geometric distance such as L1, L2 (Euclidean) or L2^2 (squared Euclidean distance). In such cases efficiency is often sought via sophisticated multidimensional search structures for speeding up nearest-neighbor queries (e.g. variants of kd-trees). Such techniques usually work well in spaces of moderately high dimension (say, up to 6 or 8). Our target is a scenario in which either the metric space cannot be mapped into a vector space or, if this mapping is possible, the dimension of such a space is so high as to rule out the use of the above-mentioned techniques. This setting is rather typical in Information Retrieval applications. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering step based on the triangle inequality, and we compare this algorithm with some recent fast variants of the classical k-means iterative algorithm augmented with an analogous filtering scheme. We extensively tested the two solutions on synthetic geometric data and real data from Information Retrieval applications. (literal)
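The idea in the description above can be illustrated with a minimal sketch. This record does not spell out the report's actual filtering step, so the rule used here is an assumption: it is one common way to apply the triangle inequality to furthest-point-first. If a point p is at distance d from its current center c, and the new center is at distance at least 2d from c, then p cannot be closer to the new center, so that distance is never evaluated. The function name `fpf_with_filter`, the choice of index 0 as the first center, and the evaluation counter are all hypothetical and only serve the illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def fpf_with_filter(
    points: Sequence[P], k: int, dist: Callable[[P, P], float]
) -> Tuple[List[int], List[int], int]:
    """Furthest-point-first k-center sketch with a triangle-inequality filter.

    Returns (indices of the chosen centers, for each point the position in
    that list of its closest center, number of distance evaluations).
    `dist` is assumed to be a metric (symmetric, triangle inequality holds).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]            # assumption: start from an arbitrary point, index 0
    assign = [0] * n         # position in `centers` of each point's closest center
    d_near = [dist(points[0], p) for p in points]  # distance to that center
    evals = n

    while len(centers) < k:
        # The point furthest from all current centers becomes the next center.
        new_idx = max(range(n), key=d_near.__getitem__)
        new_pos = len(centers)

        # Distances from every existing center to the new one (few evaluations).
        center_to_new = [dist(points[c], points[new_idx]) for c in centers]
        evals += len(centers)
        centers.append(new_idx)

        for i in range(n):
            # Filter: d(c, new) >= 2 * d(p, c) implies d(p, new) >= d(p, c),
            # so the point keeps its center and no distance is computed.
            if center_to_new[assign[i]] >= 2 * d_near[i]:
                continue
            d = dist(points[i], points[new_idx])
            evals += 1
            if d < d_near[i]:
                d_near[i] = d
                assign[i] = new_pos

    return centers, assign, evals
```

Because the function only calls `dist`, it works on any metric space, including ones with no vector representation, for example `fpf_with_filter([0, 1, 2, 10, 11, 12, 20, 21], 3, lambda a, b: abs(a - b))`. The filter changes only how many distances are evaluated, not the resulting centers or assignments.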
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#supporto
  • Other (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • IIT-CNR (literal)
Title
  • A Scalable Algorithm for Metric High-Quality Clustering in Information Retrieval Tasks (literal)
Product of
CNR author
Keyword set

Incoming links:


Product
CNR author of
Keyword set of
data.CNR.it