Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution (Journal article)

Type
Label
  • Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution (Journal article) (literal)
Year
  • 2007-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1007/11880561_3 (literal)
Alternative label
  • [1] Geraci F., [1] Pellegrini M., [2] Sebastiani F., [3] Maggini M. (2007)
    Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution
    in Internet Mathematics (Print)
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • [1] Geraci F., [1] Pellegrini M., [2] Sebastiani F., [3] Maggini M. (literal)
Start page
  • 413 (literal)
End page
  • 444 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 3 (literal)
Journal
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 32 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • [1] IIT-CNR, Pisa, Italy; [2] ISTI-CNR, Pisa, Italy; [3] Dipartimento di ingegneria dell'informazione, Università di Siena, Italy (literal)
Title
  • Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution (literal)
Abstract
  • This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz machine, Armil performs clustering and labelling altogether in less than one second. (literal)
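    For readers unfamiliar with the furthest-point-first heuristic named in the abstract, the following is a minimal sketch of the textbook greedy version for metric k-center clustering. It is not Armil's "fast version", whose details are not given in this record. The function name `furthest_point_first`, the toy 2-D points, and the use of Euclidean distance are illustrative assumptions; a snippet clusterer would presumably use a metric defined on text representations instead.

```python
import math

def furthest_point_first(points, k, dist):
    """Textbook greedy heuristic for metric k-center clustering.

    Returns (centers, assign): indices of the chosen centers, and for
    each point the position in `centers` of its nearest center.
    Hypothetical helper for illustration, not Armil's implementation.
    """
    if not points or k <= 0:
        return [], []
    centers = [0]  # assumption: the first point is taken as the first center
    # Distance from every point to its nearest center chosen so far.
    nearest = [dist(p, points[0]) for p in points]
    assign = [0] * len(points)
    while len(centers) < min(k, len(points)):
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(nxt)
        # Reassign any point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign

# Toy usage: two well-separated groups of 2-D points (assumed data).
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, assign = furthest_point_first(toy_points, 2, math.dist)
```

    Each round costs one pass over the points, so this plain version performs on the order of n·k distance computations for n points and k clusters.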
Product of
CNR author
Keyword set

Incoming links:


Product
CNR author of
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
Keyword set of
data.CNR.it