Know your neighbors: Web spam detection using the web topology (Conference proceedings contribution)

Label
  • Know your neighbors: Web spam detection using the web topology (Conference proceedings contribution) (literal)
Year
  • 2007-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1145/1277741.1277814 (literal)
Alternative label
  • Castillo C.; Donato D.; Gionis A.; Murdock V.; Silvestri F. (2007)
    Know your neighbors: Web spam detection using the web topology
    in 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, 23-27 July 2007
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Castillo C.; Donato D.; Gionis A.; Murdock V.; Silvestri F. (literal)
Start page
  • 423 (literal)
End page
  • 430 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni
  • Paper with more than 20 citations at the last evaluation. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://dl.acm.org/citation.cfm?id=1277814&CFID=106740534&CFTOKEN=21970113 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#titoloVolume
  • SIGIR '07 The 30th Annual International SIGIR Conference Amsterdam -- July 23 - 27, 2007 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • In: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Amsterdam, The Netherlands, 23-27 July 2007). Proceedings, pp. 423-430. ACM, 2007. (literal)
Note
  • Scopu (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • Yahoo Research, Barcelona (Castillo C.; Donato D.; Gionis A.; Murdock V.); CNR-ISTI, Pisa (Silvestri F.) (literal)
Title
  • Know your neighbors: Web spam detection using the web topology (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#isbn
  • 978-1-59593-597-7 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#curatoriVolume
  • Wessel Kraaij; Arjen P. de Vries; Charles L. A. Clarke; Norbert Fuhr; Noriko Kando (literal)
Abstract
  • Web spam can significantly deteriorate the quality of search engine results. Thus there is a large incentive for commercial search engines to detect spam pages efficiently and accurately. In this paper we present a spam detection system that combines link-based and content-based features, and uses the topology of the Web graph by exploiting the link dependencies among the Web pages. We find that linked hosts tend to belong to the same class: either both are spam or both are non-spam. We demonstrate three methods of incorporating the Web graph topology into the predictions obtained by our base classifier: (i) clustering the host graph, and assigning the label of all hosts in the cluster by majority vote, (ii) propagating the predicted labels to neighboring hosts, and (iii) using the predicted labels of neighboring hosts as new features and retraining the classifier. The result is an accurate system for detecting Web spam, tested on a large and public dataset, using algorithms that can be applied in practice to large-scale Web data. (literal)
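
The abstract names two ideas that a few lines of code can make concrete: method (i), relabeling every host in a cluster by majority vote, and the neighbor-derived feature behind method (iii). The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the hard "spam"/"nonspam" labels, the toy host graph, and the choice of "fraction of neighbors predicted spam" as the new feature are all hypothetical; this record does not give the paper's exact voting rule or feature definition.

```python
from collections import Counter, defaultdict


def smooth_by_cluster(predicted, cluster_of):
    """Relabel each host with the majority predicted label of its cluster.

    Assumption: a plain majority vote over hard labels.
    """
    votes = defaultdict(Counter)
    for host, label in predicted.items():
        votes[cluster_of[host]][label] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return {host: majority[cluster_of[host]] for host in predicted}


def neighbor_spam_fraction(predicted, neighbors):
    """Fraction of each host's linked hosts that the base classifier called spam.

    Assumption: this fraction is one possible neighbor-based feature that
    could be appended to a host's feature vector before retraining.
    """
    feature = {}
    for host in predicted:
        linked = [n for n in neighbors.get(host, ()) if n in predicted]
        spam = sum(1 for n in linked if predicted[n] == "spam")
        feature[host] = spam / len(linked) if linked else 0.0
    return feature


# Toy data (hypothetical): base-classifier output, a clustering of the
# host graph, and host-level links.
predicted = {"a": "spam", "b": "spam", "c": "nonspam", "d": "nonspam", "e": "nonspam"}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"], "d": ["e"], "e": ["d"]}

smoothed = smooth_by_cluster(predicted, cluster_of)
feature = neighbor_spam_fraction(predicted, neighbors)
print(smoothed)
print(feature)
```

In this toy graph, host "c" is predicted non-spam but sits in a cluster where two of three hosts are predicted spam, so the vote flips it. That is the "linked hosts tend to belong to the same class" observation applied as a correction step.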