Scalability issues for self similarity join in distributed systems (Conference proceedings contribution)

Label
  • Scalability issues for self similarity join in distributed systems (Conference proceedings contribution) (literal)
Year
  • 2010-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1109/PDP.2010.73 (literal)
Alternative label
  • Gennaro C.; Rabitti F. (2010)
    Scalability issues for self similarity join in distributed systems
    in The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Pisa, 17-19 February 2010
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Gennaro C.; Rabitti F. (literal)
Start page
  • 309 (literal)
End page
  • 316 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5452451&contentType=Conference+Publications&searchField%3DSearch_All%26queryText%3DScalability+issues+for+self+similarity+join+in+distributed+systems (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • In: PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (Pisa, 17-19 February 2010). Proceedings, pp. 309-316. IEEE, 2010. (literal)
Note
  • Scopus (literal)
  • Google Scholar (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • CNR-ISTI, Pisa (literal)
Title
  • Scalability issues for self similarity join in distributed systems (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#isbn
  • 978-0-7695-3939-3 (literal)
Abstract
  • Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. However, most of the existing approaches have been based on spatial join techniques designed primarily for data in a vector space. Treating data collections as metric objects brings a great advantage in generality, because a single metric technique can be applied to many specific search problems quite different in nature. In this paper, we concentrate our attention on a special form of join, the Self Similarity Join, which retrieves pairs from the same dataset. In particular, we consider the case in which the dataset is split into subsets that are searched for self similarity join independently (e.g., in a distributed computing environment). To this end, we formalize the abstract concept of epsilon-Cover, prove its correctness, and demonstrate its effectiveness by applying it to two real implementations on a real-life large dataset. (literal)
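
The self similarity join described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the record does not give the definition of epsilon-Cover, so the code below only shows a brute-force join over a generic metric (edit distance on strings, chosen here as an assumption to stress that no vector space is needed) and why a naive split into disjoint subsets, each joined independently, can miss pairs. The function names, sample words, threshold, and split are all hypothetical.

```python
from itertools import combinations


def self_similarity_join(records, distance, epsilon):
    """Brute-force join: all index pairs (i, j), i < j, with distance <= epsilon."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if distance(a, b) <= epsilon
    }


def edit_distance(s, t):
    """Levenshtein distance, a metric on strings (illustrative choice)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]


# Hypothetical sample data and threshold.
words = ["join", "joint", "coin", "metric", "metrics"]
epsilon = 1

# Join over the whole dataset.
full = self_similarity_join(words, edit_distance, epsilon)

# Naive disjoint split (index lists in ascending order), each subset joined on its own.
subsets = [[0, 2, 3], [1, 4]]
split = set()
for idx in subsets:
    local = self_similarity_join([words[k] for k in idx], edit_distance, epsilon)
    # Map subset-local positions back to positions in the full dataset.
    split |= {(idx[i], idx[j]) for i, j in local}

# Qualifying pairs whose members landed in different subsets are lost.
missed = full - split
print(sorted(full), sorted(missed))
```

With this split, any qualifying pair whose two records fall in different subsets is not reported, which is the kind of correctness gap a partitioning scheme for independent subset joins has to close.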