http://www.cnr.it/ontology/cnr/individuo/prodotto/ID181457
Approximate k-Closest-Pairs in Large High-Dimensional Data Sets (Articolo in rivista)
- Type
- Label
- Approximate k-Closest-Pairs in Large High-Dimensional Data Sets (Articolo in rivista) (literal)
- Anno
- 2005-01-01T00:00:00+01:00 (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
- 10.1007/s10852-004-4080-3 (literal)
- Alternative label
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
- Fabrizio Angiulli;Clara Pizzuti (literal)
- Pagina inizio
- Pagina fine
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
- Rivista
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroFascicolo
- Note
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
- Università della Calabria
Istituto di calcolo e reti ad alte prestazioni (literal)
- Titolo
- Approximate k-Closest-Pairs in Large High-Dimensional Data Sets (literal)
- Abstract
- An approximate algorithm to efficiently solve the {\it
k-Closest-Pairs} problem on large high-dimensional data sets is
presented. The algorithm runs, for a suitable choice of the input
parameters, in ${\cal O}(d^2 n k)$ time, where $d$ is the
dimensionality and $n$ is the number of points of the input data
set, and requires linear space in the input size. It performs at
most $d+1$ iterations. At each iteration a shifted version of the
data set is sequentially scanned according to the order induced on
it by the Hilbert space filling curve and points whose
contribution to the solution has already been analyzed are
detected and eliminated. The pruning is lossless, in fact the
remaining points along with the approximate solution found can be
used for the computation of the exact solution. If the data set is
entirely pruned, then the algorithm returns the exact solution. We
prove that the pruning ability of the algorithm is related to the
nearest neighbor distance distribution of the data set and show
that there exists a class of data sets for which the method,
augmented with a final step that applies an exact method to the
reduced data set, calculates the exact solution with the same time
requirements.
Although we are able to guarantee a ${\cal O}(d^{1+ \frac{1}{t}})$
approximation to the solution, where $t \in \{ 1,2,\ldots,\infty
\}$ identifies the Minkowski ($L_t$) metric of interest,
experimental results give the exact $k$ closest pairs for all the
large high-dimensional synthetic and real data sets considered and
show that the pruning of the search space is effective.
%
We present a thorough scaling analysis of the algorithm for
in-memory and disk-resident data sets showing that the algorithm
scales well in both cases. (literal)
- Prodotto di
- Autore CNR
- Insieme di parole chiave
Incoming links:
- Prodotto
- Autore CNR di
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
- Insieme di parole chiave di