http://www.cnr.it/ontology/cnr/individuo/prodotto/ID181479
Outlier Mining in Large High-Dimensional Data Sets (Articolo in rivista)
- Type
- Label
- Outlier Mining in Large High-Dimensional Data Sets (Articolo in rivista) (literal)
- Anno
- 2005-01-01T00:00:00+01:00 (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
- 10.1109/TKDE.2005.31 (literal)
- Alternative label
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
- Fabrizio Angiulli;Clara Pizzuti (literal)
- Pagina inizio
- Pagina fine
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
- Rivista
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroFascicolo
- Note
- ISI Web of Science (WOS) (literal)
- Google Scholar (literal)
- DBLP (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
- Università della Calabria
Istituto di calcolo e reti ad alte prestazioni (literal)
- Titolo
- Outlier Mining in Large High-Dimensional Data Sets (literal)
- Abstract
- In this paper a new definition of
distance-based outlier and an algorithm, called {\em HilOut},
designed to efficiently detect the top $n$ outliers of a large and
high-dimensional data set are proposed. Given an integer $k$, the
weight of a point is defined as the sum of the distances
separating it from its $k$ nearest-neighbors. Outlier are those
points scoring the largest values of weight. The algorithm {\em
HilOut} makes use of the notion of space-filling curve to
linearize the data set, and it consists of two phases. The first
phase provides an approximate solution, within a rough factor,
after the execution of at most $d + 1$ sorts and scans of the data
set, with temporal cost quadratic in $d$ and linear in $N$ and in
$k$, where $d$ is the number of dimensions of the data set and $N$
is the number of points in the data set. During this phase, the
algorithm isolates points candidate to be outliers and reduces
this set at each iteration. If the size of this set becomes $n$,
then the algorithm stops reporting the exact solution. The second
phase calculates the exact solution with a final scan examining
further the candidate outliers remained after the first phase.
Experimental results show that the algorithm always stops,
reporting the exact solution, during the first phase after much
less than $d + 1$ steps. We present both an in-memory and
disk-based implementation of the {\em HilOut} algorithm and a
thorough scaling analysis for real and synthetic data sets showing
that the algorithm scales well in both cases. (literal)
- Prodotto di
- Autore CNR
- Insieme di parole chiave
Incoming links:
- Prodotto
- Autore CNR di
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
- Insieme di parole chiave di