Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data (Journal article)

Type
Label
  • Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data (Journal article) (literal)
Year
  • 2007-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1109/TKDE.2007.190649 (literal)
Alternative label
  • Eugenio Cesario; Giuseppe Manco; Riccardo Ortale (2007)
    Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data
    in IEEE Transactions on Knowledge and Data Engineering (Print); IEEE-Institute Of Electrical And Electronics Engineers Inc., Piscataway (United States of America)
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Eugenio Cesario; Giuseppe Manco; Riccardo Ortale (literal)
Start page
  • 1607 (literal)
End page
  • 1624 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4358941 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 19 (literal)
Journal
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 18 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroFascicolo
  • 12 (literal)
Notes
  • Google Scholar (literal)
  • DBLP (literal)
  • Scopus (literal)
  • ISI Web of Science (WOS) (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • ICAR-CNR; ICAR-CNR; ICAR-CNR (literal)
Title
  • Top-Down Parameter-Free Clustering of High-Dimensional Categorical Data (literal)
Abstract
  • A parameter-free, fully-automatic approach to clustering high-dimensional categorical data is proposed. The technique is based on a two-phase iterative procedure, which attempts to improve the overall quality of the whole partition. In the first phase, cluster assignments are given, and a new cluster is added to the partition by identifying and splitting a low-quality cluster. In the second phase, the number of clusters is fixed, and an attempt to optimize cluster assignments is done. On the basis of such features, the algorithm attempts to improve the overall quality of the whole partition and finds clusters in the data, whose number is naturally established on the basis of the inherent features of the underlying dataset, rather than being previously specified. Furthermore, the approach is parametric to the notion of cluster quality: here, a cluster is defined as a set of tuples exhibiting a sort of homogeneity. We show how a suitable notion of cluster homogeneity can be defined in the context of high dimensional categorical data, from which an effective instance of the proposed clustering scheme immediately follows. Experiments on both synthetic and real data prove that the devised algorithm scales linearly and achieves nearly-optimal results in terms of compactness and separation. (literal)
Publisher
Product of
CNR author
Keyword set
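
The two-phase loop summarized in the abstract (split a low-quality cluster, then re-optimize assignments with the number of clusters fixed) can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their quality measure: it swaps in category utility (the criterion used by COBWEB) as an assumed partition-quality function, and every name in it (`homogeneity`, `category_utility`, `stabilize`, `split`, `top_down_cluster`) is hypothetical. It also recomputes quality naively, so it does not reproduce the linear scaling reported in the abstract.

```python
from collections import Counter

EPS = 1e-12  # tolerance for floating-point comparisons


def homogeneity(cluster, n_total):
    """Assumed measure: cluster weight times the sum of squared (attribute, value) frequencies."""
    if not cluster:
        return 0.0
    size = len(cluster)
    counts = Counter((i, v) for row in cluster for i, v in enumerate(row))
    return (size / n_total) * sum((c / size) ** 2 for c in counts.values())


def category_utility(clusters, data):
    """Partition quality: homogeneity gained over the unclustered data, averaged over clusters."""
    n = len(data)
    gain = sum(homogeneity(c, n) for c in clusters) - homogeneity(data, n)
    return gain / len(clusters)


def stabilize(clusters, n_total, max_passes=50):
    """Second phase: the number of clusters is fixed; move tuples while quality improves."""
    clusters = [list(c) for c in clusters]
    for _ in range(max_passes):
        moved = False
        for src in range(len(clusters)):
            for row in list(clusters[src]):
                if len(clusters[src]) < 2:
                    break  # never empty a cluster, so the cluster count stays fixed
                remaining = list(clusters[src])
                remaining.remove(row)
                loss = homogeneity(clusters[src], n_total) - homogeneity(remaining, n_total)
                best, best_gain = None, EPS
                for dst in range(len(clusters)):
                    if dst == src:
                        continue
                    gain = (homogeneity(clusters[dst] + [row], n_total)
                            - homogeneity(clusters[dst], n_total) - loss)
                    if gain > best_gain:
                        best, best_gain = dst, gain
                if best is not None:
                    clusters[src].remove(row)
                    clusters[best].append(row)
                    moved = True
        if not moved:
            break
    return clusters


def split(cluster):
    """First-phase helper: seed a new cluster with the tuple least similar to the modal values."""
    modes = [Counter(col).most_common(1)[0][0] for col in zip(*cluster)]
    seed = min(cluster, key=lambda row: sum(a == b for a, b in zip(row, modes)))
    rest = list(cluster)
    rest.remove(seed)
    return rest, [seed]


def top_down_cluster(data):
    """Alternate splitting and stabilization until no split improves partition quality."""
    n = len(data)
    clusters = [list(data)]
    improved = True
    while improved:
        improved = False
        current = category_utility(clusters, data)
        # Try the least homogeneous clusters first.
        for idx in sorted(range(len(clusters)), key=lambda i: homogeneity(clusters[i], n)):
            if len(clusters[idx]) < 2:
                continue
            rest, new = split(clusters[idx])
            candidate = stabilize(clusters[:idx] + [rest, new] + clusters[idx + 1:], n)
            if category_utility(candidate, data) > current + EPS:
                clusters, improved = candidate, True
                break
    return clusters


# Toy dataset: two groups of categorical tuples, each with one noisy value.
data = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
        ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p")]
result = top_down_cluster(data)
print(len(result))
```

No number of clusters is passed in: the loop stops as soon as no split followed by stabilization raises the partition quality. The division by the number of clusters in `category_utility` is what keeps this sketch from splitting down to singletons; a different quality function would need its own safeguard.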

Incoming links:


Product
CNR author of
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
Publisher of
Keyword set of
data.CNR.it