P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets (Journal article)
- Type
- Journal article (Class)
- Research product (Class)
- Label
- P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets (Journal article) (literal)
- Year
- 2003-01-01T00:00:00+01:00 (literal)
- Alternative label
- Pizzuti Clara, Talia Domenico (2003) (literal)
P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets
in IEEE Transactions on Knowledge and Data Engineering (Print)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
- Pizzuti Clara, Talia Domenico (literal)
- First page
- 629 (literal)
- Last page
- 641 (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
- 15-3 (literal)
- Journal
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#descrizioneSinteticaDelProdotto
- This article proposes a parallel implementation, on distributed-memory computers, of a clustering algorithm based on Bayesian classification known as AutoClass. The P-AutoClass implementation divides the clustering work among the processors of a parallel machine, each of which works on its own partition of the data and exchanges intermediate results. The paper describes the design and development of the parallel algorithm and validates its scalability through both an experimental and a theoretical performance evaluation, using a metric known as the isoefficiency function. (literal)
- Notes
- ISI Web of Science (WOS) (literal)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
- 1- ICAR-CNR; 2- DEIS UNICAL (literal)
- Title
- P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets (literal)
- Abstract
- Data clustering is an important task in the area of data mining. Clustering is the unsupervised classification of data items into homogeneous groups called clusters. Clustering methods partition a set of data items into clusters such that items in the same cluster are more similar to each other than items in different clusters according to some defined criteria. Clustering algorithms are computationally intensive, particularly when they are used to analyze large amounts of data. A possible approach to reduce the processing time is based on the implementation of clustering algorithms on scalable parallel computers.
This paper describes the design and implementation of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian model for determining optimal classes in large data sets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that each processor works on its own partition and exchanges intermediate results with the other processors. The system architecture, its implementation and experimental performance results on different processor numbers and data sets are presented and compared with theoretical performance. In particular, experimental and predicted scalability and efficiency of P-AutoClass versus the sequential AutoClass system are evaluated and compared. (literal)
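The data-parallel scheme described in the abstract (each processor works on its own partition and exchanges intermediate results) can be illustrated with a minimal sketch. This is not the P-AutoClass code: it swaps in a plain one-dimensional Gaussian mixture EM step for AutoClass's full Bayesian model, simulates the processors with a loop over partitions, and replaces the inter-processor exchange with an in-memory sum. All function names here are hypothetical.

```python
import math


def local_stats(partition, means, variances, weights):
    """Per-partition step: accumulate weighted sufficient statistics per class."""
    k = len(means)
    # For each class: [sum of w, sum of w*x, sum of w*x^2]
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in partition:
        dens = [
            weights[j]
            * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
            / math.sqrt(2 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            w = dens[j] / total  # membership probability of x in class j
            stats[j][0] += w
            stats[j][1] += w * x
            stats[j][2] += w * x * x
    return stats


def reduce_stats(all_stats):
    """Stand-in for the exchange step: element-wise sum across partitions."""
    k = len(all_stats[0])
    return [[sum(s[j][i] for s in all_stats) for i in range(3)] for j in range(k)]


def update_params(stats, n_items, min_var=1e-6):
    """Recompute class parameters from the globally summed statistics."""
    means, variances, weights = [], [], []
    for sw, swx, swxx in stats:
        mean = swx / sw
        means.append(mean)
        variances.append(max(swxx / sw - mean * mean, min_var))
        weights.append(sw / n_items)
    return means, variances, weights


def parallel_em(data, n_procs, means, variances, weights, n_iters=20):
    """Run EM with the data split across n_procs simulated processors."""
    partitions = [data[p::n_procs] for p in range(n_procs)]
    for _ in range(n_iters):
        partial = [local_stats(part, means, variances, weights) for part in partitions]
        means, variances, weights = update_params(reduce_stats(partial), len(data))
    return means, variances, weights
```

Because only the small per-class statistics are combined at each iteration, the result should not depend on how many partitions the data is split into, apart from floating-point rounding. In a real distributed-memory setting the `reduce_stats` step would be a collective communication among processors rather than a local sum.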
- Product of
- CNR author
- CLARA PIZZUTI (Internal staff unit)
- DOMENICO TALIA (Person)
- Keyword set
- Keywords of "P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets" (Keyword set)
Incoming links:
- Product
- CNR author of
- CLARA PIZZUTI (Internal staff unit)
- DOMENICO TALIA (Person)
- Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#rivistaDi
- Keyword set of
- Keywords of "P-AUTOCLASS: Scalable Parallel Clustering for Mining Large Data Sets" (Keyword set)
