A Copula-Based Algorithm for Discovering Patterns of Dependent Observations
GIANNERINI, SIMONE
2012-01-01
Abstract
The main aim of this work is the study of clustering dependent data by means of copula functions. Copulas are popular multivariate tools whose importance within clustering methods has not yet been investigated in detail. We propose a new algorithm (CoClust for short) that clusters dependent data according to the multivariate structure of the generating process, without any assumption on the margins. Moreover, the approach requires neither a starting classification nor an a priori choice of the number of clusters; CoClust selects them by using a criterion based on the log-likelihood of a copula fit. We test our proposal on simulated data for different dependence scenarios and compare it with a model-based clustering technique. Finally, we show applications of CoClust to real microarray data from breast-cancer patients.
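
To make the idea of a margin-free, log-likelihood-based criterion concrete, the following minimal sketch computes the log-likelihood of a copula fitted to rank-based pseudo-observations. It is an illustration under stated assumptions, not the CoClust algorithm itself: the Gaussian copula family, the restriction to the bivariate case, the plug-in correlation of normal scores (used here instead of a full maximum-likelihood fit), and the function names `pseudo_observations` and `gaussian_copula_loglik` are all choices made for this example and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def pseudo_observations(x):
    """Map a sample to (0, 1) via scaled ranks r / (n + 1).

    Only the ordering of the data is used, so no marginal model is assumed.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def gaussian_copula_loglik(x, y):
    """Return (rho, log-likelihood) of a bivariate Gaussian copula.

    Assumption of this sketch: rho is a plug-in estimate (correlation of
    the normal scores), not a maximum-likelihood estimate.
    """
    inv = NormalDist().inv_cdf
    zx = [inv(u) for u in pseudo_observations(x)]
    zy = [inv(u) for u in pseudo_observations(y)]
    sxy = sum(a * b for a, b in zip(zx, zy))
    sxx = sum(a * a for a in zx)
    syy = sum(b * b for b in zy)
    rho = sxy / math.sqrt(sxx * syy)
    rho = max(-0.999, min(0.999, rho))  # keep 1 - rho^2 away from zero
    one_minus = 1.0 - rho * rho
    loglik = 0.0
    for a, b in zip(zx, zy):
        # Log-density of the bivariate Gaussian copula at normal scores (a, b).
        loglik += (-0.5 * math.log(one_minus)
                   - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b)
                   / (2.0 * one_minus))
    return rho, loglik


# Hypothetical toy data: y depends on x, w is independent of x.
random.seed(0)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [xi + 0.5 * random.gauss(0.0, 1.0) for xi in x]
w = [random.gauss(0.0, 1.0) for _ in range(n)]

rho_dep, ll_dep = gaussian_copula_loglik(x, y)
rho_ind, ll_ind = gaussian_copula_loglik(x, w)
# A strictly increasing transformation of a margin leaves the ranks unchanged.
rho_exp, ll_exp = gaussian_copula_loglik(x, [math.exp(v) for v in y])

print(f"dependent pair:   rho={rho_dep:.3f}, loglik={ll_dep:.2f}")
print(f"independent pair: rho={rho_ind:.3f}, loglik={ll_ind:.2f}")
```

In this toy setting the dependent pair attains a higher copula log-likelihood than the independent pair, and the value does not change when one margin is transformed monotonically. That is the sense in which a copula-based criterion can compare candidate groupings without modelling the margins.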