Exploring copulas for the imputation of complex dependent data

GIANNERINI, SIMONE;
2015-01-01

Abstract

In this work we introduce a copula-based method for imputing missing data by using conditional density functions of the missing variables given the observed ones. In theory, such functions can be derived from the multivariate distribution of the variables of interest. In practice, it is very difficult to model joint distributions and derive conditional distributions, especially when the margins are different. We propose a natural solution to the problem by exploiting copulas so that we derive conditional density functions through the corresponding conditional copulas. The approach is appealing since copula functions enable us (1) to fit any combination of marginal distribution functions, (2) to take into account complex multivariate dependence relationships and (3) to model the marginal distributions and the dependence structure separately. We describe the method and perform a Monte Carlo study in order to compare it with two well-known imputation techniques: the nearest neighbour donor imputation and the regression imputation by EM algorithm. Our results indicate that the proposal compares favourably with classical methods in terms of preservation of microdata, margins and dependence structure.
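
The idea of obtaining the conditional distribution of a missing variable through a conditional copula can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bivariate Gaussian copula, a normal margin for the observed variable, an exponential margin for the missing one, and known parameters (`rho`, `mu1`, `sigma1`, `rate2`). All function and parameter names are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def clamp_unit(u):
    """Clamp a probability away from 0 and 1 so quantile functions stay finite."""
    return min(max(u, EPS), 1.0 - EPS)


def exp_quantile(u, rate):
    """Quantile function of the exponential margin (assumed margin of X2)."""
    return -math.log(1.0 - u) / rate


def impute_x2_given_x1(x1, rho, mu1, sigma1, rate2, rng):
    """Draw one value of X2 given X1 = x1 under an assumed Gaussian copula."""
    # Step 1: map the observed value to the copula scale through its margin,
    # then to a normal score.
    u1 = clamp_unit(NormalDist(mu1, sigma1).cdf(x1))
    z1 = STD_NORMAL.inv_cdf(u1)
    # Step 2: for a Gaussian copula, the conditional law of the second normal
    # score given the first is N(rho * z1, 1 - rho^2).
    z2 = rng.gauss(rho * z1, math.sqrt(1.0 - rho ** 2))
    # Step 3: map the draw back to the data scale through the margin of X2.
    u2 = clamp_unit(STD_NORMAL.cdf(z2))
    return exp_quantile(u2, rate2)


def mean_imputation(x1, rho, n_draws=5000, seed=0):
    """Average of repeated conditional draws, with illustrative parameters."""
    rng = random.Random(seed)
    draws = [impute_x2_given_x1(x1, rho, 0.0, 1.0, 2.0, rng) for _ in range(n_draws)]
    return sum(draws) / n_draws
```

Because the margins enter only in steps 1 and 3 and the dependence only in step 2, either part can be swapped (for example, a different copula family or an empirical margin) without touching the other, which is the separation the abstract refers to in point (3).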

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1293403