A likelihood-based boosting algorithm for factor analysis models with binary data
Battauz M.; Vidoni P.
2022-01-01
Abstract
Statistical boosting is a very effective method for fitting complex models while performing variable selection and preventing overfitting at the same time. However, the available methods are not directly applicable to factor analysis models for binary data, since gradient-based methods cannot move away from the starting point with all loadings equal to zero, where the gradient of the log-likelihood vanishes. The proposed algorithm exploits the directions of negative curvature of the log-likelihood function to escape from regions of local non-convexity. The component-wise approach leads to a sparse solution, which facilitates interpretation without requiring a subsequent rotation of the loadings. The method also regularizes the estimates, thereby reducing their mean squared error. To lighten the computational burden of the inferential procedure, a suitable pseudolikelihood, the pairwise likelihood, is exploited. In addition, a group lasso penalty is considered in order to automatically select the number of latent variables included in the model. The good performance of the proposal is illustrated through a simulation study and a real-data example.
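
To illustrate the pairwise-likelihood ingredient mentioned in the abstract, the following is a minimal Python sketch, not the authors' implementation, of the pairwise log-likelihood for a probit factor model with binary items. It assumes the standard underlying-variable formulation y*_j = lambda_j' z + eps_j with z ~ N(0, I_q), eps_j ~ N(0, 1), and y_j = 1 when y*_j exceeds a threshold tau_j; the function name, the probit parameterization, and the thresholds are assumptions made here for illustration only.

    import numpy as np
    from scipy.stats import multivariate_normal

    def pairwise_loglik(Y, Lambda, tau):
        """Pairwise log-likelihood for a probit factor model with binary items.

        Y      : (n, p) binary data matrix
        Lambda : (p, q) loading matrix
        tau    : (p,) thresholds
        Assumed model: y*_j = lambda_j' z + eps_j, z ~ N(0, I_q), eps_j ~ N(0, 1),
                       y_j = 1 if y*_j > tau_j.
        """
        n, p = Y.shape
        s = np.sqrt(1.0 + np.sum(Lambda**2, axis=1))   # sd of latent responses
        R = (Lambda @ Lambda.T) / np.outer(s, s)       # latent correlations between pairs
        u = -tau / s                                   # standardized thresholds
        ll = 0.0
        for j in range(p - 1):
            for k in range(j + 1, p):
                a_j = 2 * Y[:, j] - 1                  # signs +/-1 coding the responses
                a_k = 2 * Y[:, k] - 1
                for sj, sk in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
                    idx = (a_j == sj) & (a_k == sk)    # observations in this 2x2 cell
                    if not idx.any():
                        continue
                    rho = sj * sk * R[j, k]
                    prob = multivariate_normal(
                        mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]
                    ).cdf([sj * u[j], sk * u[k]])      # bivariate normal cell probability
                    ll += idx.sum() * np.log(prob)
        return ll

Under these assumptions the objective is a sum of bivariate normal probabilities over all item pairs, which is what makes the pairwise likelihood computationally lighter than the full p-dimensional integral required by the exact likelihood for binary factor analysis.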