In this paper, the problem of non-collaborative person identification for a secure access to facilities is addressed. The proposed solution adopts a face and a speaker recognition techniques. The integration of these two methods allows to improve the performance with respect to the two classifiers. In non-collaborative scenarios, the problem of face recognition first requires to detect the face pattern then to recognize it even when in non-frontal poses. In the current work, a histogram normalization, a boosting technique and a linear discrimination analysis have been exploited to solve typical problems like illumination variability, occlusions, pose variation, etc. in addition, a new temporal classification is proposed to improve the robustness of the frame-by-frame classification. This allows to project known classification techniques for still image recognition into a multi-frame context where the image capture allows dynamics in the environment. For the audio, a method for the automatic speaker identification in noisy environments is presented. In particular, we propose an optimization of a speech de-noising algorithm to optimize the performance of the extended Kalman filter (EKF). To provide a baseline system for the integration with our proposed speech de-noising algorithm, we use a conventional speaker recognition system, based on Gaussian mixture models and mel frequency cepstral coefficients (MFCCs) as features. To confirm the effectiveness of our methods, we performed video and speaker recognition tasks first separately then integrating the results. In particular, two different corpora have been used: (a) a public corpus (ELDSR for audio and FERRET for images) and (b) a dedicated audio/video corpus, in which the speakers read a list of sentences wearing a scarf or a full-face motorcycle helmet. Experimental results show that our methods are able to reduce significantly the classification error rate. (C) 2009 Elsevier Ltd. All rights reserved.

Audio-video biometric recognition for non-collaborative access granting

C. MICHELONI;S. CANAZZA;G. L. FORESTI
2009-01-01

Abstract

In this paper, the problem of non-collaborative person identification for a secure access to facilities is addressed. The proposed solution adopts a face and a speaker recognition techniques. The integration of these two methods allows to improve the performance with respect to the two classifiers. In non-collaborative scenarios, the problem of face recognition first requires to detect the face pattern then to recognize it even when in non-frontal poses. In the current work, a histogram normalization, a boosting technique and a linear discrimination analysis have been exploited to solve typical problems like illumination variability, occlusions, pose variation, etc. in addition, a new temporal classification is proposed to improve the robustness of the frame-by-frame classification. This allows to project known classification techniques for still image recognition into a multi-frame context where the image capture allows dynamics in the environment. For the audio, a method for the automatic speaker identification in noisy environments is presented. In particular, we propose an optimization of a speech de-noising algorithm to optimize the performance of the extended Kalman filter (EKF). To provide a baseline system for the integration with our proposed speech de-noising algorithm, we use a conventional speaker recognition system, based on Gaussian mixture models and mel frequency cepstral coefficients (MFCCs) as features. To confirm the effectiveness of our methods, we performed video and speaker recognition tasks first separately then integrating the results. In particular, two different corpora have been used: (a) a public corpus (ELDSR for audio and FERRET for images) and (b) a dedicated audio/video corpus, in which the speakers read a list of sentences wearing a scarf or a full-face motorcycle helmet. Experimental results show that our methods are able to reduce significantly the classification error rate. (C) 2009 Elsevier Ltd. All rights reserved.
File in questo prodotto:
File Dimensione Formato  
Audio-video biometric recognition for non-collaborative access granting.pdf

non disponibili

Tipologia: Altro materiale allegato
Licenza: Non pubblico
Dimensione 799.46 kB
Formato Adobe PDF
799.46 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11390/877115
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 9
social impact