Lord of the Rings: Hanoi Pooling and Self-Knowledge Distillation for Fast and Accurate Vehicle Re-Identification

Martinel N.; Dunnhofer M.; Pucci R.; Foresti G. L.; Micheloni C.
2021-01-01

Abstract

Vehicle re-identification has seen increasing interest thanks to its fundamental impact on intelligent surveillance systems and smart transportation. The visual data acquired from monitoring camera networks comes with severe challenges, including occlusions, color and illumination changes, as well as orientation issues (a vehicle can be seen from the side/front/rear depending on the camera viewpoint). To deal with such challenges, the community has devoted much effort to learning robust feature representations that hinge on additional visual attributes and part-driven methods, but with the side effects of requiring extensive human annotation labor and increasing computational complexity. We propose an approach that learns a feature representation robust to vehicle orientation issues without the need for extra labeled data and with negligible computational overhead. The former objective is achieved by introducing a Hanoi pooling layer that exploits ring regions and an image pyramid approach to yield a multi-scale representation of vehicle appearance. The latter is tackled by transferring the accuracy of a deep network to its first layers, thus reducing inference effort by allowing a test example to exit early. This is obtained by means of a self-knowledge distillation framework that encourages multi-exit network decisions to agree with each other. Results demonstrate that the proposed approach significantly improves the accuracy of early (i.e., very fast) exits while maintaining the same accuracy as the deep (slow) baseline. Moreover, our solution obtains the best performance to date on three benchmark datasets.
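The abstract only names the Hanoi pooling layer without detailing it. As an illustration of the general idea of pooling over concentric ring regions, here is a minimal PyTorch sketch; the class name RingPooling, the Chebyshev-distance ring partition, and all parameters are our own assumptions for illustration, not the paper's actual layer definition.

```python
import torch
import torch.nn as nn

class RingPooling(nn.Module):
    """Hypothetical sketch of pooling over concentric ring regions.

    The feature map is partitioned into `num_rings` concentric square
    rings around its center; the activations inside each ring are
    average-pooled, and the ring descriptors are concatenated.
    """

    def __init__(self, num_rings=3):
        super().__init__()
        self.num_rings = num_rings

    def forward(self, x):
        # x: (batch, channels, H, W) feature map
        b, c, h, w = x.shape
        # Normalized distance of each cell from the map center; the
        # Chebyshev metric gives square rings (an assumption, not the
        # paper's exact region definition).
        ys = torch.linspace(-1, 1, h, device=x.device).abs()
        xs = torch.linspace(-1, 1, w, device=x.device).abs()
        dist = torch.maximum(ys[:, None], xs[None, :])            # (H, W)
        ring_ids = (dist * self.num_rings).clamp(max=self.num_rings - 1).long()
        feats = []
        for r in range(self.num_rings):
            mask = (ring_ids == r).float()                        # (H, W)
            area = mask.sum().clamp(min=1.0)
            # Masked average pooling over this ring region.
            feats.append((x * mask).sum(dim=(2, 3)) / area)       # (B, C)
        return torch.cat(feats, dim=1)                            # (B, C * num_rings)
```

For example, RingPooling(num_rings=3) applied to a (B, 2048, 16, 16) backbone map yields a 6144-dimensional descriptor. To mimic the multi-scale image pyramid mentioned in the abstract, such a module could be applied to feature maps extracted at several input resolutions and the resulting descriptors concatenated.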
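Similarly, the self-knowledge distillation framework is only described at a high level. A common way to encourage multi-exit decisions to agree is to supervise every exit with the ground-truth labels while distilling the deepest exit's softened predictions into the earlier ones; the sketch below follows that standard recipe, with the function name, temperature, and weighting being illustrative assumptions rather than the paper's exact objective.

```python
import torch.nn.functional as F

def self_distillation_loss(exit_logits, labels, temperature=3.0, alpha=0.5):
    """Hypothetical sketch of a multi-exit self-distillation objective.

    exit_logits: list of (B, num_classes) logit tensors, ordered from the
    earliest to the deepest exit. Each early exit is trained on the labels
    and, in addition, pushed to agree with the softened prediction of the
    deepest exit, which acts as the network's own teacher.
    """
    teacher = exit_logits[-1]                      # deepest exit = teacher
    # Soft targets from the teacher; detached so no gradient flows back.
    soft_targets = F.softmax(teacher.detach() / temperature, dim=1)

    loss = F.cross_entropy(teacher, labels)        # supervise the teacher itself
    for student in exit_logits[:-1]:
        ce = F.cross_entropy(student, labels)
        kl = F.kl_div(
            F.log_softmax(student / temperature, dim=1),
            soft_targets,
            reduction="batchmean",
        ) * temperature ** 2                       # standard KD scaling
        loss = loss + alpha * ce + (1 - alpha) * kl
    return loss
```

At inference time, the exits would then be evaluated in order, returning as soon as one is confident enough (e.g., its maximum softmax probability exceeds a threshold), which is what makes early exits fast.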
Files in this record:
File: TXT_TII-20-3874.pdf (not available)
Type: Post-print document
License: Non-public
Size: 7.64 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1206149
Citations
  • PMC: ND
  • Scopus: 14
  • Web of Science: ND