T-EVO: Tracking in Egovision for Online Visual Episodic Memory
Nottebaum M.; Dunnhofer M.; Micheloni C.
2026-01-01
Abstract
Wearable assistants hold the promise of supporting humans in daily tasks, which requires persistent awareness of the objects relevant to the user. However, existing methods typically operate on short video clips or rely on offline processing, limiting their capacity for long-term understanding. In contrast, humans are able to recognize specific object instances, recall previous interactions, and opportunistically retain useful spatial information. In this paper, we propose T-EVO (Tracking in Egovision for Online Visual episodic memory), a framework for episodic memory that processes video streams online and stores compact, queryable object memories. T-EVO integrates an object discovery module, a visual tracker, and a memory module to detect, track, and store spatio-temporal data of objects. Evaluated on Ego4D, T-EVO achieves an 81.9% success rate in the oracle configuration. However, its real-world performance drops sharply to 2.9%, highlighting significant limitations in detection and tracking capabilities. Nevertheless, T-EVO enables fast, compact retrieval, cutting storage by 24× and retrieval time by 9×, demonstrating strong potential for real-world deployment in wearable devices.


