Unsupervised audio-visual model adaptation for person re-identification with wearable cameras
The increasing availability of body-worn cameras is enabling applications such as life-logging and activity detection. In particular, recognizing objects or the identity of people from egocentric data is an important capability. Model adaptation is fundamental for wearable devices because training material is limited and operational conditions and target appearances vary rapidly. This talk discusses the specific issues of audio-visual target identification with wearable cameras and proposes a new approach for the online, unsupervised adaptation of deep-learning models. Specifically, each mono-modal model is adapted using the unsupervised labels provided by the other modality, leveraging the complementary information available in the audio and visual streams. To limit the detrimental effect of erroneous labels, we use a regularization term based on the Kullback-Leibler divergence between the initial model and the model being adapted.
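The adaptation objective described above can be sketched as a per-modality loss: cross-entropy on pseudo-labels supplied by the other modality, plus a KL term anchoring the adapted model to its initial predictions. This is a minimal NumPy sketch; the exact loss form, the regularization weight `lam`, and the direction of the KL divergence are assumptions, since the abstract does not specify them.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptation_loss(logits_adapted, logits_initial, pseudo_labels, lam=0.5):
    """Unsupervised adaptation loss for one modality (sketch).

    logits_adapted: (N, C) outputs of the model being adapted.
    logits_initial: (N, C) outputs of the frozen initial model.
    pseudo_labels:  (N,) class indices predicted by the OTHER modality.
    lam:            hypothetical weight of the KL regularizer.
    """
    p = softmax(logits_adapted)   # adapted-model distribution
    q = softmax(logits_initial)   # initial-model distribution
    n = len(pseudo_labels)
    # Cross-entropy against the other modality's pseudo-labels.
    ce = -np.mean(np.log(p[np.arange(n), pseudo_labels] + 1e-12))
    # KL(initial || adapted): penalizes drifting from the initial model,
    # limiting the damage done by erroneous pseudo-labels.
    kl = np.mean(np.sum(q * (np.log(q + 1e-12) - np.log(p + 1e-12)), axis=1))
    return ce + lam * kl
```

In practice each modality (audio, visual) would minimize its own copy of this loss, with pseudo-labels refreshed online from the other stream as new egocentric data arrives.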