Efficient local image descriptors learned with autoencoders


Local image descriptors play a crucial role in many image processing tasks, such as object tracking, object recognition, panorama stitching, and image retrieval. In this paper, we focus on learning local image descriptors in an unsupervised way, using autoencoders and variational autoencoders. We perform a thorough comparative analysis of these two approaches, along with an in-depth analysis of the most relevant hyperparameters to guide their optimal selection. In addition, we give insights into the difficulties and the importance of selecting the right evaluation techniques during unsupervised learning of local image descriptors. We explore the extent to which a simple perceptual metric computed during training can predict performance on tasks such as patch matching, retrieval, and verification. Finally, we propose an improvement to the encoder architecture that yields significant savings in memory complexity, especially in single-image tasks. As a proof of concept, we integrate our descriptor into an inpainting algorithm and illustrate its results when applied to the virtual restoration of master paintings. The source code required to reproduce the presented results is available on GitHub (https://github.com/nimpy/local-img-descr-ae).