Deep Metric Learning with Varying Similarity Definitions
07.06.2023

Deep Metric Learning (DML) trains a neural network that represents input data (e.g. images or texts) as an n-dimensional vector such that similar input items are close together in the embedding space, while dissimilar items lie farther apart.
For this, items are usually labeled as “similar” or “dissimilar”. For example, given car images, two images are considered similar if they show the same car model, regardless of color, orientation, or background. The neural network should then learn to identify the visual features common to images of the same class and map both car images to similar embedding vectors, while images of different car models should lead to clearly different vectors.
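To make this training setup concrete, below is a minimal PyTorch sketch under illustrative assumptions: the small `EmbeddingNet` backbone and the choice of a triplet loss are examples, not a prescribed design. Given an anchor image, a positive image of the same car model, and a negative image of a different model, the loss pulls anchor and positive embeddings together while pushing the negative one away.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Hypothetical backbone mapping an image to an n-dimensional embedding."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embedding_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so that distances in embedding space are comparable.
        return nn.functional.normalize(self.backbone(x), dim=1)

model = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# Dummy batch: anchor and positive show the same car model, negative a different one.
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```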
DML has many applications. A prominent example is item retrieval, where items are retrieved based on their content features, e.g. Google Reverse Image Search. Beyond that, DML is used in few-shot learning, face recognition, and person re-identification.
Different users, however, usually define similarity differently. For example, one user might consider two car images similar if both cars have the same color, while another user might deem two images showing cars from the same manufacturer more similar. Switching between such similarity definitions while keeping the computational overhead manageable has not been addressed in current DML research.
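As a small illustration of this point, the snippet below applies two hypothetical similarity predicates (`same_color` and `same_manufacturer`, both made up for this example) to the same pair of car records; the data stays fixed, only the supervision signal derived from it differs.

```python
# Two hypothetical similarity definitions over the same car metadata.
car_a = {"model": "Golf", "manufacturer": "VW", "color": "red"}
car_b = {"model": "Polo", "manufacturer": "VW", "color": "blue"}

def same_color(x, y):
    return x["color"] == y["color"]

def same_manufacturer(x, y):
    return x["manufacturer"] == y["manufacturer"]

print(same_color(car_a, car_b))         # False: dissimilar under one user's definition
print(same_manufacturer(car_a, car_b))  # True: similar under another user's definition
```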
In this thesis or project, you will develop and evaluate methods to dynamically set the similarity definition of DML models. This includes implementing neural network models and loss functions in PyTorch as well as training and evaluating models on different datasets and metrics.
Supervisors: Albin Zehe, Konstantin Kobs