Many vision tasks depend on whether two images show the same visual identity under changing conditions, but most metrics capture general similarity instead.
For the purposes of this paper, we use the following definitions:
Perceptual metrics emphasize overall appearance, while foundation model embeddings often prioritize semantic alignment. Instance retrieval and re-identification systems address identity, but are often domain-specific or lack a consistent notion of instance-level identity across settings.
ID-Sim measures identity consistency directly across settings.
The training data combines real instance-level datasets with synthetic edits that introduce identity-preserving variation and identity-altering hard negatives.
Global and patch-level supervision encourage both holistic identity discrimination and fine-grained correspondence.
Across 49 diverse evaluation settings spanning concept preservation, retrieval, and re-identification, ID-Sim outperforms prior methods in 48 of them.
Our representation is useful both for global retrieval and local correspondence. See the paper for comparisons to DINOv3 and personalized segmentation with PerSAM.
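As a rough illustration of these two uses, the sketch below ranks a gallery by cosine similarity of global embeddings and matches patches by nearest neighbor over patch embeddings. It is not the ID-Sim implementation: the function names (`rank_gallery`, `match_patches`) are hypothetical, and it assumes embeddings have already been extracted as NumPy arrays by some model.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_gallery(query_emb, gallery_embs):
    """Global retrieval (hypothetical helper).

    query_emb:    (D,) global embedding of the reference image.
    gallery_embs: (N, D) global embeddings of candidate images.
    Returns gallery indices sorted from most to least similar, plus the
    cosine similarity of each gallery item to the query.
    """
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_embs)
    sims = g @ q                    # (N,) cosine similarities
    order = np.argsort(-sims)       # descending similarity
    return order, sims


def match_patches(patches_a, patches_b):
    """Local correspondence (hypothetical helper).

    patches_a: (Pa, D) patch embeddings of image A.
    patches_b: (Pb, D) patch embeddings of image B.
    Returns, for each patch in A, the index of its most similar patch in B
    and the corresponding cosine similarity.
    """
    a = l2_normalize(patches_a)
    b = l2_normalize(patches_b)
    sim = a @ b.T                   # (Pa, Pb) pairwise cosine similarities
    return sim.argmax(axis=1), sim.max(axis=1)
```

Both helpers rely only on cosine similarity, so any embedding model that outputs one global vector and a grid of patch vectors per image could be plugged in under these assumptions.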
ID-Sim shows the desired pattern: high sensitivity to identity changes and relatively low sensitivity to background, viewpoint, and lighting.
@misc{chae2026idsimidentityfocusedsimilaritymetric,
title = {ID-Sim: An Identity-Focused Similarity Metric},
author = {Julia Chae and Nicholas Kolkin and Jui-Hsien Wang and Richard Zhang and Sara Beery and Cusuh Ham},
year = {2026},
eprint = {2604.05039},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2604.05039},
}