ID-Sim: An Identity-Focused Similarity Metric

MIT CSAIL, Adobe Research
CVPR 2026
Selective sensitivity animation showing contextual changes versus identity changes

Many vision tasks depend on whether two images show the same visual identity under changing conditions, but most metrics capture general similarity instead.

For the purposes of this paper, we use the following definitions:

Visual identity: The intrinsic visual properties of an object, such as shape, texture, and color.
Instance: The same visual identity as it appears across different images.

Where Existing Metrics Fall Short

Selected metric disagreement triplet showing Different ID, Ref, and Same ID columns, labeled Human, ID-Sim, DINOv2, DreamSim, and UNeD

Perceptual metrics emphasize overall appearance, while foundation model embeddings often prioritize semantic alignment. Instance retrieval and re-identification systems address identity, but are often domain-specific or lack a consistent notion of instance-level identity across settings.

ID-Sim measures identity consistency directly across settings.
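
One way to read the triplet comparison above (a Ref image against a Same ID and a Different ID image) is as an agreement test: a metric agrees with the human judgment when it scores the same-identity image as more similar to the reference. The following is a minimal sketch of that check, assuming each image has already been reduced to an embedding vector and that similarity is cosine similarity. Neither assumption is stated on this page, and `triplet_agreement` and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_agreement(triplets):
    """Fraction of (ref, same_id, diff_id) embedding triplets in which the
    same-identity image scores higher than the different-identity image."""
    correct = sum(
        cosine_similarity(ref, same) > cosine_similarity(ref, diff)
        for ref, same, diff in triplets
    )
    return correct / len(triplets)

# Toy embeddings, made up for illustration only.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # same-ID image is closer to ref
    ([1.0, 0.0], [0.2, 1.0], [0.8, 0.3]),  # different-ID image is closer to ref
]
score = triplet_agreement(triplets)
```

On these toy triplets the metric agrees on the first and disagrees on the second.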

Training ID-Sim

Dataset curation pipeline combining real instances with identity-preserving and identity-altering edits

The training data combines real instance-level datasets with synthetic edits that introduce identity-preserving variation and identity-altering hard negatives.
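
To make the roles of the three data sources concrete, here is a hedged sketch of how such a mix could be turned into training triplets: real images and identity-preserving edits of an instance serve as positives, and identity-altering edits derived from that instance serve as hard negatives. The `Sample` record, its field names, the `source` labels, and the file names are all hypothetical; the page does not describe the actual data format or sampling scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    image_id: str     # hypothetical file or record identifier
    instance_id: str  # instance the image shows, or was edited from
    source: str       # "real", "preserving_edit", or "altering_edit"

def build_triplets(samples):
    """Pair each real anchor with positives (same instance, real or
    identity-preserving edit) and hard negatives (identity-altering
    edits derived from that same instance)."""
    triplets = []
    for anchor in samples:
        if anchor.source != "real":
            continue
        related = [s for s in samples
                   if s.instance_id == anchor.instance_id and s is not anchor]
        positives = [s for s in related if s.source != "altering_edit"]
        negatives = [s for s in related if s.source == "altering_edit"]
        triplets.extend((anchor, p, n) for p in positives for n in negatives)
    return triplets

# Toy records, made up for illustration only.
samples = [
    Sample("mug_a.jpg", "mug", "real"),
    Sample("mug_b.jpg", "mug", "real"),
    Sample("mug_a_newbg.jpg", "mug", "preserving_edit"),
    Sample("mug_a_recolor.jpg", "mug", "altering_edit"),
]
triplets = build_triplets(samples)
```

Drawing negatives from edits of the same instance is what makes them hard: they share context with the anchor and differ only in identity-relevant properties.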

ID-Sim training overview with shared encoder and joint global and local supervision

Global and patch-level supervision encourage both holistic identity discrimination and fine-grained correspondence.
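
The page does not specify the loss functions, so the following is only an illustrative sketch of what joint global and local supervision can look like. It assumes a triplet margin loss on global embeddings and a cosine alignment term over matched patch features; both are generic stand-ins rather than the losses used for ID-Sim, and `margin`, `local_weight`, and `matches` are hypothetical parameters.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def triplet_margin(anchor, positive, negative, margin=0.2):
    """Global term: hinge loss asking sim(anchor, positive) to exceed
    sim(anchor, negative) by at least `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def patch_alignment(patches_a, patches_b, matches):
    """Local term: mean (1 - cosine) over matched patch index pairs (i, j)."""
    return sum(1.0 - cosine(patches_a[i], patches_b[j])
               for i, j in matches) / len(matches)

def joint_loss(global_triplet, patches_a, patches_b, matches, local_weight=0.5):
    """Weighted sum of the global and local terms (weight is an assumption)."""
    global_term = triplet_margin(*global_triplet)
    local_term = patch_alignment(patches_a, patches_b, matches)
    return global_term + local_weight * local_term

# Toy features, made up for illustration only.
patches = [[1.0, 0.0], [0.0, 1.0]]
easy = joint_loss(([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),
                  patches, patches, [(0, 0), (1, 1)])
hard = joint_loss(([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
                  patches, patches, [(0, 1), (1, 0)])
```

In the `easy` case both terms vanish; in the `hard` case the global term penalizes the swapped positive and negative, and the local term penalizes the mismatched patches.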

Results Across Tasks

Global results across identity-focused tasks comparing baselines and ID-Sim

Across 49 diverse evaluation settings spanning concept preservation, retrieval, and re-identification, ID-Sim outperforms prior methods in 48 of them.

Local Correspondence with Patch Features

Interactive local correspondence demo showing a tennis shoes reference image, a multi-object target scene containing several shoes, and an overlaid heatmap for the reference over the scene

Our representation is useful both for global retrieval and local correspondence. See the paper for comparisons to DINOv3 and personalized segmentation with PerSAM.
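
A heatmap like the one in the demo can be approximated by comparing reference features against every patch of the target scene. The sketch below assumes the reference is summarized by the mean of its patch features and that each target patch is scored by cosine similarity to that mean; the page does not state how the demo computes its overlay, and `similarity_heatmap` and the toy features are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def similarity_heatmap(reference_patches, target_grid):
    """Score every patch in a rows x cols grid of target features by its
    cosine similarity to the mean reference feature."""
    query = mean_vector(reference_patches)
    return [[cosine(query, patch) for patch in row] for row in target_grid]

# Toy 2-D features, made up for illustration only.
reference_patches = [[1.0, 0.0], [0.8, 0.2]]
target_grid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.9, 0.1]],
]
heatmap = similarity_heatmap(reference_patches, target_grid)

# Location of the best-matching patch as (row, col).
best = max(
    ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))),
    key=lambda rc: heatmap[rc[0]][rc[1]],
)
```

Upsampling the grid of scores to image resolution would give an overlay of the kind shown in the demo.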

Selective Sensitivity Analysis

Selective sensitivity analysis showing response to identity and contextual factors

ID-Sim shows the desired pattern: high sensitivity to identity changes and relatively low sensitivity to background, viewpoint, and lighting.
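
The desired pattern can be summarized with a single number: how much more a metric responds to identity changes than to contextual ones. The sketch below assumes the metric's response is recorded as a distance (for example, one minus similarity) between an original image and a version with one factor changed. The ratio and the toy values are hypothetical and are not the analysis reported in the paper.

```python
from statistics import mean

def selective_sensitivity(distances_by_factor, identity_factors):
    """Ratio of the mean distance under identity changes to the mean
    distance under contextual changes. Values above 1 indicate a
    stronger response to identity than to context."""
    identity = [d for factor, ds in distances_by_factor.items()
                if factor in identity_factors for d in ds]
    context = [d for factor, ds in distances_by_factor.items()
               if factor not in identity_factors for d in ds]
    return mean(identity) / mean(context)

# Toy distances per changed factor, made up for illustration only.
distances = {
    "identity": [0.8, 0.6],
    "background": [0.1, 0.2],
    "viewpoint": [0.2, 0.1],
    "lighting": [0.1, 0.2],
}
ratio = selective_sensitivity(distances, identity_factors={"identity"})
```

A metric that captures general similarity would tend to produce a ratio closer to 1, since background or viewpoint changes would move it about as much as identity changes.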

BibTeX

@misc{chae2026idsimidentityfocusedsimilaritymetric,
  title         = {ID-Sim: An Identity-Focused Similarity Metric},
  author        = {Julia Chae and Nicholas Kolkin and Jui-Hsien Wang and Richard Zhang and Sara Beery and Cusuh Ham},
  year          = {2026},
  eprint        = {2604.05039},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2604.05039},
}