
Finer-CAM : Spotting the Difference Reveals Finer Details for Visual Explanation [CVPR 2025]

Official implementation of "Finer-CAM" [arXiv].

CAM methods highlight image regions influencing predictions but often struggle in fine-grained tasks due to shared feature activation across similar classes. We propose Finer-CAM, which explicitly compares the target class with similar ones, suppressing shared features and emphasizing unique, discriminative details.

Finer-CAM retains CAM’s efficiency, offers precise localization, and adapts to multi-modal zero-shot models, accurately activating object parts or attributes. It enhances explainability in fine-grained tasks without increasing complexity.

Demo

Try Finer-CAM in our interactive demos and see accurate localization of discriminative features.

  • Try the multi-modal demo and see how Finer-CAM activates detailed and relevant regions for diverse concepts:
    Open In Colab

  • Test the CUB classifier demo to visualize fine-grained, discriminative traits with enhanced interpretability:
    Hugging Face Demo

  • Try the Colab tutorial and run Finer-CAM on your own data:
    Open In Colab

Requirements

Install the dependencies from this repo:

pip install -r requirements.txt

Run scripts and notebooks from the repository root so Python imports the local pytorch_grad_cam package in this tree.

Usage

Python API

FinerCAM is available from pytorch_grad_cam and wraps an existing CAM backend such as GradCAM. It keeps the normal CAM pipeline for collecting activations and gradients, but replaces the optimization target with the Finer-CAM objective.

from pytorch_grad_cam import FinerCAM, GradCAM

cam = FinerCAM(
    model=model,
    target_layers=target_layers,
    reshape_transform=reshape_transform,  # optional
    base_method=GradCAM,
)

Call FinerCAM with:

  • input_tensor: input batch passed to the model.
  • targets: optional list of pytorch_grad_cam target callables. See model_targets. If None, Finer-CAM targets are constructed automatically based on the model outputs.
  • target_size: optional output size used when resizing CAM maps.
  • eigen_smooth: enables eigenvalue-based smoothing in the wrapped CAM method.
  • alpha: scaling factor used by FinerWeightedTarget for penalizing reference categories.
  • reference_category_ranks: ranks from the similarity-sorted category list used to choose reference categories when targets=None. The default [1, 2, 3] uses the second to fourth most similar categories as references. If a requested rank exceeds the number of available classes, it is ignored.
  • target_idx: the index of the target category, usually the ground-truth category. If omitted, the highest-scoring category in each sample is used.
  • H, W: optional feature-grid height and width for backbones that need them in the activation/gradient path, such as ViT-style reshape transforms.

FinerCAM.forward(...) returns a tuple:

cam_map, outputs, main_categories, references = cam(
    input_tensor=input_tensor,
    targets=None,
    alpha=1.0,
    reference_category_ranks=[1, 2, 3],
    target_idx=target_idx,
    H=grid_height,
    W=grid_width,
)
  • cam_map: aggregated CAM map from the wrapped backend.
  • outputs: raw model outputs from the forward pass.
  • main_categories: automatically selected main category per sample when targets=None.
  • references: automatically selected reference categories per sample when targets=None.

When targets=None, Finer-CAM first selects the main category for each sample (target_idx if provided, otherwise the highest-scoring class), then ranks all classes by the absolute difference between their logits and the main-category logit. Rank 0 is the main category itself, and the classes at the requested reference_category_ranks form the reference set.
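The selection logic can be sketched as follows. This is an illustrative NumPy version, not the library's exact code, and the function name pick_main_and_references is hypothetical:

```python
import numpy as np

def pick_main_and_references(logits, reference_ranks=(1, 2, 3), target_idx=None):
    """Sketch of the automatic selection when targets=None.

    `logits` is a 1-D array of class scores for one sample.
    """
    main = int(np.argmax(logits)) if target_idx is None else target_idx
    # Rank all classes by how close their logit is to the main-category logit;
    # rank 0 is the main category itself (distance zero).
    order = np.argsort(np.abs(logits - logits[main]))
    # Keep only the requested ranks that exist for this number of classes.
    references = [int(order[r]) for r in reference_ranks if r < len(logits)]
    return main, references

logits = np.array([2.0, 1.9, 1.7, 0.3, 1.8])
main, refs = pick_main_and_references(logits)
# main is class 0; the classes with the closest logits (1.9, 1.8, 1.7)
# become the references, in order of similarity.
```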

FinerWeightedTarget

Automatic target construction uses FinerWeightedTarget, which implements the weighted relative objective used by Finer-CAM. For a main category n and a set of reference categories indexed by i, it computes

$$\frac{\sum_i p_i \,(w_n - \alpha w_i)}{\sum_i p_i + 10^{-9}}$$

where w_n is the main-category logit, w_i are the reference-category logits, and p_i are the softmax probabilities of the reference categories. This keeps evidence for the main class while suppressing shared evidence from similar classes. If a reference category index exceeds the number of available classes, it is ignored. If no valid reference categories remain, the target falls back to the main-category score.
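As a numeric sketch, the objective above can be reproduced in NumPy (illustrative only; the library computes the same quantity on torch tensors, and finer_weighted_score is a hypothetical helper name):

```python
import numpy as np

def finer_weighted_score(logits, main, references, alpha=1.0):
    """Weighted relative objective for one sample's logits."""
    # Softmax probabilities over all classes (max-subtracted for stability).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Drop reference indices that exceed the number of available classes.
    refs = [i for i in references if i < len(logits)]
    if not refs:
        # Fall back to the main-category score when no valid references remain.
        return float(logits[main])
    p_ref = p[refs]
    w_ref = logits[refs]
    # sum_i p_i * (w_n - alpha * w_i) / (sum_i p_i + 1e-9)
    return float((p_ref * (logits[main] - alpha * w_ref)).sum()
                 / (p_ref.sum() + 1e-9))

logits = np.array([2.0, 1.9, 1.7, 0.3, 1.8])
score = finer_weighted_score(logits, main=0, references=[1, 4, 2])
```

The score is small here because the main and reference logits are close, which is exactly the regime where suppressing shared evidence matters.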

You can also provide targets manually:

from pytorch_grad_cam.utils.model_targets import FinerWeightedTarget

targets = [
    FinerWeightedTarget(
        main_category=target_idx,
        reference_categories=[similar_idx_1, similar_idx_2, similar_idx_3],
        alpha=1.0,
    )
]

cam_map, outputs, _, _ = cam(
    input_tensor=input_tensor,
    targets=targets,
)

Step 1. Generate CAMs for Validation Set

Run the script:

  • Execute generate_cams.py with the appropriate arguments:

     python generate_cams.py \
         --classifier_path <path_to_classifier_weight> \
         --dataset_path <path_to_dataset_or_image_list> \
         --save_path <path_to_save_results>
  • To obtain a classifier, please refer to [placeholder].

Step 2. Visualize Results

Run the script:

  • Execute visualize.py with the appropriate arguments:

    python visualize.py --dataset_path <path_to_dataset_directory> \
                        --cams_path <path_to_cams_directory> \
                        --save_path <path_to_save_visualizations>

Example Dataset Preparation

Stanford Cars

  1. Download the dataset using the following command:

    curl -L -o datasets/stanford_cars.zip \
    https://www.kaggle.com/api/v1/datasets/download/cyizhuo/stanford-cars-by-classes-folder
    
    
  2. Unzip the downloaded file

    unzip datasets/stanford_cars.zip -d datasets/
    
  3. The structure of datasets/ should be organized as follows:

datasets/
├── train/
│   ├── Acura Integra Type R 2001/
│   │   ├── 000405.jpg
│   │   ├── 000406.jpg
│   │   └── ...
│   ├── Acura RL Sedan 2012/
│   │   ├── 000090.jpg
│   │   ├── 000091.jpg
│   │   └── ...
│   └── ...
└── test/
    ├── Acura Integra Type R 2001/
    │   ├── 000450.jpg
    │   ├── 000451.jpg
    │   └── ...
    ├── Acura RL Sedan 2012/
    │   ├── 000122.jpg
    │   └── ...
    └── ...
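After unzipping, a quick sanity check of the layout can catch path mistakes early. This is a hypothetical helper, not part of the repo; it counts class folders and .jpg files per split:

```python
from pathlib import Path

def summarize_split(root):
    """Count class folders and .jpg images under one split directory."""
    root = Path(root)
    classes = sorted(p for p in root.iterdir() if p.is_dir())
    n_images = sum(len(list(c.glob("*.jpg"))) for c in classes)
    return len(classes), n_images

for split in ("train", "test"):
    split_dir = Path("datasets") / split
    if split_dir.exists():
        n_classes, n_images = summarize_split(split_dir)
        print(f"{split}: {n_classes} classes, {n_images} images")
```

For Stanford Cars, both splits should report 196 class folders.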

Acknowledgement

We utilized code from pytorch-grad-cam. Thanks for their wonderful work.

Citation

If you find this repository useful, please consider citing our work 📝 and giving a star 🌟 :

@InProceedings{zhang2025finer,
    author    = {Zhang, Ziheng and Gu, Jianyang and Chowdhury, Arpita and Mai, Zheda and Carlyn, David and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
    title     = {Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {9611-9620}
}
