Grad-CAM : Visual Explanations from Deep Networks via Gradient-based Localization

Abstract
Introduction
Related Work
Approach

0. Abstract

Grad-CAM ( Gradient-weighted Class Activation Mapping )을 제안

Visual Explanation for decisions
gradient of “target concept” ( ex. “강아지” )을 사용
마지막 conv layer를 통과함으로써, coarse localization map을 생성함

( 중요한 지역 highlight )
applicable to variety of CNN model-families

Guided Grad-CAM : Grad-CAM을 fine-grained visualization과 combine

1. Introduction

deep learning의 한계 :

lack of decomposability into “intuitive” & “understandable” components

\(\rightarrow\) hard to interpret!

따라서, 우리는 TRANSPARENT한 model을 만들 수 있어야 한다

( transparent 모델 = 왜 그러한 prediction을 냈는지에 대해 알고 있는 모델 )

주로 accuracy & simplicity/interpretability 사이의 trade-off가 있다.

CAM (Class Activation Mapping)

identify discriminative regions used by a restricted class
trades off model complexity&performance for transparency

Grad-CAM

위의 CAM에서 발생하는 trade-off 문제를 피한다
generalization of CAM
더 broader range of CNN model families에 적용 가능
- 1) CNN with FC layers ( ex. VGG )
- 2) CNN used for structured outputs ( ex. captioning )
- 3) CNN used in tasks with multi-modal inputs ( ex. VQA )

Good Visual Explanation이란?

(a) class-discriminative : 이미지 내에서 물체를 잘 구분할 수 있어야
(b) high-resolution : 고해상도 ( fine grained detail )

Example

요약

Guided Back-propagation & Deconvolution (pixel-space gradient visualization) :
- (a) class-discriminative : BAD
- (b) high-resolution : GOOD
CAM, Grad-CAM
- (a) class-discriminative : GOOD
- (b) high-resolution : SOSO
위 둘의 장점을 합치자!

pixel-space gradient visualization + Grad-CAM

Contribution

Grad-CAM을 제안함

( class-discriminative localization technique )
apply Grad-CAM to existing top-performing classification / captioning / VQA
visualize ResNets

Visualizing CNN

important pixel 찾아내기

( = have most impact on prediction’s score )
ex) partial derivatives of predicted class scores w.r.t pixel intensities
하지만 이러한 방법들은 NOT class-discriminative

Weakly supervised localization

풀어야 하는 task = localize objects in images, using only WHOLE image class labels
ex) CAM
- 특징) replace FC layer with CNN layers + Global pooling
- 단점) require feature maps to directly precede softmax layers
  
  ( \(\therefore\) 특정한 형태의 CNN architecture에만 적용 가능 )

Proposal : Grad-CAM

new way of combining feature maps
using the gradient signal
that does not require any modification in network architecture
(Fully Convolutional Architecture의) Grad-CAM = 그냥 CAM

3. Approach

Grad-CAM은 gradient 정보를 이용한다! ( CNN의 마지막 conv layer를 타고 흐르는 gradient )

\(\rightarrow\) to understand the importance of each neuron for decision

작동 원리

Grad-CAM uses the GRADIENT information flowing into the LAST conv layer of the CNN,

to understand the IMPORTANCE of each neuron for a decision of interest

Class-discriminative localization map : \(L_{\text {Grad-CAM }}^{c} \in \mathbb{R}^{u \times v}\)

step 1) gradient of the score for class \(c\) ( = \(y^c\) ), w.r.t feature maps \(A^{k}\)를 계산한다

( 즉, \(\frac{\partial y^{c}}{\partial A^{k}}\) 를 계산한다 )
step 2) step 1에서 구한 gradient는 global-average pooled되어서 neuron importance weight를 계산한다

( \(\alpha_{k}^{c}=\overbrace{\frac{1}{Z} \sum_{i} \sum_{j}}^{\text {global average pooling }} \underbrace{\frac{\partial y^{c}}{\partial A_{i j}^{k}}}_{\text {gradients via backprop }}\) )
- \(\alpha_{k}^{c}\) : partial linearization of DNN from \(A\)
step 3) \(L_{\text {Grad-CAM }}^{c}\) 구하기

( \(L_{\mathrm{Grad}-\mathrm{CAM}}^{c}=\operatorname{Re} L U \underbrace{\left(\sum_{k} \alpha_{k}^{c} A^{k}\right)}_{\text {linear combination }}\) )

Grad-CAM as a generalization to CAM

CAM 복습

image classification을 위한 localization map을 생성
global average pooling 후, directly softmax로 직행!
feature map : \(\operatorname{map} A^{k}\)
score 계산 ( by Global average pooling & Linear transformation )

\(S^{c}=\sum_{k} \underbrace{w_{k}^{c}}_{\text {class feature weights }} \overbrace{\frac{1}{Z} \sum_{i} \sum_{j}}^{\text {global average pooling }} \underbrace{A_{i j}^{k}}_{\text {feature map }}\).
이를 다르게 표현하면,

\(S^{c}=\frac{1}{Z} \sum_{i} \sum_{j} \underbrace{\sum_{k} w_{k}^{c} A_{i j}^{k}}_{L_{\mathrm{CAM}}^{c}}\).

when \(\alpha_k^c=w_k^c\) \(\rightarrow\) Grad-CAM is a strict generalization of CAM

Guided Grad-CAM

Grad-CAM이 class discriminative하고, relevant image를 잘 localize하지만…

lack ability to show fine-grained importance

그러기 위해, fuse Guided Back propoagation & Grad-CAM visualization via point-wise multiplication

Algorithm

Twitter Facebook LinkedIn

[interpretable] (paper 2) Grad-CAM ; Visual Explanations from Deep Networks via Gradient-based Localization

Seunghan Lee

Grad-CAM : Visual Explanations from Deep Networks via Gradient-based Localization

Contents

0. Abstract

1. Introduction

CAM (Class Activation Mapping)

Grad-CAM

Good Visual Explanation이란?

Example

Contribution

Visualizing CNN

Weakly supervised localization

Proposal : Grad-CAM

3. Approach

작동 원리

Grad-CAM as a generalization to CAM

Guided Grad-CAM

Algorithm

You May Also Enjoy

[interpretable] (paper 2) Grad-CAM ; Visual Explanations from Deep Networks via Gradient-based Localization

Seunghan Lee

Grad-CAM : Visual Explanations from Deep Networks via Gradient-based Localization

Contents

0. Abstract

1. Introduction

CAM (Class Activation Mapping)

Grad-CAM

Good Visual Explanation이란?

Example

Contribution

2. Related Work

Visualizing CNN

Weakly supervised localization

Proposal : Grad-CAM

3. Approach

작동 원리

Grad-CAM as a generalization to CAM

Guided Grad-CAM

Algorithm

You May Also Enjoy