( 참고 : 패스트 캠퍼스 , 한번에 끝내는 컴퓨터비전 초격차 패키지 )

Object Detection - RCNN

1. Visual Recognition Tasks

  1. Image Classification
  2. Object Detection : “bounding box”
  3. Semantic Segmentation : “pixel-level”

figure2


2. Naive Approach

Object Detection (OD) = (1) + (2)

  • (1) Box Localization ( via object proposals )
    • ex 1) Selective Search for Object Detection ( IJCV 2013 )
    • ex 2) Edge Boxes : Locating Object Proposals from Edges ( ECCV 2014 )
  • (2) Box Classification ( via CNN )


3. RCNN (2014)

( Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014 )

RCNN = Region-based CNN

figure2


(1) Summary

  • 2 stage detector

    • stage # 1 : region proposal

    • stage # 2 : classification

  • a) Region proposal :

    • 2,000 proposed regions per image

    • crop 2,000 regions into same size

      ( for a fixed-length vector, for regression & classification )

    • put 2,000 regions to CNN ( \(\rightarrow\) very slow )

  • b) Regression & Classification :

    • with 2,000 fixed-length vectors, do regression & classification
    • Classification
      • binary SVM x (num_classes+1) … for background
    • Regression
      • IoU >= 0.3 : positive
      • IoU < 0.3 : negative


(2) Regression

Regression = Learning a transformation of bounding box

figure2


(3) Loss Function

\(\begin{aligned} t_{x} &=\left(g_{x}-p_{x}\right) / p_{w} \\ t_{y} &=\left(g_{y}-p_{y}\right) / p_{h} \\ t_{w} &=\log \left(g_{w} / p_{w}\right) \\ t_{h} &=\log \left(g_{h} / p_{h}\right) \end{aligned}\).

\(\mathcal{L}_{\mathrm{reg}}=\sum_{i \in\{x, y, w, h\}}\left(t_{i}-d_{i}(\mathbf{p})\right)^{2}+\lambda \mid \mid \mathbf{w} \mid \mid ^{2}\).


4. Fast RCNN

( Girshick, Ross. “Fast R-CNN: Fast Region-based Convolutional Networks for object detection.” Internation Conference on Computer Vision,(ICCV). 2015. )

figure2

figure2


Solve the problem of slow speed of RCNN

RCNN vs Fast RCNN

  • common ) Region proposal using selective search

  • difference # 1 ) SPEED

    • RCNN :

      • step 1) 2,000 region proposals
      • step 2) crop 2,000 region proposals
      • step 3) CNN 2,000 times ( with proposed regions )
    • Fast-RCNN

      • step 1-1) 2,000 region proposals

        step 1-2) CNN 1 time ( with original image )

      • step 2) crop regions from feature map ( RoI projection )

  • difference # 2 ) RoI Pooling

    • Why RoI Pooling??

      • no need for fixed length vector for CNN Input!

        ( just need fixed length for FC input )

      • Thus, RoI Pooling after RoI Projection!

  • difference # 3 ) Classification

    • instead of SVM, use FC layer


(1) Roi Pooling

figure2


(2) Loss Function

\(\begin{aligned} \mathcal{L}\left(p, u, t^{u}, v\right) &=\mathcal{L}_{\mathrm{cls}}(p, u)+\mathbb{1}[u \geq 1] \mathcal{L}_{\mathrm{box}}\left(t^{u}, v\right) \\ \mathcal{L}_{\mathrm{cls}}(p, u) &=-\log p_{u} \\ \mathcal{L}_{\mathrm{box}}\left(t^{u}, v\right) &=\sum_{i \in\{x, y, w, h\}} L_{1}^{\text {smooth }}\left(t_{i}^{u}-v_{i}\right) \end{aligned}\).

  • where \(L_{1}^{\mathrm{smooth}}(x)= \begin{cases}0.5 x^{2} & \text { if } \mid x \mid <1 \\ \mid x \mid -0.5 & \text { otherwise }\end{cases}\).


5. Faster RCNN

figure2


Solve the problem of selective search of RCNN/Fast RCNN

  • selective search : on CPU -> bottleneck!


Faster RCNN : end-to-end

  • region proposal inside CNN, using GPU

  • Process

    • step 1) CNN with original image

    • step 2) input feature map to RPN (Region Proposal Network)

      ( use these output region proposals, instead of selective search )

    • below are same


Two Additional Loss ( of RPN )

  • (Loss 1) Classification Loss
  • (Loss 2) Box regression Loss


(1) RPN (Region Proposal Network)

figure2

  • Input : 7x7xC Feature Map

  • Kernel Size : 3 x 3 filter

  • Number of anchor boxes : K (=9)

  • Output : 7x7x6K Feature Map
    • 6k = 2k + 4k
      • 2K : whether object is foreground/background
      • 4K : (X,Y,W,H)
  • $p^{*}=f(x)=\left{\begin{aligned} -1, & \text { if IoU }<0.3
    1, & \text { if IoU }>0.7
    0, & \text { otherwise } \end{aligned}\right.$.


(2) Loss Function

\(\begin{aligned} \mathcal{L} &=\mathcal{L}_{\mathrm{cls}}+\mathcal{L}_{\mathrm{box}} \\ \mathcal{L}\left(\left\{p_{i}\right\},\left\{t_{i}\right\}\right) &=\frac{1}{N_{\mathrm{cls}}} \sum_{i} \mathcal{L}_{\mathrm{cls}}\left(p_{i}, p_{i}^{*}\right)+\frac{\lambda}{N_{\mathrm{box}}} \sum_{i} p_{i}^{*} \cdot L_{1}^{\text {smooth }}\left(t_{i}-t_{i}^{*}\right) \end{aligned}\).

  • where \(\mathcal{L}_{\mathrm{cls}}\left(p_{i}, p_{i}^{*}\right)=-p_{i}^{*} \log p_{i}-\left(1-p_{i}^{*}\right) \log \left(1-p_{i}\right)\).

Categories:

Updated: