( 참고 : 패스트 캠퍼스 , 한번에 끝내는 컴퓨터비전 초격차 패키지 )

Object Detection - YOLO

( You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., CVPR 2016 )


1. One vs Two-stage Detector

figure2

2. YOLO v1

YOLO = You Only Look Once


(1) Overall Architecture

figure2

  • input : single image

  • output :

    • (1) bounding boxes
    • (2) bounding boxes’ classes

    & use confidence threshold & NMS to filter boxes


(2) Model

figure2

figure2

Feature Map Size :

  • H x W x (Bx5 + C)
    • H : Height
    • W : Width
    • (Bx5 + C)
      • B : number of bounding boxes
      • 5 : confidence score + 4 coordinates
      • C : number of classes


(3) Loss Function

  • (1) Classification Loss
    • \(\sum_{i=0}^{S^{2}} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \text { classes }}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\).
  • (2) Localization Loss
    • \(\begin{aligned} &\lambda_{\text {coord }} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left[\left(x_{i}-\hat{x}_{i}\right)^{2}+\left(y_{i}-\hat{y}_{i}\right)^{2}\right] \\ &\quad+\lambda_{\text {coord }} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right] \end{aligned}\).
  • (3) Confidence Loss
    • \(\sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left(C_{i}-\hat{C}_{i}\right)^{2}\).


(4) NMS (Non-Maximum Suppression)

  • sort by confidence score
  • Merge to box with larger score


(5) Results

figure2

Categories:

Updated: