( Reference : Fast Campus, 한번에 끝내는 컴퓨터비전 초격차 패키지 )
Object Detection - YOLO
( You Only Look Once: Unified, Real-Time Object Detection, Redmon et al., CVPR 2016 )
1. One-stage vs Two-stage Detector
2. YOLO v1
YOLO = You Only Look Once
(1) Overall Architecture
- input : single image
- output :
- (1) bounding boxes
- (2) the class of each bounding box
- post-processing : apply a confidence threshold & NMS to filter the boxes
(2) Model
Feature Map Size :
- H x W x (Bx5 + C)
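- H : height of the grid (number of rows; H = W = S in the loss function)
- W : width of the grid (number of columns)
- (Bx5 + C)
- B : number of bounding boxes predicted per grid cell
- 5 : confidence score + 4 coordinates (x, y, w, h)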
- C : number of classes
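As a quick check of the size formula, the sketch below computes the depth per cell and splits one cell's vector into its B box predictions and C class scores. The values 7 x 7, B = 2, C = 20 are the ones the YOLO v1 paper uses for PASCAL VOC. The boxes-first memory layout, the (x, y, w, h, confidence) ordering inside each box, and the helper names `output_shape` and `split_cell` are assumptions made for illustration only.

```python
def output_shape(h, w, num_boxes, num_classes):
    """Return the (H, W, B*5 + C) shape of the output feature map."""
    return (h, w, num_boxes * 5 + num_classes)


def split_cell(cell, num_boxes, num_classes):
    """Split one cell vector into box predictions and class scores.

    Assumed layout (illustrative): [box_1 (5), ..., box_B (5), classes (C)],
    where each box is (x, y, w, h, confidence).
    """
    expected = num_boxes * 5 + num_classes
    if len(cell) != expected:
        raise ValueError(f"expected {expected} values, got {len(cell)}")
    boxes = [tuple(cell[b * 5:(b + 1) * 5]) for b in range(num_boxes)]
    class_scores = list(cell[num_boxes * 5:])
    return boxes, class_scores


shape = output_shape(7, 7, num_boxes=2, num_classes=20)
print(shape)  # (7, 7, 30)
```

With these settings each of the 49 cells carries 30 numbers: 2 x 5 for the boxes and 20 for the classes.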
(3) Loss Function
- (1) Classification Loss
- \(\sum_{i=0}^{S^{2}} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c \in \text { classes }}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}\).
- (2) Localization Loss
- \(\begin{aligned} &\lambda_{\text {coord }} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left[\left(x_{i}-\hat{x}_{i}\right)^{2}+\left(y_{i}-\hat{y}_{i}\right)^{2}\right] \\ &\quad+\lambda_{\text {coord }} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right] \end{aligned}\).
- (3) Confidence Loss
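- \(\sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {obj }}\left(C_{i}-\hat{C}_{i}\right)^{2}+\lambda_{\text {noobj }} \sum_{i=0}^{S^{2}} \sum_{j=0}^{B} \mathbb{1}_{i j}^{\text {noobj }}\left(C_{i}-\hat{C}_{i}\right)^{2}\).
- the second term penalizes confidence for boxes that contain no object, down-weighted by \(\lambda_{\text {noobj }}\)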
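The three loss terms can be sketched in plain Python for a single image. This is a simplified illustration, not the reference implementation: the data layout and the name `yolo_v1_loss` are hypothetical, the index of the responsible box is assumed to be given (the paper picks the predictor with the highest IoU against the ground truth), and the target confidence is assumed to be 1 for the responsible box and 0 otherwise. The sketch also includes the paper's no-object confidence term, using the paper's weights \(\lambda_{\text{coord}} = 5\) and \(\lambda_{\text{noobj}} = 0.5\).

```python
import math

LAMBDA_COORD = 5.0  # localization weight from the YOLO v1 paper
LAMBDA_NOOBJ = 0.5  # no-object confidence weight from the YOLO v1 paper


def yolo_v1_loss(preds, targets):
    """Sum-squared loss over the grid cells of one image.

    preds:   list of cells, each {"boxes": [(x, y, w, h, conf), ...], "probs": [...]}
    targets: list of cells, each None (no object) or
             {"box": (x, y, w, h), "probs": [...], "responsible": j}
    Assumes predicted w and h are non-negative so the square root is defined.
    """
    cls_loss = loc_loss = conf_loss = 0.0
    for pred, target in zip(preds, targets):
        for j, (x, y, w, h, conf) in enumerate(pred["boxes"]):
            if target is not None and j == target["responsible"]:
                tx, ty, tw, th = target["box"]
                loc_loss += (tx - x) ** 2 + (ty - y) ** 2
                loc_loss += (math.sqrt(tw) - math.sqrt(w)) ** 2
                loc_loss += (math.sqrt(th) - math.sqrt(h)) ** 2
                conf_loss += (1.0 - conf) ** 2  # assumed target confidence 1
            else:
                conf_loss += LAMBDA_NOOBJ * (0.0 - conf) ** 2
        if target is not None:
            cls_loss += sum(
                (t - p) ** 2 for t, p in zip(target["probs"], pred["probs"])
            )
    loc_loss *= LAMBDA_COORD
    return {
        "classification": cls_loss,
        "localization": loc_loss,
        "confidence": conf_loss,
        "total": cls_loss + loc_loss + conf_loss,
    }


# Two cells, B = 2, C = 2: the first cell contains an object, the second does not.
preds = [
    {"boxes": [(0.5, 0.5, 0.25, 0.25, 0.5), (0.0, 0.0, 1.0, 1.0, 0.5)],
     "probs": [0.5, 0.5]},
    {"boxes": [(0.0, 0.0, 1.0, 1.0, 0.0), (0.0, 0.0, 1.0, 1.0, 1.0)],
     "probs": [0.0, 0.0]},
]
targets = [
    {"box": (0.5, 0.5, 0.25, 0.25), "probs": [1.0, 0.0], "responsible": 0},
    None,
]
loss = yolo_v1_loss(preds, targets)
```

The square roots on w and h make the same absolute error count more for small boxes than for large ones, which is the motivation the paper gives for that term.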
(4) NMS (Non-Maximum Suppression)
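- sort the boxes by confidence score (highest first)
- among boxes that overlap heavily (IoU above a threshold), keep the box with the larger score and discard the others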
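The two steps above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: boxes are given as corner coordinates (x1, y1, x2, y2), both thresholds default to 0.5, all boxes belong to one class (in practice NMS is usually run per class), and the function names `iou` and `nms` are chosen here for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_threshold=0.5, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    # Drop low-confidence boxes, then sort the rest by score (descending).
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is 81 / 119 (about 0.68), and box 3 is removed earlier by the confidence threshold.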