( Reference : Fast Campus, 한번에 끝내는 컴퓨터비전 초격차 패키지 (All-in-One Computer Vision Package) )
Image Clustering
(1) Mahalanobis Distance
Euclidean vs Mahalanobis Distance
Example )
Euclidean : \(D_{E}\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2}\right) < D_{E}\left(\boldsymbol{x}_{3}, \boldsymbol{x}_{4}\right)\)
- \(D_{E}\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)=\sqrt{\left(\boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right)^{\top}\left(\boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right)}\)
Mahalanobis : \(D_{M}\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2}\right)>D_{M}\left(\boldsymbol{x}_{3}, \boldsymbol{x}_{4}\right)\)
- \(D_{M}\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)=\sqrt{\left(\boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right)^{\top} M\left(\boldsymbol{x}_{i}-\boldsymbol{x}_{j}\right)}\), where \(M\) is the inverse covariance matrix \(\Sigma^{-1}\) ( or, in metric learning, a learned positive semi-definite matrix )
- i.e., the Mahalanobis distance rescales each direction by how much the data actually varies along it, so the inequality can flip
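A minimal NumPy sketch of how this flip can happen ( the toy data and point coordinates are made up for illustration; \(M\) is taken as the inverse covariance of the data ) :

```python
import numpy as np

# toy 2-D data : large variance along the x-axis, small along the y-axis
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

M = np.linalg.inv(np.cov(X.T))  # M = inverse covariance matrix (Sigma^-1)

def euclidean(xi, xj):
    d = xi - xj
    return np.sqrt(d @ d)

def mahalanobis(xi, xj, M):
    d = xi - xj
    return np.sqrt(d @ M @ d)

x1, x2 = np.array([0.0, 0.0]), np.array([0.0, 1.0])  # along the LOW-variance axis
x3, x4 = np.array([0.0, 0.0]), np.array([2.0, 0.0])  # along the HIGH-variance axis

print(euclidean(x1, x2) < euclidean(x3, x4))              # True : 1.0 < 2.0
print(mahalanobis(x1, x2, M) > mahalanobis(x3, x4, M))    # True : the rarer direction looks farther
```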
(2) K-means Clustering
- clustering based on centroid
- framework : EM algorithm
- E step : assign each data point to its nearest centroid
- M step : update each centroid to the mean of its assigned points
\(\begin{gathered} X=C_{1} \cup C_{2} \cup \ldots \cup C_{K}, \quad C_{i} \cap C_{j}=\emptyset \;\; (i \neq j) \\ \operatorname{argmin}_{C} \sum_{i=1}^{K} \sum_{\boldsymbol{x}_{j} \in C_{i}} \lVert \boldsymbol{x}_{j}-\boldsymbol{c}_{i} \rVert^{2} \end{gathered}\)
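A minimal NumPy sketch of the two alternating steps ( the function and init scheme are illustrative; in practice, `sklearn.cluster.KMeans` also handles edge cases like empty clusters ) :

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # random init
    for _ in range(n_iter):
        # E step : assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # M step : move each centroid to the mean of its assigned points
        # ( empty-cluster handling omitted for brevity )
        centroids = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
    return labels, centroids
```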
(3) Unsupervised Metric Learning
Why not find an embedding space via metric learning, without labeled data?
( + the supervised metric learning above may cause overfitting! )
Solution
- step 1) pre-train with UN-labeled images
- contrastive learning, using hard positives & hard negatives on the manifold
- step 2) fine-tune with labeled images
Example)
- contrastive loss :
- \(l_{c}\left(\mathbf{z}^{r}, \mathbf{z}^{+}, \mathbf{z}^{-}\right)=\lVert \mathbf{z}^{r}-\mathbf{z}^{+} \rVert^{2}+\left[m-\lVert \mathbf{z}^{r}-\mathbf{z}^{-} \rVert^{2}\right]_{+}\)
- triplet loss :
- \(l_{t}\left(\mathbf{z}^{r}, \mathbf{z}^{+}, \mathbf{z}^{-}\right)=\left[m+\lVert \mathbf{z}^{r}-\mathbf{z}^{+} \rVert^{2}-\lVert \mathbf{z}^{r}-\mathbf{z}^{-} \rVert^{2}\right]_{+}\)
- ( \(\mathbf{z}^{r}\) : reference, \(\mathbf{z}^{+}\) : positive, \(\mathbf{z}^{-}\) : negative embedding, \(m\) : margin, \([\cdot]_{+}=\max(0,\cdot)\) )
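A minimal PyTorch sketch of both losses on batched embeddings ( the margin `m=1.0` is an arbitrary default ) :

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_r, z_pos, z_neg, m=1.0):
    # pull positives together, push negatives beyond margin m
    d_pos = (z_r - z_pos).pow(2).sum(dim=-1)   # ||z^r - z^+||^2
    d_neg = (z_r - z_neg).pow(2).sum(dim=-1)   # ||z^r - z^-||^2
    return (d_pos + F.relu(m - d_neg)).mean()

def triplet_loss(z_r, z_pos, z_neg, m=1.0):
    # the positive pair must be closer than the negative pair by margin m
    d_pos = (z_r - z_pos).pow(2).sum(dim=-1)
    d_neg = (z_r - z_neg).pow(2).sum(dim=-1)
    return F.relu(m + d_pos - d_neg).mean()
```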
( figure : example pairs with ground truth (O) vs. without ground truth (X) )
Conclusion
- without discrete category labels, use fine-grained similarity as the supervisory signal
- saves annotation cost!
- cons :
- still needs some pre-trained model!
Self-Taught Metric Learning without Labels ( Kim et al., CVPR 2022 )
uses pseudo-labels!
\(\rightarrow\) solution : self-taught networks ( via self-knowledge distillation )
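A rough sketch of the self-knowledge-distillation idea only, NOT the exact STML objective ( the EMA update, the MSE matching loss, and all names here are assumptions for illustration ) : a teacher network, kept as an exponential moving average of the student, produces pairwise similarities that serve as pseudo-labels for the student.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # teacher weights = exponential moving average of student weights
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def self_distill_step(student, teacher, x):
    z_s = F.normalize(student(x), dim=-1)
    with torch.no_grad():
        z_t = F.normalize(teacher(x), dim=-1)   # no gradient through the teacher
    # the teacher's pairwise similarities act as soft pseudo-labels
    return F.mse_loss(z_s @ z_s.T, z_t @ z_t.T)
```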
(4) t-SNE
t-SNE
= t-distributed Stochastic Neighbor Embedding
( dimensionality reduction for visualizing high-dimensional data )
( for more details, refer to https://seunghan96.github.io/ml/stat/t-SNE/ )
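Quick usage sketch with scikit-learn ( the dataset & hyperparameters are illustrative choices ) :

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)   # 1797 samples, 64-dim features

# embed into 2-D ; perplexity is a tunable neighborhood-size parameter
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```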