SCAN : Learning to Classify Images without Labels


Contents

  0. Abstract
  1. Introduction
  2. Method
    1. Representation learning for semantic clustering
    2. A semantic clustering loss
    3. Fine-tuning through self-labeling


0. Abstract

Unsupervised Image Classification

  • automatically group images into semantically meaningful clusters when ground-truth (GT) labels are absent

  • previous works

    • (1) end-to-end
    • (2) two-step approach ( this paper )
      • feature learning & clustering


1. Introduction

Representation Learning

  • use self-supervised learning to generate feature representations

    ( no need for labels )

  • use pre-designed tasks, called pretext tasks

  • (1) two-stage approach

    • representation learning : mainly used as the first pretraining stage
    • ( second stage = fine-tuning on another task )


(2) end-to-end learning

  • combine feature learning & clustering


Proposed work, SCAN

( SCAN = Semantic Clustering by Adopting Nearest neighbors )

  • two-step approach
  • leverages the advantages of both
    • (1) representation learning
    • (2) end-to-end learning


Procedures of SCAN

  • step 1) learn feature representation via pretext task

    • (naive, representation-learning approach) apply K-means to the learned features

      \(\rightarrow\) may suffer from the cluster degeneracy problem

    • (proposed) mine the nearest neighbors of each image, based on feature similarity

  • step 2) integrate the semantically meaningful nearest neighbors as a prior into a learnable clustering approach


2. Method

(1) Representation learning for semantic clustering

Notation

  • image dataset : \(\mathcal{D}=\left\{X_1, \ldots, X_{ \mid \mathcal{D} \mid }\right\}\)

  • class labels : \(\mathcal{C}\)

    \(\rightarrow\) however, we do not have access to these ground-truth labels !


Representation learning

  • pretext task : \(\tau\)
  • embedding function : \(\Phi_{\theta}\)
  • image & augmented image : \(X_i\) & \(T[X_i]\)
  • minimize the distance between their embeddings :
    • \(\min _\theta d\left(\Phi_\theta\left(X_i\right), \Phi_\theta\left(T\left[X_i\right]\right)\right)\)
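
A minimal PyTorch sketch of this objective is given below. The choice of cosine distance for \(d\) and the function name `pretext_loss` are assumptions made for illustration; SCAN itself relies on instance-discrimination pretext tasks such as SimCLR or MoCo.

```python
import torch.nn.functional as F

def pretext_loss(phi, x, x_aug):
    """Pretext objective sketch: min_theta d(Phi_theta(X), Phi_theta(T[X])).

    phi   : embedding network Phi_theta (e.g. a ResNet backbone) -- assumed
    x     : batch of images X_i
    x_aug : batch of augmented views T[X_i]
    d is taken to be cosine distance here (an assumption for this sketch)."""
    z1 = F.normalize(phi(x), dim=1)      # embeddings of the original images
    z2 = F.normalize(phi(x_aug), dim=1)  # embeddings of the augmented views
    return (1.0 - (z1 * z2).sum(dim=1)).mean()  # mean cosine distance over the batch
```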


Conclusion : pretext tasks from representation learning can be used to obtain semantically meaningful features


(2) A semantic clustering loss

a) Mining nearest neighbors

naively applying K-means to the obtained features \(\rightarrow\) leads to cluster degeneracy


[ Setting ]

  • Using the pretext task & nearest neighbors (NN) :

    for every sample \(X_i \in \mathcal{D}\), mine its \(K\) nearest neighbors \(\mathcal{N}_{X_i}\) in the embedding space

[ Figure 2 ]
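
A minimal sketch of the mining step, assuming the pretext-task embeddings have already been extracted for the whole dataset and that cosine similarity defines the neighbors; the function name and the default \(K = 20\) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mine_nearest_neighbors(features, k=20):
    """Mine the K nearest neighbors N_{X_i} of every sample in embedding space.

    features : (N, d) tensor of pretext-task embeddings for the whole dataset D.
    Returns an (N, k) tensor of neighbor indices."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()            # (N, N) pairwise cosine similarities
    _, idx = sim.topk(k + 1, dim=1)    # k+1 because each sample is its own top match
    return idx[:, 1:]                  # drop the sample itself
```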


Loss Function

Goal : learn a clustering function \(\Phi_\eta\)

  • classifies a sample \(X_i\) & its mined neighbors \(\mathcal{N}_{X_i}\) together
  • soft assignment over clusters \(\mathcal{C}=\{1, \ldots, C\}\), with \(\Phi_\eta\left(X_i\right) \in [0,1]^C\)
    • probability of \(X_i\) assigned to \(c\) : \(\Phi_\eta^c\left(X_i\right)\)
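
A minimal sketch of such a clustering function: a backbone (e.g. the pretext-trained network) followed by a linear cluster head and a softmax, so the output lies in \([0,1]^C\). The module layout and names are assumptions, not the authors' exact code.

```python
import torch.nn as nn

class ClusteringModel(nn.Module):
    """Sketch of the clustering function Phi_eta: backbone + linear head + softmax,
    producing a soft assignment over the C clusters."""
    def __init__(self, backbone, feature_dim, n_clusters):
        super().__init__()
        self.backbone = backbone                        # e.g. the pretext-trained Phi_theta
        self.head = nn.Linear(feature_dim, n_clusters)  # cluster head (parameters eta)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        return self.softmax(self.head(self.backbone(x)))  # Phi_eta(x) in [0,1]^C
```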


Loss Function : \(\Lambda=-\frac{1}{ \mid \mathcal{D} \mid } \sum_{X \in \mathcal{D}} \sum_{k \in \mathcal{N}_X} \log \left\langle\Phi_\eta(X), \Phi_\eta(k)\right\rangle+\lambda \sum_{c \in \mathcal{C}} \Phi_\eta^{\prime c} \log \Phi_\eta^{\prime c}\)

  • with \(\Phi_\eta^{\prime c}=\frac{1}{ \mid \mathcal{D} \mid } \sum_{X \in \mathcal{D}} \Phi_\eta^c(X) .\)

  • (1st term) consistency : forces \(X\) & its neighbors \(\mathcal{N}_X\) to receive the same, confident cluster assignment

  • (2nd term) entropy : spreads the predictions uniformly across all clusters, avoiding a degenerate single-cluster solution

    ( = can be replaced by KL-divergence when the cluster prior is known & non-uniform )
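
A minimal PyTorch sketch of this loss, assuming each anchor in the batch is paired with one mined neighbor and that both have already been passed through \(\Phi_\eta\); the value of the weight \(\lambda\) (`entropy_weight`) is an assumption.

```python
import torch

def scan_loss(anchor_probs, neighbor_probs, entropy_weight=5.0):
    """Sketch of the semantic clustering loss above.

    anchor_probs   : (B, C) soft assignments Phi_eta(X) of the anchors
    neighbor_probs : (B, C) soft assignments Phi_eta(k) of one mined neighbor per anchor
    entropy_weight plays the role of lambda."""
    # 1st term: -log <Phi_eta(X), Phi_eta(k)>, pulls neighbors into the same cluster
    dot = (anchor_probs * neighbor_probs).sum(dim=1)
    consistency = -torch.log(dot.clamp(min=1e-8)).mean()

    # 2nd term: lambda * sum_c Phi'_c log Phi'_c, with Phi'_c the mean soft assignment;
    # minimizing it maximizes the entropy, i.e. spreads predictions over all clusters
    mean_probs = anchor_probs.mean(dim=0)
    entropy_term = (mean_probs * torch.log(mean_probs.clamp(min=1e-8))).sum()

    return consistency + entropy_weight * entropy_term
```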


(3) Fine-tuning through self-labeling

  • each sample is combined with \(K \geq 1\) neighbors … but some of them may be false positives (FP)

  • experimentally observed that samples with highly confident predictions ( \(p_{max}\approx1\) ) tend to be assigned to the proper cluster

    \(\rightarrow\) regard them as prototypes for each class & fine-tune on them with a cross-entropy loss, using the predicted cluster as a pseudo-label ( see the sketch below )
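
A minimal sketch of one self-labeling update, assuming a confidence threshold and a weakly/strongly augmented pair per image; the threshold value and this helper's signature are assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def self_labeling_loss(logits_weak, logits_strong, threshold=0.99):
    """Sketch of one self-labeling step.

    logits_weak   : (B, C) cluster logits of weakly augmented images
    logits_strong : (B, C) cluster logits of strongly augmented versions of the same images
    Confident predictions (p_max above `threshold`) become pseudo-labels and
    serve as cross-entropy targets for the strongly augmented views."""
    with torch.no_grad():
        probs = F.softmax(logits_weak, dim=1)
        p_max, pseudo_label = probs.max(dim=1)
        mask = p_max > threshold                 # keep only the confident "prototype" samples

    if mask.sum() == 0:
        return logits_strong.new_zeros(())       # no confident samples in this batch

    return F.cross_entropy(logits_strong[mask], pseudo_label[mask])
```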



