Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification


Contents

  0. Abstract
  1. Introduction
  2. Model
    1. Stage 1 : Unsupervised Deep Embedding
    2. Stage 2 : Unsupervised Class Assignment with Refining Pretraining Embeddings


0. Abstract

Unsupervised Image Classification

  • latest approaches : end-to-end
    • unify the losses from (1) embedding & (2) class assignment
    • the two have different goals … thus jointly optimizing them may lead to suboptimal solutions


Solution : propose a novel two-stage algorithm

  • (1) embedding module for pretraining
  • (2) refining module for embedding & class assignment


1. Introduction

Unsupervised Image Classification

  • determine the membership of each data point as one of the predefined class labels
  • 2 methods are used
    • (1) sequential method
    • (2) joint method


This paper : two-stage approach

  • stage 1) embedding learning
    • gather similar data points
  • stage 2) refine embedding & assign class
    • minimize 2 kinds of loss
      • (1) class assignment loss
      • (2) embedding loss




2. Model

Notation

  • # of underlying classes : n_c
  • set of n images : I=\{x_1, x_2, \ldots, x_n\}


[Figure 2]


(1) Stage 1 : Unsupervised Deep Embedding

  • [GOAL] extract visually essential features

  • adopt Super-AND to initialize encoder


Super-AND

employs…

  • (1) data augmentation
  • (2) entropy-based loss


total of 3 losses

  • (1) AND-loss ( L_{and} )
  • (2) UE-loss ( L_{ue} ) ….. unification entropy loss
  • (3) AUG-loss ( L_{aug} ) ….. augmentation loss


Details

  • considers every data occurrence as an individual class

  • groups the data points into small clusters

    ( by discovering the nearest neighbors )


a) AND-loss

  • considers each discovered neighborhood & each remaining data point as a single class to separate

  • L_{and}=-\sum_{i \in N} \log \Big(\sum_{j \in \tilde{N}(x_i) \cup \{i\}} p_i^j\Big)-\sum_{i \in N^c} \log p_i^i.

    • N : selected part of the neighborhood pair sets
    • N^c : complement of N
    • \tilde{N}(x_i) : neighbors of the i-th image
    • p_i^j : probability of the i-th image being identified as the j-th class
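
A minimal PyTorch sketch of the AND-loss under the definitions above; `p` (the instance-probability matrix), `neighbors`, and `selected` are illustrative inputs, not the authors' code.

```python
import torch

def and_loss(p, neighbors, selected):
    """AND-loss sketch.
    p[i, j]      : probability of image i being identified as instance-class j
    neighbors[i] : indices of the discovered neighborhood of image i
    selected     : boolean mask for the part N of the neighborhood pairs
                   selected in the current round (the rest is N^c)."""
    loss = p.new_zeros(())
    for i in range(p.size(0)):
        if selected[i]:
            idx = torch.tensor(neighbors[i] + [i])    # \tilde{N}(x_i) ∪ {i}
            loss = loss - torch.log(p[i, idx].sum())  # pull the neighborhood together
        else:
            loss = loss - torch.log(p[i, i])          # instance-level term over N^c
    return loss
```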


b) UE-loss

  • intensifies the concentration effect
  • minimizing the UE-loss makes nearby data occurrences attract each other
  • L_{ue}=-\sum_i \sum_{j \neq i} \tilde{p}_i^j \log \tilde{p}_i^j.
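
A minimal PyTorch sketch of the UE-loss; the softmax form of \tilde{p}_i^j (diagonal masked out, temperature `tau`) is an assumption consistent with the j ≠ i sum above.

```python
import torch

def ue_loss(sim, tau=0.1):
    """UE-loss sketch. sim[i, j] holds the similarity v_j^T v_i between
    embeddings; tau is an illustrative temperature."""
    n = sim.size(0)
    logits = (sim / tau).masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
    p_tilde = logits.softmax(dim=1)          # \tilde{p}_i^j over j != i
    # entropy of the soft neighbor assignment; minimizing it sharpens
    # each point's affinity toward its nearest data occurrences
    return -(p_tilde * p_tilde.clamp_min(1e-12).log()).sum()
```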


Jointly optimizing a) & b)

enforces the neighborhoods to be separated from each other, while keeping similar neighbors close.


c) AUG-loss

  • defined to learn invariant image features

  • Regards augmented images as positive pairs

    Reduces the discrepancy between the original & augmented images

  • L_{aug}=-\sum_i \sum_{j \neq i} \log \big(1-\bar{p}_i^j\big)-\sum_i \log \bar{p}_i^i.
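
A minimal PyTorch sketch of the AUG-loss; the softmax form of \bar{p}_i^j (augmented embedding scored against all originals, temperature `tau`) is an assumption matching the other instance probabilities.

```python
import torch

def aug_loss(v, v_aug, tau=0.1):
    """AUG-loss sketch. v[i] is the (L2-normalized) embedding of image i,
    v_aug[i] that of its augmented view."""
    p_bar = (v_aug @ v.t() / tau).softmax(dim=1)  # row i: augmented i vs. all originals
    eye = torch.eye(v.size(0), dtype=torch.bool)
    off, diag = p_bar[~eye], p_bar[eye]
    # push the augmented view away from every other instance (j != i)
    # and pull it toward its own original (j == i)
    return -torch.log((1 - off).clamp_min(1e-12)).sum() - torch.log(diag.clamp_min(1e-12)).sum()
```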


Total Loss :

L_{stage1}=L_{and}+w(t) \times L_{ue}+L_{aug}.

  • w(t) : initialized at 0 and increased gradually
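
A sketch of the combined stage-1 objective; the linear ramp for w(t) and the `ramp_epochs` horizon are illustrative assumptions, not the paper's exact schedule.

```python
def stage1_loss(l_and, l_ue, l_aug, epoch, ramp_epochs=80):
    """Total stage-1 loss. w(t) starts at 0 and grows gradually;
    a linear ramp to 1 is one simple choice of schedule."""
    w = min(epoch / ramp_epochs, 1.0)
    return l_and + w * l_ue + l_aug
```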


(2) Stage 2 : Unsupervised Class Assignment with Refining Pretraining Embeddings

ideal class assignment : requires …

  • (1) not only ideal embedding
  • (2) but also dense grouping


use 2 kinds of loss in Stage 2

  • (1) class assignment loss
  • (2) consistency preserving loss


Mutual Information-based Class Assignment

Mutual Information (MI) :

I(x, y)=D_{KL}(p(x, y) \| p(x) p(y))=\sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}=H(x)-H(x \mid y).
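
A quick numerical check of this definition (a hypothetical helper, not part of the paper):

```python
import torch

def mutual_information(p_xy):
    """I(x, y) from a joint distribution table p_xy
    (rows: x, cols: y, entries sum to 1)."""
    p_x = p_xy.sum(dim=1, keepdim=True)              # marginal p(x)
    p_y = p_xy.sum(dim=0, keepdim=True)              # marginal p(y)
    ratio = (p_xy / (p_x * p_y)).clamp_min(1e-12)
    return (p_xy * ratio.log()).sum()

# sanity check: independent variables carry zero mutual information
print(mutual_information(torch.full((2, 2), 0.25)))  # tensor(0.)
```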


IIC (Invariant Information Clustering)

  • maximize MI between samples & augmented samples

  • trains the classifier with features that are invariant to data augmentation (DA)

  • procedure

    • [input] image set x & augmented image set g(x)

    • mapping : f_\theta

      • classifies images & generates a probability vector

        ( y=f_\theta(x), \quad \hat{y}=f_\theta(g(x)) )

    • find the optimal f_\theta that maximizes the MI below

      • \max_\theta I(f_\theta(x), f_\theta(g(x)))
  • by maximizing MI, clustering degeneracy can be prevented


Decomposing the MI : I(y, \hat{y})=H(y)-H(y \mid \hat{y})

  • (1) maximize H(y)
    • maximized when every data point is EVENLY assigned across the clusters
  • (2) minimize H(y \mid \hat{y})
    • minimized when the cluster assignments of x and g(x) are consistent


Loss Function :

  • joint pdf of y and \hat{y} : matrix \mathbf{P}

    ( \mathbf{P}=\frac{1}{n} \sum_{i \in \mathcal{B}} f_\theta\left(x_i\right) \cdot f_\theta\left(g\left(x_i\right)\right)^T )

  • L_{assign}=-\sum_c \sum_{c^{\prime}} \mathbf{P}_{c c^{\prime}} \cdot \log \frac{\mathbf{P}_{c c^{\prime}}}{\mathbf{P}_{c^{\prime}} \cdot \mathbf{P}_c}.
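
A minimal PyTorch sketch of L_{assign}; the symmetrization of \mathbf{P} follows IIC's practice and is an added detail not shown in the formula above.

```python
import torch

def assign_loss(y, y_hat):
    """Mutual-information class-assignment loss sketch. y and y_hat are
    the (n, n_c) softmax outputs f_theta(x) and f_theta(g(x)) for a batch;
    P approximates the joint pdf of the two cluster assignments."""
    n = y.size(0)
    P = y.t() @ y_hat / n                      # (n_c, n_c) joint pdf estimate
    P = ((P + P.t()) / 2).clamp_min(1e-12)     # symmetrization, as done in IIC
    Pc = P.sum(dim=1, keepdim=True)            # marginal P_c
    Pc_ = P.sum(dim=0, keepdim=True)           # marginal P_{c'}
    return -(P * (P / (Pc * Pc_)).log()).sum()
```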


Consistency Preserving on Embedding

add an extra loss term, L_{cp}

Notation

  • image : \mathbf{x}_i
  • embedding of \mathbf{x}_i : \mathbf{v}_i
    • projected to normalized sphere
  • \hat{\mathbf{p}}_i^j (i \neq j) : probability of a given instance i being classified as the j-th instance
  • \hat{\mathbf{p}}_i^i : probability of instance i being classified as its own augmented instance


Consistency preserving loss L_{cp} : penalizes mis-classified cases over the batches

  • \hat{\mathbf{p}}_i^j=\frac{\exp \left(\mathbf{v}_j^{\top} \mathbf{v}_i / \tau\right)}{\sum_{k=1}^n \exp \left(\mathbf{v}_k^{\top} \mathbf{v}_i / \tau\right)}, \quad \hat{\mathbf{p}}_i^i=\frac{\exp \left(\mathbf{v}_i^{\top} \hat{\mathbf{v}}_i / \tau\right)}{\sum_{k=1}^n \exp \left(\mathbf{v}_k^{\top} \hat{\mathbf{v}}_i / \tau\right)}

  • L_{cp}=-\sum_i \sum_{j \neq i} \log \left(1-\hat{\mathbf{p}}_i^j\right)-\sum_i \log \hat{\mathbf{p}}_i^i.
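
A minimal PyTorch sketch of L_{cp} following the two softmax definitions above; the temperature value is illustrative.

```python
import torch

def cp_loss(v, v_hat, tau=0.1):
    """Consistency-preserving loss sketch. v[i] is the (L2-normalized)
    embedding of image i and v_hat[i] that of its augmented view."""
    n = v.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    # \hat{p}_i^j = exp(v_j^T v_i / tau) / sum_k exp(v_k^T v_i / tau)
    p = (v @ v.t() / tau).softmax(dim=1)
    # \hat{p}_i^i scores the augmented embedding \hat{v}_i against all v_k
    p_self = (v_hat @ v.t() / tau).softmax(dim=1).diagonal()
    return (-torch.log((1 - p[~eye]).clamp_min(1e-12)).sum()
            - torch.log(p_self.clamp_min(1e-12)).sum())
```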


Total Unsupervised Classification Loss :

  • L_{\text {stage } 2}=L_{\text {assign }}+\lambda \cdot L_{c p}.
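
Putting the two stage-2 terms together, a hypothetical training-step sketch reusing the `assign_loss` and `cp_loss` helpers sketched above; `encoder`, `head`, and `lam` are illustrative names.

```python
def stage2_loss(encoder, head, x, x_aug, lam=1.0):
    """Stage-2 objective sketch: MI-based class assignment plus
    the consistency-preserving embedding term, weighted by lambda."""
    v, v_hat = encoder(x), encoder(x_aug)   # refined embeddings
    y, y_hat = head(v), head(v_hat)         # soft cluster assignments
    return assign_loss(y, y_hat) + lam * cp_loss(v, v_hat)
```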


Normalized FC classifier

Norm-FC classification heads :

  • used for the second stage classifier

Predicted value :

  • y_i^j=\frac{\exp \left(\frac{\mathbf{w}_j}{\|\mathbf{w}_j\|} \cdot \mathbf{v}_i / \tau_c\right)}{\sum_k \exp \left(\frac{\mathbf{w}_k}{\|\mathbf{w}_k\|} \cdot \mathbf{v}_i / \tau_c\right)}.
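
A minimal PyTorch sketch of this Norm-FC head; the initialization scale and \tau_c value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormFC(nn.Module):
    """Normalized FC classification head sketch: each weight vector is
    L2-normalized before the dot product with the embedding, and logits
    are scaled by a temperature tau_c."""
    def __init__(self, dim, n_classes, tau_c=0.05):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, dim) * 0.01)
        self.tau_c = tau_c

    def forward(self, v):                        # v : (batch, dim) embeddings
        w = F.normalize(self.weight, dim=1)      # w_j / ||w_j||
        return (v @ w.t() / self.tau_c).softmax(dim=1)   # y_i^j
```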
