Deep Clustering for Unsupervised Learning of Visual Features


Contents

  0. Abstract
  1. Method
    (1) Preliminaries
    (2) Unsupervised Learning by Clustering
    (3) Avoiding Trivial Solutions


0. Abstract

DeepCluster

  • end-to-end training of visual features

  • jointly learns …
    • (1) the parameters of the NN
    • (2) the cluster assignments of the resulting features
  • iteratively …
    • step 1) groups the features with a standard clustering algorithm (k-means)
    • step 2) uses the subsequent assignments as supervision to update the NN


1. Method

Figure 2 : the DeepCluster pipeline ( deep features are iteratively clustered & the cluster assignments are used as pseudo-labels to update the convnet )


(1) Preliminaries

data : \(X=\left\{x_{1}, x_{2}, \ldots, x_{N}\right\}\)

model :

  • (1) convnet mapping : \(f_{\theta}\)
  • (2) classifier : \(g_{W}\)


Loss function : \(\min _{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\left(g_{W}\left(f_{\theta}\left(x_{n}\right)\right), y_{n}\right)\).

( \(\ell\) : negative log-softmax, i.e. the multinomial logistic loss )
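
A minimal PyTorch sketch of this supervised setup ( the convnet and sizes below are illustrative stand-ins, not the architecture from the paper ) :

```python
import torch
import torch.nn as nn

# f_theta : convnet mapping images to d-dimensional features (toy stand-in)
f_theta = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # feature dimension d = 64
)
g_W = nn.Linear(64, 10)                         # classifier over 10 classes

x = torch.randn(8, 3, 32, 32)                   # a batch of images x_n
y = torch.randint(0, 10, (8,))                  # labels y_n

# negative log-softmax == cross-entropy in PyTorch
loss = nn.CrossEntropyLoss()(g_W(f_theta(x)), y)
loss.backward()
```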


(2) Unsupervised Learning by Clustering

cluster the output of the convnet &

use the subsequent cluster assignments as “pseudo-labels” to optimize \(\min _{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\left(g_{W}\left(f_{\theta}\left(x_{n}\right)\right), y_{n}\right)\)

( iteratively learns the features & groups them )
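
One such iteration as a hedged sketch ( scikit-learn's `KMeans` stands in for the paper's clustering step, and `images` / `train_step` are assumed helpers: `f_theta` returns a feature vector, `train_step` takes one SGD step on \(\ell(g_W(f_\theta(x)), y)\) ) :

```python
import numpy as np
from sklearn.cluster import KMeans

def deepcluster_epoch(f_theta, g_W, images, k, train_step):
    # step 1: group the current features into k clusters
    feats = np.stack([f_theta(x) for x in images])            # N x d
    pseudo_labels = KMeans(n_clusters=k).fit_predict(feats)   # y_n
    # step 2: use the assignments as supervision to update theta and W
    for x, y in zip(images, pseudo_labels):
        train_step(f_theta, g_W, x, y)
    return pseudo_labels
```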


K-means

  • input : \(f_{\theta}\left(x_{n}\right)\)
  • output : an assignment of the features to \(k\) distinct groups

  • jointly learns …
    • (1) a \(d \times k\) centroid matrix \(C\)
    • (2) the cluster assignments \(y_{n}\) of each image \(n\)
  • loss function :
    • \(\min _{C \in \mathbb{R}^{d \times k}} \frac{1}{N} \sum_{n=1}^{N} \min _{y_{n} \in\{0,1\}^{k}}\left\|f_{\theta}\left(x_{n}\right)-C y_{n}\right\|_{2}^{2} \quad \text {such that } y_{n}^{\top} 1_{k}=1\).

\(\rightarrow\) these assignments are used as pseudo-labels

( the centroid matrix \(C\) itself is not used afterwards )
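
To make the objective concrete, a small numpy check of one assignment step ( random data, purely illustrative ) :

```python
import numpy as np

N, d, k = 100, 64, 10
feats = np.random.randn(N, d)                        # rows are f_theta(x_n)

# d x k centroid matrix C, initialized from k random features
C = feats[np.random.choice(N, k, replace=False)].T

# assign each feature to its nearest centroid
dists = ((feats[:, :, None] - C[None]) ** 2).sum(axis=1)   # N x k
assign = dists.argmin(axis=1)

# y_n as one-hot vectors in {0,1}^k with y_n^T 1_k = 1
Y = np.eye(k)[assign]

# the objective: mean of || f_theta(x_n) - C y_n ||_2^2
loss = np.mean(((feats - Y @ C.T) ** 2).sum(axis=1))
```

Only `assign` ( the pseudo-labels ) is kept for the next training step; `C` is discarded.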


problem : degenerate solutions

\(\rightarrow\) e.g. k-means can collapse all the features into a single group … solution??


(3) Avoiding Trivial Solutions

solutions are typically based on constraining or penalizing the MINIMAL number of points per cluster

but … these terms are computed over the whole dataset, so they are not applicable to large-scale datasets


a) empty clusters

when a cluster becomes empty …

randomly select a non-empty cluster & use its centroid with a small perturbation as a new centroid
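
A sketch of that reassignment ( the perturbation scale `eps` is an illustrative choice, not a value from the paper ) :

```python
import numpy as np

def fix_empty_clusters(C, assign, k, eps=1e-4, rng=np.random):
    """Replace each empty cluster's centroid with a slightly perturbed
    copy of the centroid of a randomly chosen non-empty cluster."""
    counts = np.bincount(assign, minlength=k)
    non_empty = np.flatnonzero(counts > 0)
    for j in np.flatnonzero(counts == 0):
        src = rng.choice(non_empty)                        # random non-empty cluster
        C[:, j] = C[:, src] + eps * rng.randn(C.shape[0])  # small perturbation
    return C
```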


b) trivial parameterization

minimizing \(\min _{\theta, W} \frac{1}{N} \sum_{n=1}^{N} \ell\left(g_{W}\left(f_{\theta}\left(x_{n}\right)\right), y_{n}\right)\)

\(\rightarrow\) leads to a trivial parameterization

( = predicts the same output, regardless of the input )


solution ?

  • sample images based on a uniform distribution over the classes ( or pseudo-labels )

    ( = a weighted loss, with each image weighted by the inverse of the size of its cluster; sketched below )
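
A hedged PyTorch sketch of that uniform sampling over pseudo-labels :

```python
import torch
from torch.utils.data import WeightedRandomSampler

pseudo_labels = torch.tensor([0, 0, 0, 1, 2, 2])    # toy cluster assignments y_n

# weight every image by 1 / (size of its cluster), so each
# pseudo-class is drawn with equal probability
cluster_sizes = torch.bincount(pseudo_labels).float()
weights = 1.0 / cluster_sizes[pseudo_labels]

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# then: DataLoader(dataset, sampler=sampler, ...)
```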

