Self-Labeling via Simultaneous Clustering and Representation Learning
Contents
- Abstract
- Introduction
- Method
- Self-labeling
0. Abstract
combining (1) clustering + (2) representation learning
\(\rightarrow\) doing it naively…leads to degenerate solutions
solution : propose a method that maximizes the information between labels & input data indices
1. Introduction
self-supervision : mostly done by designing new pretext tasks
But the task of classification is sufficient for pre-training
( of course… provided that labels are given )
\(\rightarrow\) focus on obtaining the labels automatically ( with a self-labeling algorithm )
Degeneration problem ?
\(\rightarrow\) solve by adding the constraint that the labels must induce an equipartition of the data ( = maximizes the information between data indices & labels )
2. Method
(1) self-labeling method
(2) interpret the method as optimizing the labels & targets of a CE loss
(1) Self-labeling
Notation :
- \(x=\Phi(I)\) : DNN
- map images (\(I\)) to feature vectors (\(x \in \mathbb{R}^D\) )
- \(I_1, \ldots, I_N\) : Image data
- \(y_1, \ldots, y_N \in\{1, \ldots, K\}\) : Image labels
- \(h: \mathbb{R}^D \rightarrow \mathbb{R}^K\) : classification head
- \(p\left(y=\cdot \mid \boldsymbol{x}_i\right)=\operatorname{softmax}\left(h \circ \Phi\left(I_i\right)\right)\) : class probabilities
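To make the notation concrete, here is a minimal PyTorch sketch ( the toy backbone, image size, and the values of \(D, K\) are my own illustrative assumptions, not from the paper ) :

```python
import torch
import torch.nn as nn

D, K = 128, 10                         # feature dim D, number of classes K (toy values)
Phi = nn.Sequential(                   # toy backbone Phi : images I -> features x in R^D
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, D),
    nn.ReLU(),
)
h = nn.Linear(D, K)                    # classification head h : R^D -> R^K

I = torch.randn(8, 3, 32, 32)          # a batch of 8 fake images
x = Phi(I)                             # x = Phi(I), shape (8, D)
p = torch.softmax(h(x), dim=1)         # p(y | x_i) = softmax(h ∘ Phi(I_i)), shape (8, K)
```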
Train the model & head parameters with the average CE loss
- \(E\left(p \mid y_1, \ldots, y_N\right)=-\frac{1}{N} \sum_{i=1}^N \log p\left(y_i \mid \boldsymbol{x}_i\right)\).
\(\rightarrow\) requires labelled dataset
( if not, requires a self-labeling mechanism )
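Continuing the sketch above ( reusing `Phi`, `h`, `I`, `K` ), the supervised objective in code — note `F.cross_entropy` applies log-softmax internally, so it takes the raw scores and computes exactly \(-\frac{1}{N}\sum_i \log p(y_i \mid \boldsymbol{x}_i)\) :

```python
import torch
import torch.nn.functional as F

y = torch.randint(0, K, (8,))          # labels y_1..y_N, *if* they are given
loss = F.cross_entropy(h(Phi(I)), y)   # average CE loss E(p | y_1..y_N)
loss.backward()                        # trains Phi and h jointly
```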
[ Self-labeling mechanism ]
- achieved by jointly optimizing \(E\) w.r.t.
- (1) model \(h \circ \Phi\)
- (2) labels \(y_1, \ldots, y_N\)
- but if fully unsupervised… this leads to a degenerate solution
( = trivially minimized by assigning all data points to a single (arbitrary) label )
Solution?
- first, encode the labels as a posterior distribution \(q\left(y \mid \boldsymbol{x}_i\right)\)
- (Before) \(E\left(p \mid y_1, \ldots, y_N\right)=-\frac{1}{N} \sum_{i=1}^N \log p\left(y_i \mid \boldsymbol{x}_i\right)\).
- (After) \(E(p, q)=-\frac{1}{N} \sum_{i=1}^N \sum_{y=1}^K q\left(y \mid \boldsymbol{x}_i\right) \log p\left(y \mid \boldsymbol{x}_i\right) .\)
( optimizing \(q\) = reassigning labels )
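A minimal sketch of the relaxed objective \(E(p, q)\) with toy values ( a one-hot \(q\) recovers the original CE loss above ) :

```python
import torch

N, K = 8, 10
log_p = torch.log_softmax(torch.randn(N, K), dim=1)   # log p(y | x_i) from the model
q = torch.full((N, K), 1.0 / K)                       # any valid assignment matrix q(y | x_i)
E = -(q * log_p).sum(dim=1).mean()                    # E(p, q) = -1/N sum_i sum_y q log p
```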
- to avoid degeneracy…
\(\rightarrow\) add the constraint that the label assignments must partition the data into equally-sized subsets
- objective function :
- \(\min _{p, q} E(p, q) \quad \text { subject to } \quad \forall y: q\left(y \mid \boldsymbol{x}_i\right) \in\{0,1\} \text { and } \sum_{i=1}^N q\left(y \mid \boldsymbol{x}_i\right)=\frac{N}{K}\).
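The optimization alternates between two steps : with \(q\) fixed, minimizing over \(p\) is just CE training as before ; with \(p\) fixed, minimizing over \(q\) under the equipartition constraint is an assignment problem, which the paper relaxes to an optimal transport problem and solves with a fast variant of the Sinkhorn-Knopp algorithm. A minimal sketch of that \(q\)-step ( the \(\lambda\) value, iteration count, and toy inputs are assumed, and a log-domain implementation would be more stable numerically ) :

```python
import torch

# Relax q(y|x_i) in {0,1} to the transport polytope (rows sum to 1/N,
# columns to 1/K) and rescale with Sinkhorn-Knopp iterations.
def sinkhorn_labels(log_p: torch.Tensor, lam: float = 5.0, n_iter: int = 100) -> torch.Tensor:
    """log_p: (N, K) matrix of log p(y | x_i); returns a soft, equipartitioned q."""
    N, K = log_p.shape
    Q = torch.exp(lam * log_p)                    # kernel p^lam (lam is an assumed value)
    for _ in range(n_iter):
        Q = Q / (K * Q.sum(dim=0, keepdim=True))  # columns sum to 1/K (equal-size classes)
        Q = Q / (N * Q.sum(dim=1, keepdim=True))  # rows sum to 1/N (one label per image)
    return N * Q                                  # rescale so each row q(.|x_i) sums to 1

log_p = torch.log_softmax(torch.randn(8, 10), dim=1)
q = sinkhorn_labels(log_p)                        # soft assignments; columns ≈ N/K
y = q.argmax(dim=1)                               # round to hard pseudo-labels if desired
```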