Self-Labeling via Simultaneous Clustering and Representation Learning
Contents
- Abstract
- Introduction
- Method
- Self-labeling
0. Abstract
combining (1) clustering + (2) representation learning
→ doing it naively leads to degenerate solutions
solution : propose a method that maximizes the information between labels & input data indices
1. Introduction
self-supervision tasks : mostly done by designing new pretext tasks
But the task of classification itself is sufficient for pre-training
( of course, provided that labels are given )
→ focus on obtaining the labels automatically ( with a self-labeling algorithm )
Degeneration problem?
→ solved by adding the constraint that the labels must induce an equipartition of the data ( = maximizes the information between data indices & labels )
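Why is equipartition the same thing as maximizing this information? A short sketch of the reasoning (assuming the data index $i$ is uniform over $\{1, \ldots, N\}$ and the labeling is deterministic given $i$):

```latex
% mutual information between labels y and data indices i
% a deterministic labeling (y = y_i) gives H(y | i) = 0, so
I(y; i) = H(y) - H(y \mid i) = H(y) \le \log K,
% with equality iff the marginal p(y) is uniform over the K classes,
% i.e. each label is assigned to exactly N/K data points (an equipartition).
```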
2. Method
(1) self-labeling method
(2) interpret the method as optimizing labels & targets of the CE loss
(1) Self-labeling
Notation :
- $x = \Phi(I)$ : DNN
  - maps images ($I$) to feature vectors ($x \in \mathbb{R}^D$)
- $I_1, \ldots, I_N$ : image data
- $y_1, \ldots, y_N \in \{1, \ldots, K\}$ : image labels
- $h : \mathbb{R}^D \to \mathbb{R}^K$ : classification head
- $p(y = \cdot \mid x_i) = \operatorname{softmax}(h \circ \Phi(x_i))$ : class probabilities
Train model & head parameters with the average CE loss
- $E(p \mid y_1, \ldots, y_N) = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i)$
→ requires a labelled dataset
( if not, requires a self-labeling mechanism )
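A minimal sketch of this supervised setup (assuming PyTorch; `Phi`, `h`, the 32×32 input size, and the toy dimensions are illustrative stand-ins, not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, K = 128, 10                       # feature dim D, number of classes K (toy values)
Phi = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, D), nn.ReLU())  # stand-in for a deep network
h = nn.Linear(D, K)                  # classification head h : R^D -> R^K

def average_ce_loss(images, labels):
    # E(p | y_1, ..., y_N) = -1/N * sum_i log p(y_i | x_i)
    logits = h(Phi(images))                 # h(Phi(I_i)), one row of K logits per image
    return F.cross_entropy(logits, labels)  # averages -log p(y_i | x_i) over the batch

# usage: needs ground-truth labels, hence the self-labeling mechanism below
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, K, (8,))
loss = average_ce_loss(images, labels)
```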
[ Self-labeling mechanism ]
- achieved by jointly optimizing $E(p \mid y_1, \ldots, y_N)$ w.r.t.
  - (1) the model $h \circ \Phi$
  - (2) the labels $y_1, \ldots, y_N$
- but if fully unsupervised, this leads to a degenerate solution
  ( = trivially minimized by assigning all data points to a single (arbitrary) label )
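A toy illustration of that degeneracy (assuming PyTorch; the numbers are arbitrary): once the labels are also free variables, assigning every image to one class and always predicting that class drives the loss to ~0 without learning anything useful.

```python
import torch
import torch.nn.functional as F

N, K = 6, 3
y = torch.zeros(N, dtype=torch.long)   # every image "relabeled" to the same class 0
logits = torch.zeros(N, K)
logits[:, 0] = 10.0                    # predictor that always outputs class 0
print(F.cross_entropy(logits, y))      # ~0: the joint objective is trivially minimized
```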
Solution?
- first, encode the labels as posterior distributions $q(y \mid x_i)$
  - (Before) $E(p \mid y_1, \ldots, y_N) = -\frac{1}{N}\sum_{i=1}^{N} \log p(y_i \mid x_i)$
  - (After) $E(p, q) = -\frac{1}{N}\sum_{i=1}^{N} \sum_{y=1}^{K} q(y \mid x_i) \log p(y \mid x_i)$
  ( optimizing q = reassigning labels )
- to avoid degeneracy,
  → add the constraint that the label assignments must partition the data into equally-sized subsets
- objective function :
  - $\min_{p, q} E(p, q)$ subject to $\forall y : q(y \mid x_i) \in \{0, 1\}$ and $\sum_{i=1}^{N} q(y \mid x_i) = \frac{N}{K}$
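A sketch of evaluating this constrained objective with a hard, equipartitioned $q$ (assuming PyTorch; the assignment and sizes are toy placeholders, and how the paper actually finds the optimal $q$ is not shown here):

```python
import torch
import torch.nn.functional as F

N, K = 20, 4
logits = torch.randn(N, K)               # h(Phi(x_i)) for each of the N images (random stand-in)
log_p = F.log_softmax(logits, dim=1)     # log p(y | x_i)

# a hard, equipartitioned assignment: each of the K labels used exactly N/K times
y = torch.arange(N) % K                  # toy assignment; the paper optimizes q instead
q = F.one_hot(y, K).float()              # q(y | x_i) in {0, 1}

assert torch.all(q.sum(dim=0) == N // K) # constraint: sum_i q(y | x_i) = N/K for every y

E_pq = -(q * log_p).sum(dim=1).mean()    # E(p, q) = -1/N * sum_i sum_y q(y|x_i) * log p(y|x_i)
print(E_pq)
```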