An Overview of Deep Semi-Supervised Learning (2020) - Part 4

Abstract
Introduction
1. SSL
2. SSL Methods
3. Main Assumptions in SSL
4. Related Problems
Consistency Regularization
1. Ladder Networks
2. Pi-Model
3. Temporal Ensembling
4. Mean Teachers
5. Dual Students
6. Fast-SWA
7. Virtual Adversarial Training (VAT)
8. Adversarial Dropout (AdD)
9. Interpolation Consistency Training (ICT)
10. Unsupervised Data Augmentation
Entropy Minimization
Proxy-label Methods
1. Self-training
2. Multi-view Training
Holistic Methods
1. MixMatch
2. ReMixMatch
3. FixMatch
Generative Models
1. VAE for SSL
2. GAN for SSL
Graph-Based SSL
1. Graph Construction
2. Label Propagation
Self-Supervision for SSL

7. Graph-Based SSL

Setting : each data point \(x_i\) : labeled or unlabeled

Notation : \(G(V, E)\)

node : \(V=\left\{x_1, \ldots, x_n\right\}\)
edge : \(E=\left\{e_{i j}\right\}_{i, j=1}^n\)
adjacency matrix : \(A\)
- each element as a non-negative weight
- if not connected … \(A_{i j}=0\)

Graph-based tasks

(1) node classification
(2) link prediction
(3) clustering
(4) visualization

Graph methods

(1) transductive
- only capable of producing labels assignments of the examples seen during training
(2) inductive
- more generalizable, and can be transferred and applied to unseen examples

will discuss node classification approaches

can be grouped into…

(1) methods which propagate the labels from labeled to unlabeled
- assumption : nearby nodes tend to have the same labels
(2) methods which learn node embeddings
- assumption : nearby nodes should have similar embeddings in vector space
& apply classifiers on the learned embeddings

(1) Graph Construction

we first need a graph ! graph can be represented as…

(1) adjacency matrix \(A\)
(2) constructed to reflect the similarity of the nodes

In case we have limited domain knowledge…

\(\rightarrow\) create graphs!

(1) Fully connected graphs
- fully connected with weighted edges between all pairs of data
- high computational costs
(2) Sparse graphs
- each node is only connected to a few similar nodes
  
  ( connections to dissimilar nodes are removed )
- Ex) kNN graphs
  - \(i\) & \(j\) is connected, if both are \(k\)-nearest neighbors
  - how to obtain weight ?
    - ex) Gaussian kernel : \(W_{i j}=\exp \left\{- \mid \mid x_i-x_j \mid \mid ^2 / 2 \sigma^2\right\}\)
- Ex) \(\epsilon\)NN graphs
  - \(i\) & \(j\) are connected if \(d(i, j) \leq \epsilon\)

(2) Label Propagation

propagates labels of the labeled data \(\rightarrow\) unlabeled data

Assumption :

data in same manifold \(\rightarrow\) likely to share label

Notation

\(\hat{Y}\) : \(n \times C\) matrix ( of new classificaiton scores )
- each row : \(\hat{Y}_i\) ( distn over \(C\) classes )
\(Y\) : \(n \times C\) matrix ( labels for the labeled data )
- each row : \(Y_i\) ( one-hot vector GT )

Loss function for label propagation :

\(\mathcal{L}=\frac{1}{2} \sum_{i, j=1}^n A_{i j}\left(\hat{y}_i-\hat{y}_j\right)^2=\hat{Y}^T L \hat{Y}\).
- \(L\) = \(D-A\) ( = graph Laplacian matrix )

Probabilistic transition matrix : \(P = D^{-1}A\)

split into labeld & unlabeled!

\(P=\left(\begin{array}{cc} P_{l l} & P_{l u} \\ P_{u l} & P_{u u} \end{array}\right) \quad Y=\left(\begin{array}{c} Y_l \\ Y_u \end{array}\right) \quad \hat{Y}=\left(\begin{array}{c} \hat{Y}_l \\ \hat{Y}_u \end{array}\right)\).

Optimal solution :

\(\hat{Y}_l =Y_l\).
\(\hat{Y}_u =\left(I-P_{u u}\right)^{-1} P_{u l} Y_l\).

Labeling score computation : involves matrix inversion

\(\rightarrow\) computaitonally heavy

\(\rightarrow\) solution : propose an iterative approach to converge to same solution

Iterative Approach

procedure
- step 1) propagate \(\hat{Y} \leftarrow P \hat{Y}\)
- step 2) preserve labeled data \(\hat{Y}_l=Y_l\)
- repeat above, until convergence
loss function :
- \(\mathcal{L}=\frac{1}{2} \sum_{i, j=1}^n A_{i j} \mid \mid \frac{\hat{Y}_i}{\sqrt{D_{i i}}}-\frac{\hat{Y}_i}{\sqrt{D_{j j}}} \mid \mid ^2+(1 / \alpha-1) \sum_{i=1}^n \mid \mid \hat{Y}_i-Y_i \mid \mid ^2\).
  - 1st term ) smoothness constraint
    - labels that do not change too much between nearby points,
  - 2nd term ) fitting constraint
    - final labels of the labeled nodes to be similar to their initial value
optimal solution :
- \(\hat{Y}=(I-\alpha S)^{-1} Y\).
  
  where \(S=D^{-1 / 2} A D^{-1 / 2}\)

Iterative Approach (2)

less computationally expensive
procedure
- step 1) propagate \(\hat{Y} \leftarrow \alpha S \hat{Y}+(1-\alpha) Y\)
- repeat until convergence

8. Self-Supervision for SSL

trained using pretext task

exemplar-CNN
Rotation
Patches
Coloriation
Contrastive Predictive Coding

Twitter Facebook LinkedIn

(paper) SSL12(4) - An Overview of Deep Semi-Supervised Learning (Graph-Based SSL)

Seunghan Lee

An Overview of Deep Semi-Supervised Learning (2020) - Part 4

Contents

7. Graph-Based SSL

(1) Graph Construction

(2) Label Propagation

8. Self-Supervision for SSL

You May Also Enjoy