Learning Representations by Maximizing Mutual Information Across Views
Contents
- Abstract
- Method Description
- Local DIM
- NCE (Noise-Contrastive Estimation) Loss
- Data Augmentation
- Multiscale Mutual Information
0. Abstract
Proposes AMDIM, which…
maximizes mutual information between features extracted from multiple views
\(\rightarrow\) requires capturing information about high-level factors
1. Method Description
AMDIM ( = Augmented Multiscale DIM )
( DIM = Deep InfoMax )
Step 1) maximize mutual information between two \(z\)'s
Step 2) maximize mutual information between multiple feature scales simultaneously
(1) Local DIM
Local DIM
= maximize MI between global features & local features
Notation
- global features : \(f_{1}(x)\)
- local features : \(\left\{f_{7}(x)_{i j}: \forall i, j\right\}\)
- produced by an intermediate layer in the encoder
- \(d \in\{1,7\}\) : denotes features from the top-most encoder layer with dim \(d \times d\)
- \(i\) and \(j\) : index the 2 spatial dimensions of the array of activations in layer \(d\)
Meaning of MI
= how much better we can guess the value of \(f_{7}(x)_{i j}\) when we know the value of \(f_{1}(x)\) than when we do not know the value of \(f_{1}(x)\).
Term change
- global \(\rightarrow\) antecedent features
- local \(\rightarrow\) consequent features
Construct a distribution : \(p\left(f_{1}(x), f_{7}(x)_{i j}\right)\)
- via ancestral sampling
- process
- step 1) sample an input \(x \sim \mathcal{D}\)
- step 2) sample spatial indices \(i \sim u(i)\) and \(j \sim u(j)\)
- step 3) compute features \(f_{1}(x)\) and \(f_{7}(x)_{i j}\).
Given \(p\left(f_{1}(x)\right), p\left(f_{7}(x)_{i j}\right)\) and \(p\left(f_{1}(x), f_{7}(x)_{i j}\right)\)
\(\rightarrow\) local DIM seeks an encoder that maximizes MI \(I\left(f_{1}(x) ; f_{7}(x)_{i j}\right)\) in \(p\left(f_{1}(x), f_{7}(x)_{i j}\right)\)
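The three-step ancestral sampling process above can be sketched in code. The encoder here is a hypothetical stand-in (random features of an assumed width 8), purely to show where each sample in \(p(f_{1}(x), f_{7}(x)_{ij})\) comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    """Hypothetical stand-in for the real conv encoder: returns a global
    feature f1(x) (shape [c]) and a 7x7 grid of local features f7(x)
    (shape [7, 7, c]). Feature width c=8 is an assumption for illustration."""
    c = 8
    local = np.tanh(x.mean() + rng.standard_normal((7, 7, c)))
    global_ = local.mean(axis=(0, 1))
    return global_, local

# Step 1) sample an input x ~ D (here: a random 32x32 "image")
x = rng.standard_normal((32, 32))
# Step 2) sample spatial indices i ~ u(i) and j ~ u(j)
i, j = rng.integers(0, 7, size=2)
# Step 3) compute features f1(x) and f7(x)_ij
f1, f7 = encoder(x)
pair = (f1, f7[i, j])  # one sample from the joint p(f1(x), f7(x)_ij)
```

Repeating this over the dataset yields the joint distribution whose MI local DIM maximizes.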
(2) NCE (Noise-Contrastive Estimation) Loss
Maximize the NCE lower bound on \(I\left(f_{1}(x) ; f_{7}(x)_{i j}\right)\), by minimizing …
- \(\underset{\left(f_{1}(x), f_{7}(x)_{i j}\right)}{\mathbb{E}}\left[\underset{N_{7}}{\mathbb{E}}\left[\mathcal{L}_{\Phi}\left(f_{1}(x), f_{7}(x)_{i j}, N_{7}\right)\right]\right]\).
Positive & Negative
- Positive : from joint distn \(\rightarrow\) \(\left(f_{1}(x), f_{7}(x)_{i j}\right) \sim p\left(f_{1}(x), f_{7}(x)_{i j}\right)\)
- Negative ( = \(N_{7}\) ) : from marginal distn \(\rightarrow\) \(p\left(f_{7}(x)_{i j}\right)\)
\(\mathcal{L}_{\Phi}\left(f_{1}, f_{7}, N_{7}\right)=-\log \frac{\exp \left(\Phi\left(f_{1}, f_{7}\right)\right)}{\sum_{\tilde{f}_{7} \in N_{7} \cup\left\{f_{7}\right\}} \exp \left(\Phi\left(f_{1}, \tilde{f}_{7}\right)\right)}\).
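The loss \(\mathcal{L}_{\Phi}\) is a softmax cross-entropy that scores the positive pair against the negatives. A minimal numpy sketch, assuming a simple dot-product critic \(\Phi\) (the actual critic is learned; this choice is only for illustration):

```python
import numpy as np

def nce_loss(f1, f7_pos, negatives, phi):
    """NCE loss L_Phi(f1, f7, N7): negative log-probability of the positive
    pair under a softmax over the positive and all negatives."""
    scores = np.array([phi(f1, f) for f in negatives + [f7_pos]])
    scores -= scores.max()  # numerical stability
    log_prob_pos = scores[-1] - np.log(np.exp(scores).sum())
    return -log_prob_pos

# Placeholder dot-product critic (an assumption, not the paper's critic).
phi = lambda a, b: float(a @ b)

rng = np.random.default_rng(0)
f1 = rng.standard_normal(8)
f7_pos = f1 + 0.1 * rng.standard_normal(8)       # correlated positive
negs = [rng.standard_normal(8) for _ in range(16)]  # samples from the marginal
loss = nce_loss(f1, f7_pos, negs, phi)
```

Minimizing this loss pushes \(\Phi\) to score the jointly-sampled pair above the marginal negatives, which tightens the NCE lower bound on the MI.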
(3) Data Augmentation
Local DIM \(\rightarrow\) Local DIM + Data Augmentation
= extends local DIM, by maximizing MI between features from augmented views of each input.
Construct the AUGMENTED feature distn \(p_{\mathcal{A}}\left(f_{1}\left(x^{1}\right), f_{7}\left(x^{2}\right)_{i j}\right)\) as …
- Step 1) sample an input \(x \sim \mathcal{D}\)
- Step 2) sample augmented images \(x^{1} \sim \mathcal{A}(x)\) and \(x^{2} \sim \mathcal{A}(x)\)
- \(\mathcal{A}(x)\) : distn of images generated by applying stochastic DA to \(x\)
- Step 3) sample spatial indices \(i \sim u(i)\) and \(j \sim u(j)\)
- Step 4) compute features \(f_{1}\left(x^{1}\right)\) and \(f_{7}\left(x^{2}\right)_{i j}\)
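The four steps can be sketched as follows; `augment` is a hypothetical stochastic augmentation standing in for the paper's random crop / jitter / flip pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    """Hypothetical A(x): a random shift plus small noise, standing in
    for the actual stochastic data augmentation."""
    shift = rng.integers(-2, 3, size=2)
    x_aug = np.roll(x, shift, axis=(0, 1))
    return x_aug + 0.05 * rng.standard_normal(x.shape)

# Step 1) sample an input x ~ D
x = rng.standard_normal((32, 32))
# Step 2) sample two augmented views x1, x2 ~ A(x)
x1, x2 = augment(x), augment(x)
# Step 3) sample spatial indices i ~ u(i) and j ~ u(j)
i, j = rng.integers(0, 7, size=2)
# Step 4) the encoder would then compute f1(x1) and f7(x2)_ij
```

Because the two views come from independent draws of \(\mathcal{A}(x)\), the features must capture information that survives the augmentation noise.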
(4) Multiscale Mutual Information
Local DIM + Data Augmentation \(\rightarrow\) AMDIM ( Augmented Multiscale DIM )
= extend local DIM, by maximizing MI across multiple feature scales
\(n\)-to-\(m\) infomax costs :
\(\underset{\left(f_{n}\left(x^{1}\right)_{i j}, f_{m}\left(x^{2}\right)_{k l}\right)}{\mathbb{E}}\left[\underset{N_{m}}{\mathbb{E}}\left[\mathcal{L}_{\Phi}\left(f_{n}\left(x^{1}\right)_{i j}, f_{m}\left(x^{2}\right)_{k l}, N_{m}\right)\right]\right]\).
- ex) \(p_{\mathcal{A}}\left(f_{5}\left(x^{1}\right)_{i j}, f_{7}\left(x^{2}\right)_{k l}\right)\)
- ex) \(p_{\mathcal{A}}\left(f_{5}\left(x^{1}\right)_{i j}, f_{5}\left(x^{2}\right)_{k l}\right)\)
- ex) \(p_{\mathcal{A}}\left(f_{1}\left(x^{1}\right), f_{5}\left(x^{2}\right)_{k l}\right)\)
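The \(n\)-to-\(m\) costs over the scale pairs above sum into one objective. A minimal sketch, with a placeholder dot-product critic and random stand-in features (the scale pairs 5-to-7, 5-to-5, and 1-to-5 follow the three examples):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = lambda a, b: float(a @ b)  # placeholder critic, not the learned one

def nce_loss(fa, fb_pos, negatives):
    """Same NCE loss as before, applied to one antecedent/consequent pair."""
    scores = np.array([phi(fa, f) for f in negatives + [fb_pos]])
    scores -= scores.max()
    return -(scores[-1] - np.log(np.exp(scores).sum()))

def feat(d, c=8):
    """Placeholder for one feature vector f_d(x)_ij drawn from the
    d x d grid; real features would come from the encoder."""
    return rng.standard_normal(c)

# AMDIM sums the n-to-m infomax costs over several scale pairs,
# e.g. 5-to-7, 5-to-5, and 1-to-5 as in the examples above.
scale_pairs = [(5, 7), (5, 5), (1, 5)]
negs = [rng.standard_normal(8) for _ in range(16)]
total = sum(nce_loss(feat(n), feat(m), negs) for n, m in scale_pairs)
```

Each term reuses the same NCE machinery; only the scales of the antecedent and consequent features change.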