CORE: Self- and Semi-supervised Tabular Learning with COnditional Regularizations
Notation
- input : \(X \in \mathbb{R}^M\)
- outcome : \(Y \in \mathbb{R}\)
- low-dim signal : \(W \in \mathbb{R}^D\)
- drawn from \(p(W)\) …… where \(D<M\)
- individual noises : \(U^j\)
- drawn from \(p\left(U^j\right)\) …… for \(j=1, \cdots, M\)
Assumption
- input \(X^j\) ( = \(j\)-th dimension ) is generated from \(p\left(X^j \mid W, U^j\right)\) for \(j=1, \ldots, M\)
- outcome \(Y\) is generated from \(p(Y \mid W)\).
predicting \(Y\) requires estimation of \(W\)
& learning to infer \(W\) is possible from unlabeled \(X\)
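The assumed generative process can be sketched as follows. The linear forms of \(p(X^j \mid W, U^j)\) and \(p(Y \mid W)\), and all sizes and weight matrices, are illustrative assumptions, not specified by the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, D = 1000, 10, 3                 # D < M

A = rng.normal(size=(D, M))           # hypothetical factor loadings
w_coef = rng.normal(size=D)           # hypothetical outcome weights

W = rng.normal(size=(N, D))           # low-dim signal, W ~ p(W)
U = rng.normal(size=(N, M))           # individual noise, U^j ~ p(U^j)
X = W @ A + U                         # X^j generated from p(X^j | W, U^j)
Y = W @ w_coef + 0.1 * rng.normal(size=N)   # Y generated from p(Y | W)
```

Since \(Y\) depends on \(X\) only through \(W\), recovering \(W\) from \(X\) alone (no labels needed) is what makes the self-supervised stage useful for prediction.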
1. Self-supervised CORE
Setting : encoder (ENC) & decoder (DEC) from an autoencoder (AE)
Key idea : prevent ENC from memorizing \(X\)
Process : for a given input \(X\)….
- step 1) create \(\hat{X}(j)\) : the \(j\)-th dimension replaced by samples from \(p\left(X^j \mid X^{-j}\right)\)
  - sampling from \(p\left(X^j \mid X^{-j}\right)\) …. requires \(M\) different conditional distribution estimators?
  - \(\rightarrow\) no! use DDLK ( only need to generate one knockoff \(\tilde{X}\) )
- step 2) set \(\hat{X}(j)^j=\tilde{X}^j\) and \(\hat{X}(j)^{-j}=X^{-j}\)
- step 3) conditional regularization (CORE) loss :
  - \(\sum_{j=1}^M \mathbb{E}_{X, \hat{X}(j)}\left\|\operatorname{dec}(\operatorname{enc}(X))-\operatorname{dec}(\operatorname{enc}(\hat{X}(j)))\right\|_2^2\)
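Steps 1–2 can be sketched as a column swap. Here `X_tilde` is a placeholder for a DDLK knockoff (training an actual knockoff generator is out of scope, so i.i.d. noise stands in):

```python
import numpy as np

def make_x_hat(X, X_tilde, j):
    """step 2) set X_hat(j)^j = X_tilde^j and X_hat(j)^{-j} = X^{-j}."""
    X_hat = X.copy()
    X_hat[:, j] = X_tilde[:, j]       # only dimension j is resampled
    return X_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
X_tilde = rng.normal(size=(4, 5))     # placeholder for a DDLK knockoff

X_hat_2 = make_x_hat(X, X_tilde, j=2)
# dimension 2 comes from the knockoff, all other dimensions from X
```

One knockoff sample \(\tilde{X}\) suffices for all \(M\) swaps, which is why DDLK avoids fitting \(M\) separate conditional estimators.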
Encoder : if the encoder memorizes \(X^j\), the CORE loss will be large!
- due to conditional resampling of the \(j\)-th dimension in \(\hat{X}(j)\)
Decoder : keep the decoder constant in the CORE loss
- do not take any gradient through it ( \(\mathrm{ng}(\cdot)\) = no gradient, i.e. stop-gradient )
Loss function :
- \(\mathbb{E}_X\left\|\operatorname{dec}(\operatorname{enc}(X))-X\right\|_2^2+\alpha \cdot \sum_{j=1}^M \mathbb{E}_{X, \hat{X}(j)}\left\|\mathrm{ng}(\operatorname{dec})(\operatorname{enc}(X))-\mathrm{ng}(\operatorname{dec})(\operatorname{enc}(\hat{X}(j)))\right\|_2^2\)
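A minimal numerical sketch of this objective, assuming linear ENC/DEC and an i.i.d. placeholder knockoff (all sizes and weights hypothetical). In an autograd framework the \(\mathrm{ng}(\operatorname{dec})\) term would be a stop-gradient (e.g. detaching the decoder parameters) so the CORE term trains only the encoder; with plain numpy that is noted in a comment:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, D = 64, 8, 3
E = rng.normal(size=(M, D)) / np.sqrt(M)    # hypothetical linear encoder
G = rng.normal(size=(D, M)) / np.sqrt(D)    # hypothetical linear decoder

enc = lambda x: x @ E
dec = lambda z: z @ G

def self_supervised_core_loss(X, X_hats, alpha=1.0):
    # reconstruction term: E || dec(enc(X)) - X ||^2
    recon = np.mean(np.sum((dec(enc(X)) - X) ** 2, axis=1))
    # CORE term: with autograd, dec would be wrapped in a stop-gradient
    # (the ng(.) above) so only the encoder is penalized
    core = sum(
        np.mean(np.sum((dec(enc(X)) - dec(enc(Xh))) ** 2, axis=1))
        for Xh in X_hats
    )
    return recon + alpha * core

X = rng.normal(size=(N, M))
X_tilde = rng.normal(size=(N, M))           # placeholder knockoff
X_hats = []
for j in range(M):
    Xh = X.copy()
    Xh[:, j] = X_tilde[:, j]                # step 1-2: swap dimension j
    X_hats.append(Xh)

loss = self_supervised_core_loss(X, X_hats, alpha=0.1)
```

Keeping the decoder out of the CORE gradient prevents it from trivially collapsing outputs to make the two decoded reconstructions agree.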
2. Semi-supervised CORE
Notation :
- supervised loss \(l_{\text {sup }}\)
- consistency loss \(l_c\)
Loss function :
- \(\mathbb{E}_{X, Y} l_{\text {sup }}(f(\operatorname{enc}(X)), Y)+\beta \cdot \sum_{j=1}^M \mathbb{E}_{X, \hat{X}(j)} l_c(f(\operatorname{enc}(X)), f(\operatorname{enc}(\hat{X}(j))))\).
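The semi-supervised objective can be sketched the same way; here \(f\) is a hypothetical linear head on the encoding, and both \(l_{\text{sup}}\) and \(l_c\) are taken as MSE for illustration (the notes leave the losses unspecified):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, D = 64, 8, 3
E = rng.normal(size=(M, D)) / np.sqrt(M)    # hypothetical linear encoder
head = rng.normal(size=D) / np.sqrt(D)      # hypothetical prediction head f

enc = lambda x: x @ E
f = lambda z: z @ head

def semi_supervised_core_loss(X, Y, X_hats, beta=1.0):
    sup = np.mean((f(enc(X)) - Y) ** 2)                  # l_sup on labeled data
    cons = sum(np.mean((f(enc(X)) - f(enc(Xh))) ** 2)    # l_c summed over j
               for Xh in X_hats)
    return sup + beta * cons

X = rng.normal(size=(N, M))
Y = rng.normal(size=N)
X_tilde = rng.normal(size=(N, M))           # placeholder knockoff
X_hats = []
for j in range(M):
    Xh = X.copy()
    Xh[:, j] = X_tilde[:, j]
    X_hats.append(Xh)

loss = semi_supervised_core_loss(X, Y, X_hats, beta=0.5)
```

The consistency term plays the same role as the CORE term above: predictions should not change when a single dimension is conditionally resampled, which again discourages memorizing any individual \(X^j\).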