Representation Learning via Invariant Causal Mechanisms (2020, 47)

Contents

  1. Abstract
  2. Introduction
  3. Representation Learning via Invariant Causal Mechanisms
    1. Problem Setting
    2. Causal Interpretation
    3. ReLIC objective


0. Abstract

idea : self-supervised representation using a causal framework

  • show how data augmentations can be effectilvey utilized, through explicit invariance constraints

\(\rightarrow\) propose a novel self-supervised objective, ReLIC


ReLIC

= Representation Learning via Invariant Causal Mechanism

\(\rightarrow\) enforce invariant prediction of proxy targets … improved generalization!


1. Introduction

data generating process using a CAUSAL graph

& leverage causal tools to derive properties of the optimal representation


Representatin should be an invariant predictor of proxy targets

( not causally related to the downstream targets of interest )


(1) Use data augmentations to simulate a subset of possible interventions

(2) Propose a regularizer, which enforces that the prediction of the proxy targets is invariant across data uagmentations


Contributions

  • (1) formalize problem of self-supervised representation learning using causality

    & propose to more effectively learning data augmentations through invariant prediction

  • (2) propose ReLIC

    • enforces invariant prediction through explicit regularizer


2. Representation Learning via Invariant Causal Mechanisms

figure2


(1) Problem Setting

Notation

  • \(X\) : unlabelled observed data

  • \(\mathcal{Y}=\left\{Y_{t}\right\}_{t=1}^{T}\) : set of unknown tasks

    • \(Y_{t}\) : targets for task \(t\)

    • \(\left\{Y_{t}\right\}_{t=1}^{T}\) : multi-task setup


Goal

  • PRE-train with UNsupervised data a representation \(f(X)\) ,

    that will be useful for solving downstream tasks \(\mathcal{Y}\)


(2) Causal Interpretation

Assumptioin

  • (1) data is generated from CONTENT & STYLE variables
  • (2) only CONTENT being relevant for unknown downstream tasks
  • (3) CONTENT & STYLE are independent


Content

  • good representation of the data for downstream taks
  • goal of representation learning = estimating content


Notation

  • \(C\) : latent variable describing CONTENT
  • \(S\) : latent variable describing STYLE


Independence of mechanisms

  • intervention of \(S\) does not change \(P(Y_t \mid C)\)
  • that is…
    • \(p^{d o\left(S=s_{i}\right)}\left(Y_{t} \mid C\right)=p^{d o\left(S=s_{j}\right)}\left(Y_{t} \mid C\right) \quad \forall s_{i}, s_{j} \in \mathcal{S}\).


(1) Since the targets \(Y_t\) are unknown, construct a proxy task \(Y^T\) to learn representation

(2) To learn INVARIANT representation, enforce above equation!

  • since no access to \(S\) …. use content-preserving data augmentations
    • ex) rotation, gray-scaling, translation, cropping …


(3) ReLIC objective

goal : prediction of proxy targets from the representation is INVARIANT under data augmentations

  • invariant prediction criteria =

    \(p^{\mathrm{do}\left(a_{i}\right)}\left(Y^{R} \mid f(X)\right)=p^{\mathrm{do}\left(a_{j}\right)}\left(Y^{R} \mid f(X)\right) \quad \forall a_{i}, a_{j} \in \mathcal{A}\).

    • \(\mathcal{A}=\left\{a_{1}, \ldots, a_{m}\right\}\) is the set of data augmentations
  • enforce this through regularizer

figure2

Tags:

Categories:

Updated: