Handling Incomplete Heterogeneous Data using VAEs (2020)

Contents

  1. Abstract
  2. Introduction
  3. Problem Statement
  4. Generalizing VAEs for H & I data
    1. Handling Incomplete data
    2. Handling heterogeneous data
    3. Summary
  5. HI-VAE (Heterogeneous-Incomplete VAE)


0. Abstract

VAEs : efficient & accurate for capturing “latent structure”

Limitation of VAEs : cannot handle data that are…

  • 1) heterogeneous ( continuous + discrete )
  • 2) incomplete ( missing data )


Propose “HI-VAE”

  • suitable for fitting “incomplete & heterogeneous” data


1. Introduction

underlying latent structure

\(\rightarrow\) allow us to better understand the data


Deep generative models

  • highly flexible & expressive unsupervised methods
  • able to capture “latent structure” of “complex high-dim data”

  • ex) VAEs & GANs

\(\rightarrow\) current DGMs focus on “HOMOgeneous data” collections

( not much attention has been paid to “HETEROgeneous data” )


This paper :

  • present a general framework for VAEs that effectively incorporates incomplete data & heterogeneous observations


Features

  • 1) GENERATIVE model that handles mixed numerical & nominal data
  • 2) handles data that are MCAR (Missing Completely At Random)
  • 3) data-normalization input/output layer
  • 4) ELBO used to optimize the generative/recognition models


2. Problem Statement

HETEROgeneous dataset : \(N\) objects & \(D\) attributes

  • each object : \(\mathbf{x}_{n}=\left[x_{n 1}, \ldots, x_{n D}\right]\)


[ Data type ]

Numerical variables:

  1. Real-valued data … \(x_{n d} \in \mathbb{R}\).
  2. Positive real-valued data … \(x_{n d} \in \mathbb{R}^{+}\).
  3. (Discrete) count data … \(x_{n d} \in\{1, \ldots, \infty\}\).

Nominal variables:

  1. Categorical data … \(x_{n d} \in\{\) ‘blue’, ‘red’, ‘black’ \(\}\).
  2. Ordinal data … \(x_{n d} \in\{\) ‘never’, ‘sometimes’, ‘often’, ‘usually’, ‘always’ \(\}\).


[ Missingness ]

  • \(\mathcal{O}_{n}\) : index set of the observed attributes of object \(n\)
  • \(\mathcal{M}_{n}\) : index set of the missing attributes of object \(n\)
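
As a minimal illustration (not from the paper), these index sets can be recovered from a boolean mask; the array names here are hypothetical:

```python
import numpy as np

# Toy data: N = 3 objects, D = 4 attributes; np.nan marks a missing entry.
X = np.array([[1.0, 2.0, np.nan, 0.0],
              [np.nan, 5.0, 3.0, 1.0],
              [4.0, np.nan, np.nan, 2.0]])

mask = ~np.isnan(X)                    # mask[n, d] is True iff x_nd is observed
O = [np.where(m)[0] for m in mask]     # O_n : observed attribute indices
M = [np.where(~m)[0] for m in mask]    # M_n : missing attribute indices
print(O[0], M[0])                      # -> [0 1 3] [2]
```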


3. Generalizing VAEs for H & I data

( H data : Heterogeneous data )

( I data : Incomplete data )


(1) Handling Incomplete data

  • generator (decoder) & recognition (encoder)

  • factorization for the decoder : \(p\left(\mathbf{x}_{n}, \mathbf{z}_{n}\right)=p\left(\mathbf{z}_{n}\right) \prod_{d} p\left(x_{n d} \mid \mathbf{z}_{n}\right)\).

    • \(\mathbf{z}_{n} \in \mathbb{R}^{K}\) : latent vector
    • \(p\left(\mathbf{z}_{n}\right)=\mathcal{N}\left(\mathbf{z}_{n} \mid \mathbf{0}, \mathbf{I}_{K}\right)\).
    • makes it easy to marginalize out the missing attributes
  • likelihood : \(p\left(x_{n d} \mid \mathbf{z}_{n}\right)\)

    • parameters : \(\gamma_{n d}=\mathbf{h}_{d}\left(\mathbf{z}_{n}\right)\)

      ( \(\mathbf{h}_{d}\left(\mathbf{z}_{n}\right)\) : DNN that transforms the \(\mathbf{z}_{n}\) into parameters \(\gamma_{n d}\) )

  • factorization of likelihood ( see the sketch after this list ) :

    • missing & observed
    • \(p\left(\mathbf{x}_{n} \mid \mathbf{z}_{n}\right)=\prod_{d \in \mathcal{O}_{n}} p\left(x_{n d} \mid \mathbf{z}_{n}\right) \prod_{d \in \mathcal{M}_{n}} p\left(x_{n d} \mid \mathbf{z}_{n}\right)\).
  • recognition model : \(q\left(\mathbf{z}_{n}, \mathbf{x}_{n}^{m} \mid \mathbf{x}_{n}^{o}\right)\)
  • factorization of recognition model :

    • \(q\left(\mathbf{z}_{n}, \mathbf{x}_{n}^{m} \mid \mathbf{x}_{n}^{o}\right)=q\left(\mathbf{z}_{n} \mid \mathbf{x}_{n}^{o}\right) \prod_{d \in \mathcal{M}_{n}} p\left(x_{n d} \mid \mathbf{z}_{n}\right)\).
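
A minimal sketch of this factorized decoder, assuming Gaussian likelihoods for all attributes (the heterogeneous case comes in the next subsection) and hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class FactorizedDecoder(nn.Module):
    """p(x_n | z_n) = prod_d p(x_nd | z_n), with one DNN h_d per attribute."""
    def __init__(self, K, D):
        super().__init__()
        # h_d(z_n) -> gamma_nd = (mean, log variance) of a Gaussian likelihood
        self.h = nn.ModuleList([nn.Linear(K, 2) for _ in range(D)])

    def observed_log_lik(self, x, z, mask):
        # x : (N, D) with missing entries zero-filled; mask[n, d] = 1 if observed.
        # Marginalizing the missing attributes just drops their factors, so we
        # sum per-attribute log-likelihoods over the observed entries only.
        total = 0.0
        for d, h_d in enumerate(self.h):
            mu, log_var = h_d(z).chunk(2, dim=-1)
            std = log_var.squeeze(-1).mul(0.5).exp()
            dist = torch.distributions.Normal(mu.squeeze(-1), std)
            total = total + (dist.log_prob(x[:, d]) * mask[:, d]).sum()
        return total
```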


Recognition models (=encoder) for incomplete data

  • propose “input drop-out recognition distribution”
  • \(q\left(\mathbf{z}_{n} \mid \mathbf{x}_{n}^{o}\right)=\mathcal{N}\left(\mathbf{z}_{n} \mid \boldsymbol{\mu}_{q}\left(\tilde{\mathbf{x}}_{n}\right), \boldsymbol{\Sigma}_{q}\left(\tilde{\mathbf{x}}_{n}\right)\right)\).
    • \(\tilde{\mathbf{x}}_{n}\) : input in which the missing dimensions are replaced by zeros
    • \(\boldsymbol{\mu}_{q}\left(\tilde{\mathbf{x}}_{n}\right)\) and \(\boldsymbol{\Sigma}_{q}\left(\tilde{\mathbf{x}}_{n}\right)\) are parametrized DNNs with input \(\tilde{\mathbf{x}}_{n}\)
  • VAE imputation for incomplete data ( see the sketch below )
    • \(p\left(\mathbf{x}_{n}^{m} \mid \mathbf{x}_{n}^{o}\right) \approx \int p\left(\mathbf{x}_{n}^{m} \mid \mathbf{z}_{n}\right) q\left(\mathbf{z}_{n} \mid \mathbf{x}_{n}^{o}\right) d \mathbf{z}_{n}\).
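
Continuing the sketch above (`torch` and `nn` already imported), the input drop-out encoder might look like this; module names are assumptions:

```python
class InputDropoutEncoder(nn.Module):
    """q(z_n | x_n^o) = N(mu_q(x~_n), Sigma_q(x~_n)) with zero-filled input x~_n."""
    def __init__(self, D, K):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, 2 * K))

    def forward(self, x, mask):
        x_tilde = x * mask                    # input drop-out: zero out missing dims
        mu, log_var = self.net(x_tilde).chunk(2, dim=-1)
        return mu, log_var

# Imputation: approximate p(x^m | x^o) by decoding samples z ~ q(z | x^o)
# and averaging the resulting p(x^m | z), e.g. over a handful of samples.
```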


(2) Handling heterogeneous data

  • choose an appropriate likelihood function for each attribute, depending on its data type ( see the sketch after this list )
  1. Real-valued : \(p\left(x_{n d} \mid \gamma_{n d}\right)=\mathcal{N}\left(x_{n d} \mid \mu_{d}\left(\mathbf{z}_{n}\right), \sigma_{d}^{2}\left(\mathbf{z}_{n}\right)\right)\)

  2. Positive real-valued : \(p\left(x_{n d} \mid \gamma_{n d}\right)=\log \mathcal{N}\left(x_{n d} \mid \mu_{d}\left(\mathbf{z}_{n}\right), \sigma_{d}^{2}\left(\mathbf{z}_{n}\right)\right)\)

  3. Count : \(p\left(x_{n d} \mid \gamma_{n d}\right)=\operatorname{Poiss}\left(x_{n d} \mid \lambda_{d}\left(\mathbf{z}_{n}\right)\right)=\frac{\left(\lambda_{d}\left(\mathbf{z}_{n}\right)\right)^{x_{n d}} \exp \left(-\lambda_{d}\left(\mathbf{z}_{n}\right)\right)}{x_{n d} !}\)

  4. Categorical : \(p\left(x_{n d}=r \mid \gamma_{n d}\right)=\frac{\exp \left(-h_{d r}\left(\mathbf{z}_{n}\right)\right)}{\sum_{q=1}^{R} \exp \left(-h_{d q}\left(\mathbf{z}_{n}\right)\right)}\).

  5. Ordinal : \(p\left(x_{n d}=r \mid \gamma_{n d}\right)=p\left(x_{n d} \leq r \mid \gamma_{n d}\right)-p\left(x_{n d} \leq r-1 \mid \gamma_{n d}\right)\)

    with \(p\left(x_{n d} \leq r \mid \mathbf{z}_{n}\right)=\frac{1}{1+\exp \left(-\left(\theta_{r}\left(\mathbf{z}_{n}\right)-h_{d}\left(\mathbf{z}_{n}\right)\right)\right)}\)
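
A hedged sketch of these five likelihood heads using `torch.distributions`; the function and parameter names below are assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F
from torch import distributions as td

def likelihood(data_type, params):
    """Map decoder outputs `params` to p(x_nd | gamma_nd) for one attribute."""
    if data_type == "real":                         # 1. Gaussian on R
        mu, log_var = params
        return td.Normal(mu, log_var.mul(0.5).exp())
    if data_type == "positive":                     # 2. log-normal on R+
        mu, log_var = params
        return td.LogNormal(mu, log_var.mul(0.5).exp())
    if data_type == "count":                        # 3. Poisson, rate > 0
        return td.Poisson(F.softplus(params))
    if data_type == "categorical":                  # 4. softmax over R classes
        return td.Categorical(logits=params)
    raise ValueError(data_type)

def ordinal_log_prob(r, thetas, h):
    """5. Ordinal: p(x = r) = p(x <= r) - p(x <= r-1), cumulative-logit model.

    thetas : (N, R-1) sorted thresholds theta_r(z_n); h : (N,) location h_d(z_n).
    """
    cdf = torch.sigmoid(thetas - h.unsqueeze(-1))   # p(x <= r) for r = 0..R-2
    cdf = F.pad(cdf, (1, 0), value=0.0)             # p(x <= -1) = 0
    cdf = F.pad(cdf, (0, 1), value=1.0)             # p(x <= R-1) = 1
    return torch.log(cdf[..., r + 1] - cdf[..., r] + 1e-8)
```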


(3) Summary

( Figure 2 )


4. HI-VAE (Heterogeneous-Incomplete VAE)

a generative model that fully factorizes over every heterogeneous dimension of the data needs a separate DNN \(\mathbf{h}_{d}\) per attribute

\(\rightarrow\) loses the parameter sharing of AMORTIZED generative models ; HI-VAE recovers it with the design choices below


( Figure 2 )


(1) propose Gaussian Mixture prior

  • \(p\left(\mathbf{s}_{n}\right) =\text { Categorical }\left(\mathbf{s}_{n} \mid \boldsymbol{\pi}\right)\).
  • \(p\left(\mathbf{z}_{n} \mid \mathbf{s}_{n}\right) =\mathcal{N}\left(\mathbf{z}_{n} \mid \boldsymbol{\mu}_{p}\left(\mathbf{s}_{n}\right), \mathbf{I}_{K}\right)\).
    • \(\mathbf{s}_{n}\) is a one-hot encoded vector representing the component in the mixture
    • assume \(\pi_{\ell}=1 / L\) ( uniform )
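
A minimal sketch of ancestral sampling from this mixture prior (dimensions and names are assumptions):

```python
import torch
import torch.nn.functional as F

L, K, N = 10, 5, 32                        # mixture size, latent dim, batch size
mu_p = torch.nn.Linear(L, K, bias=False)   # mu_p(s_n): learned mean per component

s = F.one_hot(torch.randint(L, (N,)), L).float()  # s_n ~ Categorical(pi), pi_l = 1/L
z = mu_p(s) + torch.randn(N, K)                   # z_n ~ N(mu_p(s_n), I_K)
```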


(2) propose hierarchical structure

  • allows different attributes to share network parameters

    ( amortize generative model )


(3) introduce an “intermediate homogeneous representation” of the data

  • \(\mathbf{Y}=\left[\mathbf{y}_{n 1}, \ldots, \mathbf{y}_{n D}\right]\).
  • jointly generated by a single DNN \(\mathbf{g}\left(\mathbf{z}_{n}\right)\) with input \(\mathbf{z}_{n}\)


(4) likelihood params of each attribute \(d\) = output of DNN \(\mathbf{h}_{d}\) ( see the sketch below )

  • \(p\left(x_{n d} \mid \gamma_{n d}=\mathbf{h}_{d}\left(\mathbf{y}_{n d}, \mathbf{s}_{n}\right)\right)\).
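
Putting (2)–(4) together, a sketch of the hierarchical decoder: one shared DNN \(\mathbf{g}\) jointly emits the homogeneous representation \(\mathbf{Y}\), and a small per-attribute head \(\mathbf{h}_{d}\) maps \(\left(\mathbf{y}_{n d}, \mathbf{s}_{n}\right)\) to likelihood parameters. Sizes and names are assumptions:

```python
import torch
import torch.nn as nn

class HIVAEDecoder(nn.Module):
    def __init__(self, K, L, D, y_dim=8, param_dim=2):
        super().__init__()
        # shared g(z_n) jointly generates Y = [y_n1, ..., y_nD]
        self.g = nn.Sequential(nn.Linear(K, 64), nn.ReLU(), nn.Linear(64, D * y_dim))
        # per-attribute heads h_d(y_nd, s_n) -> likelihood params gamma_nd
        self.h = nn.ModuleList([nn.Linear(y_dim + L, param_dim) for _ in range(D)])
        self.D, self.y_dim = D, y_dim

    def forward(self, z, s):
        Y = self.g(z).view(-1, self.D, self.y_dim)
        return [h_d(torch.cat([Y[:, d], s], dim=-1)) for d, h_d in enumerate(self.h)]
```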


(5) Recognition model : \(q\left(\mathbf{s}_{n} \mid \mathbf{x}_{n}^{o}\right)\)

  • categorical distribution with parameter vector \(\boldsymbol{\pi}\)

  • \(\boldsymbol{\pi}\) is the output of a DNN with input \(\tilde{\mathbf{x}}_{n}\), followed by a softmax

  • factorization

    \(q\left(\mathbf{s}_{n}, \mathbf{z}_{n}, \mathbf{x}_{n}^{m} \mid \mathbf{x}_{n}^{o}\right)=q\left(\mathbf{s}_{n} \mid \mathbf{x}_{n}^{o}\right) q\left(\mathbf{z}_{n} \mid \mathbf{x}_{n}^{o}, \mathbf{s}_{n}\right) \prod_{d \in \mathcal{M}_{n}} p\left(x_{n d} \mid \mathbf{z}_{n}, \mathbf{s}_{n}\right)\).
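
A minimal sketch of this recognition network, reusing the zero-filled input \(\tilde{\mathbf{x}}_{n}\) from before; sampling \(\mathbf{s}_{n}\) with a Gumbel-softmax relaxation is one common choice here, not necessarily the paper's exact estimator:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HIVAEEncoder(nn.Module):
    def __init__(self, D, L, K):
        super().__init__()
        self.s_net = nn.Linear(D, L)            # logits of q(s_n | x_n^o)
        self.z_net = nn.Linear(D + L, 2 * K)    # q(z_n | x_n^o, s_n)

    def forward(self, x, mask):
        x_tilde = x * mask                      # zero-filled input, as before
        pi_logits = self.s_net(x_tilde)         # softmax(pi_logits) = q(s_n | x_n^o)
        s = F.gumbel_softmax(pi_logits, hard=True)   # differentiable one-hot sample
        mu, log_var = self.z_net(torch.cat([x_tilde, s], dim=-1)).chunk(2, dim=-1)
        return pi_logits, s, mu, log_var
```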
