[Paper Review] 05. Reliable Fidelity and Diversity Metrics for Generative Models
Contents
- Abstract
- Introduction
- Background
- Evaluation pipeline
- Fidelity and Diversity
- Density and Coverage
- Problems with improved precision & recall
- Density & Coverage
0. Abstract
problems with the FID score
- does not differentiate the fidelity and diversity aspects of generated images
- recent papers have introduced variants… but these are still not reliable
\(\rightarrow\) propose DENSITY and COVERAGE metrics!
1. Introduction
necessary conditions for useful evaluation metrics :
- (1) ability to detect identical real and fake distributions
- (2) robustness to outlier samples
- (3) responsiveness to mode dropping
- (4) the ease of hyperparameter selection in the evaluation algorithms
propose density and coverage metrics
- not only make the fidelity-diversity metrics empirically reliable
- but also make them theoretically analysable
study the embedding algorithms for evaluating image generation algorithms
- the embedding pipeline has been relatively less studied
- mostly rely on the features from an ImageNet pretrained model
  \(\rightarrow\) inevitably include the dataset bias
- SOLUTION : to exclude the dataset bias, use randomly initialised CNN feature extractors ( see the sketch below )
- random embeddings : more sensible evaluation results, especially when the target data distribution is significantly different from ImageNet statistics
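A minimal sketch of such a randomly initialised feature extractor, assuming PyTorch / torchvision ( \(\geq\) 0.13 for the `weights=None` API ); the VGG16 backbone and global average pooling are illustrative choices of mine, not fixed by the paper :

```python
import torch
import torchvision.models as models

torch.manual_seed(0)  # fix the random embedding so evaluations are repeatable
# weights=None -> randomly initialised, no ImageNet pretraining ( no dataset bias )
cnn = models.vgg16(weights=None).features.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (B, 3, H, W) into R^D with a random CNN."""
    feats = cnn(images)            # (B, 512, h, w) conv feature maps
    return feats.mean(dim=(2, 3))  # global average pool -> (B, 512)
```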
2. Background
- real distribution : \(P(X)\)
- generative model : \(Q(Y)\)
\(\rightarrow\) assume that we can sample \(\left\{X_{i}\right\}\) and \(\left\{Y_{j}\right\}\)
Statistical testing methods ( or distributional distance measures )
- ex) KL-divergence, Expected Likelihood
when \(P(X)\) is complex & high-dim \(\rightarrow\) difficult to apply such measures naively
(1) Evaluation pipeline
step 1) embed real & fake samples \(\left\{X_{i}\right\}\) and \(\left\{Y_{j}\right\}\) into Euclidean space \(\mathbb{R}^{D}\)
- using non-linear mapping \(f\) ( ex. CNN feature extractor )
step 2) construct real & fake distn over \(\mathbb{R}^{D}\) with the embedded samples \(\left\{f\left(X_{i}\right)\right\}\) , \(\left\{f\left(Y_{j}\right)\right\}\)
step 3) quantify discrepancy between the two distributions
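These three steps can be read as a simple skeleton; a sketch, where `embed` and `metric` are supplied by the user ( the function names are mine ) :

```python
def evaluate(real_samples, fake_samples, embed, metric):
    """Three-step evaluation pipeline sketch.

    embed  : non-linear mapping f into R^D ( e.g. a CNN feature extractor )
    metric : discrepancy measure between the two embedded sample sets
    """
    real_feats = embed(real_samples)  # step 1 : {X_i} -> {f(X_i)}
    fake_feats = embed(fake_samples)  # step 1 : {Y_j} -> {f(Y_j)}
    # steps 2 & 3 : build distributions over R^D and quantify their discrepancy
    return metric(real_feats, fake_feats)
```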
[step 1] embedding
difficult to define a sensible metric
- ex) \(\ell_{2}\) distance over the image pixels \(\left\|X_{i}-Y_{j}\right\|_{2}\) ( often misleading )
- ex) \(\ell_{2}\) distance in the feature space \(\left\|f\left(X_{i}\right)-f\left(Y_{j}\right)\right\|_{2}\)
for embedding… adopt ImageNet pre-trained CNNs
- when using a data distn DISTINCT from the ImageNet distribution… suggest using randomly-initialised CNN feature extractors
[step 2] Building & Comparing distributions
- Given embedded samples \(\left\{f\left(X_{i}\right)\right\}\) and \(\left\{f\left(Y_{j}\right)\right\}\)
- (non-)parametric statistical approximation
- Parzen window estimates
- approximate the likelihoods of the fake samples \(\left\{Y_{j}\right\}\) by estimating the density \(P(X)\) with Gaussian kernels around the real samples \(\left\{X_{i}\right\} .\)
- Inception Score
- estimate the multinomial distribution \(P\left(T \mid Y_{j}\right)\) over 1000 ImageNet classes
- compare it against the estimated marginalized distn \(P(T)\), using KL-divergence
- FID
- Fréchet distance between 2 Gaussians fitted to the embedded real & fake samples ( see the sketch after this list )
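For the FID item above, a minimal numpy / scipy sketch of the Fréchet distance between two Gaussians fitted to the embedded samples ( a standard formulation, not code from the paper ) :

```python
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to (N, D) / (M, D) embeddings."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)  # matrix square root
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(np.sum((mu_r - mu_f) ** 2)
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```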
(2) Fidelity and Diversity
Trade-off between Fidelity and Diversity
Fidelity
- how realistic each fake sample is
Diversity
- how well fake samples capture the variations in real samples
3. Density and Coverage
Introduce variants of the two-value ( fidelity & diversity ) metrics : Density & Coverage
(1) Problems with improved precision & recall
( Reference : https://seunghan96.github.io/gan/(gan4)Improved-Precision-and-Recall-Metric-for-Assessing-Generative-Models/ )
improved precision & recall use KNN to estimate the real & fake manifolds
- vulnerability to outliers & computational inefficiency
- generally overestimate the true manifold around outliers ( minimal sketch below )
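A minimal numpy sketch of the KNN-based estimation behind improved precision & recall ( my own simplification, not the reference implementation ); note that a single real outlier gets a huge k-NN ball and inflates the estimated manifold, and that the brute-force pairwise distances illustrate the computational cost :

```python
import numpy as np

def knn_radii(feats: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour ( excluding itself )."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]  # column 0 is the point itself ( distance 0 )

def precision_recall(real: np.ndarray, fake: np.ndarray, k: int = 3):
    """Improved precision & recall via KNN manifold estimation."""
    d = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)  # (N, M)
    # a sample lies on the manifold if it falls inside ANY k-NN ball;
    # an outlier's oversized ball therefore overestimates the manifold
    precision = (d < knn_radii(real, k)[:, None]).any(axis=0).mean()
    recall = (d < knn_radii(fake, k)[None, :]).any(axis=1).mean()
    return precision, recall
```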
(2) Density & Coverage
- Source : lecture materials from the Generative Adversarial Networks course by Prof. Youngjung Uh ( Yonsei University )
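A minimal numpy sketch of Density & Coverage following the paper's definitions : Density counts, for every fake sample, how many real-sample k-NN balls contain it ( normalised by \(kM\) ), while Coverage is the fraction of real-sample balls that contain at least one fake sample :

```python
import numpy as np

def density_coverage(real: np.ndarray, fake: np.ndarray, k: int = 5):
    """Density & Coverage from embedded samples real (N, D), fake (M, D)."""
    # radius of the k-NN ball around each real sample ( excluding itself )
    dr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    radii = np.sort(dr, axis=1)[:, k]                 # column 0 is the point itself
    d = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)  # (N, M)
    inside = d < radii[:, None]                       # fake j inside ball of real i?
    density = inside.sum() / (k * fake.shape[0])      # counts ALL overlaps, not binary membership
    coverage = inside.any(axis=1).mean()              # fraction of real balls hit by a fake
    return float(density), float(coverage)
```

For identical real & fake distributions, Coverage approaches 1 and Density is around 1 in expectation ( Density is not capped at 1, since overlapping balls are counted multiple times ), which is what makes these variants more robust to overestimated manifolds around outliers.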