Accurate Uncertainties for Deep Learning Using Calibrated Regression
Contents
- Abstract
- Introduction
- Calibrated Classification
- Calibration
- Training Calibrated Classifiers
- Calibrated Regression
- Calibration
- Training Calibrated Regression Models
- Recalibrating Bayesian models
- Features for Recalibration
- Diagnostic Tools
0. Abstract
Goal : account for uncertainty!
- ex) Bayesian methods
- However, Bayesian methods have a drawback : because they rely on approximate inference, their uncertainty estimates are often inaccurate
This paper proposes a simple way to calibrate any regression algorithm.
1. Introduction
This paper addresses uncertainty estimation over CONTINUOUS variables.
Problem with existing Bayesian methods : (see the figure below)
- (top) [ Bayesian ] the interval fails to capture the true distribution
- (bottom) [ proposed method ] the recalibrated method
- the 90% C.I. covers exactly 9/10 of the points.
One-line summary :
propose a new procedure for RECALIBRATING any REGRESSION algorithm, that is inspired by Platt Scaling for classification
Contributions
- 1) a simple technique for RE-calibrating the output of any REGRESSION algorithm
( an extension of Platt scaling, which was designed for classification )
- 2) this technique is used to address the miscalibration of BNNs
( BNN = Bayesian Neural Network )
2. Calibrated Classification
Notation
- labeled dataset \((x_{t}, y_{t}) \in \mathcal{X} \times \mathcal{Y}\) for \(t=1,2, \ldots, T\)
- \(X, Y \sim \mathbb{P}\), where \(\mathbb{P}\) is the data distribution
- forecaster \(H: \mathcal{X} \rightarrow(\mathcal{Y} \rightarrow[0,1])\)
- outputs a probability distribution \(F_{t}(y)\) targeting the label \(y_{t}\)
( if \(Y\) is continuous, \(F_{t}\) is a CDF )
2-1) Calibration
ex) binary classification
- the case \(\mathcal{Y}=\{0,1\}\)
- \(H\) is calibrated \(\leftrightarrow\) \(\frac{\sum_{t=1}^{T} y_{t} \mathbb{I}\left\{H\left(x_{t}\right)=p\right\}}{\sum_{t=1}^{T} \mathbb{I}\left\{H\left(x_{t}\right)=p\right\}} \rightarrow p \text { for all } p \in[0,1]\).
Sufficient condition for calibration
- \(\mathbb{P}(Y=1 \mid H(X)=p)=p \text { for all } p \in[0,1]\).
Calibration & Sharpness
- both matter!
- sharp = predicted probabilities should be close to 0 or 1
2-2) Training Calibrated Classifiers
(1) Estimating a probability distribution
Calibrated Classifier : \(R \circ H\)
- the goal is to estimate \(R(p)=\mathbb{P}(Y=1 \mid H(X)=p)\) well
ex) Platt scaling
- approximate \(R(p)=\mathbb{P}(Y=1 \mid H(X)=p)\) with sigmoid
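As a rough illustration of the idea, here is a minimal sketch of Platt scaling that fits \(R(s)=\sigma(a s+b)\) on held-out scores by plain gradient descent on the log-loss. The function name `fit_platt`, the toy scores, the learning rate and the step count are all assumptions made for this example, not taken from the paper.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit R(s) = sigmoid(a*s + b) by gradient descent on the mean log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            q = 1.0 / (1.0 + math.exp(-(a * s + b)))  # current R(s)
            grad_a += (q - y) * s
            grad_b += (q - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n

    def R(s):
        return 1.0 / (1.0 + math.exp(-(a * s + b)))
    return R, (a, b)

# Hypothetical held-out scores s_t = H(x_t) and binary labels y_t.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

R, (a, b) = fit_platt(scores, labels)
print(f"a={a:.3f}, b={b:.3f}, R(-2)={R(-2.0):.3f}, R(2)={R(2.0):.3f}")
```

The calibrated classifier is then \(R \circ H\) : the raw output is passed through the fitted sigmoid.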
(2) Projections & Features
Base classifier \(H\) : \(H: \mathcal{X} \rightarrow \Phi\)
- output features \(\phi \in \Phi \subseteq \mathbb{R}^{d}\) that do not correspond to probabilities
- fit \(R: \Phi \rightarrow[0,1]\) to estimate \(\mathbb{P}(Y=1 \mid H(X)=\phi)\)
(3) Diagnostic Tools
- use a calibration curve ( see the figure below )
- group \(p_t\) into intervals \(I_j\) ( for \(j=1,...,m\) ) , which are partitions of [0,1]
- calibration curve plots the predictive average \(p_j\) in each interval \(I_j\)
- \(p_{j}=T_{j}^{-1} \sum_{t: p_{t} \in I_{j}} p_{t}\).
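The binning above can be sketched in a few lines. This example assumes equal-width intervals \(I_j\) and also reports the observed frequency of \(y_t=1\) in each bin, which is the quantity the predicted average \(p_j\) is usually compared against; the function name and the toy data are hypothetical.

```python
def calibration_curve(probs, labels, m=10):
    """Return (predicted average p_j, observed frequency, T_j) per non-empty bin."""
    sums_p = [0.0] * m
    sums_y = [0.0] * m
    counts = [0] * m
    for p, y in zip(probs, labels):
        j = min(int(p * m), m - 1)  # equal-width bins; p = 1.0 goes to the last bin
        sums_p[j] += p
        sums_y[j] += y
        counts[j] += 1
    return [(sums_p[j] / counts[j], sums_y[j] / counts[j], counts[j])
            for j in range(m) if counts[j] > 0]

# Hypothetical predictions p_t and labels y_t.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9]
labels = [0, 0, 0, 1, 1, 1, 1, 1, 0]

curve = calibration_curve(probs, labels, m=5)
for pred, obs, count in curve:
    print(f"predicted {pred:.2f}  observed {obs:.2f}  (T_j = {count})")
```

For a calibrated classifier the predicted and observed columns should roughly agree in every bin.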
3. Calibrated Regression
In regression,
- forecaster \(H\) outputs at each step \(t\) a CDF \(F_t\), targeting \(y_t\)
- quantile function : \(F_{t}^{-1}(p)=\inf \left\{y: p \leq F_{t}(y)\right\}\)
- \(F_{t}^{-1}:[0,1] \rightarrow \mathcal{Y}\) .
3-1) Calibration
What calibration means in regression
- \(y_t\) should fall within the 90% C.I. about 90% of the time
- \(\frac{\sum_{t=1}^{T} \mathbb{I}\left\{y_{t} \leq F_{t}^{-1}(p)\right\}}{T} \rightarrow p \text { for all } p \in[0,1]\).
Sufficient condition : \(\mathbb{P}\left(Y \leq F_{X}^{-1}(p)\right)=p \text { for all } p \in[0,1]\)
- forecaster : \(F_{X}=H(X)\)
Equivalently, in terms of intervals :
- \(\frac{\sum_{t=1}^{T} \mathbb{I}\left\{F_{t}^{-1}\left(p_{1}\right) \leq y_{t} \leq F_{t}^{-1}\left(p_{2}\right)\right\}}{T} \rightarrow p_{2}-p_{1}\) for all \(0 \leq p_{1}<p_{2} \leq 1\).
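The interval form of the definition is easy to check empirically. The sketch below assumes Gaussian forecasts (using `statistics.NormalDist` from the standard library) and synthetic data drawn from \(\mathcal{N}(0,1)\); both forecasters and the helper name `empirical_coverage` are made up for illustration.

```python
import random
from statistics import NormalDist

def empirical_coverage(forecasts, ys, p_lo, p_hi):
    """Fraction of y_t that fall inside [F_t^{-1}(p_lo), F_t^{-1}(p_hi)]."""
    hits = sum(1 for F, y in zip(forecasts, ys)
               if F.inv_cdf(p_lo) <= y <= F.inv_cdf(p_hi))
    return hits / len(ys)

rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(5000)]    # synthetic targets ~ N(0, 1)

well_specified = [NormalDist(0.0, 1.0)] * len(ys)  # matches the data
overconfident = [NormalDist(0.0, 0.5)] * len(ys)   # predicted std is too small

cov_good = empirical_coverage(well_specified, ys, 0.05, 0.95)
cov_bad = empirical_coverage(overconfident, ys, 0.05, 0.95)
print(f"90% interval coverage: well specified {cov_good:.3f}, overconfident {cov_bad:.3f}")
```

A calibrated forecaster should land near \(p_2-p_1=0.9\); the overconfident one covers far fewer points.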
3-2) Training Calibrated Regression Models
A simple recalibration scheme is proposed!
- pre-trained forecaster : \(H\)
- auxiliary model : \(R\) : \([0,1] \rightarrow [0,1]\)
- CALIBRATED model : \(R \circ F_{t}\)
Algorithm
Estimating a probability distribution
- perfectly calibrated forecaster :
\(R \circ F_{t}\), where \(R(p):=\mathbb{P}\left(Y \leq F_{X}^{-1}(p)\right)\).
- formulate recalibration as the problem of estimating the CDF above
Example
\(p=95 \%\), but only \(80 / 100\) observed \(y_{t}\) fall below the \(95 \%\) quantile of \(F_{t}\)
\(\rightarrow\) Adjust the \(95 \%\) quantile to \(80 \%\)
- learn \(\mathbb{P}\left(Y \leq F_{X}^{-1}(p)\right)\) by fitting any regression algorithm ( isotonic regression is recommended )
- training data : \(\left\{\left(F_{t}\left(y_{t}\right), \hat{P}\left(F_{t}\left(y_{t}\right)\right)\right)\right\}_{t=1}^{T}\)
- $$\hat{P}(p)=\frac{\left|\left\{y_{t} \mid F_{t}\left(y_{t}\right) \leq p, t=1, \ldots, T\right\}\right|}{T}$$.
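The recalibration procedure can be sketched as follows. The targets \(\hat{P}(F_t(y_t))\) are already non-decreasing in \(F_t(y_t)\), so an isotonic fit leaves them unchanged at the training points; this sketch therefore simply interpolates linearly between them instead of calling an isotonic regression library. That simplification, the synthetic data, and the names `fit_recalibrator`, `calib_y`, `test_y` are assumptions for illustration.

```python
import bisect
import random
from statistics import NormalDist

def fit_recalibrator(pit_values):
    """Fit R from the values p_t = F_t(y_t) computed on a calibration set."""
    xs = sorted(pit_values)
    T = len(xs)
    # hat P(p_t) = |{s : F_s(y_s) <= p_t}| / T
    ys = [bisect.bisect_right(xs, x) / T for x in xs]

    def R(p):
        if p <= xs[0]:
            return ys[0]
        if p >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, p)  # first index with xs[i] > p
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (p - x0) / (x1 - x0)  # linear interpolation
    return R

rng = random.Random(0)
F = NormalDist(0.0, 0.5)                               # overconfident forecast
calib_y = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # true data ~ N(0, 1)
test_y = [rng.gauss(0.0, 1.0) for _ in range(5000)]

R = fit_recalibrator([F.cdf(y) for y in calib_y])

p = 0.95
before = sum(F.cdf(y) <= p for y in test_y) / len(test_y)
after = sum(R(F.cdf(y)) <= p for y in test_y) / len(test_y)
print(f"R(0.95) = {R(p):.3f}")
print(f"fraction below the 95% level: before {before:.3f}, after {after:.3f}")
```

In this synthetic setup \(R\) maps the nominal 95% level to roughly 80%, mirroring the example above, and the recalibrated forecast \(R \circ F_t\) places about 95% of held-out points below its 95% level.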
3-3) Recalibrating Bayesian models
Probabilistic forecasts \(F_{t}\) : obtained from a BNN, a GP, etc.
- model : \(\mathcal{N}\left(\mu\left(x_{t}\right), \sigma^{2}\left(x_{t}\right)\right)\).
- ex) MCDO ( Monte Carlo dropout )
However, if the true data distribution \(\mathbb{P}(Y \mid X)\) is not Gaussian…
\(\rightarrow\) the uncertainty estimates will not be well calibrated
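To make the Gaussian forecast concrete, here is a small sketch that turns several stochastic predictions for a single input into \(F_t=\mathcal{N}(\mu, \sigma^2)\). Using the sample mean and sample standard deviation of the passes is an assumption of this example, and the numbers are invented; a real model may estimate \(\sigma\) differently.

```python
from statistics import NormalDist, mean, stdev

def gaussian_forecast(pass_outputs):
    """Summarize stochastic predictions for one input x_t as N(mu, sigma^2)."""
    return NormalDist(mean(pass_outputs), stdev(pass_outputs))

# Hypothetical outputs of 5 stochastic forward passes for one input.
pass_outputs = [1.8, 2.1, 2.0, 1.9, 2.2]
F_t = gaussian_forecast(pass_outputs)

print(F_t.mean, F_t.stdev)
print(F_t.cdf(2.3))       # F_t(y) : predicted probability that Y <= 2.3
print(F_t.inv_cdf(0.95))  # F_t^{-1}(0.95) : predicted 95% quantile
```

The resulting CDF and quantile function are exactly the objects that the recalibration step takes as input.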
3-4) Features for Recalibration
Recalibration can be carried out using [Algorithm 1].
This generalizes to any increasing function \(F(y): \mathcal{Y} \rightarrow \Phi\), where \(\Phi \subseteq \mathbb{R}\) defines a “feature” that correlates with the confidence of the classifier.
ex) distance from mean prediction ( \(\phi \in \Phi\) )
- cdf : \(\mathbb{P}\left(Y \leq F_{X}^{-1}(\phi)\right)\)
- \([H(x)](y)=F_{x}(y)=y-\mu(x)\).
ex 2) also taking the predicted uncertainty into account
- \(F_{x}(y)=(y-\mu(x)) / \sigma(x)\).
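A minimal sketch of feature-based recalibration, using the standardized residual from ex 2 as the feature. Here \(R\) is taken to be the plain empirical CDF of the calibration features (a step function) rather than a fitted isotonic model, which is a simplification; the calibration triples and function names are hypothetical.

```python
import bisect

def standardized_feature(y, mu, sigma):
    """Feature F_x(y) = (y - mu(x)) / sigma(x); increasing in y."""
    return (y - mu) / sigma

def fit_feature_recalibrator(features):
    """R(phi) = fraction of calibration features that are <= phi."""
    xs = sorted(features)
    T = len(xs)

    def R(phi):
        return bisect.bisect_right(xs, phi) / T
    return R

# Hypothetical calibration triples (y_t, mu(x_t), sigma(x_t)).
calib = [(1.0, 0.5, 0.5), (2.0, 2.5, 1.0), (0.0, 0.0, 2.0), (3.0, 1.0, 1.0)]
feats = [standardized_feature(y, mu, sigma) for y, mu, sigma in calib]

R = fit_feature_recalibrator(feats)
print(feats)
print(R(-1.0), R(0.0), R(5.0))
```

Because the feature is increasing in \(y\), \(R(F_x(y))\) is again a valid CDF over \(y\) for each input.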
3-5) Diagnostic Tools
(a) Calibration
\(\operatorname{cal}\left(F_{1}, y_{1}, \ldots, F_{T}, y_{T}\right)=\sum_{j=1}^{m} w_{j} \cdot\left(p_{j}-\hat{p}_{j}\right)^{2}\).
- $$\hat{p}_{j}=\frac{\left|\left\{y_{t} \mid F_{t}\left(y_{t}\right) \leq p_{j}, t=1, \ldots, T\right\}\right|}{T}$$.
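The calibration error can be computed directly from the values \(F_t(y_t)\). This sketch assumes uniform weights \(w_j=1\) when none are given and uses made-up confidence levels \(p_j\); the function name is hypothetical.

```python
def calibration_error(pit_values, levels, weights=None):
    """cal = sum_j w_j * (p_j - hat p_j)^2, with hat p_j the empirical frequency."""
    T = len(pit_values)
    if weights is None:
        weights = [1.0] * len(levels)  # assumption: uniform weights
    err = 0.0
    for p_j, w_j in zip(levels, weights):
        p_hat = sum(1 for v in pit_values if v <= p_j) / T
        err += w_j * (p_j - p_hat) ** 2
    return err

levels = [0.2, 0.4, 0.6, 0.8]

# Values F_t(y_t) spread evenly over [0, 1] : every level is matched.
cal_even = calibration_error([0.1, 0.3, 0.5, 0.7, 0.9], levels)

# Every y_t sits at the forecast's 90% quantile : no level is matched.
cal_skewed = calibration_error([0.9] * 5, levels)

print(cal_even, cal_skewed)
```

A score of zero means the empirical frequencies match the chosen levels exactly; larger values indicate worse calibration.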
(b) Sharpness
- \(\operatorname{sha}\left(F_{1}, \ldots, F_{T}\right)=\frac{1}{T} \sum_{t=1}^{T} \operatorname{var}\left(F_{t}\right)\).
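Sharpness is just the average predictive variance. A short sketch, assuming Gaussian forecasts represented with `statistics.NormalDist`; the two forecasts are invented for illustration.

```python
from statistics import NormalDist

def sharpness(forecasts):
    """sha = (1/T) * sum_t var(F_t); lower means sharper forecasts."""
    return sum(F.variance for F in forecasts) / len(forecasts)

# Hypothetical forecasts with standard deviations 1 and 2.
forecasts = [NormalDist(0.0, 1.0), NormalDist(0.0, 2.0)]
sha = sharpness(forecasts)
print(sha)
```

Calibration and sharpness are reported together: a forecaster can be calibrated while having very wide, uninformative intervals.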