Neural ODE

(1) Introduction

https://arxiv.org/abs/1806.07366

  • NeurIPS 2018 Best Paper award


Contribution: Discrete layer \(\rightarrow\) Continuous layer


(2) Differentiation & Integration

  • Differentiation = slope / rate of change / the derivative \(\frac{dy}{dx}\)

  • Integration = the inverse of differentiation ( = function approximation: recovering a function from its derivative )

Differential Equation (DE)

  • An equation that "contains the derivative" of \(y\) with respect to \(x\)

Two kinds of DE

  • (1) ODE (Ordinary DE): univariate (one independent variable)
    • \(f^{\prime}(x)-2 x=0 \Leftrightarrow \frac{d f}{d x}-2 x=0 \Leftrightarrow y^{\prime}-2 x=0\) ( solved below )
  • (2) PDE (Partial DE): multivariate (several independent variables)
    • \(\frac{\partial f(x, z)}{\partial x}+\frac{\partial f(x, z)}{\partial z}=0\).
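
For instance, the ODE above can be solved by direct integration: rewriting it as \(y^{\prime}=2x\) and integrating both sides gives \(y=x^{2}+C\), a whole family of functions rather than a single value.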


(3) ODE

Solving an ODE = function approximation


Equation vs. ODE

  • Solving an equation = finding the solution ( = a value )
  • Solving an ODE = finding the solution ( = a function )


How do we solve an ODE? = Integration

  • ex) Euler method

(4) Euler method

How? By adding up countless small steps

Two things are needed ( see the sketch below ):

  • (1) Initial state
  • (2) ODE
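
A minimal Euler-method sketch in Python, assuming the toy ODE \(y^{\prime}=2x\) from section (2) with initial state \(y(0)=0\) (the exact solution is \(y=x^{2}\)):

```python
# Euler method: repeatedly add h * (slope), starting from the initial state.
# Toy ODE: dy/dx = 2x with y(0) = 0; the exact solution is y = x^2.

def dydx(x, y):
    """The ODE: dy/dx = 2x (y is unused here, but a general ODE may depend on it)."""
    return 2 * x

def euler(x_end, y0=0.0, h=0.01):
    """Approximate y(x_end) with n = x_end / h Euler steps."""
    n = int(round(x_end / h))
    x, y = 0.0, y0
    for _ in range(n):
        y += h * dydx(x, y)  # one Euler step
        x += h
    return y

print(euler(1.0, h=0.1))    # ~0.90: coarse steps undershoot y(1) = 1
print(euler(1.0, h=0.001))  # ~0.999: finer steps converge to 1
```

Shrinking the step size \(h\) trades extra computation for accuracy.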




(5) Deep Learning

Optimization in deep learning = finding the \(f(x)\) that minimizes the loss

= Function approximation


Residual connection vs. Neural ODE

  • Residual connection:
    • \(h_{t+1}=h_t+f\left(h_t, \theta\right)\).
  • Neural ODE:
    • \(y_n=y_1+h \cdot \frac{\partial y_1}{\partial x_1}+h \cdot \frac{\partial y_2}{\partial x_2}+\cdots+h \cdot \frac{\partial y_{n-1}}{\partial x_{n-1}}\).


(6) Residual connection vs. Neural ODE

a) Residual connection

\(h_{t+1}=h_t+f\left(h_t, \theta\right)\).

  • \(h_2=h_1+f\left(h_1, \theta\right)\).
  • \(h_3=h_2+f\left(h_2, \theta\right)=h_1+f\left(h_1, \theta\right)+f\left(h_2, \theta\right)\).


Conclusion: \(h_n=h_1+f\left(h_1, \theta\right)+f\left(h_2, \theta\right)+f\left(h_3, \theta\right)+\cdots+f\left(h_{n-1}, \theta\right)\).

( = the Euler method as a discrete transformation: one step of size 1 per layer )
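
A toy sketch of this unrolling, with a fixed scalar function standing in for a trained residual block \(f(h, \theta)\) (the `0.1 * h` dynamics are made up for illustration):

```python
# Residual stack: h_{t+1} = h_t + f(h_t, theta).
# Unrolling n blocks is n Euler steps with step size fixed at 1.

def f(h):
    """Stand-in for a residual block f(h, theta); a real network goes here."""
    return 0.1 * h

h = 1.0                # h_1, the input
for _ in range(4):     # four residual blocks
    h = h + f(h)       # one discrete update
print(h)               # h_5 = h_1 + f(h_1) + f(h_2) + f(h_3) + f(h_4)
```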


b) Neural ODE

\(y_n=y_{n-1}+h \cdot \frac{\partial y_{n-1}}{\partial x_{n-1}}=y_{n-2}+h \cdot \frac{\partial y_{n-2}}{\partial x_{n-2}}+h \cdot \frac{\partial y_{n-1}}{\partial x_{n-1}}\).


Conclusion: \(y_n=y_1+h \cdot \frac{\partial y_1}{\partial x_1}+h \cdot \frac{\partial y_2}{\partial x_2}+\cdots+h \cdot \frac{\partial y_{n-1}}{\partial x_{n-1}}\).

( = the Euler method as a continuous transformation: many steps of small size \(h\) )
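
The same loop with a small step size \(h\) gives the continuous view, using the matching toy dynamics:

```python
# Neural ODE view: y_n = y_1 + h*f(y_1) + h*f(y_2) + ... + h*f(y_{n-1}).

def f(y):
    """Stand-in for the learned dynamics dy/dt = f(y, theta)."""
    return 0.1 * y

y, h = 1.0, 0.01
for _ in range(400):   # 400 small steps instead of 4 big ones
    y = y + h * f(y)   # one Euler step of size h
print(y)               # -> e^{0.1 * 4} ≈ 1.49 as h -> 0
```

Four unit-size steps and 400 steps of size 0.01 follow the same update rule; only the step size changes, which is exactly the residual-vs-ODE distinction above.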




(7) Neural ODE in Supervised Learning (SL)



\(z(1)=z(0)+\int_0^1 f(z(t), t ; \theta) d t\).

  • \(z\): the state of the ODE ( the hidden vector )
  • \(\int_0^1 f(z(t), t ; \theta) d t\) : the total change in \(z\)
    • How to solve it: the Euler method ( sketched below )
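
A minimal forward-pass sketch, assuming a toy tanh layer in place of the real network \(f(z, t; \theta)\); the names `f` and `euler_solve` are illustrative, not from the paper's code:

```python
import numpy as np

def f(z, t, theta):
    """Toy dynamics f(z, t; theta): one tanh layer.
    In the paper this is a small neural network."""
    W, b = theta
    return np.tanh(W @ z + b)

def euler_solve(z0, theta, t0=0.0, t1=1.0, n_steps=100):
    """Approximate z(1) = z(0) + integral_0^1 f(z(t), t; theta) dt with Euler steps."""
    h = (t1 - t0) / n_steps
    z, t = z0, t0
    for _ in range(n_steps):
        z = z + h * f(z, t, theta)  # one Euler step
        t += h
    return z

rng = np.random.default_rng(0)
theta = (0.1 * rng.normal(size=(4, 4)), np.zeros(4))  # (W, b)
z0 = rng.normal(size=4)      # input, treated as z(0)
z1 = euler_solve(z0, theta)  # output hidden state z(1)
print(z1)
```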


Forward & Backward

  • Forward: Euler method
  • Backward: Adjoint Sensitivity method


(8) Adjoint Sensitivity method

Procedure

  • (1) Define the adjoint state \(a(t)\)

    • Definition: the gradient of the loss at each state
    • Formula: \(a(t)=\frac{\partial \text{Loss}}{\partial z(t)}\) = the gradient = the adjoint state at time \(t\)
  • (2) \(a(1) \rightarrow a(0)\).

    • To obtain \(a(0)\), integrate backwards starting from \(a(1)\)!

      ( solve this new ODE the same way as the forward pass )

  • (3) Optimize using \(a(\cdot)\)

    • \(\frac{\partial \text{Loss}}{\partial \theta}=-\int_{1}^{0} a(t)^{\top} \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, d t\).



  • Note) \(\frac{d \mathbf{a}(t)}{d t}=-\mathbf{a}(t)^{\top} \frac{\partial f(\mathbf{z}(t), t, \theta)}{\partial \mathbf{z}}\).
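
A sketch of the backward pass under the same toy tanh dynamics as the forward sketch above: starting from \(a(1)=\partial \text{Loss}/\partial z(1)\), integrate \(z(t)\) and \(a(t)\) backwards in time with Euler steps. The Jacobian \(\partial f / \partial z\) is written analytically for the toy layer; a real implementation would use autograd vector-Jacobian products.

```python
import numpy as np

def f(z, t, theta):
    """Same toy dynamics as the forward sketch: one tanh layer."""
    W, b = theta
    return np.tanh(W @ z + b)

def df_dz(z, t, theta):
    """Analytic Jacobian of f w.r.t. z: diag(1 - tanh^2(Wz + b)) @ W."""
    W, b = theta
    s = np.tanh(W @ z + b)
    return (1 - s**2)[:, None] * W

def adjoint_backward(z1, dL_dz1, theta, t0=0.0, t1=1.0, n_steps=100):
    """Integrate z(t) and a(t) backwards from t=1 to t=0 with Euler steps.
    Adjoint ODE: da/dt = -a(t)^T df/dz."""
    h = (t1 - t0) / n_steps
    z, a, t = z1, dL_dz1, t1
    for _ in range(n_steps):
        z = z - h * f(z, t, theta)            # state ODE, run in reverse
        a = a + h * (a @ df_dz(z, t, theta))  # adjoint ODE, reversed in time
        t -= h
    return a  # a(0) = dLoss/dz(0)

# Example: if Loss = sum(z(1)), the starting adjoint a(1) is all ones.
rng = np.random.default_rng(0)
theta = (0.1 * rng.normal(size=(4, 4)), np.zeros(4))
z1 = rng.normal(size=4)  # pretend this is the forward output z(1)
print(adjoint_backward(z1, np.ones(4), theta))
```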




Reference

  • https://www.youtube.com/watch?v=UegW1cIRee4
  • Chen et al., "Neural Ordinary Differential Equations" (NeurIPS 2018): https://arxiv.org/abs/1806.07366
