5. Graph Recurrent Networks (GRNs)


5-1. GGNN (Gated Graph Neural Network)

  • uses a GRU in the propagation step

  • unrolls the recurrence for a fixed number of steps \(T\) and trains with BPTT


Basic recurrence:

  • aggregation
    • \[\mathbf{a}_{v}^{t} =\mathbf{A}_{v}^{T}\left[\mathbf{h}_{1}^{t-1} \ldots \mathbf{h}_{N}^{t-1}\right]^{T}+\mathbf{b}\]
  • GRU update

    • \(\begin{aligned}\mathbf{z}_{v}^{t} &=\sigma\left(\mathbf{W}^{z} \mathbf{a}_{v}^{t}+\mathbf{U}^{z} \mathbf{h}_{v}^{t-1}\right) \\ \mathbf{r}_{v}^{t} &=\sigma\left(\mathbf{W}^{r} \mathbf{a}_{v}^{t}+\mathbf{U}^{r} \mathbf{h}_{v}^{t-1}\right) \\ \widetilde{\mathbf{h}}_{v}^{t} &=\tanh \left(\mathbf{W} \mathbf{a}_{v}^{t}+\mathbf{U}\left(\mathbf{r}_{v}^{t} \odot \mathbf{h}_{v}^{t-1}\right)\right) \\ \mathbf{h}_{v}^{t} &=\left(1-\mathbf{z}_{v}^{t}\right) \odot \mathbf{h}_{v}^{t-1}+\mathbf{z}_{v}^{t} \odot \widetilde{\mathbf{h}}_{v}^{t} \end{aligned}\).

    • uses information from

      • (1) node’s neighbors
      • (2) previous timestep
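
A minimal PyTorch sketch of this propagation step, assuming a single unlabeled edge type (so the aggregation reduces to multiplying by the dense adjacency matrix) and illustrative names (`GGNNLayer`, `hidden_dim`); `nn.GRUCell` computes exactly the \(\mathbf{z}\) / \(\mathbf{r}\) / \(\widetilde{\mathbf{h}}\) update above:

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """Sketch of GGNN propagation: aggregate neighbors, then GRU update."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(hidden_dim))  # b in a_v^t
        # GRUCell implements the z / r / h-tilde / h equations per node
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, adj: torch.Tensor, h: torch.Tensor, T: int = 5):
        # adj: (N, N) adjacency matrix, h: (N, hidden_dim) node states
        for _ in range(T):              # unroll for a fixed number of steps
            a = adj @ h + self.bias     # a_v^t: sum over neighbor states
            h = self.gru(a, h)          # GRU update from a_v^t and h_v^{t-1}
        return h
```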


GGS-NNs (Gated Graph Sequence Neural Networks)

  • uses several GGNNs to produce an output sequence \(\mathbf{o}^{(1)} \ldots \mathbf{o}^{(K)}\)


example) two GGNNs are used at each output step: one predicts the output \(\mathbf{o}^{(k)}\), the other updates the node annotations for step \(k+1\)

(Figure 2)


5-2. Tree-LSTM

Two extensions of the standard LSTM to tree structures:

  • (1) Child-Sum Tree-LSTM
  • (2) N-ary Tree-LSTM


Tree-LSTM

  • like the standard LSTM, it contains:
    • (1) \(\mathbf{i}_{v}\) : input gate
    • (2) \(\mathbf{o}_{v}\) : output gate
    • (3) \(\mathbf{c}_{v}\) : memory cell
    • (4) \(\mathbf{h}_{v}\) : hidden state
  • forget gate

    • does not use a single shared forget gate

    • instead, uses a separate forget gate \(\mathbf{f}_{v k}\) for each child \(k\)

      ( allows node \(v\) to selectively aggregate information from each of its children )


(1) Child-Sum Tree-LSTM

\(\begin{aligned} \widetilde{\mathbf{h}}_{v}^{t-1} &=\sum_{k \in N_{v}} \mathbf{h}_{k}^{t-1} \\ \mathbf{i}_{v}^{t} &=\sigma\left(\mathbf{W}^{i} \mathbf{x}_{v}^{t}+\mathbf{U}^{i} \widetilde{\mathbf{h}}_{v}^{t-1}+\mathbf{b}^{i}\right) \\ \mathbf{f}_{v k}^{t} &=\sigma\left(\mathbf{W}^{f} \mathbf{x}_{v}^{t}+\mathbf{U}^{f} \mathbf{h}_{k}^{t-1}+\mathbf{b}^{f}\right) \\ \mathbf{o}_{v}^{t} &=\sigma\left(\mathbf{W}^{o} \mathbf{x}_{v}^{t}+\mathbf{U}^{o} \widetilde{\mathbf{h}}_{v}^{t-1}+\mathbf{b}^{o}\right) \\ \mathbf{u}_{v}^{t} &=\tanh \left(\mathbf{W}^{u} \mathbf{x}_{v}^{t}+\mathbf{U}^{u} \widetilde{\mathbf{h}}_{v}^{t-1}+\mathbf{b}^{u}\right) \\ \mathbf{c}_{v}^{t} &=\mathbf{i}_{v}^{t} \odot \mathbf{u}_{v}^{t}+\sum_{k \in N_{v}} \mathbf{f}_{v k}^{t} \odot \mathbf{c}_{k}^{t-1} \\ \mathbf{h}_{v}^{t} &=\mathbf{o}_{v}^{t} \odot \tanh \left(\mathbf{c}_{v}^{t}\right), \end{aligned}\).
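
A minimal PyTorch sketch of this cell, with illustrative names (`x_dim`, `h_dim`); the \(\mathbf{i}/\mathbf{o}/\mathbf{u}\) gates all read the summed child state \(\widetilde{\mathbf{h}}\), while the forget gate is computed once per child from that child's own \(\mathbf{h}_k\):

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Sketch of a Child-Sum Tree-LSTM cell."""

    def __init__(self, x_dim: int, h_dim: int):
        super().__init__()
        # i, o, u gates share h-tilde, so their W's / U's are stacked
        self.W_iou = nn.Linear(x_dim, 3 * h_dim)
        self.U_iou = nn.Linear(h_dim, 3 * h_dim, bias=False)
        # the forget gate is computed PER CHILD, from that child's h_k
        self.W_f = nn.Linear(x_dim, h_dim)
        self.U_f = nn.Linear(h_dim, h_dim, bias=False)
        self.h_dim = h_dim

    def forward(self, x, child_h, child_c):
        # x: (x_dim,)  child_h, child_c: (num_children, h_dim)
        h_tilde = child_h.sum(dim=0)                        # sum over children
        i, o, u = (self.W_iou(x) + self.U_iou(h_tilde)).split(self.h_dim)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))  # one gate per child
        c = i * u + (f * child_c).sum(dim=0)                # gated child memories
        h = o * torch.tanh(c)
        return h, c
```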


(2) N-ary Tree-LSTM

  • if the number of children of each node is at most \(K\)

  • and the children can be ordered from \(1\) to \(K\) ...

\(\rightarrow\) the N-ary Tree-LSTM can be applied


\(\begin{aligned} \mathbf{i}_{v}^{t} &=\sigma\left(\mathbf{W}^{i} \mathbf{x}_{v}^{t}+\sum_{l=1}^{K} \mathbf{U}_{l}^{i} \mathbf{h}_{v l}^{t-1}+\mathbf{b}^{i}\right) \\ \mathbf{f}_{v k}^{t} &=\sigma\left(\mathbf{W}^{f} \mathbf{x}_{v}^{t}+\sum_{l=1}^{K} \mathbf{U}_{k l}^{f} \mathbf{h}_{v l}^{t-1}+\mathbf{b}^{f}\right) \\ \mathbf{o}_{v}^{t} &=\sigma\left(\mathbf{W}^{o} \mathbf{x}_{v}^{t}+\sum_{l=1}^{K} \mathbf{U}_{l}^{o} \mathbf{h}_{v l}^{t-1}+\mathbf{b}^{o}\right) \\ \mathbf{u}_{v}^{t} &=\tanh \left(\mathbf{W}^{u} \mathbf{x}_{v}^{t}+\sum_{l=1}^{K} \mathbf{U}_{l}^{u} \mathbf{h}_{v l}^{t-1}+\mathbf{b}^{u}\right) \\ \mathbf{c}_{v}^{t} &=\mathbf{i}_{v}^{t} \odot \mathbf{u}_{v}^{t}+\sum_{l=1}^{K} \mathbf{f}_{v l}^{t} \odot \mathbf{c}_{v l}^{t-1} \\ \mathbf{h}_{v}^{t} &=\mathbf{o}_{v}^{t} \odot \tanh \left(\mathbf{c}_{v}^{t}\right) \end{aligned}\).


  • separate parameter matrices for each child position \(l\) ( and \(\mathbf{U}_{kl}^{f}\) even lets sibling \(l\) influence child \(k\)'s forget gate )

    ( more fine-grained representations for each node than the Child-Sum variant )
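
A sketch of this cell, parallel to the Child-Sum sketch above; `K`, `x_dim`, and `h_dim` are illustrative, and the cross-child matrices \(\mathbf{U}_{kl}^{f}\) are stored as one `(K, K, h_dim, h_dim)` tensor:

```python
import torch
import torch.nn as nn

class NaryTreeLSTMCell(nn.Module):
    """Sketch of an N-ary Tree-LSTM cell for exactly K ordered children."""

    def __init__(self, x_dim: int, h_dim: int, K: int):
        super().__init__()
        self.h_dim = h_dim
        self.W_iou = nn.Linear(x_dim, 3 * h_dim)
        self.W_f = nn.Linear(x_dim, h_dim)
        # U_l for l = 1..K, stacked for the i / o / u gates
        self.U_iou = nn.Parameter(0.01 * torch.randn(K, h_dim, 3 * h_dim))
        # U_{kl}^f: child l's state feeds child k's forget gate
        self.U_f = nn.Parameter(0.01 * torch.randn(K, K, h_dim, h_dim))

    def forward(self, x, child_h, child_c):
        # child_h, child_c: (K, h_dim), ordered from child 1 to K
        iou = self.W_iou(x) + torch.einsum('ld,ldg->g', child_h, self.U_iou)
        i, o, u = iou.split(self.h_dim)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # one forget gate per child k, mixing ALL children's states
        f = torch.sigmoid(self.W_f(x)
                          + torch.einsum('ld,kldg->kg', child_h, self.U_f))
        c = i * u + (f * child_c).sum(dim=0)   # gate each child's memory
        h = o * torch.tanh(c)
        return h, c
```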


5-3. Graph LSTM

Both types of Tree-LSTM can be adapted to GRAPHS!

\(\rightarrow\) graph-structured LSTM


Difference between “graphs” & “trees”

  • edges of graphs have LABELS

\(\begin{aligned} \mathbf{i}_{v}^{t} &=\sigma\left(\mathbf{W}^{i} \mathbf{x}_{v}^{t}+\sum_{k \in N_{v}} \mathbf{U}_{m(v, k)}^{i} \mathbf{h}_{k}^{t-1}+\mathbf{b}^{i}\right) \\ \mathbf{f}_{v k}^{t} &=\sigma\left(\mathbf{W}^{f} \mathbf{x}_{v}^{t}+\mathbf{U}_{m(v, k)}^{f} \mathbf{h}_{k}^{t-1}+\mathbf{b}^{f}\right) \\ \mathbf{o}_{v}^{t} &=\sigma\left(\mathbf{W}^{o} \mathbf{x}_{v}^{t}+\sum_{k \in N_{v}} \mathbf{U}_{m(v, k)}^{o} \mathbf{h}_{k}^{t-1}+\mathbf{b}^{o}\right) \\ \mathbf{u}_{v}^{t} &=\tanh \left(\mathbf{W}^{u} \mathbf{x}_{v}^{t}+\sum_{k \in N_{v}} \mathbf{U}_{m(v, k)}^{u} \mathbf{h}_{k}^{t-1}+\mathbf{b}^{u}\right) \\ \mathbf{c}_{v}^{t} &=\mathbf{i}_{v}^{t} \odot \mathbf{u}_{v}^{t}+\sum_{k \in N_{v}} \mathbf{f}_{v k}^{t} \odot \mathbf{c}_{k}^{t-1} \\ \mathbf{h}_{v}^{t} &=\mathbf{o}_{v}^{t} \odot \tanh \left(\mathbf{c}_{v}^{t}\right), \end{aligned}\).

  • \(m(v,k)\) : label of the edge between nodes \(v\) & \(k\)
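
A sketch of the corresponding cell, where each edge label \(m\) indexes its own \(\mathbf{U}_m\); `num_labels` and the integer `labels` vector (one label id per neighbor) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GraphLSTMCell(nn.Module):
    """Sketch of the edge-labelled Graph LSTM update for one node v."""

    def __init__(self, x_dim: int, h_dim: int, num_labels: int):
        super().__init__()
        self.h_dim = h_dim
        self.W_iou = nn.Linear(x_dim, 3 * h_dim)
        self.W_f = nn.Linear(x_dim, h_dim)
        # one U per edge label m: U^i / U^o / U^u stacked, plus U^f
        self.U_iou = nn.Parameter(0.01 * torch.randn(num_labels, h_dim, 3 * h_dim))
        self.U_f = nn.Parameter(0.01 * torch.randn(num_labels, h_dim, h_dim))

    def forward(self, x, nbr_h, nbr_c, labels):
        # nbr_h, nbr_c: (num_neighbors, h_dim); labels[k] = m(v, k)
        U_iou = self.U_iou[labels]          # pick U_{m(v,k)} per neighbor
        iou = self.W_iou(x) + torch.einsum('kd,kdg->g', nbr_h, U_iou)
        i, o, u = iou.split(self.h_dim)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # forget gate per neighbor k, using the U^f for edge label m(v,k)
        f = torch.sigmoid(self.W_f(x)
                          + torch.einsum('kd,kdg->kg', nbr_h, self.U_f[labels]))
        c = i * u + (f * nbr_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c
```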


5-4. Sentence-LSTM (S-LSTM)

purpose: text encoding

  • converts the text into a graph
  • uses a Graph-LSTM to learn representations


Concepts

  • nodes = words
    • each word aggregates information from neighboring words
  • uses a supernode
    • provides GLOBAL information, to solve the long-distance dependency problem

\(\rightarrow\) each word can obtain both (1) local & (2) global information
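
A deliberately simplified toy sketch of this idea only, NOT the actual S-LSTM gating (the real model uses several extra gates for the word and sentence states); the window size `w` and all names are illustrative:

```python
import torch

def s_lstm_like_step(h, g, w=1):
    """Toy version: words mix a local window with the global supernode g."""
    n, d = h.shape                      # h: (n_words, d), g: (d,)
    # local context: mean of the states in a +-w window (zero-padded)
    padded = torch.cat([torch.zeros(w, d), h, torch.zeros(w, d)])
    local = torch.stack([padded[i:i + 2 * w + 1].mean(dim=0) for i in range(n)])
    h_new = torch.tanh(local + g)               # every word also sees the supernode
    g_new = torch.tanh(h_new.mean(dim=0) + g)   # supernode aggregates all words
    return h_new, g_new

# usage: repeat the step so local & global information mixes
h, g = torch.randn(6, 8), torch.zeros(8)
for _ in range(3):
    h, g = s_lstm_like_step(h, g)
```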


Sentence Classification

  • the hidden state of the supernode can be used as the sentence representation
  • outperforms the Transformer on this task!



