[ 6. Graph Neural Networks ]

( Reference : CS224W: Machine Learning with Graphs )


Contents

  • 6-1. Introduction
  • 6-2. DL for graphs
  • 6-3. GCN (Graph Convolutional Networks)
  • 6-4. GNN vs CNN
  • 6-5. GNN vs Transformer

6-1. Introduction

learn the mapping function ( node \rightarrow embedding ) via a NN!

  • similar nodes, close in embedding space

figure2


Shallow Encoders

  • just an “embedding-lookup table” ( see the sketch below )
  • limitations :
    • 1) O(V) parameters needed
    • 2) “transductive”
      • no embeddings for unseen nodes
    • 3) do not use “node features”

figure2
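
A minimal PyTorch sketch of that “embedding-lookup table” idea ( the sizes and variable names below are illustrative, not from the lecture ) :

```python
import torch
import torch.nn as nn

num_nodes, emb_dim = 1000, 64          # |V| nodes, d-dim embeddings

# shallow encoder = one big lookup table Z in R^{|V| x d}
# -> O(|V|) parameters, and an unseen node simply has no row
shallow_enc = nn.Embedding(num_nodes, emb_dim)

node_ids = torch.tensor([0, 42, 7])    # "encoding" = just indexing the table
z = shallow_enc(node_ids)              # shape : (3, 64)

# note : node features (attributes) are never used anywhere above
print(z.shape, sum(p.numel() for p in shallow_enc.parameters()))
```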


Networks are complex!

  • arbitrary size & complex topological structure
  • no fixed node ordering / reference point
  • dynamic & multimodal features


6-2. DL for graphs

Notation

  • V : vertex set
  • A : adjacency matrix
  • \boldsymbol{X} \in \mathbb{R}^{m \times \mid V \mid} : matrix of “node features”
  • N(v) : neighbors of node v


Naive approach : use concat ( [A , X] ) as input

CNN : problems?

  • no fixed notion of locality or sliding window on the graph

    ( there can be many order plans :( )


(1) Permutation invariant function

  • input graph : G=(\boldsymbol{A}, \boldsymbol{X})

  • map G into vector \mathbb{R}^{d}

  • f is a “Permutation invariant function”,

    if f\left(\boldsymbol{A}_{i}, \boldsymbol{X}_{i}\right)=f\left(\boldsymbol{A}_{j}, \boldsymbol{X}_{j}\right) for any two order plans i & j ( checked numerically below )

figure2
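
A quick numeric check of this definition, using a toy permutation-invariant readout ( sum-pooling after one step of aggregation; the graph and f below are made up for illustration ) :

```python
import torch

def f(A, X):
    # a simple permutation-invariant readout :
    # aggregate neighbors (A @ X), then sum-pool over all nodes
    return (A @ X).sum(dim=0)

# toy graph with 4 nodes
A = torch.tensor([[0., 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
X = torch.rand(4, 3)

# another "order plan" : relabel the nodes with a random permutation P
perm = torch.randperm(4)
P = torch.eye(4)[perm]
A_perm, X_perm = P @ A @ P.T, P @ X

print(torch.allclose(f(A, X), f(A_perm, X_perm)))   # True -> invariant
```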


(2) Permutation Equivariance

for any 2 order plans,

the output vector at the “SAME POSITION” ( i.e. for the same node ) should be the “SAME EMBEDDING”

figure2
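
The equivariant counterpart, checked the same way ( again a toy node-level map on made-up tensors ) : permuting the input permutes the output rows identically.

```python
import torch

def node_map(A, X):
    # node-level function : row i of the output is the embedding of node i
    return A @ X                       # one step of neighbor aggregation

A = torch.tensor([[0., 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]])
X = torch.rand(3, 2)

perm = torch.randperm(3)
P = torch.eye(3)[perm]

# equivariance : node_map(P A P^T, P X) == P * node_map(A, X)
lhs = node_map(P @ A @ P.T, P @ X)
rhs = P @ node_map(A, X)
print(torch.allclose(lhs, rhs))        # True -> same node, same embedding
```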


(3) GNN

GNN consists of multiple…

  • 1) permutation “invariant” layers
  • 2) permutation “equivariant” layers

( but naive MLPs do not meet those conditions! )


So, how to design “invariant & equivariant” layers?


6-3. GCN (Graph Convolutional Networks)

(1) Basic Idea

  • Node’s neighborhood defines a computation graph
  • so, how to propagate neighbor’s information?


Let’s generate node embeddings,

  • based on “LOCAL NETWORK NEIGHBORHOOD”
  • by using NN

figure2

figure2


(2) Depth of NN

meaning of “Depth” ( of layers )

  • layer 0 : target node itself
  • layer 1 : 1-hop neighbors
  • layer K : K-hop neighbors

figure2


(3) Neighborhood aggregation

Procedure

  • step 1) average messages of neighbors

  • step 2) apply NN

figure2


figure2


Both steps are “permutation equivariant”

figure2


(4) Training GCN

2 big steps

  • 1) defining “neighborhood aggregation” function
  • 2) defining “loss function”


figure2


Notation

  • h_v^k : hidden representation of v at layer k
    • W_k : parameter for “neighborhood aggregation”
    • B_k : parameter for “transforming itself”

figure2
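
Putting the notation together, each node averages its neighbors’ messages ( weighted by W_k ), adds its own transformed representation ( weighted by B_k ), and applies a nonlinearity. A minimal per-node sketch ( W_k and B_k are plain linear layers here, and the toy adjacency list is made up ) :

```python
import torch
import torch.nn as nn

d_in, d_out = 8, 16
W_k = nn.Linear(d_in, d_out, bias=False)   # neighborhood-aggregation weights
B_k = nn.Linear(d_in, d_out, bias=False)   # self-transformation weights

# toy graph : adjacency list N(v) and previous-layer embeddings h^{(k-1)}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h_prev = torch.rand(3, d_in)

h_next = torch.empty(3, d_out)
for v, N_v in neighbors.items():
    # step 1) average the messages of v's neighbors
    agg = h_prev[N_v].mean(dim=0)
    # step 2) apply the NN ( + transform v itself ), then a nonlinearity
    h_next[v] = torch.relu(W_k(agg) + B_k(h_prev[v]))
```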


(5) Matrix Formulation

The above update can be computed efficiently with a “matrix formulation”

  • (vector) \sum_{u \in N(v)} \frac{h_{u}^{(k-1)}}{\mid N(v) \mid}.

  • (matrix) H^{(k+1)}=D^{-1} A H^{(k)}.

    • H^{(k)}=\left[h_{1}^{(k)} \ldots h_{ \mid V \mid }^{(k)}\right]^{\mathrm{T}}.

      ( thus, \sum_{u \in N_{v}} h_{u}^{(k)}=\mathrm{A}_{v,:} \mathrm{H}^{(k)} )

    • D_{v, v}=\operatorname{Deg}(v)= \mid N(v) \mid.


Updating Function

H^{(k+1)}=\sigma\left(\tilde{A} H^{(k)} W_{k}^{\mathrm{T}}+H^{(k)} B_{k}^{\mathrm{T}}\right).

  • where \tilde{A}=D^{-1} A
  • term 1) \tilde{A} H^{(k)} W_{k}^{\mathrm{T}} : neighborhood aggregation
  • term 2) H^{(k)} B_{k}^{\mathrm{T}} : self-transformation
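
The matrix-form update above translates almost line-by-line into code ( a sketch on a random toy graph; D is the diagonal degree matrix defined earlier, and degree-0 nodes are guarded against just for safety ) :

```python
import torch

n, d_in, d_out = 5, 8, 16
A = (torch.rand(n, n) < 0.4).float()
A = ((A + A.T) > 0).float().fill_diagonal_(0)         # symmetric, no self-loops
D_inv = torch.diag(1.0 / A.sum(dim=1).clamp(min=1))   # D^{-1} ( guard deg = 0 )

W_k = torch.rand(d_out, d_in)
B_k = torch.rand(d_out, d_in)
H = torch.rand(n, d_in)                               # H^{(k)}

A_tilde = D_inv @ A                                   # row-normalized adjacency
H_next = torch.relu(A_tilde @ H @ W_k.T + H @ B_k.T)  # H^{(k+1)}
print(H_next.shape)                                   # (n, d_out)
```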


Unsupervised Training

Loss Function :

\mathcal{L}=\sum_{z_{u}, z_{v}} \operatorname{CE}\left(y_{u, v}, \operatorname{DEC}\left(z_{u}, z_{v}\right)\right).

  • y_{u,v}=1, if u and v are similar
  • example of DEC : inner product
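
With an inner-product decoder, this is just a binary cross-entropy over node pairs ( a sketch; the embeddings, pairs, and labels below are random stand-ins for the GNN outputs and similarity labels ) :

```python
import torch
import torch.nn.functional as F

z = torch.rand(10, 16, requires_grad=True)   # node embeddings z_u ( from the GNN )
u = torch.tensor([0, 1, 2, 3])               # sampled node pairs (u, v)
v = torch.tensor([4, 5, 6, 7])
y = torch.tensor([1., 0., 1., 0.])           # y_{u,v} = 1 if u and v are "similar"

dec = (z[u] * z[v]).sum(dim=1)               # DEC(z_u, z_v) = inner product
loss = F.binary_cross_entropy_with_logits(dec, y)
loss.backward()
```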


Supervised Training

just train it as a simple classification task!

( with CE loss )

figure2
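
Concretely, the final node embeddings just feed a classifier head trained with CE ( a sketch; z and the labels are placeholders for the GCN outputs and node labels ) :

```python
import torch
import torch.nn as nn

z = torch.rand(10, 16)                    # final node embeddings from the GCN
labels = torch.randint(0, 3, (10,))       # e.g. 3 node classes

head = nn.Linear(16, 3)                   # classifier on top of the embeddings
loss = nn.CrossEntropyLoss()(head(z), labels)
loss.backward()
```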


General Training procedure

dataset : batch of sub-graphs

procedure

  • 1) train on a set of nodes
  • 2) generate embeddings “for all nodes”

Note

  • “same” aggregation parameters for “all nodes”

    \rightarrow can generalize to UNSEEN nodes

figure2


6-4. GNN vs CNN

GNN :

  • \mathrm{h}_{v}^{(l+1)}=\sigma\left(\mathbf{W}_{l} \sum_{u \in \mathrm{N}(v)} \frac{\mathrm{h}_{u}^{(l)}}{ \mid \mathrm{N}(v) \mid }+\mathrm{B}_{l} \mathrm{~h}_{v}^{(l)}\right), \forall l \in\{0, \ldots, L-1\}.

CNN :

  • \mathrm{h}_{v}^{(l+1)}=\sigma\left(\sum_{u \in \mathrm{N}(v)} \mathbf{W}_{l}^{u} \mathrm{~h}_{u}^{(l)}+\mathrm{B}_{l} \mathrm{~h}_{v}^{(l)}\right), \forall l \in\{0, \ldots, L-1\}.


Intuition

  • learn a “DIFFERENT” W_l^u for “DIFFERENT” neighbors in CNN,

    but not in GNN

    ( \because in an image, we can pick an “ORDER” of neighbors using their “RELATIVE position”; a general graph has no such ordering )

  • can think of CNN as a “special GNN, with fixed neighbor size & ordering” ( see the sketch below )
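
The contrast is easy to see in code : a 3x3 convolution keeps a separate weight matrix for each of its 9 relative positions ( a fixed, ordered “neighborhood” ), while a GNN layer shares one W_l across all neighbors ( shapes below are illustrative ) :

```python
import torch.nn as nn

# CNN view : 9 distinct weight matrices W_l^u, one per relative position u
conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
print(conv.weight.shape)      # (16, 8, 3, 3) -> a different W for each of the 9 neighbors

# GNN view : neighbors have no canonical order, so ONE shared W_l for all of them
W_l = nn.Linear(8, 16, bias=False)
print(W_l.weight.shape)       # (16, 8)
```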


figure2


6-5. GNN vs Transformer

Transformer :

  • can be seen as a special GNN that “runs on a fully connected word graph”

figure2
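
In that view, self-attention is message passing in which every token is a neighbor of every other token ( a toy sketch with PyTorch's built-in multi-head attention; the sizes are illustrative ) :

```python
import torch
import torch.nn as nn

tokens = torch.rand(1, 6, 32)             # a "word graph" with 6 fully connected nodes
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

# every node aggregates messages from ALL nodes, weighted by attention scores
out, weights = attn(tokens, tokens, tokens)
print(weights.shape)                      # (1, 6, 6) -> a dense "adjacency" over the nodes
```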
