[ 2. Traditional Methods for ML on Graphs ]

( Reference : CS224W: Machine Learning with Graphs )


Contents

  • 2-1. Traditional Feature-based Methods : Node
  • 2-2. Traditional Feature-based Methods : Link
  • 2-3. Traditional Feature-based Methods : Graph


ML tasks with graphs

  • Node-level prediction
  • Link-level prediction
  • Graph-level prediction

figure2


Traditional ML Pipeline

  • design “node features”, “link features”, “graph features”

    ( traditional method : “hand-designed” features )

  • 2 steps

    • Step 1 : train an ML model

    • Step 2 : apply the model

( will focus on undirected graphs for simplicity )


Goal : make “prediction” for node/link/graph

Design choices

  • features : \(d\)-dim vector
  • objects : node / edge / set of nodes / graph
  • objective functions


2-1. Traditional Feature-based Methods : Node

1) Node-level tasks

ex) Node classification

figure2


Goal : characterize the structure & position of a “node” in the network

  • 1) node degree
  • 2) node centrality
  • 3) clustering coefficient
  • 4) graphlets


1) Node Degree

  • \(k_v\) : number of edges adjacent to node \(v\)
  • simple, but very useful feature!
  • treat all neighbors equally

figure2


2) Node Centrality

Problem of node degree :

  • counts all neighbors equally, without capturing their importance

    \(\rightarrow\) use “Node Centrality”


Node centrality

  • \(c_v\): importance of node \(v\) in a graph
  • example
    • 1) eigenvector centrality
    • 2) betweenness ~
    • 3) closeness ~


(a) Eigenvector centrality

a node is important, if it is surrounded by important nodes

\(c_{v}=\frac{1}{\lambda} \sum_{u \in N(v)} c_{u}\) \(\leftrightarrow\) \(\lambda c = Ac\)

  • \(\lambda\) : positive constant ( an eigenvalue of \(A\) )
  • \(A\) : adjacency matrix
  • \(c\) : centrality vector

the leading eigenvector \(c_{\max}\) ( the one for the largest eigenvalue \(\lambda_{\max}\) ) is used as the centrality
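
a quick sanity check : the leading eigenvector can be found by power iteration on \(A\). A minimal numpy sketch on a toy adjacency matrix ( my own example, not from the lecture ) :

```python
# Minimal sketch: eigenvector centrality via power iteration.
# Toy 4-node undirected graph (a triangle 0-1-2 plus a pendant node 3).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

c = np.ones(A.shape[0])          # start from a uniform centrality vector
for _ in range(100):             # repeat c <- Ac, then re-normalize
    c = A @ c
    c /= np.linalg.norm(c)

# c now approximates the leading eigenvector c_max of lambda * c = A * c
print(np.round(c, 3))            # node 2 (most connected) scores highest
```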


(b) Betweenness centrality

a node is important, if it lies on many shortest paths between other nodes

\(c_{v}=\sum_{s \neq v \neq t} \frac{\#(\text { shortest paths between } s \text { and } t \text { that contain } v)}{\#(\text { shortest paths between } s \text { and } t)}\).

figure2


(c) Closeness centrality

a node is important, if it has small shortest-path distances to all other nodes

\(c_{v}=\frac{1}{\sum_{u \neq v} \text { shortest path length between } u \text { and } v}\).

figure2
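
NetworkX ships all of these measures; a short sketch on its built-in karate-club toy graph ( note : NetworkX normalizes some values, so they can differ from the raw formulas by a constant factor ) :

```python
# Degree, betweenness, and closeness centrality with NetworkX built-ins.
import networkx as nx

G = nx.karate_club_graph()                 # standard 34-node toy graph

deg = dict(G.degree())                     # node degree k_v
btw = nx.betweenness_centrality(G)         # fraction of shortest paths through v
cls = nx.closeness_centrality(G)           # (normalized) inverse summed distance

print(deg[0], round(btw[0], 3), round(cls[0], 3))
```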


3) Clustering Coefficients

measure how connected node \(v\)’s NEIGHBORS are!

\(e_{v}=\frac{\#(\text { edges among neighboring nodes })}{\left(\begin{array}{c} k_{v} \\ 2 \end{array}\right)} \in[0,1]\).

  • \(\left(\begin{array}{c} k_{v} \\ 2 \end{array}\right)\) : # of node pairs, among \(k_v\) neighboring nodes

figure2
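
a minimal sketch of \(e_v\), counting edges among \(v\)’s neighbors directly ( nx.clustering computes the same quantity ) :

```python
# Clustering coefficient: edges among v's neighbors / (k_v choose 2).
import networkx as nx
from itertools import combinations

def clustering_coefficient(G, v):
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:                                # coefficient undefined; use 0
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    return links / (k * (k - 1) / 2)

G = nx.karate_club_graph()
assert abs(clustering_coefficient(G, 0) - nx.clustering(G, 0)) < 1e-12
```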


4) Graphlets

clustering coefficients :

  • count the # of triangles in the ego-network


\(\rightarrow\) generalize, by counting graphlets

( graphlets : rooted connected non-isomorphic subgraphs )


figure2


Definition

  • GDV (Graphlet Degree Vector) :
    • graphlet-based features for nodes
    • # of graphlets that node touches
  • Degree :
    • # of edges that node touches
  • Clustering Coefficients
    • # of triangles that node touches


figure2


Summary

  • importance-based features :
    • 1) node degree
    • 2) node centrality measures
      • eigenvector / betweenness / closeness centrality
  • structure-based features :
    • 1) node degree
    • 2) clustering coefficients
      • how connected neighbors are to each other
    • 3) graphlet degree vector
      • occurrences of different graphlets


2-2. Traditional Feature-based Methods : Link

predict “new links”

( whether 2 nodes are connected! )

figure2


Key point : how to design features for PAIR OF NODES


  • 1) Links Missing At Random

    • step 1) remove random set of links
    • step 2) predict them
  • 2) Links over time

    • network that changes over time!

    • input : \(G[t_0, t_0']\)

      ( = graph on edges up to time \(t_0'\) )

    • output : ranked list \(L\) of links

      ( that are not in \(G[t_0, t_0']\), but are predicted to appear in \(G[t_1, t_1']\) )


notation

  • pair of nodes : \((x,y)\)
  • score of \((x,y)\) : \(c(x,y)\)
    • ex) # of common neighbors of \(x\) and \(y\)


Step 1) Sort “pair of nodes”, by their “score” ( decreasing order )

Step 2) Predict top \(n\) pairs, as new links
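
a minimal sketch of this two-step pipeline, using “# of common neighbors” as the score \(c(x,y)\) ( a toy setup on NetworkX’s karate-club graph, not the lecture’s experiment ) :

```python
# Rank all non-edges by a link score, then predict the top n as new links.
import networkx as nx

G = nx.karate_club_graph()

# score c(x, y) = number of common neighbors of x and y
scores = {(x, y): len(list(nx.common_neighbors(G, x, y)))
          for x, y in nx.non_edges(G)}

# Step 1) sort pairs of nodes by score, in decreasing order
ranked = sorted(scores, key=scores.get, reverse=True)

# Step 2) predict the top n pairs as new links
n = 5
print(ranked[:n])
```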


  • distance-based feature
  • local neighborhood overlap
  • global neighborhood overlap


(a) distance-based feature

  • shortest-path distance between 2 nodes
  • problem
    • does not capture the degree of neighborhood overlap

figure2


(b) local neighborhood overlap

  • # of neighboring nodes, shared between 2 nodes
  • example
    • 1) common neighbors : \(\mid N\left(v_{1}\right) \cap N\left(v_{2}\right) \mid\)
    • 2) Jaccard’s coefficient : \(\frac{ \mid N\left(v_{1}\right) \cap N\left(v_{2}\right) \mid }{ \mid N\left(v_{1}\right) \cup N\left(v_{2}\right) \mid }\)
    • 3) Adamic-Adar index : \(\sum_{u \in N\left(v_{1}\right) \cap N\left(v_{2}\right)} \frac{1}{\log \left(k_{u}\right)}\)
      • \(k_u\) : degree of node \(u\)
      • down-weights high-degree shared neighbors ( common hubs are less informative )
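
all three scores are available in NetworkX; a quick sketch ( each built-in yields \((u, v, \text{score})\) triples ) :

```python
# Common neighbors, Jaccard's coefficient, and Adamic-Adar via NetworkX.
import networkx as nx

G = nx.karate_club_graph()
pair = [(0, 33)]                                         # one candidate pair

cn = len(list(nx.common_neighbors(G, 0, 33)))            # 1) common neighbors
_, _, jc = next(iter(nx.jaccard_coefficient(G, pair)))   # 2) Jaccard
_, _, aa = next(iter(nx.adamic_adar_index(G, pair)))     # 3) Adamic-Adar

print(cn, round(jc, 3), round(aa, 3))
```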


Limitation :

  • zero, when 2 nodes don’t share any neighbors

    ( but may have potential to be connected! )

  • thus, use GLOBAL neighborhood overlap

    ( by considering “ENTIRE” graph )


figure2


(c) global neighborhood overlap

(intro) Katz index

  • # of paths of all lengths between 2 nodes
  • how? by calculating “Powers of graph Adjacency Matrix”
  • Theory
    • \(\boldsymbol{P}_{u v}^{(K)}\) = # of paths of length \(\boldsymbol{K}\) between \(\boldsymbol{u}\) and \(\boldsymbol{v}\)
    • \(P^{(K)}=A^{K}\).
  • ex) \(A^2_{uv}\) : # of paths of length 2, between \(u\) & \(v\)


figure2


Katz index

  • sum over all path lengths

  • \(S_{v_{1} v_{2}}=\sum_{l=1}^{\infty} \beta^{l} \boldsymbol{A}_{v_{1} v_{2}}^{l}\).
    • \(\beta\) : discount factor ( \(0<\beta<1\) ; the sum converges when \(\beta < 1/\lambda_{\max}\) )
    • \(\boldsymbol{A}_{v_{1} v_{2}}^{l}\) : # paths of length \(l\), between \(v_1\) & \(v_2\)
  • closed form solution :
    • \(\boldsymbol{S}=\sum_{i=1}^{\infty} \beta^{i} \boldsymbol{A}^{i}=\underbrace{(\boldsymbol{I}-\beta \boldsymbol{A})^{-1}}_{=\sum_{i=0}^{\infty} \beta^{i} A^{i}}-\boldsymbol{I}\).
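
a minimal numpy sketch of the closed form ( assuming \(\beta < 1/\lambda_{\max}\), which makes the geometric series converge ) :

```python
# Katz index matrix: S = (I - beta * A)^{-1} - I.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

lam_max = np.linalg.eigvalsh(A).max()   # A is symmetric for undirected graphs
beta = 0.9 / lam_max                    # any beta < 1 / lambda_max converges

I = np.eye(A.shape[0])
S = np.linalg.inv(I - beta * A) - I     # S[u, v] = Katz index of pair (u, v)

print(round(S[0, 33], 4))
```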


2-3. Traditional Feature-based Methods : Graph

Goal : want feature that considers structure of ENTIRE GRAPH

\(\rightarrow\) use Kernel Methods


1) Kernel Methods

Key Idea :

  • design “kernels”, instead of “feature vectors”


Kernel & Kernel matrix

  • kernel \(K(G,G')\) : similarity between two data points ( here, two graphs )
    • \[K(G,G') = \phi(G)^T\phi(G')\]
  • kernel matrix \(K=(K(G,G'))_{G,G'}\) : must be positive semi-definite
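
a tiny numpy illustration of why \(K = \Phi \Phi^T\) is positive semi-definite ( random feature vectors, purely for illustration ) :

```python
# Stack feature vectors phi(G) as rows of Phi; then K = Phi @ Phi.T is PSD.
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.random((5, 8))            # 5 "graphs", each with an 8-dim phi(G)

K = Phi @ Phi.T                     # K[i, j] = phi(G_i)^T phi(G_j)
eigvals = np.linalg.eigvalsh(K)

print(np.all(eigvals >= -1e-10))    # all eigenvalues (numerically) >= 0
```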


2) Graph kernels

measures similarity between 2 graphs

  • 1) Graphlet Kernel
  • 2) Weisfeiler-Lehman Kernel


Goal : design graph feature vector \(\phi(G)\)

Key Idea : Bag-of-Words ( BoW ), plus a little extra, for a graph


(1) simple version : Bag-of-Words

  • just treat nodes as words

  • example )

    figure2


(2) Bag-of-“Node degrees”

  • example )

    figure2
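
a minimal sketch of this feature : a fixed-length histogram of node degrees ( the bin cap max_deg is my own choice, to get a fixed vector length ) :

```python
# Bag-of-"node degrees": phi(G)[d] = number of nodes with degree d.
import numpy as np
import networkx as nx

def degree_histogram_feature(G, max_deg):
    phi = np.zeros(max_deg + 1)
    for _, d in G.degree():
        phi[min(d, max_deg)] += 1          # last bin collects degrees >= max_deg
    return phi

G = nx.karate_club_graph()
print(degree_histogram_feature(G, max_deg=5))
```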


Graphlet Kernel & Weisfeiler-Lehman Kernel

\(\rightarrow\) both use the “Bag-of-xx” concept!


3) Graphlet Kernel

  • count the # of different graphlets in a graph
  • Node-level vs Graph-level : unlike node-level graphlets, graph-level graphlets
    • (1) do not need to be connected
    • (2) are not rooted


Notation

  • \(G\) : graph
  • \(G_k = (g_1, \ldots, g_{n_k})\) : list of graphlets of size \(k\)
  • \(f_G \in \mathbb{R}^{n_k}\) : graphlet count vector
    • \(\left(\boldsymbol{f}_{G}\right)_{i}=\#\left(g_{i} \subseteq G\right) \text { for } i=1,2, \ldots, n_{k}\).
  • \(K(G,G') = f_G^T f_{G'}\) : graphlet kernel
  • figure2


Problem

  • if the sizes of \(G\) & \(G'\) differ, raw counts skew the kernel value


Solution

  • normalize each feature vector
  • \(K\left(G, G^{\prime}\right)=\boldsymbol{h}_{G}{ }^{\mathrm{T}} \boldsymbol{h}_{G^{\prime}}\).
    • where \(\boldsymbol{h}_{G}=\frac{\boldsymbol{f}_{G}}{\operatorname{Sum}\left(\boldsymbol{f}_{G}\right)}\).
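
a minimal sketch of the normalized kernel, assuming the graphlet count vectors \(f_G\) are already computed ( the counting itself is the expensive part ) :

```python
# Normalized graphlet kernel: K(G, G') = h_G^T h_G', with h_G = f_G / Sum(f_G).
import numpy as np

def graphlet_kernel(f_G, f_Gp):
    h_G, h_Gp = f_G / f_G.sum(), f_Gp / f_Gp.sum()
    return h_G @ h_Gp

f1 = np.array([30.0, 12.0, 4.0])       # toy counts of 3 graphlet types
f2 = 10 * f1                           # same proportions, 10x larger graph

# normalization removes the size skew: both kernels agree
print(np.isclose(graphlet_kernel(f1, f2), graphlet_kernel(f1, f1)))  # True
```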


Limitation

  • too expensive to count graphlets!

    ( counting size-\(k\) graphlets in an \(n\)-node graph by enumeration takes \(O(n^k)\) time )

  • how to design more efficient kernel?

    \(\rightarrow\) Weisfeiler-Lehman Kernel


4) Weisfeiler-Lehman Kernel

Goal :

  • design an “EFFICIENT” graph feature descriptor \(\phi(G)\)


Key Idea

  • use neighborhood structure to iteratively enrich node vocabulary
  • ( = generalized version of Bag of “node-degrees” )


Algorithm : COLOR REFINEMENT


Color Refinement

  • (notation) graph \(G\) with nodes \(V\)

  • process

    • step 1) assign initial color \(c^{(0)}(v)\) to each node \(v\)

    • step 2) iteratively refine node colors…

      • \(c^{(k+1)}(v)=\operatorname{HASH}\left(\left\{c^{(k)}(v),\left\{c^{(k)}(u)\right\}_{u \in N(v)}\right\}\right)\).

      • HASH : mapping function

        ( maps different inputs to different colors )

    • after \(K\) steps of color refinement,

      \(c^{(K)}(v)\) summarizes the structure of \(v\)’s \(K\)-hop neighborhood

figure2
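
a minimal sketch of \(K\)-step color refinement; “colors” are plain integers, and HASH is emulated by interning each ( own color, sorted neighbor colors ) signature into a fresh id :

```python
# Color refinement: iteratively hash (own color, neighbor colors) to new ids.
import networkx as nx
from collections import Counter

def color_refinement(G, K):
    colors = {v: 0 for v in G}                 # step 1) uniform initial color
    for _ in range(K):                         # step 2) K refinement rounds
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
               for v in G}
        table = {}                             # HASH: signature -> fresh color id
        colors = {v: table.setdefault(sig[v], len(table)) for v in G}
    return colors

# WL graph feature = "bag of colors" after K steps
G = nx.karate_club_graph()
print(Counter(color_refinement(G, K=3).values()))
```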


5) Summary of graph kernels

Graphlet Kernel

  • summary
    • graph = “bag of GRAPHLETS”
  • disadvantage
    • computationally expensive


Weisfeiler-Lehman Kernel

  • summary
    • apply \(K\)-step color refinement
    • graph = “bag-of-COLORS”
    • closely related to GNN
  • advantages
    • computationally efficient

    • only colors need to be tracked

      \(\rightarrow\) # of colors : at most # of nodes

    • time complexity : linear in # of edges


Summary

Traditional ML

  • hand-designed features + ML model


Hand-designed features :

  • 1) node-level
    • node degree
    • node centrality
    • clustering coefficients
    • graphlets
  • 2) link-level
    • distance-based feature
    • local & global neighborhood overlap
  • 3) graph-level
    • graphlet kernel
    • WL kernel
