[ 2. Traditional Methods for ML on Graphs ]

( Reference : CS224W: Machine Learning with Graphs )


Contents

  • 2-1. Traditional Feature-based Methods : Node
  • 2-2. Traditional Feature-based Methods : Link
  • 2-3. Traditional Feature-based Methods : Graph


ML tasks with graphs

  • Node-level prediction
  • Link-level prediction
  • Graph-level prediction

figure2


Traditional ML Pipeline

  • design “node features”, “link features”, “graph features”

    ( traditional method : “hand-designed” features )

  • 2 steps

    • Step 1 : train an ML model

    • Step 2 : apply the model

( will focus on undirected graphs for simplicity )


Goal : make “prediction” for node/link/graph

Design choices

  • features : \(d\)-dim vector
  • objects : node / edge / set of nodes / graph
  • objective functions


2-1. Traditional Feature-based Methods : Node

1) Node-level tasks

ex) Node classification

figure2


Goal : characterize the structure & position of a “node” in the network

  • 1) node degree
  • 2) node centrality
  • 3) clustering coefficient
  • 4) graphlets


1) Node Degree

  • \(k_v\) : number of edges adjacent to node \(v\)
  • simple, but very useful feature!
  • treat all neighbors equally

figure2


2) Node Centrality

Problem of node degree :

  • counts all neighbors equally, without capturing their importance

    \(\rightarrow\) use “Node Centrality”


Node centrality

  • \(c_v\): importance of node \(v\) in a graph
  • example
    • 1) eigenvector centrality
    • 2) betweenness ~
    • 3) closeness ~


(a) Eigenvector centrality

a node is important, if it is surrounded by important nodes

\(c_{v}=\frac{1}{\lambda} \sum_{u \in N(v)} c_{u}\) \(\leftrightarrow\) \(\lambda c = Ac\)

  • \(\lambda\) : positive constant ( an eigenvalue of \(A\) )
  • \(A\) : adjacency matrix
  • \(c\) : centrality vector

the leading eigenvector \(c_{\max}\) ( the one for the largest eigenvalue \(\lambda_{\max}\) ) is used as the centrality
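
a quick sanity check : the leading eigenvector can be found by power iteration on \(A\). A minimal numpy sketch on a toy adjacency matrix ( my own example, not from the lecture ) :

```python
# Minimal sketch: eigenvector centrality via power iteration.
# Toy 4-node undirected graph (a triangle 0-1-2 plus a pendant node 3).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

c = np.ones(A.shape[0])          # start from a uniform centrality vector
for _ in range(100):             # repeat c <- Ac, then re-normalize
    c = A @ c
    c /= np.linalg.norm(c)

# c now approximates the leading eigenvector c_max of lambda * c = A * c
print(np.round(c, 3))            # node 2 (most connected) scores highest
```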


(b) Betweenness centrality

a node is important, if it lies on many shortest paths between other nodes

\(c_{v}=\sum_{s \neq v \neq t} \frac{\#(\text { shortest paths between } s \text { and } t \text { that contain } v)}{\#(\text { shortest paths between } s \text { and } t)}\).

figure2


(c) Closeness centrality

a node is important, if it has small shortest-path distances to all other nodes

\(c_{v}=\frac{1}{\sum_{u \neq v} \text { shortest path length between } u \text { and } v}\).

figure2
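
NetworkX ships all of these measures; a short sketch on its built-in karate-club toy graph ( note : NetworkX normalizes some values, so they can differ from the raw formulas by a constant factor ) :

```python
# Degree, betweenness, and closeness centrality with NetworkX built-ins.
import networkx as nx

G = nx.karate_club_graph()                 # standard 34-node toy graph

deg = dict(G.degree())                     # node degree k_v
btw = nx.betweenness_centrality(G)         # fraction of shortest paths through v
cls = nx.closeness_centrality(G)           # (normalized) inverse summed distance

print(deg[0], round(btw[0], 3), round(cls[0], 3))
```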


3) Clustering Coefficients

measure how connected node \(v\)’s NEIGHBORS are!

\(e_{v}=\frac{\#(\text { edges among neighboring nodes })}{\left(\begin{array}{c} k_{v} \\ 2 \end{array}\right)} \in[0,1]\).

  • \(\left(\begin{array}{c} k_{v} \\ 2 \end{array}\right)\) : # of node pairs, among \(k_v\) neighboring nodes

figure2
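
a minimal sketch of \(e_v\), counting edges among \(v\)’s neighbors directly ( nx.clustering computes the same quantity ) :

```python
# Clustering coefficient: edges among v's neighbors / (k_v choose 2).
import networkx as nx
from itertools import combinations

def clustering_coefficient(G, v):
    nbrs = list(G.neighbors(v))
    k = len(nbrs)
    if k < 2:                                # coefficient undefined; use 0
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if G.has_edge(u, w))
    return links / (k * (k - 1) / 2)

G = nx.karate_club_graph()
assert abs(clustering_coefficient(G, 0) - nx.clustering(G, 0)) < 1e-12
```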


4) Graphlets

clustering coefficients :

  • count the # of triangles in the ego-network


\(\rightarrow\) generalize, by counting graphlets

( graphlets : rooted connected non-isomorphic subgraphs )


figure2


Definition

  • GDV (Graphlet Degree Vector) :
    • graphlet-based features for nodes
    • # of graphlets that node touches
  • Degree :
    • # of edges that node touches
  • Clustering Coefficients
    • # of triangles that node touches


figure2


Summary

  • importance-based features :
    • 1) node degree
    • 2) node centrality measures
      • eigenvector / betweenness / closeness centrality
  • structure-based features :
    • 1) node degree
    • 2) clustering coefficients
      • how connected neighbors are to each other
    • 3) graphlet degree vector
      • occurrences of different graphlets


2-2. Traditional Feature-based Methods : Link

predict “new links”

( whether 2 nodes are connected! )

figure2


Key point : how to design features for PAIR OF NODES


  • 1) Links Missing At Random

    • step 1) remove random set of links
    • step 2) predict them
  • 2) Links over time

    • network that changes over time!

    • input : \(G[t_0, t_0']\)

      ( = graph on edges up to time \(t_0'\) )

    • output : ranked list \(L\) of links

      ( that are not in \(G[t_0, t_0']\), but are predicted to appear in \(G[t_1, t_1']\) )


notation

  • pair of nodes : \((x,y)\)
  • score of \((x,y)\) : \(c(x,y)\)
    • ex) # of common neighbors of \(x\) and \(y\)


Step 1) Sort “pair of nodes”, by their “score” ( decreasing order )

Step 2) Predict top \(n\) pairs, as new links
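
a minimal sketch of this two-step pipeline, using “# of common neighbors” as the score \(c(x,y)\) ( a toy setup on NetworkX’s karate-club graph, not the lecture’s experiment ) :

```python
# Rank all non-edges by a link score, then predict the top n as new links.
import networkx as nx

G = nx.karate_club_graph()

# score c(x, y) = number of common neighbors of x and y
scores = {(x, y): len(list(nx.common_neighbors(G, x, y)))
          for x, y in nx.non_edges(G)}

# Step 1) sort pairs of nodes by score, in decreasing order
ranked = sorted(scores, key=scores.get, reverse=True)

# Step 2) predict the top n pairs as new links
n = 5
print(ranked[:n])
```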


  • distance-based feature
  • local neighborhood overlap
  • global neighborhood overlap


(a) distance-based feature

  • shortest-path distance between 2 nodes
  • problem
    • does not capture the degree of neighborhood overlap

figure2


(b) local neighborhood overlap

  • # of neighboring nodes, shared between 2 nodes
  • example
    • 1) common neighbors : \(\mid N\left(v_{1}\right) \cap N\left(v_{2}\right) \mid\)
    • 2) Jaccard’s coefficient : \(\frac{ \mid N\left(v_{1}\right) \cap N\left(v_{2}\right) \mid }{ \mid N\left(v_{1}\right) \cup N\left(v_{2}\right) \mid }\)
    • 3) Adamic-Adar index : \(\sum_{u \in N\left(v_{1}\right) \cap N\left(v_{2}\right)} \frac{1}{\log \left(k_{u}\right)}\)
      • \(k_u\) : degree of node \(u\)
      • down-weights high-degree shared neighbors ( common hubs are less informative )
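
all three scores are available in NetworkX; a quick sketch ( each built-in yields \((u, v, \text{score})\) triples ) :

```python
# Common neighbors, Jaccard's coefficient, and Adamic-Adar via NetworkX.
import networkx as nx

G = nx.karate_club_graph()
pair = [(0, 33)]                                         # one candidate pair

cn = len(list(nx.common_neighbors(G, 0, 33)))            # 1) common neighbors
_, _, jc = next(iter(nx.jaccard_coefficient(G, pair)))   # 2) Jaccard
_, _, aa = next(iter(nx.adamic_adar_index(G, pair)))     # 3) Adamic-Adar

print(cn, round(jc, 3), round(aa, 3))
```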


Limitation :

  • zero, when 2 nodes don’t share any neighbors

    ( but may have potential to be connected! )

  • thus, use GLOBAL neighborhood overlap

    ( by considering “ENTIRE” graph )


figure2


(c) global neighborhood overlap

(intro) Katz index

  • # of paths of all lengths between 2 nodes
  • how? by calculating “Powers of graph Adjacency Matrix”
  • Theory
    • \(\boldsymbol{P}_{u v}^{(K)}\) = # of paths of length \(\boldsymbol{K}\) between \(\boldsymbol{u}\) and \(\boldsymbol{v}\)
    • \(P^{(K)}=A^{K}\).
  • ex) \(A^2_{uv}\) : # of paths of length 2, between \(u\) & \(v\)


figure2


Katz index

  • sum over all path lengths

  • \(S_{v_{1} v_{2}}=\sum_{l=1}^{\infty} \beta^{l} \boldsymbol{A}_{v_{1} v_{2}}^{l}\).
    • \(\beta\) : discount factor ( \(0<\beta<1\) ; the sum converges when \(\beta < 1/\lambda_{\max}\) )
    • \(\boldsymbol{A}_{v_{1} v_{2}}^{l}\) : # paths of length \(l\), between \(v_1\) & \(v_2\)
  • closed form solution :
    • \(\boldsymbol{S}=\sum_{i=1}^{\infty} \beta^{i} \boldsymbol{A}^{i}=\underbrace{(\boldsymbol{I}-\beta \boldsymbol{A})^{-1}}_{=\sum_{i=0}^{\infty} \beta^{i} A^{i}}-\boldsymbol{I}\).
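
a minimal numpy sketch of the closed form ( assuming \(\beta < 1/\lambda_{\max}\), which makes the geometric series converge ) :

```python
# Katz index matrix: S = (I - beta * A)^{-1} - I.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

lam_max = np.linalg.eigvalsh(A).max()   # A is symmetric for undirected graphs
beta = 0.9 / lam_max                    # any beta < 1 / lambda_max converges

I = np.eye(A.shape[0])
S = np.linalg.inv(I - beta * A) - I     # S[u, v] = Katz index of pair (u, v)

print(round(S[0, 33], 4))
```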


2-3. Traditional Feature-based Methods : Graph

Goal : want feature that considers structure of ENTIRE GRAPH

\(\rightarrow\) use Kernel Methods


1) Kernel Methods

Key Idea :

  • design “kernels”, instead of “feature vectors”


Kernel & Kernel matrix

  • kernel \(K(G,G')\) : similarity between two data points ( here, two graphs )
    • \[K(G,G') = \phi(G)^T\phi(G')\]
  • kernel matrix \(K=(K(G,G'))_{G,G'}\) : must be positive semi-definite
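
a tiny numpy illustration of why \(K = \Phi \Phi^T\) is positive semi-definite ( random feature vectors, purely for illustration ) :

```python
# Stack feature vectors phi(G) as rows of Phi; then K = Phi @ Phi.T is PSD.
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.random((5, 8))            # 5 "graphs", each with an 8-dim phi(G)

K = Phi @ Phi.T                     # K[i, j] = phi(G_i)^T phi(G_j)
eigvals = np.linalg.eigvalsh(K)

print(np.all(eigvals >= -1e-10))    # all eigenvalues (numerically) >= 0
```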


2) Graph kernels

measures similarity between 2 graphs

  • 1) Graphlet Kernel
  • 2) Weisfeiler-Lehman Kernel


Goal : design graph feature vector \(\phi(G)\)

Key Idea : Bag-of-Words ( BoW ), plus a little extra, for a graph


(1) simple version : Bag-of-Words

  • just treat nodes as words

  • example )

    figure2


(2) Bag-of-“Node degrees”

  • example )

    figure2
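
a minimal sketch of this feature : a fixed-length histogram of node degrees ( the bin cap max_deg is my own choice, to get a fixed vector length ) :

```python
# Bag-of-"node degrees": phi(G)[d] = number of nodes with degree d.
import numpy as np
import networkx as nx

def degree_histogram_feature(G, max_deg):
    phi = np.zeros(max_deg + 1)
    for _, d in G.degree():
        phi[min(d, max_deg)] += 1          # last bin collects degrees >= max_deg
    return phi

G = nx.karate_club_graph()
print(degree_histogram_feature(G, max_deg=5))
```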


Graphlet Kernel & Weisfeiler-Lehman Kernel

\(\rightarrow\) both use the “Bag-of-xx” concept!


3) Graphlet Kernel

  • count the # of different graphlets in a graph
  • Node-level vs Graph-level : unlike node-level graphlets, graph-level graphlets
    • (1) do not need to be connected
    • (2) are not rooted


Notation

  • \(G\) : graph
  • \(G_k = (g_1, \ldots, g_{n_k})\) : list of graphlets of size \(k\)
  • \(f_G \in \mathbb{R}^{n_k}\) : graphlet count vector
    • \(\left(\boldsymbol{f}_{G}\right)_{i}=\#\left(g_{i} \subseteq G\right) \text { for } i=1,2, \ldots, n_{k}\).
  • \(K(G,G') = f_G^T f_{G'}\) : graphlet kernel
  • figure2


Problem

  • if the sizes of \(G\) & \(G'\) differ, raw counts skew the kernel value


Solution

  • normalize each feature vector
  • \(K\left(G, G^{\prime}\right)=\boldsymbol{h}_{G}{ }^{\mathrm{T}} \boldsymbol{h}_{G^{\prime}}\).
    • where \(\boldsymbol{h}_{G}=\frac{\boldsymbol{f}_{G}}{\operatorname{Sum}\left(\boldsymbol{f}_{G}\right)}\).
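
a minimal sketch of the normalized kernel, assuming the graphlet count vectors \(f_G\) are already computed ( the counting itself is the expensive part ) :

```python
# Normalized graphlet kernel: K(G, G') = h_G^T h_G', with h_G = f_G / Sum(f_G).
import numpy as np

def graphlet_kernel(f_G, f_Gp):
    h_G, h_Gp = f_G / f_G.sum(), f_Gp / f_Gp.sum()
    return h_G @ h_Gp

f1 = np.array([30.0, 12.0, 4.0])       # toy counts of 3 graphlet types
f2 = 10 * f1                           # same proportions, 10x larger graph

# normalization removes the size skew: both kernels agree
print(np.isclose(graphlet_kernel(f1, f2), graphlet_kernel(f1, f1)))  # True
```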


Limitation

  • too expensive to count graphlets!

    ( counting size-\(k\) graphlets in an \(n\)-node graph by enumeration takes \(O(n^k)\) time )

  • how to design more efficient kernel?

    \(\rightarrow\) Weisfeiler-Lehman Kernel


4) Weisfeiler-Lehman Kernel

Goal :

  • design an “EFFICIENT” graph feature descriptor \(\phi(G)\)


Key Idea

  • use neighborhood structure to iteratively enrich node vocabulary
  • ( = generalized version of Bag of “node-degrees” )


Algorithm : COLOR REFINEMENT


Color Refinement

  • (notation) graph \(G\) with nodes \(V\)

  • process

    • step 1) assign initial color \(c^{(0)}(v)\) to each node \(v\)

    • step 2) iteratively refine node colors…

      • \(c^{(k+1)}(v)=\operatorname{HASH}\left(\left\{c^{(k)}(v),\left\{c^{(k)}(u)\right\}_{u \in N(v)}\right\}\right)\).

      • HASH : mapping function

        ( maps different inputs to different colors )

    • after \(K\) steps of color refinement,

      \(c^{(K)}(v)\) summarizes the structure of \(v\)’s \(K\)-hop neighborhood

figure2
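
a minimal sketch of \(K\)-step color refinement; “colors” are plain integers, and HASH is emulated by interning each ( own color, sorted neighbor colors ) signature into a fresh id :

```python
# Color refinement: iteratively hash (own color, neighbor colors) to new ids.
import networkx as nx
from collections import Counter

def color_refinement(G, K):
    colors = {v: 0 for v in G}                 # step 1) uniform initial color
    for _ in range(K):                         # step 2) K refinement rounds
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in G.neighbors(v))))
               for v in G}
        table = {}                             # HASH: signature -> fresh color id
        colors = {v: table.setdefault(sig[v], len(table)) for v in G}
    return colors

# WL graph feature = "bag of colors" after K steps
G = nx.karate_club_graph()
print(Counter(color_refinement(G, K=3).values()))
```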


5) Summary of graph kernels

Graphlet Kernel

  • summary
    • graph = “bag of GRAPHLETS”
  • disadvantage
    • computationally expensive


Weisfeiler-Lehman Kernel

  • summary
    • apply \(K\)-step color refinement
    • graph = “bag-of-COLORS”
    • closely related to GNN
  • advantages
    • computationally efficient

    • only colors need to be tracked

      \(\rightarrow\) # of colors : at most # of nodes

    • time complexity : linear in # of edges


Summary

Traditional ML

  • hand-designed features + ML model


Hand-designed features :

  • 1) node-level
    • node degree
    • node centrality
    • clustering coefficients
    • graphlets
  • 2) link-level
    • distance-based feature
    • local & global neighborhood overlap
  • 3) graph-level
    • graphlet kernel
    • WL kernel
