Node2vec : Scalable Feature Learning for Networks
1. Introduction
- semi-supervised algorithm for scalable feature learning in networks
- “learning continuous feature representations for nodes in network” ( into low dimension )
( classical approach : PCA, Multi-Dimensional Scaling ) - use 2nd order random walk approach to generate network neighborhoods ( contribution : defning a flexible notion of a node’s network neighborhood )
- use feature representations not only of nodes, but also “edges”
[ KEY CONTRIBUTION ]
- efficient scalable algorithm for feature learning in networks (using SGD)
- provides flexibilty in discovering representations
- extend node2vec from ‘nodes’ -> ‘edges’
- evaluate node2vec for (1) multi-label classification & (2) link prediction
- (1) multi-label classification : which class does each node belongs to
- (2) link prediction : predict whether there is a connection between two nodes
2. Feature learning Framework
- applicable to any (un)directed & (un)weighted network
- extend Skip-Gram architecture to networks
** skip-gram : https://github.com/seunghan96/datascience/blob/master/%5B%EC%97%B0%EA%B5%AC%EC%8B%A4%EC%9D%B8%ED%84%B4%5DComputing_Science_and_Engineering/1%EC%A3%BC%EC%B0%A8/DeepWalk_Online_Learning_of_Social_Representations.md - propose a randomized procedure that samples many different neighbors of a source node!
- network :
mapping function (from nodes -> feature representations) :
( d : number of dimension of feature representation )
( : matrix size of )
network neighborhood of node u ( generated by neighborhood sampling strategy S ):
[ Objective Function ]
- maximize log-probability of observing a network neighborhood () for a node u, conditioned on its feature representation( = f(u) )
[ Two Assumptions ]
1) Conditional Independence
- factorize the likelihood, by assuming independence among likelihood of obesrving a neighborhood nodes
2) Symmetry in feature space
- source node & neighborhood node -> have symmetric effect over each other in feature space
- conditional likelihood of source-neighborhood node pair ( as a softmax ) :
with these two assumption, the objective function can be simplified into
( in the above, is too expensive to compute! use negative sampling )
** negative sampling : https://github.com/seunghan96/datascience/blob/master/%5B%EC%97%B0%EA%B5%AC%EC%8B%A4%EC%9D%B8%ED%84%B4%5DComputing_Science_and_Engineering/4%EC%A3%BC%EC%B0%A8/Negative_Sampling.md
3-1. Classic search strategies
- sample neighbors of a source node as a form of local search
[ two kind of similarities ]
1) homophily
- highly interconnected -> should be embedded closely
- macro view
- infer communities based on distances
( ex. u & s1 : community A, s8 & s9 : community B )
2) structural equivalence - similar structural roles -> embedded together
- micro view( example with the image above : u & s6 )
- does not emphasize connectivity!
( ex. u & s6 : act as hubs of communities )
[ two search algorithms ]
- two sampling strategies for generating neighborhood sets N(s) of k nodes ( BFS & DFS )
(https://i.stack.imgur.com/vm0sn.png)
a. Breadth-first Sampling(BFS)
- neighborhood : only immediate neighbors of the source node
- micro-view
- for ‘structural equivalence’
b. Depth-first Sampling(DFS)
- neighborhood : nodes sequentially sampled at increasing distance from the source node
- macro-view
- for ‘homophily’
** BFS & DFS implementation : (https://github.com/seunghan96/datascience/tree/master/Data_Structure/2.Algorithm/Graph_Algorithm)
3-2. node2vec
- interporlate between BFS & DFS
- by developing a flexible biased random walk procedure ( that can explore neighborhoods both in BFS & DFS fashion )
(1) Random Walks
(2) Search bias
- simple way to bias randm walk : based on edge weights -> hard to find different types of network structure
- ( REMEMBER! homophily & structure equivalence are not in trade-off ! So, how to achieve both? )
[ Second order random walk with 2 parameters, p & q ]
Example
- passed the edge (t,v) just before! ( previous state : node t & current state : node v )
- to decide next step, evaluate transition probability
-
set the unnormalized transition probability to
</a> is the shortest path distance between node t & x -> minimum 0, maximum 2) </a> https://www.mdpi.com/algorithms/algorithms-12-00012/article_deploy/html/images/algorithms-12-00012-g004.png - parameters p & q : control how fast the random walk explores!
- these two parameters allow the search to interpolate between BFS & DFS
A. Return parameter, p
- controls the likelihood of “immediately revisiting a node in the walk”
- high p -> less likely to sample the pre-visited nodes ( in the following two steps )
- low p -> keep the walk “local” close to the starting node
B. In-out parameter, q
- allows the search to differentiate between ‘inward & outward’ nodes
- q > 1 : to nodes close to node ‘t’
- q < 1 : get further from node ‘t’ => encourages exploration
(3) node2vec algorithm
</a>