node2vec: Scalable Feature Learning for Networks

1. Introduction

  • semi-supervised algorithm for scalable feature learning in networks
  • "learning continuous feature representations for nodes in a network" ( i.e., mapping nodes into a low-dimensional space )
    ( classical approaches : PCA, Multi-Dimensional Scaling )
  • uses a 2nd-order random walk to generate network neighborhoods ( contribution : defining a flexible notion of a node's network neighborhood )
  • learns feature representations not only of nodes, but also of "edges"

[ KEY CONTRIBUTION ]

  1. efficient, scalable algorithm for feature learning in networks (optimized with SGD)
  2. provides flexibility in discovering representations
  3. extends node2vec from 'nodes' -> 'edges' ( see the sketch after this list )
  4. evaluates node2vec on (1) multi-label classification & (2) link prediction
    • (1) multi-label classification : predict which class(es) each node belongs to
    • (2) link prediction : predict whether there is an edge between two nodes
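
For the edge extension in contribution 3, the paper composes the two node embeddings of an edge's endpoints with simple binary operators ( average, Hadamard, weighted-L1, weighted-L2 ). A minimal numpy sketch; the function name & signature are illustrative, not the paper's API:

```python
import numpy as np

def edge_features(f_u: np.ndarray, f_v: np.ndarray, op: str = "hadamard") -> np.ndarray:
    """Combine two node embeddings f(u), f(v) into one edge representation."""
    if op == "average":
        return (f_u + f_v) / 2.0
    if op == "hadamard":
        return f_u * f_v              # element-wise product
    if op == "l1":
        return np.abs(f_u - f_v)      # weighted-L1
    if op == "l2":
        return (f_u - f_v) ** 2       # weighted-L2
    raise ValueError(f"unknown operator: {op}")
```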

2. Feature Learning Framework

  • applicable to any (un)directed & (un)weighted network
  • extend Skip-Gram architecture to networks
    ** skip-gram : https://github.com/seunghan96/datascience/blob/master/%5B%EC%97%B0%EA%B5%AC%EC%8B%A4%EC%9D%B8%ED%84%B4%5DComputing_Science_and_Engineering/1%EC%A3%BC%EC%B0%A8/DeepWalk_Online_Learning_of_Social_Representations.md
  • propose a randomized procedure that samples many different neighbors of a source node!
  • network : $G = (V, E)$
  • mapping function ( from nodes -> feature representations ) : $f : V \rightarrow \mathbb{R}^d$
    ( $d$ : number of dimensions of the feature representation )
    ( $f$ : a matrix of size $|V| \times d$ )
  • network neighborhood of node $u$ ( generated by neighborhood sampling strategy $S$ ) : $N_S(u) \subset V$
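
In code, $f$ is nothing more than a $|V| \times d$ lookup table; a tiny illustrative sketch ( sizes are arbitrary ):

```python
import numpy as np

num_nodes, d = 5, 3                  # arbitrary illustrative sizes
f = np.random.randn(num_nodes, d)    # the learnable |V| x d embedding matrix

def embed(u: int) -> np.ndarray:
    """f(u): the d-dimensional feature representation of node u."""
    return f[u]
```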

[ Objective Function ]

  • maximize the log-probability of observing a network neighborhood $N_S(u)$ for a node $u$, conditioned on its feature representation $f(u)$ :

$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$

[ Two Assumptions ]

1) Conditional Independence

  • factorize the likelihood, by assuming that the likelihoods of observing the individual neighborhood nodes are independent given the feature representation of the source :

$$\Pr(N_S(u) \mid f(u)) = \prod_{n_i \in N_S(u)} \Pr(n_i \mid f(u))$$

2) Symmetry in feature space

  • source node & neighborhood node -> have a symmetric effect on each other in feature space
  • so model the conditional likelihood of every source-neighborhood node pair as a softmax of the dot product of their features :

$$\Pr(n_i \mid f(u)) = \frac{\exp(f(n_i) \cdot f(u))}{\sum_{v \in V} \exp(f(v) \cdot f(u))}$$

with these two assumptions, the objective function can be simplified into

$$\max_f \sum_{u \in V} \Big[ -\log Z_u + \sum_{n_i \in N_S(u)} f(n_i) \cdot f(u) \Big]$$

( in the above, the per-node partition function $Z_u = \sum_{v \in V} \exp(f(u) \cdot f(v))$ is too expensive to compute for large networks! so it is approximated with negative sampling )

** negative sampling : https://github.com/seunghan96/datascience/blob/master/%5B%EC%97%B0%EA%B5%AC%EC%8B%A4%EC%9D%B8%ED%84%B4%5DComputing_Science_and_Engineering/4%EC%A3%BC%EC%B0%A8/Negative_Sampling.md
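
A minimal sketch of that negative-sampling approximation for one (u, n_i) pair; drawing negatives uniformly is a simplification here ( word2vec-style training uses a unigram-based noise distribution ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Score one true (u, n_i) pair against a few random "negative" nodes,
# instead of normalizing over all of V.
def neg_sampling_loss(f, u, n_i, num_neg=5):
    pos = np.log(sigmoid(f[n_i] @ f[u]))              # pull true neighbor closer
    negs = rng.integers(0, f.shape[0], size=num_neg)  # random nodes as negatives
    neg = np.log(sigmoid(-(f[negs] @ f[u]))).sum()    # push negatives apart
    return -(pos + neg)                               # loss to minimize with SGD
```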

3-1. Classic search strategies

  • sample neighbors of a source node as a form of local search

[ two kinds of similarity ]

1) homophily

  • highly interconnected nodes -> should be embedded closely together
  • macro view
  • infer communities based on distances
    ( ex. u & s1 : community A, s8 & s9 : community B )

2) structural equivalence

  • nodes with similar structural roles -> embedded together
  • micro view ( example with the image above : u & s6 )
  • does not emphasize connectivity!
    ( ex. u & s6 : both act as hubs of their communities )

[ two search algorithms ]

  • two sampling strategies for generating neighborhood sets $N_S(u)$ of $k$ nodes ( BFS & DFS )
    (https://i.stack.imgur.com/vm0sn.png)

a. Breadth-first Sampling (BFS)

  • neighborhood : only immediate neighbors of the source node
  • micro-view
  • for ‘structural equivalence’

b. Depth-first Sampling (DFS)

  • neighborhood : nodes sequentially sampled at increasing distance from the source node
  • macro-view
  • for ‘homophily’

    ** BFS & DFS implementation : (https://github.com/seunghan96/datascience/tree/master/Data_Structure/2.Algorithm/Graph_Algorithm)
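
A compact sketch of the two strategies on an adjacency-dict graph ( `adj` maps each node to its neighbor list; names are illustrative ):

```python
from collections import deque

def bfs_sample(adj, u, k):
    """Sample up to k neighborhood nodes around u, closest layers first."""
    seen, out, frontier = {u}, [], deque(adj[u])
    while frontier and len(out) < k:
        x = frontier.popleft()
        if x not in seen:
            seen.add(x)
            out.append(x)
            frontier.extend(adj[x])   # expand layer by layer
    return out

def dfs_sample(adj, u, k):
    """Sample up to k neighborhood nodes around u at increasing distance."""
    seen, out, stack = {u}, [], list(adj[u])
    while stack and len(out) < k:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            out.append(x)
            stack.extend(adj[x])      # keep pushing deeper first
    return out
```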

3-2. node2vec

  • interpolates between BFS & DFS
  • by developing a flexible, biased random walk procedure ( that can explore neighborhoods in both a BFS & a DFS fashion )

(1) Random Walks

  • simulate a random walk of fixed length, where $c_i$ denotes the $i$-th node in the walk ( starting at $c_0 = u$ ) :

$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \dfrac{\pi_{vx}}{Z} & \text{if } (v, x) \in E \\ 0 & \text{otherwise} \end{cases}$$

  • $\pi_{vx}$ : unnormalized transition probability between nodes $v$ & $x$
  • $Z$ : normalizing constant
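
One (first-order) step of such a walk, as a sketch ( `weights[v][x]` stands in for $\pi_{vx}$ ):

```python
import random

def walk_step(weights, v):
    """Move from v to a neighbor x with probability pi_vx / Z."""
    neighbors = list(weights[v])
    pis = [weights[v][x] for x in neighbors]
    Z = sum(pis)                                        # normalizing constant
    return random.choices(neighbors, [pi / Z for pi in pis])[0]
```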

(2) Search bias

  • the simplest way to bias the random walk : sample in proportion to edge weights -> but this cannot account for the different types of network structure
  • ( REMEMBER! homophily & structural equivalence are not in a trade-off! so, how to achieve both? )

[ Second-order random walk with 2 parameters, p & q ]

Example

  • the walk just traversed the edge $(t, v)$ ( previous state : node $t$ & current state : node $v$ )
  • to decide the next step, evaluate the transition probabilities $\pi_{vx}$ on the edges $(v, x)$ leading from $v$
  • set the unnormalized transition probability to $\pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx}$, where

$$\alpha_{pq}(t, x) = \begin{cases} \frac{1}{p} & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ \frac{1}{q} & \text{if } d_{tx} = 2 \end{cases}$$

    ( $d_{tx}$ is the shortest-path distance between nodes $t$ & $x$ -> minimum 0, maximum 2 )
    (https://www.mdpi.com/algorithms/algorithms-12-00012/article_deploy/html/images/algorithms-12-00012-g004.png)

  • parameters p & q : control how fast the walk explores & leaves the neighborhood of the starting node u
  • these two parameters allow the search to interpolate between BFS & DFS

A. Return parameter, p

  • controls the likelihood of "immediately revisiting a node in the walk"
  • high p -> less likely to sample an already-visited node ( in the following two steps )
  • low p -> keeps the walk "local", i.e., close to the starting node

B. In-out parameter, q

  • allows the search to differentiate between 'inward' & 'outward' nodes
  • q > 1 : biased toward nodes close to node 't' ( BFS-like behavior )
  • q < 1 : inclined to visit nodes further from node 't' => encourages exploration ( DFS-like behavior )
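
Putting the pieces together, a sketch of one biased (second-order) step under the $\alpha_{pq}$ rule above ( `adj` & `weights` are the same illustrative structures used in the earlier sketches ):

```python
import random

def alpha(p, q, t, x, adj):
    """Bias factor alpha_pq(t, x), keyed on the distance d_tx in {0, 1, 2}."""
    if x == t:            # d_tx = 0 : stepping back to t
        return 1.0 / p
    if x in adj[t]:       # d_tx = 1 : x is also a neighbor of t
        return 1.0
    return 1.0 / q        # d_tx = 2 : moving away from t

def biased_step(adj, weights, t, v, p, q):
    """Came from t, currently at v: pick the next node with pi_vx = alpha * w_vx."""
    neighbors = list(adj[v])
    pis = [alpha(p, q, t, x, adj) * weights[v][x] for x in neighbors]
    Z = sum(pis)
    return random.choices(neighbors, [pi / Z for pi in pis])[0]
```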

(3) node2vec algorithm

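The full algorithm simulates r biased walks of fixed length l from every node, then optimizes the skip-gram objective with SGD over the walks. A hedged end-to-end sketch, reusing `walk_step` & `biased_step` from the sketches above; gensim's Word2Vec is my stand-in for the SGD step, and the default values ( d = 128, r = 10, l = 80, context window 10 ) follow the paper's experimental setup:

```python
from gensim.models import Word2Vec   # stand-in for the skip-gram SGD step

def node2vec_walk(adj, weights, u, length, p, q):
    """Simulate one biased walk of the given length starting at u."""
    walk = [u, walk_step(weights, u)]         # the very first step has no predecessor
    while len(walk) < length:
        t, v = walk[-2], walk[-1]
        walk.append(biased_step(adj, weights, t, v, p, q))
    return walk

def learn_features(adj, weights, d=128, r=10, length=80, p=1.0, q=1.0):
    """r walks per node -> treat each walk as a 'sentence' of node ids."""
    walks = [node2vec_walk(adj, weights, u, length, p, q)
             for _ in range(r) for u in adj]
    model = Word2Vec([[str(n) for n in w] for w in walks],
                     vector_size=d, window=10, sg=1, negative=5, min_count=0)
    return model.wv                           # node id (as str) -> d-dim vector
```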