[ 1. Introduction : ML for Graphs ]Permalink
( 참고 : CS224W: Machine Learning with Graphs )
ContentsPermalink
-
1-1. Why Graphs
-
1-2. Applications of Graph ML
-
1-3. Choice of Graph Representation
1-1. Why GraphsPermalink
1) GraphsPermalink
= general language for describing & analyzing entities with…
- 1) “relations”
- 2) “interactions”
Lots of data are “graphs”!
- ex) computer network, event graphs,…
2) Types of networks & graphsPermalink
Networks ( = Natural Graphs )
- 1) Social networks ( society )
- 2) Communication & Transactions
- 3) Biomedicine ( ex. genes )
- 4) Brain connections ( ex. thoughts )
Graphs ( = representation )
- 1) information/knowledge
- 2) similarity networks
- 3) relational structures
Relational Graphs
-
model better relationships between entities!
-
use DL for complex modeling
3) Networks are complexPermalink
Why is it complex?
- arbitrary size & complex structure
- no fixed node ordering or reference point
- dynamic & multi-modal features
4) DL in graphsPermalink
[ Input ]
- Network
[ Output ] (Prediction)
- 1) node labels
- 2) new links
- 3) generated graphs / subgraphs
Key point : “Representation Learning”
-
“automatically” learn features with DNN
-
map to d-dim embeddings
- similar nodes = similar in embedded space
5) OutlinePermalink
- 1) Traditional methods
- Graphlets, Graph Kernels
- 2) Node Embeddings
- DeepWalk, Node2vec
- 3) GNN
- GCN, GraphSAGE, GAT
- 4) Knowledge graph & reasoning
- TransE, BetaE
-
5) Deep Generative models ( for graphs )
- 6) Applications
1-2. Applications of Graph MLPermalink
1) Diverse TasksPermalink
- 1) Node classification
- ex) categorizing users/items
- 2) Link prediction
- ex) knowledge graph completion
- 3) Graph Classification
- ex) Molecule property prediction
- 4) Clustering
- ex) community detection
- 5) etc
- Graph generation, Graph evolution ..
Diverse level of tasksPermalink
2) Example of “NODE-level” MLPermalink
ex) Protein FoldingPermalink
- Protein
- protein= sequence of amino acid
- 3d structure
- interact with each other
- Goal : “predict 3D structure”, based on “amino acid sequence”
- key idea of AlphaFold : “spatial graph”
- (1) node : amino acids
- (2) edges : proximity between nodes
3) Example of “EDGE-level” MLPermalink
ex) Recommender SystemsPermalink
- Formulation
- (1) node : user & items
- (2) edge : user & item interaction
-
Goal : “Recommend item to users”
( predict whether 2 nodes are related )
ex) Drug Side EffectsPermalink
- Background : many patients & many drugs
-
Goal : predict adverse side effects of “pair of drugs”
- Formulation
- (1) node : drugs & proteins
- (2) edges : interactions
- drug-protein interaction
- protein-protein interaction
- drug-drug interaction
4) Example of “SUBGRAPH-level” MLPermalink
ex) Traffic PredictionPermalink
- want to move from A to B…. how long will it take?
- Formulation
- (1) node : road segments
- (2) edges : connectivity between nodes
- ex) Google Maps : traffic prediction with GNN
5) Examples of “GRAPH-level” MLPermalink
ex) Drug DiscoveryPermalink
- Antibiotics = small molecular graphs
- Formulation
- (1) node : atoms
- (2) edges : chemical bonds
- (Q) Which molecules should be prioritized?
- ex) graph classification model
- predict promising molecules among candidates
ex) Graph generationPermalink
- generate novel molecules ( new structure )
- with “high drug likeness”
- with “desirable properties”
ex) Physics SimulationPermalink
- Formulation
- (1) node : particles
- (2) edges : interaction between nodes
- Goal : predict how a graph will EVOLVE
1-3. Choice of Graph RepresentationPermalink
1) Components of NetworkPermalink
-
(1) Objects : nodes, vertices ( notation : N )
-
(2) Interactions : links, edges ( notation : E )
-
(3) System : network, graph ( notation : G(N,E) )
Notation :
- ∣N∣ : number of nodes
- ∣E∣ : number of edges
[ Question ]
Which one is proper representation??
How to build a graph?
- 1) what are nodes?
- 2) what are edges?
[ Answer ]
- differ by domain/tasks!
2) Directed & Undirected GraphsPermalink
Directed
- links = arcs
- ex) phone calls, following in INSTAGRAM
Undirected
- links = symmetrical, reciprocal
- ex) collaborations, friendship in FB
3) Node DegreePermalink
Node degree ( ki )
- # of edges, adjacent to node i
Average degree
- ˉk=⟨k⟩=1N∑Ni=1ki=2EN.
Directed NetworkPermalink
In-degree & Out-degree
4) Bipartite GraphPermalink
graph, whose nodes can be divided into “2 disjoint sets U, V”, where …
every link connects a node in U to one in V
( = U & V : independent sets )
ex) User-Movie rating
Folded / Projected Bipartite GraphsPermalink
5) Adjacency MatrixPermalink
Different ways of representing graphs :
-
1) adjacency matrix
-
2) edge list & adjacency list
Adjacency Matrix
n×n matrix ( n : number of nodes )
- connected = 1
- disconnected = 0
( Undirected Graph : symmetric adjacency matrix )
Problem : Adjacency matrices are SPARSE!
6) Edge list & Adjacency listPermalink
Different ways of representing graphs :
-
1) adjacency matrix
-
2) edge list & adjacency list
(a) Edge list
- ex) (2,3), (2,4), (3,2), (3,4) ,… (5,2)
(b) Adjacency list
- easier when network is large & sparse
- ex)
- 1 :
- 2 : 3,4
- 3 : 2,4
- 4 : 5
- 5 : 1,2
7) Node & Edge ATTRIBUTESPermalink
ex) weight, ranking, type, sign, …
8) More Types of GraphsPermalink
unweighted & weighted
- unweighted ( 1 or 0 )
- weighted ( not binary )
self-edges & multi-graph
- self-edges ( self-loops )
- multi-graph
- multiple edges between pair of nodes
connected & disconnected graph
strongly & weakly connected graph
- strongly : both A-B & B-A path
- weakly : disregard edge direction
Strongly connected components (SCCs)