Graph Data Structure
Many machine learning algorithms (e.g. Isomap) employ graph data structures internally. Graphs are mathematical structures used to model pairwise relations between objects from a certain collection. A graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices (also called nodes or points), and the links that connect some pairs of vertices are called edges (also called lines). The edges may be directed (asymmetric) or undirected (symmetric).
There are different ways to store graphs in a computer system. The data
structure used depends on both the graph structure and the algorithm
used for manipulating the graph. Theoretically one can distinguish between
list and matrix structures but in concrete applications the best structure
is often a combination of both. List structures are often preferred for
sparse graphs as they have smaller memory requirements. Matrix structures
on the other hand provide faster access for some applications but can
consume huge amounts of memory. In Smile, we support both adjacency list
(class AdjacencyList
) and adjacency matrix
(class AdjacencyMatrix
).
In adjacency list, each vertex has a list of which vertices it is adjacent to. This causes redundancy in an undirected graph. Adjacency queries are faster, at the cost of extra storage space.
An adjacency matrix is an n by n matrix A, where n is the number of vertices in the graph. If there is an edge from a vertex x to a vertex y, then the element A(x,y) is 1 (or in general the number of edges or a weight), otherwise it is 0. In computing, this matrix makes it easy to find subgraphs, and to reverse a directed graph.
Both AdjacencyList
and AdjacencyMatrix
implement
the abstract interface Graph
. The basic operation on graph is
traversal, i.e. to visit the vertices in some systematic order.
Typically, we have breadth first search (BFS) and depth first search (DFS).
Both of these construct spanning trees with certain properties useful
in other graph algorithms. In Graph
, the methods bfs()
and dfs()
returns the connected components of graph. In addition,
the overloaded bfs(Visitor)
and dfs(Visitor)
can take a user-define Visitor
object to perform specific
operations when visiting a vertex.
import smile.graph.*;
var graph = new AdjacencyList(8);
graph.addEdge(0, 2);
graph.addEdge(1, 7);
graph.addEdge(2, 6);
graph.addEdge(7, 4);
graph.addEdge(3, 4);
graph.addEdge(3, 5);
graph.addEdge(5, 4);
graph.dfs();
With DFS or BFS, we can also obtain the topological ordering of a directed graph, which is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG).
var graph = new AdjacencyList(13, true);
graph.addEdge(8, 7);
graph.addEdge(7, 6);
graph.addEdge(0, 1);
graph.addEdge(0, 2);
graph.addEdge(0, 3);
graph.addEdge(0, 5);
graph.addEdge(0, 6);
graph.addEdge(2, 3);
graph.addEdge(3, 4);
graph.addEdge(3, 5);
graph.addEdge(6, 4);
graph.addEdge(6, 9);
graph.addEdge(4, 9);
graph.addEdge(9, 10);
graph.addEdge(9, 11);
graph.addEdge(9, 12);
graph.addEdge(11, 12);
graph.sortdfs(); // DFS
graph.sortbfs(); // BFS
Graph traversal can be used to compute the topological ordering of graph
by the methods sortbfs
or sortdfs
. The topological
ordering of a directed graph to be one in which, whenever we have an edge
from x
to y
, the ordering visits x
before y
. Note that a graph has a topological ordering if
and only if it is a directed acyclic graph.
With the method dijkstra()
, we can calculate the shortest path
from a source to all other vertices in the graph by Dijkstra algorithm.
Many manifold algorithms employ the shortest path among the samples as
a similarity measure instead of pairwise distance.
var graph = new AdjacencyMatrix(6, true);
graph.addEdge(0, 1, 0.41);
graph.addEdge(1, 2, 0.51);
graph.addEdge(2, 3, 0.50);
graph.addEdge(4, 3, 0.36);
graph.addEdge(3, 5, 0.38);
graph.addEdge(3, 0, 0.45);
graph.addEdge(0, 5, 0.29);
graph.addEdge(5, 4, 0.21);
graph.addEdge(1, 4, 0.32);
graph.addEdge(4, 2, 0.32);
graph.addEdge(5, 1, 0.29);
graph.dijkstra();