Graph Data Structure

Many machine learning algorithms (e.g. Isomap) employ graph data structures internally. Graphs are mathematical structures used to model pairwise relations between objects from a certain collection. A graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices (also called nodes or points), and the links that connect some pairs of vertices are called edges (also called lines). The edges may be directed (asymmetric) or undirected (symmetric).

There are different ways to store graphs in a computer system. The data structure used depends on both the graph structure and the algorithm used for manipulating the graph. Theoretically one can distinguish between list and matrix structures but in concrete applications the best structure is often a combination of both. List structures are often preferred for sparse graphs as they have smaller memory requirements. Matrix structures on the other hand provide faster access for some applications but can consume huge amounts of memory. In Smile, we support both adjacency list (class AdjacencyList) and adjacency matrix (class AdjacencyMatrix).

In adjacency list, each vertex has a list of which vertices it is adjacent to. This causes redundancy in an undirected graph. Adjacency queries are faster, at the cost of extra storage space.

An adjacency matrix is an n by n matrix A, where n is the number of vertices in the graph. If there is an edge from a vertex x to a vertex y, then the element A(x,y) is 1 (or in general the number of edges or a weight), otherwise it is 0. In computing, this matrix makes it easy to find subgraphs, and to reverse a directed graph.

Both AdjacencyList and AdjacencyMatrix implement the abstract interface Graph. The basic operation on graph is traversal, i.e. to visit the vertices in some systematic order. Typically, we have breadth first search (BFS) and depth first search (DFS). Both of these construct spanning trees with certain properties useful in other graph algorithms. In Graph, the methods bfs() and dfs() returns the connected components of graph. In addition, the overloaded bfs(Visitor) and dfs(Visitor) can take a user-define Visitor object to perform specific operations when visiting a vertex.

    import smile.graph.*;

    var graph = new AdjacencyList(8);
    graph.addEdge(0, 2);
    graph.addEdge(1, 7);
    graph.addEdge(2, 6);
    graph.addEdge(7, 4);
    graph.addEdge(3, 4);
    graph.addEdge(3, 5);
    graph.addEdge(5, 4);


With DFS or BFS, we can also obtain the topological ordering of a directed graph, which is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG).

    var graph = new AdjacencyList(13, true);
    graph.addEdge(8, 7);
    graph.addEdge(7, 6);
    graph.addEdge(0, 1);
    graph.addEdge(0, 2);
    graph.addEdge(0, 3);
    graph.addEdge(0, 5);
    graph.addEdge(0, 6);
    graph.addEdge(2, 3);
    graph.addEdge(3, 4);
    graph.addEdge(3, 5);
    graph.addEdge(6, 4);
    graph.addEdge(6, 9);
    graph.addEdge(4, 9);
    graph.addEdge(9, 10);
    graph.addEdge(9, 11);
    graph.addEdge(9, 12);
    graph.addEdge(11, 12);

    graph.sortdfs(); // DFS
    graph.sortbfs(); // BFS

Graph traversal can be used to compute the topological ordering of graph by the methods sortbfs or sortdfs. The topological ordering of a directed graph to be one in which, whenever we have an edge from x to y, the ordering visits x before y. Note that a graph has a topological ordering if and only if it is a directed acyclic graph.

With the method dijkstra(), we can calculate the shortest path from a source to all other vertices in the graph by Dijkstra algorithm. Many manifold algorithms employ the shortest path among the samples as a similarity measure instead of pairwise distance.

    var graph = new AdjacencyMatrix(6, true);
    graph.addEdge(0, 1, 0.41);
    graph.addEdge(1, 2, 0.51);
    graph.addEdge(2, 3, 0.50);
    graph.addEdge(4, 3, 0.36);
    graph.addEdge(3, 5, 0.38);
    graph.addEdge(3, 0, 0.45);
    graph.addEdge(0, 5, 0.29);
    graph.addEdge(5, 4, 0.21);
    graph.addEdge(1, 4, 0.32);
    graph.addEdge(4, 2, 0.32);
    graph.addEdge(5, 1, 0.29);

←   Interpolation NLP  →
Fork me on GitHub