This work introduces a link-based covariance measure between the nodes of a weighted, directed, graph where a cost is associated to each arc. To this end, a probability distribution on the (usually infinite) countable set of paths through the graph is defined by minimizing the sum of the expected costs between all pair of nodes while fixing the total relative entropy spread in the graph. This results in a Boltzmann distribution on the set of paths such that long (high-cost) paths occur with a low probability while short (low-cost) paths occur with a high probability. The sum-over-paths (SoP) covariance measure between nodes is then defined according to this probability distribution: two nodes are considered as highly correlated if they often co-occur together on the same -- preferably short -- paths. The resulting covariance matrix between nodes (say $n$ nodes in total) is a Gram matrix and therefore defines a valid kernel on the graph. It is obtained by inverting a n * n matrix depending on the costs assigned to the arcs. In the same spirit, a betweenness score is also defined, measuring the expected number of times a node occurs on a path. The proposed measures could be used for various graph mining tasks such as computing betweenness centrality, semi-supervised classification, visualization, etc, as shown in the experimental section.
semi-supervised classification, kernels on a graph, covariance measure, machine learning
Luh Yen, Mantrach Amin, Masashi Shimbo, Marco Saerens. (2008)
A Family of Dissimilarity Measures between Nodes Generalizing both the Shortest-Path and the Commute-time Distances.