Random walks point of view
Another line of argument to explain spectral clustering is based on random walks on the similarity graph. A random walk on a graph is a stochastic process which randomly jumps from vertex to vertex. We will see below that spectral clustering can be interpreted as trying to find a partition of the graph such that the random walk stays long within the same cluster and seldom jumps between clusters. Intuitively this makes sense, in particular together with the graph cut explanation of the last section: a balanced partition with a low cut will also have the property that the random walk does not have many opportunities to jump between clusters. Formally, the transition probability of jumping in one step from vertex $v_i$ to vertex $v_j$ is proportional to the edge weight $w_{ij}$ and is given by $p_{ij} := w_{ij}/d_i$. The transition matrix $P = (p_{ij})_{i,j=1,\dots,n}$ of the random walk is thus defined by
$$P = D^{-1} W.$$
If the graph is connected and non-bipartite, then the random walk always possesses a unique stationary distribution $\pi = (\pi_1, \dots, \pi_n)'$, where $\pi_i = d_i / \operatorname{vol}(V)$. Obviously there is a tight relationship between $L_{rw}$ and $P$, as $L_{rw} = I - P$. As a consequence, $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector $u$ if and only if $1 - \lambda$ is an eigenvalue of $P$ with eigenvector $u$. It is well known that many properties of a graph can be expressed in terms of the corresponding random walk transition matrix $P$. From this point of view it does not come as a surprise that the largest eigenvectors of $P$ and the smallest eigenvectors of $L_{rw}$ can be used to describe cluster properties of the graph.
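These relations are easy to check numerically. The following sketch (assuming NumPy; the similarity weights are a made-up toy example) builds $P = D^{-1}W$ for a small graph and verifies both the stationary distribution $\pi_i = d_i/\operatorname{vol}(V)$ and the eigenvalue correspondence between $P$ and $L_{rw} = I - P$:

```python
import numpy as np

# Small symmetric similarity graph (hypothetical toy weights).
W = np.array([[0.0, 0.8, 0.6, 0.1],
              [0.8, 0.0, 0.7, 0.0],
              [0.6, 0.7, 0.0, 0.2],
              [0.1, 0.0, 0.2, 0.0]])

d = W.sum(axis=1)                    # degrees d_i
P = np.diag(1.0 / d) @ W             # transition matrix P = D^{-1} W
L_rw = np.eye(len(d)) - P            # random-walk Laplacian L_rw = I - P

# Stationary distribution: pi_i = d_i / vol(V); it is invariant under P.
pi = d / d.sum()
assert np.allclose(pi @ P, pi)

# lambda is an eigenvalue of L_rw  <=>  1 - lambda is an eigenvalue of P.
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_L = np.sort(1.0 - np.linalg.eigvals(L_rw).real)
assert np.allclose(eig_P, eig_L)
```

Note that $P$ is not symmetric, but it is similar to the symmetric matrix $D^{-1/2} W D^{-1/2}$, so all its eigenvalues are real.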
Random walks and Ncut
Proposition 5 (Ncut via transition probabilities) Let $G$ be connected and non-bipartite. Assume that we run the random walk $(X_t)_{t \in \mathbb{N}}$ starting with $X_0$ in the stationary distribution $\pi$. For disjoint subsets $A, B \subset V$, denote by $P(B \mid A) := P(X_1 \in B \mid X_0 \in A)$. Then:
$$\operatorname{Ncut}(A, \bar{A}) = P(\bar{A} \mid A) + P(A \mid \bar{A}).$$
This proposition leads to a nice interpretation of Ncut, and hence of normalized spectral clustering. It tells us that when minimizing Ncut, we actually look for a cut through the graph such that a random walk seldom transitions from $A$ to $\bar{A}$ and vice versa.

The commute distance
A second connection between random walks and graph Laplacians can be made via the commute distance on the graph. The commute distance (also called resistance distance) $c_{ij}$ between two vertices $v_i$ and $v_j$ is the expected time it takes the random walk to travel from vertex $v_i$ to vertex $v_j$ and back. The commute distance has several nice properties which make it particularly appealing for machine learning. As opposed to the shortest path distance on a graph, the commute distance between two vertices decreases if there are many different short ways to get from vertex $v_i$ to vertex $v_j$. So instead of just looking for the one shortest path, the commute distance looks at the whole set of short paths. Points which are connected by a short path in the graph and lie in the same high-density region of the graph are considered closer to each other than points which are connected by a short path but lie in different high-density regions of the graph. In this sense, the commute distance seems particularly well-suited to be used for clustering purposes.
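Although not spelled out above, a well-known identity expresses the commute distance through the Moore-Penrose pseudoinverse $L^{+}$ of the unnormalized Laplacian: $c_{ij} = \operatorname{vol}(V)\,(l^{+}_{ii} + l^{+}_{jj} - 2\, l^{+}_{ij})$. A minimal sketch with NumPy on a hypothetical toy graph, where vertices 0-2 form a tightly connected region and vertex 3 is only weakly attached:

```python
import numpy as np

# Hypothetical toy similarity graph: vertices 0-2 are densely connected,
# vertex 3 hangs off the cluster with weak edges.
W = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.9, 0.1],
              [0.8, 0.9, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])

d = W.sum(axis=1)
L = np.diag(d) - W              # unnormalized graph Laplacian L = D - W
L_pinv = np.linalg.pinv(L)      # Moore-Penrose pseudoinverse L+
vol = d.sum()                   # vol(V) = sum of all degrees

def commute_distance(i, j):
    """Expected round-trip time of the random walk between vertices i and j:
    c_ij = vol(V) * (L+_ii + L+_jj - 2 * L+_ij)."""
    return vol * (L_pinv[i, i] + L_pinv[j, j] - 2 * L_pinv[i, j])

# The weakly attached vertex 3 is much farther (in commute distance)
# from vertex 0 than the well-connected vertex 1 is.
assert commute_distance(0, 3) > commute_distance(0, 1)
```

Because vertices 0 and 1 are joined by several strong parallel paths while vertex 3 is reached only through low-weight edges, the commute distance reflects exactly the many-short-paths behavior described above.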
Perturbation theory point of view