Random walks point of view
Another line of argument to explain spectral clustering is based on random walks on the similarity graph. A random walk on a graph is a stochastic process which randomly jumps from vertex to vertex. We will see below that spectral clustering can be interpreted as trying to find a partition of the graph such that the random walk stays long within the same cluster and seldom jumps between clusters. Intuitively this makes sense, in particular together with the graph cut explanation of the last section: a balanced partition with a low cut will also have the property that the random walk does not have many opportunities to jump between clusters. Formally, the transition probability of jumping in one step from vertex $v_i$ to vertex $v_j$ is proportional to the edge weight $w_{ij}$ and is given by $p_{ij} := w_{ij}/d_i$. The transition matrix $P = (p_{ij})_{i,j=1,\dots,n}$ of the random walk is thus defined by
$$P = D^{-1} W.$$
If the graph is connected and non-bipartite, then the random walk always possesses a unique stationary distribution $\pi = (\pi_1, \dots, \pi_n)'$, where $\pi_i = d_i / \operatorname{vol}(V)$. Obviously there is a tight relationship between $L_{rw}$ and $P$, as $L_{rw} = I - P$. As a consequence, $\lambda$ is an eigenvalue of $L_{rw}$ with eigenvector $u$ if and only if $1 - \lambda$ is an eigenvalue of $P$ with eigenvector $u$. It is well known that many properties of a graph can be expressed in terms of the corresponding random walk transition matrix $P$. From this point of view it does not come as a surprise that the largest eigenvectors of $P$ and the smallest eigenvectors of $L_{rw}$ can be used to describe cluster properties of the graph.
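These relations are easy to check numerically. The following sketch (assuming NumPy; the similarity weights are a made-up toy example) builds $P = D^{-1}W$ for a small graph and verifies both the stationary distribution $\pi_i = d_i/\operatorname{vol}(V)$ and the eigenvalue correspondence between $P$ and $L_{rw} = I - P$:

```python
import numpy as np

# Small symmetric similarity graph (hypothetical toy weights).
W = np.array([[0.0, 0.8, 0.6, 0.1],
              [0.8, 0.0, 0.7, 0.0],
              [0.6, 0.7, 0.0, 0.2],
              [0.1, 0.0, 0.2, 0.0]])

d = W.sum(axis=1)                    # degrees d_i
P = np.diag(1.0 / d) @ W             # transition matrix P = D^{-1} W
L_rw = np.eye(len(d)) - P            # random-walk Laplacian L_rw = I - P

# Stationary distribution: pi_i = d_i / vol(V); it is invariant under P.
pi = d / d.sum()
assert np.allclose(pi @ P, pi)

# lambda is an eigenvalue of L_rw  <=>  1 - lambda is an eigenvalue of P.
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_L = np.sort(1.0 - np.linalg.eigvals(L_rw).real)
assert np.allclose(eig_P, eig_L)
```

Note that $P$ is not symmetric, but it is similar to the symmetric matrix $D^{-1/2} W D^{-1/2}$, so all its eigenvalues are real.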
Random walks and Ncut
Proposition 5 (Ncut via transition probabilities) Let $G$ be connected and non-bipartite. Assume that we run the random walk $(X_t)_{t \in \mathbb{N}}$ starting with $X_0$ in the stationary distribution $\pi$. For disjoint subsets $A, B \subset V$, denote by $P(B \mid A) := P(X_1 \in B \mid X_0 \in A)$. Then:
$$\operatorname{Ncut}(A, \bar{A}) = P(\bar{A} \mid A) + P(A \mid \bar{A}).$$
This proposition leads to a nice interpretation of Ncut, and hence of normalized spectral clustering. It tells us that when minimizing Ncut, we actually look for a cut through the graph such that a random walk seldom transitions from $A$ to $\bar{A}$ and vice versa.

The commute distance
A second connection between random walks and graph Laplacians can be made via the commute distance on the graph. The commute distance (also called resistance distance) $c_{ij}$ between two vertices $v_i$ and $v_j$ is the expected time it takes the random walk to travel from vertex $v_i$ to vertex $v_j$ and back. The commute distance has several nice properties which make it particularly appealing for machine learning. As opposed to the shortest path distance on a graph, the commute distance between two vertices decreases if there are many different short ways to get from vertex $v_i$ to vertex $v_j$. So instead of just looking for the one shortest path, the commute distance looks at the whole set of short paths. Points which are connected by a short path in the graph and lie in the same high-density region of the graph are considered closer to each other than points which are connected by a short path but lie in different high-density regions of the graph. In this sense, the commute distance seems particularly well-suited to be used for clustering purposes.
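Although not spelled out above, a well-known identity expresses the commute distance through the Moore-Penrose pseudoinverse $L^{+}$ of the unnormalized Laplacian: $c_{ij} = \operatorname{vol}(V)\,(l^{+}_{ii} + l^{+}_{jj} - 2\, l^{+}_{ij})$. A minimal sketch with NumPy on a hypothetical toy graph, where vertices 0-2 form a tightly connected region and vertex 3 is only weakly attached:

```python
import numpy as np

# Hypothetical toy similarity graph: vertices 0-2 are densely connected,
# vertex 3 hangs off the cluster with weak edges.
W = np.array([[0.0, 0.9, 0.8, 0.1],
              [0.9, 0.0, 0.9, 0.1],
              [0.8, 0.9, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])

d = W.sum(axis=1)
L = np.diag(d) - W              # unnormalized graph Laplacian L = D - W
L_pinv = np.linalg.pinv(L)      # Moore-Penrose pseudoinverse L+
vol = d.sum()                   # vol(V) = sum of all degrees

def commute_distance(i, j):
    """Expected round-trip time of the random walk between vertices i and j:
    c_ij = vol(V) * (L+_ii + L+_jj - 2 * L+_ij)."""
    return vol * (L_pinv[i, i] + L_pinv[j, j] - 2 * L_pinv[i, j])

# The weakly attached vertex 3 is much farther (in commute distance)
# from vertex 0 than the well-connected vertex 1 is.
assert commute_distance(0, 3) > commute_distance(0, 1)
```

Because vertices 0 and 1 are joined by several strong parallel paths while vertex 3 is reached only through low-weight edges, the commute distance reflects exactly the many-short-paths behavior described above.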
Perturbation theory point of view