Graph cut point of view

The intuition of clustering is to separate points into different groups according to their similarities. For data given in the form of a similarity graph, this problem can be restated as follows: we want to find a partition of the graph such that the edges between different groups have very low weight (which means that points in different clusters are dissimilar from each other) and the edges within a group have high weight (which means that points within the same cluster are similar to each other).

Given a similarity graph with adjacency matrix $W$, the simplest and most direct way to construct a partition of the graph is to solve the mincut problem. To define it, recall the notation $W(A, B) := \sum_{i \in A, j \in B} w_{ij}$ and $\bar{A}$ for the complement of $A$. For a given number $k$ of subsets, the mincut approach simply consists in choosing a partition $A_1, \dots, A_k$ which minimizes

$$\operatorname{cut}(A_1, \dots, A_k) := \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A_i}).$$

In particular for $k = 2$, mincut is a relatively easy problem and can be solved efficiently. However, in practice it often does not lead to satisfactory partitions. The problem is that in many cases, the solution of mincut simply separates one individual vertex from the rest of the graph. One way to circumvent this is to explicitly request that the sets $A_1, \dots, A_k$ are "reasonably large". The two most common objective functions to encode this are RatioCut (Hagen and Kahng, 1992) and the normalized cut Ncut (Shi and Malik, 2000). In RatioCut, the size of a subset $A$ of the graph is measured by its number of vertices $|A|$, while in Ncut the size is measured by the weights of its edges, $\operatorname{vol}(A) := \sum_{i \in A} d_i$. The definitions are:

$$\operatorname{RatioCut}(A_1, \dots, A_k) := \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A_i})}{|A_i|} = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A_i})}{|A_i|}$$

$$\operatorname{Ncut}(A_1, \dots, A_k) := \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A_i})}{\operatorname{vol}(A_i)} = \sum_{i=1}^{k} \frac{\operatorname{cut}(A_i, \bar{A_i})}{\operatorname{vol}(A_i)}.$$
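To make these objectives concrete, here is a minimal Python sketch (an illustration added for this exposition, not code from the original text) that evaluates cut, RatioCut, and Ncut on a small made-up weighted graph:

```python
import numpy as np

def cut(W, A, B):
    """Sum of edge weights between vertex sets A and B: W(A, B)."""
    return W[np.ix_(A, B)].sum()

def ratio_cut(W, parts):
    """RatioCut: sum over parts of cut(A_i, complement) / |A_i|."""
    n = W.shape[0]
    total = 0.0
    for A in parts:
        comp = [v for v in range(n) if v not in A]
        total += cut(W, A, comp) / len(A)
    return total

def ncut(W, parts):
    """Ncut: sum over parts of cut(A_i, complement) / vol(A_i)."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    total = 0.0
    for A in parts:
        comp = [v for v in range(n) if v not in A]
        total += cut(W, A, comp) / deg[A].sum()
    return total

# Toy graph: two dense triangles joined by one weak edge of weight 0.1.
W = np.array([[0, 1, 1, 0,   0, 0],
              [1, 0, 1, 0,   0, 0],
              [1, 1, 0, 0.1, 0, 0],
              [0, 0, 0.1, 0, 1, 1],
              [0, 0, 0,   1, 0, 1],
              [0, 0, 0,   1, 1, 0]], dtype=float)
parts = [[0, 1, 2], [3, 4, 5]]
print(ratio_cut(W, parts), ncut(W, parts))
```

The division by $|A_i|$ or $\operatorname{vol}(A_i)$ is what penalizes splitting off a single vertex: for this graph, the partition $\{0\}, \{1, \dots, 5\}$ gives a much larger RatioCut value than the balanced two-triangle split.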

So what both objective functions try to achieve is that the clusters are "balanced", as measured by the number of vertices or the edge weights, respectively. Unfortunately, introducing these balancing conditions makes the previously simple mincut problem NP hard. Spectral clustering is a way to solve relaxed versions of those problems. We will see that relaxing Ncut leads to normalized spectral clustering, while relaxing RatioCut leads to unnormalized spectral clustering.

  1. Approximating RatioCut for k = 2

    Let us start with the case of RatioCut and $k = 2$, because the relaxation is easiest to understand in this setting. Our goal is to solve the optimization problem

    $$\min_{A \subset V} \operatorname{RatioCut}(A, \bar{A}). \qquad (1)$$

    We first rewrite the problem in a more convenient form. Given a subset $A \subset V$, we define the vector $f = (f_1, \dots, f_n)' \in \mathbb{R}^n$ with entries

    $$f_i = \begin{cases} \sqrt{|\bar{A}|/|A|} & \text{if } v_i \in A \\ -\sqrt{|A|/|\bar{A}|} & \text{if } v_i \in \bar{A}. \end{cases}$$

    Now the RatioCut objective function can be conveniently rewritten using the unnormalized graph Laplacian. This is due to the following calculation:

    $$f'Lf = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2 = \operatorname{cut}(A, \bar{A}) \left( \frac{|\bar{A}|}{|A|} + \frac{|A|}{|\bar{A}|} + 2 \right) = |V| \cdot \operatorname{RatioCut}(A, \bar{A}).$$

    Additionally, one can check that $\sum_i f_i = 0$, that is $f \perp \mathbb{1}$, and that $\|f\|^2 = n$. The problem of minimizing (1) can thus be equivalently rewritten as

    $$\min_{A \subset V} f'Lf \ \text{ subject to } \ f \perp \mathbb{1}, \ f_i \text{ defined as above}, \ \|f\| = \sqrt{n}.$$

    This is a discrete optimization problem, as the entries of the solution vector are only allowed to take two particular values, and of course it is still NP hard. The most obvious relaxation in this setting is to discard the discreteness condition and instead allow $f_i$ to take arbitrary values in $\mathbb{R}$. This leads to the relaxed optimization problem

    $$\min_{f \in \mathbb{R}^n} f'Lf \ \text{ subject to } \ f \perp \mathbb{1}, \ \|f\| = \sqrt{n}.$$

    By the Rayleigh-Ritz theorem (e.g., see Section 5.5.2 of Lütkepohl, 1997) it can be seen immediately that the solution of this problem is given by the vector $f$ which is the eigenvector corresponding to the second smallest eigenvalue of $L$ (recall that the smallest eigenvalue of $L$ is 0 with eigenvector $\mathbb{1}$). So we can approximate a minimizer of RatioCut by the second eigenvector of $L$. However, in order to obtain a partition of the graph we need to re-transform the real-valued solution vector $f$ of the relaxed problem into a discrete indicator vector. The simplest way to do this is to use the sign of $f$ as indicator function, that is to choose $v_i \in A$ if $f_i \geq 0$ and $v_i \in \bar{A}$ if $f_i < 0$. In practice, most spectral clustering algorithms instead consider the entries $f_i$ as points in $\mathbb{R}$ and cluster them into two groups by the $k$-means algorithm.
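    The following numpy sketch (a hypothetical illustration, not pseudocode from the text) carries out exactly this relaxation for $k = 2$: build the unnormalized Laplacian, take the eigenvector of the second smallest eigenvalue, and split by sign:

    ```python
    import numpy as np

    def ratiocut_bipartition(W):
        """Relaxed RatioCut for k = 2: split vertices by the sign of the
        second eigenvector (Fiedler vector) of the unnormalized Laplacian."""
        D = np.diag(W.sum(axis=1))
        L = D - W                            # unnormalized graph Laplacian
        _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
        f = eigvecs[:, 1]                    # eigenvector of 2nd smallest eigenvalue
        return f >= 0                        # boolean cluster indicator

    # Same toy graph as above: the weak 0.1 edge is the one that gets cut.
    # (The eigenvector sign is arbitrary, so the two labels may be swapped.)
    W = np.array([[0, 1, 1, 0,   0, 0],
                  [1, 0, 1, 0,   0, 0],
                  [1, 1, 0, 0.1, 0, 0],
                  [0, 0, 0.1, 0, 1, 1],
                  [0, 0, 0,   1, 0, 1],
                  [0, 0, 0,   1, 1, 0]], dtype=float)
    print(ratiocut_bipartition(W))
    ```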

  2. Approximating RatioCut for arbitrary k

    The relaxation of the RatioCut minimization problem in the case of a general value $k$ follows a similar principle as the one above. Given a partition of $V$ into $k$ sets $A_1, \dots, A_k$, we define $k$ indicator vectors $h_j = (h_{1,j}, \dots, h_{n,j})'$ by

    $$h_{i,j} = \begin{cases} 1/\sqrt{|A_j|} & \text{if } v_i \in A_j \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \dots, n; \ j = 1, \dots, k).$$

    Then we set the matrix $H \in \mathbb{R}^{n \times k}$ as the matrix containing those $k$ indicator vectors as columns. Observe that the columns in $H$ are orthonormal to each other, that is $H'H = I$. Similar to the calculations in the last section we can see that

    $$h_i' L h_i = \frac{\operatorname{cut}(A_i, \bar{A_i})}{|A_i|} = (H'LH)_{ii},$$

    and hence

    $$\operatorname{RatioCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} h_i' L h_i = \operatorname{Tr}(H'LH),$$

    where $\operatorname{Tr}$ denotes the trace of a matrix. So the problem of minimizing RatioCut$(A_1, \dots, A_k)$ can be rewritten as

    $$\min_{A_1, \dots, A_k} \operatorname{Tr}(H'LH) \ \text{ subject to } \ H'H = I, \ H \text{ defined as above}.$$

    Similar to above we now relax the problem by allowing the entries of the matrix $H$ to take arbitrary real values. Then the relaxed problem becomes

    $$\min_{H \in \mathbb{R}^{n \times k}} \operatorname{Tr}(H'LH) \ \text{ subject to } \ H'H = I.$$

    This is the standard form of a trace minimization problem, and again a version of the Rayleigh-Ritz theorem (e.g., see Section 5.2.2.(6) of Lütkepohl, 1997) tells us that the solution is given by choosing $H$ as the matrix which contains the first $k$ eigenvectors of $L$ as columns. We can see that the matrix $H$ is in fact the matrix $U$ used in the unnormalized spectral clustering algorithm as described in Section 4. Again we need to re-convert the real-valued solution matrix to a discrete partition. As above, the standard way is to use the $k$-means algorithm on the rows of $U$. This leads to the general unnormalized spectral clustering algorithm as presented in Section 4.
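    A minimal sketch of the resulting procedure (the function name is ours, and scipy is assumed to be available): compute the first $k$ eigenvectors of $L$ and run k-means on the rows of the embedding matrix $U$:

    ```python
    import numpy as np
    from scipy.cluster.vq import kmeans2

    def unnormalized_spectral_clustering(W, k):
        """Relaxed RatioCut for general k: embed the vertices via the first
        k eigenvectors of L = D - W, then discretize with k-means on rows."""
        D = np.diag(W.sum(axis=1))
        L = D - W                            # unnormalized graph Laplacian
        _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
        U = eigvecs[:, :k]                   # relaxed indicator matrix H
        _, labels = kmeans2(U, k, minit='points')
        return labels

    # Usage with the toy graph from above: unnormalized_spectral_clustering(W, 2)
    ```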

  3. Approximating Ncut

    Techniques similar to the ones above can be used to derive normalized spectral clustering as a relaxation of minimizing Ncut. Given a partition of $V$ into $k$ sets $A_1, \dots, A_k$, we now define the indicator vectors $h_j$ by

    $$h_{i,j} = \begin{cases} 1/\sqrt{\operatorname{vol}(A_j)} & \text{if } v_i \in A_j \\ 0 & \text{otherwise.} \end{cases}$$

    Then $H'DH = I$ and $h_i' L h_i = \operatorname{cut}(A_i, \bar{A_i}) / \operatorname{vol}(A_i)$, so the problem of minimizing Ncut can be written as

    $$\min_{A_1, \dots, A_k} \operatorname{Tr}(H'LH) \ \text{ subject to } \ H'DH = I.$$

    Relaxing the discreteness condition and substituting $T = D^{1/2}H$, we obtain the relaxed problem

    $$\min_{T \in \mathbb{R}^{n \times k}} \operatorname{Tr}(T' D^{-1/2} L D^{-1/2} T) \ \text{ subject to } \ T'T = I.$$

    Again this is the standard trace minimization problem, which is solved by the matrix $T$ containing the first $k$ eigenvectors of $L_{\mathrm{sym}}$ as columns. Re-substituting $H = D^{-1/2}T$ and using Proposition 3 we see that the solution $H$ consists of the first $k$ eigenvectors of the matrix $L_{\mathrm{rw}}$, or the first $k$ generalized eigenvectors of $Lu = \lambda D u$. This yields the normalized spectral clustering algorithm according to Shi and Malik (2000).
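    A corresponding sketch for the Shi-Malik version (again an illustration under our own naming, and assuming no isolated vertices so that $D$ is positive definite): scipy's generalized symmetric eigensolver handles $Lu = \lambda Du$ directly:

    ```python
    import numpy as np
    from scipy.linalg import eigh
    from scipy.cluster.vq import kmeans2

    def normalized_spectral_clustering(W, k):
        """Relaxed Ncut (Shi and Malik): first k generalized eigenvectors
        of L u = lambda D u, then k-means on the rows of the embedding."""
        D = np.diag(W.sum(axis=1))           # assumes every degree is > 0
        L = D - W
        # eigh(L, D) solves the generalized symmetric problem L u = lambda D u;
        # eigenvalues are returned in ascending order
        _, eigvecs = eigh(L, D)
        U = eigvecs[:, :k]
        _, labels = kmeans2(U, k, minit='points')
        return labels
    ```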

  4. Comments on the relaxation approach

    There are several comments we should make about this derivation of spectral clustering. Most importantly, there is no guarantee whatsoever on the quality of the solution of the relaxed problem compared to the exact solution.

    That is, if $A_1, \dots, A_k$ is the exact solution of minimizing RatioCut and $B_1, \dots, B_k$ is the solution constructed by unnormalized spectral clustering, then $\operatorname{RatioCut}(B_1, \dots, B_k) - \operatorname{RatioCut}(A_1, \dots, A_k)$ can be arbitrarily large.

    In general it is known that efficient algorithms to approximate balanced graph cuts up to a constant factor do not exist. On the contrary, this approximation problem can be NP hard itself.

    Of course, the relaxation we discussed above is not unique. For example, a completely different relaxation which leads to a semi-definite program is derived in Bie and Cristianini (2006), and there might be many other useful relaxations. The reason why the spectral relaxation is so appealing is not that it leads to particularly good solutions. Its popularity is mainly due to the fact that it results in a standard linear algebra problem which is simple to solve.

This toy data set consists of a random sample of 200 points $x_1, \dots, x_{200} \in \mathbb{R}$ drawn according to a mixture of four Gaussians.