The Position of Hubs

We will demonstrate that the position of a point in data space has a significant effect on its k-occurrences value, using the sample mean of the data distribution as a point of reference.

Figure 3 plots, for each point $\mathbf x$, its $N_5(\mathbf x)$ against its Euclidean distance from the empirical data mean, for d = 3, 20, 100. As dimensionality increases, a stronger negative correlation emerges, implying that points closer to the mean tend to become hubs.
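The measurement behind such a plot can be sketched in a few lines. This is a minimal illustration, not the original experimental code: the function names (`k_occurrences`, `spearman`, `mean_distance_correlation`), the sample size n = 1000, and the random seed are all assumptions made for the example, and only the i.i.d. normal case is generated.

```python
import numpy as np

def k_occurrences(X, k=5):
    """N_k(x): how many times each point appears among the k nearest
    neighbors (Euclidean) of the other points in X."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances; the ordering matches Euclidean distance.
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    nn = np.argpartition(D, k, axis=1)[:, :k]    # indices of the k nearest neighbors
    return np.bincount(nn.ravel(), minlength=X.shape[0])

def spearman(a, b):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    def avg_ranks(v):
        _, inv, counts = np.unique(v, return_inverse=True, return_counts=True)
        return (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    return float(np.corrcoef(avg_ranks(a), avg_ranks(b))[0, 1])

def mean_distance_correlation(n=1000, d=3, k=5, seed=0):
    """Spearman correlation between N_k(x) and the distance of x to the
    sample mean, for n i.i.d. standard normal points in d dimensions."""
    rng = np.random.default_rng(seed)            # seed chosen arbitrarily
    X = rng.standard_normal((n, d))
    n_k = k_occurrences(X, k)
    dist_to_mean = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return spearman(n_k, dist_to_mean)

for d in (3, 20, 100):
    print(d, mean_distance_correlation(d=d))
```

Under these assumptions the printed correlation should become more strongly negative as d grows, which is the trend the scatter plots illustrate.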

unimodal: It is important to note that proximity to a single global data-set mean correlates with hubness in high dimensions only when the underlying data distribution is unimodal.
multimodal: For multimodal data distributions, for example those obtained as a mixture of unimodal distributions, hubs tend to appear close to the means of the individual component distributions of the mixture.
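The multimodal case can be checked with a similar sketch. Everything here is an illustrative assumption rather than part of the original study: a mixture of two well-separated Gaussians in d = 50, component labels known by construction, and the hypothetical helper names `k_occurrences`, `spearman`, and `mixture_correlations`. It compares how $N_5(\mathbf x)$ correlates with the distance to the global mean versus the distance to the mean of the point's own component.

```python
import numpy as np

def k_occurrences(X, k=5):
    """N_k(x): number of times each point is among the k nearest neighbors of others."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                        # exclude self-matches
    nn = np.argpartition(D, k, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=X.shape[0])

def spearman(a, b):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    def avg_ranks(v):
        _, inv, counts = np.unique(v, return_inverse=True, return_counts=True)
        return (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    return float(np.corrcoef(avg_ranks(a), avg_ranks(b))[0, 1])

def mixture_correlations(n=1000, d=50, k=5, shift=3.0, seed=0):
    """Return (rho_global, rho_component) for a two-component Gaussian mixture.

    Assumption: component centers are +shift and -shift in every coordinate,
    so the global mean lies between the two clusters.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)                # component of each point
    centers = np.array([np.full(d, shift), np.full(d, -shift)])
    X = centers[labels] + rng.standard_normal((n, d))
    n_k = k_occurrences(X, k)
    dist_global = np.linalg.norm(X - X.mean(axis=0), axis=1)
    comp_means = np.array([X[labels == c].mean(axis=0) for c in (0, 1)])
    dist_comp = np.linalg.norm(X - comp_means[labels], axis=1)
    return spearman(n_k, dist_global), spearman(n_k, dist_comp)

rho_global, rho_component = mixture_correlations()
print("global mean:", rho_global, "component mean:", rho_component)
```

In this setup the correlation with distance to the component mean should be clearly more negative than the correlation with distance to the global mean, consistent with hubs forming near the individual component means.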

Figure 3: Scatter plots and Spearman correlation of $N_5(\mathbf x)$ against the Euclidean distance of point $\mathbf x$ to the sample data-set mean for (a–c) i.i.d. uniform and (d–f) i.i.d. normal random data sets with (a, d) d = 3, (b, e) d = 20, and (c, f) d = 100.