The Position of Hubs
We demonstrate that the position of a point in data space significantly affects its k-occurrences value, using the sample mean of the data distribution as a point of reference.
Figure 3 plots, for each point x, its k-occurrences value N_k(x) against its Euclidean distance from the empirical data mean, for d = 3, 20, 100. As dimensionality increases, a stronger negative correlation emerges, implying that points closer to the mean tend to become hubs.
- unimodal: It is important to note that proximity to one global data-set mean correlates with hubness in high dimensions when the underlying data distribution is unimodal.
- multimodal: For multimodal data distributions, for example those obtained through a mixture of unimodal distributions, hubs tend to appear close to the means of individual component distributions of the mixture.
Figure 3: Scatter plots and Spearman correlation of N_k(x) against the Euclidean distance of point x to the sample data-set mean for (a–c) i.i.d. uniform and (d–f) i.i.d. normal random data sets with (a, d) d = 3, (b, e) d = 20, and (c, f) d = 100.
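
The relationship summarized in Figure 3 can be explored with a small simulation. The following is a minimal sketch, not the original experimental code: the sample size n = 2000, the neighborhood size k = 5, the random seed, and the helper names k_occurrences, average_ranks, spearman, and correlation_by_dimension are all assumptions made for illustration. It draws i.i.d. normal data, counts how often each point appears in the k-nearest-neighbor lists of other points, and computes the Spearman correlation between that count and the distance to the sample mean.

```python
import numpy as np

def k_occurrences(X, k):
    """Count, for each point, how many other points list it among their k nearest neighbors."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    nn = np.argpartition(d2, k - 1, axis=1)[:, :k]    # k nearest neighbors per row (unordered)
    return np.bincount(nn.ravel(), minlength=X.shape[0])

def average_ranks(v):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    v = np.asarray(v).ravel()
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)                          # highest rank within each tie group
    return (last - (counts - 1) / 2.0)[inverse]

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

def correlation_by_dimension(dims=(3, 20, 100), n=2000, k=5, seed=0):
    """Spearman correlation between N_k(x) and distance to the sample mean, per dimensionality."""
    rng = np.random.default_rng(seed)
    result = {}
    for d in dims:
        X = rng.standard_normal((n, d))               # i.i.d. normal data (assumed setting)
        n_k = k_occurrences(X, k)
        dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
        result[d] = spearman(n_k, dist)
    return result

rhos = correlation_by_dimension()
for d, rho in rhos.items():
    print(f"d = {d:3d}: Spearman correlation = {rho:.3f}")
```

Replacing rng.standard_normal((n, d)) with rng.uniform(size=(n, d)) gives the i.i.d. uniform setting. The pairwise distance matrix takes O(n^2) memory, so this sketch suits only modest n.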