The Hubness Phenomenon
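In Section 1 we gave a simple set-based deterministic definition of $N_k$. To complement this definition and introduce $N_k$ into a probabilistic setting, let $\mathbf{x}, \mathbf{x}_1, \ldots, \mathbf{x}_n$ be $n + 1$ random vectors drawn from the same continuous probability distribution with support $S \subseteq \mathbb{R}^d$, $d \in \{1, 2, \ldots\}$, and let $\mathrm{dist}$ be a distance function defined on $\mathbb{R}^d$ (not necessarily a metric). Let functions $p_{i,k}$, where $i, k \in \{1, 2, \ldots, n\}$, be defined as

$$
p_{i,k}(\mathbf{x}) =
\begin{cases}
1, & \text{if } \mathbf{x} \text{ is among the } k \text{ nearest neighbors of } \mathbf{x}_i \text{ according to } \mathrm{dist}, \\
0, & \text{otherwise.}
\end{cases}
$$

In this setting, we define $N_k(\mathbf{x}) = \sum_{i=1}^{n} p_{i,k}(\mathbf{x})$, that is, $N_k(\mathbf{x})$ is the random number of vectors from $\mathbb{R}^d$ that have $\mathbf{x}$ included in their list of $k$ nearest neighbors. In this section we empirically demonstrate the emergence of hubness through the increasing skewness of the distribution of $N_k$ on synthetic and real data, relate this increase in skewness to the dimensionality of the data sets, and motivate the subsequent study of the origins of the phenomenon.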
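As a rough illustration of the definition, the following sketch counts, for every point in a sample, how many other points include it among their $k$ nearest neighbors, and then reports the skewness of the resulting counts. It is only a minimal sketch under stated assumptions: the distance is taken to be Euclidean, the data are i.i.d. standard normal, the values of `n`, `k`, and the dimensionalities are arbitrary, and the helper names `k_occurrences` and `skewness` are hypothetical. It also uses the set-based reading, in which each sample point is scored against the remaining points and is never its own neighbor.

```python
import numpy as np


def k_occurrences(X, k):
    """Return N_k for each row of X: the number of other rows that
    list it among their k nearest neighbors (Euclidean distance assumed)."""
    n = X.shape[0]
    # Squared Euclidean distances computed from the Gram matrix.
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # A point is never counted as its own neighbor.
    np.fill_diagonal(D, np.inf)
    # Row i holds the indices of the k nearest neighbors of point i
    # (their order within the k is irrelevant for counting).
    knn = np.argpartition(D, k - 1, axis=1)[:, :k]
    # N_k[j] = number of neighbor lists in which point j appears.
    return np.bincount(knn.ravel(), minlength=n)


def skewness(v):
    """Standardized third moment of a sample."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5


rng = np.random.default_rng(0)
n, k = 1000, 5  # assumed sample size and neighborhood size
for d in (3, 20, 100):  # assumed dimensionalities
    X = rng.standard_normal((n, d))
    N_k = k_occurrences(X, k)
    print(f"d={d:3d}  mean N_k={N_k.mean():.2f}  skewness={skewness(N_k):.3f}")
```

Because every point contributes exactly $k$ entries to the neighbor lists, the mean of $N_k$ over the sample equals $k$ regardless of dimensionality; what changes with $d$ is the shape of the distribution, which the skewness summarizes.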