The Hubness Phenomenon

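In Section 1 we gave a simple set-based deterministic definition of $N_k$. To complement this definition and introduce $N_k$ into a probabilistic setting, let $x, x_1, \ldots, x_n$ be $n+1$ random vectors drawn from the same continuous probability distribution with support $S \subseteq \mathbb{R}^d$, $d \in \{1, 2, \ldots\}$, and let $\mathrm{dist}$ be a distance function defined on $\mathbb{R}^d$ (not necessarily a metric). Let functions $p_{i,k}$, where $i, k \in \{1, 2, \ldots, n\}$, be defined as

$$
p_{i,k}(x) =
\begin{cases}
1, & \text{if } x \text{ is among the } k \text{ nearest neighbors of } x_i \text{ according to } \mathrm{dist}, \\
0, & \text{otherwise.}
\end{cases}
$$

In this setting, we define $N_k(x) = \sum_{i=1}^{n} p_{i,k}(x)$, that is, $N_k(x)$ is the random number of vectors among $x_1, \ldots, x_n$ that have $x$ included in their list of $k$ nearest neighbors. In this section we will empirically demonstrate the emergence of hubness through the increasing skewness of the distribution of $N_k$ on synthetic and real data, relate the increase in skewness to the dimensionality of the data sets, and motivate the subsequent study into the origins of the phenomenon.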
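To make the definition of $N_k$ concrete, the following minimal sketch counts k-occurrences on a finite sample and reports the skewness of the resulting counts. It is an illustration under stated assumptions, not the experimental setup used in this paper: the data are i.i.d. uniform on the unit hypercube, the distance is Euclidean, and the names `k_occurrences`, `skewness`, and `sample_uniform` are hypothetical helpers introduced here. Each point is scored against the other points of the same sample, which corresponds to the set-based view of $N_k$.

```python
import math
import random


def k_occurrences(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point (Euclidean, an assumption of this sketch).
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()
        # Each of the k nearest neighbors of point i gains one occurrence.
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment (population form); 0.0 if the values are constant."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def sample_uniform(n, d, rng):
    """Draw n i.i.d. points uniformly from the d-dimensional unit hypercube."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    n, k = 300, 5
    for d in (3, 20, 100):
        counts = k_occurrences(sample_uniform(n, d, rng), k)
        print(f"d={d:3d}  max N_k={max(counts):3d}  skewness={skewness(counts):.3f}")
```

Because every point contributes exactly $k$ occurrences, the counts always sum to $nk$ and their mean is $k$ regardless of dimensionality. Any change across values of `d` therefore shows up in the shape of the distribution (for example its skewness and maximum), not in its mean.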