What caused the skewness?

Artefact of the data?
- Are some songs more similar to others?
- Do some people have fingerprints or voices that are harder to distinguish from other people's?
Specifics of the modeling algorithms?
- Inadequate choice of features?
Something more general?