Distances, Neighborhoods, or Dimensions? Projection Literacy for the Analysis of Multivariate Data

Abstract

Projections are among the most common methods for presenting high-dimensional datasets on a 2D display. While these techniques provide overviews that highlight relations between observations, their results inevitably depend on the chosen configuration. Hence, the same projection technique can produce several different layouts of the same dataset, depending on its parameter settings. Furthermore, projection techniques differ in their underlying assumptions and computation mechanisms, favoring the preservation of either distances, neighborhoods, or dimensions. This article aims to shed light on the similarities and differences among a range of projection techniques and on the influence of features and parameters on data representations, and to give a data-driven intuition for how projections relate to one another. We postulate that, depending on the task and data, a different choice of projection technique, or a combination of techniques, might lead to a more effective view.
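To make the difference between distance-preserving and neighborhood-preserving goals more concrete, here is a minimal sketch, assuming NumPy and using PCA as an example of a dimension-based projection (our choice for illustration, not necessarily one of the techniques studied in the article). The helpers `distance_correlation` and `neighborhood_preservation` are hypothetical names: the first correlates pairwise distances before and after projection, the second measures how many of each point's k nearest neighbors survive.

```python
import numpy as np

def pca_project(X, k=2):
    """Project X onto its top-k principal axes (a dimension-based technique)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def pairwise_dists(X):
    """Full matrix of Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_correlation(X_high, X_low):
    """Pearson correlation of pairwise distances in the original vs. projected space."""
    iu = np.triu_indices(len(X_high), k=1)  # each unordered pair once
    d_hi = pairwise_dists(X_high)[iu]
    d_lo = pairwise_dists(X_low)[iu]
    return float(np.corrcoef(d_hi, d_lo)[0, 1])

def neighborhood_preservation(X_high, X_low, k=5):
    """Mean fraction of each point's k nearest neighbors kept after projection."""
    def knn(X):
        D = pairwise_dists(X)
        np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
        return np.argsort(D, axis=1)[:, :k]
    nn_hi, nn_lo = knn(X_high), knn(X_low)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(nn_hi, nn_lo)]
    return float(np.mean(overlap))

# Hypothetical data: 200 random observations with 10 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = pca_project(X, k=2)
print(f"distance correlation: {distance_correlation(X, Y):.3f}")
print(f"5-NN preservation:    {neighborhood_preservation(X, Y, k=5):.3f}")
```

A projection can score reasonably on one of these measures and poorly on the other, and the neighborhood score itself shifts with the parameter `k`, which echoes the point that configuration choices shape what a projected view shows.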

Publication
Proc. of IEEE VIS Workshop on Visualization for AI Explainability (VISxAI)

Related