DataShiftExplorer: Visualizing and Comparing Change in Multidimensional Data for Supervised Learning


In supervised learning, ensuring a model's validity requires identifying dataset shifts, i.e., cases where the data distribution differs from the one the model encountered during training. Detecting such changes requires a comparative analysis of the multidimensional data distributions of the training data and new, unseen datasets. In this paper, we map out the design space of visualizations for multidimensional comparative data analytics. Based on this design space, we present DataShiftExplorer, a technique tailored to identifying and analyzing changes in multidimensional data distributions. Through examples, we show how DataShiftExplorer facilitates the identification and analysis of data changes, supporting supervised learning.
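
To make the notion of dataset shift concrete, the following is a minimal sketch of a numerical comparison between a training set and a new dataset. It is not the DataShiftExplorer technique, which is visual. It only computes, per categorical feature, the total variation distance between the two marginal distributions, so it ignores shifts that appear only in combinations of features. The function names and the toy data are assumptions made for illustration.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(train_values, new_values):
    """Distance between two categorical samples: 0 = identical, 1 = disjoint."""
    p = category_shares(train_values)
    q = category_shares(new_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def per_feature_shift(train_rows, new_rows):
    """Compare each feature's marginal distribution; rows are dicts keyed by feature."""
    features = train_rows[0].keys()
    return {
        f: total_variation_distance(
            [row[f] for row in train_rows],
            [row[f] for row in new_rows],
        )
        for f in features
    }


# Toy data, invented for illustration only.
train = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "S"},
    {"color": "blue", "size": "L"},
]
new = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
]

shift = per_feature_shift(train, new)
```

In this toy case the `color` feature has shifted (blue no longer occurs) while `size` is unchanged. A per-feature score like this can flag which dimensions changed, but analyzing how the joint distribution changed is the kind of task a comparative visualization is meant to support.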

Proc. of Int. Conf. on Information Visualization Theory and Applications (IVAPP)