Interactive Visualization for Data Science Scripts

Published in Symposium on Visualization in Data Science (VDS) at IEEE VIS, 2022

Authors: Rebecca Faust, Carlos Scheidegger, Katherine Isaacs, William Bernstein, Michael Sharp, Chris North.

As the field of data science continues to grow, so does the need for adequate tools to understand and debug data science scripts. Current debugging practices fall short when applied to a data science setting, due to the exploratory and iterative nature of analysis scripts. Computational notebooks, the preferred scripting environment of many data scientists, present additional challenges to understanding and debugging workflows, including the non-linear execution of code snippets. This paper presents Anteater, a trace-based visual debugging method for data science scripts. Anteater automatically traces and visualizes execution data with minimal analyst input. The visualizations illustrate execution and value behaviors that aid in understanding the results of analysis scripts. To maximize the number of workflows supported, we present prototype implementations in both Python and Jupyter. Last, to demonstrate Anteater’s support for analysis understanding tasks, we provide two usage scenarios on real-world analysis scripts.
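
To make the idea of trace-based debugging concrete, the following is a minimal sketch in Python of recording how variable values change while a function runs. It assumes the standard `sys.settrace` hook and is not Anteater's implementation; `trace_values` and `running_total` are hypothetical names introduced only for illustration.

```python
import sys
from collections import defaultdict


def trace_values(func, *args, **kwargs):
    """Run func and record each change to its local variables.

    Illustrative sketch only (not Anteater's API): returns the function's
    result and a dict mapping variable name -> list of (line number, value).
    """
    history = defaultdict(list)
    target_code = func.__code__

    def tracer(frame, event, arg):
        # Only trace the target function's own frame; skip everything else.
        if frame.f_code is not target_code:
            return None
        if event in ("line", "return"):
            for name, value in frame.f_locals.items():
                seen = history[name]
                # Record a value only when it differs from the last one seen.
                if not seen or seen[-1][1] != value:
                    seen.append((frame.f_lineno, value))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever tracer was active before.
        sys.settrace(previous)
    return result, dict(history)


def running_total(values):
    # Hypothetical analysis step used as the traced example.
    total = 0
    for v in values:
        total += v
    return total


result, history = trace_values(running_total, [1, 2, 3])
for name, changes in history.items():
    print(name, [value for _, value in changes])
```

The recorded history is the kind of execution data a visualization could then plot, for example the value of `total` over time. A real tool would also need to handle mutable objects, nested calls, and large values, which this sketch ignores.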


Recommended citation: R. Faust, C. Scheidegger, K. Isaacs, W. Bernstein, M. Sharp, and C. North, “Interactive Visualization for Data Science Scripts.” Symposium on Visualization in Data Science (VDS) at IEEE VIS, 2022.