Efficient and Effective Feature Space Exploration for Testing Deep Learning Systems

Zohdinasab, Tahereh; Riccio, Vincenzo; Gambi, Alessio; Tonella, Paolo

doi:10.1145/3544792

Assessing the quality of Deep Learning (DL) systems is crucial, as they are increasingly adopted in safety-critical domains. Researchers have proposed several input generation techniques for DL systems. While such techniques can expose failures, they do not explain which features of the test inputs influenced the system’s (mis-) behaviour. DeepHyperion was the first test generator to overcome this limitation by exploring the DL systems’ feature space at large. In this paper, we propose DeepHyperion-CS, a test generator for DL systems which enhances DeepHyperion by promoting the inputs that contributed more to feature space exploration during the previous search iterations. We performed an empirical study involving two different test subjects (i.e., a digit classifier and a lane-keeping system for self-driving cars). Our results proved that the contribution-based guidance implemented within DeepHyperion-CS outperforms state-of-the-art tools and significantly improves the efficiency and the effectiveness of DeepHyperion. DeepHyperion-CS exposed significantly more misbehaviours for 5 out of 6 feature combinations and was up to 65% more efficient than DeepHyperion in finding misbehaviour-inducing inputs and exploring the feature space. DeepHyperion-CS was useful for expanding the datasets used to train the DL systems, populating up to 200% more feature map cells than the original training set.