Automated N-way Program Merging for Facilitating Family-Based Analyses of Variant-Rich Software
Nowadays software tends to come in many different, yet similar variants, often derived from a common codebase via clone-and-own. Family-based-analysis strategies have recently shown very promising potential for improving efficiency in applying quality-assurance techniques to such variant-rich programs, as compared to variant-by-variant approaches. Unfortunately, these strategies require a single program representation superimposing all program variants in a syntactically well-formed, semantically sound and variant-preserving manner, which is usually not available and manually hard to obtain in practice. In this paper, we present a novel methodology, called SiMPOSE, for automatically generating superimpositions of existing program variants to facilitate family-based analyses of variant-rich software. To this end, we propose a novel N-way model-merging methodology to integrate the control-flow automaton (CFA) representations of N given variants of a C program into one unified CFA representation. CFA constitute a unified program abstraction used by many recent software-analysis tools for automated quality assurance. To cope with the inherent complexity of N-way model-merging, our approach (1) utilizes principles of similarity-propagation to reduce the number of potential N-way matches, and (2) enables us to decompose a set of N variants into arbitrary subsets and to incrementally derive an N-way superimposition from partial superimpositions. We apply our tool implementation of SiMPOSE to a selection of realistic C programs, frequently considered for experimental evaluation of program-analysis techniques. In particular, we investigate applicability and efficiency/effectiveness trade-offs of our approach by applying SiMPOSE in the context of family-based unit-test generation as well as model-checking as sample program-analysis techniques. Our experimental results reveal very impressive efficiency improvements by an average factor of up to 2.6 for test-generation and up to 2.4 for model-checking under stable effectiveness, as compared to variant-by-variant approaches, thus amortizing the additional effort required for merging. In addition, our results show that merging all N variants at once produces, in almost all cases, clearly more precise results than incremental step-wise 2-way merging. Finally, our comparison with major existing N-way merging techniques shows that SiMPOSE constitutes, in most cases, the best efficiency/effectiveness trade-off.