It is ICSE time again. Have fun in Italy, everyone! Here, I’d like to share some of my thoughts on our ICSE14 paper as well as the valuable discussions I have had with my colleagues Guowei Yang, Cristian Cadar, Eric Bodden, Christian Kastner, Hridesh Rajan and Yang Liu over the year.
In this paper, we developed a program representation MVICFG and static analysis techniques to address the questions of how to determine commonalities and variabilities of program properties over similar code and multiple versions, and why it is useful. Here, the “program properties” used are bugs, and “similar code” are software revisions and releases.
Through this research, we found that 1) a patch can contain bugs (also reported in Yin et al.’s 2011 FSE paper), 2) a patch cannot always fix all the affected versions, 3) documentation on which versions are impacted can be incomplete, and MVICFGs can be used to automatically compute such information, and 4) leveraging cached results computed for previous versions, we are able to verify changes in seconds, which is practical for continuous integration in industry.
Guowei and I have discussed the differences between his ICSE 2009 paper, regression model checking, and our paper. The former first computes all the statements that are impacted by a change and then performs model checking on the change impacted statements. The latter is to first identify all the paths affected by the change and then applied demand-driven analysis on the changed paths. Demand-driven analysis computes the dependency information and determines the properties at the same time. Since demand-driven analysis terminates when the properties are determined, i.e., it does not necessarily cover all the change impacted statements and paths, and the model checking is performed on the entire set of affected statements, I’d say our approach still can be faster for computing many of the program properties.
In terms of future work, Cristian has mentioned that it will be useful to apply KLEE to such representations to better solve some of the patch testing problems they had. Eric and Christian both mentioned that it may be interesting to use it to explore the challenges caused by the combinations of changes and configurations. Hridesh has discussed with me on whether it is feasible to use MVICFGs to represent program versions in Boa. Last but not least, Liu Yang’s group is interested in using the MVICFG to model malware variants.
To further improve the MVICFG, we are still thinking about how to better represent the changes introduced in a new version. In this paper, we took an old version and a new version (when all the changes have already been made) and tried to align the statements across versions using function level textual diffs. The problems are twofold. We ignore the process of changes, and thus the changes we represent in an MVICFG are not always the changes developers intended to make. To solve this problem, we need to integrate the information collected through code change process. There actually exist such code editors to do that documented in Susan Horwitz’s papers. Another problem we faced is how we can better represent semantic changes in MVICFGs rather than syntactic changes. For instance, if the code performs refactoring, no changes should be shown on the MVICFG. In addition to thinking about how to represent changes, we are considering how to develop a language independent version of the MVICFG to make it beneficial to more types of analyses.