Threats to Validity

The primary construct in our study is the use of bug reports as the logical unit of change to the analyzed systems. The first threat to the validity of this construct comes from the simple pattern-matching technique we use to associate commits with bug reports based on their description fields. This technique does not guarantee that all links between bug reports and commits are found: some commit descriptions report the associated bug ID in a nonstandard format that our regular expression misses, whereas others describe the bug they address without mentioning the bug ID at all. In particular, we were able to match only approximately 20% of the commits for Evolution, which restricted our analysis to a small subset of that system's full history.
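
To make this threat concrete, the sketch below shows the kind of regular-expression matching we describe. The pattern, function name, and data shapes are hypothetical illustrations and not the exact expression used in our tooling.

```python
import re

# Illustrative pattern only: a sketch, not the exact expression used in our
# tooling. It matches forms such as "Bug 12345" or "bug #12345".
BUG_ID_PATTERN = re.compile(r'\bbug\s*#?\s*(\d+)\b', re.IGNORECASE)

def link_commit_to_bugs(commit_description, known_bug_ids):
    """Return the bug IDs mentioned in a commit description, restricted to
    IDs that actually exist in the bug repository (hypothetical helper)."""
    mentioned = {int(bug_id) for bug_id in BUG_ID_PATTERN.findall(commit_description)}
    return mentioned & known_bug_ids

# A description in a nonstandard format, or one that never cites an ID,
# is missed, which is exactly the threat described above.
print(link_commit_to_bugs("Fix crash on startup (bug #12345)", {12345, 67890}))   # {12345}
print(link_commit_to_bugs("Fix crash described in report 12345", {12345, 67890}))  # set()
```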

A second threat to construct validity is due to noise in the Bugzilla repositories. We assumed that each bug report corresponds to a single logical change to the system, but this does not hold for all bug reports. For example, some reports describe routine software maintenance (e.g., upgrading a third-party library used by the system to its latest version), which can have very different characteristics from the defect reports and enhancement requests we intended to study.

The internal validity of our study concerns the relationship between our data and the conclusions we draw from it. Many of our conclusions are subjective interpretations of the data we present, and other inferences are certainly possible. There are two areas in which we can quantify the validity of our results. First, for the analysis of how the number of modules examined correlates with the time spent working on a change, the Pearson correlation coefficient is significant at the 99% level. Second, for the association rules mined from the Mylyn task contexts and change data, the confidence of each rule is reported, and we restrict our analysis to rules whose confidence exceeds a fixed threshold.
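
The sketch below illustrates, under stated assumptions, how these two checks could be computed: a Pearson correlation with a significance test at the 99% level, and a filter that retains only association rules above a confidence threshold. The variable names, data layout, and the 0.8 cut-off are hypothetical and are not the values used in the study.

```python
from scipy import stats

def correlation_significant(modules_examined, hours_spent, alpha=0.01):
    """Pearson correlation between modules examined and time spent per change;
    significant at the 99% level when the p-value is below 0.01.
    (Sketch only: variable names and data layout are hypothetical.)"""
    r, p_value = stats.pearsonr(modules_examined, hours_spent)
    return r, p_value < alpha

def rules_above_threshold(rules, min_confidence=0.8):
    """Keep only association rules whose confidence meets the threshold.
    `rules` is assumed to hold (antecedent, consequent, confidence) tuples;
    the 0.8 cut-off is illustrative, not the value used in the study."""
    return [rule for rule in rules if rule[2] >= min_confidence]
```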

It is possible that the results of our study do not generalize to systems that differ significantly from the ones we studied. We identify two threats to external validity. First, we examined only three systems. Although we tried to choose systems that vary in several characteristics (e.g., programming language, size, years of development), this is still a very small set of systems from which to generalize. For the analysis of how many modules a developer examines when making a change, we were able to use data from only a single system (namely, Mylyn). Second, all of the systems we analyzed are open source and may therefore exhibit change patterns that differ from those of industrial systems.
