Embrace dynamic artifacts

Venkatesh-Prasad Ranganath    Kansas State University, Manhattan, KS, United States

Abstract

When we talk about data science in the context of software engineering, we often consider only static artifacts that are independent of (or not generated by) the execution of software, e.g., source code, version history, bug reports, mailing lists, developer network, and organization structure. We seldom consider dynamic artifacts that are dependent on (or generated by) the execution of software, e.g., execution logs, crash/core dumps, call stacks, and traffic logs. Specifically, we seldom consider dynamic artifacts as a way to apply data science to software engineering tasks such as coding, testing, and debugging (in contrast to improving post-deployment activities such as monitoring for service degradation or security attacks). So, here are a few experience nuggets to convince you to consider dynamic artifacts when improving software engineering tasks.

Keywords

Dynamic artifacts; Pattern mining; Software testing; Debugging

Acknowledgments

The USB test suite minimization effort was carried out at Microsoft by Naren Datha, Robbie Harris, Aravind Namasivayam, Venkatesh-Prasad Ranganath, and Pradip Vallathol.

Can We Minimize the USB Driver Test Suite?

When we talk about data science in the context of software engineering, we often consider only static artifacts that are independent of (or not generated by) the execution of software, e.g., source code, version history, bug reports, mailing lists, developer network, and organization structure. We seldom consider dynamic artifacts that are dependent on (or generated by) the execution of software, e.g., execution logs, crash/core dumps, call stacks, and traffic logs. Specifically, we seldom consider dynamic artifacts as a way to apply data science to software engineering tasks such as coding, testing, and debugging (in contrast to improving post-deployment activities such as monitoring for service degradation or security attacks).

I believe that we should, and we can, use dynamic artifacts with data science to improve software engineering tasks. Read on to see if you agree with me.

Here’s an example from my personal experience when collaborating with the Windows USB driver team at Microsoft (after the development of Windows 8).

Before an update to the USB device driver in Microsoft Windows ships, the driver is tested against USB devices that exercise its behavior. This form of testing is expensive: the number of unique USB devices is huge, and, consequently, the effort required to identify, procure, and use the devices to test the driver is prohibitive.

As in other situations involving a very large population, the Windows USB testing team uses a sample (test suite) of 1000+ USB devices to test the USB driver. To ensure the sample is diverse and representative of both the population and the prevalent use of USB devices, the team relies on expert knowledge to choose the sample. Even so, such a sample can contain redundancy, as many devices may exercise the same behavior of the USB driver, e.g., devices may use the same ASIC (low-level circuitry) to implement the USB protocol, and such redundancy may elude even the experts. This redundancy leads to wasted time and effort that could be better spent testing behaviors not exercised by any device in the sample.

Further, testing with 1000+ devices takes a nontrivial amount of manual effort, so any elimination of redundancy would help expedite test cycles.

The Windows USB testing team described the situation to my team and asked if we could help identify a subset of the test suite that would expose the same bugs that were exposed by the entire test suite. In other words, they wanted to minimize their test suite, preferably without affecting its bug coverage.

Yes, Let’s Observe Interactions

As a solution, we proposed a technique based on interactions (i.e., service requests and responses) observed at runtime at the published interface of the USB driver as it serviced devices. After all, the goal was to test how the driver behaves under different inputs. (The fact that we had recently used such interactions to test compatibility between USB drivers had no influence on this decision.)
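To make this concrete, here is a minimal sketch (in Python) of the kind of interaction record such a logging setup might capture; the record fields and request names are hypothetical stand-ins for illustration, not the actual schema of the logs we used.

```python
# A toy illustration of the kind of record a logging filter driver
# might emit. Field names and request names are hypothetical; they are
# not the actual schema of the Windows USB logging infrastructure.
from dataclasses import dataclass

@dataclass
class Interaction:
    device_id: str     # which USB device the request was issued for
    request: str       # request type observed at the driver's interface
    status: str        # the driver's response/completion status
    timestamp_us: int  # when the interaction was observed (microseconds)

log = [
    Interaction("dev_a", "GET_DESCRIPTOR", "SUCCESS", 10),
    Interaction("dev_a", "SET_CONFIGURATION", "SUCCESS", 42),
    Interaction("dev_a", "BULK_IN_TRANSFER", "TIMEOUT", 90),
]

# Per-device request/response sequences are the input to pattern mining.
sequence = [(i.request, i.status) for i in log if i.device_id == "dev_a"]
print(sequence)  # [('GET_DESCRIPTOR', 'SUCCESS'), ...]
```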

So, we combined an existing logging filter driver with an existing device-testing setup to log interactions at the interface of the USB driver while testing the sample devices. Then, we mined structural and temporal patterns [1] from these runtime logs and used the patterns as features [2] to cluster the devices via hierarchical clustering. From the clustering, we randomly picked representative devices from each cluster and used only those representatives to test the USB driver. The number of clusters was chosen so as to halve the number of sample devices (test suite). To protect against omission errors within clusters, we picked different representatives from each cluster in each weekly test cycle. (Further, we planned to test all sample devices during each monthly test cycle.) With this simple solution, the testing team was able to use only half the number of sample devices to achieve 75–80% of bug coverage.
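A minimal sketch of the clustering and selection steps follows, assuming the pattern-mining step has already reduced each device's logs to a vector of pattern-occurrence counts; the data, distance metric, and linkage method here are illustrative choices, not necessarily the ones used in the study.

```python
# A sketch of clustering devices by mined-pattern features and picking
# per-cluster representatives. Inputs are hypothetical; in the actual
# effort, features came from structural/temporal patterns mined from logs.
import random
from collections import defaultdict

from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

devices = ["dev_a", "dev_b", "dev_c", "dev_d", "dev_e", "dev_f"]
features = [            # one row per device: occurrence counts of each
    [3, 0, 1, 5],       # mined pattern in that device's interaction logs
    [3, 0, 1, 4],
    [0, 7, 2, 0],
    [0, 6, 2, 1],
    [5, 5, 0, 0],
    [4, 5, 0, 1],
]

# Hierarchical (agglomerative) clustering over pairwise distances.
tree = linkage(pdist(features, metric="euclidean"), method="average")

# Target half as many clusters as devices, i.e., halve the test suite.
n_clusters = max(1, len(devices) // 2)
labels = fcluster(tree, t=n_clusters, criterion="maxclust")

clusters = defaultdict(list)
for device, label in zip(devices, labels):
    clusters[label].append(device)

# One random representative per cluster; re-sampling in each weekly
# cycle rotates representatives and guards against omission errors.
representatives = [random.choice(members) for members in clusters.values()]
print(representatives)
```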

Why Did Our Solution Work?

The solution worked because the interaction logs captured exactly what happened when devices interacted with the USB driver, i.e., the exact inputs/requests provided to the driver and the exact outputs/responses provided back by the driver. The exactness of the data automatically (and aggressively) eliminated numerous possibilities that alternative solutions, such as static analysis of the source code of device drivers, would have had to consider. Further, since the logs did not contain extraneous information, we did not have to clean the data. (This may not be true in all cases involving dynamic artifacts.) Also, we reduced the cost of data collection by reusing existing infrastructure to collect the logs. While we did not use the history of the various data points in the logs, this information existed in the logs and could have been used to reconstruct the context and help understand how and why an interaction transpired.

Still Not Convinced? Here’s More

In 2009, the DebugAdvisor [3] effort proposed a recommendation system to help with debugging. The idea was, when a bug is assigned to a developer, to provide the developer with pointers to institutional knowledge that could expedite fixing the bug. In DebugAdvisor, dynamic artifacts such as stack traces, core dumps, and debugging logs complemented static artifacts such as the bug repository and code/fix ownership data to help identify similar bugs reported in the past. The system was successfully piloted within the Windows serviceability group, with 75% of its recommendations proving useful.
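To convey the flavor of the retrieval idea (and only the flavor; DebugAdvisor's actual machinery is more sophisticated), here is a sketch that treats a bug's free text plus its attached dynamic artifacts as one document and ranks past bugs by textual similarity; the bug texts and frame names below are fabricated for illustration.

```python
# A toy sketch of "find similar past bugs": concatenate a bug's free text
# with its dynamic artifacts (e.g., stack frames) and rank past bugs by
# cosine similarity. The bug texts below are fabricated examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_bugs = [
    "crash on surprise removal stack: frame_hub_change frame_call_driver",
    "device enumeration timeout during resume from sleep",
    "crash in bulk transfer path stack: frame_call_driver frame_xfer_done",
]
new_bug = "crash after surprise removal stack: frame_hub_change frame_call_driver"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(past_bugs + [new_bug])

# Similarity of the new bug (last row) to every past bug; top hits point
# the developer at related past fixes and their owners.
scores = cosine_similarity(matrix[len(past_bugs)], matrix[:len(past_bugs)]).ravel()
for score, bug in sorted(zip(scores, past_bugs), reverse=True):
    print(f"{score:.2f}  {bug}")
```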

In 2010, the StackMine [4] system mined execution traces from Microsoft Windows to aid performance debugging. The idea relied on “correlating” patterns observed in call stacks from execution traces to identify potential performance hotspots. The system helped identify performance bugs in the Windows Explorer UI that had remained hidden in Windows 7 and even in previous versions of Windows.
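The following toy sketch (not StackMine's actual algorithm) illustrates the underlying idea: treat contiguous runs of stack frames as candidate patterns and attribute sampled CPU cost to each, so the patterns accounting for the most cost surface as candidate hotspots. The stacks and costs are made up.

```python
# A toy illustration of correlating call-stack patterns with performance
# cost (not StackMine's actual algorithm). Each sample pairs a call stack
# with a sampled cost; contiguous frame runs are candidate patterns.
from collections import Counter

samples = [  # (frames outermost-to-innermost, sampled CPU cost) -- made up
    (("main", "render", "layout", "measure"), 12),
    (("main", "render", "layout", "paint"), 9),
    (("main", "input", "dispatch"), 2),
]

cost_by_pattern = Counter()
for frames, cost in samples:
    # Attribute the sample's cost to every contiguous subsequence of frames.
    for i in range(len(frames)):
        for j in range(i + 1, len(frames) + 1):
            cost_by_pattern[frames[i:j]] += cost

# Patterns accounting for the most cost are candidate hotspots.
for pattern, cost in cost_by_pattern.most_common(3):
    print(f"{cost:3d}  {' -> '.join(pattern)}")
```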

Dynamic Artifacts Are Here to Stay

An observation common to all of the preceding examples is that dynamic artifacts fueled the data analyses that resulted in effective solutions to software engineering problems. Almost always, dynamic artifacts contain information that is either absent from or impossible to extract from static artifacts. Further, collecting dynamic data has become easy: most software systems are either equipped with, or run on top of platforms equipped with, the ability to collect dynamic data via telemetry (in the form of logs). Also, while the amount of dynamic data collected from a system can pose a challenge for efficient processing, we can now overcome this challenge with cheap and accessible cloud computing. So, it is time to treat dynamic artifacts as data-science-worthy artifacts and to use them whenever they are available, specifically to enable and improve software engineering tasks.

So, don’t shy away from or throw away dynamic artifacts. Learn to embrace them!
