8
Defects Management

In systems-of-systems, defect management takes on its full importance, as defects identified at the system-of-systems level often affect several separate systems. It is therefore necessary to ensure that the organizations that developed each of these systems are aware of the defects and of the corrections required in each of their systems. These organizations do not generally share the same defect management tool or the same defect referencing scheme, which makes management more complex. In addition, the time needed to fix defects and the release cycles vary between organizations, which can lead to temporary workarounds that must themselves be placed under configuration management.

8.1. Defect management, MOA and MOE

8.1.1. What is a defect?

Several definitions exist to explain what a defect is and how we can distinguish a defect from a change request. There are often heated discussions between the business and the developers, one side arguing that the specifications (the user stories) did not specify something, while the other replies that it was implied, or that the delivered behavior does not correspond to normal use by end users.

Let’s consider that a defect (a bug) is a behavior that does not match what was specified or what is expected. From there, we could define two subtypes of defects: verification defects and validation defects.

A verification defect is one where a specified element is not implemented as requested, while a validation defect is one where the delivered service does not support everyday use or does not behave as expected.

8.1.2. Defects and MOA

The customer organization (called MOA in France, for maîtrise d’ouvrage, i.e. the project owner) must manage the defects identified on the whole system and on each of the subsystems that compose it. This overview makes it possible to manage emergencies, dependencies and the impact that a defect in one system can have on other systems.

Only the project owner and the project owner assistance teams have the levers to act on the organizations in charge of the subsystems, products and components that make up the system-of-systems. It is common for these organizations to hide behind contractual (and then legal) aspects in an attempt to clear themselves of their obligations.

Clear definitions of the defects, their severities and their direct or indirect impacts must, as far as possible, be shared so that all stakeholders (customer or users, and suppliers) have a common view of the defects and of the urgency of their correction.

Each organization may process defects with its own methods or tools. Implementing a shared defect management workflow makes it possible to align working methods and share information.

8.1.3. Defects and MOE

The organization in charge of execution (design and development), that is, the supplier (called MOE in France, for maîtrise d’œuvre), must manage the defects assigned to it.

The development organization can be made up of several teams (e.g. several Agile teams, or several teams focused on different components). Defects may occur at the boundary of responsibility between two teams or between two systems. A decision must then be made as to which system (which team) will resolve the defect. This decision is taken by the triage committee, which also decides on the prioritization of anomalies and the urgency of their correction.

8.2. Defect management workflow

There are many ways to organize defect management activities, considering the various teams involved in fixing and verifying those fixes.

The tools on the market allow us to define almost any sequence, but we recommend keeping the model as simple as possible. The reason is that test execution takes place close to the delivery of the system (or component), when little time is available; any complex scheme will waste time in handling defects.

The elements to remember in the context of the management of anomalies are:

– the detection phase where the defect is not yet confirmed;

– the triage action where a potential defect is either rejected or confirmed as to be taken into account;

– corrective action by the responsible team. It should be remembered that defects may be present – in addition to the code – in requirements, specifications, documents or in test deliverables (automated or not);

– the action of verifying that the defect has been correctly fixed, which can result in “corrected defect” or “defect reopened because not corrected”; we recommend also treating as “reopened” any defect whose correction generated side effects;

– the action of publishing the software that includes the correction, in order to be able to “close” the defect. Indeed, if the correction is not carried into the subsequent deliveries, the defect cannot really be considered closed;

– the various phases where a defect may have to progress to “pushed back”, “delayed” or “rejected”.
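The phases listed above can be sketched as a small state machine. This is a minimal illustration, not a prescribed workflow: the state names (for example `POSTPONED`, which here stands for both “pushed back” and “delayed”) and the allowed transitions are assumptions chosen to stay close to the list.

```python
from enum import Enum


class State(Enum):
    DETECTED = "detected"    # potential defect, not yet confirmed
    CONFIRMED = "confirmed"  # accepted at triage, to be taken into account
    REJECTED = "rejected"    # not a defect (e.g. change request, duplicate)
    POSTPONED = "postponed"  # hypothetical state for "pushed back" / "delayed"
    FIXED = "fixed"          # correction supplied by the responsible team
    REOPENED = "reopened"    # not corrected, or correction caused side effects
    VERIFIED = "verified"    # correction confirmed by the test team
    CLOSED = "closed"        # correction published in a delivered version


# Hypothetical transition table, kept deliberately small.
TRANSITIONS = {
    State.DETECTED: {State.CONFIRMED, State.REJECTED, State.POSTPONED},
    State.CONFIRMED: {State.FIXED, State.POSTPONED, State.REJECTED},
    State.POSTPONED: {State.CONFIRMED},
    State.FIXED: {State.VERIFIED, State.REOPENED},
    State.REOPENED: {State.FIXED, State.POSTPONED},
    State.VERIFIED: {State.CLOSED, State.REOPENED},
    State.REJECTED: set(),
    State.CLOSED: set(),
}


def move(current: State, target: State) -> State:
    """Return the new state, or raise if the workflow forbids the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

A defect that is reopened once before being verified would go through `DETECTED`, `CONFIRMED`, `FIXED`, `REOPENED`, `FIXED`, `VERIFIED` and finally `CLOSED`; a jump straight from `DETECTED` to `CLOSED` is refused. Keeping the table small keeps state-based reporting easy to read.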

8.2.1. Example

Figure 8.1 includes four teams: DEV, TST-SIT, TST-UAT and Deployment. For budget reasons, the TST-SIT and TST-UAT teams share the same environment. TST-SIT focuses on technical aspects (e.g. interfaces and messages), while TST-UAT focuses on functional aspects. A separate team takes care of the delivery aspects and processes.


Figure 8.1 Example of defect management workflow

Note that each team can describe its activities on its own Kanban board, and that the actions available to each stakeholder are clearly identified.

8.2.2. Simplify

The management of anomalies is mainly done after the creation of the code, in the test phases, therefore relatively late in the development cycle. Often, the representatives of the business teams and the management will wish to have a view of the progress of the resolution of the anomalies. It is therefore important that the process is simple and easily understood.

However, it is possible to find anomalies (defects) in the design phases, for example, following reviews or inspections.

Even if anomaly management tools allow complex workflows spanning the development and test teams, it is recommended to apply the KISS (Keep It Simple and Short) principle at a high level, even if, within each step of the Kanban board or workflow, there may be many tasks.

A simple workflow leads to simple reporting that is easier for everyone to understand, which limits unnecessary exchanges.

8.3. Triage meetings

The term “triage” comes from military and emergency medicine, where the action taken depends on the patient’s condition and falls into three categories: (1) treat the patient so that they can return to the front line; (2) stabilize the patient so that they can be evacuated to the rear for more serious operations or a longer recovery; or (3) give tranquilizers to the patient to allow them a more peaceful end.

For testing, as part of defect management, this means:

– correct the defect so as to allow delivery on time;

– postpone defect correction to a later version;

– do not correct the defect and consider it as a limitation.

The decision, whatever it is, cannot be made by one person. It requires information from the business (urgency or importance for users), from the development team (impacts, effort and lead time for the correction), from the test team (impacts on users and on the test project) and from the sponsors (cost/benefit ratio, compliance with contractual commitments). Perceptions of how necessary and urgent the correction is will vary according to each party’s point of view, the impacts and the deadlines.

Objectives: define the decisions to be taken about the discovered defects, considering the technical and business impacts, the priority and the severity of each defect. At the end of the meeting, DEVs and TSTs should be able to prioritize their actions (including impact analysis to avoid regressions) and the tests to be carried out (including regression tests to be considered). It is this body that decides whether a defect fix is postponed or rejected.

Participants: PO or business representative, test team representative, TM (test manager), CP (development project manager), representatives of the client and of each supplier; optionally, the project sponsor.

Frequency: at least once a day during the test execution phase.

8.3.1. Priority and severity of defects

Testing teams identify anomalies which are forwarded to development teams to be fixed. Often, anomalies are detected more quickly than they are analyzed and then corrected. This creates a backlog of anomalies that need to be fixed before moving on to the next step.

Two elements must be taken into account to determine the order of resolution of the anomalies, which are:

– impact, both for the users (e.g. inability to execute a key feature) and for the project (e.g. blocking of tests);

– priority, which is the urgency of correction.

Impact: there are several types of impact: on the test process, on the business process and on the project (in financial terms, deadlines or limitations). The impact on the test process can be that one or more test cases are blocked until the identified anomaly is resolved. The impact on the business process is the inability to perform the requested business functionality. In any case, the anomaly and its correction will have an impact on the project compared to the initial planning, whether in terms of deadlines (longer test duration, delay in moving to the next stage), finances (higher costs, redevelopment and retests, or even penalties if SLAs are not respected) or limitations (inability to deliver all the planned functionalities, need for workarounds or degraded solutions).

Priority: the expected order of remediation of anomalies, often defined as “Critical”, “Major”, “Significant” and “Minor”.

Severity: severity is sometimes added to the attributes that describe defects. We consider severity to be a constituent element of impact, and therefore to overlap with it.

As with the calculation of the RPN (risk priority number) in failure mode and effects analysis, we can define a scale of values from 1 to 4 (where 1 is the most important) for both severity and impact, and multiply the severity by the impact to obtain a priority.
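A minimal sketch of this multiplication, assuming both severity and impact are scored from 1 to 4 with 1 the most important. The `Defect` class and the thresholds that map the product (1 to 16) onto the four priority labels are hypothetical; each project would set its own.

```python
from dataclasses import dataclass

# Hypothetical thresholds mapping the 1..16 product onto the four labels.
LABELS = ((2, "Critical"), (4, "Major"), (8, "Significant"), (16, "Minor"))


@dataclass
class Defect:
    key: str
    severity: int  # 1 = most severe, 4 = least severe
    impact: int    # 1 = largest impact, 4 = smallest impact

    def score(self) -> int:
        """RPN-style product: 1 means fix first, 16 means fix last."""
        for value in (self.severity, self.impact):
            if not 1 <= value <= 4:
                raise ValueError("severity and impact must be between 1 and 4")
        return self.severity * self.impact

    def priority(self) -> str:
        """Translate the product into a label using the hypothetical thresholds."""
        s = self.score()
        return next(label for limit, label in LABELS if s <= limit)


def resolution_order(defects):
    """Most urgent first; ties keep their original order (sorted is stable)."""
    return sorted(defects, key=Defect.score)
```

With `D-1` (severity 3, impact 2), `D-2` (1, 1) and `D-3` (2, 4), the products are 6, 1 and 8, so the resolution order is `D-2`, `D-1`, `D-3`.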

8.3.2. Defect detection

To be effective and efficient, unnecessary work must be limited. Because test execution on a component, product or system takes place close to its delivery, it is important not to waste time, without skipping any step.

In general, test cases focus on one aspect at a time, the one described in the test case specification document, defined from the requirement. This does not exclude, during the execution of the tests, identifying failures which would not have been specifically noted in the description of the test. It will then be necessary to describe, in the anomaly management tool, the test case that led to the result obtained (different from the expected result) and – if the expected result is not described – to explain how the result obtained does not correspond to what was expected.

The number and type of anomalies should be monitored throughout the project to compare them against expected values (e.g. the average values from previous projects). Any deviation from the expected values will affect the remediation, retest and regression testing workloads, as well as the duration of the test campaigns.
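One way to compare observed counts with expected values is sketched below, assuming expected counts per defect type are available from previous projects. The function name `deviations` and the 20% tolerance are illustrative assumptions; defect types that were not expected at all are always flagged.

```python
from collections import Counter


def deviations(observed_types, expected, tolerance=0.2):
    """Flag defect types whose count deviates from the expected value.

    observed_types: iterable of defect type labels recorded so far.
    expected: mapping type -> expected count (e.g. average of past projects).
    tolerance: acceptable relative deviation (hypothetical default of 20%).
    Returns {type: (actual, expected)} for every out-of-tolerance type.
    """
    observed = Counter(observed_types)
    flagged = {}
    for kind in set(expected) | set(observed):
        actual = observed[kind]  # Counter returns 0 for unseen types
        exp = expected.get(kind, 0)
        if exp == 0:
            deviation = float("inf") if actual else 0.0
        else:
            deviation = abs(actual - exp) / exp
        if deviation > tolerance:
            flagged[kind] = (actual, exp)
    return flagged
```

For example, 30 interface defects against 20 expected is a 50% deviation and is flagged, while functional and performance counts that match expectations are not.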

8.3.3. Correction and urgency

The detection of defects implies, depending on the criticality of the failures found, a need for the design teams to correct these defects. We all know that any activity done in a hurry is riskier than one done carefully: it is when we are in a hurry and take shortcuts that problems arise that aggravate the initial issue. When defects are detected late and fixing delays can jeopardize delivery dates, it is tempting to work in a hurry, which unfortunately always has negative results:

– The developers or designers focus on fixing the defect, often without taking the time to analyze in detail the impacts and possible side effects. They therefore quickly provide a correction, but do not always take the time to carry out all the planned checks (code review, static analysis, unit and integration tests, etc.) given the urgency.

– The testers receive the corrected version and focus their tests on this correction. Sometimes, just past the corrected area, the testers are blocked by another defect that had not yet been identified. And the cycle begins again… with even less time and more urgency, and therefore an even greater risk of error.

Some defects generated during emergency fixes may be in other parts of the component or product. Retesting limited to the corrected component will not be able to detect these defects. Hence, it is important to respect the test processes as well as the test sequence provided for in the strategy.

8.3.3.1. Criticality of corrections

Fixing anomalies is a thankless task, often less well regarded than designing new code. It is, however, a critical task, in the sense that it is on the critical path of the project: the anomaly has been identified by tests (which are already on the critical path) and will involve retesting (to verify the correction) and regression testing of the other components (to ensure that they do not suffer from side effects). There will therefore be an increase in the test workload, the test duration and the development workload. All of these activities delay an activity that is on the critical path, so they also become critical.

Any delay in the correction of anomalies, whether due to a lack of competence of the people assigned to the correction or to any other reason, will have a significant impact on test costs and schedules.

8.3.4. Compliance with processes

Whatever the level of testing, and even more so when there are emergencies, it is important to ensure that the processes are respected. Processes are generally designed with the involvement of all stakeholders and considering impacts and dependencies. Non-compliance with the processes can lead to failures and undesired impacts.

It is specifically to avoid unwanted impacts, which are more frequent in emergencies, that compliance with the processes and/or their associated checklists is important. For example, in aviation, pilots and co-pilots still use checklists even though they know their aircraft well, which ensures that the processes are followed.

For anomalies identified late, there will be significant pressure from the production teams (who want to move on), users or product owners (who want the features delivered), the hierarchy of both MOE and MOA (who know the financial impacts), etc. We have seen many cases where anomalies considered minor were delivered and had a major impact on users. We have also seen bugs fixed quickly that generated critical side effects, simply because the processes that would have found these side effects were not followed.

8.4. Specificities of TDD, ATDD and BDD

The principle of these techniques is to design the tests before writing the code. It is therefore normal for defects (tests that do not pass) to be present during code development. Once the components are completed and delivered for the build, any identified defect should be handled through defect management.

8.4.1. TDD: test-driven development

TDD is a development method often recommended for Agile cycles, but which applies to many low-level development models. TDD consists of fast iterations of:

– design of automated tests (unit tests) that will verify that the code does what it is supposed to do; these tests fail at first because the code has not yet been written;

– design of the minimum code so that the tests written previously pass successfully;

– periodic refactoring to remove duplications and simplify the code.
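The iteration above can be illustrated with Python’s standard `unittest` module. The function `parse_quantity` is hypothetical; in a real cycle the test class is written and run (and fails) before the function exists. Note the negative tests alongside the positive one.

```python
import unittest


# Step 1 (red): the tests are written first; they fail until the code exists.
class ParseQuantityTest(unittest.TestCase):
    def test_accepts_positive_integer(self):  # positive test
        self.assertEqual(parse_quantity("3"), 3)

    def test_rejects_zero_and_negative(self):  # negative tests
        for bad in ("0", "-2"):
            with self.assertRaises(ValueError):
                parse_quantity(bad)

    def test_rejects_non_numeric(self):  # negative test
        with self.assertRaises(ValueError):
            parse_quantity("three")


# Step 2 (green): the minimum code that makes the tests above pass.
# parse_quantity is a hypothetical example function.
def parse_quantity(text: str) -> int:
    value = int(text)  # int() raises ValueError for non-numeric input
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


# Step 3 (refactor): simplify the code, re-running the tests after each change,
# e.g. with "python -m unittest" from the module's directory.
```

The test class stays in the code base after the cycle, which is what lets TDD detect regressions later.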

Implemented correctly, TDD and automated test generation help ensure the code continues to work (no regressions introduced) and unnecessary code design is avoided. Challenges associated with TDD, ATDD and BDD include the following:

– testing is a verification or validation activity performed by someone other than the developer, whereas in TDD the same person defines the tests and the code; TDD is therefore more a design technique than a testing technique;

– tests designed by developers should include both positive and negative tests; in practice, developers will generally not apply the full range of test design techniques, which lets defects through (false negatives) while confidence in the technique remains high;

– automatic defect counts cannot be relied upon, because designing the tests before the code exists means that failing tests, and therefore apparent defects, are expected;

– implementing the TDD technique on a system-of-systems implies re-executing the tests on all the components each time the code is saved; this mode of operation will generate an overload of work for the execution of the tests;

– finally, since development is continuous, with new components added incrementally, the emerging architecture and design may not be optimal and may never be called into question.

Note the effect of cognitive dissonance when tests, including automated TDD tests, focus solely on confirming the paradigms envisaged by designers and developers. Such tests only confirm a hypothesis, whereas negative tests would check whether the hypothesis actually holds. We are then dealing with a cognitive bias rather than with reality, and the tests may be of no use at all. For more information, see Syed (2015).

8.4.2. ATDD and BDD

ATDD is a variation of TDD, where we define what the application must do, in the form of user stories.

ATDD consists of developing software from testable requirements. The concept of “acceptance” in the context of ATDD does not correspond to the use of the term “acceptance” by the testing industry, standards and the ISTQB. In ATDD, “acceptance” refers to accepting a software component, not to confirming that the product meets all of the user’s needs. The principles of development based on verifiable requirements have been present for many years. ATDD manifests itself, among others, in BDD (Behavior-Driven Development), EDD (Example-Driven Development), SDD (Story Test-Driven Development), DDD (Domain-Driven Design) and EATDD (Executable Acceptance Test-Driven Development). Among the useful works on TDD and ATDD are Beck (2003) and Pugh (2010).

BDD (also called “specification by example”) merges TDD and ATDD, and adds various practices such as:

– association with user stories based on the “5 Why?” principle, to clarify business objectives and needs;

– think from the outside in, to only implement behaviors that directly contribute to business needs;

– describe behaviors with a unique notation, directly accessible to domain experts, testers and developers, in order to improve communication;

– apply these techniques down to the lowest levels of software abstraction, paying particular attention to behavior distribution to keep evolutions cheap.

The BDD approach is mainly conceptual and does not require any particular tool or language (only the concepts of “given”, “when” and “then”). In addition to describing the expected behavior, it is nevertheless necessary to consider exceptional cases: the behaviors that the system must not exhibit. Such exceptional cases are too often forgotten when specifying expected behaviors.
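Since no dedicated tool is required, given/when/then scenarios can be written as plain functions. The sketch below shows one scenario for the expected behavior and one for an exceptional case; the `Account` class and its no-overdraft rule are hypothetical, chosen only to illustrate the structure.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the balance (hypothetical rule)."""


class Account:
    """Hypothetical domain object used only to illustrate the scenarios."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        if amount > self.balance:
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self.balance -= amount


def scenario_withdraw_within_balance():
    # Given an account with a balance of 100
    account = Account(100)
    # When the customer withdraws 30
    account.withdraw(30)
    # Then the balance is 70
    assert account.balance == 70


def scenario_withdraw_more_than_balance():
    # Exceptional case: a behavior the system must NOT show (an overdraft)
    # Given an account with a balance of 100
    account = Account(100)
    # When the customer tries to withdraw 150
    try:
        account.withdraw(150)
        refused = False
    except InsufficientFunds:
        refused = True
    # Then the withdrawal is refused and the balance is unchanged
    assert refused and account.balance == 100
```

The second scenario is the kind that is easily forgotten: it specifies what must not happen rather than what should.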

Advantages:

– allows us to automate the input acceptance of software components.

Disadvantages:

– requires defined and testable requirements (user stories);

– non-standard meaning of “acceptance”.

8.5. Defects reporting

During the project, and specifically during the testing phases, the test manager must follow the progress of defects, their assignments and their status. This is necessary to obtain an accurate picture of how the quality of the project is progressing. Monitoring must therefore be adapted to the objectives and risks identified on the project, as well as to its progress.

Traceability between defects and quality objectives links the defects, and their corrections, to the quality objectives, functionalities and areas planned for the project.

Mapping defects to quality criteria links them to the quality characteristics considered important for the component, product, system or system-of-systems.

Warning: during a defect triage meeting, as during test sessions where participants are highly focused, it is common to lose track of time. As time is a scarce and non-renewable resource, participants must be kept aware of the passage of time. This is one of the purposes of “time boxing”, where the time allocated to an action is limited.

8.5.1. Defects backlog management

An increase in the defect backlog (the set of defects that need to be fixed at a given time) can have multiple causes, such as:

– defects discovered faster than they are corrected;

– a defect-fixing team that is too small or does not include the right profiles;

– problems of prioritization of corrections versus new developments.

Clearly, bug fixing is not a very glamorous job, and it often concerns code developed some time ago. It is very important to realize that any delay in the correction of anomalies will have an almost direct impact on the delivery date of the component, product, equipment or system, as well as of the system-of-systems. Indeed, it is not enough to correct the anomaly and retest it; it is also necessary to ensure that there are no side effects in the rest of the system, and of course to manage all of this under configuration management.

Analyzing the defect backlog helps determine whether development teams can fix defects in a timely manner with the resources available to them. As these development resources are already assigned to the development of other components, it is important to ensure that the best profiles are assigned to fix defects.

Deciding to postpone defects to a later delivery is generally not a lasting solution: it is a little like pushing a pile of sand, which ends up forming an immense mass. One solution is to assign a percentage of the development workload (e.g. 10% or 20%) to reducing the defect backlog.

How can the impact of the backlog be calculated? If we have measured the average defect remediation load (development, retest and regression testing, plus defect management), we can easily determine whether our resources are sufficient to correct the defects in the backlog within the required deadlines.
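This capacity check can be written as a short calculation. The sketch assumes a measured average remediation effort per defect and a fixed share of team capacity reserved for defects (such as the 10% or 20% mentioned above); the function name and all figures in the example are hypothetical.

```python
def backlog_fits(open_defects: int,
                 avg_load_hours: float,
                 team_hours_per_week: float,
                 share_for_defects: float,
                 weeks_available: float) -> tuple:
    """Check whether the current backlog can be cleared before the deadline.

    avg_load_hours: measured mean effort per defect (fix, retest,
        regression testing and defect management together).
    share_for_defects: fraction of team capacity reserved for defect
        fixing (e.g. 0.10 or 0.20).
    Returns (fits, weeks_needed).
    """
    capacity_per_week = team_hours_per_week * share_for_defects
    if capacity_per_week <= 0:
        raise ValueError("no capacity reserved for defect fixing")
    weeks_needed = open_defects * avg_load_hours / capacity_per_week
    return weeks_needed <= weeks_available, weeks_needed
```

For example, 120 open defects at 6 hours each, with 20% of a 280-hour weekly capacity, need 720 / 56 ≈ 12.9 weeks, so a deadline eight weeks away cannot be met. The calculation ignores defects discovered during the period, which in practice lengthen the estimate.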

In Agile methods, defects found on the user stories of a sprint are expected to be corrected within that sprint; otherwise, those user stories cannot be delivered in the sprint. Defects identified on user stories delivered in previous sprints become items in the next sprint’s backlog.

The case of defects blocking the tests is special: if they are not resolved, the test team will not be able to continue working. These must be resolved as a priority.

In a cumulative defect detection and correction graph (see Figure 8.2), the horizontal gap between the detection curve and the correction curve gives the average correction time; the vertical distance between the two curves gives the backlog. Diverging trend lines indicate an increase in both correction time and backlog. The objective is for the detection curve to flatten and for the correction curve to converge with it.


Figure 8.2 Defect detection and correction graph.
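The quantities read from such a graph can also be computed directly from defect records. This sketch assumes each defect is recorded as a pair of day numbers (detection, correction), with `None` for defects still open; the function names are hypothetical. The backlog on a given day equals the vertical gap between the two cumulative curves, and the mean correction time is computed over corrected defects.

```python
from statistics import mean


def backlog_on(day, defects):
    """Defects detected on or before `day` and still uncorrected on that day."""
    return sum(1 for detected, fixed in defects
               if detected <= day and (fixed is None or fixed > day))


def average_correction_time(defects):
    """Mean days between detection and correction, over corrected defects."""
    durations = [fixed - detected for detected, fixed in defects
                 if fixed is not None]
    return mean(durations) if durations else None


def cumulative_curves(defects, last_day):
    """Cumulative detected and corrected counts for days 0..last_day."""
    days = range(last_day + 1)
    detected = [sum(1 for d, _ in defects if d <= day) for day in days]
    corrected = [sum(1 for _, f in defects if f is not None and f <= day)
                 for day in days]
    return detected, corrected
```

With the records `[(0, 2), (1, 4), (1, None), (3, 5), (4, None)]`, three defects are open on day 3, and the three corrected defects took 2, 3 and 2 days, an average of about 2.3 days.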

8.6. Other useful reporting

In addition to the reports mentioned previously, other reporting can be useful to ensure that the project is heading in the right direction. According to the analyses in Jones (2018), more than 35% of projects larger than 10,000 function points are canceled, and 70% of large projects that are not canceled exceed their budgets, exceed their deadlines or do not provide an adequate level of quality. For canceled projects, the return on investment is strongly negative, since the money has been spent and no usable product is delivered. Among the elements to consider to gain better control of the project are:

– The continuous monthly increase in the number of function points, often up to 1.2% per month, which was not initially estimated. This increase is normal (business needs evolve), but failing to measure it leads to increased costs and delays that must be taken into account. It can be measured by counting the function points added, modified or deleted each month and the corresponding effort within the project.

– Defect removal efficiency (DRE), the rate at which defects are detected and removed. This rate should reach 99.5%, whereas it is too often around 85–90%. The higher the rate, the higher the quality of the software, since more anomalies have been removed.

– The number of working hours and unpaid overtime attributable to the project. This measurement allows us to compare the activities carried out in different locations, and therefore the different work habits in each country, in order to compare team efficiency more precisely. We thus obtain a more precise view of the workloads and efficiency of each co-contractor, which is the basis of accurate reporting.

– The target number of defects expected for the project. This number is the sum of all defects divided by the total number of function points. This rate corresponds to defect injection, that is, the number of anomalies introduced during each of the design phases, including during corrections of anomalies. It allows us to estimate the effectiveness of verification, validation and testing activities throughout the project.
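The numeric indicators in this list can be computed as follows. This is a sketch under stated assumptions: growth is compounded monthly at the 1.2% cited, DRE is taken as defects removed before release divided by all defects found (including those found after release, over a time window the project must define), and defect density uses the total function point count. The function names are hypothetical.

```python
def function_points_after(initial_fp: float, months: int,
                          monthly_growth: float = 0.012) -> float:
    """Project size after scope creep compounded monthly (1.2% by default)."""
    return initial_fp * (1 + monthly_growth) ** months


def defect_removal_efficiency(removed_before_release: int,
                              found_after_release: int) -> float:
    """DRE = defects removed before release / all defects found.

    The post-release counting window is an assumption to fix per project.
    """
    total = removed_before_release + found_after_release
    if total == 0:
        raise ValueError("no defects recorded")
    return removed_before_release / total


def defects_per_function_point(total_defects: int,
                               function_points: float) -> float:
    """Defect density: all defects divided by the total function points."""
    return total_defects / function_points
```

At 1.2% per month, a 10,000 function point project grows by roughly 15% in a year if nothing is measured. Likewise, 940 defects removed before release and 60 found afterwards give a DRE of 94%, well below the 99.5% target.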

8.7. Don’t forget minor defects

In a system-of-systems, as in software systems, large or catastrophic failures are often the combination of several small anomalies. As Amy Edmondson, professor at Harvard Business School, puts it, “Small failures are the essential warning signs to avoid catastrophic failures in the future.” It is therefore important, even critical, to measure, track and deal with all defects, including minor ones, even if they have been corrected immediately. We suggest applying this recommendation at the latest as soon as a minor defect recurs, which in turn requires all defects to be measured systematically.
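Noticing that a minor defect recurs requires a way to recognize repeats. A minimal sketch, assuming each defect record carries a priority, a component and a symptom, and that the (component, symptom) pair is a good-enough signature; the record layout, the function name and the threshold of two occurrences are all assumptions.

```python
from collections import Counter


def recurring_minor(defects, threshold=2):
    """Return signatures of minor defects seen at least `threshold` times.

    Each defect is a dict with hypothetical keys 'priority', 'component'
    and 'symptom'; the (component, symptom) pair is used as a signature.
    """
    counts = Counter(
        (d["component"], d["symptom"])
        for d in defects
        if d["priority"] == "Minor"
    )
    return sorted(sig for sig, n in counts.items() if n >= threshold)
```

Only minor defects are counted here, since higher-priority ones are already tracked individually through triage.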
