Live site reviews

After an alert is triggered and the team has responded and remediated the situation, it is time to evaluate what happened. This is called a live site incident review. Here, the whole team gathers to address the following:

  1. What happened: to start, a timeline is constructed from the moment the incident was discovered to the point at which normal operations were restored. The timeline is then extended backward with the events that led up to the situation that triggered the incident.
  2. Next, the series of events is evaluated to learn what worked well in the response. If one member of the team used a new tool to quickly diagnose a problem, this can benefit other members of the team as well.
  3. Only then is it time to look at the possible points of improvement and translate these points into high-priority work for the team. Possible fail-safes are identified and scheduled for implementation, or new alerts are defined that fire before a similar problem occurs again.
  4. The alert or group of alerts that triggered the initial response is evaluated to determine whether they are adequate or possibly contain duplicates.
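The timeline from step 1 and the duplicate check from step 4 can be sketched in a few lines of Python. This is only an illustration, not part of Application Insights or Azure Monitor: the `Event` type, the `build_timeline` and `cluster_alerts` functions, and the idea of treating alerts that fire within a short window of each other as duplicate candidates are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One entry on the incident timeline (hypothetical structure)."""
    timestamp: datetime
    source: str        # for example "alert", "deployment", or "responder"
    description: str


def build_timeline(events):
    """Return all collected events in chronological order."""
    return sorted(events, key=lambda event: event.timestamp)


def cluster_alerts(alerts, window):
    """Group alerts that fired within `window` of the previous alert.

    Only groups with more than one alert are returned; these are
    candidates for duplicates that the team can review by hand.
    """
    clusters = []
    for alert in sorted(alerts, key=lambda event: event.timestamp):
        if clusters and alert.timestamp - clusters[-1][-1].timestamp <= window:
            clusters[-1].append(alert)
        else:
            clusters.append([alert])
    return [cluster for cluster in clusters if len(cluster) > 1]
```

Whether two alerts in the same cluster are really duplicates remains a judgment call for the team; the sketch only narrows down which alerts are worth comparing during the review.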

The best time for a live site incident review is as soon after the incident as possible. In practice, this means giving everyone enough time to rest and recuperate, and planning the meeting for the next business day.

This completes our overview of Application Insights and the Azure Monitor capabilities for instrumenting web applications. The following section describes several approaches for integrating Application Insights and Azure Monitor with other tools.
