CHAPTER 5: Documenting Policies and Procedures


During your initial 90 days, one of the things you need to look at is how your team’s processes and procedures are documented.

To the extent that you use standard, documented processes, you will avoid costly errors. Not only does that make the environment more stable, it also improves the quality of life of your staff, especially your senior staff.

Your documentation should be easy to find, edit, and update. There are a number of possible ways of doing this, including file shares, wiki or blog pages, and SharePoint. If there is an organization-wide standard, it makes sense to support it.

Documentation should be produced for both internal team use and for people outside of the team.

The documentation for people outside of the team should focus on what services the team can provide, and how requests should be filed and formatted. People should be instructed how to open a ticket, and they should be told what type of information and approvals are required for what types of requests. (If budget approval is required, that should be included in the request as well.)

If people know up front what information your team needs to fulfill a request, the request can be completed sooner, which helps both the requester and the person on your team who executes the request.

Your documentation repository should include some type of index to make it easier to find documents (and to avoid creating duplicates). It should also include links out to other teams’ documentation or to organizational standards that need to be followed.
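As a minimal sketch of the index idea above, the following Python builds a title index and flags titles that appear in more than one location. The document titles, the locations, and the function names are hypothetical, and matching on a normalized title is only one possible way to spot duplicates.

```python
from collections import defaultdict


def normalize(title):
    """Lowercase a title and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def build_index(documents):
    """Map each normalized title to the list of locations where it appears."""
    index = defaultdict(list)
    for title, location in documents:
        index[normalize(title)].append(location)
    return dict(index)


def find_duplicates(index):
    """Return only the titles that are stored in more than one location."""
    return {title: locs for title, locs in index.items() if len(locs) > 1}


# Hypothetical sample entries: (title, location)
documents = [
    ("Password Reset Procedure", "wiki/ops/password-reset"),
    ("password reset  procedure", "share/old/pw-reset.doc"),
    ("Patch Installation Procedure", "wiki/ops/patching"),
]

index = build_index(documents)
duplicates = find_duplicates(index)
```

Here `duplicates` contains the password reset procedure, which exists in both the wiki and an old file share. One of the two copies is a candidate for retirement.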

It is possible to go crazy with documentation. The bottom line is the bottom line, and you will be judged by the functionality and reliability of the environment you are responsible for. Documentation is a tool to help you make the environment more stable and more flexible. Start with the documentation tasks that will help you achieve these goals, and work to build a culture where documentation is part of every deployment and every change.

POLICIES, STANDARDS, PROCESSES, AND PROCEDURES

Policies are statements by management that provide requirements to guide the people who implement technologies and processes.

Standards are organization-wide platforms and methods that support the policies.

Processes are business methods and practices that support the standards.

Procedures are step-by-step instructions on how to carry out a task in a way that supports the organization’s policies and standards.

Procedures

Ideally, every task would have a specific, standard procedure that has been vetted and approved as complying with policy. That is the ideal, but it is a very tall order.

In the real world, your group probably has some well-defined procedures, some procedures that have only recently been drafted, some procedures that exist only as emails or comments in a script, and some procedures that are so out of date that they would break the environment if someone actually tried to carry them out.

Some of your early wins should come from identifying the biggest gaps in your documented procedures and filling those gaps. Focus your attention first on procedures that:

Are frequently executed.
Follow similar steps each time they are executed.
Can be executed by more junior team members.
Are related to security or audit concerns (e.g., access-related requests or patch installation procedures).

In particular, if it is a procedure that will be entered into change control requests or audit responses over and over, it is far better to have a standard document that can just be attached (or a link that can be provided).

You may need to work to get policies approved, if you see a gap in the policy structure. At the technology team level, your focus will probably be on procedures and maybe standards. Policies and standards are discussed in greater detail in the final two sections of this chapter.

Procedures as Controls

When management defines a policy, they expect that the organization will comply with it. Enforcing these policies is possible only if we track and verify compliance and report breaches of the policy.

A control is a method or facility that is used to enforce a policy. Procedures that implement a policy are examples of controls. Compliance teams may call on technology teams to certify that the procedures are documented, and that they are in active use.

Demonstrating compliance can be done by logging changes, along with the procedures used. If the procedure can be automated, it becomes even easier to demonstrate compliance, especially if the automated procedure includes a logging step.
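One way an automated procedure might include a logging step is sketched below in Python. This is an illustration under stated assumptions, not a prescribed implementation: the `change_log` list stands in for a real audit log or ticketing system, and the procedure name, ticket identifier, and `grant_read_access` function are hypothetical.

```python
import functools
from datetime import datetime, timezone

# In-memory stand-in for a real audit log or ticketing system (assumption).
change_log = []


def logged_procedure(procedure_name):
    """Decorator that records every run of an automated procedure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(ticket_id, *args, **kwargs):
            entry = {
                "procedure": procedure_name,
                "ticket": ticket_id,
                "started": datetime.now(timezone.utc).isoformat(),
            }
            try:
                result = func(ticket_id, *args, **kwargs)
                entry["outcome"] = "success"
                return result
            except Exception as exc:
                entry["outcome"] = f"failed: {exc}"
                raise
            finally:
                # The log entry is written whether the procedure succeeds or fails.
                change_log.append(entry)
        return wrapper
    return decorator


@logged_procedure("grant-read-access")
def grant_read_access(ticket_id, user, share):
    # Placeholder for the real work (hypothetical procedure).
    return f"{user} granted read access to {share}"


message = grant_read_access("TICKET-1", "asmith", "finance-reports")
```

Because the logging lives in the decorator, every procedure wrapped this way leaves the same kind of record, which is the property a compliance review is likely to care about.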

Changes that are controlled by a policy should be tracked and logged. If you have a working ticketing system, it is easier to require that the requests be filed as tickets in a standard format. Then the person fulfilling the request can update the ticket with the procedure used when the ticket is marked as resolved.

When new facilities are added, controls should be built into the system at implementation time. The initial requirements should include implementing the controls. Because new implementations should be tested anyway, the testing procedures may serve double duty as compliance controls.

Automation

While looking for procedures that are priorities for documentation, keep an eye out for simple tasks that are done frequently. To the extent possible, these should be automated and even made into self-service requests for the customer.

Another target for automation would be multistep procedures that are not done frequently enough for people to be able to remember all the steps. If these can be scripted and documented, it will make a big difference in the reliability of the environment.

You will never be given enough staff or enough time to carry out the full extent of your duties. The way to work around this is to identify time-consuming tasks or frequent tasks that can be automated. It takes some extra effort to develop, review, test, and deploy automated processes, but the time saved can be rolled into the next automation project. If you pick the right targets, automation projects can be good sources of early wins when you take over an environment.

Change Control

Uncontrolled change is the enemy of a reliable environment. There are several important components of a functional change management system:

A system for obtaining approvals from stakeholders for changes.
A system for tracking change requests through the review and approval process.
A process for reviewing change requests.

Each change needs to include a brief description of the change, a specific plan of how the change will be executed, a specific plan for backing out the change, and the success criteria that will be used to test the change.
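The four required elements of a change can be checked mechanically before a request enters review. The Python below is a sketch only; the class name, field names, and sample request are assumptions chosen to mirror the list in the preceding paragraph.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    description: str
    execution_plan: str
    backout_plan: str
    success_criteria: str


def missing_fields(request):
    """Return the names of required fields that were left blank."""
    return [f.name for f in fields(request) if not getattr(request, f.name).strip()]


# Hypothetical request that was submitted without a back-out plan.
request = ChangeRequest(
    description="Increase log volume size on the reporting server",
    execution_plan="Extend the volume during the maintenance window",
    backout_plan="",
    success_criteria="Volume reports the new size and the service restarts cleanly",
)

problems = missing_fields(request)
```

A request with a non-empty `problems` list could be returned to the requester before it takes up any reviewer time.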

Each change should pass through several phases on its way to implementation:

Assessment. The risks of implementing the change need to be considered. Part of the assessment is testing the change and the back-out plans beforehand.
Planning. A procedure needs to be created and reviewed for executing the change, testing it, and backing out if necessary.
Testing. The procedure should be tested. How it is tested will depend on the nature of the change. If the procedure is standard, additional testing may not be necessary. If like-for-like testing is impossible, the procedure needs to be reviewed by the relevant subject-matter experts.
Communication. The change needs to be communicated to all of the stakeholders, including the information about what the change is, what the expected impact is, why it is necessary, and when it will be executed. The time, format, and manner of these notifications should be standardized, so people know where and when to look for them.
Authorization. The appropriate customer or service owner needs to approve the change, based on the information from the procedure and the assessment.
Documentation. The change needs to be thoroughly documented in a standard manner and location. Each change record should include information on the requester, approver, system status before the change, reason for the change, specifics of the change, status after the change, whether the change was successful, whether the back-out was successful (if the change failed), dates and times of the different elements, and contact information for the implementer.
Validation. The requester and/or end-user community needs to validate the system after the change and provide a certification that the change was successful.
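The phases above can be modeled as an ordered sequence so that a tracking tool refuses to skip a step. The following Python is a minimal sketch. Enforcing a strictly linear order is an assumption made for illustration; a real process may allow phases to overlap or repeat.

```python
from enum import IntEnum


class Phase(IntEnum):
    ASSESSMENT = 1
    PLANNING = 2
    TESTING = 3
    COMMUNICATION = 4
    AUTHORIZATION = 5
    DOCUMENTATION = 6
    VALIDATION = 7


def advance(current, target):
    """Move a change to the target phase only if it is the next one in order."""
    if target != current + 1:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


phase = Phase.ASSESSMENT
phase = advance(phase, Phase.PLANNING)

# Attempting to jump straight to authorization is rejected.
try:
    advance(phase, Phase.AUTHORIZATION)
    skipped = True
except ValueError:
    skipped = False
```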

For the change control system to work properly, people in different roles need to review and approve changes:

Change requester. This may be an internal customer, or it could be someone from the technical staff. The change requester specifies the requirements and scope of the change.
Change owner. This is the person who shepherds the change request through the change control process.
Change implementer. This is the person who is responsible for implementing the change.

Incident Response

An incident is something that occurs outside of the normal functioning of the environment. Some incidents may result in a production service being affected in a measurable way. Depending on the nature of the impact, it may be necessary to execute an emergency change.

A typical incident will flow through the following phases, illustrated by Figure 5-1:

Figure 5-1. Incident Response

Detection and recording. Ideally, the monitoring system should open a ticket directly into the correct queue.
Classification and initial response. Someone from the team owning that queue verifies whether there is a legitimate incident, and does a preliminary investigation into the scope and cause of the problem.
Investigation and diagnosis. The relevant technical teams perform a more thorough investigation to diagnose the problem and propose a resolution.
Resolution and recovery. The resolution is implemented and the service recovered.
Incident closure. The incident is closed, and the customer representative validates the service.

The incident response framework needs to address four important elements: ownership, monitoring, tracking, and communication.

WHEN YOU DON’T HAVE TIME TO ASK PERMISSION

Sometimes there is no time to get proper authorization for a change in advance. If there is an outage or other emergency, it may be necessary to change things on the fly to restore service. There are a few principles to keep in mind when you find yourself in this situation:

Don’t do anything that can’t be reversed.
Keep track of everything you do.
Follow the organization’s incident response process.
Report on the changes you introduced into the environment after the fact.

Those changes should be reviewed once the emergency is past. If the review board decides to keep them, they can stay as part of an emergency change. Otherwise, they can be reverted as part of a scheduled change.

Don’t hide the actions you take to resolve a problem. Report them honestly and help clean up and document the environment as needed.
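The first two principles, making only reversible changes and keeping track of everything you do, can be illustrated with a small journal that pairs each action with its undo step. This Python sketch is an assumption-laden illustration: the `EmergencyJournal` class, the `settings` dictionary, and the `max_connections` value are all hypothetical.

```python
class EmergencyJournal:
    """Records each emergency action together with the step that reverses it."""

    def __init__(self):
        self.entries = []  # list of (description, undo callable)

    def do(self, description, action, undo):
        """Run an action and remember how to reverse it."""
        action()
        self.entries.append((description, undo))

    def report(self):
        """List what was changed, in order, for the after-the-fact review."""
        return [description for description, _ in self.entries]

    def revert_all(self):
        """Undo the recorded actions, most recent first."""
        while self.entries:
            _, undo = self.entries.pop()
            undo()


# Hypothetical configuration changed during an outage.
settings = {"max_connections": 100}
journal = EmergencyJournal()
journal.do(
    "Raise max_connections from 100 to 200",
    action=lambda: settings.update(max_connections=200),
    undo=lambda: settings.update(max_connections=100),
)

report = journal.report()
value_during = settings["max_connections"]
journal.revert_all()
```

The output of `report()` is the material for the after-the-fact report, and `revert_all()` corresponds to the case where the review board decides not to keep the changes.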

Incident response should be handled in a standard way to instruct and protect the operations and implementation teams. Here are some common elements that need to be defined:

Time frame for the response, as frequently defined in the Service Level Agreement (SLA) for the facility in question.
Notification requirements for the response and stakeholder teams.
How problem resolutions should be tracked, logged, and approved.
How logs and other evidence should be preserved.
Process for analyzing logs, documentation, and other evidence to perform a root cause analysis.
Change control process for implementing necessary changes or for approving emergency changes.

Outages need to be tracked, including impact to service levels, the outcome of the root cause analysis, and the actions taken to resolve the outage. Outage reports should be stored in a standard format and location so they can be reviewed to look for patterns on how to improve the reliability of the environment.
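Storing outage reports in a standard format is what makes pattern-finding possible. The Python below is a sketch of that idea; the record fields are drawn from the paragraph above, while the class name, the sample reports, and the two summary functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OutageReport:
    service: str
    minutes_down: int
    root_cause: str
    resolution: str


def root_cause_summary(reports):
    """Count how often each root cause appears, most frequent first."""
    return Counter(r.root_cause for r in reports).most_common()


def downtime_by_service(reports):
    """Total the minutes of downtime recorded against each service."""
    totals = Counter()
    for r in reports:
        totals[r.service] += r.minutes_down
    return dict(totals)


# Hypothetical sample data for illustration only.
reports = [
    OutageReport("intranet", 45, "expired certificate", "renewed certificate"),
    OutageReport("mail", 20, "full disk", "extended volume"),
    OutageReport("intranet", 30, "expired certificate", "renewed certificate"),
]

summary = root_cause_summary(reports)
downtime = downtime_by_service(reports)
```

In this sample, the repeated certificate expiry stands out as a candidate for a monitoring check or an automated renewal procedure.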

Policy Approvals

Because policies bind the organization, they need to be approved at a level that will be recognized.

Many organizations have to provide evidence of compliance with policies as part of audits. This involves setting up controls and investing capital and staff time to maintain them. Because most organizations have regulatory and contractual requirements that they have to comply with anyway, it makes sense to have policies that reinforce compliance.

There are several constituent groups within the organization that need to approve a policy for it to take effect.

Depending on the nature of the policy, the legal or compliance department may need to review the proposed policy to make sure it meets regulatory or contractual requirements.
The architecture team needs to review the policy and possibly raise flags about costs required to change the technology landscape to make compliance possible.
The technology teams who would implement the policy need to review it to raise flags about additional costs or requirements that will be needed to support the policy.
Management needs to review the policy to decide if it represents a business priority, and if the proposed policy reflects the organization’s goals.

The approvals for a policy need to happen at a sufficiently high managerial level that the policy’s legitimacy will be recognized by all the relevant teams.

Once a policy is approved, it needs to be published. Organizations should have a standard way to store, approve, and communicate policies. If policies are not published in a standard location, people will not be able to refer to them to comply with them.

Policies are supported by standards and procedures. The approval process for a policy has to include drafting and approving these standards and procedures, but there is no need to have them approved by upper management.

In some cases, it may be appropriate to refer a new standard or procedure to the compliance team, but the technology architecture and implementation teams usually control things at that level. Just as most technology people are not lawyers, most lawyers are not competent to judge technology.

Standards

Sometimes technologists see standards as confining. When an enterprise standard exists, sometimes that means that the absolute optimal solution for a particular problem cannot be found.

Ideally, standards should be liberating. If there is a standard way to roll out large parts of the technology infrastructure, it will help the operations teams maintain the technology, as it reduces the number of platform issues that need to be tracked and resolved. It will help the development teams because templates and code libraries can be reused. And it will produce a better overall environment because standards promote more efficient, reliable functioning of the environment.

Summary

Documentation is the key to mature, effective, repeatable processes. Effective procedures are documented, followed, and updated. Learning to manage documentation effectively will be a key to your long-term success as a manager.

Discussion Questions

Effective policies are clear and definite. Write a policy for something in your environment that you know well. What was the hardest part of writing that policy?
Examine a procedure document you produced in the past. If you did not understand the procedure well, would you be able to follow the document? How frequently should this document be updated?
