Test early, test often. This idea is as true for code as it is for architecture. Even when the architecture is only in its infancy, before we’ve written any code, there’s always something we can test.
There are three things we’ll need to evaluate an architecture. First, we need an artifact, a tangible representation of the design. Next, we need a rubric, a definition for better or worse from the stakeholders’ perspective. Finally, we need a plan for helping reviewers generate insights so they can form an opinion about the goodness of the architecture.
Before we can evaluate anything, we need something to evaluate. We can evaluate real things, not ideas, so we need to prepare a tangible artifact. The artifact could be as simple as a whiteboard sketch or as detailed as a full architecture description.
By Ipek Ozkaya, senior member of technical staff at the Software Engineering Institute at Carnegie Mellon University
My work involves helping organizations and teams improve their systems’ quality from the perspective of the fitness of their architecture. An unavoidable and obvious request in these engagements is “Show me your architecture.” Over the years, based on the responses I get, I’ve developed a personal catalog of misconceptions about architecture and architecting.
The first misconception is treating use cases as the architecture. The collection of all your use cases and their behavioral traces, such as those you capture in sequence diagrams, is useful and important to your system, but it does not provide the right level of abstraction for reasoning about classes of system behavior.
The second misconception is treating the working code as the architecture. The bottom line of any architecting effort is to design and implement a system that meets its business and stakeholder goals, and working code is the inevitable reality. However, architectural concerns cut across many elements of the implemented system. An effective architecture review is bound by the architecturally significant requirements and all the elements they touch, and traditional code review practices do not cover this end-to-end perspective.
The third misconception appears when a team does have an artifact, often called the Software Architecture Document and sadly referred to as the SAD, but the document contains only ad hoc box-and-line drawings. This is a great start, but discussions dwell on the boxes while the lines are completely forgotten. This is unfortunate, because the lines often carry the most critical aspects of the architectural decisions.
Of these three misconceptions, the most significant is failing to appreciate the importance of the lines in software architecture diagrams. Think about it. If you aim to increase performance, you focus on the frequency and volume of inter-element communication. If you want to increase modifiability, you limit the interactions between elements. If you want to strengthen security, you protect the inter-element relationships. All of these are represented by the lines!
Many architectural decisions are carried on those thin, often forgotten lines. Because so much rides on the inter-element relationships, one of my first recommendations to teams is to develop a better understanding of the relationships between the elements. Treat the lines as first-class citizens in your architecting journey!
Tangible things to evaluate are easy to find. Here are some ideas:
Write some code.
Sketch a model on paper or a whiteboard.
Draw a model in a diagramming application.
Prepare a slide-based presentation with different views of the architecture.
Summarize the results of an experiment in a presentation or whitepaper.
Create a traditional architecture description, architecture haiku (see Activity 21, Architecture Haiku), or set of Architecture Decision Records (Activity 20, Architecture Decision Records).
Build a utility tree showing which quality attributes are promoted by different components.
In The Four Principles of Design Thinking you learned to make things tangible to facilitate communication. The artifact used during an evaluation communicates our best intentions for how we plan to address (or in some cases, have already addressed) the ASRs.
Prepare artifacts that are likely to solicit the type of feedback we want. For example, if we want reviewers to focus on a specific quality attribute, then the artifacts should include views relevant to that quality attribute. If the architecture is young and we want general feedback, consider using sketchier diagrams to show that the design is in flux. If the design decision is about something high risk and high cost, favor greater precision and formality to communicate the seriousness of the matter at hand.
Every architecture exhibits shades of right and wrong. One reviewer may conclude the architecture is a masterpiece whereas another proclaims it a dumpster fire. A design rubric defines the criteria reviewers should use when judging the fitness of the architecture.
Rubrics consist of two parts. Criteria describe the characteristics used to evaluate the design artifacts. Ratings describe the scale used to interpret the characteristics. Typically, rubrics take the shape of a matrix.
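As a minimal sketch of those two parts in code (the scale wording and the scenario criteria below are hypothetical stand-ins, not the Project Lionheart rubric), a rubric might be captured like this:

```python
# Hypothetical rubric sketch; the scale wording and the scenarios are invented for illustration.

RATING_SCALE = {
    1: "does not satisfy the scenario",
    2: "partially satisfies the scenario",
    3: "satisfies the scenario",
    4: "satisfies the scenario with room to spare",
}

CRITERIA = [
    "Search results return within 5 seconds under peak load",
    "A new payment provider can be added by changing a single component",
    "The system stays available when one web-tier host fails",
]

# Each reviewer fills in one rating per criterion; None means not yet scored.
blank_review_sheet = {criterion: None for criterion in CRITERIA}
```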
In the example from Project Lionheart we’re using quality attribute scenarios as the criteria, listed on the left. On the right, reviewers enter their ratings using the provided scale, which is described at the bottom.
Let’s explore how we arrived at this rubric and define some general advice for creating rubrics.
Architecturally significant requirements define the software’s purpose from a stakeholder’s perspective. In Chapter 5, Dig for Architecturally Significant Requirements, you learned how to specify ASRs in a way that enables analysis and evaluation. If we define ASRs in a precise, unambiguous, and measurable way, then we can use them to help define a rubric for evaluation.
Using the ASRs as a guide, we can select a rubric’s criteria. The best rubrics meet the following conditions:[24]
The criteria in a rubric define what we think a good architecture should look like relative to the ASRs. Criteria should not include nice-to-have ideas or frivolous details not required for the architecture to be fit for purpose.
Criteria within the same rubric should not overlap with one another. Each criterion is one facet of the overall fitness of the design. Ideally each criterion can be assessed and scored independently.
Reviewers must be able to assess and score the criteria in the rubric. The artifacts we prepare for the evaluation will make criteria visible. The activities we perform during the assessment will collect data that lets us measure the criteria.
Every reviewer should interpret the criteria in the same way.
Quality attribute scenarios should already meet these recommendations and always make for good criteria.
During the evaluation, reviewers will judge the criteria using a provided rating scale. Rating scales define what needs improvement, good, better, and best look like. The size of the scale depends on the goals of the evaluation. Here are some different rating scales and when they might be appropriate to use:
| Scale Size | Examples | Use it when… |
|---|---|---|
| 2 | yes or no; condition satisfied or not satisfied | Acceptance is all or nothing for a condition, standard, or presence of an item; single or few reviewers |
| 3 | fail, pass, or awesome; never, sometimes, or always; low, acceptable, or high | There is a minimum acceptable threshold but also a preferred expectation for the design; multiple reviewers are involved |
| 4 | never, sometimes, usually, or always; fail, fair, pass, or exceed | Detailed feedback is desired; expectations can be nuanced or involve multiple pieces |
| 5+ | choose a number 1–10 | Avoid using. Too many options in the rating scale lead to inconsistent reviews. |
In our example from Project Lionheart, we chose a simple 1–4 scale so reviewers could offer more nuanced feedback about the design. To use the rubric, multiple reviewers will score the criteria and we’ll average the results. We’ll also flag any criterion that scored a 1 for further discussion, even if its average score is acceptable.
This example rubric captures scores well but doesn’t have space where reviewers can explain why they scored criteria the way they did. Scores are an easy way to assess the design quickly, but knowing what reviewers were thinking when they scored different criteria is invaluable information.
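As a rough sketch of this scoring step, here is one way to average multiple reviewers’ ratings and flag any criterion that received a 1; the criteria names and scores are made up for illustration.

```python
from statistics import mean

# Hypothetical ratings: criterion -> one score per reviewer on the 1-4 scale.
scores = {
    "Latency under peak load": [3, 4, 3],
    "Swap payment providers easily": [2, 1, 3],
    "Survive the loss of one web host": [4, 4, 3],
}

for criterion, ratings in scores.items():
    average = mean(ratings)
    # Flag any criterion that received a 1, even if its average is acceptable.
    marker = "  <- discuss: at least one reviewer scored this a 1" if 1 in ratings else ""
    print(f"{criterion}: average {average:.1f}{marker}")
```

Pairing a score sheet like this with space for free-form comments addresses the gap noted above: the numbers show where the design stands, and the comments capture why reviewers scored it that way.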
We have artifacts. We have a definition for better and worse. The last step is to help reviewers score the rubric by helping them generate insights about the architecture.
Design rubrics contain answerable questions as shown in the figure. We find the answers by helping reviewers generate insights about the design, which reviewers use to form opinions about how well the design satisfies the ASRs.
We can generate insights in a number of different ways, such as questionnaires, directed explorations, risk elicitations, or code analysis. To help decide which activities will bear the most fruit, we’ll need to figure out what information is required to answer our rubric. Here are a few examples:
| Rubric Criteria | Insights to help score the criteria |
|---|---|
| Amount of Risk | Identify risks with risk storming or a general risk elicitation workshop; examine the number and severity of risks identified |
| Amount of Uncertainty | Generate open questions with a question-comment-concern workshop; examine the number of open questions and estimate how difficult they are to answer |
| Reviewer Consensus | Use multi-voting, surveys, thumbs up/down, and ratings |
| Design Completeness | List known components and their current design state; define a threshold for complete and more work needed |
| Fit for Problem | Walk through quality attribute scenarios and identify sensitivity points, problem areas, risks, and questions |
| Technical Debt | List value-adding use cases that cannot be implemented with the current architecture; estimate the cost to prepare the architecture for the use case |
| Quality | Count the number of defects by architectural component and define a threshold for high and low quality (see the sketch after this table) |
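For instance, the Quality row above could be scored with a quick tally like the following; the component names, defect data, and threshold are hypothetical.

```python
from collections import Counter

# Hypothetical defect reports, each tagged with the architectural component it touches.
defect_components = [
    "web-ui", "order-service", "order-service", "payments", "order-service",
]

DEFECT_THRESHOLD = 2  # assumed cutoff: more than this many defects marks a component as low quality

for component, count in Counter(defect_components).items():
    verdict = "low quality" if count > DEFECT_THRESHOLD else "high quality"
    print(f"{component}: {count} defect(s) -> {verdict}")
```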
The bulk of the effort in an architecture evaluation is spent generating insights. Since insights are often generated collaboratively during workshops, let’s learn more about how to plan and facilitate an architecture evaluation workshop.
Good evaluators ask the right questions. Learning to ask the right questions takes practice. Pick a recent project and write down seven or more questions about its architecture that you can’t currently answer. Why seven? We want to move past the obvious to find interesting things others might have missed.
Here are some things to think about:
Be specific. General questions only provide general insights. The more specific the questions, the more actionable your insights.
What do you know (or not know) about the relations in the architecture?
Are there one or more views of module, component and connector, and allocation structures?
What worries you? Playing what if… is not a fun game, but worries are often the seed of real engineering risks.