This chapter covers some key ways to enhance the transparency of a model. This represents an important aspect of best practices, since it is one of the major approaches to reducing the complexity of a model. The main themes relate to:
Perhaps the best way to gain an understanding of what is required to maximise transparency is to do one (ideally both) of the following:
The “principle of entropy” applies to models: the natural state of a system is one of disorder, and most actions tend to increase this. Crudely speaking, there are many ways to form a pile of bricks, but the creation of a stable wall, with solid foundations that are sufficient to support the weight above it, requires proper planning and design, the right selection from a range of possible materials, and robust implementation processes. Analogous comments apply to Excel models and their components: the creation of clear and transparent models requires planning, structure, discipline, focus and explicit effort, whereas poor models can be built in a multitude of ways.
Following the discussion in Chapter 3, any action which increases complexity without adding required functionality or flexibility is the antithesis of “best practice”; the use of such complexity-increasing actions is therefore a simple way to highlight both the causes of complexity as well as identifying approaches to reducing it.
As a simple example, Figure 7.1 shows a small model whose key output is the profit for a business, based on using input assumptions for price, volume and costs. The model should be almost immediately understandable to most readers without further explanation.
This model could provide the same results, with a lot less transparency if one were to:
The key points that can be established by reference to the above example concern the techniques that could be applied in reverse, i.e. to start with a model such as in Figure 7.4 (or perhaps the more complex version with some of the items contained in other worksheets or in linked workbooks), and to can transform it into a clear model (i.e. as shown in Figure 7.1). This would require a few core elements, that also encompass many elements of general best practices:
These issues (and other related points) are discussed in detail in the rest of this chapter.
A core principle to the creation of transparency (and a reduction in complexity) is to minimise the total length of all audit paths. Essentially, if one were to trace the dependency and precedence paths of each input or calculation, and sum these for all inputs, the total length should be minimised. Clearly, a model with this property is likely to be easier to audit and understand than one in which there are much longer dependency paths.
Another core principle is to ensure that audit paths are generally horizontal and vertical, with a top-to-bottom and left-to-right flow.
These principles are discussed in this section. We note that the principles are generally aligned with each other (although may conflict in some specific circumstances), and that there may also be cases where a strict following of the principle may not maximise transparency, with the “meta-principle” of creating transparency usually needing to be dominant, in case of conflicts between the general principles.
An initial discussion of modular approaches was covered to some extent in cover in Chapter 6. In fact, although that discussion was presented within the context of the overall workbook structure, the generic structures presented in Figure 6.1 are more widely applicable (including at the worksheet level); indeed, Figure 6.1 is intended to represent model structures in general.
In this section, we discuss the use of modularised structures within workbook calculation areas. We use a simple example to demonstrate how such structures are often more flexible, transparent and have shorter audit paths than the alternatives.
The file Ch7.1.InputsAndStructures.xlsx contains several worksheets which demonstrate various possible approaches to the structure and layout of the inputs and calculations of a simple model. Despite its simplicity, the example is sufficient to highlight many of the core principles above.
Figure 7.5 shows SheetA1 of the file, which creates a forecast based on applying an assumed growth rate in each period, and with the input assumptions (cells C3:D3) held in a central area, with the calculations based on these being in Row 8 and Row 9.
This is a structure that is frequently observed, and indeed it conforms to best practices in the sense that the inputs are held separately and are clearly marked. However, one can also observe some potential disadvantages:
Of course, the formulae in Row 11 and Row 12 can be corrected or rebuilt, resulting in a model shown in Figure 7.7 (and in SheetA3 of the file).
Note that an alternative approach would be to use “localised inputs”. Figure 7.8 (SheetA4 in the file) shows the approach in which the calculations use values of the inputs that have been transferred from the central input area into the corresponding rows of the calculations. Note that the initial calculation area (Row 8 and Row 9) can then be copied (to Row 11 and Row 12), with only the cells in the transfer area (cells C11 and D12) needing to be relinked; the completed model is shown in Figure 7.9 (and contained in SheetA5 of the file).
The approach with transfer areas largely overcomes many of the disadvantages of the original model, in that the audit paths are horizontal and vertical (not diagonal), and the calculation areas can be copied. Note also that the total length of all audit paths in this approach is shorter than in the original model: although a single diagonal line has a shorter length than that of the sum of the two horizontal and vertical lines that arrive at the same point, the original model has more such diagonal lines: for example, in Figure 7.7, there are three diagonal lines from cell C3, whereas in Figure 7.8, these are replaced by a single vertical line and three shorter horizontal lines.
It is also worthwhile noting that, whilst diagonal audit paths can be easy to follow in very small models, they are very hard to follow in larger models, due to the difficulty of scrolling diagonally. Hence the importance of having only horizontal and vertical paths as far as possible.
The above approach, using modularised structures and transfer areas for the centralised assumptions, has the potential advantage that all inputs are shown in a single place. It also has some disadvantages (albeit less significant than in the original model):
An alternative approach is therefore to use fully modularised structures from the beginning. Figure 7.10 (SheetB1 in the file) shows the use of a fully modularised structure for the original (revenue only) model.
Figure 7.11 (Sheet B2 in the file) shows that the original module can easily be copied and the input values altered as appropriate. It also shows the simple flow of the audit paths, and that they are very short.
It is also worth noting that the length of the audit paths in this approach is in proportion to the number of modules (since each module is self-contained), whereas in the approach shown in Figure 7.9, the audit paths are not only longer, but also have a total length which would increase according to the square of the number of modules (when in the same worksheet): the audit path from the central area to any new module has a length which includes all previous modules. Therefore in a larger model such as one would typically have in real life, the modular approach affords much better scalability.
In terms of linking this discussion to the structure presented in Chapter 6, we noted that the generic best practices structures (Figure 6.1) may apply at the worksheet level as well as at the workbook level. Thus, the fully modularised structure represents part of a Type III structure (in which only the local data and calculations have been completed, but these have not yet been brought together through the intermediate calculations, such as that of profit as the difference between revenue and cost, nor linked to other global assumptions (such as an exchange rate) that may be relevant in general contexts).
Note that, the placement of “local” inputs within their own modules does mean that not all the model's inputs are grouped together (with only the globally applicable inputs held centrally). However, this should pose no significant issue, providing that the inputs are clearly formatted, and the existence of the modularised structure is clear.
The creation of short audit paths is not only driven by workbook and worksheet structure, but also by the way that formulae are structured. Generally speaking, the paths used within formulae should be short, with any required longer paths being outside the formulae (and using simple cell references as much as possible), i.e. “short paths within formulae, long paths to link formulae”.
For example, instead of using a formula such as:
one could split the calculations:
In a sense, doing the latter is rather like using a modular approach in which the ranges A1:A15, C1:C15 and E1:E15 are each input areas to the calculations of each module (with each module's calculations simply being the SUM function), and where the final formulae (in cell H18) is used to bring the calculations in the modules together.
Note once again that the latter approach has shorter total audit paths, and also that it is much easy to audit, since the interim summation calculations are shown explicitly and so are easy to check (whereas to detect a source of error in the original approach would be more complex).
Such issues become much more important in models with multiple worksheets. For example, instead of using a formula (say in the Model sheet) such as:
the alternative is to build the summation into each of the worksheets Data1, Data2 and Data3:
and in the Models worksheet, use:
This example highlights the fact that, where dependent formulae may potentially be placed at some distance from their inputs, it is often better to restructure the calculations so that the components required in the formula are calculated as closely as possible to the inputs, with the components then brought together in the final calculation.
Note that, in addition to the general reduction in the total length of audit paths that arise from using this approach, it also less error-prone and sometimes more computationally efficient. For example, when using the original approach (in which the formula sums a range that is on a different sheet), it is more likely that changes may be made to the range (e.g. to Data1!A1:A15), such as adding a new data point at the end of the range (i.e. in Cell A of Data1), or cutting out some rows from the range, or introducing an additional row within the range. Each of these may have unintended consequences, since unless one inspects all dependents of the range before making such changes, errors may be introduced. Where the range has many dependent formula, this can be cumbersome. Further, it may be that a summation of the range is required several times within the model. To calculate it each time (embedded in separate formulae), is not only computationally inefficient, but also leads to the range having many dependents, which can hinder the process of checking whether changes can be made to the range (e.g. adding a new data point) without causing errors.
Finally, note also that when working with multi-worksheet models, it can also be helpful to use structured “transfer” areas in the sheets (to take and pass information to other sheets), with these areas containing (as far as possible) only cell references (not formulae). These are rather like the mirror sheets to link workbooks, as discussed in Chapter 6, but are of course only ranges in the worksheets, rather than being whole worksheets. In particular, cross-worksheet references should generally only be conducted on individual cells (or ranges) and not within formulae.
The way in which logic flows should be clear and intuitive. In principle, this means that generally, the logic should follow a left-to-right and top-to-bottom flow (the “model as you read” principle). This is equivalent to saying that the audit paths (dependence or precedence arrows) would also follow these directions. If there is a mixed logical flow (e.g. most items at the bottom depending on those at the top, but a few items at the top depending on those at the bottom), then the logic is hard to follow, the model is difficult to adapt, and there is also a higher likelihood of creating unintentional circular references.
In fact, there are cases where the strict interpretation of the “model as you read” principle may not be optimal. For example, in forecast models where historic information is used to calibrate assumptions, the flow may often be reversed at some points in the model. Figure 7.12 shows an example of a frequently used structure in which the historic growth rate (Cell D4) is calculated in a “model as you read” manner, whereas the forecast assumptions (in cells E4:G4) are subsequently used in a bottom-to-top flow.
Note that the strict adherence to the “top-to-bottom” principle can be achieved. Figure 7.13 shows such a case, which uses historic data (for 2015 and 2016) as well as forecast assumptions. (The formula is analogous to when updating with actuals, as discussed in Chapter 22, using the functions such as ISBLANK.) Thus, whilst the principle has been fully respected, the model is larger and more complex.
The flow may also need to be compromised where summary calculations are shown at the top of the model, or toward the left-hand side. For example, in a 30-year forecast, in which each column represents a year, one may wish to show summary data (such as total revenues for the first 10 years) toward the left of the model (such as after the revenue label) (see Figure 7.14). Note that the summation calculations (in Column E and Column F) refer to items to their right, and therefore violates the principle. The advantages of doing so include not only the visual convenience of being able to see key summary information toward the left (or the top) of the model, but also that respecting the principle would mean that the summary calculations are placed 30 columns to the right, i.e. in a part of the model that is likely not to be viewed, and which may be overwritten inadvertently (e.g. if adjustments are made to the model's calculations and copied across the columns).
The flow principle also typically needs to be compromised in financial statement modelling, since the interrelationships between items often mean that trying to achieve a strict top-to-bottom flow would require repetitions of many items, resulting in a larger model. For example, when calculating operating profit, the depreciation charge would need to refer to the fixed asset calculations, whereas the capital expenditure items may be related to the sales level. A pure top-to-bottom flow would therefore not be able to create the financial statements directly; rather, the statements would be created towards the end of the modelling process by referring to individual items that have already been calculated. On the other hand, a smaller model may be possible, in which the statements are determined more directly, but which would require a less clear flow.
It is important to highlight the identity, role and location of the various components of the model. Clearly, some of this can be done through structural methods: for example, in a simple model, one may hold all inputs in a single area, so that their location is clear without much other effort being required. In more complex cases, inputs may be held in several areas (global inputs and those for each module), and so on.
Formatting of cells and ranges has several important roles:
The inputs to a model may in fact have different roles:
Regarding outputs, the knowledge of which calculations represent the model's outputs (and are not simply intermediate calculations that are of no interest by themselves) is central to understanding the objectives of the model and modelling process. If the identities of the outputs are not clear, it is in fact very hard for another user to understand or use the model. For example, it would not be clear what sensitivity analysis should be conducted (if at all), or what results should be used for presentation and decision-support purposes. Generically speaking, the outputs include all of the items that are at the very end of the dependency tracing path(s), for otherwise calculations have been performed that are not required. However, outputs may also include some intermediate calculations that are not at the end of dependency paths, but are nevertheless of interest for decision-making purposes. This will be compounded if the layout is poor (especially in multi-sheet models and those with non-left-to-right flows), as the determination of the identity of items at the end of the dependency paths is a time-consuming process, and not one that is guaranteed to find all (or even the correct) the outputs. Therefore, in practice, the identity of the full set of outputs (and only the outputs) will not be clear unless specific steps are taken to highlight them.
The most basic approaches to formatting include using colour-coding and the shading of cells, using capitalised, bold, underlined or italic text, and placing borders around ranges. In principle, the inputs could be formatted according to their nature. However, this could result in a large number of colours being used, which can be visually off-putting. In practice, it is important not to use an excessive range of colours, and not to create too many conditions, each of which would have a different format: doing so would add complexity, and therefore not be best practice. Typically, optimal formatting involves using around 3–5 colour/shading combinations as well as 2–3 border combinations (thin and thick borders, double borders and dashed borders in some cases). The author often uses the following:
Some caveats about formatting are worth noting:
An improvement in formatting is often relatively quick to achieve, and can dramatically improve the transparency of some models. A familiarity with some key short-cuts can be useful in this respect, including (see also Chapter 27):
Where one wishes to format a cell (or range) based on its value or content, one can use the Home/Conditional Formatting menu, for example:
Figure 7.15 shows an example of using Conditional Formatting to highlight dates that occurred in the last seven days. Figure 7.16 shows the use of highlighting values that are less than 1%, and Figure 7.17 shows the highlighting of the top two values only.
Custom Formatting can be used to create customised (bespoke) formatting rules, for example:
The menu is accessible under Home/Number (using the Show icon to bring up the Format Cells dialog and choosing the Custom category); or using the Ctrl+1 short-cut. New formats can be created by direct typing in the Type dialog box, or selecting and modifying one of the existing formats.
The file Ch7.2.CustomFormats.xlsx contains several reference examples. Figure 7.18 shows the case in which negative number are displayed in brackets. Note that the format used for positive numbers is one in which there is a blank space at the end (a blank space before the semi-colon in the format definition), so that the display of units for numbers are aligned if a positive number is displayed above a negative one (or vice versa).
It can be helpful to document models, whilst doing so with an emphasis on value-added points, including:
Comments or other text may be created in a variety of ways, including:
One of the main challenges in using comments is to ensure that they are kept up to date. It is easy to overlook the need to update them when there is a change to the formulae or the input data, or to the implicit contextual assumptions of a model. Some techniques that can help include:
The use of hyperlinks can aid model navigation (and improve transparency), and be an alternative to named ranges. However, hyperlinks often become broken (not updated) as a model is changed. This is disconcerting to a user, who sees part of a model that is not working, and hence will have reduced confidence in its overall integrity. If hyperlinks are used, one should ensure (before sharing the model with others, as a minimum) that they are up to date (Chapter 25 demonstrates some simple examples).
52.15.117.231