16.6 Solving System Architecture Optimization Problems

Introduction

The previous sections explained how to formulate a system architecting optimization problem—that is, how to represent a complex architecture as a set of decisions using an encoding scheme (such as an array of binary variables) and write an optimization problem with the goal of finding the combinations of choices that optimize the tradeoffs between different stakeholder needs. In particular, we presented a set of six Patterns that can help the system architect turn most architectural decisions into programmed decisions. In this section, we will discuss how to solve these architecting optimization problems.

The content of this section will thus necessarily be more oriented toward tools and methods than the rest of the text. Nonetheless, our goal is not to provide an exhaustive survey of tools and methods that exist to tackle parts of these problems, but rather to present a minimum set of tools and methods and to focus on how to use them in real-world architecting problems. [26]

Full-Factorial Enumeration

We start with a very simple strategy called full-factorial enumeration, in which all possible architectures are enumerated. Each architecture is then evaluated, so that one can simply choose from the best architectures. Full-factorial enumeration is very simple and transparent, and (as we argued in Chapter 15) simulating an entire tradespace, rather than just the Pareto front, provides additional useful insight into the features of the problem.

“Small enough” architecture optimization problems can be solved by brute force—that is, by simply enumerating and evaluating all architectures. Whether a problem is small enough for full-factorial enumeration depends on the size of the tradespace, the average time it takes to evaluate one architecture on the machine(s) available, and the total computation time tolerable. The longer it takes to evaluate an architecture, the fewer architectures we can evaluate in a given amount of time. Thus there is a tradeoff between modeling breadth (the number of architectures) and modeling depth (the time it takes to evaluate an architecture).

The system architect must be careful when creating the architectural model, in particular the value functions, to ensure that it will be feasible to explore a reasonable portion of the tradespace in the allotted time. Once we have determined that a problem is small enough for full-factorial enumeration, the next step is to actually enumerate all valid architectures. Let us start with a simple example: How do we enumerate all architectures for our Apollo example from Chapter 14? Recall that in the Apollo example we had nine decisions, each with two or more options. In this case, we can simply have a set of nine nested loops, where each loop loops over all options for a decision, as shown in the pseudocode contained in Figure 16.16.

Figure 16.16  Pseudocode for full-factorial enumeration of architectures in the Apollo example.

Architectures = {}    % This array will contain the architectures
NumberOfArchs = 0     % This is a counter for the number of architectures
foreach val1 in EOR = {yes, no}
    foreach val2 in earthLaunch = {orbit, direct}
        foreach val3 in LOR = {yes, no}
            % … nested loops for decisions 4 through 8 …
            foreach val9 in lmFuel = {cryogenic, storable, N/A}
                % Add an architecture to the array and increase counter
                Architectures(NumberOfArchs++) = [val1 val2 … val9]
            end
            % … end of loops for decisions 4 through 8 …
        end
    end
end
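For readers who prefer executable code to pseudocode, the same full-factorial enumeration can be written compactly as a Cartesian product of the option sets. The following Python sketch is a minimal equivalent of Figure 16.16; for brevity it includes only four of the nine Apollo decisions, and the decision and option names are illustrative placeholders.

from itertools import product

# Illustrative subset of the nine Apollo decisions
decisions = {
    "EOR":         ["yes", "no"],
    "earthLaunch": ["orbit", "direct"],
    "LOR":         ["yes", "no"],
    "lmFuel":      ["cryogenic", "storable", "NA"],
}

# Full-factorial enumeration: one entry per combination of options
architectures = [dict(zip(decisions, combo))
                 for combo in product(*decisions.values())]

print(len(architectures))  # 2 * 2 * 2 * 3 = 24 architectures for this subset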

This approach is simple enough to implement for instances of DECISION-OPTION problems, because the discrete set of possible values for the decisions is usually small and available in a form (such as an array) that can readily be iterated upon. Using this approach for other types of decisions may be challenging and will in general require a preprocessing step in which all options for a particular decision (such as a PARTITIONING decision) are explicitly enumerated up-front and put in a data structure that contains one entry for each option. This can be an extremely large number of options.
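As an illustration of such a preprocessing step, the following Python sketch enumerates all partitions of a small set of elements, so that a PARTITIONING decision can then be treated as a single decision whose options are the precomputed partitions. This is a minimal sketch; for n elements the number of partitions is the Bell number, which grows very quickly, so this is only practical for small n.

def all_partitions(elements):
    """Enumerate all partitions of a list of elements (Bell-number many)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in all_partitions(rest):
        # Put 'first' into each existing subset in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new subset of its own
        yield [[first]] + partition

options = list(all_partitions(["radar", "lidar", "radiometer"]))
print(len(options))  # 5 partitions of a 3-element set (the Bell number B3)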

Other advanced tools for full-factorial enumeration employ programming paradigms entirely different from those familiar to many engineers, such as declarative programming. Mainstream programming languages such as C and Java are procedural as opposed to declarative: The programmer has to tell the computer how to solve a problem, step by step. Conversely, in declarative programming, the programmer has to provide the goals of the problem, but not the procedure to solve it. Declarative languages have an inference algorithm that takes care of specifying the steps to solve the problem. Rule-based systems are examples of declarative programs. The use of rule-based systems for enumeration of architectures is presented in Appendix C.

Heuristics for Architecture Optimization

We showed in the previous sections how the size of the architecture tradespace can grow to billions of architectures for relatively small numbers of decisions, which implies that full-factorial enumeration is often not feasible. In these cases, we will have to resort to algorithms that explore the architecture tradespace in a more efficient and effective way.

We mentioned earlier that most architecture optimization problems are instances of non-linear, non-smooth combinatorial optimization problems. This means that algorithms that exploit strong properties of the problem (such as gradient-based algorithms and dynamic programming) are often not an option.

For this reason, meta-heuristic optimization algorithms (such as genetic algorithms) are often used to solve these kinds of problems. Meta-heuristic optimization algorithms use very abstract and general search heuristics (mutation, crossover) that assume very little about the features of the problem at hand. As a consequence, they are very flexible and can be applied to a wide variety of problems. Heuristic optimization cannot guarantee that the global optimum is found, and it is less computationally efficient than other combinatorial optimization techniques. [27] But its strength resides in its simplicity, flexibility, and effectiveness in solving a wide variety of optimization problems, especially those with the characteristics of architecture optimization problems (categorical decision variables; multiple non-linear, non-smooth value functions and constraints). These algorithms are our tool of choice in this text for solving architecture optimization problems.

Figure 16.17 represents a generic iterative optimization algorithm. It has three main modules: an architecture generator or enumerator, an architecture evaluator, and a central search agent that takes information from the enumerator and evaluator and then decides which regions of the tradespace to explore next by providing search directions and steps in these directions.

Whereas enumeration, evaluation, and down-selection (such as by means of a Pareto filter) are clearly defined tasks in the simple full-factorial enumeration case, the boundary that separates enumeration, search, and down-selection becomes blurry in the partial-search case. Indeed, in most optimization or search algorithms, generation of the search directions is intimately coupled with both the down-selection and the enumeration of new architectures. Architectures are not enumerated up-front, but rather are generated on the fly by applying the search heuristics to the existing architectures. Hence, a population of architectures evolves over time, ideally approximating the Pareto frontier.


Figure 16.17  Typical structure of an optimization algorithm. For smaller tradespaces, the search function becomes unnecessary. The architectures are fully enumerated and evaluated, and then non-dominated architectures are chosen.

A Generic Population-Based Heuristic Optimization

We’ll begin with a class of meta-heuristic optimization algorithms called population-based algorithms. In population-based methods, a population of architectures is evolved and maintained at each iteration, instead of iteratively changing just one architecture. This has some advantages for architecture optimization problems, because the inherent goal is to find a few classes or families of good architectures, rather than finding the best architecture. The flow diagram for a population-based search algorithm is provided in Figure 16.18.

In a population-based search algorithm, the search and enumeration functions are performed by a set of selection and search operators or rules. Selection rules select a subset of the population that will be the basis for the construction of the new population of architectures in the next generation. For example, an intuitive selection rule is to choose architectures that are on the non-dominated Pareto frontier. This is known as elitism. In practice, it has been shown that introducing a few dominated architectures that bring diversity to the population can lead to higher-quality Pareto frontiers with fewer gaps. [28]

Then, a set of search rules will combine information from the architectures in the selected subset to generate a new pool of architectures. For example, in a genetic algorithm, new architectures are generated by mutating some of these architectures (that is, randomly changing the value of a particular decision) and by crossing over pairs of good architectures (combining the values of the decisions of two good architectures to create one or two new architectures).

The new population is evaluated according to a set of metrics (these are labeled evaluation rules in Figure 16.18), and the algorithm continues until a set of termination criteria are met (such as a maximum number of generations).
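The flow of Figure 16.18 can be summarized in a short skeleton. The Python sketch below is a schematic outline, not a complete implementation: the functions evaluate and select and the list of search operators are placeholders for problem-specific rules.

def population_based_search(initial_population, evaluate, select, operators,
                            max_generations=100):
    """Schematic population-based heuristic search.

    evaluate(arch) -> metrics for one architecture
    select(pop)    -> subset of (architecture, metrics) pairs kept for breeding
    operators      -> functions mapping a list of architectures to new architectures
    """
    population = [(arch, evaluate(arch)) for arch in initial_population]
    for generation in range(max_generations):
        selected = select(population)
        children = []
        for operator in operators:
            children.extend(operator([arch for arch, _ in selected]))
        population = selected + [(child, evaluate(child)) for child in children]
    return population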


Figure 16.18  Population-based search algorithm.

Generating the Initial Population

The algorithm described in the previous section requires defining an initial population of architectures—in other words, a sample of the architectural tradespace. A simple way of performing the sampling is to generate random architectures. Mathematically, this requires obtaining random numbers and transforming those numbers into values for the architectural decisions. For example, for an unconstrained DECISION-OPTION problem, one can simply take a random value from the set of alternatives for each architectural decision, and for DOWN-SELECTING problems, one can simply obtain N random Boolean numbers (0, 1).

Obtaining random architectures is more difficult if there are constraints, because one has to check that the randomly generated architecture is consistent with all constraints. In cases with few constraints, a simple workaround is to reject infeasible architectures and keep randomly generating architectures until the desired population size is reached. However, this approach might fail in highly constrained cases, because most random architectures will be infeasible. In this case, it will be necessary to resort to more advanced techniques that either guarantee that every architecture generated is feasible by construction, or apply a repair operator that can turn an infeasible architecture into the most similar feasible architecture.
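A minimal sketch of this rejection approach for a DOWN-SELECTING problem with N binary decisions is shown below; is_feasible stands for whatever constraint checker the problem defines, and the example constraint is purely illustrative.

import random

def random_feasible_population(n_decisions, pop_size, is_feasible, max_tries=100000):
    """Rejection sampling: draw random architectures until enough feasible ones are found."""
    population, tries = [], 0
    while len(population) < pop_size and tries < max_tries:
        arch = [random.randint(0, 1) for _ in range(n_decisions)]  # N random Booleans
        if is_feasible(arch):
            population.append(arch)
        tries += 1
    return population

# Illustrative constraint: select at most 4 of the 10 elements
population = random_feasible_population(10, 50, lambda arch: sum(arch) <= 4)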

Finally, note that in some situations, we may want to bias the sampling process to drive it toward regions of interest. Consider, for example, an instance of the PARTITIONING problem with 10 elements. There are more than 115,000 different architectures, but about 86% of them consist of 4, 5, or 6 subsets. This is simply because there are many more ways of partitioning a set of 10 elements into 5 subsets (more than 42,000) than into 2 subsets (511). If we suspect that architectures with 2 or 3 subsets may actually dominate the tradespace, it is beneficial to bias the sampling process to ensure that a relatively large fraction of the architectures will have 2 or 3 subsets. This can be done by first sampling the number of subsets from a biased distribution, and then generating random partitions with the desired number of subsets.
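The sketch below shows one way to implement this bias: first draw the number of subsets from a user-chosen (biased) distribution, and then build a random partition with exactly that many non-empty subsets. The weights are illustrative, and this simple generator does not sample uniformly among all partitions with a given number of subsets, which is usually acceptable for seeding an initial population.

import random

def random_partition(elements, k):
    """Random partition of 'elements' into exactly k non-empty subsets."""
    elements = list(elements)
    random.shuffle(elements)
    subsets = [[e] for e in elements[:k]]   # seed each subset so that none is empty
    for e in elements[k:]:
        random.choice(subsets).append(e)    # place the remaining elements at random
    return subsets

elements = list(range(10))
sizes = [2, 3, 4, 5, 6]
weights = [0.3, 0.3, 0.2, 0.1, 0.1]         # biased toward 2 or 3 subsets
k = random.choices(sizes, weights=weights)[0]
print(random_partition(elements, k))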

Completing an Initial Population with Deterministic Architectures

One may argue that pure randomness is not the smartest way to sample the tradespace. For example, we may want to guarantee that a few particular architectures are present in the initial population. We call these architectures “architectures of interest.” In particular, if one or more baseline architectures exist for the project at hand, it is always wise to include it (or them) in the initial population.

Furthermore, it is good practice to include architectures that are in the “extreme points” of the architecture space. For example, in PARTITIONING problems, it is good practice to make sure that the monolithic architecture and the fully distributed architecture are both part of the initial population. Similarly, the channelized and fully cross-strapped architectures should be present in the initial population of an ASSIGNING problem.

In some cases, architectures in the extreme points can be obtained by using deterministic sampling schemes from the theory of Design of Experiments. Latin hypercubes and orthogonal arrays are examples of these sampling schemes. [29] It is not our intention to cover these in detail, but the main idea is that if we represent the selected architectures as vectors in the N-dimensional architecture space, we select a subset of architectures (vectors) such that each combination of values for any pair of decisions is tested exactly once. For example, for 3 binary decisions, instead of enumerating the 8 possible architectures, we note that the following 4 architectures form an orthogonal array: [0, 0, 0], [1, 1, 0], [0, 1, 1], and [1, 0, 1]. If we look at the first and second decisions, each of the 4 possible value pairs [0, 0], [0, 1], [1, 0], and [1, 1] appears exactly once. The same is true for the first and third decisions, and for the second and third decisions. From a practical standpoint, for a given number of decisions N and number of options m, there exist tables that have pre-computed the optimal orthogonal vectors for different sample sizes. There is a naming convention for such tables: the table L_P(m^N) contains P orthogonal vectors of length N whose elements take one of m values, where P is much smaller than m^N. Thus, for instance, for 4 decisions and 3 options per decision, we can readily find which 9 of the 81 possible architectures to select by looking up the table called L9(3^4).

In summary, good initial populations come from combining architectures from these three sources: architectures of interest, architectures in the extremes of the architectural region (such as those obtained through Design of Experiments), and random architectures sampled with care (taking into account how interesting certain regions of the architecture space are a priori).
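The pairwise-balance property described above is easy to verify programmatically. The short Python check below confirms that the four architectures [0, 0, 0], [1, 1, 0], [0, 1, 1], and [1, 0, 1] cover every value pair of every pair of decisions exactly once; it is a verification sketch, not a general orthogonal-array generator.

from itertools import combinations
from collections import Counter

orthogonal_array = [[0, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]]

for i, j in combinations(range(3), 2):      # every pair of decisions
    pairs = Counter((row[i], row[j]) for row in orthogonal_array)
    # each of the 4 possible value pairs appears exactly once
    assert len(pairs) == 4 and all(count == 1 for count in pairs.values())
    print(f"decisions {i} and {j}: balanced")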

General Heuristics and Meta-Heuristics for Efficient Search

Any optimization algorithm needs to accomplish two basic functions: exploration and exploitation. Exploration consists of finding new interesting regions of the architectural space. Exploitation consists of hill-climbing—that is, trying to find the best local architecture in a certain region of interest. Both are needed to obtain good architectures. Exploitation without exploration leads to getting “stuck” in local optima, perhaps missing a better architecture in a different region; exploration without exploitation leads to locally suboptimal architectures.

Population-based optimization algorithms abound in the literature. They include numerous variants of genetic algorithms, particle swarm optimization, ant colony optimization, the bees algorithm, and harmony search, among others. [30] Generally speaking, all of them have some way of accomplishing the goals of exploration and exploitation.

Hence, we focus in this chapter on two high-level strategies, or meta-heuristics, that are used or adapted in many of these algorithms and that are known to work well for wide ranges of problems: genetic algorithms [31] and local search.

Heuristics in Genetic Algorithms

Genetic algorithms were initially proposed by Holland around 1975. [32] The basic idea is to imitate evolution through the exchange of genetic material between parent chromosomes, and this is accomplished by using two heuristics: crossover and mutation.

The flow of the algorithm is shown in Figure 16.18. An initial population of architectures is evaluated. Then a subset of architectures is selected on the basis of their fitness (the value they deliver to stakeholders) in the single-dimensional case, or based on Pareto ranking in the multi-objective case (architectures that are close to the Pareto front are more likely to be picked). The next population is generated by applying the two heuristics, crossover and mutation, to the selected architectures. The algorithm iterates until the termination criteria are satisfied.

In crossover, the selected architectures are grouped in pairs, and from each pair of parent architectures, two new children architectures are generated that have part of the genetic material (that is, the decisions) from one of their parents (the “father”) and the rest from the other parent (the “mother”).

There are several strategies for deciding what part of the genetic material to take from each parent. The simplest strategy, which is called single-point crossover, is illustrated in Figure 16.19. In single-point crossover, a point in the chromosome (the string of decisions) is chosen randomly. The first child takes all the decisions up to that point from the father and the rest from the mother, whereas the second child takes all the decisions up to that point from the mother and the rest from the father.
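A minimal Python sketch of single-point crossover is shown below; the parents are assumed to be lists of decision values of equal length.

import random

def single_point_crossover(father, mother):
    """Create two children by swapping the decision tails of two parent architectures."""
    point = random.randint(1, len(father) - 1)   # crossover point, never at the ends
    child1 = father[:point] + mother[point:]
    child2 = mother[:point] + father[point:]
    return child1, child2

child1, child2 = single_point_crossover([0, 1, 1, 0, 1], [1, 0, 0, 1, 0])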


Figure 16.19  Single-point crossover.

Variations of this strategy use multiple random crossover points (for example, two-point crossover) or simply decide randomly whether each decision (gene) is taken from the father or from the mother (uniform crossover).

Holland showed in his foundational book that the effectiveness of genetic algorithms relies on adequate sampling of fragments of chromosomes called schemata. Let us start with a binary chromosome of length N or, equivalently, a simple DOWN-SELECTING problem with N binary decisions. In this case, a schema is any string of length N formed by elements from the set {0, 1, x}, where x indicates “don’t care” (that is, 0 or 1). Conceptually, schemata represent groups of architectures that share common features. For example, [1, x, x, …, x] is the schema representing all 2^(N−1) architectures that have the first binary decision set to 1. In the NEOSS example, this could be all architectures that carry the first instrument (such as the radiometer). Similarly, the schema [x, 1, x, x, 0, x, …, x] represents all architectures that carry the second instrument and do not carry the fifth instrument.

Essentially, genetic algorithms use individuals (architectures) as a means to sample a space of schemata. Indeed, each architecture can be seen as an instance of 2^N schemata. If this is hard to see, consider a trivial architecture given by [1, 1, 1, …, 1]. Such an architecture is an instance of the schema [1, x, x, …, x], but also of [x, 1, x, …, x], of [1, 1, x, …, x], and so forth. In total, there are 2^N such schemata.

Good schemata are those for which the fitness is better than the average fitness of the population. The fitness of a schema is given by the average fitness of the architectures that are instances of that schema. Because selection of architectures for crossover is done on the basis of fitness, good schemata are more likely to be selected than poor schemata. Moreover, note that crossover generates new architectures that have many schemata in common with their parents, plus a few new ones. This argument qualitatively explains why genetic algorithms converge to good architectures: Good schemata are more likely to be selected for crossover, which preserves some of them and creates some new ones.
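To make the schema argument concrete, the sketch below checks whether an architecture is an instance of a schema and estimates the fitness of a schema as the average fitness of the matching architectures in the current population; the symbol x (written as the string "x") encodes "don't care", and the population and fitness values are illustrative.

def matches(architecture, schema):
    """True if the architecture is an instance of the schema ("x" = don't care)."""
    return all(s == "x" or a == s for a, s in zip(architecture, schema))

def schema_fitness(schema, population):
    """Average fitness of the architectures in the population that match the schema."""
    fits = [fitness for arch, fitness in population if matches(arch, schema)]
    return sum(fits) / len(fits) if fits else None

# population = list of (architecture, fitness) pairs
population = [([1, 0, 1, 1], 0.8), ([1, 1, 0, 0], 0.6), ([0, 0, 1, 1], 0.3)]
print(schema_fitness([1, "x", "x", "x"], population))  # average of 0.8 and 0.6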

In addition to crossover, genetic algorithms also have a mutation heuristic, which applies a small random change to a small fraction of the architectures. The goal of mutation is to decrease the likelihood that the algorithm will get stuck in a suboptimal region. In other words, mutation is enhancing the exploration function of optimization.
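For binary encodings, mutation is commonly implemented by flipping each decision independently with a small probability. The sketch below assumes such a bit-flip operator; the mutation rate is an illustrative value, not a recommendation.

import random

def mutate(architecture, rate=0.05):
    """Flip each binary decision independently with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in architecture]

mutated = mutate([0, 1, 1, 0, 1, 0, 0, 1])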

Current implementations of genetic algorithms include many other, more sophisticated heuristics (such as sub-populations and migration operators) and some mechanism to ensure that the Pareto front is evenly populated. The interested reader can find thorough descriptions of these mechanisms in specialized books. [33]

Augmenting the Genetic Algorithm with More Heuristics

The initial version of the genetic algorithm formulated by Holland was based largely on crossover and mutation. However, one can adapt the population-based heuristic algorithm presented earlier by adding more heuristics to it. These heuristics can be domain-independent, as are crossover and mutation, or they can use domain-specific knowledge.

An example of another domain-independent heuristic is Tabu Search, which simply consists of maintaining a list of “bad” schemata. The fundamental idea is that this list can then be used to avoid or correct “bad moves” by the other operators. [34]
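One simple way to realize this idea, sketched below under the assumption that the “bad” schemata are stored as lists containing the wildcard "x", is to discard (or send back for repair) any newly generated architecture that matches a schema on the tabu list.

def matches(architecture, schema):
    """True if the architecture is an instance of the schema ("x" = don't care)."""
    return all(s == "x" or a == s for a, s in zip(architecture, schema))

def filter_tabu(candidates, tabu_schemata):
    """Drop newly generated architectures that fall into any known 'bad' schema."""
    return [arch for arch in candidates
            if not any(matches(arch, schema) for schema in tabu_schemata)]

tabu = [[1, 1, "x", "x"]]   # illustrative: setting the first two decisions to 1 is known to be bad
survivors = filter_tabu([[1, 1, 0, 0], [0, 1, 1, 0]], tabu)   # keeps only [0, 1, 1, 0]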

Domain-specific heuristics can also be used. In the NEOSS instrument-packaging problem, the elements to be partitioned are more than just abstract elements; they are remote sensing instruments, such as radars, lidars, and radiometers. In real life, the system architect has a lot of expert knowledge about these instruments that may facilitate the search process. For example, the system architects may know that the radar altimeter and the radiometer are very synergistic instruments, so it may be obvious to them that these instruments should be placed together on the same satellite. It might take substantial computational time for the algorithm to find out that putting those instruments together is good, because the algorithm is relying to a certain degree on randomness.
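As a hedged illustration of how such expert knowledge could be encoded as a search operator, the sketch below assumes that a partition is represented as a list of subsets (lists of instrument names) and, with some probability, moves the radar altimeter into the subset that contains the radiometer. The operator and its parameters are only an example of a domain-specific heuristic, not a prescription.

import random

def apply_synergy(partition, instrument_a="radar altimeter", instrument_b="radiometer",
                  probability=0.5):
    """Domain-specific operator: sometimes co-locate two synergistic instruments.

    Assumes both instruments appear somewhere in the partition.
    """
    if random.random() > probability:
        return partition
    partition = [list(subset) for subset in partition]   # copy before editing
    source = next(s for s in partition if instrument_a in s)
    target = next(s for s in partition if instrument_b in s)
    if source is not target:
        source.remove(instrument_a)
        target.append(instrument_a)
    return [s for s in partition if s]                   # drop any emptied subset

new_partition = apply_synergy([["radar altimeter", "lidar"], ["radiometer"]])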

In summary, our choice of heuristics for the population-based optimization algorithm can start with crossover and mutation from genetic algorithms, which are two good and simple heuristics. Other domain-independent heuristics (such as Tabu Search) and domain-specific heuristics that capture expert knowledge about the problem at hand can then be added on top of crossover and mutation.

Hence, by diversifying our portfolio of heuristics, we can hope to obtain good performance for a wider range of system architecture optimization problems. Note, however, that this diversification, like the diversification of a financial portfolio, may be subject to the effect of dilution. In other words, the effect of good heuristics is partially compensated by the effect of bad heuristics. [35] This is why it is important to develop another layer of complexity in the tool that monitors the performance of different heuristics. For example, we could imagine that, instead of dividing the population evenly among heuristics, we could assign more architectures to heuristics that have had better performance so far. The interested reader can survey the literature on hyper-heuristics and machine learning. [36]
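A very simple version of this monitoring layer is sketched below: each heuristic keeps a running success score (for example, how many of its children reached the non-dominated set so far), and the number of new architectures it is allowed to generate in the next generation is proportional to that score. The scoring rule and the numbers are illustrative assumptions.

def allocate_offspring(scores, total_offspring):
    """Split the offspring budget among heuristics in proportion to their past success."""
    total_score = sum(scores.values()) or 1.0
    return {name: round(total_offspring * score / total_score)
            for name, score in scores.items()}

# Illustrative scores: crossover has produced more non-dominated children than mutation
budget = allocate_offspring({"crossover": 30, "mutation": 10, "synergy": 20}, 60)
# -> {"crossover": 30, "mutation": 10, "synergy": 20}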
