CHAPTER 29 Machine Vision: Art or Science?

29.1 Introduction

We have examined how images may be processed to remove noise, how features may be detected, how objects may be located from their features, how to set up lighting schemes, how to design hardware systems for automated visual inspection, and so on. The subject is one that has developed over a period of more than 40 years, and it has clearly come a long way, but it has developed piecemeal rather than systematically. Often, development is motivated by the particular interests of small groups of workers and is relatively ad hoc. Coupled with this is the fact that algorithms, processes, and techniques are all limited by the creativity of the various researchers. The process of design tends to be intuitive rather than systematic, and so again some arbitrariness tends to creep in from time to time. As a result, no definitive means has yet been devised for achieving particular aims; instead, a number of imperfect methods are available, and there is only a limited scientific basis for choosing between them.

How then can the subject be placed on a firmer foundation? Time may help, but time can also make things more difficult as more methods and results arise which have to be considered. In any case, there is no shortcut to intellectual analysis of the state of the art. This book has attempted to carry out a degree of analysis at every stage, but in this last chapter it is important to tie it all together, to make some general statements on methodology, and to indicate future directions.

Machine vision is an engineering discipline, and, like all such disciplines, it has to be based on science and an understanding of fundamental processes. However, as an engineering discipline it should involve design based on specifications. Once the specifications for a vision system have been laid down, it can be seen how they match up against the constraints provided by nature and technology. In the following we consider first the parameters of relevance for the specification of vision systems; then we examine constraints and their origins. This leads us to some clues as to how the subject could be developed further.

29.2 Parameters of Importance in Machine Vision

The first requirement of any engineering design is that it should work! This applies as much to vision systems as to other parts of engineering. There is no use in devising edge detectors that do not find edges, corner detectors that do not find corners, thinning algorithms that do not thin, 3-D object detection schemes that do not find objects, and so on. But in what way could such schemes fail? First, ignore the possibility of noise or artifacts preventing algorithms from operating properly. Then what remains is the possibility that at any stage important fundamental factors have not been taken into account.

For example, a boundary tracking algorithm can go wrong because it encounters a part of the boundary that is one pixel wide and crosses over instead of continuing. A thinning algorithm can go wrong because every possible local pattern has not been taken into account in the design, and hence it disconnects a skeleton. A 3-D object detection scheme can go wrong because proper checks have not been made to confirm that a set of observed features is not coplanar. Of course, these types of problems may arise very rarely (i.e., only with highly specific types of input data), which is why the design error is not noticed for a time. Often, mathematics or enumeration of possibilities can help to eliminate such errors, so problems can be removed systematically. However, being absolutely sure no error has been made is difficult—and it must not be forgotten that transcription errors in computer programs can contribute to the problems. These factors mean that algorithms should be put to extensive tests with large datasets in order to ensure that they are correct. In the author’s experience, there is no substitute for subjecting algorithms to variegated tests of this type to check out ideas that are “evidently” correct. Although all this is obvious, it is still worth stating, since silly errors continually arise in practice.

At this stage imagine that we have a range of algorithms that all achieve the same results on ideal data and that they really work. The next problem is to compare them critically and, in particular, to find how they react to real data and the nasty realities such as noise that accompany it. These nasty realities may be summed up as follows:

1. noise

2. background clutter

3. occlusions

4. object defects and breakages

5. optical and perspective distortions

6. nonuniform lighting and its consequences

7. effects of stray light, shadows, and glints

In general, algorithms need to be sufficiently robust to overcome these problems. However, things are not so simple in practice. For example, the HT and many other algorithms are capable of operating properly and detecting objects or features despite considerable degrees of occlusion. But how much occlusion is permissible? Or how much distortion and noise, or how much of any other nasty reality can be tolerated? In each specific case we could state some figures that would cover the possibilities. For example, we may be able to state that a line detection algorithm must be able to tolerate 50% occlusion, and so a particular HT implementation is (or is not) able to achieve this. However, at this stage we end up with a lot of numbers that may mean very little on their own; in particular, they seem different and incompatible. This latter problem can largely be eliminated. Each of the defects can be imagined to obliterate a definite proportion of the object. (In the case of impulse noise this is obvious; with Gaussian noise the equivalence is not so clear, but we suppose here that an equivalence can at least in principle be computed.) Hence, we end up by establishing that artifacts in a particular dataset eliminate a certain proportion of the area and perimeter of all objects, or a certain proportion of all small objects.1 This is a sufficiently clear statement to proceed with the next stage of analysis.

To go further, it is necessary to set up a complete specification for the design of a particular vision algorithm. The specification can be listed as follows (but generality is maintained by not stating any particular algorithmic function):

1. The algorithm must work on ideal data.

2. The algorithm must work on data that is x% corrupted by artifacts.

3. The algorithm must work to p pixels accuracy.

4. The algorithm must operate within s seconds.

5. The algorithm must be trainable.

6. The algorithm must be implemented with a failure rate of less than 1 per d days.

7. The hardware needed to implement the algorithm must cost less than $L.

No doubt other specifications will arise in practice, but these are ignored here.

Before proceeding, note the following points: (a) specification 2 is a statement about the robustness of algorithms, as explained earlier; (b) the accuracy of p pixels in specification 3 may well be fractional; (c) in specification 5, trainability is a characteristic of certain types of algorithms, and in any instance we can only accept the fact that it is or is not achievable—hence, it is not discussed further here; (d) the failure rate referred to in specification 6 will be taken to be a hardware characteristic and is ignored in what follows.
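As a concrete illustration, such a specification can be recorded and checked mechanically. The following is a minimal sketch in Python, assuming invented figures and an invented `measured` record for a candidate implementation; neither the field names nor the numbers come from the text, and items 5 and 6 are omitted for the reasons given in notes (c) and (d).

```python
from dataclasses import dataclass

@dataclass
class VisionSpec:
    """Hypothetical record of the numbered specification items above
    (items 5 and 6 are omitted, as noted in the text)."""
    corruption_pct: float     # item 2: artifact level x% that must be tolerated
    accuracy_pixels: float    # item 3: required accuracy, p pixels
    max_time_s: float         # item 4: permitted execution time, s seconds
    max_cost_dollars: float   # item 7: hardware budget, $L

def meets_spec(measured: dict, spec: VisionSpec) -> bool:
    """Check a candidate implementation against the hard specifications.
    `measured` holds figures obtained by testing on real datasets."""
    return (measured["corruption_tolerated_pct"] >= spec.corruption_pct
            and measured["accuracy_pixels"] <= spec.accuracy_pixels
            and measured["time_s"] <= spec.max_time_s
            and measured["cost_dollars"] <= spec.max_cost_dollars)

# Illustrative (invented) figures only:
spec = VisionSpec(corruption_pct=50, accuracy_pixels=0.5,
                  max_time_s=0.04, max_cost_dollars=5000)
candidate = {"corruption_tolerated_pct": 60, "accuracy_pixels": 0.3,
             "time_s": 0.05, "cost_dollars": 3000}
print(meets_spec(candidate, spec))   # False: too slow to satisfy item 4
```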

The foregoing set of specifications may at any stage of technological (especially hardware) development be unachievable. This is because they are phrased in absolute terms, so they are not compromisable. However, if a given specification is approaching its limit of achievability, a switch to an alternative algorithm might be possible.2 Alternatively, an internal parameter might be adjusted, which keeps that specification within range while pushing another specification closer to the limits of its range. In general, there will be some hard (nonnegotiable) specifications and others for which a degree of compromise is acceptable. As has been seen in various chapters of the book, this leads to the possibility of tradeoffs—a topic that is reviewed in the next section.

29.3 Tradeoffs

Tradeoffs form one of the most important features of algorithms, because they permit a degree of flexibility subject only to what is possible in the nature of things. Ideally, the tradeoffs that are enunciated by theory provide absolute statements about what is possible, so that if an algorithm approaches these limits it is then probably as “good” as it can possibly be.

Next, there is the problem of where on a tradeoff curve an algorithm should be made to operate. This type of situation was examined carefully in Chapter 28 in a particular context—that of cost–speed tradeoffs of inspection hardware. Generally, the tradeoff curve (or surface) is bounded by hard limits. However, once it has been established that the optimum working point lies somewhere within these limits, on a continuum, it is appropriate to select a criterion function whereby an optimum can be located uniquely. Details will vary from case to case, but the crucial point is that an optimum must exist on a tradeoff curve, and that it can be found systematically once the curve is known. All this implies that the science of the situation has been studied sufficiently so that relevant tradeoffs have been determined.
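To make the idea of locating an optimum on a tradeoff curve concrete, the sketch below assumes a hypothetical cost–speed tradeoff and a simple weighted criterion function; the functional forms and constants are invented for illustration and do not come from Chapter 28.

```python
import numpy as np

# Hypothetical cost-speed tradeoff: faster processing (shorter cycle time t)
# demands more expensive hardware.  The form cost = a + b/t is illustrative only.
def hardware_cost(t, a=1000.0, b=50.0):
    return a + b / t

# Simple criterion function penalizing both cost and slowness; the weights
# are arbitrary and would in practice be set by the application.
def criterion(t, cost_weight=1.0, time_weight=1e5):
    return cost_weight * hardware_cost(t) + time_weight * t

# Search over an interval of permissible cycle times (the hard limits
# bounding the tradeoff curve) for the minimum of the criterion function.
t_values = np.linspace(0.001, 0.1, 1000)
best_t = t_values[np.argmin(criterion(t_values))]
print(f"optimum working point: t = {best_t:.4f} s, "
      f"cost = ${hardware_cost(best_t):.0f}")
```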

29.3.1 Some Important Tradeoffs

Earlier chapters of this book have revealed some quite important tradeoffs that are more than just arbitrary relations between relevant parameters. Here, a few examples will have to suffice by way of summary.

First, in Chapter 5 the DG edge operators were found to have only one underlying design parameter—that of operator radius r. Ignoring here the important matter of the effect of a discrete lattice in giving preferred values of r, it was found that:

1. signal-to-noise ratio varies linearly with r, because of underlying signal and noise averaging effects.

2. resolution varies inversely with r, since relevant linear features in the image are averaged over the active area of the neighborhood: the scale at which edge positions are measured is given by the resolution.

3. the accuracy with which edge position (at the current scale) may be measured depends on the square root of the number of pixels in the neighborhood, and hence varies as r.

4. computational load, and associated hardware cost, are proportional to the number of pixels in the neighborhood, and hence vary as r².

Thus, operator radius carries with it four properties that are intimately related—signal-to-noise ratio, resolution (or scale), accuracy, and hardware/computational cost.
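The scaling relations in the list above can be tabulated directly. The sketch below simply evaluates the stated proportionalities for a few operator radii; the unit constants of proportionality are arbitrary, and the preferred discrete values of r are ignored, as in the text.

```python
# Proportionalities from the list above (arbitrary unit constants):
#   SNR ~ r, resolution ~ 1/r, accuracy ~ r, computational load ~ r^2
for r in (1, 2, 4, 8):
    snr        = r          # signal-to-noise ratio grows with averaging over the area
    resolution = 1.0 / r    # finer detail is lost as the neighborhood grows
    accuracy   = r          # position error falls as sqrt(pixel count), i.e. as r
    load       = r * r      # pixels processed per neighborhood
    print(f"r={r}:  SNR~{snr}  resolution~{resolution:.3f}  "
          f"accuracy~{accuracy}  load~{load}")
```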

Another important problem was that of fast location of circle centers (Chapter 10): in this case, robustness was seen to be measurable as the amount of noise or signal distortion that can be tolerated. For HT-based schemes, noise, occlusions, distortions, and the like all reduce the peak height in parameter space, thereby reducing the signal-to-noise ratio and impairing accuracy. Furthermore, if a fraction β of the original signal is removed, leaving a fraction γ = 1 − β, either by such distortions or occlusions or else by deliberate sampling procedures, then the number of independent measurements of the center location drops to a fraction γ of the optimum. This means that the accuracy of estimation of the center location, which improves as the square root of the number of measurements, drops to a fraction around √γ of the optimum. (Note, however, that Gaussian noise can act in another way to reduce accuracy—by broadening the peak instead of merely reducing its height—but this effect is ignored here.)

What is important is that the effect of sampling is substantially the same as that of signal distortion, so the more distortion that must be tolerated, the higher the sampled fraction α of the total signal has to be. This means that as the level of distortion increases, the capability for withstanding sampling decreases, and therefore the gains in speed achievable from sampling are reduced—that is, for fixed signal-to-noise ratio and accuracy, a definite robustness-speed tradeoff exists. Alternatively, the situation can be viewed as a three-way relation between accuracy, robustness, and speed of processing. This provides an interesting insight into how the edge operator tradeoff considered earlier might be generalized.
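The three-way relation can be made explicit with a little arithmetic. In the sketch below, the surviving signal fraction γ and the sampling fraction α are assumed to act independently, so the number of independent center estimates scales as αγ and the attainable accuracy as √(αγ), while the speed gain from sampling is taken as 1/α. These assumptions follow the rough model in the text and are illustrative rather than exact.

```python
def ht_tradeoff(gamma, required_accuracy):
    """Given a surviving signal fraction gamma (= 1 - beta) and a required
    accuracy relative to the undistorted, unsampled optimum, return the
    smallest usable sampling fraction alpha and the resulting speed gain.

    Rough model, as in the text: accuracy ~ sqrt(alpha * gamma)."""
    alpha = min(1.0, required_accuracy ** 2 / gamma)
    speed_gain = 1.0 / alpha
    return alpha, speed_gain

# As distortion increases (gamma falls), the permissible degree of sampling
# falls and the speed gain shrinks -- the robustness-speed tradeoff.
for gamma in (1.0, 0.8, 0.5, 0.3):
    alpha, gain = ht_tradeoff(gamma, required_accuracy=0.5)
    print(f"gamma={gamma:.1f}:  alpha_min={alpha:.2f}  speed gain={gain:.1f}x")
```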

To underline the value of studying such tradeoffs, note that any given algorithm will have a particular set of adjustable parameters which are found to control—and hence lead to tradeoffs between—the important quantities such as speed of processing, signal-to-noise ratio, and attainable accuracy already mentioned. Ultimately, such practically realizable tradeoffs (i.e., arising from the given algorithm) should be considered against those that may be deduced on purely theoretical grounds. Such considerations would then indicate whether a better algorithm might exist than the one currently being examined.

29.3.2 Tradeoffs for Two-stage Template Matching

Two-stage template matching has been mentioned a number of times in this book as a means whereby the normally slow and computationally intensive process of template matching may be speeded up. In general, it involves looking for easily distinguishable subfeatures, so that locating the features that are ultimately being sought involves only the minor problem of eliminating false alarms. This strategy is useful because the first stage eliminates the bulk of the raw image data, so that only a relatively trivial testing process remains. This latter process can then be made as rigorous as necessary. In contrast, the first “skimming” stage can be relatively crude (see, e.g., the lateral histogram approach of Chapter 13), the main criterion being that it must not eliminate any of the desired features. False positives are permitted but not false negatives. Needless to say, the efficiency of the overall two-stage process is limited by the number of false alarms thrown up by the first stage.

Suppose that the first stage is subject to a threshold h1 and the second stage to a threshold h2. If h1 is set very low, then the process reverts to the normal template matching situation, since the first stage does not eliminate any part of the image. Setting h1 = 0 initially is useful so that h2 may be adjusted to its normal working value. Then h1 can be increased to improve efficiency (reduce overall computation); a natural limit arises when false negatives start to occur—that is, some of the desired features are not being located. Further increases in h1 now have the effect of cutting down available signal, although speed continues to increase. This gives a tradeoff between signal-to-noise ratio (and hence accuracy of location) and speed.
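A minimal sketch of the two-stage strategy is given below. The subfeature test, the full template test, and the toy image are all placeholders invented for illustration; only the overall structure—a cheap first stage thresholded at h1 that must produce no false negatives, followed by a rigorous second stage thresholded at h2—follows the text.

```python
import numpy as np

def two_stage_match(image, subfeature_score, full_score, h1, h2):
    """Generic two-stage template matching skeleton (see text).

    Stage 1: a cheap subfeature test, threshold h1 -- false positives are
             acceptable, false negatives are not.
    Stage 2: the full, rigorous template test, threshold h2, applied only
             to the survivors of stage 1."""
    candidates = [(x, y)
                  for y in range(image.shape[0])
                  for x in range(image.shape[1])
                  if subfeature_score(image, x, y) > h1]
    return [(x, y) for (x, y) in candidates if full_score(image, x, y) > h2]

# Toy demonstration with invented scoring functions: the "object" is a small
# bright blob; stage 1 looks at the pixel alone, stage 2 at a 3x3 mean.
img = np.zeros((32, 32))
img[10:13, 20:23] = 1.0
s1 = lambda im, x, y: im[y, x]
s2 = lambda im, x, y: im[max(0, y - 1):y + 2, max(0, x - 1):x + 2].mean()
print(two_stage_match(img, s1, s2, h1=0.5, h2=0.8))   # -> [(21, 11)]
```

With h1 set to zero every pixel reaches the second stage, reproducing ordinary template matching, exactly as noted above.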

In a particular application in which objects were being located by the HT, the number of edge points located was reduced as h1 increased, so the accuracy of object location was reduced (Davies, 1988i). A criterion function approach was then used to determine an optimum working condition. A suitable criterion function turned out to be C = T/A, where T is the total execution time and A the achievable accuracy. Although this approach gave a useful optimum, the optimum can be improved further if a mix of normal two-stage template matching and random sampling is used. This turns the problem into a 2-D optimization problem with adjustable parameters h1 and u (the random sampling coefficient, equal to 1/α). However, in reality these types of problems are even more complex than indicated so far. In general, this is a 3-D optimization problem, the relevant parameters being h1, h2, and u, although in fact a good approximation to the global optimum may be obtained by the procedure of adjusting h2 first, and then optimizing h1 and u together—or even of adjusting h2 first, then h1, and then u (Davies, 1988i). Further details are beyond the scope of the present discussion.
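The 2-D optimization just described can be sketched as a simple grid search. The functional forms below for the execution time T and accuracy A as functions of h1 and u are invented purely to show the structure of the C = T/A criterion; in practice both quantities would be measured experimentally, as in Davies (1988i).

```python
import numpy as np

# Toy models (invented for illustration -- in practice T and A are measured):
# raising h1 or the sampling coefficient u cuts the work per image, but it
# also discards edge points and so lowers the attainable accuracy.
def exec_time(h1, u):
    return 0.5 + (1.0 + 50.0 * np.exp(-h1)) / u     # fixed overhead + per-point work

def accuracy(h1, u):
    return np.sqrt(np.exp(-(h1 / 4.0) ** 2) / u)    # ~ sqrt(edge points retained)

def criterion(h1, u):
    return exec_time(h1, u) / accuracy(h1, u)       # C = T / A: smaller is better

h1_vals = np.linspace(0.0, 8.0, 81)
u_vals = np.arange(1, 31)                           # u = 1/alpha, integer sampling step
H, U = np.meshgrid(h1_vals, u_vals, indexing="ij")
C = criterion(H, U)
i, j = np.unravel_index(np.argmin(C), C.shape)
print(f"toy optimum: h1 = {h1_vals[i]:.1f}, u = {u_vals[j]}, C = {C[i, j]:.2f}")
```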

29.4 Future Directions

It has been indicated once or twice that the constraints and tradeoffs limiting algorithms are sometimes not accidental but rather the result of underlying technological or natural constraints. If so, it is important to determine this in as many cases as possible; otherwise, workers may spend much time on algorithm development only to find their efforts repeatedly being thwarted. Usually, this is more easily said than done, but it underlines the necessity for scientific analysis of fundamentals. At this point we should consider technological constraints in order to see what possibilities exist for advancing machine vision dramatically in the future.

The well-known law due to Moore (Noyce, 1977) relating to computer hardware states that the number of components that can be incorporated into a single integrated circuit increases by a factor of about two per year. Certainly, this was so for the 20 years following 1959, although the rate subsequently decreased somewhat (not enough, however, to prevent the growth from remaining approximately exponential). It is not the purpose of this chapter to speculate on the accuracy of Moore’s law. However, it is useful to suppose that computer memory and power will grow by a factor approaching two per year in the foreseeable future. Similarly, computer speeds may also grow at roughly this rate. What then of vision?

Unfortunately, many vision processes such as search are inherently NP-complete and hence demand computation that grows exponentially with some internal parameter such as the number of nodes in a match graph. This means that the advance of technology is able to give only a roughly linear improvement in this internal parameter (e.g., something like one extra node in a match graph every two years). It is therefore not solving the major search and other problems but only easing them.
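The arithmetic behind this remark is easily checked. Suppose, purely for illustration, that a matching task costs on the order of 3^n operations for n nodes in the match graph and that available computation doubles every two years; then the affordable n grows by only log base 3 of 2, about 0.63 of a node, per doubling. The figures below are assumptions, not values from the text.

```python
import math

# Illustrative figures only: cost ~ base**n, hardware doubling every two years.
base = 3.0                    # assumed cost growth factor per match-graph node
doublings_per_year = 0.5      # i.e. one doubling every two years
extra_nodes_per_year = doublings_per_year * math.log(2) / math.log(base)
print(f"extra affordable nodes per year: {extra_nodes_per_year:.2f}")
# ~0.32 extra nodes/year: exponential hardware growth buys only linear progress.
```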

NP-completeness apart, Chapter 28 was able to give an optimistic view that the relentless advance of computer power described by Moore’s law is now leading to an era when conventional PCs will be able to cope with a fair proportion of vision tasks. When combined with specially designed algorithms (see Section 23.4), it should prove possible to implement many of the simpler tasks in this way, leading to a much less strenuous life for the vision systems designer.

29.5 Hardware, Algorithms, and Processes

The previous section raised the hope that improvements in hardware systems will provide the key to the development of impressive vision capabilities. However, it seems likely that breakthroughs in vision algorithms will also be required before this can come about. My belief is that until robots can play with objects and materials in the way that tiny children do, they will not be able to build up sufficient information and the necessary databases for handling the complexities of real vision. The real world is too complex for all the rules to be written down overtly; these rules have to be internalized by training each brain individually. In some ways this approach is better, since it is more flexible and adaptable and at the same time more likely to be able to correct for the errors that would arise in direct transference of huge databases or programs. Nor should it be forgotten that it is the underlying processes of vision and intelligence that are important. Hardware merely provides a means of implementation. If an idea is devised for a hardware solution to a visual problem, it reflects an underlying algorithmic process that either is or is not effective. Once it is known to be effective, then the hardware implementation can be analyzed to confirm its utility. Of course, we must not segregate algorithms too much from hardware design. In the end, it is necessary to optimize the whole system, which means considering both together. However, the underlying processes should be considered first, before a hardware solution is frozen in. Hardware should not be the driving force, since there is a danger that some type of hardware implementation (especially one that is temporarily new and promising) will take over and make workers blind to underlying processes. Hardware should not be the tail that wags the vision dog.

29.6 A Retrospective View

This book has progressed steadily from low-level ideas through intermediate-level methods to high-level processing, covering 3-D image analysis, the necessary technology, and so on—admittedly with its own type of detailed examples and emphasis. Many ideas have been covered, and many strategies described. But where have we got to, and to what extent have we solved the problems of vision referred to in Chapter 1?

Among the worst of all the problems of vision is that of minimizing the amount of processing required to achieve particular image recognition and measurement tasks. Not only do images contain huge amounts of data, but often they need to be interpreted in frighteningly small amounts of time, and the underlying search and other tasks tend to be subject to combinatorial explosions. Yet, in retrospect, we seem to have remarkably few general tools for coping with these problems. Indeed, the truly general tools available3 appear to be the following:

1. reducing high-dimensional problems to lower-dimensional problems that can be solved in turn.

2. the Hough transform and other indexing techniques.

3. location of features that are in some sense sparse, and that can hence help to reduce redundancy quickly (obvious examples of such features are edges and corners).

4. two-stage and multistage template matching.

5. random sampling.

These are said to be general tools because they appear in one guise or another in a number of situations, with totally different data. However, it is pertinent to ask to what extent these are genuine tools rather than almost accidental means (or tricks) by which computation may be reduced. Further analysis yields interesting answers to this question, as we will now see.

First, consider the Hough transform, which takes a variety of forms—the normal parametrization of a line in an abstract parameter space; the GHT, which is parametrized in a space congruent to image space; the adaptive thresholding transform (Chapter 4), which is parametrized in an abstract 2-D parameter space; and so on. What is common about these forms is the choice of a representation in which the data peak naturally at various points, so that analysis can proceed with improved efficiency. The relation with item (3) above now becomes clear, making it less likely that either of these procedures is purely accidental in nature.

Next, item (1) appears in many guises—see, for example, the lateral histogram approach (Chapter 13) and the approaches used to locate ellipses (Chapter 12). Thus, item (1) has much in common with item (4). Note also that item (5) can be considered a special case of item (4). (Random sampling is a form of two-stage template matching with a “null” first stage, capable of eliminating large numbers of input patterns with particularly high efficiency; see Davies, 1988i.) Finally, note that the example of so-called two-stage template matching covered in Section 29.3.2 was actually part of a larger problem that was really multistage. The edge detector was two-stage, but this was incorporated in an HT, which was itself two-stage, making the whole problem at least four-stage. It can now be seen that items (1)–(5) are all forms of multistage matching (or sequential pattern recognition), which are potentially more powerful and efficient than a single-stage approach. Similar conclusions are arrived at in Appendix A, which deals with robust statistics and their application to machine vision.

Hence, we are coming to a view that there is just one general tool for increasing efficiency. However, in practical terms this may not itself be too useful a conclusion, since the subject of image analysis is also concerned with the ways in which this underlying idea may actually be realized—how are complex tasks to be broken down into the most appropriate multistage processes, and how then is the most suitable representation found for sparse feature location? Probably the five-point list given above throws the right sort of light onto this particular problem—although in the long run it will hardly turn out to be the whole truth!

29.7 Just a Glimpse of Vision?

This book has solved some of the problems it set itself—starting with low-level processing, concentrating on strategies, limitations, and optimizations of intermediate-level processing, going some way with higher-level tasks, and attempting to create an awareness of the underlying processes of vision. Yet this program has taken a lot of space and has thereby had to omit many exciting developments. Notable omissions are advanced work on artificial intelligence and knowledge-based systems and their application in vision; details of much important work on motion and image sequence analysis; practical details of the hardware architectures and techniques; detailed theoretical studies of neural computation, which offer many possibilities for marrying algorithms and architectures; and the huge variety of modern applications in areas such as crime detection and prevention, face recognition and biometrics, and so on. While separate volumes are available on most of these topics, until now a gap appeared to be present, which this book has aimed to fill.

In any field, it is tempting to say that all the main problems have been solved and that only details remain. Indeed, this chapter may inadvertently have implied that only the tradeoffs between optimization parameters need to be determined and the best answers read off. However, as indicated in Chapter 28, there are deeper problems still to be solved—such as finding out how to make valid specifications for image data that are naturally so variegated and varied that setting a detailed specification appears just now to be virtually impossible.

29.8 Bibliographical and Historical Notes

Much of this chapter has summarized the work of earlier chapters and attempted to give it some perspective, so the reader is referred back to appropriate sections for further detail. However, two-stage template matching has been highlighted in the current chapter. The earliest work on this topic was carried out by Rosenfeld and VanderBrug (1977) and VanderBrug and Rosenfeld (1977). Later work included that on lateral histograms for corner detection by Wu and Rosenfeld (1983), while the ideas of Section 29.3.2 were developed by Davies (1988i). Two-stage template matching harks back to the spatial matched filtering concept discussed in Chapter 11 and elsewhere. Ultimately, this concept is limited by the variability of the objects to be detected. However, it has been shown that some account can be taken of this problem, for example, in the design of filter masks (see Davies, 1992e).

With regard to the topics that could not be covered in this book (see the previous section), the reader should refer to the bibliographies at the ends of the individual chapters—though this leaves notable omissions on highly relevant AI techniques. Here the reader is referred to Russell and Norvig (2003).

1 Certain of the nasty realities (such as optical distortions) tend to act in such a way as to reduce accuracy, but we concentrate here on robustness of object detection.

2 But note that several, or all, relevant algorithms may be subject to almost identical limitations because of underlying technological or natural constraints.

3 We here consider only intermediate-level processing, ignoring, for example, efficient AI tree-search methods relevant for purely abstract high-level processing.
