APPENDIX A

Summary of Algorithmic Concepts

Chapter 1: Algorithm Basics

Understanding algorithms—To understand an algorithm, you need to understand certain facts about it:

  • Behavior—Does the algorithm always find the best solution?
  • Speed—How does speed vary with the number of inputs?
  • Memory requirements—Are they reasonable? How does memory use vary with number of inputs?
  • Main techniques—Can you reuse those techniques?

Algorithms and data structures—A data structure holds data in some arrangement. An algorithm produces a result. You need an algorithm to build and use a data structure.

Pseudocode is text that looks a lot like code but is not written in any particular programming language. You need to translate it into an actual programming language before you can execute it.

Algorithmic goals—To be useful, an algorithm must be correct, maintainable, and efficient.

Big O notation describes the relationship between the number of inputs and runtime or memory requirements as the problem size grows large. Big O notation ignores constant multiples and considers only the fastest-growing function that describes performance.

Runtime functions—Some common runtime functions, in order of increasing growth rate, are 1 (constant), log N, sqrt(N), N, N log N, N², 2^N, and N!

Chapter 2: Numeric Algorithms

Randomization—A program can randomize a collection of objects. It can then pick the first items in the randomized collection to make a random selection. For example, to select five cards from a deck of cards, a program can randomize the deck and then pick the first five cards. You can use a source of randomness to pick values with a different range (as in using coin flips to pick a number between 0 and 7). You may be able to use a biased source of randomness to generate fair selections.
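The randomize-then-pick approach can be sketched in Python. This is an illustrative Fisher-Yates shuffle; the card representation is an assumption, not the book's code:

```python
import random

def shuffle(items):
    """Fisher-Yates shuffle: randomize a list in place."""
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)            # pick a random position <= i
        items[i], items[j] = items[j], items[i]

deck = list(range(52))   # a deck represented as card numbers 0..51
shuffle(deck)
hand = deck[:5]          # the first five cards form a random selection
```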

Fairness—A pseudorandom process is fair if all the outcomes it can generate occur with equal probability.

Bias—A pseudorandom process is biased if it is not fair. A six-sided die that rolls a 1 half of the time is biased.

Probabilistic algorithm—An algorithm that produces a result with a given certainty is probabilistic. For example, the Fermat primality test detects nonprime numbers at least 50% of the time. If you repeat the test many times, you can conclude that a number is prime with great certainty.

Precalculated values—Fast exponentiation uses precalculated values to quickly calculate exponents. The sieve of Eratosthenes also uses precalculated values to quickly eliminate numbers as potential primes. Many programs can be sped up by using precalculated values either calculated on the fly or calculated in advance and saved for later use.
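The sieve of Eratosthenes might be sketched like this in Python (names are illustrative):

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: precalculate primality for 0..n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]            # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p (smaller
            # multiples were already crossed out by smaller primes).
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]
```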

Modeling accuracy—The rectangle and trapezoid rules show that better modeling of a problem can lead to a better result without necessarily requiring a lot of extra work.

Adaptive techniques—Many algorithms can be improved if you can focus more work on parts of the problem that are most difficult and less work on parts of the problem that are easy to handle.

Monte Carlo simulation—Some algorithms can use pseudorandom values to estimate a result. These methods often don't give the accuracy of deterministic methods, but they often are easy to apply when a deterministic approach is difficult.

Chapter 3: Linked Lists

Linked lists are built from objects called cells that each contain a piece of data plus a link to the next cell in the list.

In a doubly linked list, cells have links to the next and previous cells in the list.

Sentinel—In linked lists, a sentinel is a cell that does not contain useful data but is used to mark the beginning or end of the list.

Linked-list operations—It is easy to store a collection of objects in a linked list. The list can grow as needed to hold more objects. It is easy to add and remove items to and from a linked list. Linked lists are not very efficient, however, for sorting or searching for items.
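These ideas can be sketched in a minimal Python singly linked list with a sentinel (class and method names are illustrative):

```python
class Cell:
    """One linked-list cell: a piece of data plus a link to the next cell."""
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

class LinkedList:
    """Singly linked list whose sentinel cell marks the beginning."""
    def __init__(self):
        self.sentinel = Cell()          # holds no useful data

    def add_first(self, data):
        self.sentinel.next = Cell(data, self.sentinel.next)

    def remove_first(self):
        cell = self.sentinel.next
        self.sentinel.next = cell.next
        return cell.data

    def to_list(self):
        items, cell = [], self.sentinel.next
        while cell is not None:
            items.append(cell.data)
            cell = cell.next
        return items
```

Adding and removing at the front are O(1); finding an item still requires walking the links.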

Chapter 4: Arrays

An array is a contiguous piece of memory that holds items.

Array packing—You can pack items into a piece of memory by mapping array indices to memory locations. For example, to save space you can make a triangular array by mapping indices in the triangular array into positions in a one-dimensional array. Similarly, you can make arrays with nonzero lower bounds by mapping the bounds into a regular zero-based array.
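The triangular-array mapping might look like this in Python (a sketch; the class name and lower-triangular convention are assumptions):

```python
class TriangularArray:
    """Lower-triangular array packed into a one-dimensional list.
    Entry (row, col) with col <= row maps to index row*(row+1)//2 + col,
    so a size-N triangle needs only N*(N+1)//2 slots instead of N*N."""
    def __init__(self, size):
        self.values = [0] * (size * (size + 1) // 2)

    def _index(self, row, col):
        assert col <= row, "only the lower triangle is stored"
        return row * (row + 1) // 2 + col

    def get(self, row, col):
        return self.values[self._index(row, col)]

    def set(self, row, col, value):
        self.values[self._index(row, col)] = value
```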

Sparse arrays—You can use linked data structures to build space-efficient sparse arrays or matrices.

Chapter 5: Stacks and Queues

A stack is a data structure that provides last-in-first-out (LIFO) access to items. You can implement a stack in a linked list or array, although you may need to resize the array if it becomes full.

A queue is a data structure that provides first-in-first-out (FIFO) access to items. You can implement a queue in a linked list or circular array, although you may need to resize the array if it becomes full.
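A circular-array queue with resizing might be sketched like this in Python (names are illustrative):

```python
class CircularQueue:
    """FIFO queue in a circular array, resized when it becomes full."""
    def __init__(self, capacity=4):
        self.items = [None] * capacity
        self.head = 0        # index of the oldest item
        self.count = 0

    def enqueue(self, item):
        if self.count == len(self.items):
            # Full: unwrap so the oldest item is at index 0, then double.
            self.items = (self.items[self.head:] +
                          self.items[:self.head] +
                          [None] * len(self.items))
            self.head = 0
        tail = (self.head + self.count) % len(self.items)
        self.items[tail] = item
        self.count += 1

    def dequeue(self):
        item = self.items[self.head]
        self.head = (self.head + 1) % len(self.items)
        self.count -= 1
        return item
```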

In a priority queue, items are retrieved in priority order.

Pronounced “deck,” a deque is a queue that allows you to add or remove items from either end.

Uses—Other algorithms often use stacks and queues to hold items while they are being processed.

Chapter 6: Sorting

Insertion—As in insertionsort, the algorithm takes an item and inserts it into the correct position in some data structure.

Selection—As in selectionsort, the algorithm examines the items that have not yet been processed, finds the best one at that moment, and adds it to some data structure.

Decreasing range of interest—As in bubblesort, the algorithm keeps track of the range of interest and restricts that range over time to reduce the number of items it must consider.

A heap is a tree in which every node has a value at least as large as its children's values. A heap can be used as a priority queue.

Storing a complete tree—An algorithm can store a complete tree in an array.
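In the usual array representation, node i's children sit at indices 2 × i + 1 and 2 × i + 2. A max-heap sketch using that mapping (function names are illustrative):

```python
def sift_down(heap, i, n):
    """Restore the heap property below node i: its value must be
    at least as large as its children's values."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2   # children of node i
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def build_heap(values):
    """Turn a list into a max-heap in place."""
    for i in range(len(values) // 2 - 1, -1, -1):
        sift_down(values, i, len(values))
    return values
```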

Divide and conquer—An algorithm breaks the problem into smaller pieces and solves the pieces, usually recursively.

Randomization—Quicksort can use randomization to reduce the chance of worst-case behavior because of items initially in a particular order. (This doesn't protect it if there are many duplicate items.)

Parallelization—Quicksort, mergesort, and bucketsort can be parallelized.

External sorting—Because of how it moves through memory, mergesort performs external sorting well.

Counting—If items have a limited range, you may be able to count them instead of sorting them as countingsort does.

Partitioning—Bucketsort partitions items into buckets to simplify the problem.

Picking the right algorithm—Sorting algorithms provide a good example of when it is important to pick the right algorithm for the problem. Picking the right algorithm can mean the difference between solving a problem in seconds, minutes, or years.

Chapter 7: Searching

A linear (exhaustive) search simply loops through all the possible items, looking for a target, giving it O(N) runtime. If the list is sorted, the algorithm can stop early if it passes the point where the item would be.

A binary search algorithm divides in half the area to be searched at each step, giving it O(log N) runtime. This concept applies to many problems.
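A minimal binary search in Python (a sketch, not the book's code):

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1      # target lies in the upper half
        else:
            hi = mid - 1      # target lies in the lower half
    return -1
```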

An interpolation search algorithm uses interpolation to guess where a target item is, giving it an O(log(log N)) runtime.

Chapter 8: Hash Tables

Hashing requirements—Hashing requires a data structure to hold values, a hashing function to map values into the data structure, and a collision resolution policy.

Mapping—Hash tables use a hashing function to map values from one domain (names, employee IDs) to another domain (indices in an array). A good hashing function maps any input values to a random distribution of output values.

Chaining—Using linked lists to hold bucket overflow.
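A chaining hash table might be sketched like this in Python. The class name and bucket count are illustrative, and Python lists stand in for the linked lists:

```python
class ChainingHashTable:
    """Hash table that resolves collisions by chaining: each bucket
    holds every (key, value) pair that hashes to that index."""
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```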

Sorted chaining—Sorting the linked lists improves performance in many algorithms.

Open addressing—Mapping data directly to array entries.

Marking items as deleted—Many data structures cannot easily remove items. Instead, you may be able to mark items as deleted and then reuse their spots later.

Clustering—In some algorithms, such as linear probing and quadratic probing, the probability of an item's landing in different places may not be equal. This can reduce the algorithm's efficiency.

Randomization—By randomizing data, you can sometimes avoid bad consequences. Double hashing uses a second hash function to avoid clustering.

Chapter 9: Recursion

Recursive metrics—For recursive algorithms, in addition to studying runtime and memory requirements, you must consider maximum depth of recursion and use of stack space.

Recursive approach—A recursive algorithm calls itself to solve a smaller problem. When the recursive calls reach a point where the problem is small enough, the algorithm solves it without recursion and returns to the calling instance.

Recursive definitions—Some sequences, such as the factorial function and Fibonacci numbers, have natural recursive definitions. Those lead easily to recursive algorithms, although they are not always as efficient as nonrecursive algorithms.

Self-similar fractals are curves that start with an initiator curve and then recursively replace parts of the curve with a suitably scaled, rotated, and translated generator curve.

Backtracking is a recursive algorithm that considers partial solutions. If it finds a partial solution that cannot be extended to a full solution, the algorithm discards the partial solution, backtracks to the previous feasible test solution, and continues searching from there.

A selection is an unordered subset of a set of objects. The number of selections without duplicates of k items taken from a total of n items is given by this equation:

n! / (k! × (n − k)!)

The number of selections with duplicates of k items taken from a total of n items is given by (n + k − 1)! / (k! × (n − 1)!).

A permutation is an ordered subset of items taken from a set. The number of permutations with duplicates of k items taken from a total of n items is given by nk. The number of permutations without duplicates is n × (n − 1) × (n − 2) × ... × (n − k + 1). For the special case where k = n, which is what most people think of as permutation, this is n!
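These counting formulas can be checked with Python's standard library (the wrapper function names are illustrative):

```python
from math import comb, factorial

def selections_without_duplicates(n, k):
    # n! / (k! * (n - k)!), the binomial coefficient
    return comb(n, k)

def selections_with_duplicates(n, k):
    # (n + k - 1)! / (k! * (n - 1)!), i.e. choosing with repetition
    return comb(n + k - 1, k)

def permutations_with_duplicates(n, k):
    return n ** k

def permutations_without_duplicates(n, k):
    # n * (n - 1) * ... * (n - k + 1)
    return factorial(n) // factorial(n - k)
```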

Tail recursion removal—You can replace tail recursion with a loop that resets parameters before the end of the loop.

Storing intermediate values—If a calculation such as the Fibonacci numbers must recalculate the same values many times, you can save time by storing values in a lookup table so that you need to calculate them only once.
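A Fibonacci function with a lookup table might look like this in Python. This is a sketch; the mutable default argument serves as the table, which persists across calls:

```python
def fibonacci(n, memo={0: 0, 1: 1}):
    """Recursive Fibonacci with a lookup table of intermediate values,
    so each value is calculated only once."""
    if n not in memo:
        memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]
```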

General recursion removal—You can remove recursion more generally by mimicking how a program calls a method recursively. Push variables onto stacks before recursion, and pop them off afterwards.

Chapter 10: Trees

Many algorithms use trees, so it's important to remember at least the most basic tree properties.

Logarithmic growth—If a tree is reasonably balanced, its height grows like the logarithm of the number of nodes it contains.

Lots of leaves—If a binary tree is reasonably balanced, roughly half its nodes are leaves.

Inductive reasoning—Proof by induction is a useful technique for proving tree theorems.

Branches and object references—You can use object references or pointers to link a node to its children. You can use similar references to create threads through trees. You also can use similar references to build networks.

Traversals—Preorder, inorder, postorder, and breadth-first traversals process the nodes in a tree in different orders.

Sorted trees—A sorted tree arranges its nodes so that they are processed in sorted order during an inorder traversal.

Threads—You can use special branches or links for threads that let you visit the nodes in a tree or network in unusual orders. A data structure can have as many threads as you find useful, although maintaining threads requires extra work.

Knowledge trees—The animal game uses a knowledge tree to represent questions and the results that answers to those questions give.

Expressions can be represented and evaluated as trees.

Quadtrees and octrees subdivide two- or three-dimensional space to make locating objects fast.

Chapter 11: Balanced Trees

Amortized operations—Sometimes you can do a little extra work during common operations such as rebalancing a tree when you add or remove a value to avoid performing much longer operations later.

AVL trees use rotations to rebalance the tree, so the heights of two subtrees at any node differ by at most 1.

2-3 trees—Every internal node has either two or three children. If a node has three children and you add a node to it, it splits into two nodes with two children. If a node has two children and you remove one of them, the node rearranges with a sibling node or merges with a value in its parent.

B-trees—Every node holds between K and 2 × K values, where K is the tree's order. A node holding M values has M + 1 children. B-trees are a generalization of 2-3 trees and use similar node splitting and merging.

Top-down B-trees—When moving down into the tree to add a value, the algorithm splits any full nodes it encounters so that there is room if needed. This is another example of an amortized operation in which the algorithm does some extra work to make later operations easier.

Chapter 12: Decision Trees

You can model many problems with decision trees. A branch in a decision tree represents a single decision.

Game trees are a special kind of decision tree. A minimax strategy lets you minimize the best board position your opponent can achieve. Game tree heuristics include precomputed initial moves and responses, looking for patterns, and numeric board values (which may change over time).

Types of problems—Many complex problems come in two forms: a yes/no form and an optimization form. Decision trees help model the optimization form. If a decision tree doesn't find a solution to the yes/no form, you cannot conclude that a solution doesn't exist.

Examples:

  • To find an ordering of items, a branch at level K in the tree represents selecting one of the remaining items for the Kth position in the ordering. The tree has N! leaf nodes.
  • To select a subset of a collection of items, a branch at level K in the tree determines whether the Kth item is in the set. Each node has two branches. The tree has 2N leaf nodes.
  • To select a subset of M items from a collection of N items, a branch at level K in the tree selects one of the N − K remaining items. The tree has N × (N − 1) × ... × (N − M + 1) leaf nodes.

Branch and bound can be much faster than exhaustive search for finding an optimal solution.

Heuristics—All these trees are enormous for large problem sizes, so often they cannot be solved exactly, and you must turn to heuristics. Heuristics that can apply to any decision tree include random search, improving paths, and simulated annealing. Hill climbing and sorted hill climbing often can give good results extremely quickly, although you need to define what “hill climbing” means for a particular problem.

Chapter 13: Basic Network Algorithms

For a review of network terminology, see the first section in Chapter 13 and Table 13-1.

Some network representations use a separate Link class. Others store link information in the node where the link starts.

Traversals—Depth-first and breadth-first network traversal algorithms work much as tree traversal algorithms do, except that you need to add a property to the node class so that you can tell when you've already visited a node. If you don't do that, the traversal may get stuck in an infinite loop.

Connectivity—You can determine whether a network is connected starting from a given node by traversing the network from that node and then determining whether the traversal visited every node in the network.

A spanning tree is a tree that covers every node in the network. A minimal spanning tree has the least possible cost. The spanning tree algorithm described in Chapter 13 is a good example of a greedy algorithm.

A label-setting algorithm always adds items to the solution that will remain in the final solution. A label-setting shortest-path algorithm is a breadth-first traversal of the network.

A label-correcting algorithm adds items to the solution that may later be replaced with different items. A label-correcting shortest-path algorithm is a depth-first traversal of the network.

An all-pairs shortest-path algorithm finds paths between any two nodes in a network. It has polynomial runtime, but the O(N3) polynomial is large enough that the algorithm is slow for big networks. It also takes O(N2) memory, so it can use a lot of memory for large networks.

Chapter 14: More Network Algorithms

Topological sorting extends a partial ordering to a full ordering so that you can perform tasks in a feasible order. Topological sorting also lets you determine whether a network contains cycles.
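One common way to perform topological sorting is Kahn's algorithm, sketched here in Python (not necessarily the book's formulation; names are illustrative):

```python
from collections import deque

def topological_sort(tasks, edges):
    """Extend a partial ordering (edges of 'before must precede after'
    pairs) to a full ordering, or report a cycle if none exists."""
    successors = {t: [] for t in tasks}
    in_degree = {t: 0 for t in tasks}
    for before, after in edges:
        successors[before].append(after)
        in_degree[after] += 1
    ready = deque(t for t in tasks if in_degree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in successors[t]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    if len(order) < len(tasks):
        raise ValueError("the network contains a cycle")
    return order
```

If every task gets ordered, the network has no cycle; if some tasks never become ready, a cycle must exist.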

If a map is two-colorable, it is easy to find a two-coloring. Determining whether a map is three-colorable is a very hard problem. You can color any map with at most four colors, but that doesn't mean it will be easy to find a four-coloring. Often a greedy algorithm finds a coloring without too many extra colors.

You can find maximal flows by using a (virtual) residual capacity network. An augmenting path shows how to improve the flows. You can use maximal flows to perform work assignment, to find a minimum flow cut, and to determine the number of disjoint paths from a source to a sink.

Chapter 15: String Algorithms

You can use parenthesis matching to help parse mathematical, logical, or other parenthesized expressions. You also can use parenthesis matching to build parse trees, which you can then use to evaluate expressions multiple times or generate data structures.

Regular expressions let a program determine whether a string contains a substring that matches a pattern. The algorithms described for working with regular expressions use DFAs and NFAs to process strings. DFAs and NFAs are also useful in other situations where you want to use a set of simple rules to control a virtual computer.

The Boyer-Moore algorithm lets a program check a string for a particular substring without examining every character in the string. In other situations you may be able to apply the idea that a simple test (checking the end of a target substring) may be able to exclude a large area where the target may appear.

The edit distance algorithm determines how similar two strings or files are. In any situation where you can define the types of changes that are allowed, you can use a similar approach to calculate how similar two objects are.
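A standard dynamic-programming edit distance (Levenshtein distance) can be sketched in Python; the book's version may differ in details:

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    # distance[i][j] = edit distance between a[:i] and b[:j]
    distance = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        distance[i][0] = i              # delete i characters
    for j in range(len(b) + 1):
        distance[0][j] = j              # insert j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            distance[i][j] = min(
                distance[i - 1][j] + 1,          # delete from a
                distance[i][j - 1] + 1,          # insert into a
                distance[i - 1][j - 1] + cost)   # substitute (or match)
    return distance[len(a)][len(b)]
```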

Chapter 16: Cryptography

For most programmers, these algorithms have only academic or entertainment value. They do demonstrate a couple of useful techniques, however.

Transposition ciphers—Even though several of these algorithms treat message text as if it were written into an array, you don't necessarily need to build the array if you can use simple calculations to figure out where a piece of text would be in the array.

One-time pads provide provably strong encryption. Their disadvantage is that you need to safely give the same pad to the sender and receiver.

Block ciphers break a message into blocks and then encrypt the blocks separately. Many of them also apply relatively simple operations to blocks repeatedly to increase obfuscation.

Public-key encryption algorithms essentially publish a partial transformation. Senders apply the partial transformation, and then the receiver finishes the transformation to recover the original message.

Other cryptographic algorithms include hashing and document signing.

Chapter 17: Complexity Theory

Complexity classes—Problems (not algorithms) are grouped into complexity classes depending on how difficult they are to solve.

P and NP—P is the set of problems that can be solved in polynomial time by a deterministic computer. NP is the set of problems that can be solved in polynomial time by a nondeterministic computer. Perhaps the most profound question in computer science is whether P and NP are equal.

DTIME and NTIME—A problem is in DTIME(f(N)) if it can be solved by a deterministic computer in O(f(N)) time. A problem is in NTIME(f(N)) if it can be solved by a nondeterministic computer in O(f(N)) time.

EXPTIME and NEXPTIME—A problem is in EXPTIME if it can be solved by a deterministic computer in exponential time. A problem is in NEXPTIME if it can be solved by a nondeterministic computer in exponential time.

Reduction—You can use polynomial-time reductions to show that a problem is at least as hard as another problem. (In everyday programming, you can sometimes reduce one problem to another so you don't need to come up with a completely new solution.)

NP-complete—A problem is NP-complete if it is in NP and all other problems in NP can be reduced to it.

Detection, reporting, and optimization—By using reductions, you can show that these forms of problems have equivalent difficulty, at least in terms of complexity theory.

Chapter 18: Distributed Algorithms

There are many kinds of parallelism, including systolic arrays, distributed computing with networked computers, multi-CPU processing with multiple CPUs or cores on a single computer, and quantum computing. Currently distributed computing with networked computers and multi-CPU computers is the most common form of parallel computing.

Debugging parallel algorithms can be very difficult because synchronization issues make it hard to reproduce incorrect behavior.

Embarrassingly parallel algorithms are those that naturally break into pieces that can be shared among several processes with minimal communication.

Mergesort has a naturally distributed implementation.

The dining philosophers problem addresses deadlock and livelock.

The two generals problem addresses insecure communications.

The Byzantine generals problem and its variants deal with processes that may fail in the worst possible way, giving incorrect results rather than no results.

The consensus problem addresses the issue of multiple processes agreeing on a common result.

Leader election addresses the problem of picking a process to be the leader. The leader can then coordinate to help solve many other distributed problems.

A snapshot records the state of a distributed system, including all the processes' internal states and any messages that are in transit at the time.

Clock synchronization attempts to synchronize one process with another in the presence of communication delays.

Chapter 19: Interview Puzzles

Interview puzzles don't necessarily do a very good job of testing a candidate's reasoning and problem-solving abilities. They often rely on trivia, sneaky tricks, or working backwards from a solution to a problem. Candidates who know the trick can often solve these puzzles without much creative thought. Candidates who don't know the trick often have a hard time even if they are creative thinkers.

Puzzles that involve programming are better than those that involve marbles or decks of cards. Programming challenges that don't involve clever tricks are even better than programming puzzles.

If an interview includes a puzzle that is too hard, it may rattle the candidate, and you won't learn much about his or her normal behavior for the rest of the interview (or day).
