3. Data Structures

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 3. Data Structures

Data structures are often considered the domain of other programming languages, and not of SQL. In this chapter, we show that data structures can also be very useful for modeling problems and implementing algorithms in SQL. To solve a real-world problem, it is often helpful to have a number of abstract data models available. If you can adapt the problem to be solved to an abstract model, the implementation of a solution is usually much easier.

Abstract data structures help you solve problems that otherwise seem complex. In this chapter, we show you how to perform operations on linear structures such as lists, stacks, and queues in SQL. We also show several recipes for working with multidimensional data structures, such as matrices and arrays. Please note that by “multidimensional,” we are not referring to OLAP data structures, but to arrays and matrices as programmatic data structures.

Types of Data Structures

In this chapter, we plan to discuss the following three major types of data structures:

Lists
Stacks and queues
Arrays and matrices

Lists, stacks, and queues are all linear data structures; the term linear referring to the fact that, conceptually, you’re dealing with a set of elements arranged in the form of a line. The use of SQL is well-suited to such structures because they map easily onto the relational table paradigm. Arrays and matrices, on the other hand, are multidimensional data structures. You can use SQL to work with arrays and matrices, but it’s better-suited for linear structures.

Lists

Lists are the most common linear data structure in computing. In contrast to other programming languages, lists are the easiest type of data structure to implement in SQL. A list is a sequence of elements with a known order. A list is unlimited in size and can shrink or grow on demand. Elements can be added to a list, removed from a list, and you can query to see whether a given element is in a list.

As you can see, the properties of a list are very similar to the properties of a SQL table, with the exception that a list has an order to it. That is why lists are so easy to implement. SQL tables are also unlimited in size, can be queried, and allow for entries to be added (inserted) and removed (deleted).

In this chapter we use a SQL table with an index to define a list:

CREATE TABLE List (
   Id INTEGER, 
   ...
)

A list in SQL is, therefore, a table with at least two attributes. One attribute is an index identifying the order of each element in the list. Any other attributes are data-carrying attributes, meaning that they hold the actual data in the list.

When you implement a list in SQL, it’s very helpful if your list index is arithmetically increasing. By arithmetically, we mean that your list index should increase by 1 as new elements are appended to the list. We are going to assume in this chapter’s recipes that our lists have arithmetically increasing indices.

All basic list operations are directly implementable in SQL. INSERT is used to add a new element to a list, DELETE is used to remove an element from a list, and SELECT is used to query for elements by value or by index. These operations are so straightforward that we won’t show recipes for them in this chapter.

Lists are of particular interest in SQL, since they allow for pattern matching and for cumulative aggregation queries. The most important pattern-matching queries are for finding regions, runs, and sequences. Having said that, let’s take a look at what those terms mean.

Cumulative aggregation queries

A cumulative aggregation query is similar to a running aggregate query, with only one difference. A running aggregate is the type produced by the GROUP BY clause in conjunction with an aggregate function, and it generates one aggregate value for each group of elements. Cumulative aggregates, on the other hand, include all elements from the beginning of the list. For example, you might produce a report showing sales in January, sales in January and February, sales in January through March, and so forth. While running aggregates are used mainly in statistics, cumulative aggregates are used mainly in combinatorial problems.

Regions

A region is a continuous pattern in a list in which all values of the elements are the same. A typical usage of such a pattern is when you look for empty slots in a list. Let’s say that you have a list where elements can be either empty or full slots. If you are looking for five sequential empty slots, you are actually looking for a region of that size. The most typical example is when you build a warehouse application and you have a triple-size container. To store it, you need to find an empty slot that could fit three containers of regular size. Therefore, you are looking for a region of size three.

Runs

A run is a continuous pattern in which values are increasing monotonically. Each value is larger than the previous one. For example, say that you have a list of consecutively taken measurements for a volcano. In a given period of time, you may find that the temperature of the volcano were consistently rising. The measurements in that period of time, each representing a higher temperature than the one previous, would be considered a run.

Sequences

A sequence is a run in which values form an arithmetic progression with a difference of 1. Each value is exactly one increment larger than the previous value. With respect to the volcano example that we used to describe a run, a series of temperature measurements in which each measurement was exactly 1 degree higher than the previous would represent a sequence.

Stacks and Queues

When building a software system, there is often a requirement to serialize data that is being processed. In such cases, the following special linear structures are particularly useful:

Stack
Queue
Priority queue

Stacks

A stack is a special kind of list in which all addition and removal of elements takes place on one side. A stack is a last in, first out (LIFO) structure. The LIFO concept is one that accountants and computer programmers are equally likely to understand. Each element that is added to a stack is put on top of all the others. Only the topmost item is accessible at any given time, and items in a stack are removed from the top down.

The position of an element in the stack is maintained with its index. The top of the stack is the element with the highest index. To find the highest index, a simple query is needed:

SELECT MAX(Id) FROM Stack

Typical operations on a stack are the following:

PUSH: To add a new element onto a stack
POP: To remove the top element from the stack
TOP: To find the topmost element

Queues

A queue is a list to which items are added at the top and from which items are removed at the bottom. Thus, a queue implements a first in, first out (FIFO) mechanism. In all other aspects, a queue is exactly the same as a stack. Queues are probably the most used data structure of all because they provide a robust serialization mechanism.

Typical operations on queues are the following:

TOP: To find the top element
ENQUEUE: To add a new element to the queue
DEQUEUE: To remove an element from the queue

In a multiclient application, you can use a queue as a temporary repository for data. For example, several users might collect data through forms on their computers and store the data for each form in a queue. On the server side, you might have one or more interface applications that DEQUEUE and process the sets of data ENQUEUED by the client applications. Serialization is achieved because all client applications place their data into the queue, and all server-side applications pull that data out in the same order in which it was placed into the queue.

Priority queues

A priority queue is a special kind of a queue in which the top element — the element that is next in line to be DEQUEUED — is not the oldest one. A priority queue uses a priority property to find the top candidate, thus implementing a first in, priority out (FIPO) structure. No matter what the input order, items are DEQUEUED in priority order. The easiest mechanism for implementing a priority queue is to order the elements in the queue by the value of one of their properties. For example, given a queue of scheduling data, you might prioritize based on time, with the most imminent events taking priority over those that are farther out into the future. As you’ll see, it is very easy in SQL to implement priority queues once you have the basic mechanisms for stacks and queues in place.

Arrays and Matrices

As opposed to linear data structures, multidimensional data structures are not very natural to SQL. In this chapter, we discuss two types of multidimensional data structure:

Arrays
Matrices

It is possible to implement such structures in SQL, but their manipulation soon becomes inefficient, and it is probably worthwhile considering an external implementation where feasible. If you need to perform an operation on arrays or matrices and if you are using MS SQL 2000, consider the TABLE datatype. It provides a convenient way to work with arrays; however, it is limited only to stored procedures and can be used only as temporary storage.

Arrays

An array is a multidimensional data structure that, rather than having just one value, can have many values associated with it. An array can have one or more dimensions. A one-dimensional array allows you to store a list of values. A two-dimensional array allows you to store a grid of values, essentially a list of lists. A three-dimensional array allows you to store a cube of values, and so forth. SQL programmers typically must make some compromises when working with arrays to achieve reasonable performance. The most widely used implementation of an array is to have one column in a table store the index of dimension in an array and have other columns store values for the element in question. In other words, each row in an array table stores the coordinates of one array element together with its values.

One of the biggest dangers when working with arrays is to break an array’s structure. Therefore, you have to build some mechanism into your implementation to ensure that your array structures remain consistent. Otherwise, operations with arrays are fairly simple. The addition of a new element is as simple as an INSERT with new coordinates. Likewise, the removal of an element is as simple as executing a DELETE.

Matrices

A matrix is a special case of array. Matrices have only two dimensions, must be finite, and their indices must arithmetically increase (no gaps). We show operations on matrices to demonstrate that SQL can be used for algebraic operations. However, it is unlikely that you will have much use for matrices in business applications. In the recipes, we show you how to perform arithmetic operations on matrices, how to transpose matrices, how to print matrices, and more.

When working with matrices, SQL can be useful if you store a large number of small matrices in one table. An advantage over other languages is that SQL can easily perform an operation on many elements of a matrix. Other programming languages require you to use a FOR loop or some other kind of looping mechanism to step through all elements of a matrix to perform the operation.

Working Example

Because of the nature of the material in this chapter, we haven’t been able to apply one common example across all the recipes. We have a linear-structure example that we use in our recipes on lists, stacks, and queues. We extend that example to cover arrays. However, when it comes to matrices, we decided to use a simple, textbook-style example for clarity.

Linear Structures

You are building a quality-control system for a biotech company. One of the production lines produces a special liquid for laboratory research. A specific characteristic you need to measure is the liquid’s purity, which is indicated by a purity index. The normal level of purity is represented by a purity index of 100. Everything over 100 is assumed to be abnormal and requires some action to bring the product back into spec.

The liquid comes off the production line in containers. The table that stores quality control data contains the ID of each container along with the measured purity of the liquid within the container. The following CREATE TABLE statement shows the structure of this table. The ContainerId column contains the container ID numbers, and the Purity column contains the purity index values.

CREATE TABLE ProductionLine (
   ContainerId INTEGER,
   Purity INTEGER
)

Currently, the table contains purity information on 12 different containers:

ContainerId Purity      
----------- ----------- 
1           100
2           100
3           101
4           102
5           103
6           100
7           103
8           108
9           109
10          100
11          100
12          100

Your job, our job actually, is to develop some tools to help in evaluating the quality of the product and the functioning of the production machines.

Arrays

To demonstrate the use of arrays, we’ll extend our production-line example. If you need to control several production lines, you can represent their purity data as an array. For example:

CREATE TABLE ProductionFacility(
   Line INTEGER,
   ContainerId INTEGER,
   Purity INTEGER
)

At first glance, this table might not look like an array, but it is. Each row represents one purity-level reading, which is stored in the Purity column. The Line and ContainerId columns are the indices to the array. Given a line number and a container ID number, you can easily retrieve the associated purity value.

Our facility has four production lines, and our ProductionFacility table is currently populated with the following data:

Line        ContainerId Purity      
----------- ----------- ----------- 
0           1           100
0           2           100
0           3           100
1           1           102
1           2           103
1           3           100
2           1           103
2           2           108
2           3           109
3           1           100
3           2           100
3           3           100

The sequence of rows in this output is not important. And, while we’ve chosen to identify production lines by number, the production lines could easily be identified by a name or by some other identifier. However, the ContainerId values for each line must not only be numeric; they must be sequential for us to make use of the linear structure and related algorithms that we demonstrate in this chapter’s recipes.

Matrices

When you write a program to work with matrices, it’s usually best to store your matrices in a table just as you would store arrays. Use one row for each matrix element, and include the element’s coordinates as part of that row. For demonstration purposes, we are going to use the following Matrices table:

CREATE TABLE Matrices (
   Matrix VARCHAR(20),
   X INTEGER,
   Y INTEGER,
   Value INTEGER
)

Each row in the Matrices table represents one matrix element. The columns are used as follows:

Name: Associates a row with a specific matrix. Our table can hold multiple matrices, and each is given a unique name.
X: Holds the X coordinate of the element in question.
Y: Holds the Y coordinate of the element in question.
Value: Holds the value of the element in question.

We fill our sample table with the following:

Two 2 x 2 matrices, named A and B
One vector, named S
One 3 x 3 matrix, named D

Following is the output from a SELECT query against the Matrices table, showing the raw data that we are going to operate on in our recipes:

SELECT * FROM Matrices:

Name                 X           Y           Value       
-------------------- ----------- ----------- ----------- 
A                    1           1           6
A                    1           2           3
A                    2           1           4
A                    2           2           7
B                    1           1           6
B                    1           2           3
B                    2           1           5
B                    2           2           2
S                    1           1           5
S                    2           1           6
D                    1           1           3
D                    1           2           4
D                    1           3           5
D                    2           1           5
D                    2           2           6
D                    2           3           7
D                    3           1           8
D                    3           2           9
D                    3           3           0

Finding Regions

Problem

Find a region in a list. In our example, you must find all containers in the production line that have a purity index of 100. These represent normal production output. Furthermore, you want only cases where at least two containers in succession have a purity of 100. An odd container with a purity of 100 in the midst of a number of containers with bad purity levels is not to be reported as normal production output.

Solution

Look at the problem as a region finding problem and use the following code:

SELECT DISTINCT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
   p1.Purity=100 AND p2.Purity=100 AND
   abs(p1.ContainerId-p2.ContainerId)=1

ContainerID 
----------- 
1
2
10
11
12

Discussion

Obviously, if it weren’t for the requirement to have at least two containers in a row with normal purity before reporting a container as normal, the result could be obtained by finding all containers with a purity level of 100:

SELECT * FROM ProductionLine
WHERE Purity=100

To return the correct result, we have to use a technique to find regions in the list. To find neighboring rows with the same value, we need two copies of the same table. We name them p1 and p2:

SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2

Then, we filter out all rows that do not match the criterion of having one neighbor of the same value. The trick here to finding neighbors is calculating the distance between the p1.ContainerId and p2.ContainerId. If the distance is 1, the two elements are neighbors. If they have the same value, they should be included in the result:

SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE 
   abs(p1.ContainerId-p2.ContainerId)=1

We then add another condition to the WHERE clause to restrict the results further to only those cases where the two neighboring containers have a purity of 100:

SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE 
   p1.Purity=100 AND p2.Purity=100 AND
   abs(p1.ContainerId-p2.ContainerId)=1

Finally, in the SELECT clause, we use DISTINCT to eliminate repeated references to the same container:

SELECT DISTINCT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
   p1.Purity=100 AND p2.Purity=100 AND
   abs(p1.ContainerId-p2.ContainerId)=1

You can try to run the query without the DISTINCT clause, and, as you’ll see using our sample data, it will return container ID 11 twice. This is because the 11th row has two neighbors with a purity of 100 (10 and 12) and, thus, is reported twice.

Reporting Region Boundaries

Problem

As in the previous recipe, you want to find regions in the data. However, now you want to report only the boundaries of the regions and not all members of the region. Reporting only region boundaries is useful when a data set is large and/or when the sizes of any regions are expected to be large.

Solution

To find region boundaries in our ProductionLine data, use the following query:

SELECT p1.ContainerId RegBeg, p2.ContainerId RegEnd
FROM ProductionLine p1, ProductionLine p2
WHERE (p1.ContainerId < p2.ContainerId) AND
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity!=100 AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity=100) 
      OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity=100))

RegBeg      RegEnd      
----------- ----------- 
1           2
10          12

Discussion

The solution presented here is based on the idea first published by Rozenshtein, Abramovich, and Birger (Optimizing Transact-SQL, Fremont, SQL Forum Press:1995) and is still considered a classic.

Just as before, we need two tables to find neighboring rows. We name them p1 and p2 and produce a join between them. We then write a “less than” condition in the main query’s WHERE clause to make sure that any p1 row represents an element with an index lower than the corresponding p2 row:

WHERE (p1.ContainerId < p2.ContainerId) AND

Then we write a subquery named p3, which makes use of a third instance of the ProductionLine table. For every candidate pair from the outermost query, we verify that there are no rows between candidates with a purity other than 100:

NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity!=100 AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId)

If there is even one row returned by this subquery, that means the region is broken (i.e., it is not a continuous region of purity=100), and the candidate pair should be discarded. However, this isn’t quite enough to get us to our desired result, because the subquery does not eliminate smaller regions that are wholly contained within larger regions. For example, in the region between 10 and 12, we would also find regions 10-11 and 11-12. The solution is in two additional conditions at the end of the subquery that check on the lower and higher boundaries for possible neighbors that comply with the region requirement:

(p3.ContainerId=p1.ContainerId-1 AND p3.Purity=100) OR
(p3.ContainerId=p2.ContainerId+1 AND p3.Purity=100)

This final set of conditions ensures that only regions that cannot be extended anymore, those that are the largest, are reported.

Limiting Region Size

Problem

You want to find regions of a certain size. In our example, let’s say that you want to find all those regions in which there are exactly two containers of 100 purity. Such regions can be packed automatically for shipment; others must be sorted manually.

Solution

Because we used increasing ContainerId numbers, we can limit the size of a region in the WHERE clause of our query by limiting the distance between indices using the formula SIZE-1. Since we are looking for the regions of size 2, we use 2-1 or 1:

SELECT p1.ContainerId RegBeg, p2.ContainerId RegEnd
FROM ProductionLine p1, ProductionLine p2
WHERE (p1.ContainerId < p2.ContainerId) AND
   p2.ContainerId-p1.ContainerId=1 AND
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity!=100 AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId))

RegBeg      RegEnd      
----------- ----------- 
1           2
10          11
11          12

Discussion

This query is commonly used in various combinatorial problems. It is similar to the query in the previous recipe, but with two important differences. First, it limits the size of the region to a fixed size. Since the table has arithmetically increasing ContainerId values, this can be achieved by a restriction on the difference between two indices.

p2.ContainerId-p1.ContainerId=1

The second difference is that you do not need to search for the largest possible regions, so the last two conditions in the WHERE clause of the previous recipe’s subquery can be omitted from this query.

It is very easy to extend this recipe to find all regions that are larger or smaller than a certain size. For example, to find all regions of two or more containers, use the following WHERE clause restriction:

...
p2.ContrainerId - p1.ContainerId >=1
...

In the same way, you could limit the query to all regions having five or fewer rows:

...
p2.ContrainerId - p1.ContainerId <=4
...

These last two modifications would return all candidate regions, even those smaller regions that are inside larger regions. Depending on the results you want, if you use one of these modifications, you may wish to add back those two WHERE conditions from the previous recipe that limited the regions returned to only those that are not contained within another larger region.

Ranking Regions by Size

Problem

You want to list all regions in the table, and you want to list them according to their size. With respect to our example, you wish to list all regions of two or more containers with a purity of 100, and you wish to sort that list by the number of containers in each region.

Solution

Use the following query to produce the desired list:

SELECT 
   p1.ContainerId RegBeg, p2.ContainerId RegEnd, 
   p2.ContainerId-p1.ContainerId+1 RegionSize
FROM ProductionLine p1, ProductionLine p2
WHERE (p1.ContainerId < p2.ContainerId) AND
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity!=100 AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity=100) 
      OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity=100))
ORDER BY p2.ContainerId-p1.ContainerId DESC

RegBeg      RegEnd      RegionSize  
----------- ----------- ----------- 
10          12          3
1           2           2

Discussion

As you can see, this query is similar to the one that is used to find regions. The added feature is the ORDER BY clause, which sorts the regions according to their size. It relies on the fact that the table uses an arithmetically increasing index through which the size of a region can be calculated based on the difference between the two indices making up the region’s borders.

Rather than just report the beginning and ending index for each region, this query uses the same calculation in the SELECT list as in the ORDER BY clause to report the size of each region in terms of the number of containers.

The query comes in handy when you have to prepare data for a best-fitting algorithm, and you wish to use the database to presort the data.

You can expand on the solution shown in this recipe, if you like, to show the smallest available region that is still larger than a given size. To do this, add a WHERE clause expression to limit the size of the regions that are sorted. For example:

SELECT TOP 1
   p1.ContainerId RegBeg, p2.ContainerId RegEnd, 
   p2.ContainerId-p1.ContainerId+1 RegionSize
FROM ProductionLine p1, ProductionLine p2
WHERE 
   (p1.ContainerId < p2.ContainerId) AND
   (p2.ContainerId-p1.ContainerId)>=2 AND 
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity!=100 AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity=100) 
      OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity=100))
ORDER BY p2.ContainerId-p1.ContainerId ASC

This query returns the smallest possible region that still fits into the limit. In this case, only the first region that fits the limitations is returned.

Working with Sequences

Problem

You want to find any arithmetically increasing sequence in the data. In our example, you want to find any sequences in which the purity level is arithmetically increasing (100, 101, 102, etc.). Any such sequences that are three or more containers long indicate that a production line is overheating.

Solution

Find sequences in the ProductionLine data using the following query:

SELECT 
   p1.ContainerId SeqBeg, p2.ContainerId SeqEnd,
   p2.ContainerId-p1.ContainerId+1 SequenceSize
FROM ProductionLine p1, ProductionLine p2
WHERE 
   (p1.ContainerId < p2.ContainerId) AND
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND 
         p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId) 
      OR (p3.ContainerId=p2.ContainerId+1 AND 
         p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId))

SeqBeg      SeqEnd      SequenceSize 
----------- ----------- ------------ 
2           5           4
8           9           2

Discussion

This query uses a framework similar to that used to find regions. The difference is that the subquery contains a WHERE clause condition to identify sequences. To explain exactly how that works, let’s begin by looking at the raw data:

ContainerId Purity      Diff        
----------- ----------- ----------- 
1           100         99
2           100         98
3           101         98
4           102         98
5           103         98
6           100         94
7           103         96
8           108         100
9           109         100
10          100         90
11          100         89
12          100         88

Notice that the purity levels of containers 2-5 represent an arithmetically increasing sequence. Notice also that the container IDs represent an arithmetically increasing sequence. We can use the fact that both sequences are monotonic to our advantage. It means that, within any given sequence, the difference between the ContainerId and Purity will be a constant. For example:

100 - 2 = 98
101 - 3 = 98
...

We put knowledge of this pattern to use in the subquery, which uses the following WHERE condition to find any candidates that break the sequence:

p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId

If any row in a candidate sequence (p3) has the same difference as the first row of the sequence (p1), it is a member of the sequence. If not, the candidate pair (p1, p2) should be discarded.

The rest of the framework is exactly the same as for finding regions, and you can easily extend it for sequences in the same ways that you can when finding regions. For example, to find only sequences larger than three rows, add the following WHERE condition:

p2.ContainerId-p1.ContainerId>=2 AND

For example:

SELECT 
   p1.ContainerId SeqBeg, p2.ContainerId SeqEnd,
   p2.ContainerId-p1.ContainerId+1 SequenceSize
FROM ProductionLine p1, ProductionLine p2
WHERE 
   (p1.ContainerId < p2.ContainerId) AND
   p2.ContainerId-p1.ContainerId>=2 AND
   NOT EXISTS(SELECT * FROM ProductionLine p3 
      WHERE (p3.Purity-p3.ContainerId!=p1.Purity-p1.ContainerId AND 
      p3.ContainerId BETWEEN p1.ContainerId AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND 
         p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId) 
      OR (p3.ContainerId=p2.ContainerId+1 AND 
         p3.Purity-p3.ContainerId=p1.Purity-p1.ContainerId))

SeqBeg      SeqEnd      SequenceSize 
----------- ----------- ------------ 
2           5           4

With this framework, you can use algorithms for regions and apply them to sequences with minimal changes to the code. Usually, you just have to add an additional condition such as the one we added in this recipe.

Working with Runs

Problem

You want to find runs in your table. In our example, you want to find any increasing (arithmetically and nonarithmetically) sequences of purity values.

Solutions

Use the following query:

SELECT 
   p1.ContainerId SeqBeg, p2.ContainerId SeqEnd
FROM ProductionLine p1, ProductionLine p2
WHERE 
   (p1.ContainerId < p2.ContainerId) AND
   NOT EXISTS(SELECT * FROM ProductionLine p3, ProductionLine p4 
      WHERE ( 
      p3.Purity<=p4.Purity AND 
      p4.ContainerId=p3.ContainerId-1 AND
      p3.ContainerId BETWEEN p1.ContainerId+1 AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity<p1.Purity) 
      OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity>p2.Purity))

SeqBeg      SeqEnd      
----------- ----------- 
2           5
6           9

Discussion

This query uses a framework similar to that which you’ve seen many times before in this chapter. Unlike a sequence, a run is a continuously increasing, though not necessarily monotonically increasing, series of values. Unlike the previous recipe in which we were looking for monotonically increasing sequences, we do not have a constant difference between ContainerId and Purity values. Consequently, we need a fourth table, p4 in this instance, to check for rows in the middle of a candidate interval that do not comply with the run requirement. This p4 table comes into play in the subquery, where we join it to p3.

For every element between p1 and p2, p3 and its predecessor are compared to see if their values are increasing:

p3.Purity<=p4.Purity AND 
p4.ContainerId=p3.ContainerId-1 AND
p3.ContainerId BETWEEN p1.ContainerId+1 AND p2.ContainerId

The BETWEEN clause limits the scope to rows between the borders (p1 and p2) of the candidate run in question. The p1 border is increased by 1, which covers all pairs within the scope. Note that there is always one less pair than the number of rows.

In a manner similar to other queries for regions and sequences, the last two conditions in the subquery’s WHERE clause ensure that the borders of the candidate run cannot be extended:

(p3.ContainerId=p1.ContainerId-1 AND p3.Purity<p1.Purity) OR
(p3.ContainerId=p2.ContainerId+1 AND p3.Purity>p2.Purity)

If a row can be returned to satisfy these conditions, then the run can be extended and should be rejected in favor of the larger run.

The common framework that this solution shares with earlier recipes allows you to take techniques presented earlier for regions and sequences and apply them to runs.

Cumulative Aggregates in Lists

Problem

You need to report cumulative totals and averages. With respect to our example, assume that the Purity value is, instead, a measure of weight, say kilograms. For packaging purposes, you want to see at which container the total weight of a production line’s output rises above 1,000. Likewise, you are interested to see how each additional container affects the average weight in a shipment.

Solution

Use the following query to calculate both a cumulative total and a running average weight in one pass:

SELECT 
   p1.ContainerId, SUM(p2.Purity) Total, AVG(p2.Purity) Average 
FROM ProductionLine p1, ProductionLine p2
WHERE 
   p1.ContainerId >= p2.ContainerId
GROUP BY p1.ContainerId 

ContainerId Total       Average     
----------- ----------- ----------- 
1           100         100
2           200         100
3           301         100
4           403         100
5           506         101
6           606         101
7           709         101
8           817         102
9           926         102
10          1026        102
11          1126        102
12          1226        102

Discussion

The code uses an old SQL trick for ordering. You take two instances of the ProductionLine table, named p1 and p2, and you cross-join them. Then you group the results by p1.ContainerId, and you limit the second table’s (p2’s) rows so that they have ContainerId values smaller than the p1 row to which they are joined. This forces the server to produce an intermediate result set that looks as follows:

p1_Id       p1_Purity   p2_Id       p2_Purity   
----------- ----------- ----------- ----------- 
1           100         1           100
2           100         1           100
2           100         2           100
3           101         1           100
3           101         2           100
3           101         3           101
4           102         1           100
4           102         2           100
4           102         3           101
4           102         4           102
5           103         1           100
...

Each group, identified by p1.ContainerId, includes all rows from p2 with lower or equivalent ContainerId values. The AVG and SUM functions are then applied to the p2_Purity column. The two functions work on p2 rows in each group and, thus, calculate cumulative results.

Implementing a Stack

Problem

You need to implement a stack data structure in SQL. With respect to our example, you need to build an interface to a processing machine that adds and removes containers to and from the production line. The production line should be handled as a stack. Therefore, you must implement the POP, PUSH, and TOP functions.

Solution

Use the ProductionLine table as a stack. The following sections show how to implement the various stack functions using SQL. Notice our use of the ContainerId column to keep stack elements in their proper order. Each new element pushed onto the stack gets a higher ContainerId value than the previous element. The top of the stack is defined as the row with the highest ContainerId value.

TOP function in SQL

Implementing the TOP function is very easy. You just use the SELECT statement to retrieve the most recently added row in the stack, which, by definition, will be the row with the highest ContainerId value.

SELECT TOP 1 * FROM ProductionLine ORDER BY ContainerId DESC

If you want to embed the query into a procedure and expand it, use the following framework:

CREATE PROCEDURE TopProduction
AS
SELECT TOP 1 * FROM ProductionLine ORDER BY ContainerId DESC

POP function in SQL

The POP function is implemented here as a procedure to give you a framework that you can build on. The first statement is our TOP function, which retrieves the topmost element in the stack. The second select prints that element. The delete statement then removes the element from the stack.

CREATE PROCEDURE Pop 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine 
   ORDER BY ContainerId DESC
SELECT * FROM ProductionLine WHERE @id=ContainerId
DELETE FROM ProductionLine WHERE @id=ContainerId

PUSH function in SQL

The PUSH function adds a new element to the stack. The first SELECT retrieves the top Id. Then the new element is inserted. The last SELECT in the procedure prints the new top element so that you can verify that it was added correctly.

CREATE PROCEDURE Push @Purity INTEGER 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine 
   ORDER BY ContainerId DESC
INSERT INTO ProductionLine(ContainerId,Purity) VALUES(@id+1, @Purity)
SELECT * FROM ProductionLine WHERE ContainerId=@id+1

Discussion

SQL is very convenient for the implementation of linear data structures. In this recipe, we work with an often encountered problem, in which serialization is required, but for which there is no need for a full-sized transactional system.

The code shown in our solution is a simplified version of a real-world system. If you want to use the concept in a live system, make sure that you make the procedures transactional. In addition, if more than one user is using the POP and PUSH functions, use some increment mechanism other than the plain MAX function used in our solution. For example, you can use SQL server’s native solutions, such as Microsoft’s IDENITY or UNIQUEIDENTIFIER datatypes to ensure uniqueness of id values. Be careful: such solutions can be costly and are not always applicable.

In order to test the functions, try the sequence in the following example:

PUSH 120


ContainerId Purity      
----------- ----------- 
13          120


PUSH 130


ContainerId Purity      
----------- ----------- 
14          130


POP

ContainerId Purity      
----------- ----------- 
14          130


POP

ContainerId Purity      
----------- ----------- 
13          120

As you can see, the functions work as expected, and they add or remove the elements on the stack.

Our sample functions simply display data retrieved from the stack, but you could easily store the return values in variables for further manipulation.

Following this pattern, you can implement stack mechanisms using any SQL table, as long as there is a column such as ContainerId that you can use to establish an ordering of the stack elements.

Implementing Queues

Problem

You need to implement a queue with standard operations, such as TOP, ENQUEUE, and DEQUEUE. With respect to our example, you wish to implement the same sort of functionality as in the previous recipe, but this time you wish to treat the production line as a queue, not as a stack.

Solution

Use the ProductionLine table as a queue. The following sections then show you how to implement the standard queuing functions.

TOP function in SQL

SELECT TOP 1 *  FROM ProductionLine ORDER BY ContainerId ASC

DEQUEUE function in SQL

CREATE PROCEDURE dequeue 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine ORDER BY ContainerId ASC
SELECT * FROM ProductionLine WHERE @id=ContainerId
DELETE FROM ProductionLine WHERE @id=ContainerId

ENQUEUE function in SQL

CREATE PROCEDURE enqueue @Purity INTEGER 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine ORDER BY ContainerId DESC
INSERT INTO ProductionLine(ContainerId,Purity) VALUES(@id+1, @Purity)
SELECT * FROM ProductionLine WHERE ContainerId=@id+1

Discussion

As you can see, the queue mechanisms are very similar to the stack mechanisms. In fact, the only difference is in the TOP function. When working with queues, the TOP function always looks for the oldest element in the table, not for the most recently added element. We accomplished this by ordering ASC rather than DESC.

To create the DEQUEUE function, we took the POP function that we used for our stack solution and changed the TOP statement (the first SELECT) so that the function became a DEQUEUE function. The PUSH and ENQUEUE functions actually use the same code, because the process for adding an element to a queue is the same as for adding an element to a stack.

Please note that since we are adding elements to the top and removing them from the bottom, the ContainerId value is always increasing. If, when implementing a queue, you think you might possibly run out of index values, you’ll need to code some sort of reset mechanism to wrap the index back around to the beginning once its upper limit is reached.

Implementing Priority Queues

Problem

You need to implement priority-based queues. In our example, the higher the purity index, the higher the priority. For these queues, you want to implement standard operations such as TOP, ENQUEUE, or DEQUEUE.

Solution

As with stacks and regular queues, we can implement the priority queue in the ProductionLine table.

TOP function in SQL

SELECT TOP 1 *  FROM ProductionLine ORDER BY Purity DESC

DEQUEUE function in SQL

CREATE PROCEDURE dequeue 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine ORDER BY Purity DESC
SELECT * FROM ProductionLine WHERE @id=ContainerId
DELETE FROM ProductionLine WHERE @id=ContainerId

ENQUEUE function in SQL

CREATE PROCEDURE enqueue @Purity INTEGER 
AS 
DECLARE @id INTEGER

SELECT TOP 1 @id=ContainerId FROM ProductionLine ORDER BY ContainerId DESC
INSERT INTO ProductionLine(ContainerId,Purity) VALUES(@id+1, @Purity)
SELECT * FROM ProductionLine WHERE ContainerId=@id+1

Discussion

Priority queues use a framework almost identical to that used for stacks and regular queues. The difference, again, is only in how the TOP function is implemented. When you adjust TOP to look at the queue in terms of priority, in our case at the Purity column, all the other pieces fall into place. The ENQUEUE function is the same as for regular queues. Except for the use of a priority-based TOP function, the DEQUEUE function is also the same as that for regular queues.

Tip

When you use a table as a priority queue, the ENQUEUE function can no longer ensure a monotonically increasing index (as is the case with stacks and queues). That’s because the DEQUEUE function takes elements out of the queue based on their priority and not their index. For example, if you have 10 elements identified with index values 1 through 10 and the fifth element is removed because it has the highest priority, there will be a gap in the index. But when you add a new element, the ENQUEUE function will not fill that gap, but rather add the new element with an index value of 11. It’s easy to overlook this behavior, which can cause some confusion, so keep it in mind as you work with priority queues.

Comparing Two Rows in an Array

Problem

You want to check to see whether two rows in an array are equal. In our example, you want to check if two production lines in the ProductionFacility table are equal.

Solution

To check rows in the table for equality, use the following code. The result will be a list of rows in the array that are equivalent:

SELECT p1.Line p1_Line, 'is equal to', p2.Line p2_Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
   p1.Line<p2.Line
GROUP BY p1.Line, p2.Line
HAVING 
   COUNT(*)=(SELECT COUNT(*) FROM ProductionFacility p3 WHERE p3.Line=p1.Line) 
   AND
   COUNT(*)=(SELECT COUNT(*) FROM ProductionFacility p4 WHERE p4.Line=p2.Line) 

p1_Line                 p2_Line     
----------- ----------- ----------- 
0           is equal to 3

Discussion

This query ends up being quite expensive, using four table instances; as a result, it is a good demonstration of how SQL is not very efficient in working with arrays. However, expensive as it is, it does allow you to get results using only one query.

The FROM clause creates a cross-join between the two instances of the ProductionFacility. We name the two instances p1 and p2 for easier reference. We define in the SELECT statement that the result will report one line for each pair of rows that are equal. Since the cross-join produces many rows, we use the GROUP BY statement to limit the result to just one row of output per row in the array.

The WHERE clause specifies three conditions:

Purity levels must be equal.
Container IDs must be equal.
Production-line numbers from p1 must be less than those from p2.

If you work with multidimensional arrays, simply add additional comparison clauses to the WHERE clause to compare parameters for equality. To compare for full equality between two rows, you must have one comparison expression for each dimension in your array. In our example, the two comparison clauses involve the ContainerId and Line columns. The comparison expression involving the Purity columns is what we use to determine whether two array elements are equal. So a match on ContainerId and Line defines two elements that need to be compared, and the test of equality involves the Purity column.

The intermediate results at this point, without the GROUP BY clause, are as follows:

SELECT p1.ContainerId, p1.Purity, p1.Line, p2.Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
   p1.Line<p2.Line

ContainerId Purity      Line        Line        
----------- ----------- ----------- ----------- 
3           100         0           1
1           100         0           3
2           100         0           3
3           100         0           3
3           100         1           3

Add in the GROUP BY clause and we get:

SELECT COUNT(*) ContainerCount, p1.Line, p2.Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
   p1.Line<p2.Line
GROUP BY p1.Line, p2.Line

ContainerCount Line        Line        
-------------- ----------- ----------- 
1              0           1
3              0           3
1              1           3

The HAVING clause is the expensive one. It compares the number of matched pairs from the WHERE clause to the number of columns in both rows. The first subquery checks for the number of rows in p1, and the second, for the number of rows in p2. The HAVING clause makes sure that only lines of equal size are reported in the final result. In our example, each production line has produced three containers. Looking at the intermediate results shown here, you can see that the only two production lines with a container count of three are lines 0 and 3. The HAVING clause ensures that those are reported as the final output from the query.

Printing Matrices and Arrays

Problem

You want to print a matrix and an array.

Solution

Use the following pivoting technique, which, in this case, prints matrix D:

SELECT  X, 
   MAX(CASE Y WHEN 1 THEN Value END) y1,
   MAX(CASE Y WHEN 2 THEN Value END) y2,
   MAX(CASE Y WHEN 3 THEN Value END) y3
FROM Matrices
WHERE Matrix='D'
GROUP BY X
ORDER BY X

X           y1          y2          y3          
----------- ----------- ----------- ----------- 
1           3           4           5
2           5           6           7
3           8           9           0

Discussion

See the discussion on the use of Pivot tables in Chapter 1. Note particularly that the number of CASE expressions must match the Y dimension of the matrix. In this case, we know the matrix we want to print has three columns, so we wrote three CASE expressions.

Let’s say that you want to print an array in a report-like fashion with each dimension in a separate column. In our example, you wish to print a report of purity levels for all containers in all production lines, and you wish each production line to be represented by its own column.

Use the same pivoting technique as used earlier in the recipe for printing matrices:

SELECT  ContainerId, 
   MAX(CASE Line WHEN 0 THEN Purity END) Line0,
   MAX(CASE Line WHEN 1 THEN Purity END) Line1,
   MAX(CASE Line WHEN 2 THEN Purity END) Line2,
   MAX(CASE Line WHEN 3 THEN Purity END) Line3
FROM ProductionFacility
GROUP BY ContainerId
ORDER BY ContainerId

ContainerId Line0       Line1       Line2       Line3       
----------- ----------- ----------- ----------- ----------- 
1           100         102         103         100
2           100         103         108         100
3           100         100         109         100

Transposing a Matrix

Problem

You want to transpose a matrix. To transpose a matrix, swap all X and Y values. For example, an element located at X=1, Y=2 will be swapped with the element located at X=2, Y=1.

Solution

Follow the pattern used for the following query, which transposes matrix D:

SELECT Y AS X, X AS Y, Value 
FROM Matrices
WHERE Matrix='D'

X           Y           Value       
----------- ----------- ----------- 
1           1           3
2           2           4
3           3           5
1           1           5
2           2           6
3           3           7
1           1           8
2           2           9
3           3           0

Discussion

Transposition is probably one of the easiest operations on matrices. The only thing you have to do is to report X as Y and Y as X, and that transposes the matrix. If you wish to store your transposition — this recipe only prints the transposed version — you can write an INSERT . . . SELECT . . . FROM statement:

INSERT INTO Matrices
SELECT 'Dt',Y, X, Value 
FROM Matrices
WHERE Matrix='D'

This statement transposes matrix D and stores the results in a new matrix named Dt.

Calculating a Matrix Trace

Problem

You want to calculate the trace of a matrix. The trace of a matrix is the summation of values along the matrix’s main diagonal.

Solution

Use the following query to calculate the trace of a matrix. In this case, we calculate the trace of the matrix D.

SELECT SUM(Value) Trace 
FROM Matrices
WHERE Matrix='D' and X=Y

The result:

Trace       
----------- 
9

Discussion

When the X and Y coordinates of an element are the same, the element is on the main diagonal. The WHERE clause in this query restricts the results to only those elements. We need to add those elements together, which we do using the SUM function, and we have the trace of the matrix.

Comparing Two Matrices for Size

Problem

You want to compare two matrices to see whether they are equal in size. By equal in size, we mean that their highest X and Y dimensions are the same.

Solution

To compare matrices A and B for size, use the following query:

SELECT m1.Matrix, 'is of equal size as', m2.Matrix
FROM Matrices m1, Matrices m2
WHERE m1.X=m2.X AND m1.Y=m2.Y AND m1.Matrix='A' AND m2.Matrix='B'
GROUP BY m1.Matrix, m2.Matrix 
HAVING 
   COUNT(*)=(SELECT COUNT(*) FROM Matrices WHERE Matrix='A') 
   AND COUNT(*)=(SELECT COUNT(*) FROM Matrices WHERE Matrix='B')

Matrix                                   Matrix               
-------------------- ------------------- -------------------- 
A                    is of equal size as B

Discussion

Some matrix operations require that the matrices involved are the same size. Use the query in this recipe to verify that such is the case.

First, we create two instances of the Matrices table (m1 and m2) and restrict each to one of the matrices that we are interested in. In our case, m1 represents matrix A, while m2 represents matrix B. If the matrices are equal, this will give us two rows for each combination of X and Y index values.

Next, in the WHERE clause, we match the coordinates of the two matrices. The GROUP BY clause is used so that query reports only one row of output. The results are grouped by the two matrix names. The HAVING clause then tests to ensure that the total number of rows summarized matches the total number of elements in A and B. If the totals all match, the two matrices are the same size.

Adding and Subtracting Matrices

Problem

You want to add or subtract matrices in the table.

Solution

To add matrices A and B, use:

SELECT DISTINCT m1.X, m2.Y, m1.Value+m2.Value Value
FROM Matrices m1, Matrices m2
WHERE m1.Matrix='A' AND m2.Matrix='B'
   AND m1.X=m2.X AND m1.Y=m2.Y

X           Y           Value       
----------- ----------- ----------- 
1           1           12
1           2           6
2           1           9
2           2           9

To subtract matrix B from matrix A, use this query, but replace the plus sign with a minus sign. The results of A-B are:

x           y           Value       
----------- ----------- ----------- 
1           1           0
1           2           0
2           1           -1
2           2           5

Discussion

This code follows the definitions of matrix addition and subtraction from algebra. To add two matrices, they must be of the same dimension (i.e., they must be equal), and then you just add elements on the same coordinates. Subtraction works the same way, except that you subtract element values rather than add.

The trick to this recipe’s solution is in matching the elements on the same coordinates from the two matrices. We assume that the matrices are already of the same dimension; in other words, we assume they are equal. Then, we create two instances of the Matrices table (m1 and m2). We restrict m1 in the WHERE clause so that it represents matrix A, and we restrict m2 so that it represents matrix B. The elements of each matrix are now matched, and the plus or minus operator in the SELECT clause calculates the sum or difference.

Multiplying Matrices

Problem

You want to implement matrix multiplication in SQL.

Solution

There are three ways that you can multiply a matrix:

By a scalar value
By a vector of values
By another matrix

When multiplying by a vector, the length of the vector must correspond to the maximum X index. When you multiply two matrices, the matrices must be equal.

Multiplying a matrix by a scalar value

To multiply matrix A by scalar 5, just multiply all rows of that matrix by 5:

SELECT DISTINCT X, Y ,Value*5 Value
FROM Matrices
WHERE Matrix='A'

X           Y           Value       
----------- ----------- ----------- 
1           1           30
1           2           15
2           1           20
2           2           35

Multiplying a matrix with a vector

To multiply matrix A by scalar S, use the following query:

SELECT m1.X, SUM(m1.Value*v.Value) VALUE
FROM Matrices m1, Matrices v
WHERE m1.Matrix='A' AND v.Matrix='S' AND m1.Y=v.X 
GROUP BY m1.X

X           Value       
----------- ----------- 
1           48
2           62

Multiplying two matrices

To multiply matrix A by matrix B, use the following code:

SELECT m1.X, m2.Y, SUM(m1.Value*m2.Value) Value
FROM Matrices m1, Matrices m2
WHERE m1.Matrix='A' AND m2.Matrix='B' AND m1.Y=m2.X
GROUP BY m1.X, m2.Y

X           Y           Value       
----------- ----------- ----------- 
1           1           51
2           1           59
1           2           24
2           2           26

Discussion

The biggest danger while working with matrices is in the confusion of indices. The SQL statements in this recipe can be used only if matrices or vectors are represented exactly as in our example. In any case, it is probably a good idea to check the indices carefully in the query.

Another issue to be concerned about is that you must ensure that you are multiplying data with the appropriate dimensions. When multiplying two matrices, their dimensions must match. When multiplying a matrix by a vector, the dimension of the vector must match the X dimension of the matrix. While it’s possible to extend these queries to check for dimensional equality, this significantly increases the cost. If you can, it’s best to build such checking mechanisms somewhere else.

Multiplying by a scalar

The easiest of multiplications uses SQL features to extract all elements of a matrix easily and just multiply them with a scalar. The SELECT list of such a query simply uses multiplication, in our case Value*5, to return the specified results.

Multiplying by a vector

Multiplication of a matrix by a vector is a bit more difficult. In our example, if we write down the matrix A and the vector S together, we will get the following:

Matrix A:   6   3
            4   7

Vector S:   5   6

Algebraic rules state that the first vector element multiplies values in the first matrix column, the second vector element multiplies values in the second matrix column, and so forth. This gives us the following matrix of values:

6x5   3x6
4x5   7x6

The final step is to sum all the values in each row of this matrix, so the result is a vector:

6x5 + 3x6 = 30 + 18 = 48
4x5 + 7x6 = 20 + 42 = 62

As you can see, the result of multiplying a matrix by a vector is another vector. In our case, the result vector is as follows:

48   62

Multiplying by a matrix

The query to multiply two matrices together uses the same principle as the query for multiplying a matrix by a vector. The query cross-matches the elements according to their position, performs multiplications, and sums the results of those multiplications so that the result is a vector. In our example, the following two matrices are multiplied together:

Matrix A      Matrix B
   6   3         6   3
   4   7         5   2

When we say that in matrix multiplication you “cross-match” elements, we mean that that X,Y values from one matrix are multiplied by the corresponding Y,X values from the other. For example, element 1,2 from matrix A must be multiplied by element 2,1 from matrix B. In our example, this cross-matching yields the following multiplications:

6*6   3*5
4*6   7*5
6*3   3*2
4*3   7*2

The results must then be summed into a vector:

6*6 + 3*5 = 36 + 15 = 51
4*6 + 7*5 = 24 + 35 = 59
6*3 + 3*2 = 18 +  6 = 24
4*3 + 7*2 = 12 + 14 = 26

Squaring a matrix

The matrix multiplication query can easily be modified to square a matrix. To square a matrix is to multiply it by itself. The only thing that has to be changed is that both m1 and m2 must be restricted to the same matrix. In the following example, m1 and m2 both represent matrix A:

SELECT m1.X, m2.Y, SUM(m1.Value*m2.Value) Value
FROM Matrices m1, Matrices m2
WHERE m1.Matrix='A' AND m2.Matrix='A' AND m1.Y=m2.X
GROUP BY m1.X, m2.Y

The results are then the square of A:

X           Y           Value       
----------- ----------- ----------- 
1           1           48
2           1           52
1           2           39
2           2           61

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 3. Data Structures

Create new playlist

Sign In

Sign Up

Chapter 3. Data Structures

Types of Data Structures

Lists

Cumulative aggregation queries

Regions

Runs

Sequences

Stacks and Queues

Stacks

Queues

Priority queues

Arrays and Matrices

Arrays

Matrices

Working Example

Linear Structures

Arrays

Matrices

Finding Regions

Problem

Solution

Discussion

Reporting Region Boundaries

Problem

Solution

Discussion

Limiting Region Size

Problem

Solution

Discussion

Ranking Regions by Size

Problem

Solution

Discussion

Working with Sequences

Problem

Solution

Discussion

Working with Runs

Problem

Solutions

Discussion

Cumulative Aggregates in Lists

Problem

Solution

Discussion

Implementing a Stack

Problem

Solution

TOP function in SQL

POP function in SQL

PUSH function in SQL

Discussion

Implementing Queues

Problem

Solution

TOP function in SQL

DEQUEUE function in SQL

ENQUEUE function in SQL

Discussion

Implementing Priority Queues

Problem

Solution

TOP function in SQL

DEQUEUE function in SQL

ENQUEUE function in SQL

Discussion

Tip

Comparing Two Rows in an Array

Problem

Solution

Discussion

Printing Matrices and Arrays

Problem

Solution

Discussion

Table of Contents for
3. Data Structures