Chapter 22 Processing in the DW 2.0 environment

The distinguishing characteristics of the DW 2.0 environment are the types of data found in the various sectors. In many ways the data and the sectors define DW 2.0. But from an architectural standpoint there is more to DW 2.0 than a data architecture. Another way to perceive DW 2.0 is through the processes found in the various environments or sectors.

There are various types of transactions and processes that run through the DW 2.0 environment. Perhaps the simplest of these transactions is the simple request for data. Figure 22.1 shows a simple request for data.

FIGURE 22.1 A simple access transaction.

In a simple request for data there is a desire to find one or two rows of data and then to display them interactively. This kind of transaction uses very few system resources, and its logic is minimal. It is found in the online environment because good performance is easy to achieve when the system is executing this kind of transaction.

The transaction depicted here is predetermined. It and its logic are preprogrammed so that the end user merely sets the transaction in motion for it to execute.
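A minimal sketch of such a preprogrammed simple transaction follows. It assumes an in-memory SQLite database; the `account` table, its columns, and the function names are hypothetical and chosen only for illustration. The query text is fixed in advance, and the end user supplies nothing but the key.

```python
import sqlite3
from typing import Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "account_id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "Ames", 120.0), (2, "Baker", 75.5), (3, "Chen", 310.25)],
    )
    conn.commit()
    return conn


# The SQL is written once, ahead of time; only the key varies per request.
GET_ACCOUNT_SQL = (
    "SELECT account_id, owner, balance FROM account WHERE account_id = ?"
)


def get_account(conn: sqlite3.Connection, account_id: int) -> Optional[Tuple]:
    """Fetch at most one row by primary key, or None if the key is absent."""
    return conn.execute(GET_ACCOUNT_SQL, (account_id,)).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(get_account(conn, 2))
```

Because the lookup goes through the primary key and returns a single row, the work done per request is small and predictable, which is the property the interactive workload depends on.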

A variation of the simple transaction is the complex transaction. The complex transaction usually looks at more rows of data than the simple transaction, and it entails quite a bit of logic, something not found in the simple transaction. As long as the complex transaction does not require too many rows of data to execute, it can be freely mixed in the interactive workload with no serious degradation of performance.

Figure 22.2 shows a complex transaction.

FIGURE 22.2 A complex transaction.

Complex transactions are almost always preprogrammed. The end user merely sets the program in motion.

Another variation of the simple transaction is the transaction that is created on an ad hoc basis. Figure 22.3 depicts an ad hoc transaction.

FIGURE 22.3 An ad hoc transaction.

An ad hoc transaction is usually fairly simple; complex logic is rare. As a rule the ad hoc transaction does not look at too many rows of data. But occasionally an end user will submit an ad hoc query that looks at a very large amount of data, and when that happens, performance suffers.

For this reason few ad hoc transactions are found in the interactive environment. It is much more normal for the ad hoc query to be found in the integrated environment.

In many cases the ad hoc query is found in the data mart environment. Ad hoc queries are often produced by business intelligence software. The end user enters nothing but parameters into the business intelligence software. Once the parameters are entered, the software shapes the query.
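The way business intelligence software might shape a query from end-user parameters can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular product: the `sales_mart` table, the column names, and the `shape_query` function are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical data mart columns the end user may filter on or aggregate.
ALLOWED_FILTERS = ("region", "product", "sale_month")
ALLOWED_MEASURES = ("amount", "units")


def shape_query(measure: str, filters: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Turn end-user parameters into a parameterized SQL statement."""
    if measure not in ALLOWED_MEASURES:
        raise ValueError("unknown measure: %s" % measure)
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError("unknown filter(s): %s" % sorted(unknown))

    clauses: List[str] = []
    values: List[Any] = []
    # Walk the whitelist in a fixed order so the generated SQL is deterministic.
    for column in ALLOWED_FILTERS:
        if column in filters:
            clauses.append("%s = ?" % column)
            values.append(filters[column])

    sql = "SELECT SUM(%s) FROM sales_mart" % measure
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values


if __name__ == "__main__":
    print(shape_query("amount", {"sale_month": "2024-01", "region": "West"}))
```

The end user never writes SQL in this sketch. Column names come only from a whitelist, and the values travel as bound parameters, so the software, not the user, determines the shape of the query.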

Another kind of query is the access query. The access query differs from the simple request for data in that the access query often accesses many rows of data.

Figure 22.4 shows an access query.

FIGURE 22.4 An access transaction.

The logic behind an access query is usually fairly simple. However, the volume of data touched by the access query can be considerable.

Access queries are used by analysts to scan entire vistas of data. There are times when looking at one or two rows of data simply does not supply the needed information. Because they access lots of data, access transactions are not normally run in the Interactive Sector. If they are run there, they are run at odd times of the day when there is no deleterious effect on the overall performance of the system. Instead it is much more common to see access transactions run in the Integrated and Archival Sectors. Once in a great while an access query is run in the near-line environment.

Another common type of process found in the DW 2.0 environment is the transformation process. The transformation process is one in which whole vistas of data are accessed, altered, and written out as another file. Transformation processes are almost never run in the interactive environment during peak-period processing.

Figure 22.5 illustrates a transformation process.

FIGURE 22.5 A transformation process.

Transformation processes usually entail complex algorithms, and in some cases the processing is seriously complex. For this reason it is unusual for a transformation process to be written on anything other than a preprogrammed basis. Stated differently, transformation processes are almost never ad hoc in nature.

One of the by-products of the transformation process is metadata. The transformations that the process executes are defined by metadata. That metadata is written out as documentation of the processing and is useful to many people in the DW 2.0 environment.
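The relationship between a transformation process and its metadata can be sketched as follows. The field names, the conversion rules, and the layout of the metadata record are hypothetical; the point is only that the same rule table both drives the transformation and documents it.

```python
from typing import Any, Callable, Dict, List, Tuple


def to_gender_code(value: Any) -> str:
    """Normalize assorted source encodings to M, F, or U (unknown)."""
    codes = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
    return codes.get(str(value).strip().lower(), "U")


def inches_to_cm(value: Any) -> float:
    """Convert a height in inches to centimeters, rounded to one decimal."""
    return round(float(value) * 2.54, 1)


# Each rule: (source field, target field, description, conversion function).
Rule = Tuple[str, str, str, Callable[[Any], Any]]
TRANSFORMATIONS: List[Rule] = [
    ("sex", "gender", "normalize to M/F/U", to_gender_code),
    ("height_in", "height_cm", "inches to centimeters", inches_to_cm),
]


def transform(rows: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Apply every rule to every row and return (new rows, metadata)."""
    out_rows = [
        {target: convert(row[source]) for source, target, _, convert in TRANSFORMATIONS}
        for row in rows
    ]
    # The metadata documents what was done, derived from the same rule table.
    metadata = {
        "rows_read": len(rows),
        "rows_written": len(out_rows),
        "mappings": [
            {"source": source, "target": target, "rule": description}
            for source, target, description, _ in TRANSFORMATIONS
        ],
    }
    return out_rows, metadata
```

Because the metadata is generated from the rule table rather than written by hand, it cannot drift out of step with the processing it describes.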

Transformation processing applies to both structured and unstructured data.

Yet another type of process is statistical processing. Statistical processing is useful for a mathematical analysis of large amounts of data. In almost every case statistical processing entails the access of many rows of data. For this reason statistical processes are not run when online response time is an issue.

Figure 22.6 shows a statistical process.

FIGURE 22.6 A statistical transaction process.

Statistical processes often entail complex processing logic. They are often a part of a stream of analysis known as heuristic processing. In heuristic processing one step of analysis is not obvious until the immediately preceding step is complete.

As such, heuristic processing mandates a sort of ad hoc processing.
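A small sketch can make the idea of heuristic processing concrete. In the hypothetical drill-down below, each step computes group averages over the rows that survived the previous step, so the subset examined next cannot be known until the preceding step has finished. The sample data, the field names, and the "follow the highest average" rule are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Tuple

# Hypothetical sample data.
SALES: List[Dict[str, Any]] = [
    {"region": "East", "product": "A", "amount": 10.0},
    {"region": "East", "product": "B", "amount": 30.0},
    {"region": "West", "product": "A", "amount": 50.0},
    {"region": "West", "product": "B", "amount": 20.0},
    {"region": "West", "product": "A", "amount": 70.0},
]


def top_group(rows: List[Dict[str, Any]], key: str) -> Tuple[Any, float]:
    """Return the value of `key` whose rows have the highest mean amount."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["amount"])
    averages = {name: mean(amounts) for name, amounts in groups.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


def heuristic_drill_down(
    rows: List[Dict[str, Any]], dimensions: List[str]
) -> List[Tuple[str, Any, float]]:
    """Drill through the dimensions, narrowing the data after each step."""
    trail = []
    for dimension in dimensions:
        value, average = top_group(rows, dimension)
        trail.append((dimension, value, average))
        # The next step runs only on the rows selected by this step's result.
        rows = [row for row in rows if row[dimension] == value]
    return trail
```

In practice the analyst, not a fixed rule, usually decides what to look at next. The sketch fixes that decision only so that the dependency between steps is visible in code.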

The various sectors of the DW 2.0 environment typically run different kinds of processes.

Figure 22.7 shows the kinds of processes found in the Interactive Sector.

FIGURE 22.7 What the workload of the interactive environment looks like, for the most part.

In the Interactive Sector are found simple transactions and complex transactions. There are no statistical processes. There are no access processes. There are only transactions that can be run in a disciplined manner and that are run when there is no conflict for resources. In other words, the job stream of the interactive environment is for small, fast-running, well-disciplined transactions.

Anything other than small, fast-running, well-disciplined transactions disrupts the flow of transactions and has a negative impact on performance.

The integrated environment runs a wide variety of transactions. Figure 22.8 shows the transaction types that are run in the integrated environment.

FIGURE 22.8 What processing in the integrated structured environment looks like.

As Figure 22.8 shows, transformation processing is run as data enters the integrated environment. Once the environment has been created, ad hoc transactions, access transactions, and complex transactions are run against it.

The net result of the transactions run in the integrated environment is a mixed workload. Because the workload is mixed, overall system performance is uneven.

The near-line processing is shown in Figure 22.9.

FIGURE 22.9 There is little or no end-user transaction processing in the near-line environment.

Very few transactions are run in the near-line environment. About the only two types of transactions run there are access transactions and replacement transactions. A replacement transaction is a specialized transaction in which small amounts of data are taken from the near-line environment and placed in the Integrated Sector.
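A replacement transaction might be sketched as follows. Both stores are modeled as plain dictionaries keyed by record identifier, which is a simplification. The function name, the row limit, and the choice to leave the near-line copy in place are assumptions made for illustration, not details given in the text.

```python
from typing import Any, Dict, Hashable, List


def replace_from_near_line(
    near_line: Dict[Hashable, Any],
    integrated: Dict[Hashable, Any],
    keys: List[Hashable],
    max_rows: int = 100,
) -> List[Hashable]:
    """Copy a small set of records from near-line storage into the integrated store.

    Returns the keys that were actually restored.
    """
    # A replacement transaction is meant for small amounts of data only.
    if len(keys) > max_rows:
        raise ValueError("too many rows for a replacement transaction")

    restored = []
    for key in keys:
        # Skip keys missing from near-line storage and keys already present.
        if key in near_line and key not in integrated:
            # Assumption: the near-line copy is retained, not deleted.
            integrated[key] = near_line[key]
            restored.append(key)
    return restored
```

The row limit stands in for the "small amounts of data" in the definition above. Moving large volumes back into the Integrated Sector would be a different kind of process.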

Very few transactions are run in the archival environment. However, the transactions that are run there are usually very resource intensive.

Figure 22.10 shows the transaction types that run in the archival environment.

The types of transactions that are common to the archival environment are statistical processes and access transactions. If passive indexes have been built, then processing in the archival environment is often reasonably efficient. But if passive indexes have not been built, then it is normal to have to do full database scans in the archival environment.

FIGURE 22.10 Only access and statistical processing is found in the archival environment.

Online, high-performance response time is not an issue in the archival environment.

The only environment in which processing occurs for unstructured data is the integrated environment.

Figure 22.11 illustrates the unstructured integrated environment and the types of transactions that are typically found there.

There is a wide mix of transactions found in the unstructured integrated environment. Simple transactions, simple ad hoc transactions, complex transactions, and access transactions are found in the unstructured Integrated Sector. In addition, data that is placed in the unstructured integrated environment is passed through the transformation process for unstructured data.

FIGURE 22.11 All sorts of analytical processing are found in the unstructured integrated environment.
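The sector-by-sector workloads described in this chapter can be collected into a small lookup table. The sketch below is one possible encoding; the dictionary layout, the identifier spellings, and the helper functions are hypothetical. The entries themselves follow the chapter text and mark what is typical of each sector, not what is strictly permitted.

```python
from typing import Dict, FrozenSet, List

# Typical process types per sector, as described in the chapter text.
SECTOR_WORKLOAD: Dict[str, FrozenSet[str]] = {
    "interactive": frozenset({"simple", "complex"}),
    "integrated": frozenset({"transformation", "ad_hoc", "access", "complex"}),
    "near_line": frozenset({"access", "replacement"}),
    "archival": frozenset({"access", "statistical"}),
    "unstructured_integrated": frozenset(
        {"simple", "ad_hoc", "complex", "access", "transformation"}
    ),
}


def is_typical(sector: str, process: str) -> bool:
    """Report whether a process type is typical of the given sector."""
    return process in SECTOR_WORKLOAD[sector]


def sectors_for(process: str) -> List[str]:
    """List, alphabetically, the sectors where a process type is typical."""
    return sorted(
        sector for sector, kinds in SECTOR_WORKLOAD.items() if process in kinds
    )


if __name__ == "__main__":
    print(sectors_for("access"))
    print(is_typical("interactive", "statistical"))
```

A table like this could serve as a first check when a job is submitted, for example to flag a statistical process aimed at the Interactive Sector before it disrupts the online workload.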

SUMMARY

A part of the DW 2.0 environment is processing. Some of the various kinds of processes found in the DW 2.0 environment include

simple transactions;
complex transactions;
transformations;
access transactions;
statistical transactions.

Because of the data found there and the performance characteristics of the various sectors, different kinds of processes have an affinity for different places in the DW 2.0 environment.
