Mining sequence dataset

Sequential pattern mining is the major task in sequence dataset mining. A-Priori-like algorithms mine sequential patterns with a breadth-first strategy, whereas pattern-growth methods use a depth-first strategy. Either kind of algorithm is sometimes combined with constraints, for various reasons.

The common purchase patterns of a store's customers can be mined as sequential patterns. Sequential patterns also play an important role in other areas, especially advertising and marketing campaigns. Individual behavior can be predicted from sequential patterns in domains such as web log mining, web page recommendation systems, bioinformatics analysis, medical treatment sequence tracking and analysis, and disaster prevention and safety management.

The rules in this chapter, which are mined from sequence patterns, are of many types. Some of them are listed as follows:

  • A sequential rule is of the form $X \rightarrow Y$, where $X \subset Y$, that is, Y is a sequence and X is a proper subsequence of Y
  • A label sequential rule (LSR) is of the form $X \rightarrow Y$, where Y is a sequence, and X is a sequence generated from Y by replacing some of its items with wildcards
  • A class sequential rule (CSR) is of the form $X \rightarrow y$, where X is a sequence and y is a class label

Sequence dataset

A sequence dataset S is defined as a set of tuples, (sid, s), in which sid is a sequence ID, and s is a sequence.

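The support of a sequence X in a sequence dataset S is the number of tuples in S that contain X: $\mathrm{support}_S(X) = |\{(sid, s) \in S \mid X \subseteq s\}|$, where $X \subseteq s$ means that X is a subsequence of s.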
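The support definition can be illustrated with a minimal Python sketch. It assumes each sequence is an ordered list of itemsets; the helper names `seq`, `is_subsequence`, and `support`, as well as the toy dataset, are hypothetical and chosen only for this example.

```python
from typing import FrozenSet, List, Sequence, Tuple

Element = FrozenSet[str]   # one itemset inside a sequence (assumed representation)
Seq = Sequence[Element]    # a sequence is an ordered list of itemsets


def seq(*elements: str) -> List[Element]:
    """Build a sequence; each string is one element, each character one item."""
    return [frozenset(e) for e in elements]


def is_subsequence(x: Seq, s: Seq) -> bool:
    """True if every element of x is a subset of a distinct element of s,
    with the order of the elements preserved (greedy left-to-right match)."""
    pos = 0
    for element in x:
        # Skip elements of s that cannot host the current element of x.
        while pos < len(s) and not element <= s[pos]:
            pos += 1
        if pos == len(s):
            return False
        pos += 1  # the next element of x must match strictly later in s
    return True


def support(x: Seq, dataset: Sequence[Tuple[int, Seq]]) -> int:
    """Number of (sid, s) tuples in the dataset whose sequence s contains x."""
    return sum(1 for _sid, s in dataset if is_subsequence(x, s))


# Hypothetical toy dataset of (sid, s) tuples.
S = [
    (1, seq("a", "abc", "ac", "d", "cf")),
    (2, seq("ad", "c", "bc", "ae")),
    (3, seq("ef", "ab", "df", "c", "b")),
    (4, seq("e", "g", "af", "c", "b", "c")),
]

pattern = seq("ab", "c")     # the sequence <(a b)(c)>
print(support(pattern, S))   # prints 2 (sequences 1 and 3 contain it)
```

The greedy scan is sufficient here because matching each element of X at the earliest possible position in s never rules out a later match.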

Sequential patterns have an intrinsic property that related algorithms exploit, just as the A-Priori algorithm exploits the A-Priori property. For a sequence X and its subsequence Y, $\mathrm{support}_S(X) \le \mathrm{support}_S(Y)$.
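One way to see why this property holds is to follow the definition of support step by step. The notation $Y \subseteq X$ for "Y is a subsequence of X" is an assumption made for this sketch:

```latex
\begin{align*}
Y \subseteq X \ \text{and}\ X \subseteq s &\implies Y \subseteq s \\
\{(sid, s) \in S \mid X \subseteq s\} &\subseteq \{(sid, s) \in S \mid Y \subseteq s\} \\
\mathrm{support}_S(X) &\le \mathrm{support}_S(Y)
\end{align*}
```

In words: every tuple that contains X also contains Y, so Y is counted at least as often as X.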

The GSP algorithm

The generalized sequential patterns (GSP) algorithm is an A-Priori-like algorithm, but it is applied to sequence patterns. It is a level-wise algorithm and has a breadth-first approach. Here is the feature list:

  • GSP is an extension of the A-Priori algorithm

    It uses the A-Priori property (downward closure); that is, given the minimum support count, if a sequence is not frequent, all its supersequences are discarded.

  • It requires multiple passes over the initial transaction dataset
  • It uses the horizontal data format
  • In each pass, the candidate set is generated by a self-join of the patterns found in the previous pass
  • In the k-th pass, a sequence pattern is accepted only if all its (k-1)-subpatterns were accepted in the (k-1)-th pass

The overview of the GSP algorithm is shown in the following figure:

[Figure: overview of the GSP algorithm]

Here is the pseudocode:

[Figure: pseudocode of the GSP algorithm]
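The level-wise procedure described in the feature list can be sketched in Python as follows. This is a simplified illustration, not a reproduction of the pseudocode figure: it assumes every element of a sequence is a single item (no itemsets, no time constraints), and the names `contains`, `gsp`, and the toy dataset are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]   # simplified: a pattern is a tuple of single items


def contains(s: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern is a subsequence of s (order kept, gaps allowed)."""
    it = iter(s)
    return all(item in it for item in pattern)  # 'in' consumes the iterator


def gsp(dataset: Sequence[Tuple[int, List[str]]],
        min_support: int) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern with its support count."""
    # Pass 1: count single items, at most once per sequence.
    counts: Dict[Pattern, int] = {}
    for _sid, s in dataset:
        for item in set(s):
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    result = dict(frequent)

    # Pass k: self-join the (k-1)-patterns, prune, then count by scanning.
    while frequent:
        prev: Set[Pattern] = set(frequent)
        candidates: Set[Pattern] = set()
        for p in prev:
            for q in prev:
                # Join when p without its first item equals q without its last.
                if p[1:] == q[:-1]:
                    cand = p + (q[-1],)
                    # A-Priori pruning: every (k-1)-subpattern must be frequent.
                    if all(cand[:i] + cand[i + 1:] in prev
                           for i in range(len(cand))):
                        candidates.add(cand)
        frequent = {}
        for cand in candidates:
            count = sum(1 for _sid, s in dataset if contains(s, cand))
            if count >= min_support:
                frequent[cand] = count
        result.update(frequent)
    return result


# Hypothetical toy dataset of (sid, s) tuples in horizontal format.
S = [
    (1, ["a", "b", "c", "d"]),
    (2, ["a", "c", "b", "d"]),
    (3, ["a", "b", "d"]),
    (4, ["b", "d"]),
]

patterns = gsp(S, min_support=3)
for pattern, count in sorted(patterns.items(), key=lambda pc: (len(pc[0]), pc[0])):
    print(pattern, count)
```

With this dataset and a minimum support count of 3, the item c is dropped in the first pass, so no candidate containing c is ever generated or counted in later passes.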