Web sentiment analysis and CLIQUE

CLustering In QUEst (CLIQUE) is a bottom-up, grid-based clustering algorithm. It rests on the Apriori property, that is, the monotonicity of dense units with respect to dimensionality: if a set of data points, S, is a cluster in a k-dimensional space, then S is also contained in a cluster in every (k-1)-dimensional projection of that space.

The algorithm proceeds in passes. One pass over the data produces the one-dimensional dense units. In pass k, the candidate k-dimensional units are generated by the candidate-generation procedure from the (k-1)-dimensional dense units found in pass k-1. A pass over the data then determines which of these candidates are actually dense.
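As a rough illustration of the first pass, the following sketch splits each dimension into equal-width intervals and keeps the intervals that hold more than a given fraction of the points. The function name dense_units_1d and the parameters xi (intervals per dimension) and tau (density threshold) are assumptions made for this example; they are not taken from ch_06_clique.R. The sketch also assumes that every dimension has at least two distinct values.

```r
# Hypothetical helper: find the one-dimensional dense units of a numeric matrix.
# xi  = number of equal-width intervals per dimension (assumed parameter)
# tau = fraction of points an interval must exceed to be dense (assumed parameter)
# Assumes every column has at least two distinct values.
dense_units_1d <- function(x, xi = 4, tau = 0.2) {
  x <- as.matrix(x)
  n <- nrow(x)
  result <- list()
  for (d in seq_len(ncol(x))) {
    # Assign every point to one of xi equal-width intervals along dimension d
    breaks <- seq(min(x[, d]), max(x[, d]), length.out = xi + 1)
    bin <- cut(x[, d], breaks = breaks, include.lowest = TRUE, labels = FALSE)
    # Fraction of all points that falls into each interval
    frac <- tabulate(bin, nbins = xi) / n
    dense <- which(frac > tau)
    if (length(dense) > 0) {
      result[[length(result) + 1]] <- data.frame(dim = d, interval = dense)
    }
  }
  # One row per dense unit; NULL when no interval is dense
  do.call(rbind, result)
}

# Toy data: the first column is concentrated near 0, the second is spread out
x <- cbind(c(0.1, 0.2, 0.3, 0.15, 9.8, 10), c(1, 5, 9, 2, 6, 10))
units <- dense_units_1d(x, xi = 4, tau = 0.5)
print(units)
```

Each row of the result names a dimension and the index of a dense interval in it. These one-dimensional units are the input for generating two-dimensional candidates.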

The characteristics of the CLIQUE algorithm are as follows:

  • Efficient for high-dimensional datasets
  • Interpretability of results
  • Scalability and usability

The CLIQUE algorithm contains three steps to cluster a dataset. First, a group of subspaces is selected to cluster the dataset. Then, clustering is independently executed in every subspace. Finally, a concise summary of every cluster is produced in the form of a disjunctive normal form (DNF) expression.

The CLIQUE algorithm

The summarized pseudocode for the CLIQUE algorithm is as follows:

  1. Identification of the subspaces that contain clusters.
  2. Identification of the clusters.
  3. Generation of a minimal description for each cluster.

The candidate-generation algorithm is illustrated as follows:

[Figure: pseudocode of the CLIQUE candidate-generation procedure]
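Candidate generation can be sketched in R as an Apriori-style join followed by a pruning step. This is a minimal sketch, not the code from ch_06_clique.R: the representation of a unit as a list with dims (sorted dimension ids) and ints (interval index in each dimension), and the function names unit_key and generate_candidates, are assumptions made for this example.

```r
# Hypothetical representation: a unit is list(dims = sorted dimension ids,
# ints = interval index in each of those dimensions).
unit_key <- function(dims, ints) paste(dims, ints, sep = ":", collapse = "|")

# Build candidate k-dimensional units from the (k-1)-dimensional dense units
generate_candidates <- function(dense) {
  keys <- vapply(dense, function(u) unit_key(u$dims, u$ints), character(1))
  candidates <- list()
  for (a in dense) {
    for (b in dense) {
      k1 <- length(a$dims)
      head_idx <- seq_len(k1 - 1)
      # Join step: the first k-2 (dimension, interval) pairs must agree and
      # the last dimension of a must precede the last dimension of b
      if (all(a$dims[head_idx] == b$dims[head_idx]) &&
          all(a$ints[head_idx] == b$ints[head_idx]) &&
          a$dims[k1] < b$dims[k1]) {
        dims <- c(a$dims, b$dims[k1])
        ints <- c(a$ints, b$ints[k1])
        # Prune step: every (k-1)-dimensional projection must itself be dense
        all_dense <- all(vapply(seq_along(dims), function(i) {
          unit_key(dims[-i], ints[-i]) %in% keys
        }, logical(1)))
        if (all_dense) {
          candidates[[length(candidates) + 1]] <- list(dims = dims, ints = ints)
        }
      }
    }
  }
  candidates
}

# Four dense 2-dimensional units (toy input)
dense2 <- list(
  list(dims = c(1, 2), ints = c(1, 1)),
  list(dims = c(1, 3), ints = c(1, 2)),
  list(dims = c(2, 3), ints = c(1, 2)),
  list(dims = c(1, 4), ints = c(1, 1))
)
cands <- generate_candidates(dense2)
```

The pruning step is where the monotonicity property pays off: a candidate is dropped without touching the data as soon as one of its lower-dimensional projections is known not to be dense.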

Here is the algorithm to find the connected components of the graph; this is equivalent to finding clusters:

[Figure: pseudocode for finding the connected components of the graph]
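The connected-components search can be sketched in R as a depth-first traversal over the dense units of one subspace. Here two units are treated as connected when they share a face, that is, when their interval indices differ by exactly 1 in one dimension and agree in all the others. The function name connected_units and the matrix representation of the units are assumptions made for this example, not names from ch_06_clique.R.

```r
# Hypothetical helper: label the connected dense units within one subspace.
# units is a matrix; each row holds the interval indices of one dense unit,
# with one column per dimension of the subspace.
connected_units <- function(units) {
  units <- as.matrix(units)
  n <- nrow(units)
  label <- integer(n)   # 0 means "not yet visited"
  current <- 0L
  for (start in seq_len(n)) {
    if (label[start] != 0L) next
    # Start a new cluster and explore it depth-first
    current <- current + 1L
    label[start] <- current
    stack <- start
    while (length(stack) > 0) {
      u <- stack[length(stack)]
      stack <- stack[-length(stack)]
      for (v in seq_len(n)) {
        # Shared face: indices differ by exactly 1 in a single dimension
        if (label[v] == 0L && sum(abs(units[u, ] - units[v, ])) == 1) {
          label[v] <- current
          stack <- c(stack, v)
        }
      }
    }
  }
  label
}

# Five dense units in a 2-dimensional subspace (toy input)
units <- rbind(c(1, 1), c(1, 2), c(2, 2), c(4, 4), c(5, 4))
labels <- connected_units(units)
print(labels)
```

Units that receive the same label belong to the same cluster. The pairwise scan in the inner loop keeps the sketch short; an implementation meant for many units would look up neighbors through a hash of the unit coordinates instead.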

The R implementation

The R code for the algorithm described above is in the file ch_06_clique.R in the book's code bundle. It can be tested with the following command:

> source("ch_06_clique.R")

Web sentiment analysis

Web sentiment analysis identifies the opinion or attitude expressed in a text, for example, in posts on microblogs such as Twitter. One simple approach compares the words of a post with a predefined list of sentiment-labeled words. Another example is judging a movie review as thumbs up or thumbs down.
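The word-list approach can be sketched in a few lines of R. The two word lists and the function names score_sentiment and label_sentiment are hypothetical and serve only as an illustration; a real application would rely on a curated sentiment lexicon and would need to handle negation, which this sketch ignores.

```r
# Hypothetical word lists; a real application would use a curated lexicon
positive_words <- c("good", "great", "love", "excellent")
negative_words <- c("bad", "boring", "hate", "awful")

# Score = number of positive words minus number of negative words
score_sentiment <- function(text) {
  # Lower-case, drop everything except letters and spaces, split into words
  words <- strsplit(gsub("[^a-z ]", "", tolower(text)), " +")[[1]]
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# Map the score to a thumbs up / thumbs down judgment
label_sentiment <- function(text) {
  s <- score_sentiment(text)
  if (s > 0) "thumbs up" else if (s < 0) "thumbs down" else "neutral"
}

reviews <- c("I love this movie, the cast is great!",
             "Boring plot and awful acting.")
print(vapply(reviews, label_sentiment, character(1), USE.NAMES = FALSE))
```

The first review contains two positive words and no negative ones, so it is labeled thumbs up; the second contains two negative words and is labeled thumbs down.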

Web sentiment analysis is also used to detect bias toward specific views in news articles, to evaluate newsgroups, and so on.
