x_i = a_{i,1} s_1 + · · · + a_{i,k} s_k + · · · + a_{i,n} s_n,

weighted by the mixing weights a_{i,k}. The same generative model can be written in vectorial form as x = Σ_{k=1}^{n} s_k a_k, where the observed random vector x is represented by the basis vectors a_k = (a_{1,k}, · · · , a_{m,k})^T. The basis vectors a_k form the columns of the mixing matrix A = (a_1, · · · , a_n) and the generative formula can be written as x = As, where s = (s_1, · · · , s_n)^T.
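Before turning to estimation, a minimal numerical sketch may help; the sine/uniform sources and the 2 × 2 mixing matrix below are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical independent sources: a sine wave and uniform noise.
t = np.linspace(0, 1, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),
               rng.uniform(-1, 1, size=t.shape)])

# An illustrative square (n = m = 2) mixing matrix A.
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])

X = A @ s          # each observation x = As mixes the sources
print(X.shape)     # (2, 1000): two observed mixtures, one per row
```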
Given the model and realizations (samples) x_1, · · · , x_N of the random vector x, the task is
to estimate both the mixing matrix A and the sources s. This is done by
adaptively calculating the weight vectors w and setting up a cost function which either maximizes the non-Gaussianity of the calculated s_k = w^T x or
minimizes the mutual information. In some cases, prior knowledge of the probability distributions of the sources can be used in the cost function. The
original sources s can be recovered by multiplying the observed signals x
with the inverse of the mixing matrix, W = A^{-1}, also known as the unmixing
matrix. Here it is assumed that the mixing matrix is square (n = m). If the
number of basis vectors is greater than the dimensionality of the observed
vectors, n > m, the task is overcomplete but still solvable with the pseudoinverse. In the linear noisy ICA model, with the added assumption of zero-mean, uncorrelated Gaussian noise n ~ N(0, Σ) with diagonal covariance Σ, the ICA model takes the form x = As + n. In the nonlinear ICA model, the mixing of the sources need not be linear: using a nonlinear mixing function f(·|θ) with parameters θ, the model becomes x = f(s|θ) + n.
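In practice, the square, noise-free model can be estimated with off-the-shelf tools. The sketch below uses scikit-learn's FastICA, one widely used estimator based on maximizing non-Gaussianity; the shapes and parameter choices are illustrative assumptions, continuing the mixing sketch above.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Continuing the sketch above: X.T holds one observation x per row.
# FastICA estimates s by maximizing the non-Gaussianity of w^T x.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X.T)   # estimated sources, up to order and scale
A_est = ica.mixing_              # estimated mixing matrix A
W_est = np.linalg.pinv(A_est)    # unmixing matrix; equals A^{-1} when square
```

The recovered sources match the true ones only up to permutation and scaling, a well-known ambiguity of the ICA model.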
2.5.3 Non-negative Matrix Factorization
Non-negative matrix factorization (NMF) is a group of algorithms in
multivariate analysis and linear algebra where a matrix X is factorized into (usually) two matrices W and H, such that X ≈ WH.
Factorization of matrices is generally non-unique, and a number
of different methods of doing so have been developed (e.g., principal
component analysis and singular value decomposition) by incorporating
different constraints; non-negative matrix factorization differs from these
methods in that it enforces the constraint that the factors W and H must be
non-negative, i.e., all elements must be equal to or greater than zero.
Let matrix V be the product of the matrices W and H such that:
WH = V
Matrix multiplication can be implemented as linear combinations of
column vectors in W with coefficients supplied by cell values in H. Each
column in V can be computed as follows:
v_i = Σ_{j=1}^{N} H_{ji} w_j
where N is the number of columns in W, v_i is the ith column vector of the product matrix V, H_{ji} is the cell value in the jth row and ith column of the matrix H, and w_j is the jth column of the matrix W. When multiplying matrices, the factor matrices can be of significantly lower rank than the product matrix, and it is this property that forms the basis of NMF. If we can factorize a matrix into factors of significantly lower rank than the original matrix, then the column vectors of the first factor matrix can be considered as spanning vectors of the vector space defined by the original matrix.
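A short numpy check makes this concrete (the 4 × 2 and 2 × 3 dimensions below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((4, 2))   # non-negative factor with N = 2 columns
H = rng.random((2, 3))   # non-negative factor
V = W @ H

# Column i of V is sum_j H[j, i] * W[:, j], matching the formula above.
i = 0
v_i = sum(H[j, i] * W[:, j] for j in range(W.shape[1]))
assert np.allclose(v_i, V[:, i])
```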
Here is an example based on a text-mining application:
Let the input matrix (the matrix to be factored) be V with 10000 rows
and 500 columns where words are in rows and documents are in
columns. In other words, we have 500 documents indexed by 10000
words. It follows that a column vector v in V represents a document.
Assume we ask the algorithm to find 10 features in order to generate a features matrix W with 10000 rows and 10 columns and a coefficients matrix H with 10 rows and 500 columns.
The product of W and H is a matrix with 10000 rows and 500 columns,
the same shape as the input matrix V and, if the factorization worked,
also a reasonable approximation to the input matrix V.
From the treatment of matrix multiplication above it follows that each column in the product matrix WH is a linear combination of the 10 column vectors in the features matrix W with coefficients supplied by the coefficients matrix H.
This last point is the basis of NMF because we can consider each original
document in our example as being built from a small set of hidden features.
NMF generates these features.
It is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: the higher a word's cell value, the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document with a cell value defining the document's rank for a feature. This follows because each row in H represents a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in W), where each feature is weighted by the feature's cell value from the document's column in H.
2.5.4 Singular Value Decomposition
A number of data sets are naturally described in matrix form. Examples range from microarrays to collaborative filtering data, to the set of pairwise distances of a cloud of points. In many of these examples, singular value decomposition (SVD) provides an efficient way to construct a low-rank approximation, thus achieving both dimensionality reduction and effective denoising. SVD is also an important tool in the design of approximate linear algebra algorithms for massive data sets.
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics. Formally, the singular value decomposition of an m × n real or complex matrix M is a factorization of the form

M = UΣV*,

where U is an m × m real or complex unitary matrix, Σ is an m × n rectangular diagonal matrix with non-negative real numbers on the diagonal, and V* (the conjugate transpose of V) is an n × n real or complex unitary matrix. The diagonal entries Σ_{i,i} of Σ are known as the singular values of M. The m columns of U and the n columns of V are called the left-singular vectors and right-singular vectors of M, respectively.
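A minimal numpy sketch (the matrix values are arbitrary) shows the factorization and the shapes involved:

```python
import numpy as np

M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])        # an arbitrary 2 x 3 real matrix

U, s, Vt = np.linalg.svd(M)            # s holds the singular values
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)   # rebuild the 2 x 3 diagonal Sigma

assert np.allclose(M, U @ Sigma @ Vt)  # M = U Sigma V*
```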
The singular value decomposition and the eigendecomposition are closely related. Namely:
The left-singular vectors of M are eigenvectors of MM*.
The right-singular vectors of M are eigenvectors of M*M.
The non-zero singular values of M (found on the diagonal entries of Σ) are the square roots of the non-zero eigenvalues of both M*M and MM*.
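These relations can be verified numerically; the sketch below reuses the arbitrary real matrix M from above, for which the conjugate transpose M* is simply the transpose:

```python
import numpy as np

M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])        # same arbitrary real matrix as above
U, s, Vt = np.linalg.svd(M)

eig_MMt = np.linalg.eigvalsh(M @ M.T)  # eigenvalues of MM* (2 values)
eig_MtM = np.linalg.eigvalsh(M.T @ M)  # eigenvalues of M*M (3 values)

# Non-zero eigenvalues of both products equal the squared singular values.
assert np.allclose(np.sort(s**2), np.sort(eig_MMt))
assert np.allclose(np.sort(s**2), np.sort(eig_MtM)[-len(s):])
```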
Applications which employ the SVD include computing the pseudoinverse, least squares fitting of data, matrix approximation, and determining the rank, range, and null space of a matrix.
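For example, the Moore-Penrose pseudoinverse follows directly from the SVD by transposing the factors and inverting the non-zero singular values; a brief sketch using the same arbitrary matrix as above:

```python
import numpy as np

M = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD
s_inv = np.where(s > 1e-12, 1.0 / s, 0.0)          # invert non-zero values
M_pinv = Vt.T @ np.diag(s_inv) @ U.T               # M+ = V diag(1/s) U*

assert np.allclose(M_pinv, np.linalg.pinv(M))      # matches numpy's pinv
```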
2.6 Chapter Summary
In this chapter, we have systematically presented the mathematical
foundations used in this book. We start with data organization and
distribution followed by the intensive discussion on distance and similarity
measures. This chapter also covers the important issue of dimensionality
reduction approaches that is commonly used in vector space models. The
aim of this chapter is to lay down a solid foundation for readers to better
understand the techniques and algorithms mentioned in later chapters.