Contents

List of Figures

List of Tables

Preface

Acknowledgments

1 Introduction

1.1 Overview of this Book

1.2 Text Mining and Related Fields

1.3 Advice for Reading this Book

2 Text Patterns

2.1 Introduction

2.2 Regular Expressions

2.3 Finding Words in a Text

2.4 Decomposing Poe’s “The Tell-Tale Heart” into Words

2.5 A Simple Concordance

2.6 First Attempt at Extracting Sentences

2.7 Regex Odds and Ends

2.8 References

Problems

3 Quantitative Text Summaries

3.1 Introduction

3.2 Scalars, Interpolation, and Context in Perl

3.3 Arrays and Context in Perl

3.4 Word Lengths in Poe’s “The Tell-Tale Heart”

3.5 Arrays and Functions

3.6 Hashes

3.7 Two Text Applications

3.8 Complex Data Structures

3.9 References

3.10 First Transition

Problems

4 Probability and Text Sampling

4.1 Introduction

4.2 Probability

4.3 Conditional Probability

4.4 Mean and Variance of Random Variables

4.5 The Bag-of-Words Model for Poe’s “The Black Cat”

4.6 The Effect of Sample Size

4.7 References

Problems

5 Applying Information Retrieval to Text Mining

5.1 Introduction

5.2 Counting Letters and Words

5.3 Text Counts and Vectors

5.4 The Term-Document Matrix Applied to Poe

5.5 Matrix Multiplication

5.6 Functions of Counts

5.7 Document Similarity

5.8 References

Problems

6 Concordance Lines and Corpus Linguistics

6.1 Introduction

6.2 Sampling

6.3 Corpus as Baseline

6.4 Concordancing

6.5 Collocations and Concordance Lines

6.6 Applications with References

6.7 Second Transition

Problems

7 Multivariate Techniques with Text

7.1 Introduction

7.2 Basic Statistics

7.3 Basic Linear Algebra

7.4 Principal Components Analysis

7.5 Text Applications

7.6 Applications and References

Problems

8 Text Clustering

8.1 Introduction

8.2 Clustering

8.3 A Note on Classification

8.4 References

8.5 Last Transition

Problems

9 A Sample of Additional Topics

9.1 Introduction

9.2 Perl Modules

9.3 Other Languages: Analyzing Goethe in German

9.4 Permutation Tests

9.5 References

Appendix A: Overview of Perl for Text Mining

A.1 Basic Data Structures

A.2 Operators

A.3 Branching and Looping

A.4 A Few Perl Functions

A.5 Introduction to Regular Expressions

Appendix B: Summary of R used in this Book

B.1 Basics of R

B.2 This Book’s R Code

References

Index
