Index
A
- abstractive summarization
- accuracy
- adjacency list
- adjacency list format
- adjacency matrix
- Anaconda Python distribution
- annotated corpus
- anomaly
- antecedent
- Apache
- Apache board meeting minutes
- apache_twitter
- Apriori
- aspects
- association rules
- atomic
- attribute-based similarity matching
- attributes
- attributes matching, methods
- attributes of edges
- attributes of nodes
- automatic text summarization
- automatic text summarization techniques
B
- bag of words
- bag of words (BoW)
- betweenness centrality
- big data
- blank data values
- blocking methods
- bonus words
- boundary errors
- box-and-whisker plot
- boxplot
- Brown Corpus
C
- CamelCase
- change detection problems
- check constraint
- classification problems
- closed path
- closeness centrality
- clustering-based outlier
- clustering problems
- code, entity matching project
- code and text files, NER project
- coding
- components
- compound score
- confidence, association rules
- consequent
- context-based similarity matching
- corpus
- CREATE and INSERT statements
- CRISP-DM process
- cues
D
- data
- data, exploring
- data, importing into graph structure
- data, social network
- data anomalies
- data append
- data errors
- data file
- datafiles
- data mining
- data quality
- data science
- dataset, entity matching project
- datasources table
- data type errors
- degree
- degree centrality
- dependency modeling problems
- details
- developer channel, Ubuntu
- deviation detection problems
- diameter
- directed network
- direction
- disjoint sets
- distance
- Django IRC chat
- doc2bow()
- document level
- domain
- domain knowledge
- doubletons
E
- edge list
- edge list format
- edges
- entity
- entity matching
- entity matching project
- entity matching techniques
- errors
- explicit
- extractive method
F
- Facebook Research blog
- Fayyad et al. KDD process
- feature engineering
- flaccid designator
- fliers
- FLOSSmole
- FLOSSmole.org
- FLOSSmole data
- FLOSSmole project
- frequent itemsets
G
- gazetteer
- GDF format
- general-purpose data collections
- generalizable
- general user channel, Ubuntu
- Gensim
- Gensim approach
- Gensim changelog
- Gensim documentation
- Gensim LDA
- Gensim LDA model
- Gensim LDA objects
- Gensim LDA passes
- Gensim LDA topics
- GEXF format
- glosses
- gnueIRCsummary.txt
- GnuIRC summaries
- graph
- graph data
- graph data, representing
- graph data format (GDF)
- GraphML format
- graph trail
- graph walk
- Grubbs' test
- gzipped
H
- Hamming distance
- Han et al. KDD process
- hapax
- horizontal merge
- hot deck imputation
I
- 2-itemsets
- 3-itemsets
- implicit
- impute
- in-degree
- InterCaps
- interestingness measures for association rules
- isolates
J
- JavaScript Object Notation (JSON)
- JSON link series
- JSON node series
- JSON trees
K
- knowledge discovery in databases (KDD)
- knowledge discovery process
L
- Last Observation Carried Forward (LOCF)
- Latent Dirichlet Allocation (LDA)
- Latent Semantic Analysis (LSA)
- Levenshtein distance
- lexicon
- link analysis problems
- links
- linusrants
- Linux Kernel Mailing List (LKML)
- LKML e-mails
- lkmlLinusAll.txt
- logic errors
M
- machine learning
- machine learning based entity matching
- manually, fixing
- market basket analysis
- Matrix Market (MM) format
- maximum normalized residual test
- Message Understanding Conference (MUC)
- minimum support threshold
- missing data
- missing data, fixing
- modified z-score
- modified z-scores
- multi-document
- multiple components
- multivariate data sets
- MySQL
N
- named entity recognition (NER)
- named entity recognition (NER) project
- named entity recognition (NER) systems
- named entity recognition (NER) tool
- natural language processing (NLP)
- Natural Language Toolkit (NLTK)
- negation words
- network
- network, measuring
- NetworkX
- NetworkX file formats
- neutral word
- NLTK
- NLTK documentation page
- nodes
- novelty
- nullable
- null data values
- null words
O
- objectivity score
- opinion mining
- opinion shifters
- opinion words
- out-degree
- outlier
- outlier detection
- outliers
- outliers, statistical detection
- overfitting
P
- Pajek format
- partial matches
- part of speech (POS)
- part of speech, abbreviations
- parts of speech
- path
- pendant nodes
- Penn, noun abbreviations
- Penn Treebank tagger
- position of word
- POS tagger
- precision
- profile
- Python pickle
Q
- question answering (QA) systems
R
- real-world project, network
- recall
- regression problems
- relational database management systems (RDBMS)
- results, entity matching project
- rf_developer_projects table
- rigid designator
- Rmagick on RubyForge
- Rmagick on RubyGems
- RubyForge
- Ruby on Rails
S
- Scikit-learn tutorial
- semantic errors
- sentiment analysis
- sentiment analysis, basics
- sentiment intensity
- sentiment mining application
- sentiment score
- sentiment words
- SentiWordNet
- sequence analysis problems
- set notation
- significant words
- simpleTextSummaryNLTK.py
- single-document
- Six Steps process
- software project tags
- Soundex
- source lines of code (SLOC)
- specificity
- stigma words
- stopwords
- string edit distance
- subgraphs
- subjectivity classification
- summarization problems
- SUMMRY
- Sumy
- Sumy's Edmundson summarizer
- Sumy's LSA summarizer
- Sumy's Luhn summarizer
- Sumy's TextRank summarizer
- sustainable
T
- target
- target data
- terms
- text samples
- text summarization
- text summarization, methods
- topic modeling
- training examples
- tree structure
- tripletons
- true positives (TP)
- type errors
U
- Ubuntu
- undirected network
- univariate data sets
- unsupervised
- upward closure property
V
- VADER sentiment
- Vapor on RubyForge
- Vapor on RubyGems
- vertical merge
- vertices
- visual mining
W
Z