Chapter 1. Understanding collective intelligence
Figure 1.3. Four pillars for user-centric applications
Figure 1.4. An example of a user-centric application—LinkedIn (www.linkedin.com)
Figure 1.5. Classifying user-generated information
Figure 1.6. This tag cloud from del.icio.us shows popular tags at the site.
Figure 1.7. Screen shot from Digg.com showing news items with the number of diggs for each
Figure 1.8. Screenshot from Yahoo! Music recommending songs of interest
Chapter 2. Learning from user interactions
Figure 2.1. Synchronous and asynchronous learning services
Figure 2.2. Architecture for embedding and deriving intelligence in an event-driven system
Figure 2.3. Architecture for embedding intelligence in a non-event-driven system
Figure 2.4. A user interacts with items, which have associated metadata.
Figure 2.5. The three sources for generating metadata about an item
Figure 2.6. Attribute hierarchy of a user profile
Figure 2.7. Term vector representation of text
Figure 2.8. Typical steps involved in analyzing text
Figure 2.9. Two-dimensional vectors, v1 and v2
Figure 2.10. Screenshot from YouTube showing related videos for a video
Figure 2.11. Persistence of ratings in a table that stores each user’s ratings in a separate table
Figure 2.14. Saving an item to a list (NYTimes.com)
Figure 2.15. Composite pattern for organizing bookmarks together
Figure 2.16. A normal distribution with a mean of 0 and standard deviation of 1
Figure 2.18. The association between a reviewer, an item, and the review of an item
Chapter 3. Extracting intelligence from tags
Figure 3.1. Three ways to generate tags
Figure 3.2. Screenshot of how a user creates a tag at del.icio.us
Figure 3.3. Amazon allows users to tag a product and see how others have tagged the same product.
Figure 3.4. Tag cloud from squidoo.com
Figure 3.5. Tag cloud of all-time most popular tags at Flickr
Figure 3.6. Combining term vectors from a number of documents to form a tag cloud
Figure 3.8. The tags and tagging_source database tables
Figure 3.9. The MySQLicious schema with sample data
Figure 3.10. Scuttle representation with sample data
Figure 3.11. The normalized Toxi solution with sample data
Figure 3.12. The recommended persistence schema designed for scalability and performance
Figure 3.13. Nesting queries to get the set of tags used
Figure 3.14. Table to store the metadata associated with an item via tags
Figure 3.15. The addition of summary and days tables
Figure 3.16. Class design for implementing a tag cloud
Figure 3.17. The class diagram for FontSizeComputationStrategy
Figure 3.18. Using the Decorator pattern to generate HTML to represent the tag cloud
Chapter 4. Extracting intelligence from content
Figure 4.1. Architecture for integrating internally hosted separate instances server
Figure 4.2. Class model for representing a blog for a user
Figure 4.3. Persistence schema for blogs
Figure 4.4. Relationship between a page, a category, and a user in a wiki
Figure 4.5. Persistence model for a wiki
Figure 4.6. Modeling a message board or a group
Figure 4.7. The schema for the elements of a message board
Figure 4.8. Typical steps involved in analyzing text
Figure 4.9. The hierarchy of analyzers used to create metadata from text
Figure 4.10. The tag cloud for the title consists of four terms.
Figure 4.11. The tag cloud for the body of the text
Figure 4.12. The resulting tag cloud obtained by combining the title and the body
Figure 4.13. The tag cloud after removing the stop words
Figure 4.14. The tag cloud after normalizing the terms
Figure 4.15. Tag cloud for the title after using the bi-term analyzer
Figure 4.16. Tag cloud for the blog after using a bi-term analyzer
Chapter 5. Searching the blogosphere
Figure 5.1. Four steps in searching the blogosphere
Figure 5.2. The generic architecture for the blog searcher
Figure 5.3. The BlogQueryResult object
Figure 5.4. BlogSearchResponseHandler and XMLToken
Figure 5.5. Two implementations for BlogQueryResult
Figure 5.6. Base implementation for BlogSearcher
Figure 5.7. The base class for SAX parsing handlers
Chapter 6. Intelligent web crawling
Figure 6.1. The basic process of web crawling
Figure 6.2. Submitting your site’s sitemap using Google Webmaster tools
Figure 6.3. Number of relevant URLs retrieved as a function of number of URLs visited
Figure 6.4. The Cygwin window after the crawl command
Figure 6.5. The directory structure after the crawl
Figure 6.6. The stats associated with the crawldb
Figure 6.7. The search screen for the Nutch application
Figure 6.8. Searching for collective intelligence using the Nutch search application
Chapter 7. Data mining: process, toolkits, and standards
Figure 7.1. A predictive model makes a prediction based on the values for the input attributes.
Figure 7.3. An example decision tree showing two attributes
Figure 7.4. A multi-layer perceptron where the input from one layer feeds into the next layer
Figure 7.5. The directory structure and some of the files for WEKA
Figure 7.6. WEKA documentation that’s available in the install
Figure 7.7. WEKA GUI with options to start one of four applications
Figure 7.9. Converting a continuous variable into a discrete variable using filters in WEKA
Figure 7.10. A dataset in WEKA is represented by instances.
Figure 7.11. Classifier uses instances to build the model and classifies an instance.
Figure 7.12. Classifier uses instances to build the model and classifies an instance.
Figure 7.14. Association-learning algorithms available in WEKA
Figure 7.16. Key JDM interfaces to describe the physical and logical aspects of the data
Figure 7.17. The model representation in JDM
Figure 7.18. The settings associated with the different kinds of algorithms
Figure 7.19. The interfaces associated with the various tasks supported by JDM
Figure 7.20. The interfaces associated with creating a Connection to the data-mining service
Chapter 8. Building a text analysis toolkit
Figure 8.1. Typical steps involved in analyzing text
Figure 8.2. Example of how the tools developed in this chapter can be leveraged in your application
Figure 8.3. Key classes in the Lucene analysis package
Figure 8.4. Some of the concrete implementations for Tokenizer and TokenFilter
Figure 8.5. The Analyzer class with some of its concrete implementations
Figure 8.6. The Analyzer class with some of its concrete implementations
Figure 8.7. The implementations for the PhrasesCache and SynonymsCache
Figure 8.8. The infrastructure for text analysis
Figure 8.9. Tag infrastructure–related classes
Figure 8.10. Term vector–related infrastructure
Figure 8.11. The TextAnalyzer and the InverseDocFreqEstimator
Figure 8.12. The tag cloud for the title, consisting of five tags
Figure 8.13. The tag cloud for the body, consisting of 15 tags
Figure 8.14. The tag cloud for the combined title and body, consisting of 15 tags
Figure 8.15. An example of automatically detecting relevant terms by analyzing text
Chapter 9. Discovering patterns with clustering
Figure 9.1. The various steps in our example of clustering blog entries
Figure 9.2. The interfaces associated with clustering text
Figure 9.3. The classes for implementing the hierarchical agglomerative clustering algorithm
Figure 9.4. The classes for implementing the hierarchical agglomerative clustering algorithm
Figure 9.5. A ClusteringModel consists of a set of clusters obtained by analyzing the data.
Chapter 10. Making predictions
Figure 10.1. The first node in our decision tree
Figure 10.2. The second split in our decision tree
Figure 10.3. The final decision tree for our example
Figure 10.5. Belief network representation for our example
Figure 10.6. The simplified belief network when only A is known
Figure 10.7. The classes that we develop in this chapter
Figure 10.9. A typical radial basis function
Figure 10.10. The model interfaces corresponding to supervised learning
Figure 10.11. Setting interfaces related to supervised learning
Figure 10.12. Algorithm-specific settings related to supervised learning algorithms
Chapter 11. Intelligent search
Figure 11.1. The entities involved with adding search to your application
Figure 11.2. The key Lucene classes for creating and searching an index
Figure 11.3. Non-compound and compound index files
Figure 11.5. Multiple search instances sharing the same index
Figure 11.6. The default implementation for the Similarity class
Figure 11.7. Query classes available in Lucene
Figure 11.8. Filters available in Lucene
Figure 11.9. HitCollector-related classes
Figure 11.10. Screenshot of Luke in the Documents tab
Figure 11.11. Screenshot of the Solr admin page
Figure 11.12. Screenshot of the home page for collective intelligence at Kosmix
Figure 11.13. Clustering search results using Carrot2 clustering
Chapter 12. Building a recommendation engine
Figure 12.1. An example of the output of a recommendation engine at Amazon.com
Figure 12.2. The inputs and outputs of a recommendation engine
Figure 12.3. Item-based analysis: similar items are recommended
Figure 12.4. User-based analysis: items liked by similar users are recommended
Figure 12.5. WEKA classes related to instance-based learning and nearest-neighbor search
Figure 12.6. Illustration of the dimensionality reduction
Figure 12.7. Screenshot of recommendations to a user at Amazon.com
Figure 12.9. Recommendations based on browsing history
Figure 12.10. Google News with recommended stories using the user’s web history
Figure 12.11. Personalized news stories for a logged-in user
Figure 12.12. Movies related to a movie being recommended at Netflix
Figure 12.13. Home page for a user at Netflix showing the user’s recommended movies