Book Description

Discover how to use Neo4j to identify relationships within complex and large graph datasets using graph modeling, graph algorithms, and machine learning

Key Features

  • Get up and running with graph analytics with the help of real-world examples
  • Explore various use cases such as fraud detection, graph-based search, and recommendation systems
  • Get to grips with the Graph Data Science library with the help of examples, and use Neo4j in the cloud for effective application scaling

Book Description

Neo4j is a graph database that includes plugins to run complex graph algorithms.

The book starts with an introduction to the basics of graph analytics, the Cypher query language, and graph architecture components, and helps you understand why enterprises have started to adopt graph analytics. You'll find out how to implement Neo4j algorithms and techniques and explore various graph analytics methods to reveal complex relationships in your data. You'll be able to implement graph analytics for domains such as fraud detection, graph-based search, recommendation systems, social networking, and data management. You'll also learn how to store data in graph databases and extract valuable insights from it. As you become well versed in these techniques, you'll discover how graph machine learning can address challenges ranging from simple to complex using Neo4j, and how to use graph data in a machine learning model to make predictions. Finally, you'll get to grips with structuring a web application for production using Neo4j.

By the end of this book, you'll not only be able to harness the power of graphs to handle a broad range of problem areas, but you'll also have learned how to use Neo4j efficiently to identify complex relationships in your data.

What you will learn

  • Become well versed in the building blocks of the Neo4j graph database, such as nodes and relationships
  • Discover how to create, update, and delete nodes and relationships using Cypher querying
  • Use graphs to improve web search and recommendations
  • Understand graph algorithms such as pathfinding, spatial search, centrality, and community detection
  • Find out the different steps for integrating graphs into a standard machine learning pipeline
  • Formulate a link prediction problem in the context of machine learning
  • Implement graph embedding algorithms such as DeepWalk, and use them in Neo4j graphs

Who this book is for

This book is for data analysts, business analysts, graph analysts, and database developers looking to store and process graph data to reveal key data insights. This book will also appeal to data scientists who want to build intelligent graph applications catering to different domains. Some experience with Neo4j is required.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Graph Analytics with Neo4j
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  6. Section 1: Graph Modeling with Neo4j
  7. Graph Databases
    1. Graph definition and examples
    2. Graph theory
    3. A bit of history: the Seven Bridges of Königsberg problem
    4. Graph definition
    5. Visualization
    6. Examples of graphs
    7. Networks
    8. Road networks
    9. Computer networks
    10. Social networks
    11. Your data is also a graph
    12. Moving from SQL to graph databases
    13. Database models
    14. SQL and joins
    15. It's all about relationships
    16. Neo4j – the nodes, relationships, and properties model
    17. Building blocks
    18. Nodes
    19. Relationships
    20. Properties
    21. SQL to Neo4j translator
    22. Neo4j use cases
    23. Understanding graph properties
    24. Directed versus undirected
    25. Weighted versus unweighted
    26. Cyclic versus acyclic
    27. Dense versus sparse
    28. Graph traversal
    29. Connected versus disconnected
    30. Considerations for graph modeling in Neo4j
    31. Relationship orientation
    32. Node or property?
    33. Summary
    34. Further reading
  8. The Cypher Query Language
    1. Technical requirements
    2. Creating nodes and relationships
    3. Managing databases with Neo4j Desktop
    4. Creating a node
    5. Selecting nodes
    6. Filtering
    7. Returning properties
    8. Creating a relationship
    9. Selecting relationships
    10. The MERGE keyword
    11. Updating and deleting nodes and relationships
    12. Updating objects
    13. Updating an existing property or creating a new one
    14. Updating all properties of the node
    15. Updating node labels
    16. Deleting a node property
    17. Deleting objects
    18. Pattern matching and data retrieval
    19. Pattern matching
    20. Test data
    21. Graph traversal
    22. Orientation
    23. The number of hops
    24. Variable-length patterns
    25. Optional matches
    26. Using aggregation functions
    27. Count, sum, and average
    28. Creating a list of objects
    29. Unnesting objects
    30. Importing data from CSV or JSON
    31. Data import from Cypher
    32. File location
    33. Local file: the import folder
    34. Changing the default configuration to import a file from another directory
    35. CSV files
    36. CSV files without headers
    37. CSV files with headers
    38. Eager operations
    39. Data import from the command line
    40. APOC utilities for imports
    41. CSV files
    42. JSON files
    43. Importing data from a web API
    44. Setting parameters
    45. Calling the GitHub web API
    46. Summary of import methods
    47. Measuring performance and tuning your query for speed
    48. Cypher query planner
    49. Neo4j indexing
    50. Back to LOAD CSV
    51. The friend-of-friend example
    52. Summary
    53. Questions
    54. Further reading
  9. Empowering Your Business with Pure Cypher
    1. Technical requirements
    2. Knowledge graphs
    3. Attempting a definition of knowledge graphs
    4. Building a knowledge graph from structured data
    5. Building a knowledge graph from unstructured data using NLP
    6. NLP
    7. Neo4j tools for NLP
    8. GraphAware NLP library
    9. Importing test data from the GitHub API
    10. Enriching the graph with NLP
    11. Adding context to a knowledge graph from Wikidata
    12. Introducing RDF and SPARQL
    13. Querying Wikidata
    14. Importing Wikidata into Neo4j
    15. Enhancing a knowledge graph from semantic graphs
    16. Graph-based search
    17. Search methods
    18. Manually building Cypher queries
    19. Automating the English to Cypher translation
    20. Using NLP
    21. Using translation-like models
    22. Recommendation engine
    23. Product similarity recommendations
    24. Products in the same category
    25. Products frequently bought together
    26. Recommendation ordering
    27. Social recommendations
    28. Products bought by a friend of mine
    29. Summary
    30. Questions
    31. Further reading
  10. Section 2: Graph Algorithms
  11. The Graph Data Science Library and Path Finding
    1. Technical requirements
    2. Introducing the Graph Data Science plugin
    3. Extending Neo4j with custom functions and procedures
    4. The difference between procedures and functions
    5. Functions
    6. Procedures
    7. Writing a custom function in Neo4j
    8. GDS library content
    9. Defining the projected graph
    10. Native projections
    11. Cypher projections
    12. Streaming or writing results back to the graph
    13. Understanding the importance of shortest path algorithms through their applications
    14. Routing within a network
    15. GPS
    16. The shortest path within a social network
    17. Other applications
    18. Video games
    19. Science
    20. Dijkstra's shortest paths algorithm
    21. Understanding the algorithm
    22. Running Dijkstra's algorithm on a simple graph
    23. Example implementation
    24. Graph representation
    25. Algorithm
    26. Displaying the full path from A to E
    27. Using the shortest path algorithm within Neo4j
    28. Path visualization
    29. Understanding relationship direction
    30. Finding the shortest path with the A* algorithm and its heuristics
    31. Algorithm principles
    32. Defining the heuristics for A*
    33. Using A* within the Neo4j GDS plugin
    34. Discovering the other path-related algorithms in the GDS plugin
    35. K-shortest path
    36. Single Source Shortest Path (SSSP)
    37. All-pairs shortest path
    38. Optimizing processes using graphs
    39. The traveling-salesman problem
    40. Spanning trees
    41. Prim's algorithm
    42. Finding the minimum spanning tree in a Neo4j graph
    43. Summary
    44. Questions
    45. Further reading
  12. Spatial Data
    1. Technical requirements
    2. Representing spatial attributes
    3. Understanding geographic coordinate systems
    4. Using the Neo4j built-in spatial types
    5. Creating points
    6. Querying by distance
    7. Creating a geometry layer in Neo4j with neo4j-spatial
    8. Introducing the neo4j-spatial library
    9. A note on spatial indexes
    10. Creating a spatial layer of points
    11. Defining the spatial layer
    12. Adding points to a spatial layer
    13. Defining the type of spatial data
    14. Creating layers with polygon geometries
    15. Getting the data
    16. Creating the layer
    17. Performing spatial queries
    18. Finding the distance between two spatial objects
    19. Finding objects contained within other objects
    20. Finding the shortest path based on distance
    21. Importing the data
    22. Preparing the data
    23. Importing data
    24. Creating a spatial layer
    25. Running the shortest path algorithm
    26. Visualizing spatial data with Neo4j
    27. neomap – a Neo4j Desktop application for spatial data
    28. Visualizing nodes with simple layers
    29. Visualizing paths with advanced layer
    30. Using the JavaScript Neo4j driver to visualize shortest paths
    31. Neo4j JS driver
    32. Leaflet and GeoJSON
    33. Summary
    34. Questions
    35. Further reading
  13. Node Importance
    1. Technical requirements
    2. Defining importance
    3. Popularity and information spread
    4. Critical or bridging nodes
    5. Computing degree centrality
    6. Formula
    7. Computing degree centrality in Neo4j
    8. Computing the outgoing degree using GDS
    9. Computing the incoming degree using GDS
    10. Using a named projected graph
    11. Using an anonymous projected graph
    12. Understanding the PageRank algorithm
    13. Building the formula
    14. The damping factor
    15. Normalization
    16. Running the algorithm on an example graph
    17. Implementing the PageRank algorithm using Python
    18. Using GDS to assess PageRank centrality in Neo4j
    19. Comparing degree centrality and the PageRank results
    20. Variants
    21. ArticleRank
    22. Personalized PageRank
    23. Eigenvector centrality
    24. The adjacency matrix
    25. PageRank with matrix notation
    26. Eigenvector centrality
    27. Computing eigenvector centrality in GDS
    28. Path-based centrality metrics
    29. Closeness centrality
    30. Normalization
    31. Computing closeness from the shortest path algorithms
    32. The closeness centrality algorithm
    33. Closeness centrality in multiple-component graphs
    34. Betweenness centrality
    35. Comparing centrality metrics
    36. Applying centrality to fraud detection
    37. Detecting fraud using Neo4j
    38. Using centrality to assess fraud
    39. Creating a projected graph with Cypher projection
    40. Other applications of centrality algorithms
    41. Summary
    42. Exercises
    43. Further reading
  14. Community Detection and Similarity Measures
    1. Technical requirements
    2. Introducing community detection and its applications
    3. Identifying clusters of nodes
    4. Applications of the community detection method
    5. Recommendation engines and targeted marketing
    6. Clusters of products
    7. Clusters of users
    8. Fraud detection
    9. Predicting properties or links
    10. A brief overview of community detection techniques
    11. Detecting graph components and visualizing communities
    12. Weakly connected components
    13. Strongly connected components
    14. Writing the GDS results in the graph
    15. Visualizing a graph with neovis.js
    16. Using NEuler, the Graph Data Science Playground
    17. Usage for community detection visualization
    18. Running the Label Propagation algorithm
    19. Defining Label Propagation
    20. Weighted nodes and relationships
    21. Semi-supervised learning
    22. Implementing Label Propagation in Python
    23. Using the Label Propagation algorithm from the GDS
    24. Using seeds
    25. Writing results to the graph
    26. Understanding the Louvain algorithm
    27. Defining modularity
    28. All nodes are in their own community
    29. All nodes are in the same community
    30. Optimal partition
    31. Steps to reproduce the Louvain algorithm
    32. The Louvain algorithm in the GDS
    33. Syntax
    34. The aggregation method in relationship projection
    35. Intermediate steps
    36. A comparison between Label Propagation and Louvain on the Zachary's karate club graph
    37. Going beyond Louvain for overlapping community detection
    38. A caveat of the Louvain algorithm
    39. Resolution limit
    40. Alternatives to Louvain
    41. Overlapping community detection
    42. Dynamic networks
    43. Measuring the similarity between nodes
    44. Set-based similarities
    45. Overlapping
    46. Definition
    47. Quantifying user similarity in the GitHub graph
    48. Jaccard similarity
    49. Vector-based similarities
    50. Euclidean distance
    51. Cosine similarity
    52. Summary
    53. Questions
    54. Further reading
  15. Section 3: Machine Learning on Graphs
  16. Using Graph-based Features in Machine Learning
    1. Technical requirements
    2. Building a data science project
    3. Problem definition – asking the right question
    4. Supervised versus unsupervised learning
    5. Regression versus classification
    6. Introducing the problem for this chapter
    7. Getting and cleaning data
    8. Data characterization
    9. Quantifying the dataset size
    10. Labels
    11. Columns
    12. Data visualization
    13. Data cleaning
    14. Outliers detection
    15. Missing data
    16. Correlation between variables
    17. Data enrichment
    18. Feature engineering
    19. Building the model
    20. Train/test split and cross-validation
    21. Creating the train and test samples with scikit-learn
    22. Training a model
    23. Evaluating model performances
    24. The steps toward graph machine learning
    25. Building a (knowledge) graph
    26. Creating relationships from existing data
    27. Creating relationships from relational data
    28. Creating relationships from Neo4j
    29. Using an external data source
    30. Importing the data into Neo4j
    31. Graph characterization
    32. The number of nodes and edges
    33. The number of components
    34. Extracting graph-based features
    35. Using graph-based features with pandas and scikit-learn
    36. Extracting graph-based features from Neo4j Browser
    37. Creating the projected graph
    38. Running one or several algorithms
    39. Dropping the projected graph
    40. Extracting the data
    41. Automating graph-based feature creation with the Neo4j Python driver
    42. Discovering the Neo4j Python driver
    43. Basic usage
    44. Transactions
    45. Automating graph-based feature creation with Python
    46. Creating the projected graph
    47. Calling the GDS procedures
    48. Writing results back to the graph
    49. Dropping the projected graph
    50. Exporting the data from Neo4j to pandas
    51. Training a scikit-learn model
    52. Introducing community features
    53. Using both community and centrality features
    54. Summary
    55. Questions
    56. Further reading
  17. Predicting Relationships
    1. Technical requirements
    2. Why use link prediction?
    3. Dynamic graphs
    4. Applications
    5. Recovering missing data
    6. Fighting crime
    7. Research
    8. Making recommendations
    9. Social links (Facebook friends, LinkedIn contacts...)
    10. Product recommendations
    11. Making recommendations using a link prediction algorithm
    12. Creating link prediction metrics with Neo4j
    13. Community-based metrics
    14. Path-related metrics
    15. Distance between nodes
    16. The Katz index
    17. Using local neighborhood information
    18. Common neighbors
    19. Adamic-Adar
    20. Total neighbors
    21. Preferential attachment
    22. Other metrics
    23. Building a link prediction model using an ROC curve
    24. Importing the data into Neo4j
    25. Splitting the graph and computing the score for each edge
    26. Measuring binary classification model performance
    27. Understanding ROC curves
    28. Extracting features and labels
    29. Drawing the ROC curve
    30. Creating the DataFrame
    31. Plotting the ROC curve
    32. Determining the optimal cutoff and computing performances
    33. Building a more complex model using scikit-learn
    34. Saving link prediction results into Neo4j
    35. Predicting relationships in bipartite graphs
    36. Summary
    37. Questions
    38. Further reading
  18. Graph Embedding - from Graphs to Matrices
    1. Technical requirements
    2. Why do we need embedding?
    3. Why is embedding needed?
    4. One-hot encoding
    5. Creating features for words – the manual way
    6. Embedding specifications
    7. The graph embedding landscape
    8. Adjacency-based embedding
    9. The adjacency matrix and graph Laplacian
    10. Eigenvectors embedding
    11. Locally linear embedding
    12. Similarity-based embedding
    13. High-Order Proximity preserved Embedding (HOPE)
    14. Computing node embedding with Python
    15. Creating a networkx graph
    16. The Neo4j test graph
    17. Extracting the edge list data from Neo4j
    18. Creating a networkx graph matrix from pandas
    19. Fitting a node embedding algorithm
    20. Extracting embeddings from artificial neural networks
    21. Artificial neural networks in a nutshell
    22. A reminder about neural network principles
    23. Neurons, layers, and forward propagation
    24. Different types of neural networks
    25. Skip-graph model
    26. Fake task
    27. Input
    28. Word representation before embedding
    29. Target
    30. Hidden layer
    31. Output layer
    32. DeepWalk node embedding
    33. Generating node context through random walks
    34. Generating random walks from the GDS
    35. DeepWalk embedding with karateclub
    36. Node2vec, a DeepWalk alternative
    37. Node2vec from the GDS (≥ 1.3)
    38. Getting the embedding results from Python
    39. Graph neural networks
    40. Extending the principles of CNNs and RNNs to build GNNs
    41. Message propagation and aggregation
    42. Taking into account node properties
    43. Applications of GNNs
    44. Image analysis
    45. Video analysis
    46. Zero-shot learning
    47. Text analysis
    48. And there's more...
    49. Using GNNs in practice
    50. GNNs from the GDS – GraphSAGE
    51. Going further with graph algorithms
    52. State-of-the-art graph algorithms
    53. Summary
    54. Questions
    55. Further reading
  19. Section 4: Neo4j for Production
  20. Using Neo4j in Your Web Application
    1. Technical requirements
    2. Creating a full-stack web application using Python and Graph Object Mappers
    3. Toying with neomodel
    4. Defining the properties of structured nodes
    5. StructuredNode versus SemiStructuredNode
    6. Adding properties
    7. Creating nodes
    8. Querying nodes
    9. Filtering nodes
    10. Integrating relationship knowledge
    11. Simple relationship
    12. Relationship with properties
    13. Building a web application backed by Neo4j using Flask and neomodel
    14. Creating toy data
    15. Login page
    16. Creating the Flask application
    17. Adapting the model
    18. The login form
    19. The login template
    20. The login view
    21. Reading data – listing owned repositories
    22. Altering the graph – adding a contribution
    23. Understanding GraphQL APIs by example – GitHub API v4
    24. Endpoints
    25. Returned attributes
    26. Query parameters
    27. Mutations
    28. Developing a React application using GRANDstack
    29. GRANDstack – GraphQL, React, Apollo, and Neo4j Database
    30. Creating the API
    31. Writing the GraphQL schema
    32. Defining types
    33. Starting the application
    34. Testing with the GraphQL playground
    35. Calling the API from Python
    36. Using variables
    37. Mutations
    38. Building the user interface
    39. Creating a simple component
    40. Getting data from the GraphQL API
    41. Writing a simple component
    42. Adding navigation
    43. Mutation
    44. Refreshing data after the mutation
    45. Summary
    46. Questions
    47. Further reading
  21. Neo4j at Scale
    1. Technical requirements
    2. Measuring GDS performance
    3. Estimating memory usage with the estimate procedures
    4. Estimating projected graph memory usage
    5. Fictive graph
    6. Graph defined by native or Cypher projection
    7. Estimating algorithm memory usage
    8. The stats running mode
    9. Measuring time performances for some of the algorithms
    10. Configuring Neo4j 4.0 for big data
    11. The landscape prior to Neo4j 4.0
    12. Memory settings
    13. Neo4j in the cloud
    14. Sharding with Neo4j 4.0
    15. Defining shards
    16. Creating the databases
    17. Querying a sharded graph
    18. The USE statement
    19. Querying all databases
    20. Summary
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think