
Book Description

With Early Release ebooks, you get books in their earliest form—the authors' raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.



This book will enable you to apply graph thinking to solve complex problems. If you want to learn how to build architectures that extract value from your domain’s complex problems, then this book is for you.

You’ll learn how to think about your data as a graph, and how to determine if graph technology is right for your application. The book describes techniques for scalable, real-time, and multimodel architectures that solve complex problems, and shows how companies are successfully applying graph thinking in distributed production environments.

Authors Denise Koessler Gosnell and Matthias Broecheler also introduce the Graph Schema Language, a set of terminology and visual illustrations to normalize how graph practitioners communicate conceptual graph models, graph schema, and graph database design.

Table of Contents

  1. 1. Graph Thinking
    1. Why Now? Putting Database Technologies in Context
      1. 1960s-1980s: Hierarchical Data
      2. 1980s-2000s: Entity-Relationship
      3. 2000s-2020s: NoSQL
      4. 2020s-?: Graph
    2. What is Graph Thinking?
      1. Complex Problems and Complex Systems
      2. Complex Problems in Business
    3. Making Technology Decisions to Solve Complex Problems
      1. So you have graph data. What’s next?
      2. Seeing the Bigger Picture
    4. Getting Started on Your Journey with Graph Thinking
  2. 2. Bridges and Tools for Learning Graph Thinking
    1. TL;DR Translating relational concepts to graph terminology
    2. Relational Versus Graph: What’s the Difference?
      1. Data for Our Running Example
    3. Relational Data Modeling
      1. Entities and Attributes
      2. Building up to an ERD
    4. Concepts in Graph Data
      1. Fundamental Elements of a Graph
      2. Adjacency
      3. Neighborhoods
      4. Distance
      5. Degree
    5. The Graph Schema Language
      1. Vertex Labels and Edge Labels
      2. Properties
      3. Edge Direction
      4. Self-Referencing Edge Labels
      5. Multiplicity of your Graph
      6. Full Example Graph Model
    6. Relational Versus Graph: Decisions to Consider
      1. Data Modeling
      2. Understanding Graph Data
      3. Mixing Database Design with Application Purpose
    7. Summary
  3. 3. Getting Started: A Simple Customer 360
    1. TL;DR Relational versus Graph
    2. The Foundational Use Case for Graph Data: C360
      1. Why Do Businesses Care about C360?
    3. Implementing a C360 Application in a Relational System
      1. Data Models
      2. Relational Implementation
      3. Example C360 Queries
    4. Implementing a C360 Application in a Graph System
      1. Data Models
      2. Graph Implementation
      3. Example C360 Queries
    5. Relational versus Graph: How to choose?
      1. Relational versus Graph: Data Modeling
      2. Relational versus Graph: Representing Relationships
      3. Relational versus Graph: Query Languages
      4. Relational versus Graph: Main Points
    6. Summary
      1. Why not relational?
      2. Making A Technology Choice for Your C360 Application
  4. 4. Exploring Neighborhoods in Development
    1. TL;DR: Building a More Realistic Customer 360
    2. Graph Data Modeling 101
      1. Should this be a vertex or an edge?
      2. Lost? Walk me through direction.
      3. A Graph has No Name: Common Mistakes in Naming
      4. Our Full Development Graph Model
      5. Before We Start Building
      6. Our Thoughts on the Importance of Data, Queries, and the End User
    3. Implementation Details for Exploring Neighborhoods in Development
      1. Generating More Data for our Expanded Example
    4. Basic Gremlin Navigation
    5. Advanced Gremlin: Shaping Your Query Results
      1. Shaping Query Results: project(), fold(), and unfold()
      2. Removing data from the results with the where(neq()) pattern
      3. Planning for robust result payloads with the coalesce() step
    6. Moving from Development and into Production
  5. 5. Exploring Neighborhoods in Production
    1. TL;DR: Understanding Distributed Graph Data in Apache Cassandra
    2. Working with Graph Data in Apache Cassandra
      1. The most important topic to understand about data modeling: Primary Keys
      2. Partition Keys and Data Locality in a Distributed Environment
      3. Understanding Edges Part 1: Edges in Adjacency Lists
      4. Understanding Edges Part 2: Clustering Columns
      5. Understanding Edges Part 3: Accessing Edges during Traversals
    3. Graph Data Modeling 201
      1. Finding Indexes with an Intelligent Index Recommendation System
    4. Production Implementation Details
      1. Materialized Views and Adding Time onto Edges
      2. Our Final C360 Production Schema
      3. Updating our Gremlin Queries to use time on edges
    5. Moving onto More Complex, Distributed Graph Problems
      1. Our First 10 Tips to get from Development to Production
  6. 6. Using Trees in Development
    1. TL;DR: Navigating Trees, Nested, and Hierarchical Data
    2. Seeing Hierarchies and Nested Data: Three Examples
      1. Hierarchical Data in a Bill of Materials
      2. Hierarchical Data in Version Control Systems
      3. Hierarchical Data in Self-Organizing Networks
      4. Why Graph Technology for Hierarchical Data?
    3. Finding your way through a forest of terminology
      1. Trees, Roots, and Leaves
      2. Depth in Walks, Paths, and Cycles
    4. Understanding Hierarchies with Our Sensor Data
      1. Understand the data
      2. Conceptual Model using the GSL Notation
      3. Implement Schema
      4. Before we build our queries
    5. Querying From Leaves to Roots in Development
      1. From this sensor, where has it sent information?
      2. From this sensor, what was its path to a tower?
      3. From bottom up to top down
    6. Querying From Roots to Leaves in Development
      1. Setup Query: Which tower has the most sensor connections so that we could explore it for our example?
      2. What sensors have connected to Georgetown, directly?
      3. Find all sensors that connected to Georgetown
      4. Depth limiting in recursion
    7. Going back in time
  7. 7. Using Trees in Production
    1. TL;DR: Understanding Branching Factor, Depth, and Cycles
    2. Understanding Time In the Sensor Data
      1. Final Thoughts on Time Series Data in Graphs
    3. Understanding Branching Factor in Our Example
      1. What is branching factor?
      2. How do we get around branching factor?
    4. Production Schema for our Sensor Data
    5. Querying From Leaves to Roots in Production
      1. From this sensor, where has it sent information and at what time?
      2. From this sensor, find all trees up to a tower by time.
      3. From this sensor, find a valid tree starting zero to a tower.
      4. Advanced Gremlin: Understanding the where().by() pattern
    6. Querying From Roots to Leaves in Production
      1. What sensors have connected to Georgetown, directly, by time?
      2. What valid paths can we find from Georgetown down to all sensors?
    7. Applying Your Queries to Tower Failure Scenarios
      1. Applying the Final Results of our Complex Problem
    8. Seeing the forest for the trees
  8. 8. Finding Paths in Development
    1. TL;DR: Quantifying Trust in Networks
    2. Thinking about trust: Three Examples
    3. Fundamental Concepts about Paths
      1. Shortest Paths
      2. Depth-First Search and Breadth-First Search
      3. Learning to see application features as different path problems
    4. Finding Paths in a Trust Network
      1. Source Data
      2. A brief primer on Bitcoin Terminology
      3. Creating our Development Schema
      4. Loading Data
      5. Exploring Communities of Trust
    5. Understanding Traversals with our Bitcoin Trust Network
      1. Which addresses are in the first neighborhood?
      2. Which addresses are in the second neighborhood?
      3. Which addresses are in the second neighborhood, but not the first?
      4. Evaluation Strategies with the Gremlin Query language
      5. Pick a random address to use for our example
    6. Shortest Path Queries
      1. Finding Paths of a Fixed Length
      2. Finding Paths of Any Length
      3. Augmenting our Paths with the Trust Scores
      4. Do you trust this person?
  9. 9. Finding Paths in Production
    1. TL;DR: Understanding Weights, Distance, and Pruning
    2. Three examples of using weighted paths in distributed applications
    3. Weighted Paths and Search Algorithms
      1. Shortest Weighted Path Problem Definition
      2. Shortest Weighted Path Search Optimizations
    4. Normalization of Edge Weights for Shortest Path Problems
      1. Normalizing the edge Weights
      2. Updating Our Graph
      3. Exploring the Normalized Edge Weights with the Queries from Chapter 8
      4. Some Thoughts before moving onto Shortest Weighted Path Queries
    5. Shortest Weighted Path Queries
      1. Building a Shortest Weighted Path Query
    6. Weighted Paths and Trust
  10. 10. Recommendations in Development
    1. TL;DR: Collaborative Filtering for Movie Recommendations
    2. Recommendation System Examples
      1. How we give recommendations in Healthcare
      2. How we experience recommendations in Social Media
      3. How we use deeply connected data for recommendations in Ecommerce
    3. An Introduction to Collaborative Filtering
      1. Understanding the Problem and Domain
      2. Collaborative Filtering with Graph Data
      3. Recommendations via Item-Based Collaborative Filtering with Graph Data
      4. Three Different Models for Ranking Recommendations
    4. Movie Data: Schema, Loading, and Query Review
      1. Data Model for Movie Recommendations
      2. Schema Code for Movie Recommendations
      3. Loading the Movie Data
      4. Neighborhood Queries in the Movie Data
      5. Tree Queries in the Movie Data
      6. Path Queries in the Movie Data
    5. Item-Based Collaborative Filtering in Gremlin
      1. Model 1: Counting Paths in the Recommendation Set
      2. Model 2: NPS-Inspired
      3. Model 3: Normalized NPS
      4. Choosing Your Own Adventure: Movies and Graph Problems Edition
  11. 11. Simple Entity Resolution in Graphs
    1. TL;DR: Merging Multiple Datasets into One Graph
    2. Defining a Different Complex Problem: Entity Resolution
      1. Seeing the Complex Problem
    3. Analyzing the Two Movie Datasets
      1. MovieLens Dataset
      2. Kaggle Dataset
      3. Development Schema
    4. Matching and Merging the Movie Data
      1. Our Matching Process
    5. Resolving False Positives
      1. False Positives found in the MovieLens dataset
      2. Additional Errors Discovered in the Entity Resolution Process
      3. Final Analysis of the merging process
      4. The Role of Graph Structure in Merging Movie Data
  12. 12. Recommendations in Production
    1. TL;DR: Understanding Shortcut Edges, Pre-Computation, and Advanced Pruning Techniques
    2. Shortcut Edges for Recommendations in Real-Time
      1. Where Our Development Process Doesn’t Scale
      2. How We Fix Scaling Issues: Short-Cut Edges
      3. Seeing What we Designed to Deliver in Production
      4. Pruning: Different Ways to Pre-Compute Short-Cut Edges
      5. When to Compute: Considerations for Updating Your Recommendations
    3. Calculating Short-Cut edges for our Movie Data
      1. Breaking Down the Complex Problem of Pre-Calculating Short-Cut Edges
      2. Addressing the Elephant in the Room: OLTP versus OLAP
    4. Production Schema and Data Loading for Movie Recommendations
      1. Production Schema for Movie Recommendations
      2. Production Data Loading for Movie Recommendations
    5. Recommendation Queries with Shortcut Edges
      1. Confirming our Edges loaded correctly
      2. Production Recommendations for Our User
      3. Understanding Response Time in Production by Counting Edge Partitions
      4. Final Thoughts on Reasoning about Distributed Graph Query Performance