Title Page
Copyright and Credits
Hands-On Graph Analytics with Neo4j
About Packt
Why subscribe?
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews

Section 1: Graph Modeling with Neo4j

Graph Databases
Graph definition and examples
Graph theory
A bit of history: the Seven Bridges of Königsberg problem
Graph definition
Visualization
Examples of graphs
Networks
Road networks
Computer networks
Social networks
Your data is also a graph
Moving from SQL to graph databases
Database models
SQL and joins
It's all about relationships
Neo4j – the nodes, relationships, and properties model
Building blocks
Nodes
Relationships
Properties
SQL to Neo4j translator
Neo4j use cases
Understanding graph properties
Directed versus undirected
Weighted versus unweighted
Cyclic versus acyclic
Dense versus sparse
Graph traversal
Connected versus disconnected
Considerations for graph modeling in Neo4j
Relationship orientation
Node or property?
Summary
Further reading

The Cypher Query Language
Technical requirements
Creating nodes and relationships
Managing databases with Neo4j Desktop
Creating a node
Selecting nodes
Filtering
Returning properties
Creating a relationship
Selecting relationships
The MERGE keyword
Updating and deleting nodes and relationships
Updating objects
Updating an existing property or creating a new one
Updating all properties of the node
Updating node labels
Deleting a node property
Deleting objects
Pattern matching and data retrieval
Pattern matching
Test data
Graph traversal
Orientation
The number of hops
Variable-length patterns
Optional matches
Using aggregation functions
Count, sum, and average
Creating a list of objects
Unnesting objects
Importing data from CSV or JSON
Data import from Cypher
File location
Local file: the import folder
Changing the default configuration to import a file from another directory
CSV files
CSV files without headers
CSV files with headers
Eager operations
Data import from the command line
APOC utilities for imports
CSV files
JSON files
Importing data from a web API
Setting parameters
Calling the GitHub web API
Summary of import methods
Measuring performance and tuning your query for speed
Cypher query planner
Neo4j indexing
Back to LOAD CSV
The friend-of-friend example
Summary
Questions
Further reading

Empowering Your Business with Pure Cypher
Technical requirements
Knowledge graphs
Attempting a definition of knowledge graphs
Building a knowledge graph from structured data
Building a knowledge graph from unstructured data using NLP
NLP
Neo4j tools for NLP
GraphAware NLP library
Importing test data from the GitHub API
Enriching the graph with NLP
Adding context to a knowledge graph from Wikidata
Introducing RDF and SPARQL
Querying Wikidata
Importing Wikidata into Neo4j
Enhancing a knowledge graph from semantic graphs
Graph-based search
Search methods
Manually building Cypher queries
Automating the English to Cypher translation
Using NLP
Using translation-like models
Recommendation engine
Product similarity recommendations
Products in the same category
Products frequently bought together
Recommendation ordering
Social recommendations
Products bought by a friend of mine
Summary
Questions
Further reading

Section 2: Graph Algorithms

The Graph Data Science Library and Path Finding
Technical requirements
Introducing the Graph Data Science plugin
Extending Neo4j with custom functions and procedures
The difference between procedures and functions
Functions
Procedures
Writing a custom function in Neo4j
GDS library content
Defining the projected graph
Native projections
Cypher projections
Streaming or writing results back to the graph
Understanding the importance of shortest path algorithms through their applications
Routing within a network
GPS
The shortest path within a social network
Other applications
Video games
Science
Dijkstra's shortest paths algorithm
Understanding the algorithm
Running Dijkstra's algorithm on a simple graph
Example implementation
Graph representation
Algorithm
Displaying the full path from A to E
Using the shortest path algorithm within Neo4j
Path visualization
Understanding relationship direction
Finding the shortest path with the A* algorithm and its heuristics
Algorithm principles
Defining the heuristics for A*
Using A* within the Neo4j GDS plugin
Discovering the other path-related algorithms in the GDS plugin
K-shortest path
Single Source Shortest Path (SSSP)
All-pairs shortest path
Optimizing processes using graphs
The traveling-salesman problem
Spanning trees
Prim's algorithm
Finding the minimum spanning tree in a Neo4j graph
Summary
Questions
Further reading

Spatial Data
Technical requirements
Representing spatial attributes
Understanding geographic coordinate systems
Using the Neo4j built-in spatial types
Creating points
Querying by distance
Creating a geometry layer in Neo4j with neo4j-spatial
Introducing the neo4j-spatial library
A note on spatial indexes
Creating a spatial layer of points
Defining the spatial layer
Adding points to a spatial layer
Defining the type of spatial data
Creating layers with polygon geometries
Getting the data
Creating the layer
Performing spatial queries
Finding the distance between two spatial objects
Finding objects contained within other objects
Finding the shortest path based on distance
Importing the data
Preparing the data
Importing data
Creating a spatial layer
Running the shortest path algorithm
Visualizing spatial data with Neo4j
neomap – a Neo4j Desktop application for spatial data
Visualizing nodes with simple layers
Visualizing paths with advanced layer
Using the JavaScript Neo4j driver to visualize shortest paths
Neo4j JS driver
Leaflet and GeoJSON
Summary
Questions
Further reading

Node Importance
Technical requirements
Defining importance
Popularity and information spread
Critical or bridging nodes
Computing degree centrality
Formula
Computing degree centrality in Neo4j
Computing the outgoing degree using GDS
Computing the incoming degree using GDS
Using a named projected graph
Using an anonymous projected graph
Understanding the PageRank algorithm
Building the formula
The damping factor
Normalization
Running the algorithm on an example graph
Implementing the PageRank algorithm using Python
Using GDS to assess PageRank centrality in Neo4j
Comparing degree centrality and the PageRank results
Variants
ArticleRank
Personalized PageRank
Eigenvector centrality
The adjacency matrix
PageRank with matrix notation
Eigenvector centrality
Computing eigenvector centrality in GDS
Path-based centrality metrics
Closeness centrality
Normalization
Computing closeness from the shortest path algorithms
The closeness centrality algorithm
Closeness centrality in multiple-component graphs
Betweenness centrality
Comparing centrality metrics
Applying centrality to fraud detection
Detecting fraud using Neo4j
Using centrality to assess fraud
Creating a projected graph with Cypher projection
Other applications of centrality algorithms
Summary
Exercises
Further reading

Community Detection and Similarity Measures
Technical requirements
Introducing community detection and its applications
Identifying clusters of nodes
Applications of the community detection method
Recommendation engines and targeted marketing
Clusters of products
Clusters of users
Fraud detection
Predicting properties or links
A brief overview of community detection techniques
Detecting graph components and visualizing communities
Weakly connected components
Strongly connected components
Writing the GDS results in the graph
Visualizing a graph with neovis.js
Using NEuler, the Graph Data Science Playground
Usage for community detection visualization
Running the Label Propagation algorithm
Defining Label Propagation
Weighted nodes and relationships
Semi-supervised learning
Implementing Label Propagation in Python
Using the Label Propagation algorithm from the GDS
Using seeds
Writing results to the graph
Understanding the Louvain algorithm
Defining modularity
All nodes are in their own community
All nodes are in the same community
Optimal partition
Steps to reproduce the Louvain algorithm
The Louvain algorithm in the GDS
Syntax
The aggregation method in relationship projection
Intermediate steps
A comparison between Label Propagation and Louvain on the Zachary's karate club graph
Going beyond Louvain for overlapping community detection
A caveat of the Louvain algorithm
Resolution limit
Alternatives to Louvain
Overlapping community detection
Dynamic networks
Measuring the similarity between nodes
Set-based similarities
Overlapping
Definition
Quantifying user similarity in the GitHub graph
Jaccard similarity
Vector-based similarities
Euclidean distance
Cosine similarity
Summary
Questions
Further reading

Section 3: Machine Learning on Graphs

Using Graph-based Features in Machine Learning
Technical requirements
Building a data science project
Problem definition – asking the right question
Supervised versus unsupervised learning
Regression versus classification
Introducing the problem for this chapter
Getting and cleaning data
Data characterization
Quantifying the dataset size
Labels
Columns
Data visualization
Data cleaning
Outliers detection
Missing data
Correlation between variables
Data enrichment
Feature engineering
Building the model
Train/test split and cross-validation
Creating the train and test samples with scikit-learn
Training a model
Evaluating model performances
The steps toward graph machine learning
Building a (knowledge) graph
Creating relationships from existing data
Creating relationships from relational data
Creating relationships from Neo4j
Using an external data source
Importing the data into Neo4j
Graph characterization
The number of nodes and edges
The number of components
Extracting graph-based features
Using graph-based features with pandas and scikit-learn
Extracting graph-based features from Neo4j Browser
Creating the projected graph
Running one or several algorithms
Dropping the projected graph
Extracting the data
Automating graph-based feature creation with the Neo4j Python driver
Discovering the Neo4j Python driver
Basic usage
Transactions
Automating graph-based feature creation with Python
Creating the projected graph
Calling the GDS procedures
Writing results back to the graph
Dropping the projected graph
Exporting the data from Neo4j to pandas
Training a scikit-learn model
Introducing community features
Using both community and centrality features
Summary
Questions
Further reading

Predicting Relationships
Technical requirements
Why use link prediction?
Dynamic graphs
Applications
Recovering missing data
Fighting crime
Research
Making recommendations
Social links (Facebook friends, LinkedIn contacts...)
Product recommendations
Making recommendations using a link prediction algorithm
Creating link prediction metrics with Neo4j
Community-based metrics
Path-related metrics
Distance between nodes
The Katz index
Using local neighborhood information
Common neighbors
Adamic-Adar
Total neighbors
Preferential attachment
Other metrics
Building a link prediction model using an ROC curve
Importing the data into Neo4j
Splitting the graph and computing the score for each edge
Measuring binary classification model performance
Understanding ROC curves
Extracting features and labels
Drawing the ROC curve
Creating the DataFrame
Plotting the ROC curve
Determining the optimal cutoff and computing performances
Building a more complex model using scikit-learn
Saving link prediction results into Neo4j
Predicting relationships in bipartite graphs
Summary
Questions
Further reading

Graph Embedding - from Graphs to Matrices
Technical requirements
Why do we need embedding?
Why is embedding needed?
One-hot encoding
Creating features for words – the manual way
Embedding specifications
The graph embedding landscape
Adjacency-based embedding
The adjacency matrix and graph Laplacian
Eigenvectors embedding
Locally linear embedding
Similarity-based embedding
High-Order Proximity preserved Embedding (HOPE)
Computing node embedding with Python
Creating a networkx graph
The Neo4j test graph
Extracting the edge list data from Neo4j
Creating a networkx graph matrix from pandas
Fitting a node embedding algorithm
Extracting embeddings from artificial neural networks
Artificial neural networks in a nutshell
A reminder about neural network principles
Neurons, layers, and forward propagation
Different types of neural networks
Skip-graph model
Fake task
Input
Word representation before embedding
Target
Hidden layer
Output layer
DeepWalk node embedding
Generating node context through random walks
Generating random walks from the GDS
DeepWalk embedding with karateclub
Node2vec, a DeepWalk alternative
Node2vec from the GDS (≥ 1.3)
Getting the embedding results from Python
Graph neural networks
Extending the principles of CNNs and RNNs to build GNNs
Message propagation and aggregation
Taking into account node properties
Applications of GNNs
Image analysis
Video analysis
Zero-shot learning
Text analysis
And there's more...
Using GNNs in practice
GNNs from the GDS – GraphSAGE
Going further with graph algorithms
State-of-the-art graph algorithms
Summary
Questions
Further reading

Section 4: Neo4j for Production

Using Neo4j in Your Web Application
Technical requirements
Creating a full-stack web application using Python and Graph Object Mappers
Toying with neomodel
Defining the properties of structured nodes
StructuredNode versus SemiStructuredNode
Adding properties
Creating nodes
Querying nodes
Filtering nodes
Integrating relationship knowledge
Simple relationship
Relationship with properties
Building a web application backed by Neo4j using Flask and neomodel
Creating toy data
Login page
Creating the Flask application
Adapting the model
The login form
The login template
The login view
Reading data – listing owned repositories
Altering the graph – adding a contribution
Understanding GraphQL APIs by example – GitHub API v4
Endpoints
Returned attributes
Query parameters
Mutations
Developing a React application using GRANDstack
GRANDstack – GraphQL, React, Apollo, and Neo4j Database
Creating the API
Writing the GraphQL schema
Defining types
Starting the application
Testing with the GraphQL playground
Calling the API from Python
Using variables
Mutations
Building the user interface
Creating a simple component
Getting data from the GraphQL API
Writing a simple component
Adding navigation
Mutation
Refreshing data after the mutation
Summary
Questions
Further reading

Neo4j at Scale
Technical requirements
Measuring GDS performance
Estimating memory usage with the estimate procedures
Estimating projected graph memory usage
Fictive graph
Graph defined by native or Cypher projection
Estimating algorithm memory usage
The stats running mode
Measuring time performances for some of the algorithms
Configuring Neo4j 4.0 for big data
The landscape prior to Neo4j 4.0
Memory settings
Neo4j in the cloud
Sharding with Neo4j 4.0
Defining shards
Creating the databases
Querying a sharded graph
The USE statement
Querying all databases
Summary

Other Books You May Enjoy
Leave a review - let other readers know what you think