Interest in graph databases, and especially Neo4j, is increasing, both because of the naturalness of the graph data model and because of the range of data analyses it permits. This book is a journey inside the world of graphs and Neo4j. We will explore Neo4j and Cypher, but also several plugins, both officially supported and third-party, that extend the database's capabilities: new data types (APOC or Neo4j Spatial), and data science and machine learning applications with the Graph Data Science (GDS) plugin or the GraphAware NLP plugins.

A large part of the book covers graph algorithms. You will learn both how they work, by running through example Python implementations of the most famous algorithms (shortest path, PageRank, and Label Propagation), and how to use them in practice on a Neo4j graph. We will also give some example applications to help you decide when to use these algorithms in your own use cases.

Once you are more familiar with the different types of algorithms that can be run on a graph to extract information about its individual components (nodes) or its overall structure, we will switch to some data science problems and learn how a graph structure and graph algorithms can enhance a model's predictive power.

Finally, we will see that Neo4j, on top of being a fantastic tool for data analysis, can also be used to expose data through a web application so that our analysis can go live.

Who this book is for

This book is for data analysts, business analysts, graph analysts, and database developers looking to store and process graph data to reveal key data insights. This book will also appeal to data scientists who want to build intelligent graph applications catering to different domains. Some experience with Neo4j is required.

Although Python is used to demonstrate some algorithms in "Section 2: Graph Algorithms", we have kept the implementation simple (without any "Python magic") so that it should be accessible to you even if you are not familiar with Python. The following sections, however, require more experience with this language; familiarity with its data science ecosystem, including scikit-learn, pandas, and seaborn, would be a plus.

What this book covers

Chapter 1, Graph Databases, provides a review of graph database concepts, starting from graph theory and important definitions to the node and relationship model of Neo4j.

Chapter 2, The Cypher Query Language, covers the basics of Cypher, the query language used by Neo4j, which will be used throughout this book for data import and pattern matching. APOC utilities for data import are also studied. In this chapter, we will start building the graph of Neo4j contributors on GitHub, which will be used elsewhere in this book.

Chapter 3, Empowering Your Business with Pure Cypher, explains how to build a knowledge graph from structured and unstructured data (using NLP) and start using it in graph-based search or recommendation engines. We will take the graph of Neo4j contributors on GitHub and extend it with natural language analysis and an external, publicly available knowledge graph (namely, Wikidata).

Chapter 4, The Graph Data Science Library and Path Finding, explains the main principles of the Graph Data Science plugin for Neo4j and introduces our first algorithms through shortest path-finding applications.

Chapter 5, Spatial Data, explains how, thanks to the Neo4j Spatial plugin, we will be able to store and query spatial data (points, lines, and polygons). Coupling Neo4j Spatial with the graph data science plugin, we will create a routing engine in Manhattan, New York.

Chapter 6, Node Importance, covers the different centrality algorithms, which depend on how you define node importance, along with their applications and their usage from the GDS.

Chapter 7, Community Detection and Similarity Measures, covers the different algorithms to detect structures in a graph and how to visualize them using JavaScript libraries.

Chapter 8, Using Graph-Based Features in Machine Learning, explains how, starting from a flat CSV file, we will build a full machine learning project, reviewing the different steps required to build a predictive pipeline (feature engineering, model training, and model evaluation). We will then transform our flat CSV data into a graph using extra knowledge about our data, and learn how graph algorithms can enhance the performance of a classification task.

Chapter 9, Predicting Relationships, explains how, in a time-evolving graph, we will formulate a link prediction problem as a machine learning problem with a training and test set.

Chapter 10, Graph Embedding – from Graphs to Matrices, explains how algorithms can automatically learn features for each node in a graph. Using an analogy with word embedding, we will learn how the DeepWalk algorithm works. We will then go even deeper and learn about graph neural networks and their use cases. Applications will be given using both Python (with a few dependencies) and the GDS implementations of node2vec and GraphSAGE.

Chapter 11, Using Neo4j in Your Web Application, covers how, in order to use Neo4j and the tools we have studied in the previous chapters in a live application, we will create a web application using either Python and its popular Flask framework, or JavaScript and a GraphQL API.

Chapter 12, Neo4j at Scale, provides an overview of the possibilities offered by the GDS and Neo4j 4 in order to manage big data.

To get the most out of this book

You will need to have access to a Neo4j database that you can manage (that is, update settings and add plugins). The recommended way is to use Neo4j Desktop, which you can download from https://neo4j.com/download/.

You can create one graph per chapter. Only the "GitHub" graph, which we will start creating in Chapter 2 and enrich in Chapter 3, will be reused later in the book, but instructions will be given to recreate it in case you have not read the preceding chapters. You are also given a set of questions at the end of every chapter for self-learning.

All the code, except that in Chapter 5, Spatial Data, and Chapter 11, Using Neo4j in Your Web Application, is compatible with both Neo4j 3.5 and Neo4j 4.x.

Regarding Chapter 5, since Neo4j Spatial is not yet compatible with Neo4j 4.x (the most recent version at the time of writing is 0.26.2), the code in this chapter is only valid for Neo4j 3.5. Similarly, for Chapter 11, we will rely on the neomodel package (the latest version at the time of writing is 3.3.2), which is not yet compatible with Neo4j 4.x.

Software/hardware covered in the book	OS/hardware requirements
Neo4j ≥ 3.5	Windows, Linux, or macOS; a minimum of 8 GB of RAM
APOC (Neo4j plugin) ≥ 3.5.0.11	Windows, Linux, or macOS; a minimum of 8 GB of RAM
neo4j-spatial (plugin)	Windows, Linux, or macOS; a minimum of 8 GB of RAM
Neo4j Graph Data Science plugin (GDS) ≥ 1.0	Windows, Linux, or macOS; a minimum of 8 GB of RAM
Python ≥ 3.6	Windows, Linux, or macOS; a minimum of 8 GB of RAM
Node.js ≥ v10 and npm (second part of Chapter 11 only)	Windows, Linux, or macOS; a minimum of 8 GB of RAM

For the second part of Chapter 11, where we will create a React application with the GRANDstack, you will also need Node.js and npm installed on your system.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Analytics-with-Neo4j. If the code is updated, the changes will be pushed to the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839212611_ColorImages.pdf

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

submit = SubmitField('Submit')

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

password = StringField('Password', validators=[DataRequired()],
                       widget=PasswordInput(hide_value=False))

Any command-line input or output is written as follows:

python models.py

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
