Differences with industry graph databases

In some graph technologies, search results are ranked by the popularity of records. Suppose we take music data that describes people and what they listen to, and search for Mozart to get related artists. For the sake of simplicity, I have represented each result row as a green box labeled with the related artist: the bigger the green box, the more popular the artist. In our search example, the first row will naturally be Mozart, but next we'll find Coldplay and The Beatles, and only somewhere near the end will we find Bach.

Coldplay and The Beatles are popular across the whole dataset, so they will most likely be present in every single graph exploration. However, their popularity dilutes the signal we are looking for, that is, classical music artists related to Mozart; in other words, they create noise. Such artists are called super-connected entities because no data point is ever more than a couple of hops away from them; any exploration will end up touching a super-connected entity, as shown in the following diagram:

Mainstream graph technologies usually include these super-connected entities in their results, because calculating the relevancy and significance of results is not their job. The good news is that this is exactly what Elastic Graph is good at.
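To make the difference concrete, here is a minimal sketch that ranks artists related to Mozart in two ways: by raw popularity, and by a significance score. The listener data is invented for illustration, and the scoring function follows the JLH heuristic that Elasticsearch documents for its significant terms aggregation (absolute change multiplied by relative change). It is a simplified model of the idea, not a claim about what Elastic Graph runs internally.

```python
from collections import Counter

# Invented toy data: each set holds the artists one person listens to.
LISTENERS = [
    {"Mozart", "Coldplay", "The Beatles", "Bach"},
    {"Mozart", "Coldplay", "The Beatles", "Bach"},
    {"Mozart", "Coldplay", "The Beatles"},
    {"Mozart", "Coldplay"},
    {"Coldplay", "The Beatles"},
    {"Coldplay", "The Beatles"},
    {"Coldplay", "The Beatles"},
    {"Coldplay"},
    {"The Beatles"},
    {"Coldplay", "The Beatles"},
]


def foreground_counts(docs, seed):
    """Count co-occurring artists in the documents that mention the seed."""
    foreground = [d for d in docs if seed in d]
    counts = Counter(a for d in foreground for a in d if a != seed)
    return foreground, counts


def rank_by_popularity(docs, seed):
    """Rank related artists by how often they co-occur with the seed."""
    _, counts = foreground_counts(docs, seed)
    return sorted(counts, key=lambda a: (-counts[a], a))


def jlh_score(fg_count, fg_size, bg_count, bg_size):
    """JLH-style score: absolute change times relative change in frequency."""
    fg_pct = fg_count / fg_size
    bg_pct = bg_count / bg_size
    if fg_pct <= bg_pct:
        return 0.0  # no more common near the seed than in the dataset overall
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


def rank_by_significance(docs, seed):
    """Rank related artists by how unusually often they appear with the seed."""
    foreground, fg = foreground_counts(docs, seed)
    bg = Counter(a for d in docs for a in d)
    scores = {
        a: jlh_score(fg[a], len(foreground), bg[a], len(docs)) for a in fg
    }
    return sorted(scores, key=lambda a: (-scores[a], a))


print(rank_by_popularity(LISTENERS, "Mozart"))
print(rank_by_significance(LISTENERS, "Mozart"))
```

With this toy data, the popularity ranking puts Coldplay and The Beatles ahead of Bach, while the significance ranking puts Bach first: Bach is rare overall but unusually common among people who listen to Mozart.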

When we index terms into Elasticsearch, it already keeps the statistics needed to tell which data is the most interesting, and it leverages them to build the graph.

Elasticsearch looks for a connection to be reinforced by many documents as evidence of its strength and relevancy.
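As a rough illustration of how these two ideas surface in practice, the sketch below builds a request body for the Graph explore API. The `use_significance`, `sample_size`, and `min_doc_count` settings are documented Graph options; the `artist` field name, the default values chosen here, and the helper function itself are assumptions made for this example. In recent Elasticsearch versions the body would be sent with `POST <index>/_graph/explore`; older X-Pack releases used a different path.

```python
import json


def build_explore_body(seed_artist, field="artist", min_doc_count=3,
                       sample_size=2000):
    """Build a Graph explore request body (hypothetical helper).

    `field` is an assumed field name holding artist names.
    """
    vertex = {"field": field, "min_doc_count": min_doc_count}
    return {
        # Seed query: documents that mention the artist we start from.
        "query": {"match": {field: seed_artist}},
        "controls": {
            # Prefer significant terms over merely popular ones.
            "use_significance": True,
            # Number of top-matching documents sampled per shard.
            "sample_size": sample_size,
        },
        # Starting vertices extracted from the sampled documents.
        "vertices": [dict(vertex)],
        # Connected vertices: a pair of terms must co-occur in at least
        # `min_doc_count` documents to count as a connection.
        "connections": {"vertices": [dict(vertex)]},
    }


body = build_explore_body("Mozart")
print(json.dumps(body, indent=2))
```

Raising `min_doc_count` demands more reinforcing documents before a connection is reported, which trades recall for stronger evidence.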
