GDELT dataset

To validate our implementation, we use the GDELT dataset analyzed in the previous chapter. We extracted all of the communities and inspected the person names in each one to check whether our clustering was consistent. The full picture of the communities is shown in Figure 7. It was rendered with Gephi, into which only the top few thousand connections were imported:

Figure 7: Community detection on January 12

We first observe that most of the communities we detected align with the ones we could eyeball on a force-directed layout, which gives us good confidence in the accuracy of the algorithm.

The Bowie effect

Every well-defined community has been properly identified; the less obvious ones are those surrounding highly connected vertices such as David Bowie. The name David Bowie was mentioned in GDELT articles alongside so many different people that, on January 12, 2016, its vertex became too large to remain part of its logical community (the music industry) and instead formed a broader community that absorbed its surrounding vertices. This is an interesting pattern in its own right, as such a community structure gives us a clear signal of a potential breaking news story about a particular person on a particular day.

Looking at David Bowie's closest communities in Figure 8, we observe that the nodes are highly interconnected because of what we will call the Bowie effect. So many tributes were paid from so many different communities that the number of triangles formed across communities was abnormally high. As a result, logical communities that would not normally sit near each other were brought closer together, such as 70s rock stars and religious figures.
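To make the idea of triangles formed across communities concrete, the following is a minimal sketch in plain Python rather than the Spark code used elsewhere in this book. The vertex names, the community labels, and the helper cross_community_triangles are all hypothetical and exist only for illustration; they are not taken from the GDELT data.

```python
from itertools import combinations


def cross_community_triangles(edges, community):
    """Return (total triangles, triangles spanning more than one community)."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    total, crossing = 0, 0
    for u in adjacency:
        for v, w in combinations(sorted(adjacency[u]), 2):
            # Enforce u < v < w so that each triangle is counted exactly once.
            if u < v and w in adjacency[v]:
                total += 1
                if len({community[u], community[v], community[w]}) > 1:
                    crossing += 1
    return total, crossing


# Hypothetical toy graph: two tight pairs and one noisy cross-community edge.
community = {
    "hub": "music", "musician_a": "music", "musician_b": "music",
    "cleric_a": "religion", "cleric_b": "religion",
}
quiet_day = [
    ("musician_a", "musician_b"),
    ("cleric_a", "cleric_b"),
    ("musician_a", "cleric_a"),
]
# On a breaking news day the hub is mentioned alongside everybody.
busy_day = quiet_day + [(name, "hub") for name in community if name != "hub"]

print(cross_community_triangles(quiet_day, community))
print(cross_community_triangles(busy_day, community))
```

In this toy graph, the quiet day contains no triangle at all, while adding the hub closes three triangles, two of which span both communities. This is the mechanism that pulls otherwise unrelated communities together.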

The small world phenomenon, as described in the 1960s by Stanley Milgram, states that everyone is connected through a short chain of acquaintances. Kevin Bacon, an American actor, even suggested that he could be connected to every other actor within a maximum depth of 6 connections, a distance known as the Bacon number (https://oracleofbacon.org/).

On that day, the Kevin Bacon number of Pope Francis and Mick Jagger was only 1, thanks to Cardinal Gianfranco Ravasi, who tweeted about David Bowie.
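A distance of this kind can be computed with a breadth-first search. The sketch below is plain Python under our own assumptions: it counts the number of edges on the shortest path between two vertices, which is only one possible convention, and the function degrees_of_separation and the vertex names are hypothetical.

```python
from collections import deque


def degrees_of_separation(edges, source, target):
    """Return the number of edges on the shortest path, or None if unreachable."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


# Hypothetical chain of acquaintances: a - b - c - d.
chain = [("person_a", "person_b"), ("person_b", "person_c"), ("person_c", "person_d")]
# A heavily mentioned hub links both ends of the chain.
with_hub = chain + [("person_a", "hub"), ("hub", "person_d")]

print(degrees_of_separation(chain, "person_a", "person_d"))
print(degrees_of_separation(with_hub, "person_a", "person_d"))
```

With the hub in place, the two ends of the chain move from three hops apart to two, which illustrates how a single highly connected vertex shortens paths across the whole graph.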

Figure 8: Communities surrounding David Bowie, January 12

Although the Bowie effect, being the product of a breaking news story, is a true pattern in that particular graph structure, it could have been dampened by using edges weighted by name frequency counts. In an unweighted graph, a little random noise from the GDELT dataset can be enough to close critical triangles between two different communities and therefore bring them close to each other, no matter how weak that critical edge is. This limitation is common to all unweighted algorithms and calls for a preprocessing phase that reduces the unwanted noise.
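One possible shape for such a preprocessing phase is sketched below in plain Python. It assumes that the weight of an edge is the number of times two names are mentioned together and that edges below a chosen threshold are treated as noise. Both the weighting scheme and the helpers weighted_edges and drop_noise are our own illustrative assumptions, not the method used in this chapter.

```python
from collections import Counter


def weighted_edges(mentions):
    """Count how many times each unordered pair of names is mentioned together."""
    counts = Counter()
    for first, second in mentions:
        if first != second:
            # Sort the pair so that (a, b) and (b, a) share the same key.
            counts[tuple(sorted((first, second)))] += 1
    return counts


def drop_noise(counts, min_weight):
    """Keep only the edges seen at least min_weight times."""
    return {edge: weight for edge, weight in counts.items() if weight >= min_weight}


# Hypothetical co-mentions extracted from articles.
mentions = [
    ("musician_a", "musician_b"),
    ("musician_b", "musician_a"),
    ("musician_a", "musician_b"),
    ("musician_a", "cleric_a"),  # a single, possibly accidental, co-mention
]

counts = weighted_edges(mentions)
print(drop_noise(counts, min_weight=2))
```

The single co-mention between the musician and the cleric is dropped, so it can no longer close a triangle between the two communities. The threshold is a tuning parameter, and a value that is too high would also discard genuine but rare relationships.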

Smaller communities

We can, however, observe some more clearly defined communities here, such as the UK politicians Tony Blair, David Cameron, and Boris Johnson, or the movie directors Christopher Nolan, Martin Scorsese, and Quentin Tarantino. At a broader level, we can detect well-defined communities such as tennis players, footballers, artists, or politicians of a specific country. As a convincing sign of accuracy, we even detected Matt LeBlanc, Courteney Cox, Matthew Perry, and Jennifer Aniston as part of the same Friends community, and Luke Skywalker, Anakin Skywalker, Chewbacca, and Emperor Palpatine as part of the Star Wars community together with its recently lost actress, Carrie Fisher. An example of professional boxers' communities is shown in Figure 9:

Figure 9: Professional boxer communities

Using Accumulo cell level security

We have previously discussed the nature of cell-level security in Accumulo. The graphs produced here give us a good way to simulate its usefulness. If we configure Accumulo so that rows containing David Bowie carry a different security label from all other rows, then we can turn the Bowie effect on and off. Any Accumulo user with full access will see the complete graph shown earlier. If we then restrict that user to everything other than David Bowie (a simple change to the Authorization in AccumuloReader), we see the graph in Figure 10. This new graph is very interesting as it serves a number of purposes:

  • It removes the noise created by the social media effect of David Bowie's death, thereby revealing the true communities involved
  • It removes many of the false links between entities, thereby increasing their Bacon number and showing their true relationship
  • It demonstrates that it is possible to remove a key figure in a graph and still retain a large amount of useful information, thereby demonstrating the point made earlier regarding the removal of key entities for security reasons (as discussed in Cell security)

It has to be said, of course, that by removing an entity we may also be removing key relationships between entities (the contact chaining effect), which is a drawback when specifically trying to relate individual entities. Overall, however, the communities remain intact.

Figure 10: David Bowie's communities with restricted access
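The effect of switching authorizations can be simulated without a running Accumulo instance. The sketch below is plain Python and does not use the Accumulo API or the AccumuloReader class: each cell carries a single visibility label (real Accumulo visibility expressions are richer), scan is a hypothetical stand-in for a scan with a given set of authorizations, and connected components stand in for the community detection used in this chapter. All names and labels other than David Bowie are placeholders.

```python
def scan(cells, authorizations):
    """Return the edges whose visibility label the reader is authorized to see."""
    return [(row, column) for row, column, label in cells
            if label in authorizations]


def communities(edges):
    """Group vertices into connected components with a depth-first search."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(adjacency[vertex] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical cells: (row, column, visibility label).
cells = [
    ("David Bowie", "musician_a", "bowie"),
    ("David Bowie", "cleric_a", "bowie"),
    ("musician_a", "musician_b", "public"),
    ("cleric_a", "cleric_b", "public"),
]

full_access = communities(scan(cells, {"public", "bowie"}))
restricted = communities(scan(cells, {"public"}))
print(len(full_access), len(restricted))
```

With full access everything collapses into a single group around the hub, whereas the restricted reader sees the musicians and the clerics as two separate groups. The data is never deleted; only the set of authorizations presented at read time changes.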
