Practical Implementation: Capturing Open Graph Data from a Web Source

We’ve discussed at length how you can make any site a producer of Open Graph information, that is, a rich provider of entity-based social data. With that in place, let’s look at the process of creating an Open Graph consumer.

We will explore similar implementations of this process in two languages: PHP and Python. The end product is the same, so you can use whichever you prefer.

PHP implementation: Open Graph node

First, let’s explore an Open Graph protocol parser implementation using PHP. In this example, we’ll develop a class that contains all of the functionality we need to parse Open Graph tags from any web source that contains them.

So what do we want to get out of this class structure? If we break it down into a few elements, at a base level our only requirements are that it:

  • Includes a method for capturing and storing all <meta> tags with a property attribute starting with og: from a provided URL.

  • Provides one method for returning a single Open Graph tag value, and another for returning the entire list of obtained tags.

Now let’s see how these simple requirements play out when implemented in an actual PHP class structure:

<?php
/*******************************************************************************
 * Class Name: Open Graph Parser
 * Description: Parses an HTML document to retrieve and store Open Graph
 *              tags from the meta data
 * Usage:
 *   $url = 'http://www.example.com/index.html';
 *   $graph = new OpenGraph($url);
 *   print_r($graph->get_one('title'));  //get only title element
 *   print_r($graph->get_all());         //return all Open Graph tags
 ******************************************************************************/
class OpenGraph{
   //the open graph associative array
   private static $og_content = array();

   /***************************************************************************
    * Function: Class Constructor
    * Description: Initiates the request to fetch OG data
    * Params: $url (string) - URL of page to collect OG tags from
    **************************************************************************/
   public function __construct($url){
      if ($url){
         self::$og_content = self::get_graph($url);
      }
   }

   /***************************************************************************
    * Function: Get Open Graph
    * Description: Fetches the page at $url and collects its OG meta tags
    * Params: $url (string) - URL of page to collect OG tags from
    * Return: Object - associative array containing the OG data in format
    *              property : content
    **************************************************************************/
   private function get_graph($url){
      //fetch html content from web source and filter to meta data
      $dom = new DOMDocument();
      @$dom->loadHTMLFile($url);
      $tags = $dom->getElementsByTagName('meta');

      //set open graph search tag and return object
      $og_pattern = '/^og:/';
      $graph_content = array();

      //for each open graph tag, store in return object as property : content
      foreach ($tags as $element){
         if (preg_match($og_pattern, $element->getAttribute('property'))){
            $graph_content[preg_replace($og_pattern, '',
               $element->getAttribute('property'))] =
               $element->getAttribute('content');
         }
      }

      //store all open graph tags
      return $graph_content;
   }

   /***************************************************************************
    * Function: Get One Tag
    * Description: Fetches the content of one OG tag
    * Return: String - the content of one requested OG tag
    **************************************************************************/
   public function get_one($element){
      return self::$og_content[$element];
   }

   /***************************************************************************
    * Function: Get All Tags
    * Description: Fetches all of the stored OG tags
    * Return: Object - The entire OG associative array
    **************************************************************************/
   public function get_all(){
      return self::$og_content;
   }
}
?>

Analyzing our code by individual sections, we can see that within the class the following methods are available:

__construct

This is the class constructor that is run when you create a new instance of the class using the code new OpenGraph(). The constructor accepts a URL string as the single parameter; this is the URL that the class will access to collect its Open Graph metadata. Once in the constructor, if a URL string was specified, the class og_content property will be set to the return value of the get_graph method—i.e., the associative array of Open Graph tags.

get_graph

Once initiated, the get_graph method will capture the content of the URL as a DOM document, then further filter the resulting value to return only <meta> tags within the content. We then loop through each <meta> tag that was found. If the <meta> tag contains a property attribute that starts with og:, the tag is a valid Open Graph tag. The key of the return associative array is set to the property value (minus the og:, which is stripped out of the string), and the value is set to the content of the tag. Once all valid tags are stored within the return associative array, it is returned from the method.

get_one

Provides a public method to allow you to return one Open Graph tag from the obtained graph data. The single argument that is allowed is a string representing the property value of the Open Graph tag. The method returns the string value of the content of that same tag.

get_all

Provides a public method to allow you to return all Open Graph tags from the obtained graph data. This method does not take any arguments from the user and returns the entire associative array in the format of property: content.

Now that we have our class structure together, we can explore how to use it in practice. For this example, we revisit the Yelp restaurant review example from earlier in the chapter. In a separate file, we can build out the requests:

<?php
require_once('OpenGraph.php');

//set url to get OG data from and initialize class
$url = 'http://www.yelp.com/biz/the-restaurant-at-wente-vineyards-livermore-2';
$graph = new OpenGraph($url);

//print title and then the entire meta graph
print_r($graph->get_one('title'));
print_r($graph->get_all());
?>

We first set the URL from which we want to scrape the Open Graph metadata. Following that, we create a new Open Graph class object, passing in that URL. The class constructor will scrape the Open Graph data from the provided URL (if available) and store it within that instance of the class.

We can then begin making public method requests against the class object to display some of the Open Graph data that we captured.

First, we make a request to the get_one(...) method, passing in the string title as the argument to the method call. This signifies that we want to return the Open Graph <meta> tag content whose property is og:title.

When we call the get_one(...) method, the following string will be printed on the page:

The Restaurant at Wente Vineyards

We then make a request to the public get_all() method. This method will fetch the entire associative array of Open Graph tags that we were able to pull from the specified page. Once we print out the return value from that method, we are presented with the following:

Array
(
  [url] => http://www.yelp.com/biz/gATFcGOL-q0tqm9HTaXJpg
  [longitude] => -121.7567068
  [type] => restaurant
  [description] =>
  [latitude] => 37.6246361
  [title] => The Restaurant at Wente Vineyards
  [image] => http://media2.px.yelpcdn.com/bphoto/iVSnIDCj-fWiPffHHkUVsQ/m
)

You can obtain the full class file and sample implementation from https://github.com/jcleblanc/programming-social-applications/tree/master/open-graph/php-parser/.

You can use this simple implementation to scrape Open Graph data from any web source. It will allow you to access stored values to obtain rich entity information about a web page, extending the user social graph beyond the traditional confines of a social networking container or any single web page.

Python implementation: Open Graph node

Now let’s look at the same Open Graph protocol tag-parsing class, but this time using Python. Much like the PHP example, we’re going to be creating a class that contains the following functionality:

  • Includes a method for capturing and storing all <meta> tags with a property attribute starting with og: from a provided URL.

  • Provides one method for returning a single Open Graph tag value, and another for returning the entire list of obtained tags.

Let’s take a look at how this implementation is built.

Note

The following Open Graph Python implementation uses an HTML/XML parser called Beautiful Soup to capture <meta> tags from a provided source. Beautiful Soup is a tremendously valuable parsing library for Python and can be downloaded and installed from http://www.crummy.com/software/BeautifulSoup/.

import urllib
import re
from BeautifulSoup import BeautifulSoup

"""
" Class: Open Graph Parser
" Description: Parses an HTML document to retrieve and store Open Graph
"              tags from the meta data
" Usage:
"    url = 'http://www.nhl.com/ice/player.htm?id=8468482'
"    og_instance = OpenGraphParser(url)
"    print og_instance.get_one('title')
"    print og_instance.get_all()
"""
class OpenGraphParser:
   og_content = {}

   """
   " Method: Init
   " Description: Initializes the open graph fetch.  If url was provided,
   "           og_content will be set to return value of get_graph method
   " Arguments: url (string) - The URL from which to collect the OG data
   """
   def __init__(self, url):
      if url is not None:
         self.og_content = self.get_graph(url)

   """
   " Method: Get Open Graph
   " Description: Fetches HTML from provided url then filters to only meta tags.
   "           Goes through all meta tags and any starting with og: get
   "           stored and returned to the init method.
   " Arguments: url (string) - The URL from which to collect the OG data
   " Returns: dictionary - The matching OG tags
   """
   def get_graph(self, url):
      #fetch all meta tags from the url source
      sock = urllib.urlopen(url)
      htmlSource = sock.read()
      sock.close()
      soup = BeautifulSoup(htmlSource)
      meta = soup.findAll('meta')

      #get all og:* tags from meta data
      content = {}
      for tag in meta:
         if tag.has_key('property'):
            if re.match('og:', tag['property']) is not None:
               content[re.sub('^og:', '', tag['property'])] = tag['content']

      return content

   """
   " Method: Get One Tag
   " Description: Returns the content of one OG tag
   " Arguments: tag (string) - The OG tag whose content should be returned
   " Returns: string - the value of the OG tag
   """
   def get_one(self, tag):
      return self.og_content[tag]

   """
   " Method: Get All Tags
   " Description: Returns all found OG tags
   " Returns: dictionary - All OG tags
   """
   def get_all(self):
      return self.og_content

This class structure will make up the core functionality behind our parser. When instantiated, the class object will fetch and store all Open Graph <meta> tags from the provided source URL and will allow you to pull any of that data as needed.

The OpenGraphParser class consists of a number of methods to help us accomplish our goals:

__init__

Initializes the class instance. The __init__ method accepts a single argument, a string containing the URL from which we will attempt to obtain Open Graph <meta> tag data. If a URL was provided, the og_content property will be set to the return value of the get_graph(...) method, that is, the Open Graph <meta> tag data.

get_graph

The meat of the fetch request, get_graph accepts one argument: the URL from which we will obtain the Open Graph data. This method starts out by fetching the HTML document from the provided URL, then uses Beautiful Soup to find all <meta> tags that exist within the source. We then loop through all <meta> tags, and if they have a property value that begins with og: (signifying an Open Graph tag), we store the key (with the og: prefix removed) and value in our dictionary variable. Once all tags are obtained, the dictionary is returned.
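The filtering step can also be sketched without a third-party parser. The following is a minimal Python 3 sketch, not the class from this chapter: it assumes the HTML has already been fetched into a string and uses the standard library's html.parser module in place of Beautiful Soup. The names OGMetaParser and parse_og_tags are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser


class OGMetaParser(HTMLParser):
    """Collects <meta property="og:..."> tags (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.og_content = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags can carry Open Graph properties
        if tag != 'meta':
            return
        attributes = dict(attrs)
        prop = attributes.get('property') or ''
        # Keep tags whose property begins with og:, storing the key
        # without the prefix
        if prop.startswith('og:'):
            self.og_content[prop[len('og:'):]] = attributes.get('content') or ''


def parse_og_tags(html_source):
    """Return a dictionary of Open Graph tags found in an HTML string."""
    parser = OGMetaParser()
    parser.feed(html_source)
    return parser.og_content


# Sample markup assumed for illustration only
sample = (
    '<html><head>'
    '<meta property="og:title" content="Dany Heatley">'
    '<meta property="og:type" content="athlete">'
    '<meta name="description" content="not an Open Graph tag">'
    '</head></html>'
)
print(parse_og_tags(sample))
```

Working from a string keeps the parsing logic separate from the network request, which makes the filtering easy to try against saved markup.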

get_one

Provides a means for obtaining the value of a single Open Graph tag from the stored dictionary property. This method accepts one argument, the key whose value should be returned, and returns the value of that key as a string.
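As written, get_one raises a KeyError when the requested tag was not found on the page. One possible variation, sketched here in Python 3 under the assumption that a fallback value is preferable to an exception, relies on the dictionary's get method. The function name get_one_or_default and its default parameter are hypothetical and are not part of the class above.

```python
def get_one_or_default(og_content, tag, default=None):
    """Return the value stored for tag, or default when the tag is absent."""
    # dict.get avoids the KeyError that og_content[tag] would raise
    return og_content.get(tag, default)


# Sample dictionary assumed for illustration only
tags = {'title': 'Dany Heatley', 'type': 'athlete'}
print(get_one_or_default(tags, 'title'))
print(get_one_or_default(tags, 'image', 'no image provided'))
```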

get_all

Provides a means for obtaining all Open Graph data stored within the class instance. This method does not accept any arguments and returns the dictionary object containing all Open Graph data.

Now that we have the class in place, we can begin to fetch Open Graph data from a given URL. If we implement our new class, we can see how this works:

from OpenGraph import OpenGraphParser

#initialize open graph parser class instance with url
url = 'http://www.nhl.com/ice/player.htm?id=8468482'
og_instance = OpenGraphParser(url)

#output single description and entire og tag dictionary
print og_instance.get_one('description')
print og_instance.get_all()

We first import the class into our Python script so that we can use it. Then we need to initialize an instance of the class object. We set the URL that we want to obtain (in this case, the NHL player page for Dany Heatley) and create a new class instance, passing through the URL to the class __init__ method.

Now that we have the Open Graph data locked in our class object, we can begin extracting the information as required. We first make a request to the get_one(...) method, passing in description as our argument. This will obtain the Open Graph tag for og:description, returning a string similar to the following:

Dany Heatley of the San Jose Sharks. 2010-2011 Stats:
19 Games Played, 7 Goals, 12 Assists

The next method that we call is a request to get_all(). When this method is called, it will return the entire dictionary of Open Graph tags. When we print this out, we should have something similar to the following:

{
  u'site_name': u'NHL.com',
  u'description': u'Dany Heatley of the San Jose Sharks. 2010-2011
          Stats: 19 Games Played, 7 Goals, 12 Assists',
  u'title': u'Dany Heatley',
  u'url': u'http://sharks.nhl.com/club/player.htm?id=8468482',
  u'image': u'http://cdn.nhl.com/photos/mugs/thumb/8468482.jpg',
  u'type': u'athlete'
}

You can obtain the full class file and sample implementation from https://github.com/jcleblanc/programming-social-applications/tree/master/open-graph/python-parser/.

Using this simple class structure and a few lines of setup code, we can obtain a set of Open Graph tags from a web source. We can then use this data to begin to define a valuable source of entity social graph information for users, companies, or organizations.
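Before relying on the parsed dictionary, it can be useful to check which core properties a page actually supplied. The Open Graph protocol lists title, type, image, and url as required for every page. The following is a minimal Python 3 sketch that operates on a dictionary shaped like the one returned by get_all(); the helper name missing_required is hypothetical, and treating an empty value as missing is an assumption made for this example.

```python
# The four properties the Open Graph protocol requires for every page
REQUIRED_PROPERTIES = ('title', 'type', 'image', 'url')


def missing_required(og_content):
    """List the required Open Graph properties that are absent or empty."""
    return [name for name in REQUIRED_PROPERTIES if not og_content.get(name)]


# Sample dictionary assumed for illustration; the image tag is left out
tags = {
    'site_name': 'NHL.com',
    'title': 'Dany Heatley',
    'type': 'athlete',
    'url': 'http://sharks.nhl.com/club/player.htm?id=8468482',
}
print(missing_required(tags))
```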
