We’ve discussed at length how you can make any site a producer of Open Graph information, that is, a rich provider of entity-based social data. With that understanding in place, let’s look at the process of creating an Open Graph consumer.
We will explore similar implementations of this process in two languages: PHP and Python. The end product is the same, so you can use either one you prefer.
The full code for this sample is available at https://github.com/jcleblanc/programming-social-applications/tree/master/chapter_10/opengraph-php-parser.
First, let’s explore an Open Graph protocol parser implementation using PHP. In this example, we’ll develop a class that contains all of the functionality we need to parse Open Graph tags from any web source that contains them.
So what do we want to get out of this class structure? If we break it down into a few elements, at a base level our only requirements are that it:
Includes a method for capturing and storing all <meta> tags with a property attribute starting with og: from a provided URL.
Provides one method for returning a single Open Graph tag value, and another for returning the entire list of obtained tags.
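The filtering rule in the first requirement can be sketched on its own, independent of how the page is fetched. The following is a minimal Python 3 sketch, not part of the original sample code; the function name filter_og and the sample data are hypothetical, and the input is assumed to be a list of (property, content) pairs already pulled from <meta> tags.

```python
def filter_og(meta_pairs):
    """Keep only pairs whose property starts with 'og:' (hypothetical helper).

    meta_pairs: list of (property, content) tuples standing in for the
    <meta> tags of a page; property is None when the tag has no such attribute.
    """
    prefix = "og:"
    graph = {}
    for prop, content in meta_pairs:
        # Skip tags without a property attribute or with another namespace
        if prop and prop.startswith(prefix):
            # Store the key without the 'og:' prefix, as the parser classes do
            graph[prop[len(prefix):]] = content
    return graph


# Assumed sample input; only the first two entries are Open Graph tags
sample = [
    ("og:title", "The Restaurant at Wente Vineyards"),
    ("og:type", "restaurant"),
    (None, "width=device-width"),   # e.g. a <meta name="viewport"> tag
    ("fb:app_id", "12345"),         # different prefix, ignored
]
tags = filter_og(sample)
print(tags)
```

Checking the prefix with startswith keeps properties that merely contain the text og: somewhere in the middle from being treated as Open Graph tags.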
Now let’s see how these simple requirements play out when implemented in an actual PHP class structure:
<?php
/*******************************************************************************
 * Class Name: Open Graph Parser
 * Description: Parses an HTML document to retrieve and store Open Graph
 *              tags from the meta data
 * Usage:
 *    $url = 'http://www.example.com/index.html';
 *    $graph = new OpenGraph($url);
 *    print_r($graph->get_one('title'));  //get only title element
 *    print_r($graph->get_all());         //return all Open Graph tags
 ******************************************************************************/
class OpenGraph{
    //the open graph associative array (one per instance)
    private $og_content = array();

    /***************************************************************************
     * Function: Class Constructor
     * Description: Initiates the request to fetch OG data
     * Params: $url (string) - URL of page to collect OG tags from
     **************************************************************************/
    public function __construct($url){
        if ($url){
            $this->og_content = $this->get_graph($url);
        }
    }

    /***************************************************************************
     * Function: Get Open Graph
     * Description: Fetches the page and extracts its OG meta tags
     * Params: $url (string) - URL of page to collect OG tags from
     * Return: Array - associative array containing the OG data in format
     *         property : content
     **************************************************************************/
    private function get_graph($url){
        //fetch html content from web source and filter to meta data
        //(@ suppresses warnings raised by malformed HTML)
        $dom = new DOMDocument();
        @$dom->loadHTMLFile($url);
        $tags = $dom->getElementsByTagName('meta');

        //set open graph search tag and return object
        $og_pattern = '/^og:/';
        $graph_content = array();

        //for each open graph tag, store in return object as property : content
        foreach ($tags as $element){
            $property = $element->getAttribute('property');
            if (preg_match($og_pattern, $property)){
                $key = preg_replace($og_pattern, '', $property);
                $graph_content[$key] = $element->getAttribute('content');
            }
        }

        //return all open graph tags
        return $graph_content;
    }

    /***************************************************************************
     * Function: Get One Tag
     * Description: Fetches the content of one OG tag
     * Params: $element (string) - OG property name without the og: prefix
     * Return: String - the content of one requested OG tag
     **************************************************************************/
    public function get_one($element){
        return $this->og_content[$element];
    }

    /***************************************************************************
     * Function: Get All Tags
     * Description: Fetches the content of all OG tags
     * Return: Array - The entire OG associative array
     **************************************************************************/
    public function get_all(){
        return $this->og_content;
    }
}
?>
Analyzing our code by individual sections, we can see that within the class the following methods are available:
__construct
This is the class constructor, which runs when you create a new instance of the class using the code new OpenGraph(). The constructor accepts a URL string as the single parameter; this is the URL that the class will access to collect its Open Graph metadata. Once in the constructor, if a URL string was specified, the og_content property will be set to the return value of the get_graph method, i.e., the associative array of Open Graph tags.
get_graph
Once initiated, the get_graph method will capture the content of the URL as a DOM document, then further filter the resulting value to return only <meta> tags within the content. We then loop through each <meta> tag that was found. If the <meta> tag contains a property attribute that starts with og:, the tag is a valid Open Graph tag. The key of the return associative array is set to the property value (minus the og: prefix, which is stripped out of the string), and the value is set to the content of the tag. Once all valid tags are stored within the return associative array, it is returned from the method.
get_one
Provides a public method to allow you to return one Open Graph tag from the obtained graph data. The single argument that is allowed is a string representing the property value of the Open Graph tag. The method returns the string value of the content of that same tag.
get_all
Provides a public method to allow you to return all Open Graph tags from the obtained graph data. This method does not take any arguments from the user and returns the entire associative array in the format of property : content.
Now that we have our class structure together, we can explore how to use it in practice. For this example, we are revisiting the Yelp restaurant review example from earlier in the chapter. In a separate file, we can build out the requests:
<?php
require_once('OpenGraph.php');

//set url to get OG data from and initialize class
$url = 'http://www.yelp.com/biz/the-restaurant-at-wente-vineyards-livermore-2';
$graph = new OpenGraph($url);

//print title and then the entire meta graph
print_r($graph->get_one('title'));
print_r($graph->get_all());
?>
We first set the URL from which we want to scrape the Open Graph metadata. Following that, we create a new Open Graph class object, passing in that URL. The class constructor will scrape the Open Graph data from the provided URL (if available) and store it within that instance of the class.
We can then begin making public method requests against the class object to display some of the Open Graph data that we captured.
First, we make a request to the get_one(...) method, passing in the string title as the argument to the method call. This signifies that we want to return the Open Graph <meta> tag content whose property is og:title.
When we call the get_one(...) method, the following string will be printed on the page:
The Restaurant at Wente Vineyards
We then make a request to the public get_all() method. This method will fetch the entire associative array of Open Graph tags that we were able to pull from the specified page. Once we print out the return value from that method, we are presented with the following:
Array
(
    [url] => http://www.yelp.com/biz/gATFcGOL-q0tqm9HTaXJpg
    [longitude] => -121.7567068
    [type] => restaurant
    [description] =>
    [latitude] => 37.6246361
    [title] => The Restaurant at Wente Vineyards
    [image] => http://media2.px.yelpcdn.com/bphoto/iVSnIDCj-fWiPffHHkUVsQ/m
)
You can obtain the full class file and sample implementation from https://github.com/jcleblanc/programming-social-applications/tree/master/open-graph/php-parser/.
You can use this simple implementation to scrape Open Graph data from any web source. It will allow you to access stored values to obtain rich entity information about a web page, extending the user social graph beyond the traditional confines of a social networking container or any single web page.
The full code for this sample is available at https://github.com/jcleblanc/programming-social-applications/tree/master/chapter_10/opengraph-python-parser.
Now let’s look at the same Open Graph protocol tag-parsing class, but this time using Python. Much like the PHP example, we’re going to be creating a class that contains the following functionality:
Includes a method for capturing and storing all <meta> tags with a property attribute starting with og: from a provided URL.
Provides one method for returning a single Open Graph tag value, and another for returning the entire list of obtained tags.
Let’s take a look at how this implementation is built.
The following Open Graph Python implementation uses an HTML/XML parser called Beautiful Soup to capture <meta> tags from a provided source. Beautiful Soup is a tremendously valuable parsing library for Python and can be downloaded and installed from http://www.crummy.com/software/BeautifulSoup/.
#Python 2 code; uses the BeautifulSoup 3 import path
import urllib
import re
from BeautifulSoup import BeautifulSoup

class OpenGraphParser:
    """
    Class: Open Graph Parser
    Description: Parses an HTML document to retrieve and store Open Graph
                 tags from the meta data
    Usage:
        url = 'http://www.nhl.com/ice/player.htm?id=8468482'
        og_instance = OpenGraphParser(url)
        print og_instance.get_one('title')
        print og_instance.get_all()
    """

    def __init__(self, url):
        """
        Method: Init
        Description: Initializes the open graph fetch. If url was provided,
                     og_content will be set to return value of get_graph method
        Arguments: url (string) - The URL from which to collect the OG data
        """
        self.og_content = {}
        if url is not None:
            self.og_content = self.get_graph(url)

    def get_graph(self, url):
        """
        Method: Get Open Graph
        Description: Fetches HTML from provided url then filters to only meta
                     tags. Goes through all meta tags and any starting with
                     og: get stored and returned to the init method.
        Arguments: url (string) - The URL from which to collect the OG data
        Returns: dictionary - The matching OG tags
        """
        #fetch all meta tags from the url source
        sock = urllib.urlopen(url)
        htmlSource = sock.read()
        sock.close()
        soup = BeautifulSoup(htmlSource)
        meta = soup.findAll('meta')

        #get all og:* tags from meta data
        content = {}
        for tag in meta:
            if tag.has_key('property'):
                #re.match anchors at the start, so only og:* properties pass
                if re.match('og:', tag['property']) is not None:
                    content[re.sub('^og:', '', tag['property'])] = tag['content']
        return content

    def get_one(self, tag):
        """
        Method: Get One Tag
        Description: Returns the content of one OG tag
        Arguments: tag (string) - The OG tag whose content should be returned
        Returns: string - the value of the OG tag
        """
        return self.og_content[tag]

    def get_all(self):
        """
        Method: Get All Tags
        Description: Returns all found OG tags
        Returns: dictionary - All OG tags
        """
        return self.og_content
This class structure will make up the core functionality behind our parser. When instantiated, the class object will fetch and store all Open Graph <meta> tags from the provided source URL and will allow you to pull any of that data as needed.
The OpenGraphParser class consists of a number of methods to help us accomplish our goals:
__init__
Initializes the class instance. The __init__ method accepts a single argument, a string whose content is the URL from which we will attempt to obtain Open Graph <meta> tag data. If a URL was provided, the og_content property will be set to the return value of the get_graph(...) method, i.e., the Open Graph <meta> tag data.
get_graph
The meat of the fetch request, get_graph accepts one argument: the URL from which we will obtain the Open Graph data. This method starts out by fetching the HTML document from the provided URL, then uses Beautiful Soup to fetch all <meta> tags that exist within the source. We then loop through all <meta> tags, and if they have a property value that begins with og: (signifying an Open Graph tag), we store the key and value in our dictionary variable. Once all tags are obtained, the dictionary is returned.
get_one
Provides a means for obtaining the value of a single Open Graph tag from the stored dictionary property. This method accepts one argument, the key whose value should be returned, and returns the value of that key as a string.
get_all
Provides a means for obtaining all Open Graph data stored within the class instance. This method does not accept any arguments and returns the dictionary object containing all Open Graph data.
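The same get_graph logic can be approximated without Beautiful Soup. The following is a minimal Python 3 sketch, not the author's implementation: it swaps in the standard library's html.parser module, and it parses an HTML string rather than fetching a URL so that it runs offline. The names OpenGraphHTMLParser and parse_graph, and the sample markup, are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphHTMLParser(HTMLParser):
    """Collects og:* <meta> tags from the HTML fed to it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.og_content = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            # Strip the 'og:' prefix and store property : content
            self.og_content[prop[3:]] = attributes.get("content") or ""


def parse_graph(html_source):
    """Return a dictionary of Open Graph tags found in an HTML string."""
    parser = OpenGraphHTMLParser()
    parser.feed(html_source)
    parser.close()
    return parser.og_content


# Assumed sample markup standing in for a fetched page
html_source = """
<html><head>
<meta charset="utf-8">
<meta property="og:title" content="Dany Heatley">
<meta property="og:type" content="athlete" />
<meta name="description" content="not an Open Graph tag">
</head><body></body></html>
"""
graph = parse_graph(html_source)
print(graph)
```

To use this against a live page, the HTML string would first have to be downloaded and decoded, for example with urllib.request.urlopen; that step is left out of the sketch.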
Now that we have the class in place, we can begin to fetch Open Graph data from a given URL. If we implement our new class, we can see how this works:
from OpenGraph import OpenGraphParser

#initialize open graph parser class instance with url
url = 'http://www.nhl.com/ice/player.htm?id=8468482'
og_instance = OpenGraphParser(url)

#output single description and entire og tag dictionary
print og_instance.get_one('description')
print og_instance.get_all()
We first import the class into our Python script so that we can use it. Then we need to initialize an instance of the class object. We set the URL that we want to obtain (in this case, the NHL player page for Dany Heatley) and create a new class instance, passing through the URL to the class __init__ method.
Now that we have the Open Graph data locked in our class object, we can begin extracting the information as required. We first make a request to the get_one(...) method, passing in description as our argument. This will obtain the Open Graph tag for og:description, returning a string similar to the following:
Dany Heatley of the San Jose Sharks. 2010-2011 Stats: 19 Games Played, 7 Goals, 12 Assists
The next method that we call is a request to get_all(). When this method is called, it will return the entire dictionary of Open Graph tags. When we print this out, we should have something similar to the following:
{ u'site_name': u'NHL.com', u'description': u'Dany Heatley of the San Jose Sharks. 2010-2011 Stats: 19 Games Played, 7 Goals, 12 Assists', u'title': u'Dany Heatley', u'url': u'http://sharks.nhl.com/club/player.htm?id=8468482', u'image': u'http://cdn.nhl.com/photos/mugs/thumb/8468482.jpg', u'type': u'athlete' }
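Because get_one indexes the dictionary directly, asking for a tag that the page does not define raises a KeyError. One way to tolerate missing tags is sketched below in Python 3; the helper name get_tag and the sample dictionary are hypothetical and are not part of the original class.

```python
def get_tag(og_tags, name, default=None):
    """Look up one Open Graph tag, returning default when it is absent."""
    return og_tags.get(name, default)


# Assumed sample data, shaped like the dictionary returned by get_all()
og_tags = {"title": "Dany Heatley", "type": "athlete"}

title = get_tag(og_tags, "title")             # tag present
locale = get_tag(og_tags, "locale", "en_US")  # tag absent, default used
print(title, locale)
```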
You can obtain the full class file and sample implementation from https://github.com/jcleblanc/programming-social-applications/tree/master/open-graph/python-parser/.
Using this simple class structure and a few lines of setup code, we can obtain a set of Open Graph tags from a web source. We can then use this data to begin to define a valuable source of entity social graph information for users, companies, or organizations.
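Before relying on the parsed data, it can help to confirm that the page supplies the four properties the Open Graph protocol lists as required for every page: og:title, og:type, og:image, and og:url. The following Python 3 sketch shows one way to check this; the function name missing_required is hypothetical, and the input is assumed to be a dictionary keyed without the og: prefix, as returned by get_all().

```python
# Required Open Graph properties, stored without the 'og:' prefix
REQUIRED_PROPERTIES = ("title", "type", "image", "url")


def missing_required(og_tags):
    """Return the required properties that are absent or empty (hypothetical helper)."""
    return [name for name in REQUIRED_PROPERTIES if not og_tags.get(name)]


# Data taken from the sample output above
complete = {
    "site_name": "NHL.com",
    "title": "Dany Heatley",
    "url": "http://sharks.nhl.com/club/player.htm?id=8468482",
    "image": "http://cdn.nhl.com/photos/mugs/thumb/8468482.jpg",
    "type": "athlete",
}
# Assumed partial data for comparison
partial = {"title": "Dany Heatley", "image": ""}

print(missing_required(complete))
print(missing_required(partial))
```

An empty string counts as missing here, which matters for pages that emit a tag with no content, as the empty description in the Yelp output shows can happen.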