From the standpoint of the social web, Facebook truly is an all-in-one wonder. Given that its more than 500 million users can update their public statuses to let their friends know what they’re doing/thinking/etc., exchange lengthier messages in a fashion similar to emailing back and forth, engage in real-time chat, organize and share their photos, “check in” to physical locales, and do about a dozen other things via the site, it’s not all that surprising that Facebook edged out Google as the most visited website as 2010 came to a close. Figure 9-1 shows a chart that juxtaposes Google and Facebook visiting figures just in case there’s any doubt in your mind. This is particularly exciting because where there are a lot of regular users, there’s lots of interesting data. In this chapter, we’ll take advantage of Facebook’s incredibly powerful APIs for mining this data to discover your most connected friends, cluster your friends based on common interests, and get a quick indicator of what the people in your social network are talking about.
We’ll start with a brief overview of common Facebook APIs, then quickly transition into writing some scripts that take advantage of these APIs so that we can analyze and visualize some of your social data. Virtually all of the techniques we’ve applied in previous chapters could be applied to your Facebook data, because the Facebook platform is so rich and diverse. As in most of the other chapters, we won’t be able to cover all of the ground that could possibly be covered: a small tome could be devoted just to the many interesting applications and data-mining contraptions you could build on a platform that gives you access to so many details about the people closest to you. As with any other rich developer platform, make sure that any techniques from this chapter that you apply in production applications take users’ privacy into account, and review Facebook’s developer principles and policies regularly to avoid surprises when rolling out your application to the rest of the world.
This section provides a brief overview of how to complete Facebook’s OAuth 2.0 flow for a desktop application to get you an access token, then quickly transitions into some data-gathering exercises. To keep this chapter as simple as possible, we won’t discuss building a Facebook application: there are plenty of tutorials online that can teach you how to do that, and introducing Facebook application development would require an overview of a server platform such as Google App Engine (GAE), since Facebook apps must be externally hosted in your own server environment. However, a GAE version of the scripts that are presented in this chapter is available for download at http://github.com/ptwobrussell/Mining-the-Social-Web/tree/master/web_code/facebook_gae_demo_app . It’s easy to deploy and is a bona fide Facebook application that you can use as a starting point once you decide that you’re ready to go down that path and convert your scripts into deployable apps.
As with any other OAuth-enabled app, to get started you’ll need to acquire an application ID and secret to use for authorization, opt into the developer community, and create an “application.” The following list summarizes the main steps, and some visual cues are provided in Figure 9-2:
First, if you don’t already have one, you’ll need to set up a Facebook account. Just go to http://facebook.com and sign up to join the party.
Next, you’ll need to install the Developer application by visiting http://www.facebook.com/developers and clicking through the request to install the application.
Once the Developer application is installed, you can click the “Set Up New Application” button to create your application.
Figure 9-2. From top to bottom: a) the button you’ll click from http://www.facebook.com/developers to set up a new application, b) the dialog you’ll complete to give your app a name and acknowledge the terms of service, c) your application now appears in the list of applications, and d) your app’s settings, including your OAuth 2.0 app ID and secret
Once you’ve completed the security check, your app will have an ID and secret that you can use to complete the steps involved in Facebook’s OAuth 2.0 implementation, and you’ll be presented with a form that you’ll need to fill out to specify your application’s Web Site settings. Just enter the URL that you eventually plan to use for hosting your app as the Site URL and include the same domain as the Site Domain. Facebook uses your Web Site settings as part of the OAuth flow, and you’ll receive an error message during the OAuth dance if they’re not filled out appropriately.
It may not be obvious, but perhaps the simplest way for you to get back to your development application once you’ve left it is to just return to http://facebook.com/developers (requires a login).
With the basic details of application registration out of the way, the next step is writing a script that handles authentication and gets you an access token that you can use to access APIs. The overall flow for the process is actually a little simpler than what you’ve seen in previous chapters involving Twitter and LinkedIn. Our script will pop open a web browser, you’ll sign into your Facebook account, and then it’ll present a special code (your access token) that you’ll copy/paste into a prompt so that it can be saved out to disk and used in future requests. Example 9-1 illustrates the process and is nothing more than a cursory implementation of the flow described in “Desktop Application Authentication”. A brief review of Facebook’s authentication documentation may be helpful; refer back to No, You Can’t Have My Password if you haven’t read it already. However, note that the flow implemented in Example 9-1 for a desktop application is a little simpler than the flow involved in authenticating a web app.
Example 9-1. Getting an OAuth 2.0 access token for a desktop app (facebook__login.py)
# -*- coding: utf-8 -*-

import os
import sys
import webbrowser
import urllib


def login():

    # Get this value from your Facebook application's settings
    CLIENT_ID = ''

    REDIRECT_URI = \
        'http://miningthesocialweb.appspot.com/static/facebook_oauth_helper.html'

    # You could customize which extended permissions are being requested on the login
    # page or by editing the list below. By default, all the ones that make sense for
    # read access as described on http://developers.facebook.com/docs/authentication/
    # are included. (And yes, it would probably be ridiculous to request this much
    # access if you wanted to launch a successful production application.)

    EXTENDED_PERMS = [
        'user_about_me', 'friends_about_me',
        'user_activities', 'friends_activities',
        'user_birthday', 'friends_birthday',
        'user_education_history', 'friends_education_history',
        'user_events', 'friends_events',
        'user_groups', 'friends_groups',
        'user_hometown', 'friends_hometown',
        'user_interests', 'friends_interests',
        'user_likes', 'friends_likes',
        'user_location', 'friends_location',
        'user_notes', 'friends_notes',
        'user_online_presence', 'friends_online_presence',
        'user_photo_video_tags', 'friends_photo_video_tags',
        'user_photos', 'friends_photos',
        'user_relationships', 'friends_relationships',
        'user_religion_politics', 'friends_religion_politics',
        'user_status', 'friends_status',
        'user_videos', 'friends_videos',
        'user_website', 'friends_website',
        'user_work_history', 'friends_work_history',
        'email',
        'read_friendlists',
        'read_requests',
        'read_stream',
        'user_checkins', 'friends_checkins',
        ]

    args = dict(client_id=CLIENT_ID, redirect_uri=REDIRECT_URI,
                scope=','.join(EXTENDED_PERMS), type='user_agent',
                display='popup')

    webbrowser.open('https://graph.facebook.com/oauth/authorize?'
                    + urllib.urlencode(args))

    # Optionally, store your access token locally for convenient use as opposed
    # to passing it as a command line parameter into scripts...

    access_token = raw_input('Enter your access_token: ')

    if not os.path.isdir('out'):
        os.mkdir('out')

    filename = os.path.join('out', 'facebook.access_token')
    f = open(filename, 'w')
    f.write(access_token)
    f.close()

    print >> sys.stderr, \
        "Access token stored to local file: 'out/facebook.access_token'"

    return access_token


if __name__ == '__main__':
    login()
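If picking the token out of the browser by hand proves error-prone, the extraction could be scripted. The following is only a minimal sketch, and it rests on an assumption that Example 9-1 does not state: that the redirect address ends in a fragment of the form #access_token=...&expires_in=..., which is how OAuth 2.0 user-agent flows conventionally return tokens. The function name extract_access_token and the sample URL are hypothetical, and the imports are arranged so the snippet runs on Python 2.7 or Python 3.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def extract_access_token(redirect_url):
    """Return the access_token found in a redirect URL's fragment, or None.

    Assumes the token arrives as '#access_token=...&expires_in=...'.
    """
    fragment = urlparse(redirect_url).fragment
    values = parse_qs(fragment).get('access_token')
    return values[0] if values else None


# Hypothetical redirect URL, for illustration only
sample = 'http://example.com/helper.html#access_token=abc123&expires_in=3600'
print(extract_access_token(sample))
```

With a helper like this, you could paste the entire address from the browser's location bar at the prompt instead of the bare token.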
One important detail you’re probably wondering about is the definition of EXTENDED_PERMS, and a brief explanation is certainly in order. The first time you try to log in to the application, it’ll notify you that the application is requesting lots of extended permissions so that you can have maximum flexibility in accessing the data that’s available to you (Figure 9-3). The details of extended permissions are described in Facebook’s authentication documentation, but the short story is that, by default, applications can only access some basic data from user profiles (such as name, gender, and profile picture), and explicit permissions must be granted to access additional data. The subtlety to observe here is that you might be able to see certain details about your friends, such as things that they “like” or activity on their walls through your normal Facebook account, but your app cannot access these same details unless you have granted it explicit permission to do so. In other words, there’s a difference between what you’ll see in your friends’ profiles when signed in to facebook.com and the data you’ll get back when requesting information through the API. This is because it’s Facebook (not you) who is exercising the platform to get data when you’re logged in to facebook.com, but it’s you (as a developer) who is requesting data when you build your app.
If your app does not request extended permissions to access data but tries to access it anyway, you may get back empty data objects as opposed to an explicit error message.
Figure 9-3. The Kitchen Sink: the sample dialog you would see if an application requested permission to access everything available to it
The Facebook platform’s documentation is continually evolving, and it may not tell you everything you need to know about extended permissions. For example, it appears that in order to access religious or political information for friends (the friends_religion_politics extended permission), those friends must have your app installed and must have explicitly authorized access to this data as well via their user_religion_politics extended permission.
Now, outfitted with a shiny new access token that has permissions to access all of your data, it’s time to move on to more interesting things.
As you’ve probably heard, the Facebook developer ecosystem is complex, continually evolving,[57] and filled with many twists and turns involving the most sophisticated privacy controls the Web has ever seen. In addition to sporting a standard battery of REST APIs and a more advanced SQL-like query language called Facebook Query Language (FQL), Facebook unveiled the Graph API and the Open Graph protocol (OGP) in April 2010 at the F8 conference. In short, OGP is a mechanism that enables you to make any web page an object in a rich social graph by injecting some RDFa metadata into the page (more on this in a moment), and the Graph API is a simple and intuitive mechanism for querying the graph. Each object has a particular type. At the time of this writing, the Graph API supports the following types of objects, as described in the Graph API Reference:
An individual application registered on the Facebook Platform
A check-in made through Facebook Places
A Facebook event
A Facebook group
A shared link
A Facebook note
A Facebook page
An individual photo
An individual entry in a profile’s feed
A status message on a user’s wall
An individual subscription from an application to get real-time updates for an object type
A user profile
An individual video
The Graph API Reference contains detailed documentation for each object type, describing the types of properties and connections you can expect to exist for each object.
Example 9-2 is the canonical example from the documentation that demonstrates how to turn the IMDB’s page on The Rock into an object in the Open Graph protocol as part of an XHTML document that uses namespaces. These bits of metadata have great potential once realized at a massive scale, because they enable a URI like http://www.imdb.com/title/tt0117500 to unambiguously represent any web page—whether it’s for a person, company, product, etc.—in a machine-readable way and furthers the vision for a semantic web.
Example 9-2. Sample RDFa for the Open Graph protocol
<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>
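To get a feel for how a consumer of this metadata might read it, here is a minimal sketch that pulls og: properties out of a page using only the standard library's HTML parser. The class name OpenGraphParser is hypothetical, the sample page is a trimmed copy of Example 9-2, and nothing here is meant to reflect how Facebook itself processes pages. The imports are arranged so the snippet runs on Python 2.7 or Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here by the base class
        if tag != 'meta':
            return
        attrs = dict(attrs)
        prop = attrs.get('property') or ''
        if prop.startswith('og:') and 'content' in attrs:
            self.properties[prop] = attrs['content']


# Trimmed copy of the markup in Example 9-2
page = '''
<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head>
</html>
'''

parser = OpenGraphParser()
parser.feed(page)
print(sorted(parser.properties.items()))
```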
When considering the possibilities with OGP, be forward-thinking and creative, but bear in mind that it’s brand new and still evolving. As it relates to the semantic web and web standards in general, consternation about the use of “open” has surfaced, and various kinks in the spec are still being worked out. It is essentially a single-vendor effort, and its capabilities are little more than on par with those of meta elements from the much earlier days of the Web. In effect, OGP is really more of a snowflake than a standard at this moment, but the potential is in place for that to change, and many exciting things may happen as the future unfolds and innovation takes place. We’ll return to the topic of the semantic web in Chapter 10, where we briefly discuss its vast potential. Let’s now turn back and home in on how to put the Graph API to work by building a simple Facebook app to mine social data.
Because of the titular similarity, it’s easy to confuse Google’s Social Graph API with Facebook’s Graph API, even though they are quite different.
At its core, the Graph API is incredibly simple: substitute an object’s ID in the URI http(s)://graph.facebook.com/ID to fetch details about the object. For example, fetching the URL http://graph.facebook.com/http://www.imdb.com/title/tt0117500 in your web browser would return the response in Example 9-3.
Example 9-3. A sample response for an Open Graph query to http://graph.facebook.com/http://www.imdb.com/title/tt0117500
{
    "id": "114324145263104",
    "name": "The Rock (1996)",
    "picture": "http://profile.ak.fbcdn.net/hprofile-ak-snc4/hs344.snc4/41581...jpg",
    "link": "http://www.imdb.com/title/tt0117500/",
    "category": "Movie",
    "description": "Directed by Michael Bay. With Sean Connery, Nicolas Cage, ...",
    "likes": 3
}
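Because every object lives at the same URL prefix, building request URLs is easy to wrap in a function. The sketch below is illustrative only: graph_url is a hypothetical helper, not part of any Facebook library, and it simply concatenates the ID onto the prefix described above and appends any query parameters you pass (such as metadata=1). The imports are arranged so the snippet runs on Python 2.7 or Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

GRAPH_ENDPOINT = 'https://graph.facebook.com/'


def graph_url(object_id, **params):
    """Build a Graph API URL for object_id, appending any query parameters."""
    url = GRAPH_ENDPOINT + str(object_id)
    if params:
        # Sort the parameters so the query string is reproducible
        url += '?' + urlencode(sorted(params.items()))
    return url


# An object can be addressed by its numeric ID or, as in the text, by a URL
print(graph_url('114324145263104'))
print(graph_url('http://www.imdb.com/title/tt0117500'))
print(graph_url('114324145263104', metadata=1))
```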
If you inspect the source for the URL http://www.imdb.com/title/tt0117500, you’ll find that fields in the response correspond to the data in the meta tags of the page, and this is no coincidence. The delivery of rich metadata in response to a simple query is the whole idea behind the way OGP is designed to work. Where it gets more interesting is when you explicitly request additional metadata for an object in the page by appending the query string parameter metadata=1 to the request. A sample response for the query https://graph.facebook.com/114324145263104?metadata=1 is shown in Example 9-4.
Example 9-4. A sample response for an Open Graph query to http://graph.facebook.com/http://www.imdb.com/title/tt0117500?metadata=1 with the optional metadata included
{
    "id": "118133258218514",
    "name": "The Rock (1996)",
    "picture": "http://profile.ak.fbcdn.net/hprofile-ak-snc4/..._s.jpg",
    "link": "http://www.imdb.com/title/tt0117500",
    "category": "Movie",
    "website": "http://www.imdb.com/title/tt0117500",
    "description": "Directed by Michael Bay. With Sean Connery, Nicolas Cage, ...",
    "likes": 3,
    "metadata": {
        "connections": {
            "feed": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/feed",
            "posts": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/posts",
            "tagged": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/tagged",
            "statuses": "http://graph.facebook.com/http://www.imdb.com/title/...",
            "links": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/links",
            "notes": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/notes",
            "photos": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/photos",
            "albums": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/albums",
            "events": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/events",
            "videos": "http://graph.facebook.com/http://www.imdb.com/title/tt0117500/videos"
        },
        "fields": [
            {
                "name": "id",
                "description": "The Page's ID. Publicly available. A JSON string."
            },
            {
                "name": "name",
                "description": "The Page's name. Publicly available. A JSON string."
            },
            {
                "name": "category",
                "description": "The Page's category. Publicly available. A JSON string."
            },
            {
                "name": "likes",
                "description": "\* The number of users who like the Page..."
            }
        ]
    },
    "type": "page"
}
The items in metadata.connections are pointers to other nodes in the graph that you can crawl to get to other interesting bits of data. For example, you could follow the “photos” link to pull down photos associated with the movie, and potentially walk links associated with the photos to discover who posted them or see comments that might have been made about them. In case it hasn’t already occurred to you, you are also an object in the graph. Try visiting the same URL prefix, but substitute in your own Facebook ID or username as the URL context and see for yourself. Given that you are the logical center of your own social network, we’ll be revisiting this possibility at length throughout the rest of this chapter. The next section digs deeper into Graph API queries, and the section after it takes a closer look at FQL queries.
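As a small illustration of walking these pointers, the sketch below lists the connections advertised by an already-decoded response. The helper name list_connections is hypothetical, and the sample dictionary is an abbreviated stand-in shaped like Example 9-4 rather than live data; no network requests are made.

```python
def list_connections(obj):
    """Return (name, url) pairs from an object's metadata.connections.

    The pairs are sorted by connection name. An empty list is returned
    when the object was fetched without metadata=1.
    """
    connections = obj.get('metadata', {}).get('connections', {})
    return sorted(connections.items())


# Abbreviated response in the shape of Example 9-4
response = {
    'id': '118133258218514',
    'name': 'The Rock (1996)',
    'metadata': {
        'connections': {
            'photos': 'http://graph.facebook.com/http://www.imdb.com/title/tt0117500/photos',
            'feed': 'http://graph.facebook.com/http://www.imdb.com/title/tt0117500/feed',
        }
    },
}

for name, url in list_connections(response):
    print(name + ' -> ' + url)
```

Each URL in the result could then be fetched in turn to continue the crawl.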
Facebook no longer maintains an official Python SDK for the Graph API; however, a community fork of that same repository that appears to be actively maintained is available. You can install it by downloading it and executing the standard python setup.py install command or, directly from GitHub, with pip as follows: pip install -e git+git://github.com/pythonforfacebook/facebook-sdk.git#egg=git-latest (keeping in mind that you may first have to easy_install pip if you don’t already have it). There are several nice examples of how you could use this module to quickly get through the OAuth dance and build a full-blown Facebook app that’s hosted on a platform like Google App Engine. We’ll just be narrowing in on some particular portions of the GraphAPI class (defined in facebook.py), which we’ll use in standalone scripts. A few of these methods follow:
get_object(self, id, **args)
Example: get_object("me", metadata=1)
get_objects(self, ids, **args)
Example: get_objects(["me", "some_other_id"], metadata=1)
get_connections(self, id, connection_name, **args)
Example: get_connections("me", "friends", metadata=1)
request(self, path, args=None, post_args=None)
Example: request("search", {"q" : "programming", "type" : "group"})
Unlike with other social networks, there don’t appear to be clearly published guidelines about Facebook API rate limits. Although the availability of the APIs seems to be quite generous, you should still carefully design your application to use the APIs as little as possible and handle any/all error conditions, just to be on the safe side. The closest thing to guidelines you’re likely to find as of late 2010 are developer discussions in forums.
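One way to handle transient failures defensively is to wrap each API call in a retry loop that waits longer after every failure. The following is a generic sketch of that idea, not something prescribed by Facebook: the name call_with_retries, the default of four attempts, and the doubling delay are all assumptions chosen for illustration, and in practice you would likely narrow the exception types that trigger a retry. The demonstration uses a simulated failure so that it runs without network access.

```python
import time


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays.

    The last exception is re-raised once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before trying again
            sleep(base_delay * (2 ** attempt))


# Simulated flaky call: fails twice, then succeeds (no network involved)
attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError('simulated transient failure')
    return 'ok'


# Record the requested delays instead of actually sleeping
delays = []
result = call_with_retries(flaky, sleep=delays.append)
print(result)
```

Passing the sleep function as a parameter keeps the helper easy to test; a real script would simply rely on the default time.sleep.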
The most common (and often, the only) keyword argument you’ll probably use is metadata=1, in order to get back the connections associated with an object in addition to just the object details themselves. Take a look at Example 9-5, which introduces the GraphAPI class and uses its request and get_objects methods to query for “programming” groups. It also illustrates an important characteristic of the sizes of the result sets you may get for many types of requests.
Example 9-5. Querying the Open Graph for “programming” groups (facebook__graph_query.py)
# -*- coding: utf-8 -*-

import sys
import json
import facebook
import urllib2
from facebook__login import login

try:
    ACCESS_TOKEN = open('out/facebook.access_token').read()
    Q = sys.argv[1]
except IOError, e:
    try:

        # If you pass in the access token from the Facebook app as a command line
        # parameter, be sure to wrap it in single quotes so that the shell
        # doesn't interpret any characters in it

        ACCESS_TOKEN = sys.argv[1]
        Q = sys.argv[2]
    except IndexError:
        print >> sys.stderr, \
            "Could not either find access token in 'facebook.access_token' or parse args."
        ACCESS_TOKEN = login()
        Q = sys.argv[1]

LIMIT = 100

gapi = facebook.GraphAPI(ACCESS_TOKEN)

# Find groups with the query term in their name

group_ids = []
i = 0
while True:
    results = gapi.request('search', {
        'q': Q,
        'type': 'group',
        'limit': LIMIT,
        'offset': LIMIT * i,
        })
    if not results['data']:
        break

    # Match against the query term itself rather than a hardcoded string

    ids = [group['id'] for group in results['data']
           if Q.lower() in group['name'].lower()]

    # once groups stop containing the term we are looking for in their name, bail out

    if len(ids) == 0:
        break
    group_ids += ids

    i += 1

if not group_ids:
    print 'No results'
    sys.exit()

# Get details for the groups

groups = gapi.get_objects(group_ids, metadata=1)

# Count the number of members in each group. The FQL API documentation at
# http://developers.facebook.com/docs/reference/fql/group_member hints that for
# groups with more than 500 members, we'll only get back a random subset of up
# to 500 members.

for g in groups:
    group = groups[g]
    conn = urllib2.urlopen(group['metadata']['connections']['members'])
    try:
        members = json.loads(conn.read())['data']
    finally:
        conn.close()
    print group['name'], len(members)
Sample results for the query for “programming” are presented in Example 9-6, and it’s no coincidence that the upper bound of the result sets approaches 500. As the comment in the code notes, the FQL documentation states that when you query the group_member table, your results will be limited to 500 total items. Unfortunately, the Graph API documentation is still evolving and, at the time of this writing, similar warnings are not documented (although they hopefully will be soon). In counting the members of groups, the takeaway is that you’ll often be working with a reasonably sized random sample. Visualizing Your Entire Social Network describes a different scenario in which a somewhat unexpected truncation of results occurs, and how to work around this by dispatching multiple queries.
Example 9-6. Sample results from Example 9-5
Graffiti Art Programming 492
C++ Programming 495
Basic Programming 495
Programming 215
C Programming 493
C programming language 492
Programming 490
ACM Programming Competitors 496
programming 494
COMPUTER PROGRAMMING 494
Programming with Python 494
Game Programming 494
ASLMU Programming 494
Programming 352
Programming 450
Programmation - Programming 480
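The limit/offset loop in Example 9-5 is a pattern you may want to reuse for other searches. One possible way to factor it out is a generator that keeps requesting pages until an empty one comes back. This is only a sketch: paginate and fake_fetch are hypothetical names, and the stand-in fetch function slices a local list so that the example runs without network access. In a real script, fetch_page could wrap a call such as gapi.request('search', {...})['data'] with the limit and offset filled in.

```python
def paginate(fetch_page, limit=100):
    """Yield items from successive pages until an empty page is returned.

    fetch_page(limit, offset) must return a list of items.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        if not page:
            break
        for item in page:
            yield item
        offset += limit


# Stand-in for a real API call: slices a local list instead of hitting the network
FAKE_RESULTS = ['group-%d' % i for i in range(250)]


def fake_fetch(limit, offset):
    return FAKE_RESULTS[offset:offset + limit]


items = list(paginate(fake_fetch, limit=100))
print(len(items))
```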
A sample web application that encapsulates most of the example code from this chapter and uses the same basic pattern is hosted on GAE, if you’d like to take it for a spin before laying down some code of your own. Figure 9-4 illustrates the results of our sample query for “programming” groups. Recall that you can install and fully customize the GAE-powered Facebook app yourself if that’s a better option for you than running scripts from a local console. In terms of productivity, it’s probably best to develop with local scripts and then roll functionality into the GAE codebase once it’s ready so as to maintain a speedy development cycle.
As you learned in the previous section, there’s not much overhead involved in writing simple routines to interact with the Graph API, because objects in the graph are simple and you’re passed URLs that you can use as-is to walk the object’s connections. For more advanced types of queries or certain workflows, however, you may find FQL to be a better fit for the problem.
Extensive FQL documentation exists online. I won’t go into too much depth here since the online documentation is authoritative and constantly evolving with the platform, but the gist is predictable if you’re familiar with basic SQL syntax. FQL queries have the form select [fields] from [table] where [conditions], but various restrictions apply that prevent FQL from being anything more than a carefully selected small subset of SQL. For example, only one table name can appear in the from clause, and the conditions that can appear in the where clause are limited (but usually adequate): the fields they reference must be marked as indexed in the FQL documentation. In terms of executing FQL queries on the Facebook platform at a 50,000-foot level, all that’s necessary is to send your FQL queries to one of two API endpoints: https://api.facebook.com/method/fql.query or https://api.facebook.com/method/fql.multiquery. The difference between them is discussed next.
Example 9-7 illustrates a simple FQL query that fetches the names, genders, and relationship statuses of the currently logged-in user’s friends.
Example 9-7. A nested FQL query that ties together user and connection data
select name, sex, relationship_status from user where uid in (select target_id from connection where source_id = me() and target_type = 'user')
This nested query works by first executing the subquery:
select target_id from connection where source_id = me() and target_type = 'user'
which produces a list of user ID values. The special me() directive is a convenient shortcut that corresponds to the currently logged-in user’s ID, and the connection table is designed to enable queries where you are looking up the currently logged-in user’s friends. Note that while most connections stored in the connection table are among users, other types of connections may exist among users and other object types, such as 'page', so the presence of a target_type filter is important. The outer query is then evaluated, which resolves to:
select name, sex, relationship_status from user where uid in ( ... )
from the user table. The FQL user table has a wealth of information that can enable many interesting types of analysis on your friends. Check it out.
The general form of the results set from this FQL query is shown in Example 9-8.
Example 9-8. Sample FQL results query
[
    {
        "name": "Matthew Russell",
        "relationship_status": "Married",
        "sex": "male"
    },
    ...
]
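Once decoded, a result set like this is just a list of dictionaries, so simple aggregate questions take only a few lines. As a hedged illustration, the sketch below counts how often each value of a field occurs; the helper name tally is hypothetical, and the rows are made-up placeholders in the shape of Example 9-8 rather than real data.

```python
from collections import Counter


def tally(rows, field):
    """Count how often each value of `field` occurs in a list of result rows."""
    return Counter(row.get(field) for row in rows)


# Made-up rows in the shape of Example 9-8
rows = [
    {'name': 'Friend A', 'relationship_status': 'Married', 'sex': 'male'},
    {'name': 'Friend B', 'relationship_status': 'Single', 'sex': 'female'},
    {'name': 'Friend C', 'relationship_status': 'Married', 'sex': 'female'},
]

print(tally(rows, 'relationship_status').most_common())
```

Rows that lack the requested field are counted under None, which can be a quick way to spot friends whose data your app is not permitted to see.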
An FQL multiquery works in essentially the same way, except that you can run multiple queries and reference query results as table names using the hash symbol. Example 9-9 is an equivalent FQL multiquery to the nested query shown previously.
Example 9-9. An FQL multiquery that ties together user and connections data
{
    "name_sex_relationships" : "select name, sex, relationship_status from user where uid in (select target_id from #ids)",
    "ids" : "select target_id from connection where source_id = me() and target_type = 'user'"
}
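Writing a multiquery by hand means juggling two layers of quoting, so it can be convenient to assemble it from a Python dictionary and let the json module handle the serialization. This is a minimal sketch under that assumption; build_multiquery is a hypothetical helper name, and the two queries are the ones from Example 9-9.

```python
import json


def build_multiquery(queries):
    """Serialize a {name: FQL string} mapping into a multiquery JSON string."""
    return json.dumps(queries, sort_keys=True)


queries = {
    'ids': "select target_id from connection "
           "where source_id = me() and target_type = 'user'",
    'name_sex_relationships': "select name, sex, relationship_status from user "
                              "where uid in (select target_id from #ids)",
}

print(build_multiquery(queries))
```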
Note that whereas the nested query returns a single list of objects, the FQL multiquery returns the results from both of its component queries, as Example 9-10 demonstrates.
Example 9-10. Sample FQL multiquery results
[
    {
        "fql_result_set": [
            {
                "target_id": -1
            },
            ...
        ],
        "name": "ids"
    },
    {
        "fql_result_set": [
            {
                "name": "Matthew Russell",
                "relationship_status": "Married",
                "sex": "male"
            },
            ...
        ],
        "name" : "name_sex_relationships"
    }
]
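Because each entry in a multiquery response carries the name of the query that produced it, the list is easy to turn into a lookup table. The sketch below shows one way to do that; index_multiquery is a hypothetical helper name, and the sample data is an abbreviated copy of Example 9-10.

```python
def index_multiquery(results):
    """Map each named result set to its rows: {name: fql_result_set}."""
    return dict((r['name'], r['fql_result_set']) for r in results)


# Abbreviated response in the shape of Example 9-10
results = [
    {'name': 'ids', 'fql_result_set': [{'target_id': -1}]},
    {'name': 'name_sex_relationships',
     'fql_result_set': [{'name': 'Matthew Russell',
                         'relationship_status': 'Married',
                         'sex': 'male'}]},
]

by_name = index_multiquery(results)
print(by_name['name_sex_relationships'][0]['name'])
```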
Programmatically, the query logic is pretty simple and can be wrapped up into a small class. Example 9-11 demonstrates an FQL class that can take a query from the command line and run it. Here are a couple of sample queries that you could try running:
$ python facebook__fql_query.py 'select name, sex, relationship_status from user where uid in (select target_id from connection where source_id = me())'
$ python facebook__fql_query.py '{"name_sex_relationships" : "select name, sex, relationship_status from user where uid in (select target_id from #ids)", "ids" : "select target_id from connection where source_id = me()"}'
Example 9-11. Encapsulating FQL queries with a small Python class abstraction (facebook__fql_query.py)
# -*- coding: utf-8 -*-

import sys
from urllib import urlencode
import json
import urllib2
from facebook__login import login


class FQL(object):

    ENDPOINT = 'https://api.facebook.com/method/'

    def __init__(self, access_token=None):
        self.access_token = access_token

    def _fetch(self, url, params=None):
        # POST the URL-encoded parameters and decode the JSON response
        conn = urllib2.urlopen(url, data=urlencode(params or {}))
        try:
            return json.loads(conn.read())
        finally:
            conn.close()

    def query(self, q):
        # A query that starts with '{' is treated as a multiquery
        if q.strip().startswith('{'):
            return self.multiquery(q)
        else:
            params = dict(query=q, access_token=self.access_token, format='json')
            url = self.ENDPOINT + 'fql.query'
            return self._fetch(url, params=params)

    def multiquery(self, q):
        params = dict(queries=q, access_token=self.access_token, format='json')
        url = self.ENDPOINT + 'fql.multiquery'
        return self._fetch(url, params=params)


# Sample usage...

if __name__ == '__main__':
    try:
        ACCESS_TOKEN = open('out/facebook.access_token').read()
        Q = sys.argv[1]
    except IOError, e:
        try:

            # If you pass in the access token from the Facebook app as a command line
            # parameter, be sure to wrap it in single quotes so that the shell
            # doesn't interpret any characters in it. You may also need to escape
            # the # character

            ACCESS_TOKEN = sys.argv[1]
            Q = sys.argv[2]
        except IndexError, e:
            print >> sys.stderr, \
                "Could not either find access token in 'facebook.access_token' or parse args."
            ACCESS_TOKEN = login()
            Q = sys.argv[1]

    fql = FQL(access_token=ACCESS_TOKEN)
    result = fql.query(Q)
    print json.dumps(result, indent=4)
The sample GAE app provided as part of this chapter’s source code detects and runs FQL queries as well as Graph API queries, so you can use it as a sort of playground to experiment with FQL. With some basic infrastructure for executing queries now in place, the next section walks through some use cases for building data-powered visualizations and UI widgets.