Chapter 2. Microformats: Semantic Markup and Common Sense Collide

In terms of the Web’s ongoing evolution, microformats are an important step forward because they provide an effective mechanism for embedding “smarter data” into web pages and are easy for content authors to implement. Put succinctly, microformats are simply conventions for unambiguously embedding structured data in web pages in an entirely value-added way. This chapter begins by briefly introducing the microformats landscape and then digs right into some examples involving specific uses of the XFN (XHTML Friends Network), geo, hRecipe, and hReview microformats. In particular, we’ll mine human relationships out of blogrolls, extract coordinates from web pages, parse out recipes from foodnetwork.com, and analyze reviews on some of those recipes. The example code listings in this chapter aren’t intended to be “full spec parsers,” but they should be more than enough to get you on your way.

Although it might be somewhat of a stretch to call data decorated with microformats like geo or hRecipe “social data,” it’s still interesting and will inevitably play an increased role in social data mashups. At the time this book was written, nearly half of all web developers reported some use of microformats, the microformats.org community had just celebrated its fifth birthday, and Google reported that 94% of the time, microformats are involved in Rich Snippets. If Google has anything to say about it, we’ll see significant growth in microformats; in fact, according to ReadWriteWeb, Google wants to see at least 50% of web pages contain some form of semantic markup and is encouraging “beneficial peer pressure” for companies to support such initiatives. Any way you slice it, you’ll be seeing more of microformats in the future if you’re paying attention to the web space, so let’s get to work.

XFN and Friends

Semantic web enthusiasts predict that technologies such as FOAF (Friend of a Friend, an ontology describing relations between people, their activities, etc.) may one day be the catalyst that drives robust decentralized social networks, the antithesis of tightly controlled platforms like Facebook. So-called semantic web technologies such as FOAF have not yet reached the tipping point that would lead them into ubiquity, but this isn’t too surprising. If you know much about the short history of the Web, you’ll recognize that innovation is rampant and that the highly decentralized way in which the Web operates is not very conducive to overnight revolutions (see Chapter 10). Rather, change seems to happen continually, fluidly, and in a very evolutionary way. The way that microformats have evolved to fill the void of “intelligent data” on the Web is a particularly good example of bridging existing technology with up-and-coming standards that aren’t quite there yet. In this particular case, it’s a story of narrowing the gap between a fairly ambiguous web, primarily based on the human-readable HTML 4.01 standard, and a more semantic web in which information is much less ambiguous and friendlier to machine interpretation.

The beauty of microformats is that they provide a way to embed data that’s related to social networking, calendaring, resumes, shared bookmarks, and much more into existing HTML markup right now, in an entirely backward-compatible way. The overall ecosystem is quite diverse: some microformats, such as geo, are quite established, while others are slowly gaining ground and achieving newfound popularity with search engines, social media sites, and blogging platforms. As this book was written, notable developments in the microformats community were underway, including an announcement from Google that they had begun supporting hRecipe as part of their Rich Snippets initiative. Table 2-1 provides a synopsis of a few popular microformats and related initiatives you’re likely to encounter if you look around on the Web. For more examples, see http://microformats.org/wiki/examples-in-the-wild.

Table 2-1. Some popular technologies for embedding structured data into web pages

| Technology | Purpose | Popularity | Markup specification | Type |
|---|---|---|---|---|
| XFN | Representing human-readable relationships in hyperlinks | Widely used, especially by blogging platforms | Semantic HTML, XHTML | Microformat |
| geo | Embedding geocoordinates for people and objects | Widely used, especially by sites such as MapQuest and Wikipedia | Semantic HTML, XHTML | Microformat |
| hCard | Identifying people, companies, and other contact info | Widely used | Semantic HTML, XHTML | Microformat |
| hCalendar | Embedding iCalendar data | Steadily gaining traction | Semantic HTML, XHTML | Microformat |
| hResume | Embedding resume and CV information | Widely used by sites such as LinkedIn[a] | Semantic HTML, XHTML | Microformat |
| hRecipe | Identifying recipes | Widely used by niche sites such as foodnetwork.com | Semantic HTML, XHTML | Microformat |
| Microdata | Embedding name/value pairs into web pages authored in HTML5 | An emerging technology, but gaining traction | HTML5 | W3C initiative |
| RDFa | Embedding unambiguous facts into XHTML pages according to specialized vocabularies created by subject-matter experts | Hit-or-miss depending on the particular vocabulary; vocabularies such as FOAF are steadily gaining ground while others are remaining obscure | XHTML[b] | W3C initiative |
| Open Graph protocol | Embedding profiles of real-world things into XHTML pages | Steadily gaining traction and has tremendous potential given the reach of the Facebook platform | XHTML (RDFa-based) | Facebook platform initiative |

[b] Embedding RDFa into semantic markup and HTML5 is an active effort at the time of this writing. See the W3C HTML+RDFa 1.1 Working Draft.

There are many other microformats that you’re likely to encounter, but a good rule of thumb is to watch what the bigger fish in the pond—such as Google, Yahoo!, and Facebook—are doing. The more support a microformat gets from a player with significant leverage, the more likely it will be to succeed and become useful for data mining.
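
To make the idea of embedding relationship data in ordinary hyperlinks a little more concrete, here is a minimal sketch of how XFN values might be pulled out of a blogroll using only Python’s standard library. It is not a full spec parser and is not meant to stand in for the listings developed in this chapter; the sample markup, the example.com URLs, and the names XFNLinkParser and extract_xfn_links are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Relationship values defined by the XFN 1.1 profile.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNLinkParser(HTMLParser):
    """Collects (href, [xfn values]) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel is a space-separated list; keep only the values XFN defines.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        xfn_tokens = sorted(set(rel_tokens) & XFN_VALUES)
        if href and xfn_tokens:
            self.links.append((href, xfn_tokens))


def extract_xfn_links(html):
    """Return a list of (href, [xfn values]) for links carrying XFN markup."""
    parser = XFNLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical blogroll fragment; the URLs are placeholders.
SAMPLE_BLOGROLL = """
<ul>
  <li><a href="http://example.com/alice" rel="friend met">Alice</a></li>
  <li><a href="http://example.com/bob" rel="co-worker">Bob</a></li>
  <li><a href="http://example.com/about" rel="nofollow">About</a></li>
  <li><a href="http://example.com/me" rel="me">My other site</a></li>
</ul>
"""

if __name__ == "__main__":
    for href, relationships in extract_xfn_links(SAMPLE_BLOGROLL):
        print(href, relationships)
```

Note that the link marked rel="nofollow" is skipped: nofollow is a legitimate rel value but not part of XFN, which is why the sketch filters against an explicit set of XFN values. A browser that knows nothing about XFN still renders every one of these links normally, which is the backward compatibility described above.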
