CHAPTER 6 Understanding Markup

Markup Matters
Many Meanings, Many Markups
What’s Your CMS Got to Do with It?
The Semantics of...“Semantic”
The Lowdown on Markdown
Many Ways to Get to Markup
The Secret to Markup

You have an understanding of your content’s inherent structure. You’ve thought about its descriptive metadata, and about the rules and conditions that will help give it life. But how will it keep that shape and form as it travels beyond your CMS and outside the bounds of a single site?

More often than not, the answer is markup.

Just like an editor marks up a book before it goes to print—adding in-line notes that dictate where a block quote should start and stop, or when to use bullets—markup is a way to add directions to your content about what different pieces of text are, allowing you to make automated decisions about what those pieces of text should do when they’re displayed. In other words, it’s the code that wraps around your content chunks and lends them machine-readable meaning.

In this chapter, we’ll take a look at why markup matters for content, explore the types of markup you may hear mentioned in conversations with technical teams (as well as which ones are likely to be used by whom), and see why knowing just a bit about this oft-mystified m-word can help you make better decisions about how you plan, structure, write, and share content.

Markup Matters

Let’s say you’ve got some bright, shiny new content. It’s brilliant! It’s perfect! It’s ready! Yet all that care you’ve given it won’t matter one bit if the systems it encounters on its way to publication don’t understand what it is and what to do with it. This content, smart and stylish as it might seem at first, is silent—incapable of describing itself to anyone but a human who’s actually reading it.

That’s how markup can help: It gives your content a voice that other machines can understand, making it capable of describing itself and allowing it to keep its style and soul intact as it flexes to meet the demands of different devices and experiences.

Many Meanings, Many Markups

But what is markup, really? It can seem a little opaque to the uninitiated, but don’t let that slow you down. It’s simply code that carries the content chunks and metadata you’ve already outlined wherever they need to go. The code itself might vary depending on the type of markup you’re using (more on that later, don’t worry). But the idea is about the same: wrapped around each meaningful bit is a snippet of code that tells machines what that chunk is.

In addition to marking up the chunks of content your users actually consume, markup also lends any descriptive metadata you’ve collected about a piece of content a life of its own—so if that copy deck is from a story about, say, the city of San Francisco, you’d want markup that keeps that topical tag with the content as well.

At a very basic level, you can divide markup approaches into two categories: the stuff designed to make it look pretty, and the stuff that actually makes it smarter.

Presentational

If you’re used to a word processor like Microsoft Word, presentational markup is the equivalent of changing your font to bold and 18 point every time you want to make a headline, as opposed to using Word’s “styles” section, where you can label those headlines as headings instead of simply making them bigger. When you mark up content with presentational tags, you’ll only be able to describe how the content should look, like adding an HTML <font color="purple"> tag to turn a specific line of text a lovely shade of lilac, or increasing font size for emphasis. You usually add it directly into the content’s HTML or by using the WYSIWYG editor in a CMS, hit preview, and ta-da: everything looks perfect (well, if you like purple, I guess).

Or does it? Let’s say you want to make a headline stand out, so you beef the font size up by 20 points and take a peek at the page on your desktop. Perfect. But then let’s say that same content will eventually be seen on a mobile-optimized version of the site, on a partner’s site, and on a Kindle. Does that mega-size headline fit all those different formats? Does that formatting even render on all those screens? And what happens next year, when you update your branding and have to manually go through and replace all that presentational markup with whatever the new style guide dictates?

In a sense, this kind of markup is a lot like makeup. While my olive-skinned friend looks fantastic in scarlet lipstick, that same shade gives my face a particularly ghastly pallor. We may both want to emphasize our lips, but that doesn’t mean we should use the same color to do so.

When you use presentational markup, that’s precisely what you’re doing: locking content into just one shade, one way to look...even when it’s being displayed in places where it looks silly.

If you can’t be sure exactly where and how your content is going to be displayed, both now and in the future, then purely presentational markup simply doesn’t have enough power to pass muster.

Semantic

Semantic markup, on the other hand, is designed to reveal, in a machine-readable way, the intrinsic meaning in your content, and to provide the machines that read it information they can use to apply a style sheet that determines how it should be displayed. It gives your content information about itself—telling it things like “this is a headline,” rather than “this should be in large type.”

The further away from a traditional “desktop website” your content travels, the more this distinction matters—because those entering content will never be able to anticipate, much less design their content around, all the different (and unknown) places and devices where it might be viewed.

Semantic markup is useful because it allows those putting content into the system to control their work’s meaning, while leaving control for how it looks to those responsible for the platform where that content will be seen. That is, instead of dictating design from inside the database, aesthetics are controlled at the output level.
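To make that split concrete, here’s a minimal Python sketch. The chunk labels (`headline`, `body`) and the two renderer functions are made up for illustration and don’t come from any particular CMS. The content only says what each chunk is; each output channel decides how to show it.

```python
# A piece of content labeled by meaning, not by appearance.
# The labels "headline" and "body" are hypothetical names for this sketch.
article = [
    ("headline", "Markup Matters"),
    ("body", "Semantic markup says what a chunk is."),
]

def render_for_web(chunks):
    """Desktop site: map each semantic label to an HTML element."""
    tags = {"headline": "h1", "body": "p"}
    return "\n".join(f"<{tags[kind]}>{text}</{tags[kind]}>" for kind, text in chunks)

def render_for_plain_text(chunks):
    """A text-only channel: same labels, different presentation."""
    lines = []
    for kind, text in chunks:
        lines.append(text.upper() if kind == "headline" else text)
    return "\n".join(lines)

web_output = render_for_web(article)
text_output = render_for_plain_text(article)
print(web_output)
print(text_output)
```

Notice that nobody had to touch the content to get a second look. Updating the branding would mean changing a renderer, not every headline.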

Remember the concept of “metadata is the new art direction” from Chapter 5, “Designing Content Systems”? Well, the more self-aware your content is, the easier it is to implement those rule-based layouts—and, you know, make them not suck. And the more you, the one who’s taken time to know your content well, are involved in this work, the better that markup will be.

In short, semantic markup is the code that keeps all that thinking you did in Chapters 3, 4, and 5 intact—the code that ensures your carefully modeled content, and the meaning that structure gives it, stay strong, allowing it to withstand the stresses of shifting across time and space. And it’s precisely this sort of markup—the markup that says to “emphasize the lips,” rather than “use red lipstick”—that we want to focus our discussion on today.

What’s Your CMS Got to Do with It?

If you’re used to authoring content in a CMS, all this talk of code wrapping around your content might seem out of place. Isn’t that what the CMS is for? Well, yes...sort of. But here’s the deal: When you enter content in a CMS, you’re dealing with its interfaces—the external-facing screens that allow you to manipulate the content that’s stored in an underlying database.

When your CMS interfaces match up with your content model, showing fields for each of the critical components rather than a wide-open text box with a WYSIWYG editor, then the content you’re sending into that underlying database will be structured. From there, markup that matches your content model can be applied automatically.
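Here’s a rough sketch of what “applied automatically” could look like, assuming a hypothetical `Story` model with three fields and a made-up `to_html` function. A real CMS would do this through its own templates; the point is only that the author fills in fields and never types a tag.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Story:
    """Hypothetical content model: one field per meaningful chunk."""
    headline: str
    copy_deck: str
    body: str

def to_html(story: Story) -> str:
    """Apply markup from the model; the author never types a tag."""
    return (
        "<article>\n"
        f"  <h1>{escape(story.headline)}</h1>\n"
        f'  <p class="copy-deck">{escape(story.copy_deck)}</p>\n'
        f"  <p>{escape(story.body)}</p>\n"
        "</article>"
    )

story = Story(
    headline="Fog & the City",
    copy_deck="Why San Francisco summers stay cool.",
    body="The marine layer rolls in most afternoons.",
)
html_output = to_html(story)
print(html_output)
```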

Many CMSs aren’t quite there yet, so authors and editors add presentational markup to their text for an understandable reason: it seems to give them some control over how it looks. The problem is, when this happens, that presentational information (like our lilac example) gets stored right alongside the content itself, forever. And the result is content that’s messy and less manageable over time.

Instead, the future demands that content be stored in a way that’s independent of the code that marks it up, but structured so it can be translated to use whatever markup you need—and, as you’ll see in a moment, there are many types of markup out there.

While not all CMSs are prepared for this sort of content management today, the only way they’re going to get better is if folks like you and me start asking them to.

The Semantics of...“Semantic”

All right, you understand that markup exists, and that it helps your content retain the meaning and relationships you’ve already defined. Now let’s get into the confusing bit. Because unfortunately, the term “semantic markup” is, well, semantically a little unclear. As you start working with structured content, you’re likely to hear many people talk about semantic markup, yet meaning substantially different things.

Broadly speaking, these folks tend to be talking about either HTML (or HTML-like) markup formats, or much more complex, enterprise-level structural markup. Let’s take a moment to learn about both, so you can get more comfortable when they come up in conversation.

HTML Markup

Until recently, HTML included fairly limited semantic markup—that is, you could label a main headline with an <H1> or a chunk of text to be emphasized with <em>, rather than simply changing the font size or using the <i> tag for italics. While these tags are semantic—as in, they refer to the content’s substance rather than its presentation—they’re fairly weak, giving us little to go by when it comes to what the content actually means.

Today, however, the semantic possibilities of HTML have increased substantially, allowing people to use several approaches to HTML-based markup that give content much more semantic richness.

Microformats

Microformats are an open data standard that builds on HTML to add metadata to pieces of content, identifying information as something specific, like a “person” or a “location.” Because microformats are an open standard, many industries and organizations have added new microformats for specialty data. They work by adding specific classes to snippets of HTML, with those classes defining what the content within the snippet is. Microformats can be used to lend machine-readable meaning to incredibly small pieces of content, like a date or time, even if they’re in the middle of a paragraph of other text.
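As a small illustration, the sketch below marks up a name and an organization inside an ordinary sentence using hCard class names (`vcard`, `fn`, `org`), then reads them back out with Python’s built-in HTML parser. The person, the company, and the `ClassTextCollector` helper are all invented for this example, and the parser is deliberately simple.

```python
from html.parser import HTMLParser

# A sentence with hCard classes on small pieces of text.
# The person and company are invented for this example.
snippet = (
    '<p class="vcard">Contact <span class="fn">Ada Example</span> at '
    '<span class="org">Example Co.</span> for details.</p>'
)

class ClassTextCollector(HTMLParser):
    """Collect the text found inside elements that carry the wanted classes.

    Simplification: assumes every element has a closing tag.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.open_classes = []   # one entry per currently open element
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.open_classes.append(classes)

    def handle_endtag(self, tag):
        if self.open_classes:
            self.open_classes.pop()

    def handle_data(self, data):
        if not self.open_classes:
            return
        # Only the innermost open element claims this text.
        for name in self.open_classes[-1]:
            if name in self.wanted:
                self.found[name] = self.found.get(name, "") + data

collector = ClassTextCollector(["fn", "org"])
collector.feed(snippet)
print(collector.found)
```

To a human, the paragraph reads like any other sentence. To a machine, the class names pick out which words are the person and which are the organization.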

HTML5 Microdata

New in the HTML5 spec is the microdata extension, which is built off earlier microformats work. Microdata goes beyond traditional presentational HTML tags and allows you to mark up content with standards-compliant, semantically rich HTML—for example, marking up content as an “event” or “organization.” However, HTML5 is, as of this writing, still in working draft status. That means not all of these new extensions are universally supported, and some may not reach mass adoption.

Schema.org

Schema.org is an HTML5-based approach launched in 2011 by Bing, Google, and Yahoo! to create a common language across search engines. It arranges HTML5 microdata into taxonomies of content types that start broadly and branch into ever-more-specific elements. Its provenance in the big search engines may give it some weight and staying power, but it’s also contested by those who see it as a few big players attempting to force their ontology onto everyone else.
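Here’s roughly what that looks like in practice: a snippet that uses the microdata attributes (`itemscope`, `itemtype`, `itemprop`) with Schema.org’s Event type, plus a tiny reader written with Python’s built-in HTML parser. The event details and the `MicrodataReader` class are made up for this sketch, and it only handles one flat item, not nested ones.

```python
from html.parser import HTMLParser

# An event marked up with HTML5 microdata attributes and a Schema.org type.
# The event details are invented for this example.
snippet = (
    '<div itemscope itemtype="http://schema.org/Event">'
    '<span itemprop="name">Content Strategy Meetup</span> on '
    '<time itemprop="startDate" datetime="2013-03-01">March 1</time>'
    '</div>'
)

class MicrodataReader(HTMLParser):
    """Read the item type and simple itemprop values from one flat item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self.current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop and "datetime" in attrs:
            # For <time>, the machine-readable value lives in the attribute.
            self.properties[prop] = attrs["datetime"]
        elif prop:
            # Otherwise the value is the text that follows the start tag.
            self.current_prop = prop

    def handle_data(self, data):
        if self.current_prop:
            self.properties[self.current_prop] = data.strip()
            self.current_prop = None

reader = MicrodataReader()
reader.feed(snippet)
print(reader.item_type, reader.properties)
```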

Taken together, all these HTML-based markup approaches represent a way to make chunks of content much easier for machines to read and parse, and for information like dates, addresses, people, and other common entities to be universally understood. However, the meaning you can glean from any of these forms of markup is still somewhat limited. For example, while HTML5 now offers an “aside” tag to use for information that’s secondary to the main content, there’s nothing about that tag that would give a system receiving that content information about what that aside actually is, or how it fits into the rest of the content with which it’s associated.

Structural Markup

That’s where the other kind of semantic markup comes in. Those in fields like technical communications tend to take a more enterprise-level approach to markup, creating massive systems of content structure that are comparable to databases. Unlike HTML markup and its limited semantic elements like “aside,” this sort of markup is generally capable of as many specifics as you need.

For example, let’s say you have a chunk of content you’ve defined in your model as a copy deck—a short teaser that leads into a story. With these types of markup, that content would be stored as a copy deck in your database, and a label would then be present in the markup for that content when it’s displayed beyond the database. If that content travels to a system using a common language, the receiving system will immediately know it’s a copy deck.

There are several common approaches to this kind of markup that you might hear about.

XML

The mother of many markup approaches, XML (Extensible Markup Language) is designed to structure, store, and transport information using a set of rules to mark up text with metadata. As opposed to HTML, XML allows you to define your own tags, so you’re not limited to a preset list of entities (that’s why it’s called “extensible”). Because of this, it forms the basis for a number of the other markup approaches listed next, but this also gives XML plenty of critics who say it’s too clunky and difficult to write and use, as well as those who say it’s too generic to be a standard.
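To see what “define your own tags” means, here’s a short sketch using Python’s standard XML library. The tag names (`story`, `headline`, `copydeck`, `topic`) are hypothetical; they simply mirror the copy deck example from earlier and aren’t part of any published standard.

```python
import xml.etree.ElementTree as ET

# Tag names here (story, headline, copydeck, topic) are made up for this
# sketch; XML lets you define whatever names match your content model.
story = ET.Element("story")
ET.SubElement(story, "headline").text = "Fog and the City"
ET.SubElement(story, "copydeck").text = "Why San Francisco summers stay cool."
ET.SubElement(story, "topic").text = "San Francisco"

xml_text = ET.tostring(story, encoding="unicode")
print(xml_text)

# A receiving system that shares these tag names can find the copy deck by
# its label, with no guessing based on font size or position.
received = ET.fromstring(xml_text)
copy_deck = received.findtext("copydeck")
print(copy_deck)
```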

RDF

RDF, which stands for Resource Description Framework, is a generic method used to describe concepts—specifically, to describe things and their relationships with other things. It can be written using a variety of other languages, including both XML and JSON. It’s the glue that holds linked data together, providing a language for describing data by using three elements to form a machine-readable statement: a subject, a predicate, and an object—such as stating that the Declaration of Independence has an author of Thomas Jefferson. However, in practice, RDF is currently used relatively little.
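The subject, predicate, object pattern is easy to picture in code. The sketch below is not real RDF (plain strings stand in for the URIs RDF would normally use, and `objects_of` is a made-up helper), but it shows how statements shaped this way can be queried by a machine.

```python
from collections import namedtuple

# One RDF-style statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Plain strings stand in for the URIs that real RDF would use.
statements = [
    Triple("Declaration of Independence", "has author", "Thomas Jefferson"),
    Triple("Thomas Jefferson", "born in", "Virginia"),
]

def objects_of(subject, predicate, triples):
    """Return every object that completes the given subject and predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]

authors = objects_of("Declaration of Independence", "has author", statements)
print(authors)
```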

OWL

Web Ontology Language, somewhat confusingly abbreviated as OWL, is built on RDF and typically expressed using XML. It has more vocabulary than RDF, and can therefore express more complex relationships and richer properties. For example, you can create a single statement to express that two concepts have a symmetrical relationship: not just that my husband is my spouse, but also that his spouse is me. With RDF alone, this would take two statements to communicate.
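OWL’s name for this kind of declaration is `owl:SymmetricProperty`. The sketch below isn’t OWL and doesn’t use an OWL reasoner; it’s plain Python with invented names, meant only to show the idea that declaring a property symmetric once lets a machine work out the reverse statement for you.

```python
# Facts are (subject, predicate, object) tuples; the names are invented.
facts = {("Alex", "spouse", "Sam")}

# Declaring a property symmetric once replaces writing every reverse fact.
symmetric_properties = {"spouse"}

def with_symmetric_inferences(facts, symmetric_properties):
    """Return the facts plus the reverse of each symmetric statement."""
    inferred = set(facts)
    for subject, predicate, obj in facts:
        if predicate in symmetric_properties:
            inferred.add((obj, predicate, subject))
    return inferred

all_facts = with_symmetric_inferences(facts, symmetric_properties)
print(sorted(all_facts))
```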

DITA

If you’ve worked with (or as) a technical communicator, you’ve likely heard of the XML-based data model called DITA, or Darwin Information Typing Architecture. Designed by IBM to handle its own technical content, DITA works with modularized content to organize it into categories based on topic. Because of this, it tends to be really good at structuring things like help content, but may not be useful for all kinds of content.

JSON

If you start paying attention to markup conversations, you’ll probably also hear a lot about JSON—and particularly how it compares to XML, with proponents and detractors on both sides. JSON, or JavaScript Object Notation, is a lightweight data interchange format designed to be easier to read and write, and also easier for computers to parse, than XML. Many organizations now offer both JSON and XML versions of their APIs, which we’ll talk about at length in Chapter 7, “Making Sense of Content APIs,” making the same structured content available in either language.
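If it helps to see the two side by side, here’s the same small chunk of content written out both ways with Python’s standard libraries. The field names and the `story` root tag are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical field names for one structured chunk of content.
content = {"headline": "Fog and the City", "topic": "San Francisco"}

# JSON version: the structure maps directly onto keys and values.
json_text = json.dumps(content)

# XML version: the same fields become child elements of a root tag.
root = ET.Element("story")
for field, value in content.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
```

Either way, the structure and the labels are the same. What differs is the punctuation around them.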

OWL, RDF, DITA, blah blah blah. All these markup languages can be hard to keep track of, especially if your job is more about the how and why of content than determining the best XML-based languages for API-driven mash-ups or whatever. Rather than getting lost in all the acronyms, it’s probably best to just understand that these different approaches exist, have a basic understanding of what they mean, and be ready to delve into specifics about their implication for your content’s structure later, when or if the need arises.

Ultimately, just keep in mind that your content will eventually become code. The more you know about how these systems work and what’s being used for what, the better you can evaluate your content’s needs against them and the more you can participate in conversations with those on the database end of the spectrum.

What About the Semantic Web?

Once you understand a bit about markup, and about making content machine-readable and interoperable, then it’s time to consider some of the exciting stuff that markup makes possible. One of those things is the Semantic Web: a Web where all content shares a common framework and can be shared, reused, and understood across systems—to the point where, say, machines know whether the term “blackberry” is referring to the fruit or the phone.

A completely semantic Web is a lofty goal—one not without its detractors, I might note—and our path toward it is still meandering at best. But a more semantic Web seems closer than ever with the recent advent of linked data, which is made possible through structured content and markup.

The term linked data, coined by Tim Berners-Lee (yes, the guy who invented the World Wide Web) in 2006, means exactly what it sounds like: bits of information that are linked to other, equivalent sets of data elsewhere on the Internet (often referred to as “in the cloud”), as illustrated in Figure 6.1. The idea is that, as opposed to HTML links, which link one document (e.g., a page) to another, linked data connects the things those pages are about by connecting the actual data behind those two pages instead. This gives both databases access to the information in the other, and that information then becomes more useful to both people and machines.

FIGURE 6.1
Linked data connects content from different places, like between your website and Wikipedia, based on shared content attributes—and it’s getting more and more useful for connecting content across sources.

For example, consider The New York Times. Since the 19th century, it’s been maintaining a tremendous index of people, organizations, places, and descriptors in the news. Starting in 1913, it began publishing that data first in a quarterly index, and later an annual one.¹ Now that its collection has been digitized, the Times has opened it up as linked data at http://data.nytimes.com, making this extensive list of topics—well over 10,000 as of this writing, with plans to continually add more—accessible to anyone who wants it.

What can you do with information like this? All kinds of things. At a basic level, you could extract links to all the stories about Syria. Getting more complex, you could automatically pull in detailed definitions, bios, and other supplementary content that your organization could never produce and maintain itself, richening your users’ experience without increasing your content production needs. And you can do all this without relying on time-consuming manual linking, or less-than-relevant automated content based on simple tags.
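A toy version of that idea, in Python, might look like the following. Everything here is invented for illustration: the `example.org` identifiers, the story titles, and the `enrich` function. It does not reflect how the Times’ data is actually structured. The point is only that when two datasets identify a topic the same way, a machine can connect them without anyone linking by hand.

```python
# Two made-up datasets that identify the same topic by a shared identifier.
# The example.org URIs and all records are invented for illustration.
our_stories = [
    {"title": "Aid convoy reaches border", "about": "http://example.org/topic/syria"},
    {"title": "Local bakery reopens", "about": "http://example.org/topic/bakeries"},
]
external_topics = {
    "http://example.org/topic/syria": {"label": "Syria"},
}

def enrich(stories, topics):
    """Attach the external topic label to each story that shares its identifier."""
    enriched = []
    for story in stories:
        topic = topics.get(story["about"])
        enriched.append({**story, "topic": topic["label"] if topic else None})
    return enriched

linked = enrich(our_stories, external_topics)
syria_titles = [s["title"] for s in linked if s["topic"] == "Syria"]
print(syria_titles)
```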

If you’re preparing content for the future, then all this stuff is important for a big reason. The more semantic you make your content now, the closer its markup will get us to this future—and the more cool stuff like this your content will be capable of.

Now who doesn’t want that?

The Lowdown on Markdown

Just because your content needs markup, that doesn’t mean you or those working in your content management system necessarily need to be able to write it, or even that you must use separate fields to distinguish every little bit of content from the others.

Instead, some organizations are experimenting with crafting their content in markdown, a lightweight alternative to HTML for giving content shape, created by John Gruber of Daring Fireball fame. Unlike full HTML, markdown allows authors to write in a standardized but natural language that can be easily read and understood not just by computers, but by humans as well.

For example, instead of the <h1> tag used in HTML, you can simply use a single hash mark to denote a heading:

# This text is an H1

Meanwhile, subheadings are made just by adding additional hash marks, like so:

## This text is an H2
##### This text is an H5

No need to close tags or fuss with angle brackets and slashes. Writing markdown is quick and simple, and it’s often used by people authoring common content types like blog posts.
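The heading rule above is simple enough to sketch in a few lines of Python. This is only an illustration of that one rule, with a made-up `heading_to_html` function; real markdown converters handle far more syntax than this.

```python
import re

def heading_to_html(line):
    """Convert one markdown heading line (1 to 6 hash marks) to HTML."""
    match = re.match(r"^(#{1,6})\s+(.*)$", line)
    if not match:
        return line  # not a heading; leave it alone
    level = len(match.group(1))   # number of hash marks = heading level
    text = match.group(2).strip()
    return f"<h{level}>{text}</h{level}>"

for sample in ["# This text is an H1", "##### This text is an H5"]:
    print(heading_to_html(sample))
```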

How can something as simple as markdown help get you to markup, though? Take Portland, Oregon–based mobile startup Cloud Four, where co-founder and developer Lyza Gardner started by playing with markdown for the firm’s own website, and then began exploring how it could be used for clients’ content as well.

As a developer, Gardner likes markdown because it takes away all her crutches and forces her to write semantically. If she were working directly in the code, she could add non-semantic cheats to get content to do what she wants: a <div> tag here, a float there. But then those elements, which give her what she wants right now, would end up muddying her database forever. Instead, when she writes in markdown, everything she does ends up completely semantic, without it being a burden to write.

Designed specifically for writing, not publishing, markdown doesn’t take the place of HTML. Instead, it has a smaller, simpler syntax associated with it—a syntax that’s all about writing text, rather than doing everything that HTML can do. But if you want to include an element that doesn’t exist in markdown’s syntax, you can simply start writing in HTML and markdown will understand.

In addition to keeping devs like Gardner from filling their content with code cheats rather than semantic solutions, markdown is also appealing to those who want to avoid checking long lists of CMS boxes and filling out endless fields. After all, it can be easier to add some markdown to a document and push it through the system than to go through five different editing screens for a piece of content.

Even better, markdown doesn’t have to stay markdown. Designed for conversion to other formats, it can easily be turned into any kind of markup language you need with one of a handful of automatic programs. In fact, Gardner has built markdown-to-CMS plug-ins for some of her clients, making it fast and efficient for users to author in markdown and have their content automatically make it into a CMS like Drupal.

Are all content authors ready to start writing in markdown? My gut says no. While human-readable, it’s a big step for folks still tied tightly to working in Microsoft Word. But many—especially those with writing and editing backgrounds, like journalists or editorial staff—could be, with a little training.

That training just might start with you, dear reader—the person who’s been thinking about structure and who can bridge the gap between the code and the content person who’s learning to publish more effectively online.

Markdown makes a lot of sense, but it’s no panacea to our content problems. Because it’s so stripped down, it doesn’t allow you to do things like select from a closed taxonomy or include descriptive metadata. It’s also not designed to create true content “chunks,” such as breaking out summaries or teasers away from the rest of the content. However, there are also an increasing number of extensions that add additional capabilities to markdown, some of which even allow markdown documents to house some forms of metadata.
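As one possible shape for that metadata, the sketch below assumes a document that opens with “Key: value” lines followed by a blank line, which is roughly the convention some markdown extensions use. The `split_metadata` function and the sample fields are hypothetical, and the parsing is much looser than a real extension would be.

```python
def split_metadata(document):
    """Split leading "Key: value" lines from the markdown body.

    Assumes the metadata block ends at the first blank line.
    """
    header, _, body = document.partition("\n\n")
    metadata = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # No metadata block after all; treat everything as body.
            return {}, document
        metadata[key.strip().lower()] = value.strip()
    return metadata, body

doc = "Title: Fog and the City\nTopic: San Francisco\n\n# Fog and the City\n\nBody text."
metadata, body = split_metadata(doc)
print(metadata)
print(body)
```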

In short, markdown isn’t some holy grail. It’s just one way smart people are trying to separate their content from its presentation, and make getting clean, easily stored content online more efficient. But for projects with many content types and lots of inherent structure, it may not be enough, at least not yet.

Many Ways to Get to Markup

Meanwhile, other organizations are trying completely different approaches, some of which don’t even seem to have a lot of structure at first glance. Take West Virginia University, a public higher education institution with more than 30,000 students. With countless departments and programs making endless updates to more than two dozen different websites, the university’s online presence is effectively in the hands of hundreds of people...and many of them are highly unlikely to have (or want) training in creating effective digital content.

For years, these fragmented CMS users created even more fragmented content, like faculty listings that were hard-coded and had to be manually updated to add or remove names. So while preparing for a major site and CMS overhaul, what’s a future-focused team to do? Crack down and insist on content processes that make markup easy? Fight the good fight, even when distributed departments don’t want to? Create endless CMS forms?

Instead, the team took a more experimental approach. Rather than asking CMS users to enter content into complex form fields, they are building a system that allows them to edit pages as they appear on the desktop site, literally the same view an end user gets. An editor types content into the page’s editable areas—a headline here, body content there, an address over on the right, etc.—and hits save. Once that content is saved, it’s mapped back to a structured database. Based on the page template that was used, the database knows the content type. Based on how the template’s fields were mapped, the database knows that a faculty member’s name, bio, and contact information were entered. And the CMS user? He simply knows that his page looks like he wants.

From there, that structured content can then be used to automatically update directory listings and other related and reusable content, without worrying the author over all the details.
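The mapping step might be pictured like this. To be clear, this is not WVU’s actual system: the template definition, region names, and `save_page` function are all assumptions made up to illustrate how text typed into page regions could end up as labeled fields.

```python
# Hypothetical template definition: each editable region on the page is
# mapped to a field in the structured database.
FACULTY_TEMPLATE = {
    "content_type": "faculty_profile",
    "regions": {"page-headline": "name", "main-body": "bio", "sidebar": "contact"},
}

def save_page(template, edited_regions):
    """Turn what the editor typed into a structured record."""
    record = {"content_type": template["content_type"]}
    for region, text in edited_regions.items():
        field = template["regions"].get(region)
        if field:  # ignore regions the template does not map
            record[field] = text.strip()
    return record

record = save_page(FACULTY_TEMPLATE, {
    "page-headline": "Dr. Pat Example",
    "main-body": "Teaches technical writing.",
    "sidebar": "pat@example.edu",
})
print(record)
```

Once the record exists, a directory listing can be built from every `faculty_profile` record’s `name` field, with no retyping.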

It’s not without its problems, of course. WVU knows it can’t get as specific with structure as some people might want with this approach. But it’s confident that it’s a step in the right direction, considering that to date, nearly every update—including updating the directory listings and a profile page separately when a new faculty member joins campus—has been made manually. It’s also difficult to validate certain kinds of content this way, like ensuring that the text entered in a field mapped back to “phone number” in the database actually was a phone number.
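For a sense of what that validation involves, here’s a deliberately loose check for US-style phone numbers. The pattern and the `looks_like_phone` function are assumptions for this sketch; real phone validation has to cope with extensions, international formats, and more.

```python
import re

# A deliberately loose pattern for US-style numbers such as 304-555-0123
# or (304) 555-0123. Real phone validation needs more than this.
PHONE_PATTERN = re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")

def looks_like_phone(value):
    """Return True when the text matches the loose phone pattern."""
    return bool(PHONE_PATTERN.match(value.strip()))

print(looks_like_phone("(304) 555-0123"))
print(looks_like_phone("See the department office"))
```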

The team at WVU has thought about all these weaknesses and more. But this approach gives them a shot at creating structured content and using semantic markup, which is so much more than they’ve ever had before.

The Secret to Markup

OK, here it is: There is none. The fact is, it depends—on your project, your priorities, your publication channels, and your purpose. While there’s good reason to want to standardize, getting religious about which one is “right” is likely less than productive.

The good part of all that? It might matter less than you think, especially if you’re working with the content itself. Because more than any specific markup type, you simply need content that’s capable of being marked up: content with clear structure and chunks that are based on meaning, not presentation. And, if you’ve been following the course of this book, you’re already on the way to having those figured out.

Also exciting is that more and more, the markup itself can be added after the fact with systems that can translate between different types of markup, seamlessly moving content stored in XML into an API that uses JSON, for example.
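A bare-bones version of that translation could look like the sketch below, which reads a flat XML record and emits JSON. The `xml_to_json` function and the sample record are hypothetical, and the conversion only handles one level of child elements, with no attributes or nesting.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_json(xml_text):
    """Convert a flat XML record (one level of child elements) to JSON."""
    root = ET.fromstring(xml_text)
    record = {child.tag: (child.text or "") for child in root}
    return json.dumps({root.tag: record})

stored = "<story><headline>Fog and the City</headline><topic>San Francisco</topic></story>"
api_payload = xml_to_json(stored)
print(api_payload)
```

The translation works because the content was stored with clear, meaningful labels in the first place. The labels carry over, and only the wrapper changes.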

Speaking of APIs, that’s what’s next. In Chapter 7, we’ll take a brief dive into the API world to understand more about what they are, how they can help get your content where it needs to go, and what content strategists, writers, IAs, and others on the nontechnical side of the spectrum need to know to make them successful.
