Learn about PDF Files

It seems like PDF files have been around for a long time—in fact, if you’re young enough, they may have been around for your entire life. Nonetheless, PDFs are younger than the Mac platform, and, in their short time on this planet, they have (just like you) gone through more than a few changes.

You don’t need to know most of the information in this chapter to use PDFpen productively. If you are eager to get going, feel free to jump ahead to the next chapter, Understand the Tools. But if you like to understand why as well as how, this brief chapter gives you background on how the PDF file format evolved, what it can contain, and what you can reasonably expect to be able to do with it.

A Short History of the PDF Format

Interestingly, HTML and PDF both originated from similar dreams at roughly the same time: the late 1980s. In the case of HTML’s creator, Tim Berners-Lee, the dream was to make the scientific papers being developed at the CERN particle physics lab in Switzerland available to all the CERN scientists using the lab’s computer network, regardless of the type of computer attached to the network. In the case of John Warnock, PDF’s creator and one of the founders of Adobe Systems, it was the dream of “being able to send full text and graphics documents (newspapers, magazine articles, technical manuals, etc.) over electronic mail distribution networks,” regardless of the type of computer receiving them.

Adobe had already achieved major success with its invention of PostScript, a computer language designed to describe the contents and layout of document pages in such a way that a printer (most notably, the first Apple LaserWriter printer) could print those documents faithfully at any resolution. PostScript, a device-independent language, became one of the foundations of PDF.

In 1991, the same year that HTML 1.0 was unleashed upon the world, Adobe introduced something at the Seybold conference that it called Interchange PostScript, or IPS—the first public mention of what would become PDF. IPS became known as PDF 1.0 at Comdex in late 1992. By the middle of the following year, Adobe released the first tool for editing and viewing PDF documents: Acrobat 1.0 (which Adobe originally called Carousel).

Until quite recently, the nature and capabilities of PDF were inextricably linked to the then-current version of Acrobat. Almost every major release of Acrobat was tied to a major revision of the PDF specification. A Wikipedia article summarizes the high points.

A Peek at What Is Inside

I said that PostScript became one of the foundations of the portable document format. That’s true, as far as it goes; however, a PDF file doesn’t contain actual PostScript code. Instead, it contains page-drawing instructions that are like PostScript instructions but simplified and designed for efficient processing. PDF instructions manipulate the objects displayed on a page, which roughly fall into three types:

  • Graphic path objects: These objects contain information about the lines, rectangles, and curves on a page, and how they are to be placed, drawn, and filled.

  • PDF image objects: You can think of these objects as raster images: streams of pixels, in specific colors, at a specific resolution, presented in a specific rectangular area. (The display on your Mac screen is a raster image, even if the items depicted there started out as something else.) The PDF image object is specific to the PDF specification. When you make a PDF file by, say, saving a webpage to PDF in Safari, the images on that webpage are converted into PDF image objects.

  • Text objects: These objects contain the text, font, and location information (and several other textual attributes) needed to represent text on a PDF page. The running text you see on a PDF page may consist of a lot of different text objects assembled together for viewing. These are just a stream of drawing instructions: Text objects don’t include the concept of words, paragraphs, and so on; they contain only information about how they are supposed to look, where they are to be placed, and the characters that are to be drawn.
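
To make the point about text objects concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how PDFpen or any real PDF library works. The CONTENT fragment is hand-written in the style of a PDF content stream, and the text_fragments function is a hypothetical helper that understands only this simplified one-operator-per-line layout. BT, ET, Tf, Td, and Tj are genuine PDF text operators; everything else is invented for the example.

```python
import re

# Hand-written fragment in the style of a PDF content stream (illustrative only).
# BT/ET begin and end a text object, Tf picks a font and size,
# Td moves the text position, and Tj draws a string of characters.
CONTENT = """
BT
/F1 12 Tf
72 720 Td
(Port) Tj
26 0 Td
(able Doc) Tj
48 0 Td
(ument) Tj
ET
"""

def text_fragments(stream):
    """Return (x, y, text) for each Tj operator, tracking cumulative Td moves.

    Hypothetical helper: it handles only the one-operator-per-line layout above.
    """
    x = y = 0.0
    fragments = []
    for line in stream.strip().splitlines():
        move = re.fullmatch(r"(-?[\d.]+) (-?[\d.]+) Td", line)
        show = re.fullmatch(r"\((.*)\) Tj", line)
        if line == "BT":
            x = y = 0.0  # each text object starts from a fresh position
        elif move:
            x += float(move.group(1))
            y += float(move.group(2))
        elif show:
            fragments.append((x, y, show.group(1)))
    return fragments

fragments = text_fragments(CONTENT)
for x, y, text in fragments:
    print(f"at ({x}, {y}): {text!r}")
```

Notice that the phrase "Portable Document" arrives as three positioned fragments whose boundaries ignore the word boundary entirely. Any notion of words or paragraphs has to be reconstructed by the program reading the file.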

Holding it all together is a great big tree structure from which hang the individual pages (each containing a bunch of objects) and all the other information that is necessary to print or display the PDF document. It is up to the PDF rendering program (such as Preview, Acrobat Reader, Safari, or PDFpen) to work its way down the tree, assemble the objects and related information that belong to each page, and draw those objects on some device, such as a screen or a printer.
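
The walk down the tree can be sketched in a few lines of Python. This is a simplified model, assuming plain dictionaries in place of real PDF objects: the Type and Kids keys mirror names used in PDF page trees, but the Contents lists of strings, the iter_pages and render functions, and the sample document are all hypothetical stand-ins for the example.

```python
def iter_pages(node):
    """Yield the page leaves of the tree in document order."""
    if node["Type"] == "Page":
        yield node
    else:  # an intermediate "Pages" node: descend into its children
        for kid in node["Kids"]:
            yield from iter_pages(kid)

def render(root):
    """Pretend to draw: list each page's objects in the order encountered."""
    output = []
    for number, page in enumerate(iter_pages(root), start=1):
        for obj in page["Contents"]:
            output.append(f"page {number}: draw {obj}")
    return output

# Hypothetical two-page document; the second page hangs from a nested branch.
document = {
    "Type": "Pages",
    "Kids": [
        {"Type": "Page", "Contents": ["path", "text"]},
        {
            "Type": "Pages",
            "Kids": [{"Type": "Page", "Contents": ["image"]}],
        },
    ],
}

for line in render(document):
    print(line)
```

A real rendering program does far more work at each step (resolving fonts, decoding compressed streams, applying colors), but the overall shape is the same: find each page in the tree, gather its objects, and draw them.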

This is a simplified view, to be sure; there are all sorts of other objects, such as form elements and annotations, that a PDF can contain as well. Such objects can (and often do) contain one or more of the three basic objects described here: a form object for a checkbox, for example, includes graphic path objects that describe its appearance.

On Having Realistic Expectations

As a quick look at the history of PDF and its internal structure reveals, PDF is not an editing format. PDF was designed to be a delivery format intended, ultimately, for the eyes of human readers.

Although the format has over time developed features that make machine parsing, analysis, and even editing more practical, PDF files are primarily intended to maintain their look across a wide range of devices: they are meant to be exact visual representations of printed pages, and almost everything about them is designed to make that representation more exact and efficient. Any information within the PDF specification that enhances editing was added as an afterthought and was not one of the original goals of the format’s developers.

Here’s a quick guide to some of the edits you can make within the limits of the format:

  • Touch up text: You can make small text revisions, such as fixing typos. However, don’t expect to add whole paragraphs within an existing text block (other than in PDFpen-created text imprints), or to move paragraphs seamlessly from one text block to another: the PDF specification doesn’t include a definition for paragraphs. Also, keep in mind that a PDF may use fonts that you don’t have on your devices; this may affect the appearance of edited text. See Add, Edit, and Remove Text for more details.

  • Adjust images: You can’t edit the details of an existing image (which is stored in a special PDF image format), but you can move it around, crop it, and delete it. You can also adjust its colors, make portions of it transparent (great for scans of signatures), and straighten it. See Add and Alter Pictures for more.

  • Add new text imprints and images: Although editing existing text within a PDF can be tricky, adding an entirely new, editable text imprint is far easier. Same goes for images: you can always plop a new one down on a page. See Add, Edit, and Remove Text and Add and Alter Pictures for more.

  • Annotate: You can add notes, comments, and various graphic objects to mark up a PDF document. You can also add audio annotations and even attach whole document files as annotations. You can mark up the text with colors, underscores, strikethroughs, and squiggles. See Take Notes on a PDF and Copyedit and Review a PDF.

  • Move pages around: You can modify the order in which pages appear, as well as add or remove them. See Create a PDF and Rearrange, Rotate, and Crop Pages for more details.

From the foregoing you might think that PDF editing resembles Dr. Johnson’s description of a dog walking on its hind legs: “It is not done well; but you are surprised to find it done at all.” That’s not really true. With a tool like PDFpen, the dog can walk much more gracefully than you might expect; it might even dance for you.
