Chapter 14. Working with PDF

Adobe’s Portable Document Format (PDF) is the file format that Adobe created to facilitate the distribution of multi-page documents. As seen in Chapter 2, “From QuickDraw to Quartz 2D,” Quartz 2D bases its imaging model on the graphics systems of PostScript and PDF. In the early days of PDF, a program called the Distiller would execute PostScript code and collect all the drawing commands into a PDF file. When a PDF viewer played back those commands, the user would see all of the drawing that would have been done by the original PostScript program without having to reinterpret that program itself.

Quartz 2D uses PDF as its metafile format. A metafile captures a series of drawing commands that create an image, rather than the image itself. The PDF files that Quartz generates record the paths, images, and text that an application draws into a PDF graphics context. This preserves the color fidelity, resolution independence, and other hallmark features associated with Quartz 2D graphics.

Not every PDF file is a Quartz 2D metafile, but Quartz 2D can draw the contents of PDF files that are not metafiles. This is a powerful feature that allows many applications on the system to enjoy the benefits of PDF’s vector graphics with the investment of a few lines of code.

The metafiles generated by Quartz 2D are PDF files, but Quartz is not a generalized tool for writing PDF files. Core Graphics does not handle many of the graphics and interactive features that are defined in the PDF specification. For example, the PDF specification includes shadings that you cannot create through Quartz 2D. PDF files can also contain interactive forms. Quartz 2D might display the contents of the forms but does not offer any features for creating those forms.

Note

Working with PDF

Quartz 2D does provide some routines that allow you to access the structure and content of a PDF file in its raw form. If you need very low-level access to the contents of a PDF file, you can use the CGPDFScanner API to get at the guts of PDF files. To scan the PDF file, your application registers a series of callbacks that you want Quartz 2D to invoke when it encounters certain PDF operators in a PDF stream. You can then call on Quartz to parse the PDF stream and invoke your callbacks using the CGPDFScanner API. If you want more information on using this mechanism to explore the structure and contents of PDF files, please read the “Parsing PDF Content” chapter of Apple’s Quartz 2D Programming Guide.

http://developer.apple.com/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/index.html

Mac OS X includes an additional technology known as PDFKit that allows you to access features of PDF that Quartz 2D does not. Through PDFKit your application can access and display PDF file annotations and work with the outlines embedded in PDF documents. For more information on PDFKit you can visit the Apple Developer Connection web site.

http://developer.apple.com/documentation/GraphicsImaging/Conceptual/PDFKitGuide/index.html

This chapter examines the facilities in Quartz 2D for creating new PDF documents and drawing existing documents as well as some of the information that Quartz 2D allows you to extract from PDF files.

Drawing PDF Documents

The technique for drawing PDF files in Quartz 2D is very similar to the technique for drawing bitmap images discussed in Chapter 8, “Image Basics.” Instead of creating a CGImage, an application must create an instance of the opaque data type CGPDFDocument. The processes used to create CGImages and CGPDFDocuments are very similar, though. With a PDF document object in hand, Quartz 2D can retrieve individual pages of the file as instances of the CGPDFPage abstract data type. Drawing that page onto a CGContext is as easy as calling CGContextDrawPDFPage.

Creating a CGPDFDocument

A CGPDFDocument obtains its PDF data from a CGDataProvider. CGDataProviders were first looked at in Chapter 8 when creating CGImages from bitmap data. Just to recap, Quartz 2D can create data providers for any of a number of data sources, and Quartz provides utility routines for creating data providers from popular sources. For example, two popular data sources are blocks of data in memory, which you can access with CGDataProviderCreateWithData, and files on disk, which you can access with a data provider created by CGDataProviderCreateWithURL. For custom data sources, the routine CGDataProviderCreateDirectAccess creates a data provider that uses a set of callbacks to obtain data.

Once you have created a data provider, the routine CGPDFDocumentCreateWithProvider will extract the PDF data and create a CGPDFDocument instance.

The case of reading PDF data from a file is so common that Quartz 2D also offers a utility routine, CGPDFDocumentCreateWithURL. This routine creates a data provider on your behalf and returns a CGPDFDocument that uses that data provider. As with CGImages, you will probably want to pass this routine a URL to a local resource most of the time. Core Graphics can handle URLs for remote files, but it has no mechanism to pass you progress information and errors. To provide a proper user experience with download progress bars and proper error handling, you are probably better off using another Mac OS X mechanism to download the file data.

Retrieving Pages

Quartz 2D does not draw PDF documents; it draws the pages in PDF documents. After creating a PDF document, you will have to extract the pages you want to draw from it. The routine CGPDFDocumentGetNumberOfPages returns the number of pages in a CGPDFDocument. An instance of the CGPDFPage opaque data type identifies an individual page in the document. Calling CGPDFDocumentGetPage retrieves the CGPDFPage from a document by its page number.

Drawing Pages

The CGContextDrawPDFPage routine does just what its name implies: it draws a CGPDFPage object into the context. Apart from the context itself, the only parameter to this routine is the CGPDFPageRef it should draw. Quartz 2D draws the page so that the lower left corner of the page’s mediaBox attribute is at the origin and aligned to the coordinate axes.

Speaking of the mediaBox, the PDF file format keeps track of several different rectangles, and the mediaBox is just one of them. The mediaBox is the rectangle that defines the physical boundaries of the media (usually paper) that the file’s author intended that page to be printed on. For example, if the page was intended for US letter sized paper, then the mediaBox would be 8.5 × 11 inches or 612 × 792 points.
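The arithmetic behind those figures is simple: PDF user space is measured in points, and there are 72 points to the inch. As a minimal sketch (the helper names here are illustrative and are not part of Quartz 2D):

```c
/* PDF user space is measured in points: 72 points to the inch. */
#define POINTS_PER_INCH 72.0

/* Convert a length in inches to PDF points. */
double InchesToPoints(double inches)
{
    return inches * POINTS_PER_INCH;
}

/* Convert a length in PDF points back to inches. */
double PointsToInches(double points)
{
    return points / POINTS_PER_INCH;
}
```

Passing 8.5 and 11.0 to InchesToPoints yields the 612 × 792 point mediaBox of a US letter page.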

In addition to the mediaBox, Quartz includes routines to retrieve any of these rectangles:

  • Crop box—This is a hint to the program displaying the PDF that the page’s image should be clipped to this rectangle. If no crop box is specified in the PDF data, the crop box defaults to the mediaBox.

  • Bleed box—This box has great merit for PDFs used in professional printing but will probably be of little use to applications outside of that realm. The bleed box specifies an area of the page that, when printed, contains important marks about how to process the sheet of paper the page was printed on after it comes off the press. For example, if the PDF contains a graphic that is to be printed on a cereal box, the bleed box might include labels that indicate what month the box was printed, where to cut away the excess cardboard, and where the box must fold to form its final shape. If no bleed box is specified in the PDF data, its value defaults to the crop box.

  • Trim box—The trim box is another professional printing tool. Professional printers will often print their images on large sheets and then cut those sheets down (trim them) to their final size. The trim box represents how much of the page will be left after trimming. Most applications will not need to work with the trim box, but Quartz 2D makes it available if you need it. If the trim box is not found in the PDF data, it also uses the crop box as its default.

  • Art box—Simply put, the art box is that area of the page that the PDF author considers to have meaningful content. Exactly what constitutes the art box is at the discretion of the program that generated the file. This box also defaults to the crop box if it is otherwise unspecified.
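The defaulting rules in this list form a short chain: the crop box falls back to the mediaBox, and the bleed, trim, and art boxes each fall back to the crop box. The sketch below models that chain in plain C. Every type and name in it is hypothetical and exists only to illustrate the rules; in a real application you would simply ask Quartz for the box with CGPDFPageGetBoxRect.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CGRect, in PDF points */
typedef struct { double x, y, width, height; } PageRect;

/* The boxes as a page might declare them. A has... flag is false when
   the PDF data did not specify that box. The mediaBox is always present. */
typedef struct {
    PageRect mediaBox;
    bool hasCropBox;
    PageRect cropBox;
    bool hasBleedBox;
    PageRect bleedBox;
    bool hasTrimBox;
    PageRect trimBox;
    bool hasArtBox;
    PageRect artBox;
} PageBoxes;

typedef enum {
    kSketchMediaBox, kSketchCropBox, kSketchBleedBox,
    kSketchTrimBox, kSketchArtBox
} SketchBoxKind;

/* Resolve the rectangle for a box, applying the defaults described above */
PageRect EffectiveBox(const PageBoxes *boxes, SketchBoxKind kind)
{
    /* The crop box defaults to the mediaBox */
    PageRect crop = boxes->hasCropBox ? boxes->cropBox : boxes->mediaBox;

    switch (kind) {
        case kSketchCropBox:
            return crop;
        /* The remaining boxes default to the (effective) crop box */
        case kSketchBleedBox:
            return boxes->hasBleedBox ? boxes->bleedBox : crop;
        case kSketchTrimBox:
            return boxes->hasTrimBox ? boxes->trimBox : crop;
        case kSketchArtBox:
            return boxes->hasArtBox ? boxes->artBox : crop;
        case kSketchMediaBox:
        default:
            return boxes->mediaBox;
    }
}
```

A page that specifies only a mediaBox therefore reports the same rectangle for all five boxes.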

Because Quartz 2D always draws the page at the origin when you call CGContextDrawPDFPage, coordinate system transformations are the only way to relocate it on the output device. The routine CGPDFPageGetDrawingTransform simplifies the task of creating an appropriate transformation. To use the routine, you select one of the boxes just described and the rectangle in user space the box will map into when you draw the page. The parameters to the routine can rotate the page by a multiple of 90 degrees, changing landscape pages to portrait orientations and vice-versa. The routine can also preserve the aspect ratio of the PDF box if the transform has a scaling component. Naturally CGPDFPageGetDrawingTransform returns a CGAffineTransform that should be concatenated onto the CTM of a context before a call to CGContextDrawPDFPage.
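To get a feel for what an aspect-preserving fit involves, the following plain C sketch computes a uniform scale and a translation that center one rectangle inside another. It is a simplified illustration only, not the implementation of CGPDFPageGetDrawingTransform: it ignores rotation entirely, and the real routine applies its own rules that are described in Apple’s reference documentation. The type and function names are hypothetical.

```c
/* Hypothetical stand-in for CGRect */
typedef struct { double x, y, width, height; } FitRect;

/* A uniform scale followed by a translation:
   x' = scale * x + tx,  y' = scale * y + ty */
typedef struct { double scale, tx, ty; } FitTransform;

/* Map box into target, preserving the aspect ratio of box and
   centering the result. Assumes box has a nonzero width and height. */
FitTransform AspectFitTransform(FitRect box, FitRect target)
{
    double scaleX = target.width / box.width;
    double scaleY = target.height / box.height;

    /* The more restrictive dimension decides the scale */
    double scale = (scaleX < scaleY) ? scaleX : scaleY;

    FitTransform fit;
    fit.scale = scale;
    /* Center the scaled box inside the target rectangle */
    fit.tx = target.x + (target.width - box.width * scale) / 2.0
             - box.x * scale;
    fit.ty = target.y + (target.height - box.height * scale) / 2.0
             - box.y * scale;
    return fit;
}
```

For example, fitting a 612 × 792 point page into a 306 × 792 rectangle halves the page (the width is the restrictive dimension) and shifts it up by 198 points so that it sits in the vertical center.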

A PDF Drawing Example

To demonstrate how this process comes together, the following sample draws a collage of the pages in a PDF document. Figure 14.1 shows the effect of running this code on a PDF with three pages.

Figure 14.1. Fanned Pages of a PDF File

The fan-shaped collage is created by reading a PDF from a file and then drawing successive pages with the appropriate transformations. This code is part of the DrawPDF sample application.

Example 14.1. Drawing a PDF Document

void DrawTravelBrochure(CGContextRef cgContext, CGRect viewBounds)
{
    CGFloat backgroundColor[] = { 0.7, 1.0 };

    CGColorSpaceRef grayColorSpace =
            CGColorSpaceCreateWithName(kCGColorSpaceGenericGray);

    /* draw the background color */
    CGContextSetFillColorSpace(cgContext, grayColorSpace);
    CGContextSetFillColor(cgContext, backgroundColor);
    CGContextAddRect(cgContext, viewBounds);
    CGContextFillPath(cgContext);

    // Adjust the origin for the rest of the image
    CGContextTranslateCTM(cgContext,
           CGRectGetMidX(viewBounds),
           viewBounds.origin.y + viewBounds.size.height / 8.0);

    // Grab a reference to the PDF file
    CFURLRef pdfURL = CFBundleCopyResourceURL(
            CFBundleGetMainBundle(),
            CFSTR("TravelBrochure"),
            CFSTR("pdf"), NULL);

    // Create a CGPDFDocument from it
    CGPDFDocumentRef pdfDocument = CGPDFDocumentCreateWithURL(pdfURL);

    // Count the number of pages in the document and calculate
    // a rotation angle from it (M_PI is declared in <math.h>)
    size_t numPages = CGPDFDocumentGetNumberOfPages(pdfDocument);
    float angleStep = (float)M_PI / (3.0f * (float)numPages);

    // Draw from the last page to the first so page 1 ends up on top
    for(size_t pageCtr = numPages; pageCtr > 0; pageCtr--) {
            // Retrieve the current page; the document owns it
            CGPDFPageRef currentPage =
                    CGPDFDocumentGetPage(pdfDocument, pageCtr);

            // Calculate a "nice" size for the page from its mediaBox
            CGRect mediaBox =
                    CGPDFPageGetBoxRect(currentPage, kCGPDFMediaBox);
            float suggestedHeight = viewBounds.size.height * 2.0 / 3.0;
            CGRect suggestedPageRect = CGRectMake(0, 0,
                    suggestedHeight *
                    (mediaBox.size.width / mediaBox.size.height),
                    suggestedHeight);

            CGContextSaveGState(cgContext);

            // Calculate the transform to position the page
            CGAffineTransform pageTransform =
                    CGPDFPageGetDrawingTransform(
                            currentPage,
                            kCGPDFMediaBox,
                            suggestedPageRect,
                            0, true);
            CGContextConcatCTM(cgContext, pageTransform);

            // Erase a rectangle with a shadow where the page will go
            CGContextSaveGState(cgContext);
            CGContextSetShadow(cgContext, CGSizeMake(8, 8), 5);
            CGContextAddRect(cgContext, mediaBox);
            CGContextFillPath(cgContext);
            CGContextRestoreGState(cgContext);

            // Draw the PDF page
            CGContextDrawPDFPage(cgContext, currentPage);

            // Draw a frame around the page
            CGContextAddRect(cgContext, mediaBox);
            CGContextStrokePath(cgContext);

            CGContextRestoreGState(cgContext);

            // The page came from a "Get" routine, so it is not released.
            // Rotate the coordinate system for the next page.
            CGContextRotateCTM(cgContext, angleStep);
    }

    // Release the objects this routine created or copied
    CGPDFDocumentRelease(pdfDocument);
    if(NULL != pdfURL) CFRelease(pdfURL);
    CGColorSpaceRelease(grayColorSpace);
}

The code begins by drawing the gray background using simple line art drawing techniques. It then positions the origin to the center of the drawing area and close to the bottom. This point will become the pivot of the fan.

The sample application stores its PDF in the application’s resources. The code retrieves a URL for the file and creates a CGPDFDocumentRef by calling CGPDFDocumentCreateWithURL. The code uses the routine CGPDFDocumentGetNumberOfPages to find out how many pages are in the PDF. The code uses the page count to control the loop that draws the individual pages.

Using the Painter’s Model, the loop stacks the pages from the last page to the first one. Inside the loop, the code retrieves a page from the document using CGPDFDocumentGetPage. There is a short calculation to determine a “nice” bounding rectangle for the page. The seed of that calculation is the mediaBox retrieved by CGPDFPageGetBoxRect.

The drawing routine needs to scale the PDF to fit within the rectangle it just calculated. It uses the routine CGPDFPageGetDrawingTransform to request a transformation that fits the mediaBox to the “nice” rectangle. The call to CGContextConcatCTM then concatenates this transformation onto the CTM of the context.

The short segment of code that follows uses Quartz 2D’s drop-shadow capability to draw a gray rectangle with a drop shadow where the computer will draw the PDF page. The CGContextDrawPDFPage routine draws the page’s image onto the context, and the drawing ends with a simple frame around the page to emphasize its boundaries.

After drawing the page, the code at the end of the loop simply adjusts the coordinate system for the next page.

Creating PDF Documents

Creating a PDF with Quartz 2D is a straightforward task. The first step is to create a graphics context using the methods of the CGPDFContext opaque data type. Any graphics drawn into this context will be recorded into the PDF file.

The CGPDFContext is different from other graphics contexts. Just as Quartz 2D draws pages of a PDF, not documents, it also draws in pages of a PDF Context and not the context alone. The routine CGPDFContextBeginPage creates a page in the PDF context. After drawing onto that page, a call to CGPDFContextEndPage tells the context that the page is complete.

Another behavior peculiar to the CGPDFContext is the fact that releasing the context is a vital part of the drawing process. Quartz 2D will not finish writing the PDF data collected by a context to its destination until it ensures that all the pages have been added. Releasing the context is the signal that tells Quartz 2D no more graphics are forthcoming. The library will finish generating and writing the PDF data.

Creating PDF Contexts

When reading PDF documents and images, Quartz 2D relies on a CGDataProvider to supply PDF data. When writing PDF data, the library uses an analogous object known as a CGDataConsumer. A data consumer is an abstract representation of some mechanism that accepts a stream of data and (presumably) stores it. The routine CGDataConsumerCreate makes the most generic data consumer, one that invokes callback routines whenever the system generates PDF data. The callback routines can store the PDF data in any way they like.

Most applications will want to store PDF data into a file or into a block of memory. Quartz 2D provides two utility routines that create data consumers for each case. CGDataConsumerCreateWithURL allows you to create a data consumer whose destination is any particular URL. If you use a file URL, then Quartz 2D will write the PDF data into a file. You can use network URLs, but as was mentioned before, you will have no way to trap networking errors when Quartz 2D is handling your networking code. If you want to create PDF data in memory, the data consumer API offers the CGDataConsumerCreateWithCFData routine. This data consumer will write the PDF data into an instance of the CFMutableData opaque data type.

Once you have a data consumer to catch the PDF data, the routine CGPDFContextCreate returns a new graphics context. In addition to the data consumer, this routine accepts a rectangle and a dictionary as arguments. The rectangle is the default mediaBox for the PDF document. If the mediaBox parameter is NULL, Quartz will use a US Letter page by default. The default page size is 612 × 792 points, or 8.5 × 11 inches.

The dictionary that you pass to CGPDFContextCreate allows you to specify metadata about the PDF file. This metadata is stored inside the PDF so that it is available to other PDF processing systems. Adding dictionary entries with the keys kCGPDFContextTitle, kCGPDFContextAuthor, and kCGPDFContextCreator will add information about the origin of the document. Keys like kCGPDFContextOwnerPassword, kCGPDFContextUserPassword, kCGPDFContextAllowsPrinting, and kCGPDFContextAllowsCopying let you inject access control information into the PDF. There are additional keys that control professional printing and PDF/X options that pre-press applications use to manage PDF documents. All of these keys are documented in the CGPDFContext.h header file and in the developer documentation.

Adding Pages

Quartz 2D has two routines for adding pages to a PDF context: CGContextBeginPage and CGPDFContextBeginPage. CGContextBeginPage is the older of the two routines and has fewer options than CGPDFContextBeginPage.

As indicated by its name, CGContextBeginPage is a part of the abstract CGContext API. This routine takes a rectangle as a parameter. The rectangle represents the mediaBox of the new page. This rectangle supersedes the mediaBox that the context was created with. If the parameter is NULL, the page’s mediaBox will match the mediaBox of the context.

The CGPDFContextBeginPage routine offers more control over the attributes of the page it creates. This routine accepts a dictionary of information about the page. One obvious key this dictionary might contain is kCGPDFContextMediaBox. Naturally, this key specifies the mediaBox for the page and overrides the mediaBox supplied by the context. The page information dictionary can also include keys that specify the other PDF page rectangles, like the cropBox or the artBox.

As you finish drawing each page, you should notify Quartz using CGContextEndPage or CGPDFContextEndPage. Be sure to pair each routine with the corresponding function that you used to begin the page. These routines are essential because they give Quartz the chance to update its internal data structures.

Drawing the PDF

Once the context contains a page, the same routines that draw in any other context will generate PDF data. Quartz records and encodes each drawing command into the PDF data that is sent to the data consumer.

There are some simple drawing techniques for PDF contexts that can help Quartz optimize the contents of a PDF file. For example, if the PDF will contain multiple copies of the same image, the resulting file will be smaller if each call to CGContextDrawImage uses the same CGImageRef. This gives the system the opportunity to take note of the fact that the same image is drawn multiple times. Quartz can optimize the output stream so that only one copy of the image’s data will be written out.

In a similar fashion, repeatedly drawing a CGLayerRef into the context allows the system to store a single copy of the graphic in the file and reuse it every time it appears.

Release the Context

After all the pages have been drawn, a call to CGContextRelease prompts Quartz 2D to collect all the information about the pages and write the document’s data. As mentioned at the beginning of this section, releasing the context is a vital part of successfully creating a PDF. If the context is not released, anything the system has written to the destination might not be a complete PDF.

An Example That Creates a PDF

Example 14.2 is a short code sample that creates a PDF file using Quartz 2D. Many of the illustrations in this book were created by sending Quartz 2D graphics to PDF files using similar code.

Example 14.2. Creating a Simple PDF

void CreatePDFFile()
{
    // Ask the user where we should save the file.
    CFURLRef saveLocation = CopySaveLocation();
    if(NULL != saveLocation) {
            CGRect mediaBox = CGRectMake(0, 0, 576.0, 576.0);
            // Create a dictionary to store our document attributes
            CFMutableDictionaryRef attributes =
                    CFDictionaryCreateMutable(
                           NULL, 3,
                           &kCFTypeDictionaryKeyCallBacks,
                           &kCFTypeDictionaryValueCallBacks);

            CFDictionaryAddValue(attributes, kCGPDFContextAuthor,
                    CFSTR("Scott Thompson"));
            CFDictionaryAddValue(attributes, kCGPDFContextTitle,
                    CFSTR("Sample PDF"));
            CFDictionaryAddValue(attributes, kCGPDFContextCreator,
                    CFSTR("CreatePDF Sample Code"));

            // Create a PDF Context that we will draw the graphics into
            CGContextRef pdfContext =
                    CGPDFContextCreateWithURL(saveLocation,
                           &mediaBox, attributes);

            CFRelease(attributes);

            // Begin a PDF page
            CGPDFContextBeginPage(pdfContext, NULL);

            DrawImageToExport(pdfContext, mediaBox.size);

            // End the PDF page
            CGPDFContextEndPage(pdfContext);

            // Finalize the PDF document
            CGContextRelease(pdfContext);

            // The name CopySaveLocation implies that we own the URL
            CFRelease(saveLocation);
    }
}

The code sample begins by calling the CopySaveLocation routine. This routine puts up a standard system dialog asking the user to choose a destination for the PDF file. The value returned is a file URL. You create a CFDictionary and fill out three of the attribute keys. These keys will become part of the metadata contained in the PDF file.

Using the URL that was selected, you create a PDF document at that location using CGPDFContextCreateWithURL. Naturally, you pass in the metadata dictionary. You also supply a mediaBox rectangle for the PDF file. This box will also serve as the default page size. You then release the attributes dictionary at this point because it is not used again.

To draw in the PDF file, you have to add a page, and the sample calls CGPDFContextBeginPage to do just that. It passes NULL for the page info dictionary, which tells Quartz 2D that this page should inherit its mediaBox from the document.

At this point any drawing commands sent to the PDF context will be recorded on the first page. The routine DrawImageToExport is used to create a simple graphic on the PDF page. After drawing, the code simply ends the page using CGPDFContextEndPage.

Before leaving the routine, the code is very careful to release the PDF context so that Quartz 2D can finalize the PDF data and ensure that it is all written into the file.

Adding Hyperlinks to PDFs

Some PDF readers allow you to navigate using hyperlinks attached to hot spots in the PDF. Quartz 2D can annotate its metafiles with hyperlinks as well. The Quartz 2D API allows you to create hyperlinks both to locations within the PDF document and to URLs that are external to the PDF.

Internal Links

Creating a link within a PDF document is a two-step process. The first step is to annotate the PDF document with anchor points. An anchor point is simply a named location in the document that is the destination of an internal link. Quartz 2D allows you to create an anchor point anywhere on a page. The routine that creates an anchor point is CGPDFContextAddDestinationAtPoint. This routine accepts the name of the anchor point and its location on the current page.

To identify a hot spot in the PDF that will take you to an anchor point, your application can call the routine CGPDFContextSetDestinationForRect. This routine accepts the rectangle on the current page that becomes the hot spot. It also accepts the name of the anchor that is the destination of the link.

URL Hyperlinks

In addition to hyperlinks that navigate within a document, Quartz 2D allows you to add hyperlinks to network URLs. To create a URL hot spot, you call the Quartz routine CGPDFContextSetURLForRect. This function accepts the URL that is the destination of the hyperlink and a rectangle on the current page as parameters. The area under the rectangle becomes a hot spot in the resulting PDF.

Quartz 2D is intimately tied to the PDF standard. Not only does it derive its imaging model from PDF, it offers applications an easy-to-use mechanism for creating simple PDFs and drawing many more. You can even add some interaction features to the PDFs you create using Quartz 2D. The resolution independence of PDF graphics makes them attractive in printing environments. They may also play an important role in the future as high-resolution monitors and other devices become commonplace. By adopting the PDF metafile capabilities in Quartz 2D, your application will be ready for these advancements today.
