Chapter 8. Embedded Files

This chapter explains how a PDF can be used as a container for other files, much as a ZIP file can, while still providing rich page content to accompany them.

In most cases, files in other formats (such as .docx or .xlsx) are converted into PDF for distribution. However, sometimes it can be useful to have the original file as well. Unfortunately, there is a good chance that the two files will become disconnected, so having a way to embed or attach the original inside of the PDF is a useful capability. Additionally, you might choose to embed other files related to the PDF that aren’t the actual content, such as XML data.

For these reasons and more, PDF supports the ability to embed other files inside of itself and then have them presented in the UI of the PDF viewer.

File Specifications

At the heart of embedding files is the file specification dictionary. This dictionary actually supports both embedded and referenced files, but we will focus strictly on the embedded form (see Figure 8-1). In order to ensure that the dictionary can be identified, it must contain a Type key whose value is Filespec. Additionally, there must be three other keys present in the dictionary: F, UF, and EF (see Example 8-1 for a sample).

The F key contains the name of the file in a special encoding specific to file specification strings (ISO 32000-1:2008, 7.11.2), which is the “standard encoding for the platform on which the document is being viewed.” For most modern operating systems, that’s UTF-8, but it isn’t required to be so. The UF key, by contrast, contains the name as a PDF text string, which for names that cannot be represented in PDFDocEncoding means UTF-16BE with a leading byte order mark. The EF key refers to the embedded file dictionary, which is a simple dictionary with a single key, F, whose value is an embedded file stream. That stream holds the actual data for the embedded file, along with some additional metadata about the file.

Note

An optional Desc key can be provided whose value is a human-readable description of the file.

Figure 8-1. Two embedded files
Example 8-1. Sample file specification dictionaries
% file specification for a file with a simple ASCII name
20 0 obj
<<
    /F (Untitled.docx)
    /UF (Untitled.docx)
    /EF << /F 22 0 R >>
    /Type /Filespec
    /Desc (Something I found on my disk)
>>
endobj


% file specification for a file with a name requiring Unicode
31 0 obj
<<
    /F (04 ... ...mp3)
    /Type /Filespec
    /Desc (Favorite Israeli music)
    /EF << /F 32 0 R >>
    /UF (þÿ.0.4. .Ñ.Õ.Ý. .ä.Ý...m.p.3)    % 04 בום פם.mp3
>>
endobj
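
The difference between the two name forms can be sketched in a few lines of Python. This is a minimal illustration rather than a complete PDF string writer: the function names pdf_text_string and pdf_literal are hypothetical, and the sketch assumes that any name that is not pure ASCII should be written as UTF-16BE.

```python
import codecs


def pdf_text_string(text: str) -> bytes:
    """Return the bytes of a PDF text string for `text`.

    ASCII-only text is returned unchanged (ASCII is a subset of
    PDFDocEncoding). Anything else is encoded as UTF-16BE with a
    leading byte order mark (FE FF).
    """
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def pdf_literal(raw: bytes) -> str:
    """Wrap raw bytes in a PDF literal string such as (Untitled.docx)."""
    out = []
    for byte in raw:
        char = chr(byte)
        if char in "\\()":
            out.append("\\" + char)        # escape backslash and parentheses
        elif 32 <= byte < 127:
            out.append(char)               # printable ASCII is written as-is
        else:
            out.append("\\%03o" % byte)    # everything else as an octal escape
    return "(" + "".join(out) + ")"


print("/UF " + pdf_literal(pdf_text_string("Untitled.docx")))
print("/UF " + pdf_literal(pdf_text_string("\u05d1\u05d5\u05dd.mp3")))
```

The þÿ at the start of the second UF value in Example 8-1 is the same byte order mark: the bytes FE and FF displayed as Latin-1 characters.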

Embedded File Streams

An embedded file stream is simply a stream object that contains the data for an embedded file. As such, it can be stored and compressed using filters (see Stream Objects) such as Flate—the same technology used in a ZIP file. A variety of additional information can be present in the embedded file stream’s dictionary, such as the file’s Internet media type (aka MIME type), as the value of the Subtype key. Other information, such as the date and time at which the file was created or last modified, can be included in the embedded file parameter dictionary (which is the value of the Params key). Example 8-2 shows an example of an embedded file stream.

Example 8-2. Example embedded file stream
32 0 obj
<<
    /Subtype /audio#2Fmpeg  % MIME type audio/mpeg; "/" is written as #2F inside a name
    /Filter /FlateDecode    % compressed using Flate/ZIP technology
    /Length 1000830         % encoded length
    /Params <<
        /ModDate (D:20100809110201)
        /CheckSum <1E2AFAC553A11A00E20A02774BA42EBF>
        /CreationDate (D:20130113152115-05'00')
        /Size 3930112    % decoded length
    >>
>>
stream
    % Flate-compressed stream data goes here....
endstream
endobj
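
The ModDate and CreationDate values use the PDF date string format. As a rough sketch, a Python datetime could be converted as follows. The function name pdf_date is hypothetical, and the sketch follows the form used in this chapter's examples, including the trailing apostrophe after the minutes of the time zone offset.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a datetime as a PDF date string, e.g. D:20130113152115-05'00'."""
    base = moment.strftime("D:%Y%m%d%H%M%S")
    offset = moment.utcoffset()
    if offset is None:
        return base                      # naive datetime: no time zone suffix
    minutes = int(offset.total_seconds() // 60)
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{base}{sign}{hours:02d}'{mins:02d}'"


eastern = timezone(timedelta(hours=-5))
print(pdf_date(datetime(2013, 1, 13, 15, 21, 15, tzinfo=eastern)))
# prints D:20130113152115-05'00'
```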

Note

The value of the CheckSum key in the embedded file parameter dictionary is a 16-byte string: the result of applying the standard MD5 message-digest algorithm to the bytes of the uncompressed embedded file (that is, the stream data after decoding, not the compressed bytes stored in the PDF).
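
To show how Length, Size, and CheckSum relate to one another, here is a minimal Python sketch that builds an embedded file stream from raw bytes. The function name embedded_file_stream is hypothetical, the output omits the surrounding obj/endobj wrapper and the date entries, and the MIME type is written as a PDF name by replacing the slash with #2F.

```python
import hashlib
import zlib


def embedded_file_stream(data: bytes, mime_type: str) -> bytes:
    """Return the dictionary and stream body for an embedded file."""
    compressed = zlib.compress(data)                  # Flate compression
    checksum = hashlib.md5(data).hexdigest().upper()  # MD5 of the UNcompressed bytes
    subtype = mime_type.replace("/", "#2F")           # "/" is not allowed raw in a name
    dictionary = (
        "<<\n"
        f"    /Subtype /{subtype}\n"
        "    /Filter /FlateDecode\n"
        f"    /Length {len(compressed)}\n"            # encoded length
        "    /Params <<\n"
        f"        /CheckSum <{checksum}>\n"
        f"        /Size {len(data)}\n"                # decoded length
        "    >>\n"
        ">>\n"
    )
    return dictionary.encode("ascii") + b"stream\n" + compressed + b"\nendstream\n"


sample = embedded_file_stream(b"hello world" * 10, "audio/mpeg")
print(sample.split(b"stream\n", 1)[0].decode("ascii"))
```

Note that Length counts the compressed bytes between stream and endstream, while Size and CheckSum both describe the original file.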

A file specification and its associated embedded file stream are only one piece of the puzzle; it still needs to be connected to something in the PDF structure so that it can be found by the PDF viewer. If the file is associated with some specific content on a specific page, a FileAttachment annotation would be appropriate (see FileAttachment Annotations). However, if the file is more global to the document, the EmbeddedFiles name tree would be the place (see The EmbeddedFiles Name Tree).

URL File Specifications

Although not used for embedded files, there is a special type of file specification called a URL file specification. It is used in other parts of PDF as the standard way to specify that the data stream of the file should be retrieved from a given uniform resource locator (URL).

To declare a file specification as a URL file specification, the FS key will have the value (of type Name) URL (see Example 8-3). In addition, the value of the F key will not be a file specification string, but instead will be a URL of the form defined in RFC 1738, “Uniform Resource Locators”.

Note

As the character-encoding requirements specified in RFC 1738 restrict the URL to 7-bit US ASCII, which is a strict subset of PDFDocEncoding, the value can also be considered to be in that encoding.

Example 8-3. Example URL file specification
<<
    /FS /URL
    /F (http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf)
>>
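
A small Python sketch of the same idea follows. The function name url_file_spec is hypothetical. It rejects URLs that are not 7-bit US ASCII, on the assumption that the caller percent-encodes other characters first.

```python
def url_file_spec(url: str) -> str:
    """Return a URL file specification dictionary for `url`."""
    try:
        url.encode("ascii")              # RFC 1738 URLs are 7-bit US ASCII
    except UnicodeEncodeError as exc:
        raise ValueError("URL must be 7-bit US ASCII") from exc
    # Escape the characters that are special inside a PDF literal string.
    escaped = url.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"<<\n    /FS /URL\n    /F ({escaped})\n>>"


print(url_file_spec("http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf"))
```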

Ways to Embed Files

Files can be connected to a PDF in two ways, depending on whether they are to be associated with specific content in a particular location or globally with the PDF as a whole. In the former case, we’ll use file attachment annotations. In the latter case, the approach will be to add an EmbeddedFiles key to the document’s name dictionary.

FileAttachment Annotations

The file attachment annotation is a simple type of annotation. It is similar to the text annotation, except that its Contents key holds descriptive text about the file, such as the filename, rather than the text to be displayed. It also contains an FS key that points to the file specification dictionary of the attached file (see File Specifications for more), and its Subtype key has a value of FileAttachment.

Example 8-4 shows the result of placing a file attachment annotation that specifies the paperclip icon next to the text in Hello World.pdf.

Example 8-4. Example FileAttachment annotation
% the annotation object/dictionary
41 0 obj
<<
    /C [0.25 0.333328 1]
    /Type /Annot
    /Contents (world.jpg)
    /Name /Paperclip
    /Subtype /FileAttachment
    /FS 42 0 R
    /Rect [390.162 599.772 397.162 616.772]
>>
endobj

% the file specification dictionary
42 0 obj
<<
    /F (world.jpg)
    /Type /Filespec
    /UF (world.jpg)
    /EF << /F 43 0 R >>
>>
endobj

% and the embedded file stream
43 0 obj
<<
    /Subtype /image#2Fjpeg  % MIME type image/jpeg; "/" is written as #2F inside a name
    /Length 25531
    /DL 20172
    /Params <<
        /ModDate (D:20121020024106-04'00')
        /CheckSum <19D579AB5B7C8F46B63C37F385707872>
        /CreationDate (D:20121020024106-04'00')
        /Size 20172
    >>
>>
stream
% Stream data goes here...
endstream
endobj
Example of a FileAttachment annotation

The EmbeddedFiles Name Tree

Embedded file streams are associated with the document as a whole by adding to the document’s name dictionary an EmbeddedFiles key, whose value is a name tree. That name tree maps name strings to file specifications that refer to embedded file streams (see Embedded File Streams) through their EF entries (see Example 8-5).

Example 8-5. Sample EmbeddedFiles name tree
8 0 obj
<<
    /Type /Catalog
    /Names 16 0 R
    /PageMode /UseAttachments
    /Metadata 1 0 R            % not included in the sample
    /Pages 5 0 R               % not included in the sample
>>
endobj

16 0 obj
<<
    /EmbeddedFiles 17 0 R
>>
endobj

17 0 obj
<<
    /Names [
        (Some Embedded File) 21 0 R
        (Untitled.docx) 20 0 R
    ]
>>
endobj

20 0 obj
<<
    /F (Untitled.docx)
    /UF (Untitled.docx)
    /EF << /F 22 0 R >>
    /Type /Filespec
    /Desc (Something I found on my disk)
>>
endobj

21 0 obj
<<
    /F (Some Embedded File)
    /Type /Filespec
    /Desc (Something else on my disk)
    /EF << /F 32 0 R >>
    /UF (Some Embedded File)
>>
endobj
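
One detail worth noting is that the keys in a name tree's Names array must appear in sorted (lexical) order, as they do in Example 8-5. A minimal Python sketch of producing such an array is shown here. The function name names_array is hypothetical, and the sketch assumes ASCII names that contain no parentheses or backslashes.

```python
def names_array(entries):
    """Build a /Names array from a dict of name -> indirect reference.

    `entries` maps each name string to a reference such as "20 0 R".
    Keys are sorted by their byte values, as a name tree requires.
    """
    lines = ["/Names ["]
    for name in sorted(entries, key=lambda key: key.encode("ascii")):
        lines.append(f"    ({name}) {entries[name]}")
    lines.append("]")
    return "\n".join(lines)


print(names_array({
    "Untitled.docx": "20 0 R",
    "Some Embedded File": "21 0 R",
}))
```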

Collections

A PDF with embedded files is useful where the page content is the primary focus for the person who will read the document. However, sometimes you have a collection of documents that need to be grouped together, but none of them have any higher priority than another. Thus, the embedded files themselves are the focus. For example, it might be all the materials for a legal case or for bidding on an engineering job. In those cases, you want the PDF viewer to present the list of files and any associated metadata about them, rather than the normal view of a primary document’s page content. It is for this purpose that the portable collections (or just “collections”) feature of PDF is used. Figure 8-2 shows an example.

Figure 8-2. Collection of email messages

Note

There is no requirement that documents in a collection have an implicit relationship or even a similarity; however, showing differentiating characteristics of related documents can be helpful for document navigation.

The Collection Dictionary

The contents of a collection are the files listed in the EmbeddedFiles name tree. Any file in the name tree will be part of the collection, while any embedded files that are not in the tree will not. To turn these files into a collection instead of just a loose set of embedded files, there needs to be a collection dictionary in the PDF that is the value of the Collection key in the document’s catalog dictionary (see Example 8-6 for a simple example). Although none of the keys in the collection dictionary are required, a useful collection dictionary would contain at least two keys: D and View.

D
The D key has a string value that is the name of a PDF in the EmbeddedFiles name tree that you want the PDF viewer to show initially. It is recommended that this either be the key document in the collection or instructions about how to navigate the collection.
View
The View key has a value (of type name) that will tell the PDF viewer whether to present the list of files from the collection in details mode (D), tile mode (T), or initially hidden (H).
Example 8-6. A simple collection dictionary
43 0 obj
<<
    /Type /Catalog
    /Collection 44 0 R
    /Names 42 0 R        % name dictionary holding a standard EmbeddedFiles name tree
    /Pages 39 0 R        % root of a standard page tree
>>
endobj

44 0 obj
<<
    /Type /Collection
    /D (Index)
    /View /D
>>
endobj

Collection Schema

While a simple list can be useful, it is more likely that there is additional information about each file that could be displayed as part of the collection interface presented by the PDF viewer. For example, if the files represented a movie catalog, displaying the movies’ release dates and durations, as in Figure 8-3, might be useful.

Figure 8-3. Example movie collection

To create a set of fields such as those in the example image, a collection schema dictionary is included in the collection dictionary as the value of the Schema key, with each key in the dictionary having a value that is a collection field dictionary. It would look something like Example 8-7.

Example 8-7. Example collection schema
<<
    /Type /CollectionSchema
    /YEAR <<
        /Subtype /N            % type of the data is a number
        /N (Year)
        /Type /CollectionField
        /O 0
    >>
    /DURATION <<
        /Subtype /N            % type of the data is a number
        /N (Duration)
        /Type /CollectionField
        /O 2
    >>
    /TITLE <<
        /Subtype /S            % type of the data is a string
        /N (Movie title)
        /Type /CollectionField
        /O 1
    >>
    /DVD <<
        /Subtype /D            % type of the data is a date
        /N (DVD)
        /Type /CollectionField
        /O 3
    >>
>>

In the example schema there are four fields—YEAR, DURATION, TITLE, and DVD—representing not only the names of the fields, but also their types. These fields will then be associated with each of the files specified in the EmbeddedFiles name tree through the addition of a CI key in each file specification dictionary.
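
The relationship between the schema and a collection item can be modeled with plain Python dictionaries. The sketch below is an illustration only: the names SCHEMA, column_headers, and check_item are hypothetical, PDF names are represented as Python strings, and the date check looks only for the D: prefix. It relies on the Subtype values N, S, and D meaning number, text string, and date, and on the O key giving the relative order of the fields in the user interface.

```python
SCHEMA = {
    "YEAR": {"Subtype": "N", "N": "Year", "O": 0},
    "DURATION": {"Subtype": "N", "N": "Duration", "O": 2},
    "TITLE": {"Subtype": "S", "N": "Movie title", "O": 1},
    "DVD": {"Subtype": "D", "N": "DVD", "O": 3},
}


def column_headers(schema):
    """Return the display names of the fields, ordered by their O values."""
    ordered = sorted(schema.values(), key=lambda field: field["O"])
    return [field["N"] for field in ordered]


def check_item(schema, item):
    """Return a list of problems found in one collection item dictionary."""
    problems = []
    for key, value in item.items():
        field = schema.get(key)
        if field is None:
            problems.append(f"{key}: not in schema")
        elif field["Subtype"] == "N":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key}: expected a number")
        elif field["Subtype"] == "S":
            if not isinstance(value, str):
                problems.append(f"{key}: expected a string")
        elif field["Subtype"] == "D":
            if not (isinstance(value, str) and value.startswith("D:")):
                problems.append(f"{key}: expected a date string")
    return problems


print(column_headers(SCHEMA))
print(check_item(SCHEMA, {"YEAR": "1999", "TITLE": "Eyes Wide Shut"}))
```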

Note

In our example, the names are in all capital letters, but that’s not required in any way. Using all caps just ensures that the values will be unique names in the PDF. Example 8-8 shows a sample file specification.

Example 8-8. File specification with associated collection item dictionary
<<
    /F (kubrick12.pdf)
    /CI
        <<
            /Type /CollectionItem
            /YEAR 1999
            /DURATION 153
            /TITLE (Eyes Wide Shut)
            /DVD (D:20001130000000+01'00')
        >>
    /EF
        <<
            /F 35 0 R
            /UF 35 0 R
        >>
    /UF (kubrick12.pdf)
    /Type /Filespec
    /Desc (Eyes Wide Shut)
>>

With all that data at our disposal, we can also choose to have the file list sorted based on any of the elements of the schema rather than the default order of the EmbeddedFiles name tree. This is done by including a Sort key in the collection dictionary whose value is its associated collection sort dictionary, as shown in Example 8-9.

Example 8-9. Example collection sort dictionary
% Sort the collection based on the YEAR field, in descending order
<<
    /Type /CollectionSort
    /S /YEAR
    /A false
>>
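
The effect of a sort dictionary on the file list can be sketched in Python as well. This is not how any particular viewer is implemented. The function name sort_items is hypothetical, each collection item is modeled as a plain dictionary, and every item is assumed to carry the field being sorted on.

```python
def sort_items(items, sort_dictionary):
    """Order collection items according to a collection sort dictionary.

    `sort_dictionary` mirrors the PDF structure: "S" names the schema
    field to sort by and "A" says whether the order is ascending
    (treated as true when absent).
    """
    field = sort_dictionary["S"]
    ascending = sort_dictionary.get("A", True)
    return sorted(items, key=lambda item: item[field], reverse=not ascending)


movies = [
    {"TITLE": "The Shining", "YEAR": 1980},
    {"TITLE": "Eyes Wide Shut", "YEAR": 1999},
    {"TITLE": "2001: A Space Odyssey", "YEAR": 1968},
]
for movie in sort_items(movies, {"S": "YEAR", "A": False}):
    print(movie["YEAR"], movie["TITLE"])
```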

GoToE Actions

Previously, in Actions, you learned about actions that allowed a user to navigate within the existing document (GoTo), or to an external document (GoToR). Now that you’ve seen how to embed documents inside of a PDF, let’s see how to navigate to an embedded document.

The GoToE (or “embedded go-to”) action is quite similar to a remote go-to action, but it allows jumping to an embedded PDF file. Both file attachment annotations and entries in the EmbeddedFiles name tree are supported. These embedded files may in turn contain embedded files, and the GoToE action can point through one or more parent PDFs to the final destination PDF (also called the target PDF) via the target dictionary.

The action dictionary for a GoToE action will consist of the same three keys found in both the GoTo and GoToR actions—Type (with a value of Action), S (with a value of GoToE), and D (whose value is the destination in the target PDF).

In addition, a GoToE action dictionary has a T key whose value is a target dictionary. The target dictionary locates the target in relation to the source, in much the same way that a relative path describes the physical relationship between two files in a filesystem. Target dictionaries may be nested recursively to specify one or more intermediate targets before reaching the final one.

The “relative path” described by the target dictionary need not only go down the hierarchy, but may also go up, just as the “..” entry would signify in a DOS or Unix path. The “direction” is specified by the R (relationship) key and has a value of either P (parent) or C (child). Example 8-10 shows a few sample GoToE actions.

Example 8-10. Example GoToE actions
% Simple target of just a single embedded file
1 0 obj
<<
    /Type /Action
    /S    /GoToE
    /D    [ 0 /FitH 794 ]
    /T <<
        /N (Our First PDF.pdf)
        /R /C
    >>
>>
endobj

% Target that navigates up and then back down into another PDF
1 0 obj
<<
    /Type /Action
    /S    /GoToE
    /D    [ 0 /FitH 612 ]
    /T <<
        /R /P                  % navigate up to the parent
        /T <<
            /R /C              % now down to one of its children
            /N (Target.pdf)    % named Target.pdf
        >>
    >>
>>
endobj

% Target that navigates up twice and then back down twice
1 0 obj
<<
    /Type /Action
    /S    /GoToE
    /D    [ 0 /Fit ]
    /T <<
        /R /P                          % navigate up to the parent
        /T <<
            /R /P                      % and up again
            /T <<
                /R /C                  % now down to one of its children
                /N (Intermediate.pdf)  % named Intermediate.pdf
                /T <<
                    /R /C              % and one of its children
                    /N (Final.pdf)     % named Final.pdf
                >>
            >>
        >>
    >>
>>
endobj
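
The way a nested target dictionary is followed can be modeled in Python by treating each PDF as a node in a tree. This is a minimal sketch under stated assumptions: the Doc class and the resolve function are hypothetical, children are identified only by the N key (their name in the EmbeddedFiles name tree), and targets that point at file attachment annotations are not handled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Doc:
    """A PDF in a hierarchy of embedded PDFs."""
    name: str
    parent: Optional["Doc"] = None
    children: dict = field(default_factory=dict)

    def embed(self, name: str) -> "Doc":
        """Add an embedded child PDF and return it."""
        child = Doc(name, parent=self)
        self.children[name] = child
        return child


def resolve(source: Doc, target: dict) -> Doc:
    """Follow a (possibly nested) target dictionary starting at `source`."""
    current = source
    while target is not None:
        relationship = target["R"]
        if relationship == "P":                      # go up to the parent
            if current.parent is None:
                raise ValueError(f"{current.name} has no parent")
            current = current.parent
        elif relationship == "C":                    # go down to a named child
            if target["N"] not in current.children:
                raise ValueError(f"{target['N']} is not embedded in {current.name}")
            current = current.children[target["N"]]
        else:
            raise ValueError(f"unknown relationship {relationship!r}")
        target = target.get("T")                     # next step, if any
    return current


root = Doc("Root.pdf")
source = root.embed("Branch.pdf").embed("Source.pdf")
final = root.embed("Intermediate.pdf").embed("Final.pdf")

up_twice_down_twice = {
    "R": "P", "T": {
        "R": "P", "T": {
            "R": "C", "N": "Intermediate.pdf", "T": {
                "R": "C", "N": "Final.pdf"}}}}
print(resolve(source, up_twice_down_twice).name)
```

The last target dictionary mirrors the third action in Example 8-10: two steps up, then two steps down.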

What’s Next

In this chapter, you learned how to embed a file into a PDF (connecting it either to the document as a whole or to a specific place on a page) using a file specification dictionary and its associated embedded file stream. You also learned how to instruct a PDF viewer to show your embedded files as a rich collection of documents.

Next, you will learn how to work with multimedia objects in PDF, such as videos and sounds.
