XML Signature

XML Signature[1] is a W3C Recommendation that specifies how to represent the electronic signature of data items as an XML element, and how to create and verify this element. Although the signature itself is represented as an XML element, the signed data items can be files containing any type of digital data, including XML, or elements within an XML document. A single signature element can also hold a signature over multiple data items. The name, namespace and structure of the signature element are specified by the XML Signature standard.

[1] The complete title of the specification is XML Signature Syntax and Processing, and it can be found at http://www.w3.org/TR/xmldsig-core. It was granted recommendation status on 12 February 2002.

An Example

Let us understand the structure of a signature element with the help of an example. We take a couple of elements from the XML document of Listing 7-1 as data items to be signed, create the signature element and insert it within the input document.

Let us pick the elements title and bookinfo, with id values of "book_title" and "book_info" respectively, for signing. The process of signing these elements needs a few more parameters, but let us ignore them for the time being. The resulting file, after some readability enhancement, is shown in Listing 7-3.

Listing 7-3. An XML document with Signature element
<?xml version="1.0" encoding="UTF-8"?>
<bk:book id="j2ee_sec"
    xmlns:bk="http://www.pankaj-k.net/schemas/book">
  <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">

    <ds:SignedInfo>
      <ds:CanonicalizationMethod
        Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
      <ds:SignatureMethod
        Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
      <ds:Reference URI="#book_info">
        <ds:Transforms>
          <ds:Transform
            Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
        </ds:Transforms>
        <ds:DigestMethod
          Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
        <ds:DigestValue>vcpDtXSLXqgR+eUuJIofb3993Us=</ds:DigestValue>
      </ds:Reference>
      <ds:Reference URI="#book_title">
        ...children skipped ...
      </ds:Reference>
    </ds:SignedInfo>

    <ds:SignatureValue>
MCwCFCxyR35ZP1lYEMrALAjQ8PHFN2UiAhRr5qq5l5+QZn2blCazUy/rBIpVgw==
    </ds:SignatureValue>

    <ds:KeyInfo>key information skipped</ds:KeyInfo>
  </ds:Signature>

  <title id="book_title"
      subject="bk:programming">J2EE Security</title>
  <author id="book_author">Pankaj Kumar</author>
  <publisher id="book_publisher">Prentice Hall</publisher>
  <bookinfo id="book_info"
      xmlns:bi="http://www.pankaj-k.net/schemas/bookinfo"
      xmlns:book="http://www.pankaj-k.net/schemas/book">
    <bi:categories book:area="technology" book:type="profession">
      <bi:category>Security<!-- Main Category --></bi:category>
      <bi:category>Enterprise Technology</bi:category>
    </bi:categories>
    <bk:keywords>J2EE, Security, Servlet, EJB, Web Service
</bk:keywords>
  </bookinfo>
</bk:book>

Let us spend some time going over this XML document.

  • The element Signature has been created and inserted into the original document as the first child of element bk:book. This element is associated with namespace URI "http://www.w3.org/2000/09/xmldsig#" and namespace prefix ds. Although contained within the same document as the signed elements, the Signature element is separate from them and, hence, is called a detached signature. The signed elements themselves are not modified.

  • The signature element ds:Signature has the elements ds:SignedInfo, ds:SignatureValue and ds:KeyInfo as its immediate children. These elements capture information on the signature algorithm, the signed elements, the signature data bytes, and the key used for creating the signature.

  • The element ds:SignedInfo indicates that it is canonicalized as per the steps in the W3C standard Canonical XML (identified by URI "http://www.w3.org/TR/2001/REC-xml-c14n-20010315") before signing, that the signature is computed using the algorithm DSA-SHA1 (identified by URI "http://www.w3.org/2000/09/xmldsig#dsa-sha1"), and that the signed elements are the ones with ID values "book_info" and "book_title". We talk more about canonicalization shortly.

  • The signed elements are identified within ds:SignedInfo through ds:Reference elements. Each ds:Reference element includes not only the URI pointing to the signed element but also the transforms applied to the signed element before computing the digest (here, just canonicalization), the algorithm used to compute the digest value, and the digest data bytes. The digest data bytes are base64 encoded. As indicated in the listing, the children of the ds:Reference element corresponding to the "book_title" element are skipped to save space.

  • The element ds:SignatureValue contains the base64 encoded signature data bytes.

  • The element ds:KeyInfo contains the signer's public key or information to retrieve this key for validating the signature. In the above document, the key values are not shown.
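To make the roles of these children concrete, the sketch below reads them back from a trimmed copy of Listing 7-3 using the JDK's namespace-aware DOM parser. The class SignatureInspector and its methods are hypothetical names chosen for illustration; the code only reads the structure and does not verify the signature.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads (but does not verify) the parts of a ds:Signature element.
final class SignatureInspector {
    // Namespace URI bound to the ds: prefix in Listing 7-3.
    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";

    // Trimmed copy of Listing 7-3: one Reference, no Transforms, no KeyInfo.
    static final String SAMPLE =
        "<bk:book id='j2ee_sec' xmlns:bk='http://www.pankaj-k.net/schemas/book'>"
      + "<ds:Signature xmlns:ds='http://www.w3.org/2000/09/xmldsig#'><ds:SignedInfo>"
      + "<ds:CanonicalizationMethod Algorithm='http://www.w3.org/TR/2001/REC-xml-c14n-20010315'/>"
      + "<ds:SignatureMethod Algorithm='http://www.w3.org/2000/09/xmldsig#dsa-sha1'/>"
      + "<ds:Reference URI='#book_info'>"
      + "<ds:DigestMethod Algorithm='http://www.w3.org/2000/09/xmldsig#sha1'/>"
      + "<ds:DigestValue>vcpDtXSLXqgR+eUuJIofb3993Us=</ds:DigestValue>"
      + "</ds:Reference></ds:SignedInfo>"
      + "<ds:SignatureValue>MCwCFCxyR35ZP1lYEMrALAjQ8PHFN2UiAhRr5qq5l5+QZn2blCazUy/rBIpVgw==</ds:SignatureValue>"
      + "</ds:Signature>"
      + "<title id='book_title'>J2EE Security</title></bk:book>";

    // Algorithm URI named by ds:SignatureMethod.
    static String signatureMethod(String xml) {
        Element method = (Element) parse(xml).getElementsByTagNameNS(DSIG_NS, "SignatureMethod").item(0);
        return method.getAttribute("Algorithm");
    }

    // One "URI -> DigestValue" entry per ds:Reference inside ds:SignedInfo.
    static List<String> describeReferences(String xml) {
        NodeList refs = parse(xml).getElementsByTagNameNS(DSIG_NS, "Reference");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < refs.getLength(); i++) {
            Element ref = (Element) refs.item(i);
            NodeList digest = ref.getElementsByTagNameNS(DSIG_NS, "DigestValue");
            String value = digest.getLength() == 0 ? "(none)" : digest.item(0).getTextContent().trim();
            out.add(ref.getAttribute("URI") + " -> " + value);
        }
        return out;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the *NS lookups above
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signatureMethod(SAMPLE));
        describeReferences(SAMPLE).forEach(System.out::println);
    }
}
```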

The above XML document and the description should give you a good idea of what an XML Signature element looks like. Later on, we write a Java program to sign the identified elements of source file, book.xml, to produce the output shown in Listing 7-3. But before that, let us understand what we mean by canonicalization and dig a little bit into the structure of the Signature element.

XML Canonicalization

With XML, it is possible to have multiple textual representations of the same content. For example, two renderings of an element that differ only in the order of attribute assignments and namespace declarations have the same underlying content, because the ordering of these items is not significant to an XML processor. Many other aspects of XML allow the same content to be rendered differently.

Let us rewrite the XML file of Listing 7-1, changing certain aspects of the textual representation, but without introducing any change in the underlying content, as shown in Listing 7-4.

Listing 7-4. A textually different rendering of Listing 7-1
<bk:book id="j2ee_sec" xmlns:bk="http://www.pankaj-k.net/schemas/book">
  <title subject="bk:programming" id="book_title">J2EE Security</title>
  <author id="book_author">Pankaj Kumar</author>
  <publisher id="book_publisher">Prentice Hall</publisher>
  <bookinfo id="book_info"
      xmlns:book="http://www.pankaj-k.net/schemas/book"
      xmlns:bi="http://www.pankaj-k.net/schemas/bookinfo">
    <bi:categories book:area='technology' book:type="profession">
      <bi:category>Security</bi:category>
      <bi:category>Enterprise Technology</bi:category>
    </bi:categories>
    <bk:keywords>J2EE, Security, Servlet, EJB, Web Service
</bk:keywords>
  </bookinfo>
</bk:book>

Can you spot the changes? They are:

  • The XML declaration at the beginning of the file has been removed.

  • The order of attributes is reversed in the element title.

  • A new line character is removed in the start tag of the element bookinfo.

  • The order of namespace declarations is reversed in the start tag of the element bookinfo.

  • The double quotes surrounding the value of the attribute book:area in the start tag of the element bi:categories have been replaced by single quotes.

  • The comment <!-- Main Category --> has been removed from the first bi:category element.
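One way to check that changes of this kind leave the content intact is to parse both renderings and compare the resulting DOM trees. The hypothetical SameContentDemo below uses the DOM Level 3 method Node.isEqualNode, which compares attributes (including namespace declarations) without regard to their order; quote style and whitespace inside tags never reach the DOM at all. The comment change is deliberately left out, because a DOM keeps comments as nodes. This only shows that a parser sees the same content; it does not produce identical bytes, which is the job of canonicalization.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: two renderings that differ as text but not as parsed content.
final class SameContentDemo {
    // Attribute order, namespace-declaration order, quote style and whitespace
    // inside the start tag differ between A and B.
    static final String A =
        "<bookinfo id=\"book_info\"\n"
      + "    xmlns:bi=\"http://www.pankaj-k.net/schemas/bookinfo\"\n"
      + "    xmlns:book=\"http://www.pankaj-k.net/schemas/book\">"
      + "<bi:categories book:area=\"technology\" book:type=\"profession\"/></bookinfo>";
    static final String B =
        "<bookinfo xmlns:book='http://www.pankaj-k.net/schemas/book' "
      + "xmlns:bi='http://www.pankaj-k.net/schemas/bookinfo' id='book_info'>"
      + "<bi:categories book:type='profession' book:area='technology'/></bookinfo>";

    // Byte-level (here: character-level) comparison, which is what a naive signature sees.
    static boolean sameText(String x, String y) {
        return x.equals(y);
    }

    // Node.isEqualNode compares names, values, attribute maps (order-insensitive) and children.
    static boolean sameContent(String x, String y) {
        return parse(x).isEqualNode(parse(y));
    }

    private static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same text:    " + sameText(A, B));
        System.out.println("same content: " + sameContent(A, B));
    }
}
```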

None of these changes alters the underlying content, and it is fair to say that Listing 7-1 and Listing 7-4 are representations of the same XML document for most applications[2]. However, such minor changes can be a problem for applications like digital signature that depend on the exact byte sequence, because changes of this kind may be introduced unintentionally by normal XML processing of a signed element. Extra precaution is needed to make sure that such changes don't break the signature. This precaution takes the form of a transformation process known as canonicalization, which ensures that two different textual representations of the same underlying content map to the same canonicalized representation. We provide a list of operations performed by this process, leaving the details to the references in the Further Reading section.

[2] Some applications, such as XML aware text-processors, might treat them differently.

  1. The document is encoded in UTF-8.

  2. Line breaks are normalized to #xA on input, before parsing.

  3. Attribute values are normalized, as if by a validating processor.

  4. Character and parsed entity references are replaced.

  5. CDATA sections are replaced with their character content.

  6. The XML declaration and DTD (Document Type Declaration) are removed.

  7. Empty elements are converted to start-end tag pairs.

  8. Whitespace outside of the document element and within start and end tags is normalized.

  9. All whitespace in character content is retained (excluding characters removed during line feed normalization).

  10. Attribute value delimiters are set to quotation marks (double quotes).

  11. Special characters in attribute values and character content are replaced by character references.

  12. Superfluous namespace declarations are removed from each element.

  13. Default attributes are added to each element.

  14. Lexicographic order is imposed on the namespace declarations and attributes of each element.

Note that this list doesn't include stripping away comments. The capability to retain comments or strip them away is parameterized and is specified through the canonicalization algorithm, identified by a URI. For example, URI "http://www.w3.org/TR/2001/REC-xml-c14n-20010315" refers to normal canonicalization that strips away comments. Comments can be retained in the canonicalized form by using the algorithm identified by URI "http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments".
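As a rough illustration of steps 10, 11 and 14, the toy sketch below renders a single start tag with double-quoted attribute values, special characters replaced by references, and namespace declarations and attributes in canonical order (declarations by prefix, with a default declaration first; attributes by namespace URI, then local name). It is not a Canonical XML implementation: it ignores the remaining steps, including namespace inheritance and the removal of superfluous declarations, and the class name ToyStartTag is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Toy sketch covering only steps 10, 11 (attribute values) and 14, for one start tag.
final class ToyStartTag {
    static String canonicalStartTag(String xml) {
        return canonicalStartTag(parse(xml));
    }

    static String canonicalStartTag(Element e) {
        List<Attr> nsDecls = new ArrayList<>();
        List<Attr> attrs = new ArrayList<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Attr a = (Attr) map.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                nsDecls.add(a);
            } else {
                attrs.add(a);
            }
        }
        // Step 14: namespace declarations by prefix; a default declaration (xmlns="...") sorts first.
        nsDecls.sort(Comparator.comparing(ToyStartTag::declaredPrefix));
        // Step 14: other attributes by namespace URI (no namespace first), then by local name.
        attrs.sort(Comparator.comparing((Attr a) -> a.getNamespaceURI() == null ? "" : a.getNamespaceURI())
                             .thenComparing(a -> a.getLocalName()));
        StringBuilder sb = new StringBuilder("<").append(e.getTagName());
        for (Attr a : nsDecls) appendAttribute(sb, a);
        for (Attr a : attrs) appendAttribute(sb, a);
        return sb.append('>').toString();
    }

    private static String declaredPrefix(Attr a) {
        return "xmlns".equals(a.getName()) ? "" : a.getLocalName();
    }

    // Steps 10 and 11: double-quote delimiters, special characters as references.
    private static void appendAttribute(StringBuilder sb, Attr a) {
        sb.append(' ').append(a.getName()).append("=\"");
        for (char c : a.getValue().toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\t': sb.append("&#x9;"); break;
                case '\n': sb.append("&#xA;"); break;
                case '\r': sb.append("&#xD;"); break;
                default: sb.append(c);
            }
        }
        sb.append('"');
    }

    private static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalStartTag("<title subject='bk:programming' id='book_title'/>"));
    }
}
```

Applied to the bookinfo start tag of Listing 7-4, this ordering yields the same attribute sequence as the bookinfo line of Listing 7-5.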

Listing 7-5 shows the result of applying normal canonicalization to the document of either Listing 7-1 or Listing 7-4. Lines in bold indicate changes from Listing 7-1. Keep in mind that not all the changes introduced by canonicalization are modifications; some, such as dropping the XML declaration and comments, are removals.

Listing 7-5. Canonicalized book.xml
<bk:book xmlns:bk="http://www.pankaj-k.net/schemas/book" id="j2ee_sec">
  <title id="book_title" subject="bk:programming">J2EE Security</title>
  <author id="book_author">Pankaj Kumar</author>
  <publisher id="book_publisher">Prentice Hall</publisher>
  <bookinfo xmlns:bi="http://www.pankaj-k.net/schemas/bookinfo" xmlns:book="http://www.pankaj-k.net/schemas/book" id="book_info">
    <bi:categories book:area="technology" book:type="profession">
      <bi:category>Security</bi:category>
      <bi:category>Enterprise Technology</bi:category>
    </bi:categories>
    <bk:keywords>J2EE, Security, Servlet, EJB, Web Service
</bk:keywords>
  </bookinfo>
</bk:book>

This works fine for complete documents but what if canonicalization is performed on an element within a document and not on the whole document? Let us look at the result of applying normal canonicalization on the title element with id value "book_title":

<title xmlns:bk="http://www.pankaj-k.net/schemas/book"
     id="book_title" subject="bk:programming">J2EE Security</title>

Notice that the namespace declaration for prefix bk has been brought into the title start tag. In other words, it includes the namespace context defined by its ancestor. For this reason, normal canonicalization is also known as inclusive canonicalization.
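The namespace part of this behavior can be modeled by walking up from the element and collecting every declaration in scope, with nearer declarations shadowing outer ones. The sketch below (class and method names hypothetical) does just that. It ignores inherited xml: attributes and the rule that a declaration already rendered on an output ancestor is not repeated, so it only approximates what inclusive canonicalization emits on the topmost element of a signed subset.

```java
import java.io.StringReader;
import java.util.SortedMap;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: namespace declarations inclusive c14n would carry onto the apex of a subset.
final class InclusiveContext {
    static final String SAMPLE =
        "<bk:book id='j2ee_sec' xmlns:bk='http://www.pankaj-k.net/schemas/book'>"
      + "<title id='book_title' subject='bk:programming'>J2EE Security</title>"
      + "<bookinfo id='book_info' xmlns:bi='http://www.pankaj-k.net/schemas/bookinfo'"
      + " xmlns:book='http://www.pankaj-k.net/schemas/book'>"
      + "<bi:categories id='cats' book:area='technology'/></bookinfo></bk:book>";

    // Prefix ("" for the default namespace) -> URI for every declaration in scope at e.
    static SortedMap<String, String> inScopeDeclarations(Element e) {
        SortedMap<String, String> result = new TreeMap<>();
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr a = (Attr) attrs.item(i);
                if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                    String prefix = "xmlns".equals(a.getName()) ? "" : a.getLocalName();
                    result.putIfAbsent(prefix, a.getValue()); // nearer declarations win
                }
            }
        }
        // xmlns="" only undeclares a default namespace; nothing to render at the apex.
        if ("".equals(result.get(""))) {
            result.remove("");
        }
        return result;
    }

    static SortedMap<String, String> inScopeDeclarations(String xml, String id) {
        return inScopeDeclarations(findById(parse(xml), id));
    }

    // Finds the element whose plain "id" attribute has the given value.
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (id.equals(el.getAttribute("id"))) return el;
        }
        throw new IllegalArgumentException("no element with id " + id);
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inScopeDeclarations(SAMPLE, "book_title"));
    }
}
```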

Exclusive Canonicalization

Normal canonicalization, as explained in the previous section, is adequate for most situations but falls short in scenarios where the ancestors of the signed element may change or the signed element itself may move to another location. As we saw, the canonicalized element picks up context from its ancestors, namely their namespace declarations and their attributes in the reserved xml: namespace (such as xml:lang), so any change in that context changes the canonical form and breaks the signature.

This behavior of canonicalization is problematic for the use of signed XML data as protocol-specific headers, which can be added or removed in the course of normal processing. We talk about one such protocol, WS-Security, in Chapter 11, Web Service Security.

To avoid the problems caused by the inclusive nature of normal canonicalization, the W3C standard Exclusive XML Canonicalization defines a form of canonicalization that keeps the canonicalized element free from its context. It specifies additional steps, on top of those of Canonical XML, so that a namespace declaration inherited from an ancestor appears in the output only if the element or one of its attributes actually uses its prefix, and attributes in the xml: namespace are not imported from ancestors. Applying this canonicalization to the title element of our previous example would produce:

<title id="book_title" subject="bk:programming">J2EE Security</title>

A detailed description of this process is beyond the scope of this chapter. Refer to references listed in the section Further Reading for more information.
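The key rule is that exclusive canonicalization emits a declaration only for prefixes the element visibly utilizes: the prefix of the element itself and the prefixes of its attributes. The prefix bk inside the attribute value bk:programming does not count, which is why the declaration disappears from the title output above. A minimal sketch of that selection follows, with hypothetical names; it relies on the DOM Level 3 method lookupNamespaceURI to resolve in-scope bindings and models nothing else of the algorithm.

```java
import java.io.StringReader;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: which namespace declarations exclusive c14n keeps on the apex of a subset.
final class ExclusiveContext {
    static final String SAMPLE =
        "<bk:book id='j2ee_sec' xmlns:bk='http://www.pankaj-k.net/schemas/book'>"
      + "<title id='book_title' subject='bk:programming'>J2EE Security</title>"
      + "<bookinfo id='book_info' xmlns:bi='http://www.pankaj-k.net/schemas/bookinfo'"
      + " xmlns:book='http://www.pankaj-k.net/schemas/book'>"
      + "<bi:categories id='cats' book:area='technology'/></bookinfo></bk:book>";

    // Prefixes visibly utilized by e: its own prefix ("" if none) and its attributes' prefixes.
    static SortedSet<String> visiblyUtilizedPrefixes(Element e) {
        SortedSet<String> prefixes = new TreeSet<>();
        prefixes.add(e.getPrefix() == null ? "" : e.getPrefix());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            String ns = a.getNamespaceURI();
            // Unprefixed attributes, xmlns declarations and xml:* attributes need no declaration.
            if (a.getPrefix() != null && !XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)
                    && !XMLConstants.XML_NS_URI.equals(ns)) {
                prefixes.add(a.getPrefix());
            }
        }
        return prefixes;
    }

    // In-scope bindings restricted to the visibly utilized prefixes.
    static SortedMap<String, String> exclusiveDeclarations(Element e) {
        SortedMap<String, String> out = new TreeMap<>();
        for (String prefix : visiblyUtilizedPrefixes(e)) {
            String uri = e.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
            if (uri != null && !uri.isEmpty()) {
                out.put(prefix, uri);
            }
        }
        return out;
    }

    static SortedMap<String, String> exclusiveDeclarations(String xml, String id) {
        NodeList all = parse(xml).getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (id.equals(el.getAttribute("id"))) return exclusiveDeclarations(el);
        }
        throw new IllegalArgumentException("no element with id " + id);
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("title: " + exclusiveDeclarations(SAMPLE, "book_title"));
        System.out.println("bi:categories: " + exclusiveDeclarations(SAMPLE, "cats"));
    }
}
```

The exclusive canonicalization standard also defines an InclusiveNamespaces PrefixList parameter for prefixes, like bk here, that are used only inside attribute values or text.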

With this brief overview of inclusive and exclusive canonicalization, let us get back to our main topic, XML Signature.

The Structure of the Signature Element

The syntactic structure of the Signature element is shown below, using a simplified, loosely but intuitively defined notation. In this notation, an optional component is marked by a question mark (?), zero or more occurrences by a star (*), and one or more occurrences by a plus sign (+); multiple components are grouped together within parentheses. The same notation is used to explain the structure of XML elements elsewhere in the book as well.

<Signature ID?>
  <SignedInfo ID?>
  <SignatureValue ID?>base64 encoded data bytes</SignatureValue>
 (<KeyInfo>)?
 (<Object ID? MimeType? Encoding?>arbitrary elements</Object>)*
</Signature>

A number of child elements of the element Signature have an optional ID attribute. As per XML validation rules, ID values corresponding to different elements within a document must be unique. These values provide a simple mechanism to address these elements within the document.

We have already come across the elements SignedInfo, SignatureValue and KeyInfo in the XML document of Listing 7-3. Although present in the document, KeyInfo is optional as per the above syntax. As this element is primarily used to retrieve the public key for validating the signature, its absence would mean that the signature validation process would have to get the keys based on application context.

There can be zero or more Object elements in a Signature. These elements are typically used to hold the data over which the signature has been applied, resulting in a Signature element that envelops the signed data. Such signatures are aptly known as enveloping signatures. The attribute MimeType describes the data within the Object, and the attribute Encoding identifies its encoding method through a URI. For example, base64 encoded PNG image data may be specified with MimeType 'image/png' and Encoding URI "http://www.w3.org/2000/09/xmldsig#base64". Note that the attribute MimeType is advisory information for the application and plays no part in XML Signature processing.
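The hypothetical helper below builds such an Object element with the JDK DOM API and java.util.Base64, then reads the bytes back. It writes the identifier attribute as Id, the spelling used by the XML Signature schema (the shorthand above writes ID), so a Reference with URI "#image1" could point at it. Only the Encoding attribute affects how the content is decoded here; MimeType is simply carried along for the application.

```java
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: building and reading a ds:Object that carries base64 encoded binary data.
final class ObjectElementSketch {
    static final String DSIG_NS = "http://www.w3.org/2000/09/xmldsig#";
    static final String BASE64 = "http://www.w3.org/2000/09/xmldsig#base64";

    static Document newDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element newObject(Document doc, String id, String mimeType, byte[] data) {
        Element obj = doc.createElementNS(DSIG_NS, "ds:Object");
        obj.setAttributeNS(null, "Id", id);             // target for a Reference URI "#" + id
        obj.setAttributeNS(null, "MimeType", mimeType); // advisory, for the application only
        obj.setAttributeNS(null, "Encoding", BASE64);   // how the text content is encoded
        obj.setTextContent(Base64.getEncoder().encodeToString(data));
        return obj;
    }

    // Recovers the bytes, refusing encodings this sketch does not handle.
    static byte[] content(Element obj) {
        String encoding = obj.getAttribute("Encoding");
        if (!BASE64.equals(encoding)) {
            throw new IllegalArgumentException("unsupported Encoding: " + encoding);
        }
        return Base64.getDecoder().decode(obj.getTextContent().trim());
    }

    public static void main(String[] args) {
        byte[] pngMagic = {(byte) 0x89, 'P', 'N', 'G'}; // first four bytes of any PNG file
        Element obj = newObject(newDocument(), "image1", "image/png", pngMagic);
        System.out.println(obj.getAttribute("Id") + ": " + obj.getTextContent());
    }
}
```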

Let us now take a peek into the SignedInfo element:

<SignedInfo ID?>
  <CanonicalizationMethod Algorithm/>
  <SignatureMethod Algorithm>algo. specific elements</SignatureMethod>
 (<Reference URI? ID? Type?>
   (<Transforms>
     (<Transform Algorithm>
       (<XPath>xpath expression</XPath>|algo. specific elements)*
      </Transform>)+
    </Transforms>)?
    <DigestMethod Algorithm>method specific elements</DigestMethod>
    <DigestValue>digest data bytes</DigestValue>
  </Reference>)+
</SignedInfo>

The attribute Algorithm of the element CanonicalizationMethod specifies the canonicalization applied to the SignedInfo element before the signature algorithm is applied. Recall from the previous discussion that a canonicalization mechanism is identified by a URI. Similarly, the attribute Algorithm of the element SignatureMethod identifies the algorithm used for creating and validating the signature value. Table 7-1 lists the algorithm identification URIs defined within the XML Signature specification.

Table 7-1. XML Signature Algorithm Identifiers
Algorithm                          Identifier
SHA-1 Digest                       http://www.w3.org/2000/09/xmldsig#sha1
DSA Signature with SHA-1 Digest    http://www.w3.org/2000/09/xmldsig#dsa-sha1
Canonical XML without comments     http://www.w3.org/TR/2001/REC-xml-c14n-20010315
Canonical XML with comments        http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments
Base64 Transform                   http://www.w3.org/2000/09/xmldsig#base64
XPath filtering Transform          http://www.w3.org/TR/1999/REC-xpath-19991116
Enveloped Signature Transform      http://www.w3.org/2000/09/xmldsig#enveloped-signature
XSLT Transform                     http://www.w3.org/TR/1999/REC-xslt-19991116
Note: Identifiers as specified in XML Signature Syntax and Processing, W3C Recommendation of 12 February 2002.
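In Java, the XML Digital Signature API (package javax.xml.crypto.dsig, part of the JDK since Java SE 6) exposes these identifiers as string constants, so code need not spell the URIs out. The sketch below maps each row of Table 7-1 to its constant; the class name AlgorithmIds is hypothetical, and the exclusive canonicalization URI from the previous section is added for comparison, although it is not part of the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.SignatureMethod;
import javax.xml.crypto.dsig.Transform;

// Table 7-1 expressed with the JSR 105 constants shipped in the JDK.
final class AlgorithmIds {
    static Map<String, String> table() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("SHA-1 Digest", DigestMethod.SHA1);
        m.put("DSA Signature with SHA-1 Digest", SignatureMethod.DSA_SHA1);
        m.put("Canonical XML without comments", CanonicalizationMethod.INCLUSIVE);
        m.put("Canonical XML with comments", CanonicalizationMethod.INCLUSIVE_WITH_COMMENTS);
        m.put("Base64 Transform", Transform.BASE64);
        m.put("XPath filtering Transform", Transform.XPATH);
        m.put("Enveloped Signature Transform", Transform.ENVELOPED);
        m.put("XSLT Transform", Transform.XSLT);
        // Not in Table 7-1: Exclusive XML Canonicalization, discussed earlier.
        m.put("Exclusive Canonical XML without comments", CanonicalizationMethod.EXCLUSIVE);
        return m;
    }

    public static void main(String[] args) {
        table().forEach((name, uri) -> System.out.println(name + ": " + uri));
    }
}
```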

The process of signature creation involves canonicalizing the SignedInfo element and applying the signature algorithm, with the signer's private key as a parameter, to the output of the canonicalization process. Conceptually, signing means computing a digest of the data and processing that digest with the private key of the signer; for RSA this is commonly described as encrypting the digest, whereas DSA derives its signature from the digest without any encryption. The signature data bytes are base64 encoded and stored as the text value of the element SignatureValue.

Before it can be signed, the SignedInfo element itself must be constructed, chiefly from its Reference elements. A Reference element references a to-be-signed data item and includes a sequence of Transform elements, a DigestMethod element and a DigestValue element. The elements Transform and DigestMethod have the Algorithm attribute, identifying the algorithm used for transformation and digest computation, respectively. For each Reference element, the data item is accessed and the specified transforms are applied in sequence. The output of this process is used for computing the digest as per the specified algorithm, and the base64 encoding of the computed value is stored as the text value of the DigestValue element. This process is illustrated in Figure 7-1.

Figure 7-1. Steps in creating an XML signature.
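Stripped of the XML machinery, the steps described above come down to two levels of cryptography: a digest per data item, and a signature over SignedInfo. The sketch below shows both with plain JCA calls. It assumes the data bytes have already been transformed, and it writes a hypothetical SignedInfo by hand in the form inclusive canonicalization is intended to produce instead of running a canonicalizer. It also swaps in SHA-256 and RSA-SHA256 for the SHA-1 and DSA-SHA1 algorithms of Listing 7-3, since SHA-1 is no longer considered adequate for signatures.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Sketch of the two-level scheme with plain JCA calls; no XML library involved.
final class TwoLevelSigningSketch {
    // Level 1: digest of an already transformed data item, base64 encoded for DigestValue.
    static String digestValue(byte[] canonicalData) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(canonicalData);
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical SignedInfo with one Reference, hand-written in canonical form
    // (double quotes, explicit end tags, namespace declaration on the apex element).
    static String signedInfo(String uri, String digestValue) {
        return "<ds:SignedInfo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\">"
            + "<ds:CanonicalizationMethod Algorithm=\"http://www.w3.org/TR/2001/REC-xml-c14n-20010315\">"
            + "</ds:CanonicalizationMethod>"
            + "<ds:SignatureMethod Algorithm=\"http://www.w3.org/2001/04/xmldsig-more#rsa-sha256\">"
            + "</ds:SignatureMethod>"
            + "<ds:Reference URI=\"" + uri + "\">"
            + "<ds:DigestMethod Algorithm=\"http://www.w3.org/2001/04/xmlenc#sha256\"></ds:DigestMethod>"
            + "<ds:DigestValue>" + digestValue + "</ds:DigestValue>"
            + "</ds:Reference></ds:SignedInfo>";
    }

    // Level 2: sign the canonical SignedInfo bytes; the result is the SignatureValue text.
    static String signatureValue(String canonicalSignedInfo, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(canonicalSignedInfo.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(s.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks a SignatureValue against the same SignedInfo bytes with the public key.
    static boolean verify(String canonicalSignedInfo, String signatureValue, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(canonicalSignedInfo.getBytes(StandardCharsets.UTF_8));
            return s.verify(Base64.getDecoder().decode(signatureValue));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inclusive canonical form of the title element, as shown earlier in this chapter.
        byte[] title = ("<title xmlns:bk=\"http://www.pankaj-k.net/schemas/book\" "
            + "id=\"book_title\" subject=\"bk:programming\">J2EE Security</title>")
            .getBytes(StandardCharsets.UTF_8);
        KeyPair keys = newRsaKeyPair();
        String si = signedInfo("#book_title", digestValue(title));
        String sv = signatureValue(si, keys.getPrivate());
        System.out.println(si);
        System.out.println("SignatureValue: " + sv);
        System.out.println("verifies: " + verify(si, sv, keys.getPublic()));
    }
}
```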


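The validation process is quite similar, repeating almost all the steps of signature creation. For each Reference, the data item is accessed, the transforms are applied, and the recomputed digest is compared with the stored DigestValue. Then the SignedInfo element is canonicalized and checked against the SignatureValue using the public key of the signer. For RSA, this check amounts to decrypting the signature data bytes with the public key and comparing the result with the digest of the canonicalized SignedInfo element.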
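With the JDK's XML Digital Signature API, both validation steps are performed by XMLSignature.validate. The sketch below signs the title element of a shortened book document through an internal detached reference, as in Listing 7-3, validates the result, and lets a caller confirm that validation fails once the signed text changes. It assumes Java 11 or later (for SignatureMethod.RSA_SHA256) and uses RSA-SHA256 with SHA-256 rather than DSA-SHA1, because recent JDKs may reject SHA-1 XML signatures during validation by default. The class and helper names are hypothetical, and KeyInfo is omitted because the verifier is handed the public key directly.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignatureMethod;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: internal detached signature over #book_title, then core validation.
final class DetachedSignatureSketch {
    static final String BOOK =
        "<bk:book id=\"j2ee_sec\" xmlns:bk=\"http://www.pankaj-k.net/schemas/book\">"
      + "<title id=\"book_title\" subject=\"bk:programming\">J2EE Security</title>"
      + "<author id=\"book_author\">Pankaj Kumar</author></bk:book>";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    // Signs <title id="book_title"> and inserts ds:Signature as the first child of bk:book.
    static Document sign(KeyPair keys) {
        try {
            Document doc = parse(BOOK);
            Element root = doc.getDocumentElement();
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            Reference ref = fac.newReference("#book_title",
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                    fac.newTransform(CanonicalizationMethod.INCLUSIVE, (TransformParameterSpec) null)),
                null, null);
            SignedInfo si = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
                fac.newSignatureMethod(SignatureMethod.RSA_SHA256, null),
                Collections.singletonList(ref));
            DOMSignContext ctx = new DOMSignContext(keys.getPrivate(), root, root.getFirstChild());
            ctx.setDefaultNamespacePrefix("ds");
            // Without a DTD or schema, a plain "id" attribute is not an XML ID; register it.
            ctx.setIdAttributeNS(title(doc), null, "id");
            fac.newXMLSignature(si, null).sign(ctx);
            return doc;
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    // Recomputes every Reference digest, then checks SignatureValue with the public key.
    static boolean validate(Document doc, PublicKey key) {
        try {
            Element sig = (Element) doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext ctx = new DOMValidateContext(key, sig);
            ctx.setIdAttributeNS(title(doc), null, "id");
            return XMLSignatureFactory.getInstance("DOM").unmarshalXMLSignature(ctx).validate(ctx);
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    static Element title(Document doc) {
        return (Element) doc.getElementsByTagName("title").item(0);
    }

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XML Signature processing requires namespace awareness
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        Document doc = sign(keys);
        System.out.println("valid: " + validate(doc, keys.getPublic()));
        title(doc).setTextContent("J2EE Insecurity");
        System.out.println("valid after edit: " + validate(doc, keys.getPublic()));
    }
}
```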

Our example document had a detached Signature element, pointing to signed elements within the same document. We also talked about the possibility of holding signed data items within the Object elements of the Signature. Let us further explore the structural relationship between the Signature element and the signed data items.

One or more Reference elements specify the data items being signed. Each Reference corresponds to a single data item and identifies the item through an optional URI attribute. Among all the Reference elements within a single SignedInfo, at most one may omit the URI attribute. If a Reference URI is absent, it is left to the application to identify the data item implicitly. The rationale for such a facility is that lightweight protocols could benefit from a simplified syntax, since the only signed data item would be known from the context.

The URI mechanism to reference signed data elements allows:

  • A signed data item to be an Object element within the Signature element itself, resulting in an enveloping signature.

  • A signed data item to be the element that contains the Signature element, typically the root element of the document, resulting in an enveloped signature. In this case, the Signature element itself must be excluded from the data to be signed or verified, which is usually done with the Enveloped Signature Transform of Table 7-1.

  • A signed data item to be an element within the document but outside the Signature element, resulting in an internal detached signature. We have already seen an example of this.

  • A signed data item to be any URI-addressable external resource, resulting in an external detached signature.

Figure 7-2 illustrates these different ways of packaging the Signature element.

Figure 7-2. Different ways of packaging XML Signature.


Why have so many different packaging schemes? Different usage scenarios demand different packaging mechanisms. For example, a detached signature is useful for checking the integrity of data in files without modifying the files themselves, whereas an attached (enveloped or enveloping) signature is useful for message-based protocols where a single message must carry both the data and the signature.
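For the enveloped case, the sketch below signs an entire shortened book document through the empty URI "" and appends the Signature as the last child of the root element. The Enveloped Signature Transform from Table 7-1 removes the Signature element before the digest is computed, so adding it does not disturb the digest. The class and helper names are hypothetical, Java 11 or later is assumed for SignatureMethod.RSA_SHA256, and SHA-256 based algorithms are used instead of SHA-1.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignatureMethod;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: enveloped signature over a whole document.
final class EnvelopedSignatureSketch {
    static final String BOOK =
        "<bk:book id=\"j2ee_sec\" xmlns:bk=\"http://www.pankaj-k.net/schemas/book\">"
      + "<title id=\"book_title\">J2EE Security</title></bk:book>";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    // URI "" selects the whole document; the signature becomes the root's last child.
    static Document sign(KeyPair keys) {
        try {
            Document doc = parse(BOOK);
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
            // The enveloped-signature transform strips ds:Signature out before digesting.
            Reference ref = fac.newReference("",
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                    fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
            SignedInfo si = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null),
                fac.newSignatureMethod(SignatureMethod.RSA_SHA256, null),
                Collections.singletonList(ref));
            fac.newXMLSignature(si, null).sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));
            return doc;
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    static boolean validate(Document doc, PublicKey key) {
        try {
            Element sig = (Element) doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext ctx = new DOMValidateContext(key, sig);
            return XMLSignatureFactory.getInstance("DOM").unmarshalXMLSignature(ctx).validate(ctx);
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        Document doc = sign(keys);
        System.out.println("last child: " + doc.getDocumentElement().getLastChild().getLocalName());
        System.out.println("valid: " + validate(doc, keys.getPublic()));
    }
}
```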
