Chapter 12. Sending Attachments with Web Services

One of the earliest criticisms of XML, and therefore of Web services, concerns the transmission of noncharacter data. Converting everything into characters is not considered practical for certain types of data, such as photographs and executable code.

Because an XML document is transmitted as a set of characters, other types of binary data—such as images, audio files, or compiled programs (like Java class files)—are problematic.

The solution to this problem is to send these files as attachments to a SOAP message. This hour explains the different ways in which this can be done.

In this hour, you will learn how to attack this problem when using Web services. You will learn

  • How to convert binary data to Base64 encoding so that it can be sent like text

  • How to encode attachments with MIME

  • How to encode with DIME

  • How the new SOAP 1.2 Attachment Feature clarifies the issue

The Problem with Binary Data

All computer data is binary, in that it is composed of 1s and 0s. Some patterns of 1s and 0s have been assigned to represent the characters in natural languages such as English, French, Chinese, and so on. In the United States, the most commonly used character set is called the American Standard Code for Information Interchange (ASCII). In the U.S. version of the code, US-ASCII, codes range from the 7-bit representation of the numbers 0 to 127 for a total of 128 different values. These values represent the letters of the alphabet, the capitalized form of the letters of the alphabet, the numbers from 0 to 9, common punctuation marks, and some special characters that are used in data communications. Table 12.1 shows some of the common characters and their US-ASCII codes.

Table 12.1. Sample ASCII Codes

Decimal   Character   Bit Pattern
32        Space       0100000
60        <           0111100
62        >           0111110
65        A           1000001
97        a           1100001
90        Z           1011010
122       z           1111010
48        0           0110000
57        9           0111001
38        &           0100110
47        /           0101111

If we wanted to encode this line (elements and data):

<AaZz>909090</AaZz>

it would be sent as these bits:

0111100 1000001 1100001 1011010 1111010 0111110
0111001 0110000 0111001 0110000 0111001 0110000
0111100 0101111 1000001 1100001 1011010 1111010 0111110

The client computer would translate the characters in the XML into the bit representations—taking special care to deal with LF, CR, and period (.)—and send them to the Web service’s server. The server would translate the bits into characters and send them to the Web service. As long as all of the information in the XML document can be represented as ASCII characters, this strategy works well.
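The translation from characters to bit patterns can be reproduced with a few lines of Python. This is only a sketch of the idea; the function name to_ascii_bits is made up for illustration and is not part of any Web services toolkit.

```python
# Translate each character of the sample line into its 7-bit US-ASCII pattern.
line = "<AaZz>909090</AaZz>"

def to_ascii_bits(text):
    """Return the 7-bit pattern for every character in text."""
    # encode("ascii") raises UnicodeEncodeError if a character is not US-ASCII
    return [format(byte, "07b") for byte in text.encode("ascii")]

bits = to_ascii_bits(line)
print(" ".join(bits[:6]))    # opening tag: <AaZz>
print(" ".join(bits[6:12]))  # data: 909090
print(" ".join(bits[12:]))   # closing tag: </AaZz>
```

The three printed rows correspond to the three rows of bits shown above.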

Note

Other character sets exist to represent languages such as French, Chinese, Arabic, and so on. With the exception of the differences in the length and composition of the binary strings that make up these character sets, they are processed the same way as US-ASCII.

Binary data is any data representation that does not map to a character set. Although it is still composed of 1s and 0s, it does not conform to the strict rules of character encoding. If you introduce non-ASCII data into the preceding message and send it to an XML parser, confusion will be the result. Unless you find a way to represent binary data in Web services, certain types of data can’t be sent.
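To see the kind of confusion that results, you can hand raw bytes to an XML parser. The sketch below uses the parser from Python's standard library and three arbitrary bytes, one of which (0x1D) is a control character. The helper name try_parse is hypothetical, and other parsers may report the problem differently.

```python
import xml.etree.ElementTree as ET

def try_parse(document):
    """Return 'parsed' if the XML parser accepts the bytes, else 'rejected'."""
    try:
        ET.fromstring(document)
        return "parsed"
    except ET.ParseError:
        return "rejected"

# A document that contains only characters
text_only = b"<imageData>909090</imageData>"

# The same element with three raw bytes placed directly in the content;
# the middle byte, 0x1D, is a control character that XML does not allow
raw_bytes = bytes([0b01100011, 0b00011101, 0b01111110])
with_binary = b"<imageData>" + raw_bytes + b"</imageData>"

print(try_parse(text_only))
print(try_parse(with_binary))
```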

Using Base64 Encoding

The conventional solution is to convert the binary data into XML-friendly characters using an encoding scheme called Base64. It is popular because it is conceptually very simple. Base64 encoding takes three bytes of binary data, which are normally grouped into octets (8-bit groupings), and combines them into a single 24-bit grouping. It then splits that grouping into four 6-bit groupings and represents each one with a US-ASCII character. The “=” character is used as padding when the input does not divide evenly into three-byte groups. For example, consider the following set of binary values:

01100011 00011101 01111110

This string can’t be treated as US-ASCII characters because there is no way to guarantee that forbidden character sequences (US-ASCII control characters) are not embedded in it. Instead, the 24 bits are divided into four sets of 6 bits as shown here:

011000 110001 110101 111110

Special characters are added to the stream. Each of these 6-bit sets is used as an index to an array of characters that are never used as metadata. In this Base64 alphabet, the bit strings have the values shown in Table 12.2.

Table 12.2. Typical Base64 Alphabet Characters

Decimal   Character   Bit Pattern
0         A           000000
1         B           000001
2         C           000010
23        X           010111
24        Y           011000
25        Z           011001
26        a           011010
27        b           011011
28        c           011100
49        x           110001
50        y           110010
51        z           110011
52        0           110100
53        1           110101
60        8           111100
61        9           111101
62        +           111110
63        /           111111

The resulting string would be Yx1+. You could then take this data, place it in an element like this, and send it to the Web service:

<imageData>Yx1+</imageData>
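The grouping and lookup steps can be sketched in Python as follows. The function encode_group is written only for illustration and handles exactly one three-byte group. In practice you would call a library routine such as base64.b64encode, which the last line uses as a cross-check.

```python
import base64

# The 64-character Base64 alphabet, indexed by the value of each 6-bit group
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(three_bytes):
    """Encode exactly three octets as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles one 3-byte group only")
    value = int.from_bytes(three_bytes, "big")  # the single 24-bit grouping
    # Cut the 24 bits into four 6-bit indexes, most significant first
    indexes = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[index] for index in indexes)

data = bytes([0b01100011, 0b00011101, 0b01111110])
encoded = encode_group(data)
print(encoded)
print(base64.b64encode(data).decode("ascii"))  # library result for comparison
```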

You may have noticed that this data is not human readable, but it will transmit without error. On the server side, the characters are retranslated:

Y 011000
x 110001
1 110101
+ 111110

A set of bits is created from these:

011000 110001 110101 111110

These bits are regrouped into octets and all metadata removed:

01100011 00011101 01111110
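On the receiving side, the same round trip can be checked with Python's standard base64 module. This is a minimal sketch; the one-byte example at the end is added only to show what the “=” padding looks like when the input is shorter than three bytes.

```python
import base64

# Decode the four characters back into the original three octets
decoded = base64.b64decode("Yx1+")
octets = " ".join(format(byte, "08b") for byte in decoded)
print(octets)

# When the input is not a multiple of three bytes, "=" pads the output
padded = base64.b64encode(bytes([0b01100011])).decode("ascii")
print(padded)
print(base64.b64decode(padded))  # the padding is dropped again on decoding
```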

The Web service now has the original data in its memory where it can process it. You might think that this approach is the solution to the binary data transfer problem. However, the data is about 33% larger after the encoding is performed, and the encoding/decoding process consumes resources at both the client and the server. Both factors can cause performance problems in certain systems.
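The size increase follows directly from the grouping: every three bytes of input become four characters of output. The short sketch below measures it for an arbitrary 3,072-byte payload. The payload contents are made up for the example and have no special meaning.

```python
import base64

# An arbitrary binary payload of 3,072 bytes (a multiple of three)
payload = bytes(range(256)) * 12
encoded = base64.b64encode(payload)

overhead = (len(encoded) - len(payload)) / len(payload)
print(len(payload), "bytes before encoding")
print(len(encoded), "characters after encoding")
print("overhead: {:.1%}".format(overhead))
```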

In cases in which bandwidth and processing resources are abundant, Base64 encoding works well for sending attachments inside Web services transactions. In many cases, however, bandwidth and processing resources are scarce. This is particularly true for wireless Web services. In these situations, a more economical approach is needed.

Multipurpose Internet Mail Extensions

The problem of sending binary data from one computer to another is not new with Web services. As far back as 1992, the Internet Engineering Task Force (IETF) released a standard for sending attachments along with email called Multipurpose Internet Mail Extensions (MIME) in a document called RFC 1341, which has since been replaced by RFC 2045 through RFC 2049. You can download the original document at http://www.ietf.org/rfc/rfc1341.txt.

Prior to MIME, the following restrictions applied to email:

  • The message may contain only US-ASCII characters.

  • The maximum line length allowed is 1,000 characters.

  • The message must not be longer than a predefined maximum size.

With MIME, email gained the following capabilities:

  • Character sets other than US-ASCII are supported.

  • Image files can be sent.

  • Audio files are allowed.

  • Video can be sent.

  • Multiple attachments are allowed in the same message.

  • Messages may have more than one font.

  • Messages may be of any size.

  • Binary files may be sent as attachments.

MIME works by adding a Content-Type header that can be used to specify the type and subtype of the data being sent in the attachment. Seven types of attachments are specified:

  • Type 1—Text—Text data in a character set

  • Type 2—Image—Still image data

  • Type 3—Audio—Audio or voice data

  • Type 4—Video—Moving image data

  • Type 5—Message—Encapsulating a mail message

  • Type 6—Multipart—Combinations of several of the other types into one message

  • Type 7—Application—Binary or application data
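As a rough illustration of how software reads the type and subtype, the following sketch parses a single Content-Type header with the email package from Python's standard library. The header value is the one used for the first GIF attachment in this hour's sample message. Other MIME libraries expose the same information under different names.

```python
from email import message_from_string

# A header block with one Content-Type header and an empty body
headers = (
    'Content-Type: IMAGE/gif; SizeOnDisk=658; name="FINGERAC.GIF"\n'
    "\n"
)
part = message_from_string(headers)

main_type = part.get_content_maintype()  # the type, lowercased by the library
sub_type = part.get_content_subtype()    # the subtype, also lowercased
file_name = part.get_param("name")       # a parameter that follows the subtype

print(main_type, sub_type, file_name)
```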

In addition to the Content-Type, a MIME header must contain a MIME-Version and a Content-Transfer-Encoding field. It may also contain a Content-ID to allow references from one attachment to another, and a Content-Description, which allows the sender to add a descriptive message to the attachment. Example 12.1 shows a sample multipart message.

Example 12.1. A MIME Multipart Message

From [email protected] Sun Aug 6 18:32:49 1995
Return-Path: [email protected]
Received: from elf.ecitele.com ([147.234.56.1]) by taurus.math.tau.ac.il
(8.6.10/math) with ESMTP id SAA03892 for <[email protected]>;
Sun, 6 Aug 1995 18:31:14 +0300
From: [email protected]
Received: from pc-ranlahat.ecitele.com (pc-ranlahat.ecitele.com
[147.234.18.108]) by elf.ecitele.com (8.6.12/8.6.12) with
SMTP id SAA23658; Sun, 6 Aug 1995 18:27:18 +0300
Date: Sun, 6 Aug 95 18:14:51 IST
Subject: sending gifs
To: shlomit <[email protected]>
X-Mailer: Chameleon V0.05, TCP/IP for Windows, NetManage Inc.
Message-ID: <[email protected]>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
 BOUNDARY="pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072"
Status: R

--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi Shlomit,

I'm sending you some GIF files

Bye
Ran
--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072
Content-Type: IMAGE/gif; SizeOnDisk=658; name="FINGERAC.GIF"
Content-Transfer-Encoding: BASE64
Content-Description: FINGERAC.GIF

R0lGODdhJAAkAPYPAAAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8AAAD/
AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8A
AAD/AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDAwICA
gP8AAAD/AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDA
.
.
.
CNELhsvE2KnSg9vdq9O2q6W0p6LfyrXmmcLpB4WwnvC2uAD0msLFqPm3/9Lt
U8TOVzF02QK6ozSJ3MEH/oihWniJYKJ/hIpRnPWoWUJEEy1xajSNHQB5GkUq
enQAgbxB+EIRc6myUkuXxMYRW8mQJymMNWUB3URw3UVyhaSt+wYRIrWntpqi
YmoR49FDSgcxJXnKp65dXr/2FDvwqtmLgQAAOw==

--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072
Content-Type: IMAGE/gif; SizeOnDisk=787; name="GOPHRACT.GIF"
Content-Transfer-Encoding: BASE64
Content-Description: GOPHRACT.GIF

R0lGODdhJAAkAPYPAAAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8AAAD/
AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8A
AAD/AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDAwICA
.
.
.
3Inq3k8AQZ8WXZTT4strJp3aRNiLX0iq9hrKBCqUaMp9U3grumzHlBHWmlAn
ThXZkqTapm1RfkQ6F6C7q2LzmpVLEkAAxG0jRuw17xC1dQEQZAsgc+zbxf/q
OUbwWFMAzolSF8rMGdfp17Bjy57N2jEu1bgPZcZFmrbv37udfeo9vLjx48Q/
2V374DTy58Wda15Oqjqk3NhTBwIAOw==

--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072
Content-Type: AUDIO/wav; SizeOnDisk=22230; name="PHONE.WAV"
Content-Transfer-Encoding: BASE64
Content-Description: PHONE.WAV

UklGRs5WAABXQVZFZm10IBAAAAABAAEAESsAABErAAABAAgAZGF0YalWAABd
an+RiHhreI2hi3RsfpGReFtrgZSIdmp5kaSLcWp/kpB1V2uDlId0aXqUpYpw
a4CVj3NWa4aVhnFme5alimxqgpmNcFVsiJWGb2Z9mqWIa2uFmo1uU2uKloVs
Zn+epYdpa4Wbi2pRbIyXhWtlgJ+mhmdrhpyLaVBsjJeDamWBo6aHZ2uHn4tn
UG6PmYVqZoKkpoZla4igi2ZObpGXg2dlgqamhmVriqGKZE5ukZeDZmWDqKWF
.
.
.
g29sgpyag2xxhpaDaWB1jI+Bbm+GnpWAbHSIloJmYnaNjH9ucYaekn5sdoqU
gGVmeY+Mf2xzh5+PfWt2ipR/ZGl6j4t+bHSHno15a3mLkn5ha32RintsdYqe
i3hse4yQe2Fsf5CIemx2i56IdGx9jY96YG6BkYd4bnmNnIh0bn6PjHhgb4KQ
hnZue4+bh3Nvf5CLdmFxg4+FdW5+kZmFcHCBkYh1ZHSGj4V0b36SloNwc4KR
h3NldYiPg3RwgZWUgnB0g5GFcGZ2ioyCc3GClZGAb3SDkIFuaXmLi4FzdIOW
j39wdoWPgGtqeYuKgHN1hZeNfnB4ho1/amx6i4h/cHWFlop6b3qHjX5pbn6M
iH1xeIeXiHlwe4iMfWdvf4yGe3F5iJaGeHF9iop6ZnCAjIV6cXuLlYV2c36K
iHlnc4GLg3Z0g5SLe3N5goeBeXl6eH9/gX+CgH17go2Ge3V9gYOAfn57eHsA

--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072--

 

The first MIME headers in this email appear just before the first message part.

MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
 BOUNDARY="pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072"

This tells the server that the content of this message is MULTIPART and that its subtype is MIXED, meaning that this message has multiple attachments that are of different types. It also defines a unique string as a boundary between message parts.

The boundary is repeated, and the first content type and subtype are shown:

--pc-ranlahat.ecitele.com:807722592:1402405494:1451687977:1917059072
Content-Type: TEXT/PLAIN; charset=US-ASCII

After the text message, another boundary appears followed by another header. This header says that this is a GIF image that is being sent using BASE64 encoding.

Content-Type: IMAGE/gif; SizeOnDisk=658; name="FINGERAC.GIF"
Content-Transfer-Encoding: BASE64
Content-Description: FINGERAC.GIF

R0lGODdhJAAkAPYPAAAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8AAAD/
AP//AAAA//8A/wD//////wAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8A

This is followed by another GIF file:

Content-Type: IMAGE/gif; SizeOnDisk=787; name="GOPHRACT.GIF"
Content-Transfer-Encoding: BASE64
Content-Description: GOPHRACT.GIF

R0lGODdhJAAkAPYPAAAAAIAAAACAAICAAAAAgIAAgACAgMDAwICAgP8AAAD/

Finally, a WAV file follows:

Content-Type: AUDIO/wav; SizeOnDisk=22230; name="PHONE.WAV"
Content-Transfer-Encoding: BASE64
Content-Description: PHONE.WAV

UklGRs5WAABXQVZFZm10IBAAAAABAAEAESsAABErAAABAAgAZGF0YalWAABd
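A message with the same overall shape can be built and taken apart again with the email package in Python's standard library. The following is a sketch under simple assumptions: the attachment is three arbitrary bytes rather than a real GIF, the description sample.bin is hypothetical, and the library chooses the boundary string itself.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart/mixed message: one text part and one binary part
message = MIMEMultipart("mixed")
message["Subject"] = "sending binary data"
message.attach(MIMEText("I'm sending you some binary data", "plain", "us-ascii"))

binary_data = bytes([0x63, 0x1D, 0x7E])
attachment = MIMEApplication(binary_data)  # encoded as Base64 by default
attachment["Content-Description"] = "sample.bin"  # hypothetical description
message.attach(attachment)

wire_text = message.as_string()  # headers, boundaries, and encoded parts

# The receiver parses the text and walks through the parts
received = message_from_string(wire_text)
parts = []
for part in received.walk():
    if part.is_multipart():
        continue  # skip the multipart container itself
    # decode=True reverses the Content-Transfer-Encoding (Base64 here)
    parts.append((part.get_content_type(), part.get_payload(decode=True)))

for content_type, payload in parts:
    print(content_type, len(payload), "bytes")
```

Printing wire_text shows the same elements discussed above: a MIME-Version header, a multipart Content-Type with a boundary parameter, and one set of headers per part.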

Later in the hour, you will learn how MIME can be used to send attachments to SOAP messages.

Direct Internet Message Encapsulation

In the world of standards, it seems as if there are always several to choose from in the early stages. An alternative standard, Direct Internet Message Encapsulation (DIME), has been proposed to the IETF to be used instead of MIME.

DIME is less flexible than MIME because it is based on a simpler message format. DIME is written from the ground up to be used in conjunction with SOAP for the specific purpose of adding attachments to Web services messages. As a result, the DIME header contains very little information about the attachments, deferring most of the details to the body of the SOAP message.

DIME could be thought of as a new version of MIME that is designed for Web services. It uses the existing MIME Content-types and subtypes to identify the encoding of the records.

A DIME message consists of one or more DIME records. Each record contains information about its own contents. The following message consists of three records, separated here by dashed lines for readability:

00001 1 0 0 0010 00000000000000000000
0000000000000000 0000000000101000
00000000000000000000000110110101
http://schemas.xmlsoap.org/soap/envelope
<soap-env:Envelope
 xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:msg="http://example.com/DimeExample/Messages/"
 xmlns:ref="http://schemas.xmlsoap.org/ws/2002/04/reference/"
>
 <soap-env:Body>
  <msg:GetMediaFile>
   <msg:fileName>myMediaFile.mpg
   </msg:fileName>
   <msg:file ref:location=
     "uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483" />
  </msg:GetMediaFile>
 </soap-env:Body>
</soap-env:Envelope>
-------------------------------------------------------------------------
00001 0 0 1 0001 00000000000000000000
0000000000101001 0000000000001010
00000000000101011010101011100000
uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483
video/mpeg
<<First 1.42 MB of binary data for myMediaFile.mpg>>
-------------------------------------------------------------------------
00001 0 1 0 0000 00000000000000000000
0000000000000000 0000000000000000
00000000000010000110110001000000
<<Remaining 552 KB of binary data for myMediaFile.mpg>>

The bits that you see at the top of each record are in a fixed format and are used to specify the following data about the record:

  • Version (5 bits)—Version of the DIME message.

  • MB (1 bit)—First record indicator.

  • ME (1 bit)—Last record indicator.

  • CF (1 bit)—Chunked Flag—Indicates whether a record has been chopped into pieces for convenience in transmitting the document.

  • TYPE_T (4 bits)—Structure and format information of the TYPE field.

  • OPTIONS_LENGTH (16 bits)—The length of the OPTIONS field.

  • ID_LENGTH (16 bits)—The length of the ID field.

  • TYPE_LENGTH (16 bits)—The length of the TYPE field.

  • DATA_LENGTH (32 bits)—The length of the data.

  • OPTIONS—Any information sent by the DIME encoder.

  • ID—A URI to identify the payload.

  • TYPE—The type reference URI or MIME type and subtype of the payload.

  • DATA—The actual data.

By using the data in the header, a parser can determine exactly what the data in the record is, where it starts, and where it ends.

00001 1 0 0 0010 00000000000000000000
0000000000000000 0000000000101000
00000000000000000000000110110101
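The fixed layout means that a parser can cut the header apart by position alone. The sketch below does this for the 96 bits of the first record. The field names come from the list above, with one assumption: the 20-bit group at the end of the first row is treated as 4 reserved bits followed by the 16-bit OPTIONS_LENGTH, because the listed fields account for only 28 of the first 32 bits. The function name parse_header is made up for the example.

```python
HEADER_BITS = """
00001 1 0 0 0010 00000000000000000000
0000000000000000 0000000000101000
00000000000000000000000110110101
"""

# (field name, width in bits); RESERVED is an assumption, see the text above
FIELDS = [
    ("VERSION", 5), ("MB", 1), ("ME", 1), ("CF", 1), ("TYPE_T", 4),
    ("RESERVED", 4), ("OPTIONS_LENGTH", 16),
    ("ID_LENGTH", 16), ("TYPE_LENGTH", 16),
    ("DATA_LENGTH", 32),
]

def parse_header(bit_text):
    """Split a fixed-format record header into named integer fields."""
    bits = "".join(bit_text.split())  # drop spaces and line breaks
    if len(bits) != sum(width for _, width in FIELDS):
        raise ValueError("expected a 96-bit fixed header")
    header, position = {}, 0
    for name, width in FIELDS:
        header[name] = int(bits[position:position + width], 2)
        position += width
    return header

header = parse_header(HEADER_BITS)
for name, _ in FIELDS:
    print(name, header[name])
```

For this record the sketch reports MB as 1 (first record), an ID_LENGTH of 0, a TYPE_LENGTH of 40, and a DATA_LENGTH of 437.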

The first record contains the SOAP message itself. It might seem strange to have another object type encapsulating the SOAP message, but from a practical standpoint it makes sense. When the DIME parser gets the message, it can send the SOAP message on to the SOAP parser and store the attachments in some type of cache. When the SOAP message refers to the attachment, the Web service can retrieve and process the attachment.

 <soap-env:Body>
  <msg:GetMediaFile>
   <msg:fileName>myMediaFile.mpg
   </msg:fileName>
   <msg:file ref:location=
       "uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483" />
 </msg:GetMediaFile>
</soap-env:Body>

Notice that the uuid in the SOAP message is identical to the one in the next DIME record.

uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483

This makes it easy for the Web service to be certain about the identity of the attachment.
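One way a Web service might resolve the reference is sketched below. The attachment_cache dictionary stands in for whatever storage the DIME parser uses and is purely hypothetical, as is the function name find_attachments. The placeholder bytes take the place of the real video data.

```python
import xml.etree.ElementTree as ET

REF_NS = "http://schemas.xmlsoap.org/ws/2002/04/reference/"

soap_message = """<soap-env:Envelope
 xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:msg="http://example.com/DimeExample/Messages/"
 xmlns:ref="http://schemas.xmlsoap.org/ws/2002/04/reference/">
 <soap-env:Body>
  <msg:GetMediaFile>
   <msg:fileName>myMediaFile.mpg</msg:fileName>
   <msg:file ref:location="uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483" />
  </msg:GetMediaFile>
 </soap-env:Body>
</soap-env:Envelope>"""

# Hypothetical cache filled by the DIME parser: record ID -> (type, data)
attachment_cache = {
    "uuid:F2DA3C9C-74D3-4A46-B925-B150D62D9483": ("video/mpeg", b"<binary data>"),
}

def find_attachments(envelope_xml, cache):
    """Return the cached attachment for every ref:location in the message."""
    root = ET.fromstring(envelope_xml)
    location_attr = "{%s}location" % REF_NS  # namespace-qualified attribute
    found = []
    for element in root.iter():
        location = element.get(location_attr)
        if location is not None:
            found.append(cache[location])  # KeyError if no record matches
    return found

attachments = find_attachments(soap_message, attachment_cache)
print(attachments[0][0], len(attachments[0][1]), "bytes")
```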

The fixed format of the header makes processing it much faster than the freer format of the MIME standard.

Note

Microsoft and IBM submitted a document called WS-Attachments to the Internet Engineering Task Force in an attempt to start it down the road to becoming a standard or recommendation. This document formalized the format that was described in the earlier DIME section of this hour.

Understanding the New SOAP 1.2 Attachment Feature

On August 14, 2002, the World Wide Web Consortium (W3C) published a working draft of what it calls the SOAP 1.2 Attachment Feature. It states that this draft is “based in part on the WS-Attachments proposal” mentioned in the previous section of this hour. The URL for this draft is http://www.w3.org/TR/2002/WD-soap12-af-20020814/.

This draft proposal doesn’t require that a SOAP receiver process any of the secondary parts of a compound document. The receiver determines, based on the primary SOAP message, whether to process the attachments.

In addition, the draft proposal does not specify that either DIME or MIME be used to specify the document, but it mentions them both in the context of how a message might actually be sent. As examples, it lists the following three ways to handle the compound message:

  • The primary SOAP message part and the attachment can be encapsulated in a single DIME message and sent using a protocol such as TCP or HTTP. This means that your software can send attachments using DIME.

  • The primary SOAP message part and the attachment can be encapsulated in a single MIME message and transmitted using a protocol such as HTTP. This means that you can send attachments using MIME encapsulation if you choose to.

  • The primary SOAP message part can be exchanged using the HTTP binding without any encapsulation, and the attachment can be transmitted using a separate request. This makes it legal to send only the data describing the attachment along with instructions on how to perform the transfer.
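The second option can be sketched with the email package from Python's standard library. Everything specific in this example is an assumption made for illustration: the multipart subtype "related", the Content-ID value, the envelope contents, and the idea of looking up secondary parts by Content-ID. The draft itself does not prescribe any of these details.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical primary SOAP message part
envelope = (
    '<soap-env:Envelope '
    'xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/">\n'
    " <soap-env:Body/>\n"
    "</soap-env:Envelope>\n"
)

# One MIME message holding the primary part and one secondary part
package = MIMEMultipart("related")
package.attach(MIMEText(envelope, "xml", "us-ascii"))

media = MIMEApplication(b"\x00\x01\x02")          # placeholder binary data
media["Content-ID"] = "<media-1@example.com>"     # hypothetical identifier
package.attach(media)

# Receiver: the first part is the SOAP message, the rest are attachments
received = message_from_string(package.as_string())
primary, *secondary = received.get_payload()
by_id = {part["Content-ID"]: part.get_payload(decode=True) for part in secondary}

print(primary.get_content_type())
print(sorted(by_id))
```

Because the receiver keeps the secondary parts in a lookup table, it can decide from the primary SOAP message whether to process them at all.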

As a caution, the draft mentions the potential security problems associated with attachments. It makes particular mention of the “application/postscript” and “message/external-body” media types. You should therefore avoid sending these types of attachments because they can contain viruses.

Summary

In this hour, you learned about the difficulties associated with attaching noncharacter data to XML and SOAP messages. These difficulties include data corruption and bloated message size. In particular, you saw how attachments can be included using Base64 encoding.

Following that, you learned about the potential use of MIME and DIME to format SOAP messages that contain attachments. Finally, you learned about the SOAP 1.2 Attachment Feature, which the W3C is circulating as a draft as of this writing.

By using these technologies, you can send binary data along with your XML documents in your Web services transactions.

Q&A

Q: Why do we need additional standards beyond MIME?

A: MIME was originally designed for use in email messages. DIME was designed to be used in Web services. DIME is easier to parse and therefore consumes fewer resources than MIME.

Q: Why didn’t the SOAP 1.2 Attachment Feature specify either MIME or DIME?

A: The decision appears to be a compromise. The draft proposal left the door open to please both sides.

Q: Why don’t we just place the binary data in the middle of an XML document?

A: XML parsers expect character data and control characters to compose 100% of the transmission. Binary data can contain bit patterns that cause errors in these parsers.

Workshop

The Workshop is designed to help you review what you’ve learned and begin learning how to put your knowledge into practice.

Quiz

1. How does MIME define a boundary between message parts?

2. What is the role of the WS-Attachments proposal now that the W3C has issued the SOAP 1.2 Attachment Feature?

3. What type of encapsulation is specified by the SOAP 1.2 Attachment Feature?

Quiz Answers

1. The message creator defines the boundary in the header.

2. The WS-Attachments proposal provided input to the SOAP 1.2 Attachment Feature. Now that the SOAP draft has been published, it will become the focal point.

3. The draft doesn’t specify an encapsulation, but it allows for any encapsulation including MIME or DIME (or neither) to be used.

Activities

1. Using Java classes, create a Web service that consumes an attachment.

2. Create a client program that sends a message with an attachment to the Web service that you wrote in Activity 1.
