base64

Base64 encoding is used whenever a file needs to be sent via the Internet mail system and the 7bit and quoted-printable encoding types are unsuitable. In other words, any form of data can be base64 encoded (since octets of arbitrary sequence are allowed), but it is more efficient to leave 7-bit text alone and to encode other textual data with quoted-printable. Base64 encoding is described in RFC 2045.

Base64-encoded information is not meant to be human readable, even if the original data was.

Base64 encoding takes three octets (24 bits) and maps them into four 6-bit blocks, then represents each 6-bit block with a character in a 64-character alphabet (2 to the 6th power is 64). Because of this mapping, base64 encoded information is about one-third larger than the original data.

To encode a bit stream with the base64 method, first convert the bit stream into an octet stream, if necessary. As described in the section “Quoted-Printable,” earlier in this chapter, an octet is a sequence of 8 bits in the network-standard big-endian order.

Next, read the first 24 bits and convert them into four 6-bit blocks. Represent each block as a letter from the following alphabet. Use the decimal value of the block as an index into the alphabet table.

Value  Encoding   Value  Encoding   Value  Encoding   Value  Encoding
    0  A             16  Q             32  g             48  w
    1  B             17  R             33  h             49  x
    2  C             18  S             34  i             50  y
    3  D             19  T             35  j             51  z
    4  E             20  U             36  k             52  0
    5  F             21  V             37  1             53  1
    6  G             22  W             38  m             54  2
    7  H             23  X             39  n             55  3
    8  I             24  y             40  o             56  4
    9  J             25  Z             41  p             57  5
   10  K             26  a             42  q             58  6
   11  L             27  b             43  r             59  7
   12  M             28  c             44  s             60  8
   13  N             29  d             45  t             61  9
   14  0             30  e             46  u             62  +
   15  p             31  f             47  v             63  /

            The equals sign (=) is used for padding

If you were paying attention, you might have noticed that the base64 alphabet does not include any of the characters with special meaning to the RFC 822 message format, such as “.”, CR, and LF. It also avoids “-”, which can then be used in MIME boundaries.

Of course, it is never that easy. There have to be additional rules to handle special cases like an octet stream that isn’t evenly divisible by three! That is where the equals sign comes in. The complete rule set for base64 encoding looks like this:

  1. Convert the original data into an octet stream by ensuring that the bits are in bigendian format.

  2. If the data to be encoded is textual, linebreaks must be converted to CRLF form first.

  3. Remove three octets at a time from the stream, and convert them into four 6-bit indexes into the base64 alphabet.

  4. Convert the four 6-bit indexes into four characters from the base64 table.

  5. Ensure that each line of encoded information is less than 76 characters long, not including the terminating linebreak (CRLF).

  6. When you reach the end of the original data, you may have one or two octets left over. If octets are left over, you will have to “pad” the encoding. If the number of octets in the original data was divisible by 3, no padding is necessary.

  7. To pad the encoding, add zero bits onto the end of the stream until you have an integral number of 6-bit blocks. Apply Rule #3 to get the base64 characters, as normal. Then add either one or two equals signs (=) onto the end of the encoding until the total number of characters is evenly divisible by 4.

See Figure 3-1 for an illustration of encoding and padding.

The base64 algorithm at work on an octet stream
Figure 3-1. The base64 algorithm at work on an octet stream

An example of base64 encoding

Consider the small image in GIF format shown in Figure 3-2.

An example image
Figure 3-2. An example image

If we looked at this image in a binary editor (for you VIM users, try vi -b), we would see something resembling this:

GIF89a^T^@^V^@¬^@^@377377377377~Y~Y377ffÃ377377
f^@^@333^@^@^@^@^@^@!̜NThis art is in the public domain.
 Kevin Hughes, [email protected], September 1995^@!ù^D^A^@^@
^C^@,^@^@^@^@^T^@^V^@^@^COH ºÛp^PA'%~Af7
«Ÿ y^U^XB#)~XߧV^F^Qœ<K^Yxn'6 Ï
Æ~^O^AL       e< Ø®,1~['g^S^Y~J~UTc/^K~M&
 c Af^T~@^O~]Z^ML^@^@;

Some of the preceding characters have been converted into octal representations. They may not appear that way in your editor. The octal representations are the ones that begin with a backslash () and are followed by three numbers (e.g., 302). Nulls (ASCII value 0) are shown as ^@. Lines have been wrapped for convenience in printing. The original data consisted of only one line.

The equivalent octet stream (big-endian byte stream) would start like this:

010001110100100101000110001110000011100101100001 ...
       ^       ^       ^       ^       ^       ^

The preceding octet stream fragment represents the first part of the GIF, the part that reads “GIF89a”. The carets (^) on the line below the octet stream show where the last bit in each octet is located.

Each six bits becomes a base64 character, and padding is added, as shown in Figure 3-1. The end result is limited to 76 characters per line and looks like this:

R0lGODlhFAAWAMIAAP////+Zmf9mZsz//2YAADMzMwAAAAAAACH+TlRoaXMgYXJ0IGlzIGlu
IHRoZSBwdWJsaWMgZG9tYWluLiBLZXZpbiBIdWdoZXMsIGtldmluaEBlaXQuY29tLCBTZXB0
ZW1iZXIgMTk5NQAh+QQBAAADACwAAAAAFAAWAAADTOi6vPNwEEGrJYHEN63H2dZ5FRhCIymY
p6RWBhHP3EsZeG7UNsXsrp5PAUwJZTyhr6gsMZurZxMZnEqVVGMvC40msddjsUFmFIAPnXoN
TAAAOw==

Note that this example ends with two padding characters (==). Padding with one equals sign occurs when only three 6-bit groups are to be encoded in the last step. When there are only two such groups, two equals signs need to be used. Of course, when there are four 6-bit groups at the end of the data, no equals signs need to be added.

Note that it is not possible to have only one 6-bit group left to process at the end of the data (which would result in more than two equals signs if it could happen). This is because the original data is in octets of 8 bits; breaking at least one octet up into 6-bit blocks leaves at least one complete 6-bit block and the start of another, which will be padded with zeros. Therefore, there will always be at least two 6-bit blocks ending the encoded data before adding any padding characters.

Decoding base64

When decoding base 64 data, ignore any linebreaks and characters that aren’t in the base 64 alphabet. If you encounter whitespace or other illegal characters in the data, consider throwing an exception to the user since the data may be corrupted.

Equals signs should not occur anywhere in base64-encoded data except (possibly) at the end of the data. Any other occurrence is an error.

The remainder of base64 decoding is as simple as reversing the encoding steps. Ensure that the padding characters (=) are removed when decoding.

As noted in the section “Decoding quoted-printable,” earlier in this chapter, when decoding any type of data that might be executable in the user’s environment, decoders should not actually execute the decoded content without the user’s express permission.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.76.0