29.15. Encodings, Character Sets, and Languages

As Section 29.12 “Adding and Editing MIME Types” explains, Apache attempts to determine a MIME type for every file that it sends to a browser. In addition to a type, a file can also have an encoding, which usually indicates how it was compressed. The encoding is determined by the file extension (such as .gz for gzipped data) and can be used by the browser to uncompress the file before displaying it.

For example, this would allow you to create a file called foo.html.gz that contains compressed HTML data and is identified by the web server as such. For large files, sending them in compressed format can save bandwidth and reduce the time it takes for them to be downloaded. Unfortunately, not all browsers support the common .gz and .Z encoding formats, so this feature is not always useful. At the time this book was written, Mozilla and Netscape supported compressed encoding, but IE did not.
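As a minimal sketch of how such a file might be produced, the following Python fragment writes a gzip-compressed copy of a page next to the original. The function name write_gzipped_copy and the sample page contents are illustrative assumptions, not part of Apache or Webmin.

```python
import gzip
import tempfile
from pathlib import Path


def write_gzipped_copy(source: Path) -> Path:
    """Write a gzip-compressed copy named <source>.gz and return its path."""
    target = source.with_name(source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        dst.write(src.read())
    return target


if __name__ == "__main__":
    # Hypothetical sample page, created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "foo.html"
        page.write_bytes(b"<html><body>" + b"hello " * 1000 + b"</body></html>")
        packed = write_gzipped_copy(page)
        print(packed.name, page.stat().st_size, "->", packed.stat().st_size)
```

The original foo.html is left in place, so both versions can exist side by side in the same directory.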

Encodings can be defined globally, on a per-virtual server basis, or just for a single directory or URL location. They are usually defined globally, however, and can be viewed and edited by following these steps:

1.
Click on the Default Server icon on the Apache Webserver module's main page.

2.
Click on the MIME Types icon, and scroll down to the Content encodings table. Each row in the table defines two encodings and there is always at least one pair of empty fields for adding a new one. Typically, entries for the x-compress and x-gzip encodings will already exist as they are included in the default Apache configuration.

3.
To add a new encoding, enter its name into the first empty field under the Content encoding column. In the field next to it, enter a space-separated list of filename extensions that are used by files encoded in that format.

4.
To change the name or extensions for an existing encoding, just edit its fields in the table. For example, you can add extra extensions for an encoding by just entering them into the same field as existing ones.

5.
If you want to delete an encoding, just clear its fields under the Content encoding and Extensions columns.

6.
When you are done editing encodings, click the Save button at the bottom of the page and then click the Apply Changes link.

Apache takes all filename extensions into account when determining a file's MIME type, encoding, language, and character set and does not care about their order. This means that files named foo.html.gz and foo.gz.html are both identified as containing gzip-compressed HTML data.
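The order-independent matching described here can be modelled roughly as follows. This is a simplified sketch rather than Apache's actual implementation: the lookup tables are small hypothetical samples, the function name classify is made up for illustration, and the sketch assumes that extensions are compared without regard to case.

```python
# Hypothetical sample tables. A real server builds these from its configuration.
MIME_TYPES = {"html": "text/html", "txt": "text/plain"}
ENCODINGS = {"gz": "x-gzip", "z": "x-compress"}
CHARSETS = {"big5": "Big5"}
LANGUAGES = {"de": "de", "fr": "fr", "en": "en"}


def classify(filename):
    """Return the type, encoding, charset and language implied by a filename."""
    info = {"type": None, "encoding": None, "charset": None, "language": None}
    # Everything after the first dot is treated as a list of extensions.
    for ext in filename.split(".")[1:]:
        key = ext.lower()  # assumption: case-insensitive comparison
        if key in MIME_TYPES:
            info["type"] = MIME_TYPES[key]
        if key in ENCODINGS:
            info["encoding"] = ENCODINGS[key]
        if key in CHARSETS:
            info["charset"] = CHARSETS[key]
        if key in LANGUAGES:
            info["language"] = LANGUAGES[key]
    return info


if __name__ == "__main__":
    for name in ("foo.html.gz", "foo.gz.html"):
        print(name, classify(name))
```

Because each extension is looked up independently of its position, both filenames produce the same result.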

Another piece of information that Apache can supply to browsers requesting a file is the character set used by text in the file. If all your web pages are in English, or a language like Malay that does not use any non-English letters, then you don't need to care about this. If you are creating HTML pages in a different language that uses characters outside the standard ASCII character set, however, then it is useful and often necessary to indicate to browsers what character set each page is in.

Languages like German and French use special characters, like ö, that are represented by byte values above 127. Others, like Chinese, have so many characters that each must be represented by two bytes, using special character sets like Big5, and Russian has its own character sets such as KOI8. For these languages, it is vital that the browser be informed of the character set of each page so that it can decode the text that it contains and use the correct font to display characters.
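To make the byte-level difference concrete, here is a small Python sketch that uses the language's built-in codecs. The helper names encoded_bytes and misread are illustrative only, and ISO-8859-1 is assumed as the character set for the German example.

```python
def encoded_bytes(text, charset):
    """Return the byte values used to store text in the given character set."""
    return list(text.encode(charset))


def misread(text, actual, assumed):
    """Encode text in one character set, then decode it as if it were another."""
    return text.encode(actual).decode(assumed)


if __name__ == "__main__":
    # A single byte above 127 for ö in ISO-8859-1.
    print(encoded_bytes("ö", "iso-8859-1"))
    # Two bytes for one Chinese character in Big5.
    print(encoded_bytes("中", "big5"))
    # What a browser would show if it wrongly assumed ISO-8859-1 for Big5 text.
    print(misread("中", "big5", "iso-8859-1"))
```

The last call shows why the browser needs to be told the character set: the same bytes decode to entirely different characters under the wrong assumption.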

As with encodings, Apache determines the character set of each file by looking at its filename extension. For example, a file named foo.html.Big5 would be identified as HTML, in which the text was encoded in the Chinese Big5 format. A file can have both a character set and an encoding, such as foo.html.Big5.gz, and the order in which its extensions fall does not matter.

Character sets can be defined globally or for individual virtual servers and directories. To view and edit the global list of character sets, follow these steps:

1.
On the Apache Webserver module's main page, click on the Default Server icon.

2.
Click on the Languages icon and scroll down to the Extra character sets table. Each row in the table defines two character sets, and there is always at least one pair of empty fields for adding a new one. In the default Apache configuration, several commonly used character sets are already defined.

3.
If you need to add a new character set, enter its standard ISO name into the first empty field under the Charset column and the filename extensions associated with it into the adjacent field under Extensions. Many common character sets are defined by default, so you may just be able to use one of the existing recognized extensions for your files. Multiple extensions must be separated by spaces.

4.
You can change the name or extensions for existing character sets by just editing the fields in the table. It is not usually a good idea to rename the default sets because they use the standard names that are recognized by browsers. Adding extensions is perfectly safe, however.

5.
To delete a character set, just clear out the fields containing its name and any associated extensions.

6.
When you are done editing, click the Save button. If you used up all the blank fields in the Extra character sets table and want to add more, click on the Languages icon again. Otherwise, use the Apply Changes link to make your changes active.

Because most of the commonly used character sets are defined by default in the Apache configuration, it is not usually necessary to add new ones. Instead, you can just find the associated extensions and use them on your filenames.

Apache can also identify the language in which an HTML or text file is written by looking at its filename extensions. At first it may seem that there is no difference between a file's language and its character set, but the two do not always correspond. For example, the ISO-8859-2 character set is used for many different European languages, and the Chinese language can be represented by both the Big5 and GB character sets.

Unfortunately, few browsers actually make any use of the language in which a file is written. Some can be configured to request pages in a language chosen by the user, however, and Apache can be set up to use this information to identify the correct file to return. This happens when the Generate Multiviews option on the directory options page is turned on for a directory.

When that option is active, a request for a page like /documents/foo, which does not actually exist, will cause Apache to scan the /documents directory for all files starting with foo, identify their types and languages, and return the one that best matches the client's specified language. This is useful if you want to have multiple versions of the same page in different languages, all accessible via the same URL.
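The selection step can be sketched in Python as follows. This is a deliberately simplified model, assuming that the browser states its preference in the standard HTTP Accept-Language header. Apache's real negotiation weighs more factors than language alone, and the function names and the small set of language codes here are hypothetical.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal quality keep their header order.
    prefs.sort(key=lambda pref: -pref[0])
    return [tag for quality, tag in prefs if quality > 0]


def pick_variant(base, filenames, header, languages=("de", "fr", "en")):
    """Choose the file starting with base whose language best suits the header."""
    variants = {}
    for name in filenames:
        parts = name.split(".")
        if parts[0] != base:
            continue
        for ext in parts[1:]:
            if ext.lower() in languages:
                variants.setdefault(ext.lower(), name)
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # treat en-US as en
        if primary in variants:
            return variants[primary]
    return None  # no acceptable language variant was found


if __name__ == "__main__":
    files = ["foo.html.de", "foo.html.fr", "foo.en.html", "bar.html.de"]
    print(pick_variant("foo", files, "fr;q=0.8, de"))
```

In this sketch a request with no matching language simply returns nothing, which a real server would handle according to its own configuration.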

To view and edit the languages and file extensions recognized by Apache, follow these steps:

1.
Click on the Default Server icon on the Apache Webserver module's main page.

2.
Click on the Languages icon and find the Content languages table. Each row in the table defines two languages, and there is always at least one pair of empty fields for adding a new one. The default Apache configuration contains several commonly used languages.

3.
To add a new language, enter its ISO code into the first empty field under the Language column and a list of extensions separated by spaces for files in that language under the Extensions column.

4.
Existing languages can be edited by just changing their codes and extensions in the table, or deleted by clearing out their fields. It is wise not to change the standard codes for existing default languages.

5.
When you are done editing languages, click the Save button at the bottom of the page. If you ran out of blank fields when adding new ones, click on the Languages icon again to return to the table. Otherwise, use the Apply Changes link to activate your new settings.

As with encodings and character sets, Apache does not care about the ordering of extensions in a filename when working out its type and language. Therefore, both the foo.html.de and foo.de.html files would be identified as HTML documents written in German.
