CHAPTER 23

Building Web Sites for the World

The Web makes it incredibly easy for you to communicate your message to anybody with an Internet connection and a Web browser, whether they're sitting in a café in Moscow's Red Square, in a farmhouse in Ohio, in a cubicle in a Shanghai high-rise, or in an Israeli classroom. Except there is one tiny issue: only about 29 percent of the total Internet population actually speaks English [1]. The rest speak Chinese, Japanese, Spanish, German, French, or one of several dozen other languages. Therefore, if you're interested in truly reaching a global audience, you need to think about creating a Web site that conforms not only to the visitor's native language, but also to local standards (most prominently currency, dates, numbers, and times).

But creating software capable of being used by the global community is hard, and not only for the obvious reason that one has to have the resources available to translate the Web site text. One also has to think about integrating the language and standards modifications into the existing application in a manner that precludes insanity. This chapter will help you eliminate this second challenge.


Note Native support for Unicode (http://www.unicode.org/) was planned as one of PHP 6's key features, although that version was ultimately never released. Unicode is a standard that greatly reduces the overhead involved in creating applications and Web sites intended to be used on multiple platforms and to support multiple languages. While neither Unicode nor PHP's handling of it is discussed in this book, be sure to learn more about the topic if globally accessible applications are a crucial part of your project.


Supporting native languages and standards is a two-step process, requiring the developer to internationalize and then localize the Web site. Internationalizing the Web site means restructuring it so that it can be localized at all; localizing it means supplying the actual languages and regional formats. Because programmers are lazy, you'll often see internationalization written as i18n (the 18 stands for the eighteen letters between the first i and the final n), and localization as l10n.

In this section you'll learn about an approach you might consider for internationalizing and localizing your Web site.

Translating Web Sites with Gettext

Gettext (http://www.gnu.org/software/gettext/) is one of the many great projects created and maintained by the Free Software Foundation, consisting of a number of utilities useful for internationalizing and localizing software. Over the years it's become a de facto standard solution for maintaining translations for countless applications and Web sites. PHP interacts with gettext through a namesake extension, meaning you need to download the gettext utility and install it on your system. If you're running Windows, download it from http://gnuwin32.sourceforge.net/ and make sure you update the PATH environment variable to point to the installation directory.

Because PHP's gettext extension isn't enabled by default, you probably need to reconfigure PHP. If you're on Linux you can enable it by rebuilding PHP with the --with-gettext option. On Windows all you need to do is uncomment the extension=php_gettext.dll line found in the php.ini file. See Chapter 2 for more information about configuring PHP.
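
Before going further, it can be worth confirming that the extension really is loaded. The following is a minimal sketch: the helper name gettext_is_available() is made up for this illustration, while extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the gettext
// extension is loaded and the functions used in this chapter are callable.
function gettext_is_available(): bool
{
    return extension_loaded('gettext')
        && function_exists('gettext')
        && function_exists('bindtextdomain')
        && function_exists('textdomain');
}

if (gettext_is_available()) {
    echo "gettext support is enabled\n";
} else {
    echo "gettext support is missing; rebuild or reconfigure PHP\n";
}
```

Running this from the command line and from the Web server can give different answers, because the two may read different php.ini files.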

The remainder of this section guides you through the steps necessary to create a multilingual Web site using PHP and gettext.

Step 1: Update the Web Site Scripts

Gettext must be able to recognize which strings you'd like to translate. This is done by passing all translatable output through the gettext() function. Each time gettext() is encountered, PHP will look to the language-specific localization repository (more about this in Step 2), and match the string encompassed within the function to the corresponding translation. The script knows which translation to retrieve due to earlier calls to setlocale(), which tells PHP and gettext which language and country you want to conform to, and then to bindtextdomain() and textdomain(), which tell PHP where to look for the translation files.

Pay special attention to the mention of both language and country, because you shouldn't simply pass a language name (e.g., Italian) to setlocale(). Rather, you need to choose from a predefined combination of language and country codes as defined by the International Organization for Standardization (ISO). For example, you might want to localize to English but use the United States number and time/date formats. In this case you would pass en_US to setlocale() rather than en_GB. Because the differences between British and United States English are minimal, largely confined to a few spelling variants, you would only need to maintain translations for the few strings that differ; for any string it cannot find in the repository, gettext() falls back to the string passed to the function.
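
The fallback rule is easy to picture with a small emulation. In this sketch a plain PHP array stands in for an en_GB repository; the array contents and the helper name lookup_with_fallback() are invented for illustration, and this is not how gettext itself is implemented.

```php
<?php
// Illustration only: a plain array stands in for the en_GB repository,
// holding just the strings whose British spelling differs.
$enGbOverrides = [
    'Choose a color:'     => 'Choose a colour:',
    'Organize your files' => 'Organise your files',
];

// Hypothetical helper: return the override if one exists, otherwise fall
// back to the string that was passed in, mirroring gettext()'s behavior.
function lookup_with_fallback(string $msgid, array $catalog): string
{
    return array_key_exists($msgid, $catalog) ? $catalog[$msgid] : $msgid;
}

echo lookup_with_fallback('Choose a color:', $enGbOverrides), "\n";
echo lookup_with_fallback('Enter your e-mail address:', $enGbOverrides), "\n";
```

The second call has no entry in the array, so the original string comes back unchanged, which is why the en_GB repository can stay very small.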


Note You can find both the language and country codes as defined by ISO on many Web sites; just search for the keywords ISO, country codes, and language codes. Table 23-1 offers a list of common code combinations.


Table 23-1. Common Country and Language Code Combinations

Combination   Country
pt_BR         Brazil
fr_FR         France
de_DE         Germany
en_GB         Great Britain
he_IL         Israel
it_IT         Italy
es_MX         Mexico
es_ES         Spain
en_US         United States
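
Each combination is simply a lowercase language code and an uppercase country code joined by an underscore. As a rough sketch, a helper such as the hypothetical split_locale_code() below can pull the two apart and reject anything that doesn't have that shape. The pattern covers only the two-letter forms shown in Table 23-1; real systems also accept longer variants (for example, names with an encoding suffix), which this sketch deliberately ignores.

```php
<?php
// Hypothetical helper: split a combination such as "pt_BR" into its
// two-letter language code and two-letter country code.
// Returns null when the string does not have the xx_YY shape.
function split_locale_code(string $code): ?array
{
    if (!preg_match('/^([a-z]{2})_([A-Z]{2})$/', $code, $m)) {
        return null;
    }
    return ['language' => $m[1], 'country' => $m[2]];
}

$parts = split_locale_code('pt_BR');
printf("language=%s country=%s\n", $parts['language'], $parts['country']);

// A bare language name is not a valid combination.
var_dump(split_locale_code('Italian'));
```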

Listing 23-1 presents a simple example that seeks to translate the string Enter your e-mail address: to its Italian equivalent.

Listing 23-1. Using gettext() to Support Multiple Languages

<?php

    // Specify the target language
    $language = 'it_IT';

    // Assign the appropriate locale
    setlocale(LC_ALL, $language);

    // Identify the location of the translation files
    bindtextdomain("messages", "/usr/local/apache/htdocs/locale");

    // Tell the script which domain to search within when translating text
    textdomain("messages");
?>

<form action="subscribe.php" method="post">
   <?php echo gettext("Enter your e-mail address:"); ?><br />
   <input type="text" id="email" name="email" size="20" maxlength="40" value="" />
   <input type="submit" id="submit" value="Submit" />
</form>

Of course, in order for Listing 23-1 to behave as expected, you need to create the aforementioned translation repository and translate the strings according to the desired language. You'll learn how to do this in Steps 2, 3, and 4.

Step 2: Create the Localization Repository

Next you need to create the repository where the translated files will be stored. One directory should be created for each language/country code combination, and within that directory you need to create another directory named LC_MESSAGES. So, for example, if you plan on localizing the Web site to support English (the default), German, Italian, and Spanish, the directory structure would look like this:

locale/
    de_DE/
        LC_MESSAGES/
    it_IT/
        LC_MESSAGES/
    es_ES/
        LC_MESSAGES/

You can place this directory anywhere you please because the bindtextdomain() function (shown in action in Listing 23-1) is responsible for mapping the path to a predefined domain name.
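
If you'd rather script the layout than create it by hand, something along these lines would work. It is only a sketch: the helper name create_locale_repository() is hypothetical, and the example writes to a scratch directory under the system temp path rather than to a real document root.

```php
<?php
// Hypothetical helper: create <base>/<locale>/LC_MESSAGES for each locale
// and return the list of directories that now exist.
function create_locale_repository(string $base, array $locales): array
{
    $created = [];
    foreach ($locales as $locale) {
        $dir = $base . '/' . $locale . '/LC_MESSAGES';
        // The third argument (true) creates missing parent directories.
        if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
            throw new RuntimeException("Could not create $dir");
        }
        $created[] = $dir;
    }
    return $created;
}

// Example run against a scratch directory rather than the document root.
$base = sys_get_temp_dir() . '/locale_demo_' . uniqid();
foreach (create_locale_repository($base, ['de_DE', 'it_IT', 'es_ES']) as $dir) {
    echo $dir, "\n";
}
```

The value you pass as the base directory is the same path you would later hand to bindtextdomain().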

Step 3: Create the Translation Files

Next you need to extract the translatable strings from the PHP scripts. You do so with the xgettext command, which is a utility bundled with gettext. xgettext offers an impressive number of options, each of which you can learn more about by executing xgettext with the --help option. Executing the following command will cause xgettext to examine all of the files found in the current directory ending in .php, producing a file consisting of the desired strings to translate:

%>xgettext -n *.php

The -n option results in the file name and line number being included before each string entry in the output file. By default the output file is named messages.po, although you can change this using the --default-domain=NAME option, which writes the output to NAME.po instead. A sample output file follows:


# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2007-05-16 13:13-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: homepage.php:12
msgid "Subscribe to the newsletter:"
msgstr ""

#: homepage.php:15
msgid "Enter your e-mail address:"
msgstr ""

#: contact.php:12
msgid "Contact us at [email protected]!"
msgstr ""

Copy this file to the appropriate localization directory and proceed to the next step.

Step 4: Translate the Text

Open the messages.po file residing in the directory of the language you'd like to translate to, and translate the strings by completing the empty msgstr entry that follows each extracted msgid. Then replace the placeholders represented in all capital letters with information pertinent to your application. Pay particular attention to the CHARSET placeholder, because the value you use has a direct effect on gettext's ability to ultimately translate the application. You need to replace CHARSET with the name of the character set used to represent the translated strings. For example, ISO-8859-1 is commonly used for Western European languages written in the Latin alphabet, including English, German, Italian, and Spanish, while Windows-1251 is used for languages written in the Cyrillic alphabet, including Russian. Rather than exhaustively introduce the countless character sets here, I suggest you check out the great Wikipedia summary at http://en.wikipedia.org/wiki/Character_encoding.
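
As translation proceeds, it helps to know which entries are still empty. The sketch below embeds a two-entry sample catalog (the Italian wording is only an example translation) and lists the msgid values whose msgstr has not been filled in. The helper name untranslated_msgids() is hypothetical, and the parsing is deliberately simplistic: it handles only single-line msgid/msgstr pairs and is no substitute for the gettext utilities.

```php
<?php
// Sample catalog text: the first entry is translated, the second is not.
$po = <<<'PO'
#: homepage.php:12
msgid "Subscribe to the newsletter:"
msgstr "Iscriviti alla newsletter:"

#: homepage.php:15
msgid "Enter your e-mail address:"
msgstr ""
PO;

// Hypothetical helper: list msgid values whose msgstr is still empty.
// Simplification: handles only single-line msgid/msgstr pairs and skips
// the header entry (the one with an empty msgid).
function untranslated_msgids(string $po): array
{
    preg_match_all('/^msgid "(.*)"\R^msgstr "(.*)"$/m', $po, $matches, PREG_SET_ORDER);
    $missing = [];
    foreach ($matches as $m) {
        if ($m[1] !== '' && $m[2] === '') {
            $missing[] = $m[1];
        }
    }
    return $missing;
}

print_r(untranslated_msgids($po));
```

To check a real file, you would pass the result of file_get_contents() on the messages.po path instead of the embedded sample.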


Tip Writing quality text in one's own native tongue is difficult enough, so if you'd like to translate your Web site into another language, seek out the services of a skilled speaker. Professional translation services can be quite expensive, so consider contacting your local university—there's typically an abundance of foreign-language students who would welcome the opportunity to gain some experience in exchange for an attractive rate.


Step 5: Generate Binary Files

The final required preparatory step involves generating binary versions of the messages.po files, which will be used by gettext. This is done with the msgfmt command. Navigate to the appropriate language directory and execute the following command:

%>msgfmt messages.po

Executing this command produces a file named messages.mo, which is what gettext will ultimately use for the translations.

Like xgettext, msgfmt also offers a number of features through options. Execute msgfmt --help to learn more about what's available.
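
Because the .mo file has to be regenerated every time the .po file changes, a quick staleness check can save some head-scratching. This is a rough sketch that compares file modification times; the helper name mo_needs_rebuild() is hypothetical, and the example uses scratch files in the system temp directory rather than a real repository.

```php
<?php
// Hypothetical helper: true when the .mo file is missing or older than
// the .po file it was generated from, meaning msgfmt should be run again.
function mo_needs_rebuild(string $poFile, string $moFile): bool
{
    if (!is_file($moFile)) {
        return true;
    }
    return filemtime($moFile) < filemtime($poFile);
}

// Example with scratch files standing in for a real repository.
$dir = sys_get_temp_dir() . '/mo_demo_' . uniqid();
mkdir($dir, 0755, true);
$po = $dir . '/messages.po';
$mo = $dir . '/messages.mo';

file_put_contents($po, "msgid \"\"\nmsgstr \"\"\n");
var_dump(mo_needs_rebuild($po, $mo)); // checked before any .mo exists

file_put_contents($mo, '');           // placeholder, not a real .mo file
touch($po, time() - 60);              // pretend the .po is a minute older
clearstatcache();
var_dump(mo_needs_rebuild($po, $mo)); // checked with a newer .mo in place
```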

Step 6: Set the Desired Language Within Your Scripts

To begin taking advantage of your localized strings, all you need to do is set the locale using setlocale() and call the bindtextdomain() and textdomain() functions as demonstrated in Listing 23-1. The end result is the ability to use the same source code to present your Web site in multiple languages. For instance, Figures 23-1 and 23-2 depict the same form, the first with the locale set to en_US and the second with the locale set to it_IT.

Figure 23-1. A newsletter subscription form with English prompts

Figure 23-2. The same subscription form, this time in Italian

Of course, there's more to maintaining translations than what is demonstrated here. For instance, you'll need to know how to merge and update .po files as the Web site's content changes over time. Gettext offers a variety of utilities for doing exactly this; consult the gettext documentation for more details.

While gettext is great for maintaining applications in multiple languages, it still doesn't satisfy the need to localize other data such as numbers and dates. This is the subject of the next section.


Tip If your Web site offers material in a number of languages, perhaps the most efficient way to allow a user to set a language is to store the locale string in a session variable, and then pass that variable into setlocale() when each page is loaded. See Chapter 18 for more information about PHP's session-handling capabilities.
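
One way to sketch that idea is to keep the decision in a small function and check every candidate against a list of locales the site actually supports. Everything named here is an assumption made for illustration: the lang query parameter, the locale session key, the supported list, and the helper choose_locale().

```php
<?php
// Locales this hypothetical site has translations for.
const SUPPORTED_LOCALES = ['en_US', 'it_IT', 'de_DE', 'es_ES'];
const DEFAULT_LOCALE    = 'en_US';

// Hypothetical helper: prefer a locale requested in the query string,
// then one remembered in the session, then the default. Anything not in
// the supported list is ignored, so raw user input never reaches setlocale().
function choose_locale(array $query, array $session): string
{
    foreach ([$query['lang'] ?? null, $session['locale'] ?? null] as $candidate) {
        if (is_string($candidate) && in_array($candidate, SUPPORTED_LOCALES, true)) {
            return $candidate;
        }
    }
    return DEFAULT_LOCALE;
}

// In a page script this would typically follow session_start():
//     $_SESSION['locale'] = choose_locale($_GET, $_SESSION);
//     setlocale(LC_ALL, $_SESSION['locale']);
echo choose_locale(['lang' => 'it_IT'], []), "\n";
echo choose_locale([], ['locale' => 'de_DE']), "\n";
echo choose_locale(['lang' => '../etc'], []), "\n";
```

Passing the arrays in as arguments, rather than reading $_GET and $_SESSION inside the function, keeps the logic easy to test outside of a Web request.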


Localizing Dates, Numbers, and Times

The setlocale() function introduced in the previous section can go far beyond facilitating the localization of language; it can also affect how PHP renders dates, numbers, and times. This is important because of the variety of ways in which this often crucial data is represented among different countries. For example, suppose you are a United States-based organization providing an essential subscription-based service to a variety of international corporations. When it is time to renew subscriptions, a special message is displayed at the top of the browser that looks like this:

Your subscription ends on 3-4-2008. Renew soon to avoid service cancellation.

For the United States-based users, this date means March 4, 2008. However, for European users, this date is interpreted as April 3, 2008. The result could be that the European users won't feel compelled to renew the service until the end of March, and therefore will be quite surprised when they attempt to log in on March 5. This is just one of the many issues that might arise due to confusion over data representation.

You can eliminate such inconsistencies by localizing the information so that it appears exactly as the user expects. PHP makes this a fairly easy task: set the locale using setlocale(), and then use functions such as money_format(), number_format(), and strftime() as usual to output the data.

For example, suppose you want to render the renewal deadline date according to the user's locale. Just set the locale using setlocale(), and run the date through strftime() (also taking advantage of strtotime() to create the appropriate timestamp) like this:

<?php
    // Switch to the Italian locale, then print the date in that locale's preferred format
    setlocale(LC_ALL, 'it_IT');
    printf("Your subscription ends on %s", strftime('%x', strtotime('2008-03-04')));
?>

This produces the following:


Your subscription ends on 04/03/2008
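
The strftime() approach depends on the it_IT locale being installed on the server, and strftime() itself is deprecated as of PHP 8.1. As a hedged alternative, the sketch below keeps an explicit table of date patterns per locale and formats with DateTimeImmutable. The patterns in the table and the helper name format_date_for() are assumptions chosen for illustration; they are not read from the system's locale data.

```php
<?php
// Assumed per-locale date patterns for DateTimeImmutable::format().
// These are illustrative values, not data read from the system locale.
$dateFormats = [
    'en_US' => 'm/d/Y',
    'it_IT' => 'd/m/Y',
    'de_DE' => 'd.m.Y',
];

// Hypothetical helper: format an ISO date (YYYY-MM-DD) for a locale,
// falling back to the unambiguous ISO form for unknown locales.
function format_date_for(string $isoDate, string $locale, array $formats): string
{
    $date = new DateTimeImmutable($isoDate, new DateTimeZone('UTC'));
    return $date->format($formats[$locale] ?? 'Y-m-d');
}

foreach (['en_US', 'it_IT', 'de_DE', 'ja_JP'] as $locale) {
    printf("%s: %s\n", $locale, format_date_for('2008-03-04', $locale, $dateFormats));
}
```

The trade-off is that you maintain the table yourself, in exchange for output that doesn't vary from one server to the next.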

The same process applies to formatting numeric and monetary values. For instance, while the United States uses a comma as the thousands separator, European countries variously use a period, a space, or nothing at all for the same purpose. Making matters more confusing, while the United States uses a period as the decimal separator, much of Europe uses a comma. Therefore the following representations all denote the same number:

  • 523,332.98
  • 523 332.98
  • 523332.98
  • 523.332,98

Of course, it makes sense to render such information in a manner most familiar to the user, in order to reduce any possibility of confusion. To do so, you can use setlocale() in conjunction with number_format() and another function named localeconv(), which returns numerical formatting information about a defined locale. Used together, these functions can produce properly formatted numbers, like so:

<?php
    // Format the number using the Italian separators reported by localeconv()
    setlocale(LC_ALL, 'it_IT');
    $locale = localeconv();
    printf("(it_IT) Total hours spent commuting %s <br />",
        number_format(4532.23, 2, $locale['decimal_point'],
        $locale['thousands_sep']));

    // Repeat with the United States separators
    setlocale(LC_ALL, 'en_US');
    $locale = localeconv();
    printf("(en_US) Total hours spent commuting %s",
        number_format(4532.23, 2, $locale['decimal_point'],
        $locale['thousands_sep']));
?>

This produces the following result:


(it_IT) Total hours spent commuting 4532,23
(en_US) Total hours spent commuting 4,532.23
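
The separators reported by localeconv() come from the operating system's locale data, so results like these depend on the requested locale actually being installed. When it isn't, setlocale() returns false and the previous locale stays in effect. The sketch below tries several candidate names in turn; the helper name apply_first_available_locale() is hypothetical, and the candidate names are assumptions about what a given system might call its Italian locale.

```php
<?php
// Hypothetical helper: try each candidate locale name in order and return
// the one the system accepted, or null if none is installed.
function apply_first_available_locale(array $candidates): ?string
{
    // setlocale() accepts an array and returns false when no entry works.
    $result = setlocale(LC_ALL, $candidates);
    return $result === false ? null : $result;
}

$applied = apply_first_available_locale(['it_IT.UTF-8', 'it_IT', 'ita']);

if ($applied === null) {
    echo "Italian locale not installed; the previous locale stays in effect\n";
} else {
    $info = localeconv();
    printf("Using %s: decimal point '%s', thousands separator '%s'\n",
        $applied, $info['decimal_point'], $info['thousands_sep']);
}
```

Checking the return value this way lets a script fall back to explicit separators instead of silently printing numbers in the wrong format.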

Summary

Maintaining a global perspective when creating your Web sites can only serve to open up your products and services to a much larger audience. Hopefully this chapter showed you that the process is much less of a challenge than you previously thought.

The next chapter introduces you to one of today's hottest approaches in Web development paradigms: frameworks. You'll put what you learn about this topic into practice by creating a Web site using the Zend Framework.



1. http://www.internetworldstats.com/
