Chapter 24. Internationalization and Localization

 

Nobody can be exactly like me. Sometimes even I have trouble doing it.

 
 --Tallulah Bankhead

The credo of “Write once, run anywhere” means that your code will run in many places where languages and customs are different from yours. With a little care you can write programs that can adapt to these variations gracefully. Keeping your programs supple in this fashion is called internationalization. You have several tools for internationalizing your code. Using internationalization tools to adapt your program to a specific locale—such as by translating messages into the local language—is called localization.

The first tool is inherent in the language: Strings are in Unicode, which can express almost any written language on our planet. Someone must still translate the strings, and displaying the translated text to users requires fonts for those characters. Still, having Unicode is a big boost for localizing your code.

The nexus of internationalization and localization is the locale, which defines a “place.” A place can be a language, culture, or country—anything with an associated set of customs that requires changes in program behavior. Each running program has a default locale that is the user's preferred place. It is up to each program to adapt to a locale's customs as best it can. The locale concept is represented by the Locale class, which is part of the java.util package.

Given a locale, several tools can help your program behave in a locally comprehensible fashion. A common pattern is for a class to define the methods for performing locale-sensitive operations. A generic “get instance” static method of this class returns an object (possibly of a subclass) suitable for the default locale. The class will also provide an overload of each “get instance” method that takes a locale argument and returns a suitable object for a particular locale. For example, by invoking the class's getInstance methods, you can get an appropriate java.util.Calendar object that works with the user's preferred dates and times. The returned Calendar object will understand how to translate system time into dates using the customs of the default locale. If the user were Mexican, an object that was a Calendar adapted to Mexican customs could be returned. A Chinese user might get an object of a different subclass that worked under the Chinese calendar customs.

If your program displays information to the user, you will likely want to localize the output: Saying “That's not right” to someone who doesn't understand English is probably pointless, so you would like to translate (localize) the message for speakers in other locales. The resource bundle mechanisms help you make this possible by mapping string keys to arbitrary resources. You use the values returned by a resource bundle to make your program speak in other tongues—instead of writing the literal message strings in your code, you look up the strings from a resource bundle by string keys. When the program is moved to another locale, someone can translate the messages in the resource bundle and your program will work for that new locale without your changing a line of code.

The classes described in this chapter come mostly from the package java.util. There are occasional brief discussions of classes in the text internationalization and localization package java.text, with an overview of some of its capabilities in Section 24.6 on page 708, but a full discussion on this subject is outside the scope of this book.

Locale

A java.util.Locale object describes a specific place—cultural, political, or geographical. Using a locale, objects can localize their behavior to a user's expectations. An object that does so is called locale-sensitive. For example, date formatting can be localized by using the locale-sensitive DateFormat class (described later in this chapter), so the date written in the United Kingdom as 26/11/72 would be written 26.11.72 in Iceland, 11/26/72 in the United States, or 72.26.11 in Latvia.

A single locale represents issues of language, country, and other traditions. There can be separate locales for U.S. English, U.K. English, Australian English, Pakistani English, and so forth. Although the language is arguably in common for these locales, the customs of date, currency, and numeric representation vary.

Your code will rarely get or create Locale objects directly but instead will use the default locale that reflects the user's preference. You typically use this locale implicitly by getting resources or resource bundles as shown with other locale-sensitive classes. For example, you get the default calendar object like this:

Calendar now = Calendar.getInstance();

The Calendar class's getInstance method looks up the default locale to configure the calendar object it returns. When you write your own locale-sensitive classes, you get the default locale from the static getDefault method of the Locale class.

If you write code that lets a user select a locale, you may need to create Locale objects. There are three constructors:

  • public Locale(String language, String country, String variant)

    • Creates a Locale object that represents the given language and country, where language is the two-letter ISO 639 code for the language (such as "et" for Estonian) and country is the two-letter ISO 3166 code for the country (such as "KY" for Cayman Islands). “Further Reading” on page 755 lists references for these codes. The variant can specify anything, such as an operating environment (such as "POSIX" or "MAC") or company or era. If you specify more than one variant, separate the two with an underscore. To leave any part of the locale unspecified, use "", an empty string—not null.

  • public Locale(String language, String country)

    • Equivalent to Locale(language,country, "").

  • public Locale(String language)

    • Equivalent to Locale(language,"", "").

The language and country can be in any case, but they will always be translated to lowercase for the language and uppercase for the country to conform to the governing standards. The variant is translated into uppercase.
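As a minimal sketch of the case normalization just described, the following hypothetical helper (the names LocaleParts and describe are ours, not part of the library) builds a locale from mixed-case codes and reports the normalized parts. Recent Java releases deprecate these constructors in favor of factory methods, so treat this as an illustration of the behavior rather than a recommended idiom.

```java
import java.util.Locale;

class LocaleParts {
    // Builds a locale from codes in any case and returns "language/country".
    static String describe(String language, String country) {
        Locale loc = new Locale(language, country);
        return loc.getLanguage() + "/" + loc.getCountry();
    }

    public static void main(String[] args) {
        // Estonian language, Cayman Islands country, given in the "wrong" case.
        System.out.println(describe("ET", "ky"));   // et/KY
        // An unspecified country is the empty string, not null.
        System.out.println(new Locale("et").getCountry().isEmpty());   // true
    }
}
```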

The Locale class defines static Locale objects for several well-known locales, such as CANADA_FRENCH and KOREA for countries, and KOREAN and TRADITIONAL_CHINESE for languages. These objects are simply conveniences and have no special privileges compared to any Locale object you may create.

The static method setDefault changes the default locale. The default locale is shared state and should always reflect the user's preference. If you have code that must operate in a different locale, you can specify that locale to locale-sensitive classes either as an argument when you get resources or on specific operations. You should rarely need to change the default locale.

Locale provides methods for getting the parts of the locale description. The methods getCountry, getLanguage, and getVariant return the values defined during construction. These are terse codes that most users will not know. These methods have “display” variants—getDisplayCountry, getDisplayLanguage, and getDisplayVariant—that return human-readable versions of the values. The method getDisplayName returns a human-readable summary of the entire locale description, and toString returns the terse equivalent, using underscores to separate the parts. These “display” methods return values that are localized according to the default locale.

You can optionally provide a Locale argument to any of the “display” methods to get a description of the given locale under the provided locale. For example, if we print the value of

Locale.ITALY.getDisplayCountry(Locale.FRANCE)

we get

Italie

the French name for Italy.

The methods getISO3Country and getISO3Language return three-character ISO codes for the country and language of the locale, respectively.
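The following sketch gathers the terse and three-letter forms for a locale; the class LocaleNames and its methods are hypothetical names chosen for illustration. The "display" calls in main are shown without expected output because the text they return depends on the locale data installed with the runtime.

```java
import java.util.Locale;

class LocaleNames {
    // Returns the terse form with underscores, such as "it_IT".
    static String terse(Locale loc) {
        return loc.toString();
    }

    // Returns the three-letter ISO codes as "language/country".
    static String iso3(Locale loc) {
        return loc.getISO3Language() + "/" + loc.getISO3Country();
    }

    public static void main(String[] args) {
        Locale italy = Locale.ITALY;
        System.out.println(terse(italy));
        System.out.println(iso3(italy));
        // Human-readable names, localized for the locale passed as argument.
        System.out.println(italy.getDisplayCountry(Locale.FRANCE));
        System.out.println(italy.getDisplayName(Locale.ENGLISH));
    }
}
```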

Resource Bundles

When you internationalize code, you commonly have units of meaning—such as text or sounds—that must be translated or otherwise made appropriate for each locale. If you put English text directly into your program, localizing that code is difficult—it requires finding all the strings in your program, identifying which ones are shown to users, and translating them in the code, thereby creating a second version of your program for, say, Swahili users. When you repeat this process for a large number of locales the task becomes a nightmare.

The resource bundle classes in java.util help you address this problem in a cleaner and more flexible fashion. The abstract class ResourceBundle defines methods to look up resources in a bundle by string key and to provide a parent bundle that will be searched if a bundle doesn't have a key. This inheritance feature allows one bundle to be just like another bundle except that a few resource values are modified or added. For example, a U.S. English bundle might use a U.K. English bundle for a parent, providing replacements for resources that have different spelling. ResourceBundle provides the following public methods:

  • public final String getString(String key) throws MissingResourceException

    • Returns the string stored in the bundle under the given key.

  • public final String[] getStringArray(String key) throws MissingResourceException

    • Returns the string array stored in the bundle under the given key.

  • public final Object getObject(String key) throws MissingResourceException

    • Returns the object stored in the bundle under the given key.

  • public abstract Enumeration getKeys()

    • Returns an Enumeration of the keys understood by this bundle, including all those of the parent.

Each resource bundle defines a set of string keys that map to locale-sensitive resources. These strings can be anything you like, although it is best to make them mnemonic. When you want to use the resource you look it up by name. If the resource is not found, a MissingResourceException is thrown. The resources themselves can be of any type but are commonly strings, so the getString and getStringArray methods are provided for convenience.

The following example shows an internationalized way to rewrite the “Hello, world” example. This internationalized version requires a program called GlobalHello and a resource bundle for the program's strings called GlobalRes, which will define a set of constants for the localizable strings. First, the program:

import java.util.*;

public class GlobalHello {
    public static void main(String[] args) {
        ResourceBundle res =
            ResourceBundle.getBundle("GlobalRes");
        String msg;
        if (args.length > 0)
            msg = res.getString(GlobalRes.GOODBYE);
        else
            msg = res.getString(GlobalRes.HELLO);
        System.out.println(msg);
    }
}

The program first gets its resource bundle. Then it checks whether any arguments are provided on the command line. If some are, it says good-bye; otherwise it says hello. The program logic determines which message to display, but the actual string to print is looked up by key (GlobalRes.HELLO or GlobalRes.GOODBYE).

Each resource bundle is a set of associated classes and property files. In our example, GlobalRes is the name of a class that extends ResourceBundle, implementing the methods necessary to map a message key to a localized translation of that message. You define classes for the various locales for which you want to localize the messages, naming the classes to reflect the locale. For example, the bundle class that manages GlobalRes messages for the Lingala language would be GlobalRes_ln because "ln" is the two-letter code for Lingala. French would be mapped in GlobalRes_fr, and Canadian French would be GlobalRes_fr_CA, which might have a parent bundle of GlobalRes_fr.

We have chosen to make the key strings constants in the GlobalRes class. Using constants prevents errors of misspelling. If you pass literal strings such as "hello" to getString, a misspelling will show up only when the erroneous getString is executed, and that might not happen during testing. If you use constants, a misspelling will be caught by the compiler (unless you are unlucky enough to accidentally spell the name of another constant).

You find resources by calling one of two static getBundle methods in ResourceBundle: the one we used, which searches the default locale for the best available version of the bundle you name; and the other method, which lets you specify both bundle name and desired locale. A fully qualified bundle name has the form package.Bundle_la_CO_va, where package.Bundle is the general fully qualified name for the bundle class (such as GlobalRes), la is the two-letter language code (lowercase), CO is the two-letter country code (uppercase), and va is the list of variants separated by underscores. If a bundle with the full name cannot be found, the last component is dropped and the search is repeated with this shorter name. This process is repeated until only the last locale modifier is left. If even this search fails and if you invoked getBundle with a specified locale, the search is restarted with the full name of the bundle for the default locale. If this second search ends with no bundle found or if you were searching in the default locale, getBundle checks using just the bundle name. If even that bundle does not exist, getBundle throws a MissingResourceException.

For example, suppose you ask for the bundle GlobalRes, specifying a locale for an Esperanto speaker living in Kiribati who is left-handed, and the default locale of the user is for a Nepali speaker in Bhutan who works for Acme, Inc. The longest possible search would be:

GlobalRes_eo_KI_left
GlobalRes_eo_KI
GlobalRes_eo
GlobalRes_ne_BT_Acme
GlobalRes_ne_BT
GlobalRes_ne
GlobalRes

The first resource bundle that is found ends the search, being considered the best available match.
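The search order described above can be sketched as a small function that lists candidate names. This is only an illustration of the order as the text states it, not the library's implementation, which does additional work; the class BundleSearch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BundleSearch {
    // Appends base_la_CO_va, base_la_CO, and base_la for one locale,
    // skipping levels that are unspecified and names already listed.
    private static void addNames(List<String> out, String base, Locale loc) {
        String[] parts = { loc.getLanguage(), loc.getCountry(), loc.getVariant() };
        for (int n = parts.length; n > 0; n--) {
            if (parts[n - 1].isEmpty())
                continue;   // nothing to drop at this level
            StringBuilder name = new StringBuilder(base);
            for (int i = 0; i < n; i++)
                name.append('_').append(parts[i]);
            if (!out.contains(name.toString()))
                out.add(name.toString());
        }
    }

    // Requested locale first, then the default locale, then the bare name.
    static List<String> candidates(String base, Locale requested, Locale dflt) {
        List<String> out = new ArrayList<>();
        addNames(out, base, requested);
        addNames(out, base, dflt);
        out.add(base);
        return out;
    }

    public static void main(String[] args) {
        // A Canadian French request made where the default locale is U.K. English.
        System.out.println(
            candidates("GlobalRes", Locale.CANADA_FRENCH, Locale.UK));
    }
}
```

When the requested locale is also the default locale, the second pass adds nothing new, which matches the shorter search described for that case.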

The examples you have seen use resource bundles to fetch strings, but remember that you can use getObject to get any type of object. Bundles are used to store images, URLs, audio sources, graphics components, and any other kind of locale-sensitive resource that can be represented by an object.

Mapping string keys to localized resource objects is usually straightforward—simply use one of the provided subclasses of ResourceBundle that implement the lookup for you: ListResourceBundle and PropertyResourceBundle.

ListResourceBundle

ListResourceBundle maps a simple list of keys to their localized objects. It is an abstract subclass of ResourceBundle for which you provide a getContents method that returns an array of key/resource pairs as an array of arrays of Object. The keys must be strings, but the resources can be any kind of object. The ListResourceBundle takes this array and builds the maps for the various “get” methods. The following classes use ListResourceBundle to define a few locales for GlobalRes. First, the base bundle:

import java.util.ListResourceBundle;

public class GlobalRes extends ListResourceBundle {
    public static final String HELLO = "hello";
    public static final String GOODBYE = "goodbye";

    public Object[][] getContents() {
        return contents;
    }

    private static final Object[][] contents = {
        { GlobalRes.HELLO,      "Ciao" },
        { GlobalRes.GOODBYE,    "Ciao" },
    };
}

This is the top-level bundle—when no other bundle is found, this will be used. We have chosen Italian for the default. Before any “get” method is executed, GlobalRes.getContents will be invoked and the contents array's values will seed the data structures used by the “get” methods. ListResourceBundle uses an internal lookup table for efficient access; it does not search through your array of keys. The GlobalRes class also defines the constants that name known resources in the bundle. Here is another bundle for a more specific locale:

import java.util.ListResourceBundle;

public class GlobalRes_en extends ListResourceBundle {
    public Object[][] getContents() {
        return contents;
    }

    private static final Object[][] contents = {
        { GlobalRes.HELLO,      "Hello" },
        { GlobalRes.GOODBYE,    "Goodbye" },
    };
}

This bundle covers the English-language locale en. It provides specific values for each localizable string. The next bundle uses the inheritance feature:

import java.util.ListResourceBundle;

public class GlobalRes_en_AU extends ListResourceBundle {
    // mostly like basic English, which is our parent bundle

    public Object[][] getContents() { return contents; }

    private static final Object[][] contents = {
        { GlobalRes.HELLO,      "G'day" },
    };
}

This bundle is for English speakers from Australia (AU). It provides a more colloquial version of the HELLO string and inherits all other strings from the general English locale GlobalRes_en. Whenever a resource bundle is instantiated, its parent chain is established. This proceeds by successively dropping the variant, country, and language components from the base bundle name and instantiating those bundles if they exist. If they do exist then setParent is called on the preceding bundle passing the new bundle as the parent. So in our example, when GlobalRes_en_AU is created, the system will create GlobalRes_en and set it as the parent of GlobalRes_en_AU. In turn, the parent of GlobalRes_en will be the base bundle GlobalRes.

Given these classes, someone with an English-language locale (en) would get the values returned by GlobalRes_en unless the locale also specified the country Australia (AU), in which case values from GlobalRes_en_AU would be used. Everyone else would see those in GlobalRes.
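To see the parent fallback in isolation, the following sketch wires the chain by hand instead of relying on getBundle. The nested class names are hypothetical, and calling setParent from a constructor is our shortcut for illustration; normally the bundle loading mechanism establishes the chain for you.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

class ParentChainDemo {
    // Plays the role of the base bundle.
    static class Base extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] { { "hello", "Ciao" }, { "goodbye", "Ciao" } };
        }
    }

    // Plays the role of the general English bundle; its parent is the base.
    static class English extends ListResourceBundle {
        English() { setParent(new Base()); }
        protected Object[][] getContents() {
            return new Object[][] { { "hello", "Hello" }, { "goodbye", "Goodbye" } };
        }
    }

    // Overrides only "hello"; every other key falls through to English.
    static class Australian extends ListResourceBundle {
        Australian() { setParent(new English()); }
        protected Object[][] getContents() {
            return new Object[][] { { "hello", "G'day" } };
        }
    }

    public static void main(String[] args) {
        ResourceBundle res = new Australian();
        System.out.println(res.getString("hello"));     // G'day
        System.out.println(res.getString("goodbye"));   // Goodbye
    }
}
```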

PropertyResourceBundle

PropertyResourceBundle is a subclass of ResourceBundle that reads its list of resources from a text property description. Instead of using an array of key/resource pairs, the text contains key/resource pairs as lines of the form

key=value

Both keys and values must be strings. A PropertyResourceBundle object reads the text from an InputStream passed to the PropertyResourceBundle constructor and uses the information it reads to build a lookup table for efficient access.

The bundle search process that we described earlier actually has an additional step that looks for a file ResName.properties after it looks for a class ResName. For example, if the search process doesn't find the class GlobalRes_eo_KI_left it will then look for the file GlobalRes_eo_KI_left.properties before looking for the next resources class. If that file exists, an input stream is created for it and used to construct a PropertyResourceBundle that will read the properties from the file.

It is easier to use property files than to create subclasses of ListResourceBundle, but the files have two limitations. First, they can only define string resources, whereas ListResourceBundle can define arbitrary objects. Second, the only legal character encoding for property files is the byte format of ISO 8859-1. This means that other Unicode characters must be encoded with \uxxxx escape sequences.
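A minimal sketch of building a PropertyResourceBundle directly from a stream follows. It reads property text held in memory rather than a file, purely to keep the example self-contained; the class PropertiesDemo and its fromText method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class PropertiesDemo {
    // Parses key=value lines from the given text as a resource bundle.
    static ResourceBundle fromText(String text) {
        InputStream in = new ByteArrayInputStream(
            text.getBytes(StandardCharsets.ISO_8859_1));
        try {
            return new PropertyResourceBundle(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The doubled backslashes keep each escape as literal text in the
        // property data, where the bundle decodes it to a single character.
        ResourceBundle res =
            fromText("hello=\\u00a1Hola!\ngoodbye=Adi\\u00f3s\n");
        System.out.println(res.getString("hello"));
        System.out.println(res.getString("goodbye"));
    }
}
```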

Subclassing ResourceBundle

ListResourceBundle, PropertyResourceBundle, and .properties files will be sufficient for most of your bundles, but you can create your own subclass of ResourceBundle if they are not. You must implement two methods:

  • protected abstract Object handleGetObject(String key) throws MissingResourceException

    • Returns the object associated with the given key. If the key is not defined in this bundle, it returns null, and that causes the ResourceBundle to check in the parent (if any). Do not throw MissingResourceException unless you check the parent instead of letting the bundle do it. All the “get” methods are written in terms of this one method.

  • public abstract Enumeration getKeys()

    • Returns an Enumeration of the keys understood by this bundle, including all those of the parent.
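As a sketch of these two methods, here is a hypothetical bundle backed by a Map. The class name MapBundle and its constructor are our own invention; only handleGetObject, getKeys, setParent, and the inherited parent field come from ResourceBundle.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.Set;
import java.util.TreeSet;

class MapBundle extends ResourceBundle {
    private final Map<String, Object> resources;

    // The parent may be null for a top-level bundle.
    MapBundle(Map<String, ?> resources, ResourceBundle parent) {
        this.resources = new HashMap<>(resources);
        if (parent != null)
            setParent(parent);
    }

    @Override
    protected Object handleGetObject(String key) {
        // Returning null lets ResourceBundle go on to ask the parent.
        return resources.get(key);
    }

    @Override
    public Enumeration<String> getKeys() {
        // Our own keys plus everything the parent chain understands.
        Set<String> keys = new TreeSet<>(resources.keySet());
        if (parent != null)
            keys.addAll(Collections.list(parent.getKeys()));
        return Collections.enumeration(keys);
    }

    public static void main(String[] args) {
        MapBundle base =
            new MapBundle(Map.of("hello", "Ciao", "goodbye", "Ciao"), null);
        MapBundle en = new MapBundle(Map.of("hello", "Hello"), base);
        System.out.println(en.getString("hello"));      // Hello
        System.out.println(en.getString("goodbye"));    // Ciao
    }
}
```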

Exercise 24.1: Get GlobalHello to work with the example locales. Add some more locales, using ListResourceBundle, .properties files, and your own specific subclass of ResourceBundle.

Currency

Currency encoding is highly sensitive to locale, and the java.util.Currency class helps you properly format currency values. You obtain a Currency object from one of its static getInstance methods, one of which takes a Locale object while the other takes a currency code as a String (codes are from the ISO 4217 standard).

The Currency class does not directly map currency values into localized strings but gives you information you need to do so. The information at your disposal is

  • public String getSymbol()

    • Returns the symbol of this currency for the default locale.

  • public String getSymbol(Locale locale)

    • Returns the symbol of this currency for the specified locale. If there is no locale-specific symbol then the ISO 4217 currency code is returned. Many currencies share the same symbol in their own locale. For example, the $ symbol represents U.S. dollars in the United States, Canadian dollars in Canada, and Australian dollars in Australia, to name but a few. The local currency symbol is usually reserved for the local currency, so each locale can change the representation used for other currencies. For example, if this currency object represents the U.S. dollar, then invoking getSymbol with a U.S. locale will return "$" because it is the local currency. However, invoking getSymbol with a Canadian locale will return "USD" (the currency code for the U.S. dollar) because the $ symbol is reserved for the Canadian dollar in the Canadian locale.

  • public int getDefaultFractionDigits()

    • Returns the default number of fraction digits used with this currency. For example, the British pound would have a value of 2 because two digits usually follow the decimal point for pence (such as in £18.29), whereas the Japanese yen would have zero because yen values typically have no fractional part (such as ¥1200). Some “currencies” are not really currencies at all (IMF Special Drawing Rights, for example), and they return –1.

  • public String getCurrencyCode()

    • Returns the ISO 4217 currency code of this currency.
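The following sketch queries a few currencies using the methods just listed; the class CurrencyInfo and its summary method are hypothetical. The symbol lookups at the end are printed without expected output, since the symbol chosen for a foreign currency depends on the locale data shipped with the runtime.

```java
import java.util.Currency;
import java.util.Locale;

class CurrencyInfo {
    // Returns the currency code followed by its default fraction digits.
    static String summary(Currency c) {
        return c.getCurrencyCode() + " digits=" + c.getDefaultFractionDigits();
    }

    public static void main(String[] args) {
        System.out.println(summary(Currency.getInstance("GBP")));        // GBP digits=2
        System.out.println(summary(Currency.getInstance(Locale.JAPAN))); // JPY digits=0
        System.out.println(summary(Currency.getInstance("XDR")));        // XDR digits=-1

        // The same currency as seen from two different locales.
        Currency usd = Currency.getInstance("USD");
        System.out.println(usd.getSymbol(Locale.US));
        System.out.println(usd.getSymbol(Locale.CANADA));
    }
}
```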

Exercise 24.2: Select six different locales and six different currencies, and print a table showing the currency symbol for each currency in each locale.

Time, Dates, and Calendars

Time is represented as a long integer measured in milliseconds since midnight Greenwich Mean Time (GMT) January 1, 1970. This starting point for time measurement is known as the epoch. This value is signed, so negative values signify time before the beginning of the epoch. The System.currentTimeMillis method returns the current time. This value will express dates into the year A.D. 292,280,995, which should suffice for most purposes.

You can use java.util.Date to hold a time and perform some simple time-related operations. When a new Date object is created, you can specify a long value for its time. If you use the no-arg constructor, the Date object will mark the time of its creation. A Date object can be used for simple operations. For example, the simplest program to print the current time (repeated from page 37) is

import java.util.Date;

class Date2 {
    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(now);
    }
}

This program will produce output such as the following:

Sun Mar 20 08:48:38 GMT+10:00 2005

Note that this is not localized output. No matter what the default locale, the date will be in this format, adjusted for the current time zone.

You can compare two dates with the before and after methods, which return true if the object on which they are invoked is before or after the other date. Or you can compare the long values you get from invoking getTime on the two objects. The method setTime lets you change the time to a different long.
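A short sketch of these comparisons, using fixed millisecond values so the results do not depend on when the code runs (the class DateCompare and its isEarlier method are hypothetical):

```java
import java.util.Date;

class DateCompare {
    // True if a marks a strictly earlier moment than b.
    static boolean isEarlier(Date a, Date b) {
        return a.before(b);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        Date later = new Date(0L);
        later.setTime(86_400_000L);   // move to one day after the epoch

        System.out.println(epoch.before(later));                // true
        System.out.println(later.after(epoch));                 // true
        System.out.println(later.getTime() - epoch.getTime());  // 86400000
    }
}
```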

The Date class provides no support for localization and has effectively been replaced by the more sophisticated and locale-sensitive Calendar and DateFormat classes.

Calendars

Calendars mark the passage of time. Most of the world uses the same calendar, commonly called the Gregorian calendar after Pope Gregory XIII, under whose auspices it was first instituted. Many other calendars exist in the world, and the calendar abstractions are designed to express such variations. A given moment in time is expressed as a date according to a particular calendar, and the same moment can be expressed as different dates by different calendars. The calendar abstraction is couched in the following form:

  • An abstract Calendar class that represents various ways of marking time

  • An abstract TimeZone class that represents time zone offsets and other adjustments, such as daylight saving time

  • An abstract java.text.DateFormat class that defines how one can format and parse date and time strings

Because the Gregorian calendar is commonly used, you also have the following concrete implementations of the abstractions:

  • A GregorianCalendar class

  • A SimpleTimeZone class for use with GregorianCalendar

  • A java.text.SimpleDateFormat class that formats and parses Gregorian dates and times

For example, the following code creates a GregorianCalendar object representing midnight (00:00:00), October 26, 1972, in the local time zone, then prints its value:

Calendar cal =
    new GregorianCalendar(1972, Calendar.OCTOBER, 26);
System.out.println(cal.getTime());

The method getTime returns a Date object for the calendar object's time, which was set by converting a year, month, and date into a millisecond-measured long. The output would be something like this (depending on your local time zone of course):

Thu Oct 26 00:00:00 GMT+10:00 1972

You can also work directly with the millisecond time value by using getTimeInMillis and setTimeInMillis. These are equivalent to working with a Date object; for example, getTimeInMillis is equivalent to invoking getTime().getTime().

The abstract Calendar class provides a large set of constants that are useful in many calendars, such as Calendar.AM and Calendar.PM for calendars that use 12-hour clocks. Some constants are useful only for certain calendars, and no calendar class is required to use them. In particular, the month names in Calendar (such as Calendar.JUNE) are names for the various month numbers (JUNE is 5, because month numbers start at 0), with a special month UNDECIMBER for the thirteenth month that many calendars have.
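The zero-based month numbering and the equivalence between the Date and millisecond views can be checked with a small sketch; the class CalendarBasics and its october26 method are hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarBasics {
    // Midnight at the start of October 26, 1972, in the default time zone.
    static Calendar october26() {
        return new GregorianCalendar(1972, Calendar.OCTOBER, 26);
    }

    public static void main(String[] args) {
        Calendar cal = october26();
        System.out.println(cal.get(Calendar.MONTH));   // 9, since months start at 0
        System.out.println(Calendar.JUNE);             // 5

        // The two ways of reading the time agree.
        System.out.println(
            cal.getTimeInMillis() == cal.getTime().getTime());   // true

        cal.setTimeInMillis(0L);                       // back to the epoch
        System.out.println(cal.getTime().getTime());   // 0
    }
}
```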

Each Calendar object represents a particular moment in time on that calendar. The Calendar class provides only constructors that create an object for the current time, either in the default locale and time zone or in specified ones.

Calendar objects represent a moment in time, but they are not responsible for displaying the date. That locale-sensitive procedure is the job of the DateFormat class, which will soon be described.

You can obtain a calendar object for a locale by invoking one of the static Calendar.getInstance methods. With no arguments, getInstance returns an object of the best available calendar type (currently only GregorianCalendar) for the default locale and time zone, set to the current time. The other overloads allow you to specify the locale, the time zone, or both. The static getAvailableLocales method returns an array of Locale objects for which calendars are installed on the system.

With a calendar object in hand, you can manipulate the date. The following example prints the next week of days for a given calendar object:

public static void oneWeek(PrintStream out, Calendar cal) {
    Calendar cur = (Calendar) cal.clone(); //modifiable copy
    int dow = cal.get(Calendar.DAY_OF_WEEK);
    do {
        out.println(cur.getTime());
        cur.add(Calendar.DAY_OF_WEEK, 1);
    } while (cur.get(Calendar.DAY_OF_WEEK) != dow);
}

First we make a copy of the calendar argument so that we can make changes without affecting the calendar we were passed.[1] Instead of assuming that there are seven days in a week (who knows what kind of calendar we were given?), we loop, printing the time and adding one day to that time, until we have printed a week's worth of days. We detect whether a week has passed by looking for the next day whose “day of the week” is the same as that of the original object.

The Calendar class defines many kinds of calendar fields for calendar objects, such as DAY_OF_WEEK in the preceding code. These calendar fields are constants used in the methods that manipulate parts of the time:

  • MILLISECOND

  • SECOND

  • MINUTE

  • HOUR

  • HOUR_OF_DAY

  • AM_PM

  • DAY_OF_WEEK

  • DAY_OF_WEEK_IN_MONTH

  • DAY_OF_MONTH

  • DATE

  • DAY_OF_YEAR

  • WEEK_OF_MONTH

  • WEEK_OF_YEAR

  • MONTH

  • YEAR

  • ERA

  • ZONE_OFFSET

  • DST_OFFSET

  • FIELD_COUNT (not a field itself, but the number of distinct fields recognized by get and set)

An int is used to store values for all these calendar field types. You use these constants—or any others defined by a particular calendar class—to specify a calendar field to the following methods (always as the first argument):

  • get

    • Returns the value of the field

  • set

    • Sets the value of the field to the provided int

  • clear

    • Clears the value of the field to “unspecified”

  • isSet

    • Returns true if the field has been set

  • add

    • Adds an int amount to the specified field

  • roll

    • Rolls the field up to the next value if the second boolean argument is true, or down if it is false

  • getMinimum

    • Gets the minimum valid value for the field

  • getMaximum

    • Gets the maximum valid value for the field

  • getGreatestMinimum

    • Gets the highest minimum value for the field; if it varies, this can be different from getMinimum

  • getLeastMaximum

    • Gets the smallest maximum value for the field; if it varies, this can be different from getMaximum

The greatest minimum and least maximum describe cases in which a value can vary within the overall boundaries. For example, the least maximum value for DAY_OF_MONTH on the Gregorian calendar is 28 because February, the shortest month, can have as few as 28 days. The maximum value is 31 because no month has more than 31 days.
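The sketch below reads the four boundary values for DAY_OF_MONTH and also contrasts add with roll on the last day of January. The class FieldRanges and its methods are hypothetical. The difference shown, that add carries into the month while roll wraps within it, follows the documented behavior of GregorianCalendar rather than anything stated in the list above.

```java
import java.util.Arrays;
import java.util.Calendar;
import java.util.GregorianCalendar;

class FieldRanges {
    // Returns {minimum, greatest minimum, least maximum, maximum}.
    static int[] dayOfMonthRange(Calendar cal) {
        int f = Calendar.DAY_OF_MONTH;
        return new int[] {
            cal.getMinimum(f), cal.getGreatestMinimum(f),
            cal.getLeastMaximum(f), cal.getMaximum(f)
        };
    }

    // Advances one day from January 31, 2005 and returns {month, day}.
    static int[] dayAfterJan31(boolean useRoll) {
        Calendar cal = new GregorianCalendar(2005, Calendar.JANUARY, 31);
        if (useRoll)
            cal.roll(Calendar.DAY_OF_MONTH, true);   // wraps within the month
        else
            cal.add(Calendar.DAY_OF_MONTH, 1);       // carries into the month
        return new int[] {
            cal.get(Calendar.MONTH), cal.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        System.out.println(
            Arrays.toString(dayOfMonthRange(new GregorianCalendar()))); // [1, 1, 28, 31]
        System.out.println(Arrays.toString(dayAfterJan31(false)));      // [1, 1]
        System.out.println(Arrays.toString(dayAfterJan31(true)));       // [0, 1]
    }
}
```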

The set method allows you to specify a date by certain calendar fields and then calculate the time associated with that date. For example, you can calculate on which day of the week a particular date falls:

public static int dotw(int year, int month, int date) {
    Calendar cal = new GregorianCalendar();
    cal.set(Calendar.YEAR, year);
    cal.set(Calendar.MONTH, month);
    cal.set(Calendar.DATE, date);
    return cal.get(Calendar.DAY_OF_WEEK);
}

The method dotw calculates the day of the week on the Gregorian calendar for the given date. It creates a Gregorian calendar object, sets the date fields for year, month, and day, and returns the resulting day of the week.

The clear method can be used to reset a field's value to be unspecified. You can use clear with no parameters to clear all calendar fields. The isSet method returns true if a field currently has a value set.

Three variants of set change particular fields you commonly need to manipulate, leaving unspecified fields alone:

public void set(int year, int month, int date)
public void set(int year, int month, int date, int hrs, int min)
public void set(int year, int month, int date, int hrs, int min, int sec)

You can also use setTime to set the calendar's time from a Date object.

A calendar field that is out of range can be interpreted correctly. For example, January 32 can be equivalent to February 1. Whether it is treated as such or as an error depends on whether the calendar is considered to be lenient. A lenient calendar will do its best to interpret values as valid. A strict (non-lenient) calendar will not accept any values out of range, throwing IllegalArgumentException. The setLenient method takes a boolean that specifies whether the interpretation of field values should be lenient; isLenient returns the current setting.
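A minimal sketch of the difference, assuming a GregorianCalendar: the hypothetical method monthForJanuary32 sets the day of the month to 32 and reports what the calendar makes of it. Note that with a strict calendar the exception surfaces when the fields are recomputed (here, on the call to get), not on the call to set.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class LenientDemo {
    // Sets "January 32, 2005" and returns the resulting month number,
    // or -1 if the calendar rejects the out-of-range value.
    static int monthForJanuary32(boolean lenient) {
        Calendar cal = new GregorianCalendar(2005, Calendar.JANUARY, 1);
        cal.setLenient(lenient);
        cal.set(Calendar.DAY_OF_MONTH, 32);
        try {
            return cal.get(Calendar.MONTH);   // fields are recomputed here
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(monthForJanuary32(true));    // 1 (February)
        System.out.println(monthForJanuary32(false));   // -1 (rejected)
    }
}
```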

A week can start on any day, depending on the calendar. You can discover the first day of the week with the method getFirstDayOfWeek. In a Gregorian calendar for the United States this method would return SUNDAY, whereas Ireland uses MONDAY. You can change this by invoking setFirstDayOfWeek with a valid weekday index.

Some calendars require a minimum number of days in the first week of the year. The method getMinimalDaysInFirstWeek returns that number; the method setMinimalDaysInFirstWeek lets you change it. The minimum number of days in the first week matters when you are trying to determine in which week a particular date falls. For example, in some calendars, if January 1 is a Friday it may be considered part of the last week of the preceding year.
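To illustrate, here is a small sketch that asks in which week January 1, 2010 (a Friday) falls under two different settings. WeekDemo and weekOfYear are names made up for this example, and the choice of Monday as the first day of the week is an assumption of the sketch.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class WeekDemo {
    // Week-of-year number of January 1, 2010 when weeks start on Monday
    // and the first week must contain at least minimalDays days.
    static int weekOfYear(int minimalDays) {
        Calendar cal = new GregorianCalendar(
            TimeZone.getTimeZone("GMT"), Locale.US);
        cal.clear();
        cal.setFirstDayOfWeek(Calendar.MONDAY);
        cal.setMinimalDaysInFirstWeek(minimalDays);
        cal.set(2010, Calendar.JANUARY, 1);
        return cal.get(Calendar.WEEK_OF_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(weekOfYear(1)); // 1: any partial week counts
        System.out.println(weekOfYear(4)); // 53: belongs to the previous year
        // The defaults come from the locale
        System.out.println(Calendar.getInstance(Locale.US)
            .getFirstDayOfWeek() == Calendar.SUNDAY);
        System.out.println(Calendar.getInstance(Locale.FRANCE)
            .getFirstDayOfWeek() == Calendar.MONDAY);
    }
}
```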

You can compare two Calendar objects by using compareTo since Calendar implements Comparable. If you prefer, you can use the before and after methods to compare the objects.

Time Zones

TimeZone is an abstract class that encapsulates not only offset from GMT but also other offset issues, such as daylight saving time. As with other locale-sensitive classes, you can get the default TimeZone by invoking the static method getDefault. You can change the default time zone by passing setDefault a new TimeZone object to use—or null to reset to the original default time zone. Time zones are understood by particular calendar types, so you should ensure that the default calendar and time zone are compatible.

Each time zone has a string identifier that is interpreted by the time zone object and can be displayed to the user. These identifiers use a long form consisting of a major and minor regional name, separated by '/'. For example, the following are all valid time zone identifiers: America/New_York, Australia/Brisbane, Africa/Timbuktu. Many time zones also have a short form identifier (often just a three-letter acronym), some of which are recognized by TimeZone for backward compatibility. You should endeavor to always use the long form: while many people know that EST stands for “Eastern Standard Time,” that doesn't tell you for which country. TimeZone also recognizes generic identifiers expressed as the difference in time from GMT. For example, GMT+10:00 and GMT-4:00 are both valid generic time zone identifiers. You can get an array of all the identifiers available on your system from the static method getAvailableIDs. If you want only those for a given offset from GMT, you can invoke getAvailableIDs with that raw offset in milliseconds. An offset might, for example, have identifiers for both daylight saving and standard time zones.

You can find the identifier of a given TimeZone object from getID, and you can set it with setID. Setting the identifier changes only the identifier on the time zone—it does not change the offset or other values. You can get the time zone for a given identifier by passing it to the static method getTimeZone.
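The following sketch exercises the identifier methods. ZoneIdDemo and rawOffsetHours are illustrative names, and "Nowhere/Bogus" is a deliberately invalid identifier. One behavior worth knowing is that getTimeZone does not fail on an identifier it cannot understand; it returns the GMT zone instead.

```java
import java.util.TimeZone;

final class ZoneIdDemo {
    static final int HOUR = 60 * 60 * 1000; // milliseconds in an hour

    // Raw offset from GMT, in whole hours, of the zone named by id.
    static int rawOffsetHours(String id) {
        return TimeZone.getTimeZone(id).getRawOffset() / HOUR;
    }

    public static void main(String[] args) {
        System.out.println(rawOffsetHours("America/New_York")); // -5
        System.out.println(rawOffsetHours("GMT+10:00"));        // 10

        // An unrecognized identifier silently yields the GMT zone
        System.out.println(TimeZone.getTimeZone("Nowhere/Bogus").getID());

        // All identifiers whose raw offset is five hours behind GMT
        for (String id : TimeZone.getAvailableIDs(-5 * HOUR))
            System.out.println(id);
    }
}
```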

A time zone can be converted into a displayable form by using one of the getDisplayName methods, similar to those of Locale. These methods allow you to specify whether to use the default locale or a specified one, and whether to use a short or long format. The string returned by the display methods is controlled by a DateFormat object (which you'll see a little later). These objects maintain their own tables of information on how to format different time zones. On a given system they may not maintain information for all the supported time zones, in which case the generic identifier form is used, such as in the example on page 696.

Each time zone has a raw offset from GMT, which can be either positive or negative. You can get or set the raw offset by using getRawOffset or setRawOffset, but you should rarely need to do this.

Daylight saving time supplements the raw offset with a seasonal time shift. The value of this shift can be obtained from getDSTSavings—the default implementation returns 3,600,000 (the number of milliseconds in an hour). You can ask whether a time zone ever uses daylight saving time during the year by invoking the method useDaylightTime, which returns a boolean. The method inDaylightTime returns true if the Date argument you pass would fall inside daylight saving time in the zone.

You can obtain the exact offset for a time zone on a given date by specifying that date in milliseconds or by using calendar fields to specify the year and month and so on.

  • public int getOffset(long date)

    • Returns the offset from GMT for the given time in this time zone, taking any daylight saving time offset into account

  • public abstract int getOffset(int era, int year, int month, int day, int dayOfWeek, int milliseconds)

    • Returns the offset from GMT for the given time in this time zone, taking any daylight saving time offset into account. All parameters are interpreted relative to the calendar for which the particular time zone implementation is designed. The era parameter represents calendar-specific eras, such as B.C. and A.D. in the Gregorian calendar.
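As a rough illustration of the difference between the raw offset and the offset on a particular date, the following sketch compares a winter and a summer instant in New York. The names OffsetDemo, gmtMillis, and offsetHours are invented here, and the dates were picked arbitrarily.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class OffsetDemo {
    static final int HOUR = 60 * 60 * 1000; // milliseconds in an hour

    // Milliseconds since the epoch for midnight GMT on the given day.
    static long gmtMillis(int year, int month, int date) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(year, month, date);
        return cal.getTimeInMillis();
    }

    // Total offset from GMT in hours, including any daylight saving shift.
    static int offsetHours(TimeZone zone, long when) {
        return zone.getOffset(when) / HOUR;
    }

    public static void main(String[] args) {
        TimeZone ny = TimeZone.getTimeZone("America/New_York");
        long winter = gmtMillis(2005, Calendar.JANUARY, 15);
        long summer = gmtMillis(2005, Calendar.JULY, 15);
        System.out.println(offsetHours(ny, winter) + " "
            + ny.inDaylightTime(new Date(winter)));  // -5 false
        System.out.println(offsetHours(ny, summer) + " "
            + ny.inDaylightTime(new Date(summer)));  // -4 true
    }
}
```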

GregorianCalendar and SimpleTimeZone

The GregorianCalendar class is a concrete subclass of Calendar that reflects UTC (Coordinated Universal Time), although it cannot always do so exactly. Imprecise behavior is inherited from the time mechanisms of the underlying system.[2] Parts of a date are specified in UTC standard units and ranges. Here are the ranges for GregorianCalendar:

YEAR

1–292278994

MONTH

0–11

DATE

Day of the month, 1–31

HOUR_OF_DAY

0–23

MINUTE

0–59

SECOND

0–59

MILLISECOND

0–999

The GregorianCalendar class supports several constructors:

  • public GregorianCalendar()

    • Creates a GregorianCalendar object that represents the current time in the default time zone with the default locale.

  • public GregorianCalendar(int year, int month, int date, int hrs, int min, int sec)

    • Creates a GregorianCalendar object that represents the given date in the default time zone with the default locale.

  • public GregorianCalendar(int year, int month, int date, int hrs, int min)

    • Equivalent to GregorianCalendar(year, month, date, hrs, min, 0); that is, the beginning of the specified minute.

  • public GregorianCalendar(int year, int month, int date)

    • Equivalent to GregorianCalendar(year, month, date, 0, 0, 0); that is, midnight on the given date (which is considered to be the start of the day).

  • public GregorianCalendar(Locale locale)

    • Creates a GregorianCalendar object that represents the current time in the default time zone with the given locale.

  • public GregorianCalendar(TimeZone timeZone)

    • Creates a GregorianCalendar object that represents the current time in the given timeZone with the default locale.

  • public GregorianCalendar(TimeZone zone, Locale locale)

    • Creates a GregorianCalendar object that represents the current time in the given zone with the given locale.

In addition to the methods it inherits from Calendar, GregorianCalendar provides an isLeapYear method that returns true if the passed in year is a leap year in that calendar.

The Gregorian calendar was preceded by the Julian calendar in many places. In a GregorianCalendar object, the default date at which this change happened is midnight local time on October 15, 1582. This is when the first countries switched, but others changed later. The getGregorianChange method returns the time the calendar is currently using for the change as a Date. You can set a calendar's change-over time by using setGregorianChange with a Date object.
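The effect of the change-over can be seen in a short sketch. CutoverDemo and its methods are names made up for this example. With the default change-over, the day after October 4, 1582 is October 15, and years before the change follow the Julian leap-year rule. Passing new Date(Long.MIN_VALUE) to setGregorianChange is one way to ask for Gregorian rules at all times.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class CutoverDemo {
    // Day of the month that follows October 4, 1582
    // under the default change-over date.
    static int dayAfterOctober4() {
        GregorianCalendar cal =
            new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(1582, Calendar.OCTOBER, 4);
        cal.add(Calendar.DATE, 1);
        return cal.get(Calendar.DATE);
    }

    // A calendar that applies Gregorian rules to all dates.
    static GregorianCalendar pureGregorian() {
        GregorianCalendar cal =
            new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.setGregorianChange(new Date(Long.MIN_VALUE));
        return cal;
    }

    public static void main(String[] args) {
        System.out.println(dayAfterOctober4());                    // 15
        // 1500 is a leap year under Julian rules but not Gregorian
        System.out.println(new GregorianCalendar().isLeapYear(1500));
        System.out.println(pureGregorian().isLeapYear(1500));
    }
}
```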

The SimpleTimeZone class is a concrete subclass of TimeZone that expresses values for Gregorian calendars. It does not handle historical complexities, but instead projects current practices onto all times. For historical dates that precede the use of daylight saving time, for example, you will want to use a calendar with a time zone you have selected that ignores daylight saving time. For future dates, SimpleTimeZone is probably as good a guess as any.
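One way to get a zone that ignores daylight saving time is to construct a SimpleTimeZone from just a raw offset and an identifier. This is only a sketch: the class FixedZoneDemo, the method fixedEastern, and the identifier "Eastern-NoDST" are all invented for the illustration.

```java
import java.util.SimpleTimeZone;
import java.util.TimeZone;

final class FixedZoneDemo {
    static final int HOUR = 60 * 60 * 1000; // milliseconds in an hour

    // A zone five hours behind GMT that never observes daylight saving time.
    static TimeZone fixedEastern() {
        return new SimpleTimeZone(-5 * HOUR, "Eastern-NoDST");
    }

    public static void main(String[] args) {
        TimeZone zone = fixedEastern();
        System.out.println(zone.useDaylightTime());  // false
        // With no daylight saving rules, the offset is the same at any instant
        System.out.println(zone.getOffset(System.currentTimeMillis()) / HOUR);
    }
}
```

Such a zone can then be passed to a GregorianCalendar constructor or to a formatter's setTimeZone method.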

Formatting and Parsing Dates and Times

Date and time formatting is a separate issue from calendars, although they are closely related. Formatting is localized in a different way. Not only are the names of days and months different in different locales that share the same calendar, but also the order in which a date's components are expressed changes. In the United States it is customary in short dates to put the month before the date, so that July 5 is written as 7/5. In many European countries the date comes first, so 5 July becomes 5/7 or 5.7 or …

In the previous sections the word “date” meant a number of milliseconds since the epoch, which could be interpreted as year, month, day-of-month, hours, minutes, and seconds information. When dealing with the formatting classes you must distinguish between dates, which deal with year, month, and day-of-month information, and times, which deal with hours, minutes, and seconds.

Date and time formatting issues are text issues, so the classes for formatting are in the java.text package—though the java.util.Formatter class (see page 624) also supports some localized date formatting as you shall see. The Date2 program on page 695 is simple because it does not localize its output. If you want localization, you need a DateFormat object.

DateFormat provides several ways to format and parse dates and times. It is a subclass of the general Format class, discussed in Section 24.6.2 on page 710. There are three kinds of formatters, each returned by different static methods: date formatters from getDateInstance, time formatters from getTimeInstance, and date/time formatters from getDateTimeInstance. Each of these formatters understands four formatting styles: SHORT, MEDIUM, LONG, and FULL, which are constants defined in DateFormat. And for each of them you can either use the default locale or specify one. For example, to get a medium date formatter in the default locale, you would use

Format fmt = DateFormat.getDateInstance(DateFormat.MEDIUM);

To get a date and time formatter that uses dates in short form and times in full form in a Japanese locale, you would use

Locale japan = new Locale("ja", "JP");
Format fmt = DateFormat.getDateTimeInstance(
                DateFormat.SHORT, DateFormat.FULL, japan
             );

For all the various “get instance” methods, if both formatting style and locale are specified the locale is the last parameter. The date/time methods require two formatting styles: the first for the date part, the second for the time. The simplest getInstance method takes no arguments and returns a date/time formatter for short formats in the default locale. The getAvailableLocales method returns an array of Locale objects for which date and time formatting is configured.

The following list shows how each formatting style is expressed for the same date. The output is from a date/time formatter for U.S. locales, with the same formatting mode used for both dates and times:

FULL:    Friday, August 29, 1986 5:00:00 PM EDT
LONG:    August 29, 1986 5:00:00 PM EDT
MEDIUM:  Aug 29, 1986 5:00:00 PM
SHORT:   8/29/86 5:00 PM
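A listing like this one could be produced by a loop along the following lines. This is a sketch, not the program that generated the output above: StyleDemo and its format method are hypothetical, and the exact punctuation of each style can vary between releases of the runtime because it comes from the installed locale data.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class StyleDemo {
    // Formats when using the same style for both the date and the time.
    static String format(int style, Date when, Locale locale, TimeZone zone) {
        DateFormat fmt = DateFormat.getDateTimeInstance(style, style, locale);
        fmt.setTimeZone(zone);
        return fmt.format(when);
    }

    public static void main(String[] args) {
        int[] styles = { DateFormat.FULL, DateFormat.LONG,
                         DateFormat.MEDIUM, DateFormat.SHORT };
        String[] names = { "FULL", "LONG", "MEDIUM", "SHORT" };
        Date now = new Date();
        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + ":\t"
                + format(styles[i], now, Locale.US, TimeZone.getDefault()));
        }
    }
}
```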

Each DateFormat object has an associated calendar and time zone set by the “get instance” method that created it. They are returned by getCalendar and getTimeZone, respectively. You can set these values by using setCalendar and setTimeZone. Each DateFormat object has a reference to a NumberFormat object for formatting numbers. You can use the methods getNumberFormat and setNumberFormat. (Number formatting is covered briefly in Section 24.6.2 on page 710.)

You format dates with one of several format methods based on the formatting parameters described earlier:

  • public final String format(Date date)

    • Returns a formatted string for date.

  • public abstract StringBuffer format(Date date, StringBuffer appendTo, FieldPosition pos)

    • Adds the formatted string for date to the end of appendTo.

  • public final StringBuffer format(Object obj, StringBuffer appendTo, FieldPosition pos)

    • Adds the formatted string for obj to the end of appendTo. The object can be either a Date or a Number whose longValue is a time in milliseconds.

The pos argument is a FieldPosition object that tracks the starting and ending index for a specific field within the formatted output. You create a FieldPosition object by passing an integer code that represents the field that the object should track. These codes are static fields in DateFormat, such as MINUTE_FIELD or MONTH_FIELD. Suppose you construct a FieldPosition object pos with MINUTE_FIELD and then pass it as an argument to a format method. When format returns, the getBeginIndex and getEndIndex methods of pos will return the start and end indices of the characters representing minutes within the formatted string. A specific formatter could also use the FieldPosition object to align the represented field within the formatted string. To make that happen, you would first invoke the setBeginIndex and setEndIndex methods of pos, passing the indices where you would like that field to start and end in the formatted string. Exactly how the formatter aligns the formatted text depends on the formatter implementation.
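For instance, tracking the month field might look like the following sketch, in which FieldDemo and monthText are illustrative names and the U.S. locale and GMT zone are fixed only to make the result predictable.

```java
import java.text.DateFormat;
import java.text.FieldPosition;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class FieldDemo {
    // Returns just the month portion of a long U.S. date for when.
    static String monthText(Date when) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.LONG, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        FieldPosition pos = new FieldPosition(DateFormat.MONTH_FIELD);
        StringBuffer out = fmt.format(when, new StringBuffer(), pos);
        // pos now brackets the month within the formatted text
        return out.substring(pos.getBeginIndex(), pos.getEndIndex());
    }

    public static void main(String[] args) {
        System.out.println(monthText(new Date(0L))); // January
    }
}
```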

A DateFormat object can also be used to parse dates. Date parsing can be lenient or not, depending on your preference. Lenient date parsing is as forgiving as it can be, whereas strict parsing requires the format and information to be proper and complete. The default is to be lenient. You can use setLenient to set leniency to be true or false. You can test leniency via isLenient.

The parsing methods are

  • public Date parse(String text) throws ParseException

    • Tries to parse text into a date and/or time. If successful, a Date object is returned; otherwise, a ParseException is thrown.

  • public abstract Date parse(String text, ParsePosition pos)

    • Tries to parse text into a date and/or time. If successful, a Date object is returned; otherwise, returns a null reference. When the method is called, pos is the position at which to start parsing; at the end it will either be positioned after the parsed text or will remain unchanged if an error occurred.

  • public Object parseObject(String text, ParsePosition pos)

    • Returns the result of parse(text, pos). This method is provided to fulfill the generic contract of Format.
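Here is a small sketch of lenient versus strict parsing with the ParsePosition form of parse. The names ParseDemo, shortUsDate, and tryParse are assumptions of the example, as is the choice of a short U.S. date formatter.

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class ParseDemo {
    // A short-style U.S. date formatter with the requested leniency.
    static DateFormat shortUsDate(boolean lenient) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        fmt.setLenient(lenient);
        return fmt;
    }

    // Returns the parsed date, or null if text is not acceptable.
    static Date tryParse(String text, boolean lenient) {
        ParsePosition pos = new ParsePosition(0); // start at the first char
        return shortUsDate(lenient).parse(text, pos);
    }

    public static void main(String[] args) {
        // January 32 is accepted only by the lenient parser
        System.out.println(tryParse("1/32/05", true) != null);
        System.out.println(tryParse("1/32/05", false) != null);
        System.out.println(tryParse("not a date", true) != null);
    }
}
```

Because this form of parse reports failure by returning null rather than by throwing an exception, it is convenient when bad input is expected and not exceptional.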

The class java.text.SimpleDateFormat is a concrete implementation of DateFormat that is used in many locales. If you are writing a DateFormat class, you may find it useful to extend SimpleDateFormat. SimpleDateFormat uses methods in the DateFormatSymbols class to get localized strings and symbols for date representation. When formatting or parsing dates, you should usually not create SimpleDateFormat objects; instead, you should use one of the “get instance” methods to return an appropriate formatter.

DateFormat has protected fields calendar and numberFormat that give direct access to the values publicly manipulated with the set and get methods.

Exercise 24.3: Write a program that takes a string argument that is parsed into the date to print, and print that date in all possible styles. How lenient will the date parsing be?

Using Formatter with Dates and Times

The java.util.Formatter class, described in Chapter 22, also supports the formatting of date and time information using a supplied Date or Calendar object, or a date represented as a long (or Long). Using the available format conversions you can extract information about that date/time, including things like the day of the month, the day of the week, the year, the hour of the day, and so forth.

The output of the formatter is localized according to the locale associated with that formatter, so things like the name of the day and month will be in the correct language—however, digits themselves are not localized. Unlike DateFormat, a formatter cannot help you with localization issues such as knowing whether the month or the day should come first in a date—it simply provides access to each individual component and your program must combine them in the right way.

A date/time conversion is indicated by a format conversion of t (or T for uppercase output), followed by various suffixes that indicate what is to be output and in what form. The following table lists the conversion suffixes related to times:

H

Hour of the day for 24-hour clock format. Two digits: 00–23

I

Hour of the day for 12-hour clock format. Two digits: 01–12

k

Hour of the day for 24-hour clock format: 0–23

l

Hour of the day for 12-hour clock format: 1–12

M

Minute within the hour. Two digits: 00–59

S

Seconds within the minute. Two digits: 00–60 (60 is a leap second)

L

Milliseconds within the second. Three digits: 000–999

N

Nanoseconds within the second. Nine digits: 000000000–999999999

p

Locale specific AM or PM marker.

z

Numeric offset from GMT (as per RFC 822). E.g. +1000

Z

String representing the abbreviation for the time zone

s

Seconds since the epoch.

Q

Milliseconds since the epoch.

So, for example, the following code will print out the current time in the familiar hh:mm:ss format:

System.out.printf("%1$tH:%1$tM:%1$tS %n", new Date());

The conversion suffixes that deal with dates are

B

Full month name

b

Abbreviated month name

h

Same as 'b'

A

Full name of the day of the week

a

Short name of the day of the week

C

The four digit year divided by 100. Two digits: 00–99

Y

Year. Four digits: 0000–9999

y

Year: Two digits: 00–99

j

Day of the year. Three digits: 001–999

m

Month in year. Two digits: 01–99

d

Day of month. Two digits: 01–99

e

Day of month: 1–99

Naturally, the valid range for day of month, month of year, and so forth, depends on the calendar that is being used. To continue the example, the following code will print the current date in the common mm/dd/yy format:

System.out.printf("%1$tm/%1$td/%1$ty %n", new Date());
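To see the effect of the formatter's locale on names, consider this sketch. LocaleNamesDemo and describe are hypothetical names. The order of the components is fixed by the format string, so only the words change with the locale.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class LocaleNamesDemo {
    // Day name, day of month, month name, and year, with names in locale.
    static String describe(Locale locale, Calendar when) {
        // %< reuses the argument of the preceding conversion
        return String.format(locale, "%tA %<te %<tB %<tY", when);
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(1986, Calendar.AUGUST, 29);
        System.out.println(describe(Locale.US, cal));     // Friday 29 August 1986
        System.out.println(describe(Locale.FRANCE, cal)); // French names, same order
    }
}
```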

As you can see, all the information about a date or time can be extracted and you can combine the pieces in whatever way you need. Doing so, however, is rather tedious both for the writer and any subsequent readers of the code. To ease the tedium a third set of conversion suffixes provides convenient shorthands for common combinations of the other conversions:

R

Time in 24-hour clock hh:mm format ("%tH:%tM")

T

Time in 24-hour clock hh:mm:ss format ("%tH:%tM:%tS")

r

Time in 12-hour clock h:mm:ss am/pm format ("%tI:%tM:%tS %Tp")

D

Date in mm/dd/yy format ("%tm/%td/%ty")

F

Complete date in ISO 8601 format ("%tY-%tm-%td")

c

Long date and time format ("%ta %tb %td %tT %tZ %tY")

So the previous examples could be combined in the more compact and somewhat more readable

System.out.printf("%1$tT %1$tD %n", new Date());
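As a quick check that the shorthands really are equivalent to the longer forms, the following sketch formats the same instant both ways. ShorthandDemo and bothWays are names chosen for this illustration, and the locale is fixed to Locale.US.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class ShorthandDemo {
    // Formats when with the individual suffixes and with the shorthands.
    static String[] bothWays(Calendar when) {
        String explicit = String.format(Locale.US,
            "%1$tH:%1$tM:%1$tS %1$tm/%1$td/%1$ty", when);
        String compact = String.format(Locale.US, "%1$tT %1$tD", when);
        return new String[] { explicit, compact };
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(1986, Calendar.AUGUST, 29, 17, 0, 0);
        String[] both = bothWays(cal);
        System.out.println(both[0]);                  // 17:00:00 08/29/86
        System.out.println(both[0].equals(both[1]));  // true
        System.out.println(String.format(Locale.US, "%tF", cal)); // 1986-08-29
    }
}
```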

As with all format conversions, a width can be specified before the conversion indicator to give the minimum number of characters to output. If the converted value is shorter than the width, the output is padded with spaces. The only format flag that can be specified with the date/time conversions is the '-' flag for left-justification; if this flag is given, a width must be supplied as well.
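The padding behavior can be sketched like this, where WidthDemo and pad are made-up names and the brackets are there only to make the spaces visible.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class WidthDemo {
    // Month name of when in a 12-character column, bracketed.
    static String pad(Calendar when, boolean leftJustify) {
        String conversion = leftJustify ? "[%-12tB]" : "[%12tB]";
        return String.format(Locale.US, conversion, when);
    }

    public static void main(String[] args) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.clear();
        cal.set(1986, Calendar.AUGUST, 29);
        System.out.println(pad(cal, false)); // [      August]
        System.out.println(pad(cal, true));  // [August      ]
    }
}
```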

Internationalization and Localization for Text

The package java.text provides several types for localizing text behavior, such as collation (comparing strings), and formatting and parsing text, numbers, and dates. You have already learned about dates in detail so in this section we look at general formatting and parsing, and collation.

Collation

Comparing strings in a locale-sensitive fashion is called collation. The central class for collation is Collator, which provides a compare method that takes two strings and returns an int less than, equal to, or greater than zero as the first string is less than, equal to, or greater than the second.

As with most locale-sensitive classes, you get the best available Collator object for a locale from a getInstance method, either passing a specific Locale object or specifying no locale and so using the default locale. For example, you get the best available collator to sort a set of Russian-language strings like this:

Locale russian = new Locale("ru", "");
Collator coll = Collator.getInstance(russian);

You then can use coll.compare to determine the order of strings. A Collator object takes locality, not Unicode equivalence, into account when comparing. For example, in a French-speaking locale, the characters ç and c are considered equivalent for sorting purposes. A naïve sort that used String.compareTo would put all strings starting with ç after all those starting with c (indeed, it would put them after z), but in a French locale this would be wrong. They should be sorted according to the characters that follow the initial c or ç characters in the strings.
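The contrast between the two orderings can be sketched as follows. CollateDemo and its two methods are names invented for this example, and the ç is written as the escape \u00e7 so the source does not depend on the file's encoding.

```java
import java.text.Collator;
import java.util.Locale;

final class CollateDemo {
    // True if a sorts before b under French collation rules.
    static boolean frenchBefore(String a, String b) {
        Collator coll = Collator.getInstance(Locale.FRENCH);
        return coll.compare(a, b) < 0;
    }

    // True if a sorts before b by raw char values.
    static boolean naiveBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String ca = "\u00e7a"; // c-cedilla followed by a
        System.out.println(frenchBefore(ca, "cb"));    // true: decided by a < b
        System.out.println(naiveBefore(ca, "cb"));     // false
        System.out.println(frenchBefore(ca, "zebra")); // true
        System.out.println(naiveBefore(ca, "zebra"));  // false
    }
}
```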

Determining collation factors for a string can be expensive. A CollationKey object examines a string once, so you can compare precomputed keys instead of comparing strings with a Collator. The method Collator.getCollationKey returns a key for a string. For example, because Collator implements the interface Comparator, you could use a Collator to maintain a sorted set of strings:

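import java.text.Collator;
import java.util.Iterator;
import java.util.TreeSet;

class CollatorSorting {
    private TreeSet<String> sortedStrings;

    CollatorSorting(Collator collator) {
        // Collator implements Comparator, so it can order the set
        sortedStrings = new TreeSet<String>(collator);
    }

    void add(String str) {
        sortedStrings.add(str);
    }

    Iterator<String> strings() {
        return sortedStrings.iterator();
    }
}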

Each time a new string is inserted in sortedStrings, the Collator is used as a Comparator, with its compare method invoked on various elements of the set until the TreeSet finds the proper place to insert the string. This results in several comparisons. You can make this quicker at the cost of space by creating a TreeMap that uses a CollationKey to map to the original string. CollationKey implements the interface Comparable with a compareTo method that can be much more efficient than using Collator.compare.

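import java.text.CollationKey;
import java.text.Collator;
import java.util.Iterator;
import java.util.TreeMap;

class CollationKeySorting {
    private TreeMap<CollationKey, String> sortedStrings;
    private Collator collator;

    CollationKeySorting(Collator collator) {
        this.collator = collator;
        // keys are Comparable, so the map needs no Comparator
        sortedStrings = new TreeMap<CollationKey, String>();
    }

    void add(String str) {
        // compute the key once; later comparisons reuse it
        sortedStrings.put(
            collator.getCollationKey(str), str);
    }

    Iterator<String> strings() {
        return sortedStrings.values().iterator();
    }
}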

Formatting and Parsing

The abstract Format class provides methods to format and parse objects according to a locale. Format declares a format method that takes an object and returns a formatted String, throwing IllegalArgumentException if the object is not of a type known to the formatting object. Format also declares a parseObject method that takes a String and returns an object initialized from the parsed data, throwing ParseException if the string is not understood. Each of these methods is implemented as appropriate for the particular kind of formatting. The package java.text provides three Format subclasses:

  • DateFormat was discussed in the previous section.

  • MessageFormat helps you localize output when printing messages that contain values from your program. Because word order varies among languages, you cannot simply use a localized string concatenated with your program's values. For example, the English phrase “a fantastic menu” would in French have the word order “un menu fantastique.” A message that took adjectives and nouns from lists and displayed them in such a phrase could use a MessageFormat object to localize the order.

  • NumberFormat is an abstract class that defines a general way to format and parse various kinds of numbers for different locales. It has two subclasses: ChoiceFormat to choose among alternatives based on number (such as picking between a singular or plural variant of a word); and DecimalFormat to format and parse decimal numbers. (The formatting capabilities of NumberFormat are more powerful than those provided by java.util.Formatter.)
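A minimal sketch of the first two ideas follows. The class MessageDemo, its methods, and the pattern strings are all invented for illustration; in a real program the patterns would more likely be looked up per locale than written inline.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;

final class MessageDemo {
    // Fills pattern's {0} with the adjective and {1} with the noun.
    static String phrase(String pattern, String adjective, String noun) {
        return MessageFormat.format(pattern, adjective, noun);
    }

    // Picks a wording based on which numeric range count falls in.
    static String files(int count) {
        ChoiceFormat choice =
            new ChoiceFormat("0#no files|1#one file|2#many files");
        return choice.format(count);
    }

    public static void main(String[] args) {
        // The pattern, not the program, decides the word order
        System.out.println(phrase("a {0} {1}", "fantastic", "menu"));
        System.out.println(phrase("un {1} {0}", "fantastique", "menu"));
        System.out.println(files(0) + ", " + files(1) + ", " + files(5));
    }
}
```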

NumberFormat in turn has four different kinds of “get instance” methods. Each method uses either a provided Locale object or the default locale.

  • getNumberInstance returns a general number formatter/parser. This is the kind of object returned by the generic getInstance method.

  • getIntegerInstance returns a number formatter/parser that rounds floating-point values to the nearest integer.

  • getCurrencyInstance returns a formatter/parser for currency values. The Currency object used by a NumberFormat can also be retrieved with the getCurrency method.

  • getPercentInstance returns a formatter/parser for percentages.
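The four kinds of formatter can be compared side by side, as in this sketch for the U.S. locale. NumberKindsDemo and its methods are hypothetical names, and the sample values are arbitrary.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberKindsDemo {
    static String number(double value) {
        return NumberFormat.getNumberInstance(Locale.US).format(value);
    }

    static String integer(double value) {
        return NumberFormat.getIntegerInstance(Locale.US).format(value);
    }

    static String currency(double value) {
        return NumberFormat.getCurrencyInstance(Locale.US).format(value);
    }

    static String percent(double value) {
        return NumberFormat.getPercentInstance(Locale.US).format(value);
    }

    public static void main(String[] args) {
        System.out.println(number(5372.97));   // 5,372.97
        System.out.println(integer(2.5));      // 2 (ties round to even)
        System.out.println(integer(3.5));      // 4
        System.out.println(currency(1234.5));  // $1,234.50
        System.out.println(percent(0.25));     // 25%
        System.out.println(NumberFormat.getCurrencyInstance(Locale.US)
            .getCurrency().getCurrencyCode()); // USD
    }
}
```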

Here is a method you can use to print a number using the format for several different locales:

public void reformat(double num, String[] locales) {
    for (String loc : locales) {
        Locale pl = parseLocale(loc);
        NumberFormat fmt = NumberFormat.getInstance(pl);
        System.out.print(fmt.format(num));
        System.out.println("\t" + pl.getDisplayName());
    }
}

public static Locale parseLocale(String desc) {
    StringTokenizer st = new StringTokenizer(desc, "_");
    String lang = "", ctry = "", var = "";
    try {
        lang = st.nextToken();
        ctry = st.nextToken();
        var = st.nextToken();
    } catch (java.util.NoSuchElementException e) {
        ; // fine, let the others default
    }
    return new Locale(lang, ctry, var);
}

The first argument to reformat is the number to format; the other arguments specify locales. We use a StringTokenizer to break locale argument strings into constituent components. For example, cy_GB will be broken into the language cy (Welsh), the country GB (United Kingdom), and the empty variant "". We create a Locale object from each result, get a number formatter for that locale, and then print the formatted number and the locale. When run with the number 5372.97 and the locale arguments en_US, lv, it_CH, and lt, reformat prints:

5,372.97        English (United States)
5 372,97        Latvian
5'372.97        Italian (Switzerland)
5.372,97        Lithuanian

A similar method can be written that takes a locale and a number formatted in that locale, uses the parse method to get a Number object, and prints the resulting value formatted according to a list of other locales:

public void parseAndReformat(String locale, String number,
                             String[] locales)
    throws ParseException
{
    Locale loc = LocalNumber.parseLocale(locale);
    NumberFormat parser = NumberFormat.getInstance(loc);
    Number num = parser.parse(number);
    for (String str : locales) {
        Locale pl = LocalNumber.parseLocale(str);
        NumberFormat fmt = NumberFormat.getInstance(pl);
        System.out.println(fmt.format(num));
    }
}

When run with the original locale it_CH, the number string "5'372.97" and the locale arguments en_US, lv, and lt, parseAndReformat prints:

5,372.97
5 372,97
5.372,97

Text Boundaries

Parsing requires finding boundaries in text. The class BreakIterator provides a locale-sensitive tool for locating such break points. It has four kinds of “get instance” methods that return specific types of BreakIterator objects:

  • getCharacterInstance returns an iterator that shows valid breaks in a string for individual characters (not necessarily a char).

  • getWordInstance returns an iterator that shows word breaks in a string.

  • getLineInstance returns an iterator that shows where it is proper to break a line in a string, for purposes such as wrapping text.

  • getSentenceInstance returns an iterator that shows where sentence breaks occur in a string.

The following code prints each break shown by a given BreakIterator:

static void showBreaks(BreakIterator breaks, String str) {
    breaks.setText(str);
    int start = breaks.first();
    int end = breaks.next();
    while (end != BreakIterator.DONE) {
        System.out.println(str.substring(start, end));
        start = end;
        end = breaks.next();
    }
    // The final boundary is the end of the text, so the
    // loop has already printed the last piece.
}

A BreakIterator is a different style of iterator from the usual java.util.Iterator objects you have seen. It provides several methods for iterating forward and backward within a string, looking for different break positions.
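Iteration in both directions might be sketched as follows. BreakDemo and its forward and backward methods are illustrative names, and the sample strings are arbitrary. Each method collects the pieces of text that lie between adjacent boundaries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BreakDemo {
    // Pieces of text between boundaries, from the start to the end.
    static List<String> forward(BreakIterator it, String text) {
        List<String> out = new ArrayList<String>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
             start = end, end = it.next())
            out.add(text.substring(start, end));
        return out;
    }

    // The same pieces, visited from the end back to the start.
    static List<String> backward(BreakIterator it, String text) {
        List<String> out = new ArrayList<String>();
        it.setText(text);
        int end = it.last();
        for (int start = it.previous(); start != BreakIterator.DONE;
             end = start, start = it.previous())
            out.add(text.substring(start, end));
        return out;
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        System.out.println(forward(words, "Hi there"));  // [Hi,  , there]
        System.out.println(backward(words, "Hi there")); // [there,  , Hi]
        BreakIterator sentences =
            BreakIterator.getSentenceInstance(Locale.US);
        System.out.println(
            forward(sentences, "Hello there. How are you? Fine."));
    }
}
```

Notice that a word iterator reports the space between words as a piece of its own; it marks boundaries and does not filter out what lies between them.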

You should always use these boundary classes when breaking up text because the issues involved are subtle and widely varying. For example, the logical characters used in these classes are not necessarily equivalent to a single char. Unicode characters can be combined, so it can take more than one 16-bit Unicode value to constitute a logical character. And word breaks are not necessarily spaces—some languages do not even use spaces.

 

Never speak more clearly than you think

 
 --Jeremy Bernstein


[1] For historical reasons Calendar.clone returns Object not Calendar, so a cast is required.

[2] Almost all modern systems assume that one day is 24*60*60 seconds. In UTC, about once every year or two an extra second, called a leap second, is added to a day to account for the wobble of the Earth. Most computer clocks are not accurate enough to reflect this distinction, so neither is the Date class. Some computer standards are defined in GMT, which is the “civil” name for the standard; UT is the scientific name for the same standard. The distinction between UTC and UT is that UTC is based on an atomic clock and UT is based on astronomical observations. For almost all practical purposes, this is an invisibly fine hair to split. See “Further Reading” on page 755 for references.
