6. Globalization

It is rare that an entire chapter can be summarized in a single sentence, but this chapter can almost be summed up in this way:

Always use the appropriate .NET Framework Globalization class, and a large part of the globalization of your application will be taken care of for you.

To recap from Chapter 3, “An Introduction to Internationalization,” globalization is the process of engineering an application so that it does not have cultural preconceptions. The .NET Framework includes many classes in its System.Globalization namespace designed for this purpose. This chapter explores these classes and also covers a few globalization issues that are not covered by these classes.

The CultureInfo Class

The CultureInfo class, introduced in Chapter 3, is the single most important class in .NET’s internationalization story. It encapsulates a language; optionally, a country or region; and, in some cases, a script. In this section, we look beyond the introduction given in Chapter 3.

The CultureInfo class has four constructors:


public CultureInfo(int culture);

public CultureInfo(int culture, bool useUserOverride);

public CultureInfo(string name);

public CultureInfo(string name, bool useUserOverride);

Two of the constructors accept an LCID (locale identifier) and two accept a culture string. The LCID constructors are useful for constructing new CultureInfo objects when the culture string is not immediately available (for example, when using Win32 APIs). In addition, they are useful for constructing CultureInfo objects that cannot be distinguished by their culture strings (e.g., “es-ES” (Traditional Sort) and “es-ES” (Modern Sort)). See the section entitled “Alternate Sort Orders” for more details. An alternative to using the CultureInfo constructors in the .NET Framework 2.0 is to use the CultureInfo.GetCultureInfo method, which has the following overloads:


public static CultureInfo GetCultureInfo(int culture);

public static CultureInfo GetCultureInfo(string name);

public static CultureInfo GetCultureInfo(
    string name, string altName);

Two differences exist between the CultureInfo constructor and the CultureInfo.GetCultureInfo method:

CultureInfo objects returned by the GetCultureInfo method are cached, and, therefore, subsequent hits on the same culture are returned faster.

CultureInfo objects returned by the GetCultureInfo method are read-only.
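Both differences can be observed directly. The following is a minimal sketch rather than a definitive test: the class name GetCultureInfoDemo is purely illustrative, and the reference-equality check reflects the caching behavior described above, not a documented guarantee.

```csharp
using System;
using System.Globalization;

public static class GetCultureInfoDemo
{
    // True when GetCultureInfo hands back a read-only instance.
    public static bool IsCachedInstanceReadOnly(string name)
    {
        return CultureInfo.GetCultureInfo(name).IsReadOnly;
    }

    // True when a freshly constructed instance is read-only.
    public static bool IsConstructedInstanceReadOnly(string name)
    {
        return new CultureInfo(name).IsReadOnly;
    }

    public static void Main()
    {
        CultureInfo first = CultureInfo.GetCultureInfo("en-US");
        CultureInfo second = CultureInfo.GetCultureInfo("en-US");

        Console.WriteLine(IsCachedInstanceReadOnly("en-US"));
        Console.WriteLine(IsConstructedInstanceReadOnly("en-US"));

        // Because of the cache, both calls are expected to yield one object.
        Console.WriteLine(Object.ReferenceEquals(first, second));

        try
        {
            // Attempting to modify a read-only culture's formatting data fails.
            first.NumberFormat.CurrencySymbol = "USD";
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("The cached culture cannot be modified.");
        }
    }
}
```

If you need to change a culture's formatting information, construct the CultureInfo with one of its constructors (or clone the cached instance) instead.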

CultureInfo has a similar method called GetCultureInfoByIetfLanguageTag that performs the same operation but accepts an IETF language tag instead of a culture name.

The culture string used in the CultureInfo constructor and GetCultureInfo method accepts a string in the RFC 1766 format:


languagecode2[-country/regioncode2[-script]]

languagecode2 is an ISO 639-1 or 639-2 code, and country/regioncode2 is an ISO 3166 code. The ISO 639-1 standard specifies the two-letter code used to identify a language. Sometimes there is no two-letter code for a language, so the three-letter ISO 639-2 code is used instead. In the .NET Framework 1.1 and 2.0, this applies to just three cultures:


div (Divehi)
kok (Konkani)
syr (Syriac)

Correspondingly, the CultureInfo.TwoLetterISOLanguageName property returns three letters instead of two letters for these cultures.

Some cultures support more than one writing system (script). Table 6.1 shows the cultures that have a script tag suffix of “Cyrl” (for Cyrillic scripts) or “Latn” (for Latin scripts).

Table 6.1. Cultures with Scripts


Two cultures use nonstandard culture string formats: zh-CHS (Chinese (Simplified)) and zh-CHT (Chinese (Traditional)). Looking at these strings, you would expect them to represent specific cultures (“zh” being a language and “CHS”/”CHT” being a country), but they do not. Instead, they are both neutral cultures and are parents to zh-CN (Chinese (People’s Republic of China)) and zh-TW (Chinese (Taiwan)), respectively. For this reason, you should take care to use the CultureInfo.IsNeutralCulture property to determine whether a culture is neutral instead of making an inference by parsing the culture’s name. This property is similarly important when considering custom cultures because there is no requirement that the name of a custom culture follow a strict <language>-<region> format. Note that there is no CultureInfo.IsSpecificCulture property, so the implication is that if IsNeutralCulture is false, the culture is specific. This is true for all cultures except CultureInfo.InvariantCulture, which is neither neutral nor specific (although it can behave like a specific culture). You can test a culture to see if it is the invariant culture by making a comparison with CultureInfo.InvariantCulture.
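The three-way distinction between neutral, specific, and invariant cultures can be sketched as a small helper. The class and method names (CultureKind, Describe) are assumptions made for illustration; only IsNeutralCulture and the comparison with CultureInfo.InvariantCulture come from the text.

```csharp
using System;
using System.Globalization;

public static class CultureKind
{
    // Classifies a culture without parsing its name.
    public static string Describe(CultureInfo culture)
    {
        // The invariant culture is neither neutral nor specific,
        // so it has to be checked for first.
        if (culture.Equals(CultureInfo.InvariantCulture))
            return "invariant";

        return culture.IsNeutralCulture ? "neutral" : "specific";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new CultureInfo("en")));
        Console.WriteLine(Describe(new CultureInfo("en-US")));
        Console.WriteLine(Describe(CultureInfo.InvariantCulture));
    }
}
```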

Two CultureInfo constructors accept a second Boolean parameter, useUserOverride. This parameter specifies whether the user should be able to override the culture’s number, currency, time, and date default settings (used in the DateTimeFormatInfo and NumberFormatInfo classes) in the Customize Regional Options dialog (in the Regional and Language Options dialog, click on the Customize... button) (see Figure 6.1). (Prior to Windows XP Professional, these same tabs are included in the Regional and Language Options dialog, so there is no need for a Customize button.) When the useUserOverride parameter is true, the user’s settings override the culture’s default settings.

Figure 6.1. The Customize Regional Options Dialog Currency tab


If this parameter isn’t specified, the default is true, so the user’s own settings override the default settings. In an ASP.NET 2.0 application where Culture and/or UICulture is “auto”, the useUserOverride parameter is not specified; if you don’t want to accept the user’s settings, you need to override the InitializeCulture method to change this behavior (see Chapter 5, “ASP.NET Specifics”). In addition, the CultureInfo.GetCultureInfo method does not have a useUserOverride parameter, so this method always returns CultureInfo objects where useUserOverride is false.
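The effect of each way of obtaining a CultureInfo can be inspected through the CultureInfo.UseUserOverride property. This is a minimal sketch; the class name UserOverrideDemo is illustrative, and the value printed for the one-argument constructor is left unstated because it is the default described above and may depend on the platform.

```csharp
using System;
using System.Globalization;

public static class UserOverrideDemo
{
    public static void Main()
    {
        // One-argument constructor: useUserOverride takes its default value.
        CultureInfo defaulted = new CultureInfo("en-US");

        // Explicitly reject the user's regional settings.
        CultureInfo withoutOverrides = new CultureInfo("en-US", false);

        // GetCultureInfo has no useUserOverride parameter.
        CultureInfo cached = CultureInfo.GetCultureInfo("en-US");

        Console.WriteLine("new CultureInfo(name): " + defaulted.UseUserOverride);
        Console.WriteLine("new CultureInfo(name, false): " + withoutOverrides.UseUserOverride);
        Console.WriteLine("GetCultureInfo(name): " + cached.UseUserOverride);
    }
}
```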

The recommended practice is to accept the user’s overrides. There are many reasons why the user’s settings would be considered essential. For example, when a country changes its currency, the currency symbol needs to be updated. This happens frequently: in the past with France changing from francs to euros, more recently with Turkey changing from TL (Türk Lirasi) to YTL (Yeni Türk Lirasi), and possibly in the future with the English pound changing to euros. Whereas the most recent versions of the operating system are updated with such changes, they cannot predict future events, and older operating systems (e.g., Windows 98 and Windows NT, in the case of the French franc) remain out-of-date. (Windows XP SP2 was released before the introduction of the new Turkish Lira on January 1, 2005, so it became out-of-date when the old Turkish Lira was removed from circulation at the end of 2005.) A simple solution to this problem is to ensure that your users run Windows Update on a regular basis, as Windows Update keeps culture information up-to-date. The problem itself, however, is lessened by using the .NET Framework 2.0 as opposed to 1.1. The culture information provided by the .NET Framework 2.0 has been updated with known changes, so whereas the .NET Framework 1.1’s tr-TR (Turkish (Turkey)) culture reports that the currency symbol is “TL”, the .NET Framework 2.0’s same culture reports that the currency symbol is “YTL”.

In addition, typically you should trust the user’s good intentions and accept their overrides. The alternative is to reject the user’s overrides and either create new custom cultures with the updated information (see Chapter 11, “Custom Cultures”) or create a CultureInfoProvider:


public class CultureInfoProvider
{
    public static CultureInfo GetCultureInfo(string name)
    {
        CultureInfo cultureInfo = new CultureInfo(name);
        ApplyKnownUpdates(cultureInfo);
        return cultureInfo;
    }

    public static CultureInfo GetCultureInfo(int LCID)
    {
        CultureInfo cultureInfo = new CultureInfo(LCID);
        ApplyKnownUpdates(cultureInfo);
        return cultureInfo;
    }

    // Overwrites currency symbols that are known to be out-of-date.
    public static void ApplyKnownUpdates(CultureInfo cultureInfo)
    {
        if (cultureInfo.Name == "fr-FR")
            cultureInfo.NumberFormat.CurrencySymbol = "€";
        else if (cultureInfo.Name == "tr-TR")
            cultureInfo.NumberFormat.CurrencySymbol = "YTL";
        else if (cultureInfo.Name == "en-GB")
            cultureInfo.NumberFormat.CurrencySymbol = "€";
    }
}

The problem with this solution is that you have to catch every situation that creates a CultureInfo and furthermore be able to change the code to create the CultureInfo using the new CultureInfoProvider class. FxCop can help with this. See the “CultureInfo not provided by Provider” rule in Chapter 13, “Testing Internationalization Using FxCop.” A more sophisticated version of this CultureInfoProvider (which creates the CultureInfoEx objects shown in the section entitled “Extending the CultureInfo Class,” later in this chapter) is included in the source code for this book.

CultureInfo.GetCultures and CultureTypes Enumeration

The static CultureInfo.GetCultures method gets a list of cultures that match a CultureTypes enumeration (see Table 6.2). So CultureInfo.GetCultures(CultureTypes.NeutralCultures) returns an array of culture-neutral CultureInfo objects:


foreach (CultureInfo cultureInfo in
    CultureInfo.GetCultures(CultureTypes.NeutralCultures))
{
    listBox1.Items.Add(cultureInfo.DisplayName);
}

Table 6.2. CultureTypes Enumeration


The invariant culture is included in the list of neutral cultures even though CultureInfo.InvariantCulture.IsNeutralCulture is false. This inclusion in the list of neutral cultures represents a bug that was present in the .NET Framework 1.0. It persists in later versions of the .NET Framework for backward compatibility.

You can see from the enumeration value in Table 6.2 that CultureTypes can be added together, so CultureInfo.GetCultures(CultureTypes.NeutralCultures | CultureTypes.InstalledWin32Cultures) returns an array of cultures that are either neutral or known to the operating system.

The InstalledWin32Cultures value deserves a special mention. This represents all of the cultures that are known to the current version of the operating system. For each version of the operating system, this is a different number, with Windows XP Service Pack 2 including the highest number of cultures at the time of writing. What is especially useful about this is that when new cultures are subsequently added in later versions of the operating system, they will be included in the list returned by GetCultures and they will be recognized without requiring a new version of the framework. In fact, some cultures either exist or not, depending upon how the operating system has been configured. The Bengali (India), Hindi and Malayalam (India) cultures are added to the list of cultures when complex script support is installed (see the “Supplemental Language Support” section in Chapter 7, “Middle East and East Asian Cultures”). This adaptable behavior is new in the .NET Framework 2.0, and, as the .NET Framework 1.1 doesn’t support this option, it is unaware of new cultures and uses its own hard-coded culture list.

The order in which the cultures are returned is unsorted. As the culture list in .NET Framework 1.1 is fixed, the order is a constant and most similar cultures are loosely grouped together, but they are not alphabetically sorted by any criteria. The culture list in the .NET Framework 2.0 is variable, and cultures are returned in a different order than that of the .NET Framework 1.1; however, the order is still, nonetheless, unsorted.
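Because the returned order cannot be relied upon, a list shown to users is usually sorted first. The following sketch sorts by DisplayName using the current culture's comparison rules; the class and method names are assumptions made for illustration, and sorting by DisplayName is only one reasonable choice.

```csharp
using System;
using System.Globalization;

public static class SortedCultures
{
    // Returns the requested cultures ordered by their display names.
    public static CultureInfo[] GetCulturesSortedByDisplayName(CultureTypes types)
    {
        CultureInfo[] cultures = CultureInfo.GetCultures(types);

        // Culture-sensitive comparison, because the names are shown to users.
        Array.Sort(cultures, delegate(CultureInfo x, CultureInfo y)
        {
            return String.Compare(
                x.DisplayName, y.DisplayName, StringComparison.CurrentCulture);
        });

        return cultures;
    }

    public static void Main()
    {
        foreach (CultureInfo cultureInfo in
            GetCulturesSortedByDisplayName(CultureTypes.SpecificCultures))
        {
            Console.WriteLine(cultureInfo.DisplayName + " (" + cultureInfo.Name + ")");
        }
    }
}
```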

The Relationship Between CultureInfo and Other Globalization Classes

The CultureInfo class frequently represents the focal point of the System.Globalization namespace. It is supported by, and references, many other System.Globalization classes. The relationship between its properties and those classes can be seen in Figure 6.2.

Figure 6.2. The Relationship between CultureInfo and Other System.Globalization Classes


(The properties listed in the diagram are only those properties that relate to other classes; this is not a complete list.) We explore these classes throughout this chapter.

The RegionInfo Class

As you know, the CultureInfo class can relate to a language by itself or a language in a country/region. However, it cannot relate to just a country/region alone. This is the purpose of the RegionInfo class. The RegionInfo class describes a country/region regardless of its language. This can be especially useful in Country combo boxes that allow you to select your country from a list of all countries, although it is worth pointing out that a region does not always have a one-to-one mapping with a country. Hong Kong, for example, is a region but not a country. RegionInfo supports two constructors: One accepts an LCID (locale ID), and the other accepts a region code or culture code. It should be noted, however, that an LCID refers to a specific language in a specific country; thus, there is often more than one LCID that refers to the same country/region. The CultureInfo class does not have a Region property that identifies the culture’s region, but one can easily be created by either of the following lines:


RegionInfo regionInfo =
    new RegionInfo(Thread.CurrentThread.CurrentCulture.Name);

RegionInfo regionInfo =
    new RegionInfo(Thread.CurrentThread.CurrentCulture.LCID);

Of the two choices, the constructor that accepts a name is the safer of the two. RegionInfo objects cannot be constructed from LCIDs of custom cultures (see Chapter 11) because supplementary custom cultures all have the same LCID.

The RegionInfo class does not have a GetRegions method corresponding to the CultureInfo’s GetCultures method, but the following GetRegions method provides the same results. (If you are using the .NET Framework 1.1, replace cultureInfo.Name with cultureInfo.LCID.)


public static RegionInfo[] GetRegions()
{
    Hashtable regionInfos = new Hashtable();
    foreach (CultureInfo cultureInfo in
        CultureInfo.GetCultures(CultureTypes.SpecificCultures))
    {
        RegionInfo regionInfo = new RegionInfo(cultureInfo.Name);
        if (regionInfos[regionInfo.ThreeLetterISORegionName] == null)
            regionInfos.Add(
                regionInfo.ThreeLetterISORegionName, regionInfo);
    }
    RegionInfo[] regionInfoArray = new RegionInfo[regionInfos.Count];
    regionInfos.Values.CopyTo(regionInfoArray, 0);
    return regionInfoArray;
}

The GetRegions method uses the CultureInfo.GetCultures method to get a list of specific cultures (because neutral cultures do not have a country/region) and creates a new RegionInfo from the culture’s name (or the culture’s LCID in the .NET Framework 1.1). As there can be many cultures that refer to the same country/region, we search the Hashtable to see if the country/region is already in the list by looking for a country/region with the same ThreeLetterISORegionName.

The RegionInfo properties are listed in Table 6.3. Notice that the static CurrentRegion property refers to the value retrieved by the Win32 GetUserDefaultLCID API, which is the value set by the user in the Regional and Language Options dialog. This property is not affected by changes to the CurrentCulture or CurrentUICulture. Also note that the DisplayName property gets its resources from the .NET Framework Language Pack, whereas for Windows-only cultures, the NativeName property gets its resources from the operating system. If you do not have an appropriate .NET Framework Language Pack installed, you will get a mismatch if the language is not English. Finally, notice the inadequacies of the Boolean type in the IsMetric property. This property indicates whether the country/region uses the metric system. There are no shades of gray in the Boolean type, so the United States and the United Kingdom report False and True, respectively, where neither is entirely correct nor entirely incorrect.
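A few of these properties can be read as follows. This is a minimal sketch: the class name RegionInfoDemo and the Describe helper are illustrative, and the IsMetric values printed are whatever the platform's region data reports.

```csharp
using System;
using System.Globalization;

public static class RegionInfoDemo
{
    // Summarizes a region's ISO codes and its IsMetric flag.
    public static string Describe(string regionName)
    {
        RegionInfo region = new RegionInfo(regionName);
        return region.TwoLetterISORegionName
            + " / " + region.ThreeLetterISORegionName
            + ", metric: " + region.IsMetric;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("US"));
        Console.WriteLine(Describe("GB"));

        // CurrentRegion follows the user's regional settings,
        // not the thread's CurrentCulture or CurrentUICulture.
        Console.WriteLine(RegionInfo.CurrentRegion.Name);
    }
}
```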

Table 6.3. RegionInfo Properties


Geographical Information

One of the new RegionInfo properties in the .NET Framework 2.0 is GeoId. The geographical ID is an integer that identifies a geographical region. As it uniquely identifies a geographical region, it can be used as a primary key in a database (Microsoft uses GeoIds to support products such as MapPoint). The numbers themselves are defined by Microsoft and can be referenced in the Table Of Geographical Locations (http://msdn.microsoft.com/library/default.asp?url=/library/enus/intl/nls_locations.asp). Apart from the GeoId’s value as a unique identifier, it can be used to retrieve information about a geographical region (as opposed to a locale). Unfortunately, the .NET Framework does not have a GeoInfo class to store or retrieve this information, but we can write one. Geographical information is retrieved using the GetGeoInfo Win32 function:


[DllImport("kernel32")]
protected static extern int GetGeoInfo(
    int GeoId,
    SYSGEOTYPE GeoType,
    StringBuilder lpGeoData,
    int cchData,
    int language
);

The GeoId comes straight from the RegionInfo.GeoId (if you are using the .NET Framework 1.1, you could use the information in the “Table Of Geographical Locations” to build a static lookup of GeoIds from their names). The GeoType identifies the type of information you are getting and is a SYSGEOTYPE. lpGeoData is a buffer into which the returned information is placed. cchData is the size of the buffer. language is the language identifier that you want the information to be returned in (1033 is English (United States)). SYSGEOTYPE is an enumeration:


public enum SYSGEOTYPE
{

    GEO_NATION = 0x0001,
    GEO_LATITUDE = 0x0002,
    GEO_LONGITUDE = 0x0003,
    GEO_ISO2 = 0x0004,
    GEO_ISO3 = 0x0005,
    GEO_RFC1766 = 0x0006,
    GEO_LCID = 0x0007,
    GEO_FRIENDLYNAME = 0x0008,
    GEO_OFFICIALNAME = 0x0009,
    GEO_TIMEZONES = 0x000A,
    GEO_OFFICIALLANGUAGES = 0x000B,
};

To retrieve information using GetGeoInfo, you should call it once to get the buffer size and then a second time to retrieve the information itself. The following method is a convenient wrapper around the GetGeoInfo method:


protected virtual string GetGeoInfoString(SYSGEOTYPE sysGeoType)
{

    // find out the length of the geo information
    int length = GetGeoInfo(geoId, sysGeoType, null, 0, language);
    if (length == 0)
        return null;
    else
    {
        StringBuilder lpGeoData = new StringBuilder(length);
        // get the geo information
        int result = GetGeoInfo(
            geoId, sysGeoType, lpGeoData, length, language);
        if (result == 0)
            return null;
        else
            return lpGeoData.ToString();
    }
}

The full GeoInfo class is part of the source code for this book, but here is a cut-down version showing just the OfficialName property:


class GeoInfo
{
    private int geoId;
    private int language = 1033;

    public GeoInfo(int geoId)
    {
        this.geoId = geoId;
    }
    public GeoInfo(int geoId, int language)
    {
        this.geoId = geoId;
        this.language = language;
    }
    public string OfficialName
    {
        get
        {
            string geoInfo =
                GetGeoInfoString(SYSGEOTYPE.GEO_OFFICIALNAME);
            if (geoInfo != null)
                return geoInfo;
            else
                throw new GeoInfoException(
                    "Failed to retrieve Geo Information",
                    SYSGEOTYPE.GEO_OFFICIALNAME, geoInfo);
        }
    }
}

You use the GeoInfo class like this:


RegionInfo regionInfo = new RegionInfo("en-US");
GeoInfo geoInfo = new GeoInfo(regionInfo.GeoId);
MessageBox.Show("Official Name: " + geoInfo.OfficialName);

String Comparisons

Comparing the equality of two strings is typically considered a simple matter. Programmers vary in their preference among five choices:

• The equality operator (==)

• String.Equals

• String.CompareTo

• String.Compare

• String.CompareOrdinal (and String.Compare used with StringComparison.Ordinal)

The first two choices are essentially the same choice. If the compiler detects that both sides of the equality operator are strings, the operation equates to String.Equals. String.Equals performs a case-sensitive, culture-insensitive comparison:


string s1 = "Bob";
string s2 = "BOB";
if (String.Equals(s1, s2))
    textBox1.Text += "Equal" + System.Environment.NewLine;
else
    textBox1.Text += "Not Equal" + System.Environment.NewLine;

string s3 = "Bob";
string s4 = "Bob";
if (String.Equals(s3, s4))
    textBox1.Text += "Equal" + System.Environment.NewLine;
else
    textBox1.Text += "Not Equal" + System.Environment.NewLine;

In this example, “Bob” does not equal “BOB” because the two strings differ in case. “Bob” does equal “Bob” because their values are the same. This shows that String.Equals tests the values of strings, not their references, so s3 and s4 would compare as equal even if they referred to two separate string objects.

The third, fourth, and fifth choices are also the same if a test for equality is your only desire. These three methods return an integer, where 0 means that the two strings are equal. Although the return results will vary according to the given culture if the strings are not equal, the results will not vary, regardless of culture, if the two strings are exactly equal. Consequently, a comparison with 0 will always be accurate, regardless of cultural considerations. The difference between these methods and the meaning of nonzero return results are covered in the section entitled “Sort Orders.”
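The five choices can be placed side by side as equality tests. This is a minimal sketch; the class name EqualityDemo and the TestAll helper are illustrative only.

```csharp
using System;

public static class EqualityDemo
{
    // Applies each of the five equality tests to the same pair of strings.
    public static bool[] TestAll(string a, string b)
    {
        return new bool[]
        {
            a == b,                           // equality operator
            String.Equals(a, b),              // String.Equals
            a.CompareTo(b) == 0,              // String.CompareTo
            String.Compare(a, b) == 0,        // String.Compare
            String.CompareOrdinal(a, b) == 0  // String.CompareOrdinal
        };
    }

    public static void Main()
    {
        foreach (bool result in TestAll("Bob", "Bob"))
            Console.WriteLine(result);   // every test reports equality

        foreach (bool result in TestAll("Bob", "BOB"))
            Console.WriteLine(result);   // every test reports inequality
    }
}
```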

Casing

Latin script languages have a concept of upper and lower case, which all developers are familiar with. However, not all languages have this concept or implement it in the same way. Some languages (e.g., Japanese) are case-less. Others exist only as upper case (e.g., Khutsuri) or lower case (e.g., Nuskhuri). For these languages, no case conversions should occur. If you intend to support Azeri or Turkish, you should be aware of the special case of the letter I. Most Latin script languages have two I characters: a Capital Letter I (without a dot on top) and a Small Letter i (with a dot on top). Azeri and Turkish have two more I characters: a Capital Letter İ (with a dot on top) and a Small Letter ı (without a dot on top). You need to be aware of this because the rules for conversion between upper and lower case among these four letters are different for Azeri and Turkish cultures than they are for other cultures. (These special rules are known as Turkic Casing Rules.) Tables 6.4 and 6.5 show the effects of converting each of the four letters between upper and lower case using the “en” and “tr” cultures, respectively.

Table 6.4. Upper- and Lowercasing I Using English Culture


Table 6.5. Upper- and Lowercasing I Using Turkish Culture


From these two tables, it can be seen that the Turkish lowercase equivalent of Latin Capital I is not the same as the English lowercase equivalent, and the Turkish uppercase equivalent of Latin Small I is not the same as the English uppercase equivalent. The problem is illustrated in the following code fragment:


CultureInfo cultureInfo = new CultureInfo("en");
string test = "Delphi is in italics";
string testUpper = "DELPHI IS IN ITALICS";
if (test.ToUpper(cultureInfo).CompareTo(testUpper) == 0)
    Text = "Equal";
else
    Text = "Not equal";

The two strings are equal if the culture is “en”, but they are not equal if the culture is “tr”. How you handle this difference in code depends upon the nature of the strings being compared. If you were comparing a company name typed by a user, a case-insensitive comparison using String.Compare and passing the CurrentCulture would be the safest comparison. If, however, you were comparing a string that could be considered a programmatic element against a known string (say, an XML tag name), you should use the invariant culture to perform the comparison, to ensure that culture-specific casing rules do not change the success of the comparison.
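The two kinds of comparison can be sketched as follows. The class and method names (TurkishIDemo, EqualsIgnoreCase, IdentifierEquals) are illustrative, and the sketch assumes that the platform supplies Turkish culture data. The ordinal, case-insensitive variant is offered as a common alternative for programmatic strings rather than as the text's own recommendation.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Case-insensitive comparison under the rules of a given culture.
    public static bool EqualsIgnoreCase(string a, string b, CultureInfo culture)
    {
        return String.Compare(a, b, true, culture) == 0;
    }

    // Culture-independent comparison for tag names, keys, and similar identifiers.
    public static bool IdentifierEquals(string a, string b)
    {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        CultureInfo turkish = new CultureInfo("tr-TR");

        // Under Turkic casing rules, i uppercases to the dotted capital.
        Console.WriteLine("italics".ToUpper(turkish));
        Console.WriteLine("italics".ToUpper(CultureInfo.InvariantCulture));

        // User-entered text: compare with the user's culture.
        Console.WriteLine(EqualsIgnoreCase("title", "TITLE", turkish));

        // Programmatic text: compare with the invariant culture or ordinally.
        Console.WriteLine(EqualsIgnoreCase("title", "TITLE", CultureInfo.InvariantCulture));
        Console.WriteLine(IdentifierEquals("title", "TITLE"));
    }
}
```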

Sort Orders

Sorting (also called collation) has numerous differences from culture to culture. It is one of those many areas that the .NET Framework handles correctly with very little intervention on behalf of the developer, other than to specify what culture should be used for sorting. Table 6.6 provides a number of examples of characters and character combinations that sort differently in one language to another.

Table 6.6. Examples of Characters with Different Sort Behaviors in Different Languages


Fortunately, developers do not need to remember or even know these differences, only that there are differences. Take, for example, the Array.Sort method. This method accepts an IComparer interface to sort elements of an array. If the IComparer is null, Array.Sort uses each element’s IComparable interface to determine the order of a sort. IComparable has a single method, CompareTo. The String class supports the IComparable interface and includes the CompareTo method. The String.CompareTo method uses the CultureInfo.CurrentCulture.CompareInfo.Compare method to perform a culture-sensitive comparison between two strings. In the following code snippet, the en-US culture is used to sort two strings:


string[] strings = new string[] {"eé", "ée"};
Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
Array.Sort(strings);
foreach(string s in strings)
{
    listBox1.Items.Add(s);
}

The output is “eé” and then “ée”. If we change the culture to French in France (fr-FR), in which diacritics (e.g., the acute accent above the e) are evaluated from right to left instead of left to right, the order is reversed. The point is that the sort order will respect the local culture’s sort behavior without you having to know what that behavior is.
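When the sort culture should not depend on the current thread, a comparer for a specific culture can be passed to Array.Sort instead. This is a minimal sketch using StringComparer.Create; the class and method names are illustrative, and the order produced for any given culture depends on the collation data of the platform and framework version in use.

```csharp
using System;
using System.Globalization;

public static class CultureSortDemo
{
    // Returns a copy of the values sorted by the named culture's rules.
    public static string[] SortWith(string cultureName, string[] values)
    {
        string[] copy = (string[])values.Clone();

        // false = case-sensitive comparison
        Array.Sort(copy, StringComparer.Create(new CultureInfo(cultureName), false));
        return copy;
    }

    public static void Main()
    {
        string[] strings = new string[] { "ée", "eé" };

        foreach (string cultureName in new string[] { "en-US", "fr-FR" })
        {
            Console.WriteLine(
                cultureName + ": " + String.Join(", ", SortWith(cultureName, strings)));
        }
    }
}
```

The thread's CurrentCulture is left untouched, which keeps the sort independent of formatting and other culture-sensitive operations running on the same thread.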

This returns us to the String.CompareTo, String.Compare and String.CompareOrdinal methods that we saw earlier in the “String Comparisons” section. The single difference between String.CompareTo and String.Compare is that the former is an instance method and the latter is a static method. As we have seen, the String.CompareTo method exists so that the String class can support the IComparable interface. In nearly all cases, String.CompareTo and String.Compare call CompareInfo.Compare to perform string comparisons. The various overloads either accept an explicit culture or, like most globalization methods, default the culture to CultureInfo.CurrentCulture. The return result is an integer indicating the relative order of the two strings:

Negative if the first string is sorted before the second

0 if the first string is equal to the second

Positive if the first string is sorted after the second

In most cases, the negative value will be -1 and the positive value will be +1. The exception is the String.Compare overload in the .NET Framework 2.0, which accepts a StringComparison enumeration where the value is Ordinal or OrdinalIgnoreCase. In this scenario, the “magnitude” of the difference is expressed in the same way as for String.CompareOrdinal.

String.CompareOrdinal is similar to String.Compare, but it performs the comparison based upon the Unicode code points of each character in the string and returns the “magnitude” of the difference. For example, String.CompareOrdinal("a", "á") returns -128. The Unicode code point of the letter “a” is U+0061 (97), and the Unicode code point of the letter “á” is U+00E1 (225). The result is 97 – 225 (i.e., –128). There are several benefits to using String.CompareOrdinal: It is culture-insensitive (because it uses Unicode code points) and it is faster than other comparison methods. It should also be noted that String.CompareOrdinal compares all characters in a string, whereas other comparison methods are dependent upon characters being defined in the .NET Framework’s sorting tables. This means that if a comparison is performed using String.Compare and is passed the invariant culture, characters that are not in the .NET Framework’s sorting tables will simply be ignored.
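The arithmetic behind that result can be sketched as follows. The class name OrdinalDemo is illustrative. Only the sign of the comparison result is treated as reliable here; the magnitude is printed for comparison with the figure quoted above.

```csharp
using System;

public static class OrdinalDemo
{
    public static void Main()
    {
        string plain = "a";            // U+0061 (97)
        string accented = "\u00E1";    // U+00E1 (225), a with acute accent

        // Code point arithmetic: 97 - 225
        Console.WriteLine(plain[0] - accented[0]);

        // Ordinal comparisons: negative because "a" has the lower code point.
        Console.WriteLine(String.CompareOrdinal(plain, accented));
        Console.WriteLine(
            String.Compare(plain, accented, StringComparison.Ordinal));
    }
}
```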

For these reasons, Microsoft recommends using an ordinal comparison for culture-insensitive comparisons. Additionally, it should be noted that although the actual sort itself is unlikely to yield results that are culturally significant for any particular culture, it can still be useful for maintaining ordered lists that require fast searching.
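An ordered list of this kind can be sketched with StringComparer.Ordinal and a binary search. The class and method names (OrdinalLookup, BuildIndex, Contains) are illustrative. The same comparer must be used for both the sort and the search, or the search results are unreliable.

```csharp
using System;

public static class OrdinalLookup
{
    // Returns a copy of the keys in ordinal (code point) order.
    public static string[] BuildIndex(string[] keys)
    {
        string[] copy = (string[])keys.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    // Binary search over an array previously sorted with BuildIndex.
    public static bool Contains(string[] sortedKeys, string key)
    {
        return Array.BinarySearch(sortedKeys, key, StringComparer.Ordinal) >= 0;
    }

    public static void Main()
    {
        string[] index = BuildIndex(new string[] { "gamma", "alpha", "beta" });

        Console.WriteLine(Contains(index, "beta"));
        Console.WriteLine(Contains(index, "delta"));
    }
}
```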

Alternate Sort Orders

The discussions on sort orders so far have assumed that each culture has a single method of sorting. However, a few cultures have more than one way of sorting the same data. All existing .NET Framework cultures have a default sort order, and a few have a single alternate sort order in addition to their default. Spanish, for example, has two sort orders: Modern/International (the default sort order that is typically used in Spain and the U.S.) and the Traditional alternate sort order (used less frequently in some situations in Spain). Each CultureInfo object has a single CompareInfo class that it uses for sorting. Using a different sort order requires creating a different culture. To use the Traditional sort order for Spanish, you must create a new CultureInfo object or create a CompareInfo object using CompareInfo.GetCompareInfo. The CultureInfo object for the alternate sort order is identical in every way to the CultureInfo object for the default sort order, with the exception of its CompareInfo object. This means that the culture’s name is also the same. This presents a problem, then, in creating the CultureInfo object:


CultureInfo cultureInfo = new CultureInfo("es-ES");

This code is ambiguous to the reader because there are two cultures that have the name “es-ES”. When you use a string in the format <language>-<region> to identify a culture, you get the culture with the default sort order. Both the .NET Framework 1.1 and 2.0 enable you to create a culture for an alternate sort order using a locale ID (LCID):


CultureInfo cultureInfo = new CultureInfo(0x0000040A);

In addition, the .NET Framework 2.0 supports the creation of cultures for an alternate sort order using a language and region suffixed with the alternate sort order:


CultureInfo cultureInfo = new CultureInfo("es-ES_tradnl");

The same name can be used with the new CultureInfo.GetCultureInfo method to get a cached read-only CultureInfo. The following two examples of the CultureInfo.GetCultureInfo result in the same CultureInfo object:


CultureInfo cultureInfo1 =
    CultureInfo.GetCultureInfo("es-ES", "es-ES_tradnl");

CultureInfo cultureInfo2 =
    CultureInfo.GetCultureInfo("es-ES_tradnl");

This capability to specify a culture including an alternate sort order by name is an important enhancement to the .NET Framework because it means that all cultures can now be represented as strings. In contrast, to be able to represent all cultures (including cultures with alternate sort orders), code written for the .NET Framework 1.1 must support representing cultures using both strings (e.g., “es-ES”) and also integers (e.g. 0x0000040A).

Table 6.7 is a list of all of the alternate sort orders recognized by the .NET Framework 1.1 and 2.0.

Table 6.7. Cultures with Alternate Sort Orders


The CompareInfo class has a Name property in the .NET Framework 2.0 (but not 1.1), which is the same as the CultureInfo.Name (e.g. “es-ES”) for all default sort orders. This name can be used to specify cultures for alternate sort orders in the .NET Framework 2.0.

It is worth noting that, regardless of the data type (i.e., string or integer) used to create a CultureInfo object, the resulting CultureInfo.Name is the same as the name of the default sort order. The following example outputs “es-ES”, “es-ES”, and “es-ES”:


CultureInfo cultureInfo1 = new CultureInfo("es-ES");
CultureInfo cultureInfo2 = new CultureInfo(0x0000040A);
CultureInfo cultureInfo3 = new CultureInfo("es-ES_tradnl");
listBox1.Items.Add(cultureInfo1.Name);
listBox1.Items.Add(cultureInfo2.Name);
listBox1.Items.Add(cultureInfo3.Name);

To distinguish between the different cultures, you should use either the LCID (in the .NET Framework 1.1 and 2.0) or, preferably, the CompareInfo.Name (in the .NET Framework 2.0).

Unfortunately, the .NET Framework does not support any facility for programmatically discovering alternate sort orders. However, the Win32 EnumSystemLocales function accepts a parameter of LCID_ALTERNATE_SORTS, which does provide this functionality and enables you to offer a choice of sort orders to a user. The following class is a wrapper around this function, and the GetAlternateSortOrders method returns an array of LCIDs of alternate sort orders:


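using System;
using System.Collections.Generic;
using System.Globalization;
using System.Runtime.InteropServices;

public class AlternateSortOrders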
{
    public static int[] GetAlternateSortOrders()
    {
        const uint LCID_ALTERNATE_SORTS = 4;

        alternateSortOrders = new List<int>();

        EnumSystemLocales(new LocaleEnumProc(AlternateSortsCallback),
            LCID_ALTERNATE_SORTS);

        int[] alternateSortOrdersArray =
            new int[alternateSortOrders.Count];

        alternateSortOrders.CopyTo(alternateSortOrdersArray);

        return alternateSortOrdersArray;
    }

    protected delegate bool LocaleEnumProc(string lcidString);

    [DllImport("kernel32.dll")]
    protected static extern bool EnumSystemLocales(
        LocaleEnumProc lpLocaleEnumProc, uint dwFlags);

    protected static List<int> alternateSortOrders;

    protected static bool AlternateSortsCallback(string lcidString)
    {
        int LCID;
        if (Int32.TryParse(lcidString,
            NumberStyles.AllowHexSpecifier, null, out LCID))
            alternateSortOrders.Add(LCID);

        return true;
    }
}

The GetAlternateSortOrders method calls EnumSystemLocales and passes a method (AlternateSortsCallback) to call back for each alternate sort order and a flag (LCID_ALTERNATE_SORTS) specifying that only the alternate sorts should be enumerated. The AlternateSortsCallback method simply converts the LCID string to an integer and adds it to an internal list. When the EnumSystemLocales function has completed enumerating locales, the GetAlternateSortOrders method converts the list of integers to an array of integers and returns the array.

As an alternative, the user can specify the preferred sort order in the Regional and Language Options dialog by clicking on the Customize button and selecting the Sort tab (see Figure 6.3).

Figure 6.3. Using Customize Regional Options to Specify a Sort Order

image

Although this applies to all .NET Framework applications, it is unlikely to be useful in ASP.NET applications because the culture setting is more likely to arrive from the user’s language preferences on their own machine. Figure 6.4 shows that the Spanish (International Sort) and Spanish (Traditional Sort) language preferences appear to be distinct.

Figure 6.4. Internet Explorer Language Preferences Dialog Showing Different Sort Orders

image

Unfortunately, this is just smoke and mirrors, as you can see from the “[es]” language code next to the description. If you close this dialog and reopen it (see Figure 6.5), you will see that even Internet Explorer cannot tell the difference between “es” and “es”. There are at least two workarounds, and both involve defining a User-Defined language in Internet Explorer. The simplest is to specify a culture name that includes the sort order (e.g., “es-ES_tradnl”). The second, more complex, workaround is to specify a culture using the LCID as the name (see Figure 6.6).

Figure 6.5. Internet Explorer Language Preferences Dialog Showing Same Sort Orders

image

Figure 6.6. Internet Explorer Language Preferences Dialog with User-Defined LCID

image

Unfortunately, the Culture="auto" and UICulture="auto" attributes used in ASP.NET 2.0 localized forms do not recognize LCIDs as valid culture identifiers, so you have to read the Request.UserLanguages[0] value and set the CurrentCulture and CurrentUICulture in code. In ASP.NET 2.0, you can override the Page.InitializeCulture method to initialize the culture from the LCID:


protected override void InitializeCulture()
{
    if (Request.UserLanguages != null &&
        Request.UserLanguages.Length > 0)
    {
        string userLanguage = Request.UserLanguages[0];
        if ((userLanguage.StartsWith("0x") ||
            userLanguage.StartsWith("0X"))&&
            userLanguage.Length > 2)
        {
            // Int32.TryParse requires that hex numbers do not
            // start with "0x" or "0X"
            string LCIDString = userLanguage.Substring(2);

            int LCID;
            if (Int32.TryParse(LCIDString,
                NumberStyles.AllowHexSpecifier, null, out LCID))
            {
                try
                {
                    Thread.CurrentThread.CurrentCulture =
                        new CultureInfo(LCID);

                    Thread.CurrentThread.CurrentUICulture =
                        Thread.CurrentThread.CurrentCulture;
                }
                catch (ArgumentException)
                {
                    // the LCID was not a valid LCID
                }
            }
        }
        else
        {
            try
            {
                int LCID = Convert.ToInt32(userLanguage);

                Thread.CurrentThread.CurrentCulture =
                    new CultureInfo(LCID);

                Thread.CurrentThread.CurrentUICulture =
                    Thread.CurrentThread.CurrentCulture;
            }
            catch (ArgumentException)
            {
                // the LCID was not a valid LCID
            }
            catch (FormatException)
            {
                // the LCID was not an integer
            }
            catch (OverflowException)
            {
                // the LCID was too large to be an integer
            }
        }
    }
}

This method accepts LCIDs either as hex values (prefixed with “0x”) or as integers. Notice that the method deliberately ignores exceptions that result from an invalid user language, in keeping with ASP.NET’s default behavior.

Persisting Culture Identifiers

There will be occasions when it will be necessary to persist a culture identifier. It may be to store a culture in a config file for a user preference, or in a database to maintain a list of selected cultures, or in an XML document for consumption by another process. The method of persistence of the culture identifier requires a moment’s thought. We saw in the previous section that simply using a culture’s name is insufficient to distinguish between a culture with a default sort order and a culture with an alternate sort order (because both cultures have the same name). The following method is suitable for persisting culture identifiers in the .NET Framework 2.0:


public static string GetPersistentCultureName(
    CultureInfo cultureInfo)
{
    if ((CultureTypes.UserCustomCulture & cultureInfo.CultureTypes)
        != (CultureTypes)0)
        return cultureInfo.Name;
    else
        return cultureInfo.CompareInfo.Name;
}

The if statement determines whether the culture is a custom culture (see Chapter 11). If the culture is a custom culture, the culture’s name uniquely identifies the culture in all cases. If the culture is not a custom culture, the culture’s CompareInfo name uniquely identifies the culture. The CompareInfo name is used instead of the culture’s name because this value respects the culture’s sort order. We cannot use the CompareInfo name for custom cultures because custom cultures use CompareInfo objects from existing cultures, and such names do not uniquely identify the custom culture.

The following method is suitable for persisting culture identifiers in the .NET Framework 1.1:


public static string GetPersistentCultureName(
    CultureInfo cultureInfo)
{
    if (cultureInfo.LCID == 0x0000040A ||
        cultureInfo.LCID == 0x00030404 ||
        cultureInfo.LCID == 0x00020804 ||
        cultureInfo.LCID == 0x00020c04 ||
        cultureInfo.LCID == 0x00021004 ||
        cultureInfo.LCID == 0x00021404 ||
        cultureInfo.LCID == 0x00010411 ||
        cultureInfo.LCID == 0x00010412 ||
        cultureInfo.LCID == 0x00010407 ||
        cultureInfo.LCID == 0x0001040e ||
        cultureInfo.LCID == 0x00010437)
        return cultureInfo.LCID.ToString();
    else
        return cultureInfo.Name;
}

The .NET Framework 1.1 does not support custom cultures, so there is no need to write code for them. However, the .NET Framework 1.1’s CompareInfo class doesn’t have a Name property, and its CultureInfo constructors do not accept CompareInfo names to create cultures with alternate sort orders. The result is that cultures with alternate sort orders must be persisted using their locale IDs instead of their names. Any code that subsequently constructs a CultureInfo object from the resulting string must first check whether the string contains a name or a number. If it contains a number, the string must first be converted to an integer.
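The read-back step described above can be sketched as follows. This is only a minimal sketch: the PersistentCulture class and its FromPersistentName method are hypothetical names, and the use of Int32.TryParse assumes the .NET Framework 2.0 or later (under the .NET Framework 1.1, you would wrap Int32.Parse in a try/catch block instead).

```csharp
using System;
using System.Globalization;

public static class PersistentCulture
{
    // Hypothetical helper: rebuilds a CultureInfo from a string that holds
    // either a culture name (e.g., "en-US") or a decimal LCID (e.g., "1034")
    public static CultureInfo FromPersistentName(string persisted)
    {
        int lcid;
        if (Int32.TryParse(persisted, NumberStyles.None,
            CultureInfo.InvariantCulture, out lcid))
        {
            // the string contains only digits, so treat it as a locale ID
            return new CultureInfo(lcid);
        }

        // otherwise, treat the string as a culture name
        return new CultureInfo(persisted);
    }
}
```

Because a culture name always contains at least one letter, a string made up entirely of digits can be treated as a locale ID without ambiguity.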

Calendars

The calendar system known and used in most English-speaking countries is the Gregorian calendar. Introduced in 1582, it replaced the previous Julian calendar system, which had become increasingly inaccurate. As with other globalization issues, you can support any number of the alternative calendar systems in use throughout the world without being aware of their differences by using the calendar classes provided. You can create new calendar objects directly from their class constructors (e.g., new GregorianCalendar()), but you will also encounter calendars through the CultureInfo.Calendar, CultureInfo.OptionalCalendars, and DateTimeFormatInfo.Calendar properties.

For the “en-US” culture, CultureInfo.Calendar is a GregorianCalendar object. For the “ar-SA” (Arabic (Saudi Arabia)) culture, CultureInfo.Calendar is a HijriCalendar object. In fact, it is only the Arabic, Divehi, and Thai cultures for which CultureInfo.Calendar is not a GregorianCalendar. The CultureInfo.Calendar property represents merely the “default” calendar used by the culture. Cultures can support any number of calendars (although the current maximum is seven) through the OptionalCalendars property. This array’s first element contains the default calendar object, so CultureInfo.Calendar is equal to CultureInfo.OptionalCalendars[0]. The list of OptionalCalendars for the Arabic (Saudi Arabia) culture is shown in Table 6.8.

Table 6.8. Arabic Culture Optional Calendars

image

As you can see, the last five calendars are all GregorianCalendars. GregorianCalendars have a CalendarType property (see Table 6.9) that determines the language used in date/time strings, but it does not affect the values of GregorianCalendar properties.

Table 6.9. GregorianCalendarTypes Enumeration

image
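To see a culture’s optional calendars for yourself, you can walk the OptionalCalendars array. The following is a minimal sketch; the OptionalCalendarLister class and its Describe method are hypothetical names, and the calendars reported depend on the version of the framework and the operating system, so no particular output is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class OptionalCalendarLister
{
    // Hypothetical helper: returns one description per optional calendar,
    // adding the CalendarType for GregorianCalendar objects
    public static List<string> Describe(string cultureName)
    {
        CultureInfo cultureInfo = new CultureInfo(cultureName);
        List<string> descriptions = new List<string>();

        foreach (Calendar calendar in cultureInfo.OptionalCalendars)
        {
            string description = calendar.GetType().Name;

            GregorianCalendar gregorian = calendar as GregorianCalendar;
            if (gregorian != null)
                description += " (" + gregorian.CalendarType + ")";

            descriptions.Add(description);
        }

        return descriptions;
    }
}
```

Calling OptionalCalendarLister.Describe("ar-SA") and adding each string to a list box gives a list that you can compare with Table 6.8.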

For cultures that support more than one calendar, you can change the culture’s calendar to one of the OptionalCalendars by assigning a new calendar to the culture’s DateTimeFormat.Calendar:


CultureInfo cultureInfo = new CultureInfo("ar-SA");
// change the calendar to the second optional calendar
cultureInfo.DateTimeFormat.Calendar =
    cultureInfo.OptionalCalendars[1];

// change the calendar to the Gregorian(MiddleEastFrench) calendar
cultureInfo.DateTimeFormat.Calendar =
    new GregorianCalendar(GregorianCalendarTypes.MiddleEastFrench);

// Throws an ArgumentOutOfRangeException
cultureInfo.DateTimeFormat.Calendar = new JapaneseCalendar();

The complete set of calendar classes is shown in Figure 6.7. The classes typically differ in the following ways:

• The value of their properties (see Table 6.10)

Table 6.10. Calendar Property Values

image

• The name of their calendar-specific static era field (e.g., ADEra for GregorianCalendar, HebrewEra for HebrewCalendar)

• The logic used in their methods that are dependent upon calendar-specific calculations (e.g., AddMonths, AddYears, GetDayOfYear, GetDaysInMonth, GetLeapMonth, GetWeekOfYear, IsLeapYear)

Figure 6.7. .NET Framework Calendar Class Hierarchy

image

Methods that perform day, week, and time arithmetic (such as AddDays, AddHours, AddSeconds, AddWeeks) are implemented in the base Calendar class.

The logic behind the calendar calculations is determined mostly by the AlgorithmType property, which is a CalendarAlgorithmType enumeration (see Table 6.11). Although this property and its enumeration are new in the .NET Framework 2.0, it is still useful for categorizing the calendar classes in a discussion involving any version of the framework.

Table 6.11. CalendarAlgorithmType Enumeration

image

Table 6.12 shows some of the differences that you can expect from the calendars based on the different algorithms. The table shows the different possible values for the Calendar GetDaysInMonth, GetMonthsInYear, and GetDaysInYear methods. If there was any doubt about whether you should hard-code these values based on a North American/European background or use the .NET Framework’s globalization classes, this table should remove that doubt.

Table 6.12. Examples of CalendarAlgorithmType Differences

image
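You can query these values yourself instead of relying on a table. The following minimal sketch asks a calendar for the shape of the year that contains a given point in time; the CalendarComparison class and its Describe method are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class CalendarComparison
{
    // Hypothetical helper: reports the number of months and days in the
    // calendar year that contains the given point in time
    public static string Describe(Calendar calendar, DateTime pointInTime)
    {
        int year = calendar.GetYear(pointInTime);

        return String.Format(CultureInfo.InvariantCulture,
            "{0}: year {1} has {2} months and {3} days",
            calendar.GetType().Name,
            year,
            calendar.GetMonthsInYear(year),
            calendar.GetDaysInYear(year));
    }
}
```

For example, passing new DateTime(2000, 1, 1) with a GregorianCalendar reports 12 months and 366 days, whereas passing the same DateTime with a HebrewCalendar reports the Hebrew year 5760, which is a leap year and so has 13 months.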

Calendar classes are often used in DateTime constructors to work with dates based on a given calendar, so in the following example, the year/month/day has a different meaning to each of the three calendars:


DateTime dateTime1 =
    new DateTime(2000, 1, 1, new GregorianCalendar());

DateTime dateTime2 =
    new DateTime(2000, 1, 1, new HijriCalendar());

DateTime dateTime3 =
    new DateTime(2000, 1, 1, new JapaneseCalendar());

listBox1.Items.Add(dateTime1.ToString("dd MMM yyyy"));
listBox1.Items.Add(dateTime2.ToString("dd MMM yyyy"));
listBox1.Items.Add(dateTime3.ToString("dd MMM yyyy"));

The result is:


01 Jan 2000
07 Jan 2562
01 Jan 3988

In this example, the year (2000), month (1), and day (1) mean a different day in time to different calendars.

Calendar Eras

The Calendar class has a read-only Eras integer array that lists the era numbers that can be used with the calendar. For all calendars except the JapaneseCalendar and JapaneseLunisolarCalendar, Eras has a single element containing the value 1. For the GregorianCalendar, this means that the calendar covers a single era, namely AD (Anno Domini), also called CE (Common Era). The previous era, BC (Before Christ), also called BCE (Before Common Era), is not covered by the GregorianCalendar.

The JapaneseCalendar has four eras, which are numbered 4, 3, 2, and 1 in elements 0, 1, 2, and 3 of the Eras array. The JapaneseLunisolarCalendar has two eras, which are numbered 2 and 1 in elements 0 and 1. The only information that is available programmatically about these eras is the era name (in Kanji) and the era’s abbreviated name (in Kanji), so you need to know that they refer to the eras of the Japanese Modern Period. Each era corresponds to the reign of a different emperor. Information such as the Romaji names of the eras (Meiji, Taisho, Showa, and Heisei), the names of the emperors, and the periods of the eras is unavailable, and you would have to hard-code it into your application if you needed to refer to it.

The two members of the Calendar class that relate to the Eras property are the GetEra method and the CurrentEra field. The GetEra method accepts a DateTime and reports the era number (not the Eras array element number), so the era for January 1, 2005, is 4. The Calendar.CurrentEra field is a read-only constant with the value 0 and refers to the element number of the Eras array.

Unfortunately, whereas GregorianCalendar has a static constant field (ADEra) that identifies its single era number, the JapaneseCalendar and JapaneseLunisolarCalendar classes do not have equivalent constants (e.g., MeijiEra, TaishoEra, ShowaEra, and HeiseiEra) to make programmatic comparisons with era numbers meaningful.
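The difference between era numbers and Eras array elements is easier to see in code. The following is a minimal sketch; the EraInspector class and its Describe method are hypothetical names. Note that versions of the framework released after a new Japanese era begins may report more eras than are described here.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EraInspector
{
    // Hypothetical helper: lists a calendar's era numbers in Eras array
    // order, followed by the era number of the given point in time
    public static string Describe(Calendar calendar, DateTime pointInTime)
    {
        StringBuilder eras = new StringBuilder();
        foreach (int era in calendar.Eras)
        {
            if (eras.Length > 0)
                eras.Append(", ");
            eras.Append(era.ToString(CultureInfo.InvariantCulture));
        }

        return calendar.GetType().Name + " eras: " + eras
            + "; era of "
            + pointInTime.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
            + ": "
            + calendar.GetEra(pointInTime).ToString(
                CultureInfo.InvariantCulture);
    }
}
```

Passing a JapaneseCalendar and new DateTime(2005, 1, 1) shows the era numbers in descending order, followed by the era number for that date.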

Calendar.TwoDigitYearMax

The Calendar.TwoDigitYearMax property is used to identify the century of two-digit years when date strings are parsed. For example, is “1/1/30” January 1, 0030, or January 1, 1930, or January 1, 2030? TwoDigitYearMax is the latest year that a two-digit year can represent, so it defines a 100-year window that ends at that year. Two-digit years that are higher than the last two digits of TwoDigitYearMax are assumed to belong to the previous century. The following example uses the English (United States) culture, which uses the GregorianCalendar, which has a TwoDigitYearMax of 2029:


CultureInfo cultureInfo = new CultureInfo("en-US");
Thread.CurrentThread.CurrentCulture = cultureInfo;
listBox1.Items.Add(DateTime.Parse("1/1/29").ToLongDateString());
listBox1.Items.Add(DateTime.Parse("1/1/30").ToLongDateString());
cultureInfo.Calendar.TwoDigitYearMax = 2050;
listBox1.Items.Add(DateTime.Parse("1/1/29").ToLongDateString());
listBox1.Items.Add(DateTime.Parse("1/1/30").ToLongDateString());

The result is:


Monday, January 01, 2029
Wednesday, January 01, 1930
Monday, January 01, 2029
Tuesday, January 01, 2030
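The same 100-year window can be examined without parsing a string by calling the Calendar.ToFourDigitYear method. The following is a minimal sketch; the TwoDigitYears class and its Resolve method are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class TwoDigitYears
{
    // Hypothetical helper: resolves a two-digit year against the
    // 100-year window that ends at twoDigitYearMax
    public static int Resolve(int twoDigitYear, int twoDigitYearMax)
    {
        GregorianCalendar calendar = new GregorianCalendar();
        calendar.TwoDigitYearMax = twoDigitYearMax;
        return calendar.ToFourDigitYear(twoDigitYear);
    }
}
```

With a TwoDigitYearMax of 2029, Resolve(29, 2029) returns 2029 and Resolve(30, 2029) returns 1930. With a TwoDigitYearMax of 2050, Resolve(30, 2050) returns 2030.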

The user can change the TwoDigitYearMax (see Figure 6.8) by clicking the Customize button in the Regional Settings tab of Regional and Language Options.

Figure 6.8. Customizing the Calendar.TwoDigitYearMax

image

For the value to be used, the CultureInfo’s UseUserOverride parameter must be true, which, by default, it is.

DateTimes, DateTimeFormatInfos, and Calendars

The relationship between the DateTime structure and DateTimeFormatInfo and Calendar classes warrants some explanation. The DateTime structure holds date/time information as a culture-agnostic point in time in the form of ticks (a long type). Regardless of how a DateTime structure is created, the point in time is not dependent upon any given culture, so January 1, 2000, refers to a fixed point in time, regardless of the calendar system used to create it or represent it. The DateTime structure accepts calendar objects in its constructor for the sole purpose of interpreting the year, month, and day passed to the constructor. (Recall from the previous section that the year 2000, month 1, and day 1 is open to interpretation depending upon the calendar system in use.) After the meaning of the year, month, and day has been established, the DateTime is not directly related to any given calendar.
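To convince yourself that a DateTime is independent of the calendar used to create it, you can take a point in time apart using one calendar and put it back together again. The following is a minimal sketch; the CalendarRoundTrip class and its Recreate method are hypothetical names. It assumes a calendar with a single era (such as the HebrewCalendar), because the DateTime constructor always interprets the year in the calendar’s current era.

```csharp
using System;
using System.Globalization;

public static class CalendarRoundTrip
{
    // Hypothetical helper: reads the year, month, and day of a date
    // using the given calendar and then rebuilds the DateTime from
    // those calendar-specific values (single-era calendars only)
    public static DateTime Recreate(DateTime date, Calendar calendar)
    {
        int year = calendar.GetYear(date);
        int month = calendar.GetMonth(date);
        int day = calendar.GetDayOfMonth(date);

        return new DateTime(year, month, day, calendar);
    }
}
```

The year, month, and day read with a HebrewCalendar look nothing like 2000, 1, and 1, yet the DateTime returned for new DateTime(2000, 1, 1) has exactly the same Ticks value as the original.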

The DateTimeFormatInfo class contains date-/time-formatting information. It has a read/write Calendar property from which it can draw information about how any given point in time is represented in a particular calendar. In the following code example, CultureInfo objects are created for three cultures: “en-US” (English (United States)), “ar-SA” (Arabic (Saudi Arabia)), and “ja-JP” (Japanese (Japan)). The default calendar for the “en-US” and “ja-JP” cultures is the GregorianCalendar (recall that the default calendar for the “ar-SA” culture is the HijriCalendar). The code assigns the HijriCalendar to the “ar-SA” culture explicitly and changes the calendar for the “ja-JP” culture to the JapaneseCalendar. (Remember, the CultureInfo.Calendar property is the default Calendar and is read-only, so the only way to change a culture’s calendar is through its DateTimeFormat property.)


CultureInfo englishCultureInfo = new CultureInfo("en-US");

CultureInfo arabicCultureInfo = new CultureInfo("ar-SA");
arabicCultureInfo.DateTimeFormat.Calendar = new HijriCalendar();

CultureInfo japaneseCultureInfo = new CultureInfo("ja-JP");
japaneseCultureInfo.DateTimeFormat.Calendar = new JapaneseCalendar();

DateTime firstJan2000 =
    new DateTime(2000, 1, 1, new GregorianCalendar());

listBox1.Items.Add(firstJan2000.ToString(englishCultureInfo));
listBox1.Items.Add(firstJan2000.ToString(arabicCultureInfo));
listBox1.Items.Add(firstJan2000.ToString(japaneseCultureInfo));

A single variable, firstJan2000, is shown three times, once for each calendar, giving the following results:


1/1/2000 12:00:00 AM
25/09/20 12:00:00 ص
平成 12/1/1 0:00:00

Although the representations of January 1, 2000, show different days, months, and years (and different AM/PM notation), it is the same point in time, as should be obvious by the fact that the variable does not change its value. The calendars are simply used to provide a human point of reference to the underlying ticks. You could draw an analogy with temperature in which the freezing point of water represents a single absolute temperature value (like a single point in time), and Fahrenheit, Celsius, and Kelvin are simply different ways of representing the same value—so Fahrenheit, Celsius, and Kelvin act like the calendars of the temperature world.

The DateTimeFormatInfo class has several methods that provide access to the localized names of days, months, and eras (see Table 6.13).

Table 6.13. DateTimeFormatInfo Localized Name Methods

image

It should go without saying that you should always use these methods to iterate through day and month names instead of hard-coding them, but the following code should give you even more reason to use the GetMonthName method:


CultureInfo cultureInfo = new CultureInfo("he-IL");
DateTimeFormatInfo dtfi = cultureInfo.DateTimeFormat;
dtfi.Calendar = new HebrewCalendar();
for(int monthNumber = 1; monthNumber <=
    dtfi.Calendar.GetMonthsInYear(5345); monthNumber++)
{
    listBox1.Items.Add(dtfi.GetMonthName(monthNumber));
}

In this example, we use the “he-IL” (Hebrew (Israel)) culture and assign to its DateTimeFormatInfo.Calendar a new HebrewCalendar object. The Hebrew calendar has 13 months in leap years, and the Hebrew year 5345 is one such year.

We use the Calendar.GetMonthsInYear method to get the number of months in the given year (13, in this example), and then use the DateTimeFormatInfo.GetMonthName to get the name of each month.

DateTime.ToString, DateTime Parsing, and DateTimeFormatInfo

The DateTime.ToString method provides myriad options for formatting dates and times. You are at liberty to devise your own date/time formatting by assembling the basic building blocks of format patterns (d, M, y, g, h, H, m, s, f, t, z for the day, month, year, era, 12 hour, 24 hour, minute, second, fractions of a second, am/pm designator, and time zone offset, respectively), such as this:


new DateTime(2005, 1, 2).ToString("MM/dd/yy");

The problem with this approach, however, is that it is almost certainly culturally biased. Whether the resulting string (“01/02/05”) means January 2, 2005, or February 1, 2005, or February 5, 2001, depends on whether you come from the U.S., the U.K., or the People’s Republic of China. The locale-aware solution is to let the .NET Framework worry about the format and use format characters instead of constructing your own format patterns. Format characters are single letters that indicate a completed pattern without forcing a specific implementation of that pattern. So, for example, the following conversion using the short date format character “d”:


new DateTime(2005, 1, 2).ToString("d");

results in these strings in English (United States), English (United Kingdom), and Chinese (People’s Republic Of China), respectively:


1/2/2005
02/01/2005
2005-1-2

The full list of format characters, their equivalent methods, and associated DateTimeFormatInfo pattern properties is shown in Table 6.14. There is no functional difference between the format character and its associated method, so the following two lines are functionally identical:


new DateTime(2000, 1, 1).ToString("d");
new DateTime(2000, 1, 1).ToShortDateString();

Table 6.14. DateTime Format Character and Methods, and DateTimeFormatInfo Patterns

image

However, the former accepts an optional IFormatProvider parameter (see the next section), whereas the latter does not, so it could be considered more versatile but equally, by FxCop standards (also see next section), more ambiguous.

So you should conclude from this that when displaying a date/time to the user, you should always use the format character or its associated method, and should avoid building your own formats. If you want to enforce this approach in your code, take a look at the “DateTime.ToString() should not use a culture-specific format” rule in Chapter 13.

The DateTime structure in the .NET Framework 1.1 supports two methods for parsing date/time strings into dates/times: Parse and ParseExact. The .NET Framework 2.0 adds two new methods: TryParse and TryParseExact. The Parse method is very forgiving in its parsing of date/time strings and works hard to attempt to recognize numerous variations on string formats. So “12/31/01”, “December, 31 01”, and “31 December 01” are all parsed (using English (United States)) to mean December 31, 2001. This flexibility can be very handy but has a necessary performance hit and can be fooled. The ParseExact method, however, demands the date/time format string that should be used to parse the string. There are no gray areas; if the string doesn’t match the format, an exception is thrown. The TryParse and TryParseExact methods in the .NET Framework 2.0 attempt the same operations but return false instead of throwing an exception if the parse attempt fails.
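Strict parsing is most useful for dates that are read from files or protocols, where the format is fixed and is not a matter of user preference. The following is a minimal sketch using TryParseExact; the StrictDateParser class, its method name, and the choice of the “dd/MM/yyyy” pattern are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class StrictDateParser
{
    // Hypothetical helper: accepts only dates written as day/month/year
    // and reports failure by returning false instead of throwing
    public static bool TryParseDayMonthYear(string text, out DateTime result)
    {
        return DateTime.TryParseExact(text, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out result);
    }
}
```

Here “31/12/2001” is parsed as December 31, 2001, whereas “12/31/2001” is rejected because there is no month 31. The invariant culture is passed so that the result does not depend on the culture of the current thread.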

Genitive Date Support

If ever you needed any more reasons to let the .NET Framework do your globalization for you, then there is the issue of genitive dates. The good news is that the .NET Framework understands this problem and deals with it accurately. This is especially good news because it is unlikely that most English-only developers will know what a genitive date is or what the problem is. The issue is that, in some languages, when a month can be seen to “own” or “possess” a day, the month name changes. Table 6.15 shows English month names, Polish month names, and their genitive forms.

Table 6.15. Polish Month Names and Their Genitive Forms

image

You can see that the month name is different when it is used to “possess” a day. In the list of cultures that the .NET Framework supports, Czech, Greek, Latvian, Lithuanian, Mongolian, Polish, and Slovak all use genitive dates. The moral of the story, as usual, is to let the framework build date strings instead of taking the problem into your own hands. In both the .NET Framework 1.1 and 2.0, the DateTimeFormatInfo class understands how to use genitive dates, so the DateTime.ToString method always returns correct strings. In the .NET Framework 2.0, you can gain programmatic access to the genitive month names using the AbbreviatedMonthGenitiveNames and MonthGenitiveNames array properties.
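The following minimal sketch compares a month’s name with its genitive form and with the string that the framework builds when the month is combined with a day. The GenitiveMonths class and its Describe method are hypothetical names, and the code assumes the .NET Framework 2.0 or later for the MonthGenitiveNames property. The exact names returned depend on the culture data supplied by the framework and the operating system.

```csharp
using System;
using System.Globalization;

public static class GenitiveMonths
{
    // Hypothetical helper: returns the month name, the genitive month
    // name, and a formatted day-and-month string for the given culture
    public static string[] Describe(string cultureName, int month)
    {
        DateTimeFormatInfo dtfi =
            new CultureInfo(cultureName).DateTimeFormat;

        return new string[]
        {
            dtfi.MonthNames[month - 1],
            dtfi.MonthGenitiveNames[month - 1],
            // a pattern with both a day and a full month name
            // causes the genitive month name to be used
            new DateTime(2005, month, 2).ToString("d MMMM", dtfi)
        };
    }
}
```

For Polish, GenitiveMonths.Describe("pl-PL", 1) returns two different month names, and the formatted string uses the genitive one.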

DateTime.ToString and IFormatProvider

The DateTime.ToString method has four overloads, which ultimately all boil down to a single signature:


public string ToString(string format, IFormatProvider provider);

This method simply returns the following value:


DateTimeFormat.Format(
    this, format, DateTimeFormatInfo.GetInstance(provider));

If the provider parameter is null, DateTimeFormatInfo.GetInstance uses CultureInfo.CurrentCulture.

The IFormatProvider interface has a single method:


public interface IFormatProvider
{
    object GetFormat(Type formatType);
}

There are just three classes in the .NET Framework that support the IFormatProvider interface:

CultureInfo

DateTimeFormatInfo

NumberFormatInfo

The GetFormat method accepts a Type and returns an object of that Type. So if the CultureInfo.GetFormat method is called with the DateTimeFormatInfo Type, then it returns the value of its CultureInfo.DateTimeFormat property. As the name implies, the IFormatProvider implementation provides formatting information. In this example, a German CultureInfo object provides the IFormatProvider interface:


DateTime firstJan2000 = new DateTime(2000, 1, 1);
CultureInfo cultureInfo = new CultureInfo("de-DE");
listBox1.Items.Add(firstJan2000.ToString("D", cultureInfo));

The following string is added to the list box:


Samstag, 1. Januar 2000

The CultureInfo object provides the DateTimeFormatInfo, which includes the date-/time-formatting patterns and also the Calendar required to represent the date. The FxCop “Specify IFormatProvider” rule (see Chapter 13) enforces that the IFormatProvider parameter is always passed to DateTime.ToString. The idea behind this rule is to ensure that there is no ambiguity in the way the date is represented and that the developer has been forced to consider the globalization issues of the code.
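You can call GetFormat yourself to see what a provider hands back. The following is a minimal sketch; the FormatProviderDemo class and its Resolve method are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class FormatProviderDemo
{
    // Hypothetical helper: asks a provider for its DateTimeFormatInfo,
    // which is what DateTime.ToString needs from the provider
    public static DateTimeFormatInfo Resolve(IFormatProvider provider)
    {
        return (DateTimeFormatInfo)
            provider.GetFormat(typeof(DateTimeFormatInfo));
    }
}
```

Given a CultureInfo, Resolve returns the object held by the culture’s DateTimeFormat property. Given a DateTimeFormatInfo, it returns that same object. Given a NumberFormatInfo, it returns null because a NumberFormatInfo has no date/time formatting information to offer.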

Numbers, Currencies, and NumberFormatInfo

Numbers and currencies follow a similar pattern to date/times and DateTimeFormatInfo, but without the additional complexity of calendars:

• Numbers and currencies are formatted using the NumberFormatInfo class.

• The CultureInfo class has a NumberFormatInfo property called NumberFormat.

• It is possible to build your own culture-unaware format strings from primitives such as #, the comma, the period, and zero (e.g., "###,###.00").

• It is possible to specify culture-aware formats (see Table 6.16) using a format specifier, which draws on information in a NumberFormatInfo object.

Table 6.16. Standard Number Format Specifiers

image

• Number types that overload ToString methods accept an IFormatProvider that can be either a NumberFormatInfo or a CultureInfo (from which the NumberFormatInfo is extracted).

• The FxCop “Specify IFormatProvider” rule catches number types that overload ToString methods, which are not called with an IFormatProvider.

As we saw with date/times, it should be obvious by now that predicting the myriad cultural differences across the world is very difficult, but to remove any doubt, have a look at the examples in Table 6.17, which show the number –20000.15 formatted for currency in a selection of different cultures. Notice the use of commas and periods to indicate sometimes a thousands separator and a decimal separator, and sometimes vice versa; the positioning of the currency symbol; the positioning of the negative sign; the expression of negatives using parentheses; and the exclusion of decimals altogether.

Table 6.17. Examples of Formatted Currencies

image
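Strings like those in Table 6.17 can be produced with the “C” format specifier and a CultureInfo. The following is a minimal sketch; the CurrencyFormatter class and its Format method are hypothetical names. The exact output (particularly the representation of negative values) varies with the version of the framework and the operating system, so treat the table as indicative.

```csharp
using System;
using System.Globalization;

public static class CurrencyFormatter
{
    // Hypothetical helper: formats an amount as currency
    // using the conventions of the named culture
    public static string Format(decimal amount, string cultureName)
    {
        return amount.ToString("C", new CultureInfo(cultureName));
    }
}
```

Calling CurrencyFormatter.Format(-20000.15m, "en-US") and CurrencyFormatter.Format(-20000.15m, "de-DE") shows the comma and the period swapping roles between the two cultures.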

International Domain Name Mapping

When the World Wide Web was first developed, the Domain Name System upon which it is based was rooted firmly in the ASCII character set. This meant that all domain names had to conform to 7-bit ASCII. This tiny range (U+0000 to U+007f) covers the English language and very few others. The problem faced by the rest of the world was how to create domain names that used characters outside of this range yet still worked with the antiquated ASCII DNS.

Before we look at the solution, let’s be a little clearer about what the problem is from the developer’s point of view. Open Internet Explorer 6 or earlier and navigate to www.i18ncafé.com. Internet Explorer will be unable to navigate to this page because the domain name cannot be resolved; it contains an “e” with an acute accent (é), which is outside of the 7-bit ASCII range. If you use Internet Explorer 7, Firefox, Mozilla, Opera, or Safari, you will be able to successfully navigate to this page because all of these browsers support international domain names (IDN). Developers need to care about this problem because if your application needs to navigate to such a page using Internet Explorer 6 or earlier, or to send an e-mail to a person on such a domain name, or interact with such a domain in any way that uses DNS, you will need to know how to convert the name to its ASCII equivalent.

In 2003, the IETF published the “Internationalizing Domain Names in Applications (IDNA)” standard (RFC 3490) to provide an interim solution until DNS fully supports Unicode. Remember that the Internet is the world’s largest legacy system, so upgrading it is not a fast process. IDNA is an encoding mechanism that converts Unicode domain names into ASCII domain names that can be recognized by Domain Name Servers everywhere. The .NET Framework 2.0 includes the IdnMapping class, which is an encapsulation of the IDNA encoding mechanism. Figure 6.9 shows a Windows Forms application that illustrates the IDN problem and solution.

Figure 6.9. IDN Mapping Problem

image

The “Go” button next to the International Domain Name TextBox contains the following code:


webBrowser1.Navigate(textBoxIDN.Text);

When the button is pressed, the WebBrowser at the bottom of the form fails to navigate to the domain. The IDN To ASCII button contains the following code:


IdnMapping idnMapping = new IdnMapping();
textBoxASCII.Text = idnMapping.GetAscii(textBoxIDN.Text);

When the button is pressed, the ASCII TextBox is filled with the encoded domain name “http://www.xn--i18ncaf-hya.com”. The second “Go” button contains the following code:


webBrowser1.Navigate(textBoxASCII.Text);

The browser can successfully navigate to the ASCII domain name. To convert in the other direction, from ASCII to Unicode, you use the IdnMapping.GetUnicode method:


IdnMapping idnMapping = new IdnMapping();
textBoxIDN.Text = idnMapping.GetUnicode(textBoxASCII.Text);

The strategy is that your application shows Unicode domain names in the user interface but uses the ASCII domain names, obtained through the IdnMapping class, in any programmatic operation (such as navigating to a page or sending an e-mail).
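One way to keep the two forms apart is to route every conversion through a single helper. The following is a minimal sketch only; the DomainNameHelper class and its method names are my own invention for illustration and are not part of the .NET Framework. It relies solely on the IdnMapping.GetAscii and IdnMapping.GetUnicode methods already shown.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the .NET Framework): keeps the
// "Unicode for display, ASCII for DNS" rule in one place.
public static class DomainNameHelper
{
    private static readonly IdnMapping idnMapping = new IdnMapping();

    // Form to use for navigation, e-mail, and any other DNS operation.
    public static string ToAscii(string unicodeDomainName)
    {
        return idnMapping.GetAscii(unicodeDomainName);
    }

    // Form to show in the user interface.
    public static string ToUnicode(string asciiDomainName)
    {
        return idnMapping.GetUnicode(asciiDomainName);
    }
}
```

With this in place, a call such as DomainNameHelper.ToAscii("www.i18ncafé.com") should produce the same encoded name that the IDN To ASCII button displays, and a name that is already plain ASCII passes through unchanged.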

Table 6.18 shows a number of examples of Unicode domain names and their ASCII-encoded equivalents. You should be able to appreciate from the list that Anglicizing all domain names is an unacceptable solution to people who do not use English as a primary language. It is also worth noting that names that do not need any conversion do not get any conversion, so it is safe to use the IdnMapping class everywhere without fear of it breaking existing code.

Table 6.18. International Domain Name Examples

image

International Domain Names and Visual Spoofing

One of the concerns that you will often see raised with regard to international domain names is visual spoofing. Table 6.19 illustrates the problem.

Table 6.19. Visual Spoofing Using International Domain Names

image

Can you see a difference between the original domain name and the spoofed domain name? No, neither can I, but they are different, as you can see by the ASCII domain name of the spoofed domain. The difference is that certain letters have been replaced with different letters that are visually identical in certain fonts. The Latin Small Letter O (U+006F), for example, has been replaced with the Greek Small Letter Omicron (U+03BF). In general, spoof characters are drawn from the Cherokee, Cyrillic, or Greek scripts. The danger is that people see links to Web sites and e-mail addresses (often in spam e-mails) and trust them to be genuine because they look genuine. The problem isn’t new, but international domain names make its scope much wider. It doesn’t have any impact on the steps you need to take when internationalizing your applications, but you should be aware of this security issue (see http://www.unicode.org/reports/tr36/ for more details).
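Because the ASCII form exposes what the eye cannot see, an application can at least flag names that contain encoded labels before showing them to the user. The sketch below is an assumption-laden illustration, not a spoofing detector: the SpoofCheck class and its ContainsEncodedLabel method are hypothetical names, and the test simply looks for the “xn--” prefix that IDNA places on encoded labels. A perfectly legitimate international domain name will be flagged as well, so the result is a hint to show the ASCII form alongside the Unicode form and nothing more.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reports whether any label of a domain name
// had to be IDNA-encoded, that is, whether it contains non-ASCII characters.
public static class SpoofCheck
{
    public static bool ContainsEncodedLabel(string domainName)
    {
        string ascii = new IdnMapping().GetAscii(domainName);
        foreach (string label in ascii.Split('.'))
        {
            // IDNA marks every encoded label with the "xn--" prefix.
            if (label.StartsWith("xn--", StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

For example, a name in which a Latin “o” has been replaced by a Greek omicron (U+03BF) returns true, whereas its all-Latin look-alike returns false.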

Environment Considerations

It is good practice to avoid hard-coding references to specific file locations in any application, but international applications have an additional consideration: The names of special folders are localized in some language versions of Windows. So, if you make a direct reference to the program files folder using “Program Files”, the folder you actually find (if it exists) may or may not be the program files folder you are expecting to find. In the German version of Windows, for example, the program files folder is “Programme”. You can avoid hard-coding references to specific file locations using the Environment.GetFolderPath method. Replace code like this:


string programFilesFolder = @"Program Files";

with code like this:


string programFilesFolder =
    Environment.GetFolderPath(
    Environment.SpecialFolder.ProgramFiles);
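The same approach extends to building a path beneath a special folder. This is a small sketch under assumptions: the InstallPaths class and the company and product names passed to it are hypothetical, and the nested Path.Combine calls are used only so that the code also compiles on the .NET Framework 2.0.

```csharp
using System;
using System.IO;

// Hypothetical helper: builds an application folder path beneath the
// localized program files folder without hard-coding "Program Files".
public static class InstallPaths
{
    public static string GetApplicationFolder(string company, string product)
    {
        string programFiles = Environment.GetFolderPath(
            Environment.SpecialFolder.ProgramFiles);
        // Path.Combine supplies the correct directory separator.
        return Path.Combine(Path.Combine(programFiles, company), product);
    }
}
```

A call such as InstallPaths.GetApplicationFolder("Contoso", "Viewer") then resolves beneath “Programme” on a German version of Windows and beneath “Program Files” on an English one.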

Extending the CultureInfo Class

As sophisticated as the CultureInfo class and its supporting classes are, they do not cover every globalization issue that your application will encounter. Issues that fall outside the scope of the existing CultureInfo class include:

• Postal code formats differ from country to country (not all countries even use a postal code).

• Address formats differ from country to country.

• Preferred paper sizes differ from country to country. (Imagine how irritating it would be to users in the United States if their .NET application defaulted to A4 every time it printed.)

• Units of measure differ from country to country (temperature, distance, volumes, etc.).

Furthermore, the CultureInfo class includes only basic information about the language/country. Information about the following is absent:

• The country’s continent

• The IANA Top Level Domains used by the country

• The time zones that span the country

• The country’s International Olympics Committee (IOC) code

• The International Distance Direct Dialing code used by the country

• The country’s demographics (such as population, literacy, religions)

• The bumper sticker code used on vehicles of that country

• The country’s capital city (in English and native language)

These examples would all extend a RegionInfo class instead of a CultureInfo class, but you can see that the need exists; it is probably only a matter of time before you find your own reasons why you want to extend the CultureInfo class, so we tackle this issue here. We extend the CultureInfo class in two stages: First we create a new CultureInfoEx class that can be extended. Then we extend it using an example of attaching postal code formats to a culture.

At first glance, extending the CultureInfo class looks simple. Inherit from CultureInfo and implement the same constructors as the CultureInfo class:


public class CultureInfoEx: CultureInfo
{
    public CultureInfoEx(int culture): base(culture)
    {
    }
    public CultureInfoEx(string name): base(name)
    {
    }
    public CultureInfoEx(int culture, bool useUserOverride):
        base(culture, useUserOverride)
    {
    }
    public CultureInfoEx(string name, bool useUserOverride):
        base(name, useUserOverride)
    {
    }
}

The problems lie with a culture’s parent and with the invariant culture. Whenever you use the CultureInfo.Parent property, it returns a CultureInfo object. So in this example, we start with a new CultureInfoEx object, but when we get its parent, we get a CultureInfo object, not a CultureInfoEx object:


CultureInfoEx cultureInfo = new CultureInfoEx("en-GB");
CultureInfo parentCultureInfo = cultureInfo.Parent;

To solve this problem, we need to implement a new Parent property in our CultureInfoEx class:


public new CultureInfoEx Parent
{
    get
    {
        CultureInfo parent = base.Parent;
        if (CultureInfo.InvariantCulture.Equals(parent))
            return CultureInfoEx.InvariantCulture;
        else
            // change the type of the parent to CultureInfoEx
            return new CultureInfoEx(parent.Name, UseUserOverride);
    }
}

The get method starts by getting the base class’s Parent. This will be a regular CultureInfo object. We check to see whether this is the invariant culture; if it is, we replace it with our own CultureInfoEx invariant culture (more on this in a moment). If it isn’t the invariant, we need to build a new CultureInfoEx object from the name of the original parent CultureInfo object. We also pass the UseUserOverride property to the new CultureInfoEx’s constructor, to ensure that it adopts the user’s settings if it should do so.

The second problem is the invariant culture. CultureInfo.InvariantCulture returns a CultureInfo object, not a CultureInfoEx object. We want our CultureInfoEx objects to be polymorphic, so the invariant culture must be changed to be a CultureInfoEx object as well. For this, we implement a new static InvariantCulture property:


private static CultureInfoEx invariantCulture;
public new static CultureInfoEx InvariantCulture
{
    get
    {
        if (invariantCulture == null)
        {
            invariantCulture = new CultureInfoEx(0x7f, false);
        }
        return invariantCulture;
    }
}

0x7F is the locale ID of the invariant culture. The InvariantCulture property is simply a wrapper and initializer for the private static invariantCulture field. Unfortunately, in this case, Microsoft is very fond of encapsulation, and encapsulation is opposed to inheritance. Our new invariant culture is not quite the same as CultureInfo.InvariantCulture (apart from the obvious difference in classes). The difference is that the CultureInfo.InvariantCulture is read-only, whereas CultureInfoEx.InvariantCulture is read/write. The field that holds the read-only state is internal and, therefore, prevents inheritance from working effectively. One solution to this problem would be to use Type.GetField to get the FieldInfo for the internal m_IsReadOnly field, and call its SetValue method to set it to true. It is aesthetically unpleasing, but encapsulation often presents inheritors with no other choice.
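The reflection workaround just described might be sketched as follows. Treat it strictly as an illustration: the CultureInfoReflection class and its TrySetReadOnlyField method are hypothetical, and the name of the internal field is an implementation detail that can differ between versions of the framework, which is why the sketch takes the field name as a parameter and reports failure instead of throwing when the field cannot be found.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical helper: sets a non-public Boolean instance field of
// CultureInfo to true. Returns false if no such field exists.
public static class CultureInfoReflection
{
    public static bool TrySetReadOnlyField(CultureInfo culture, string fieldName)
    {
        FieldInfo field = typeof(CultureInfo).GetField(
            fieldName, BindingFlags.Instance | BindingFlags.NonPublic);

        // The field name is an internal detail; fail quietly if it is
        // missing or is not a Boolean in this version of the framework.
        if (field == null || field.FieldType != typeof(bool))
            return false;

        field.SetValue(culture, true);
        return true;
    }
}
```

The InvariantCulture getter could call TrySetReadOnlyField(invariantCulture, "m_IsReadOnly") immediately after constructing the object and fall back to the read/write behavior when the call returns false.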

Now that our Parent and InvariantCulture properties have been implemented, there is one more issue that we should look at. Consider the following code, which starts with a specific culture (“en-US”) and walks through its parents (“en” and then the invariant culture), adding the culture names to a list box:


CultureInfoEx cultureInfo = new CultureInfoEx("en-US");
listBox1.Items.Add(cultureInfo.Name);
while (! CultureInfo.InvariantCulture.Equals(cultureInfo))
{
    cultureInfo = cultureInfo.Parent;
    listBox1.Items.Add(cultureInfo.Name);
}

Notice that the while loop checks whether the current culture is the invariant culture. More specifically, it checks against the CultureInfo invariant culture, not the CultureInfoEx invariant culture. You might expect this code to enter an infinite loop or to crash when it gets the parent of the invariant culture (a crash would not happen, because the parent of the invariant culture is the invariant culture). In fact, the code works just the way that you want it to because of how equality is tested: the .NET Framework 2.0 compares object references and culture names, and the .NET Framework 1.1 compares locale IDs only. So when you compare CultureInfo.InvariantCulture with CultureInfoEx.InvariantCulture, the result is true because the culture names (in the .NET Framework 2.0) or the locale IDs (in the .NET Framework 1.1) are the same. This simple fact is essential in successfully extending the CultureInfo class because this test is exactly what the ResourceManager class does when it goes through its resource fallback process: The fallback process stops when it reaches the invariant culture. If CultureInfoEx.InvariantCulture were not equal to CultureInfo.InvariantCulture, the ResourceManager would enter an infinite loop.
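The equality behavior is easy to confirm in isolation. In this sketch, DerivedCultureInfo is a deliberately minimal stand-in for CultureInfoEx (a hypothetical class with a single constructor), and the empty string is used as the name of the invariant culture in place of the 0x7F locale ID.

```csharp
using System;
using System.Globalization;

// Minimal, hypothetical stand-in for CultureInfoEx: it adds nothing
// to CultureInfo except a different runtime type.
public class DerivedCultureInfo : CultureInfo
{
    public DerivedCultureInfo(string name) : base(name) { }
}

public static class InvariantEqualityDemo
{
    // Compares the built-in invariant culture with an invariant culture
    // of the derived type. The objects differ, but the names match.
    public static bool DerivedInvariantEqualsBuiltIn()
    {
        CultureInfo derivedInvariant = new DerivedCultureInfo(string.Empty);
        return CultureInfo.InvariantCulture.Equals(derivedInvariant);
    }
}
```

DerivedInvariantEqualsBuiltIn returns true even though the two objects are different instances of different classes, which is the property that lets the parent-walking loop terminate.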

The replacement of the CultureInfo class with the extended CultureInfoEx class lends more weight to the suggestion earlier in this chapter to use a CultureInfoProvider class to provide culture objects. In this case, the overloaded CultureInfoProvider.GetCultureInfo methods would create CultureInfoEx objects instead of CultureInfo objects.

The second stage of extending the CultureInfo class is to provide some new functionality. The example I use is to attach postal code format information to the culture. The PostalCodeInfo class is a simple example that enables us to focus on the model of extending the CultureInfo class instead of the details of postal codes. Postal code formats vary from country to country. The .NET Framework 2.0 MaskedTextBox control has a property called Mask that can be set to a mask to restrict input. This kind of control is ideal for helping with data such as postal codes, which obey a fixed format. The MaskedTextBox even has an Input Mask dialog that offers a set of input masks based upon the current culture of the development machine. Unfortunately, the correct input mask can be determined only at runtime, not at development time. Consequently, we need to have some facility that we can interrogate to get the right format for a culture. Enter the PostalCodeInfo class. This class and its supporting infrastructure in the CultureInfoEx class are loosely modeled on the DateTimeFormatInfo class and its supporting structure in the CultureInfo class. In its simplest form, the PostalCodeInfo class can be used like this:


PostalCodeInfo postalCodeInfo = new PostalCodeInfo("en-US");
maskedTextBox1.Mask = postalCodeInfo.Mask;

An overloaded PostalCodeInfo constructor accepts a culture name, and the Mask property contains the correct postal code mask for the locale (a US ZIP code, in this example). This isn’t how it is expected to be used, but for now we will just look at how it is implemented and come back to a more common usage in a moment. The PostalCodeInfo class looks like this:


public class PostalCodeInfo
{
    protected static Hashtable masks;

    static PostalCodeInfo()
    {
        masks = new Hashtable();
        masks.Add("en-US", "00000-9999");
        masks.Add("en-GB", "L?90? 9??");
        masks.Add("en-AU", "LLL 0000");
    }
    public static string GetMask(string cultureName)
    {
        if (masks.ContainsKey(cultureName))
            return (string) masks[cultureName];
        else
            return null;
    }
    public static void SetMask(string cultureName, string mask)
    {
        if (masks.ContainsKey(cultureName))
            masks[cultureName] = mask;
        else
            masks.Add(cultureName, mask);
    }

    public PostalCodeInfo()
    {
    }
    public PostalCodeInfo(string cultureName)
    {
        this.mask = GetMask(cultureName);
    }
    private string mask;
    public string Mask
    {
        get {return mask;}
        set {mask = value;}
    }
}

It has a protected static field called masks, which is a Hashtable of all the masks for every culture. The field is initialized by the static constructor to the “known” values of postal code formats. The list can be modified using the static SetMask method to change incorrect values or to add new cultures that are not part of the original list. This last issue is important for supporting custom cultures that this class cannot know about at design time. The PostalCodeInfo constructor accepts a culture name and performs a lookup in the masks Hashtable to find the corresponding postal code mask for the culture name. It assigns this mask to its private mask field, which has a public Mask property wrapper. The class itself is not overly complex. The next step is to make the CultureInfoEx class aware of it in the following addition to CultureInfoEx:


private PostalCodeInfo postalCodeInfo;
public PostalCodeInfo PostalCode
{
    get
    {
        CheckNeutral(this);
        if (postalCodeInfo == null)
            postalCodeInfo = new PostalCodeInfo(Name);
        return postalCodeInfo;
    }
    set {postalCodeInfo = value;}
}
protected static void CheckNeutral(CultureInfo culture)
{
    if (culture.IsNeutralCulture)
    {
        throw new NotSupportedException(
            EnvironmentEx.GetResourceString(
            "Argument_CultureInvalidFormat",
            new object[1] { culture.Name }));
    }
}

The private postalCodeInfo field holds a reference to the PostalCodeInfo object associated with the culture. The public PostalCode property’s get method initializes this field. First it calls CheckNeutral to assert that the culture is not neutral. I have taken the approach that a postal code cannot belong to just a language; it can be associated only with a country/region. Then the field is initialized from a PostalCodeInfo object matching the culture name (the CultureInfo.Name property) of the culture.

The normal use of the PostalCodeInfo class, therefore, is more akin to this:


CultureInfoEx cultureInfo = new CultureInfoEx("en-US");
maskedTextBox1.Mask = cultureInfo.PostalCode.Mask;

From this simple postal code model, you should be able to extend the CultureInfo class to meet your own globalization requirements.
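As one illustration of adapting the model, the mask table can be registered and queried on its own. The sketch below is a condensed variant that I am assuming for illustration, not the PostalCodeInfo class itself: the PostalCodeMasks class is hypothetical, it uses a generic Dictionary in place of the Hashtable, and it treats culture names as case-insensitive. In a MaskedTextBox mask, “0” is a required digit and “9” is an optional digit, so a five-digit French postal code can be expressed as “00000”.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, condensed variant of the mask table: a generic,
// case-insensitive dictionary keyed by culture name.
public static class PostalCodeMasks
{
    private static readonly Dictionary<string, string> masks =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    static PostalCodeMasks()
    {
        masks["en-US"] = "00000-9999";   // ZIP code with optional +4
    }

    // Returns the mask for the culture, or null if none is registered.
    public static string GetMask(string cultureName)
    {
        string mask;
        return masks.TryGetValue(cultureName, out mask) ? mask : null;
    }

    // Adds a mask for a new culture or replaces an existing one.
    public static void SetMask(string cultureName, string mask)
    {
        masks[cultureName] = mask;
    }
}
```

An application would call PostalCodeMasks.SetMask("fr-FR", "00000") once at startup, after which lookups for that culture, or for a custom culture registered the same way, return the registered mask.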

Where Are We?

We started this chapter with the basic premise that a large part of your globalization issues are handled for you if you use the .NET Framework globalization classes. The conclusion is no different. Of course, the basic premise does require that you know what the classes are, what their properties and methods are, and how and when you should use them, but you should be reassured from the examples in this chapter that the .NET Framework is going to considerable lengths on your behalf to relieve you of the burden of having to know every detail about every culture.
