Internationalizing Applications

Overview

This chapter reviews the concepts of internationalization (I18N). For more detailed information, we recommend Chapter 11 of the OSF/Motif Programmer's Guide, Release 1.2. For additional documentation on the underlying X and Xt support for localized applications, refer to Chapters 10 and 11 of the Xlib Programming Manual from O'Reilly & Associates, Inc.

Note: This chapter assumes that you are using Motif 1.2 or later. Earlier versions do not support locales.

 

Internationalizing Your Application

Internationalization (I18N) is the development of applications that can be run in different language environments (Japanese, Spanish, English, and so forth) without code revision or recompilation. The code is free of dependencies on language, character set, or special data representations (for example, currency and time formats).

Localization

Any language-specific information required by the application must be kept separate from the application source code. Typically, this information includes:

· Data Presentation Format

· Collation Order

· Numeric Format

· Currency Format

· Time Format

· Date Format

· Bitmaps

· Icons

· Geometry

· All strings to be displayed to the user

Localization is the process by which this information is prepared for access by the application in the context of a given language.


Note: Not everything can be localized. Source code elements such as identifiers, resource names, and instance names must remain C-readable.

In the course of writing an application that will work in different languages, you must make decisions concerning coding, input techniques, and output methods and formats. Builder Xcessory has several features that assist you in producing an internationalized application.

Character Sets and Code Sets

The practical problem with I18N arises from the different methods that languages use to represent their linguistic elements. In particular, ideographic languages, which can contain thousands of individual glyphs, cannot be fully represented using standard 8-bit code sets.

Character set

A character set is the set of all the characters required to represent the words in the language.

Code set

A code set is the set of binary values required to represent the character set of a language.

ISO8859-1/Latin-1

ISO8859-1 (also called Latin-1), which contains the ASCII codes, is the standard code set for the representation of English, as well as several other alphabetic languages.

Code sets correspond to various languages or areas. For example:

Language or Area            Code Set
English, Western Europe     ISO8859-1
Eastern Europe              ISO8859-2
Northern Europe             ISO8859-4
Cyrillic                    ISO8859-5
Greek                       ISO8859-7
Hebrew                      ISO8859-8
Japan                       JIS X 0201, JIS X 0208, JIS X 0212
Korea                       KSC5601.1987-0

 

X and Motif I18N Support

When X11R5 and Motif 1.2 were released, they contained new data types, function calls, fonts, and modifications to widgets to handle the localization of graphical user interface (GUI) applications. These additions provide a method by which you can create applications, with one set of source code, that can run in any one of several alternative languages, depending on the locale. Although these additions enable I18N and localization, they do not facilitate them.

Types of problems

When internationalizing an application, there are three broad types of problem:

· File processing problems

How are characters encoded; how does your application deal with storage formats for dates, addresses, currency, and so on?

· Input problems

How are the characters of a complex character set (for example, ideographic sets) actually entered when it is impractical to construct a physical keyboard capable of representing every one of those characters?

· Output problems

How are characters of different languages displayed within the application, especially within the same text area?

Note: Builder Xcessory uses the same features supported in generated code, so you can choose to run Builder Xcessory in another locale. This allows strings to be input with that locale's input method.

Note: A Japanese version is available. Contact your ICS Sales Representative.

Application Coding

An I18N program must operate regardless of the encoding of the characters in the user's language. A program that ignores or truncates the eighth bit of every character (as some English-based applications do) will not work in Europe, which requires eight bits to represent accented characters. Similarly, an application that assumes that every character is eight bits long will not work in Japan, where there are many thousands of ideographic characters. In addition, you cannot assume a single character size, because Japanese commonly intermixes 16-bit Japanese characters with 8-bit Latin characters.
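
The point about mixed character sizes can be illustrated with a short C sketch. It counts characters rather than bytes by asking the C library how long each character is in the current locale's encoding. The function name count_characters() is hypothetical and is not part of Builder Xcessory or Motif.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Count the characters (not bytes) in a multibyte string, using the
 * encoding of the current LC_CTYPE locale. Returns (size_t)-1 if the
 * string contains a byte sequence that is invalid in that encoding.
 * count_characters() is a hypothetical helper name. */
size_t count_characters(const char *s)
{
    mbstate_t state;
    size_t count = 0;
    size_t remaining;

    assert(s != NULL);
    memset(&state, 0, sizeof state);
    remaining = strlen(s);

    while (remaining > 0) {
        /* mbrtowc() reports how many bytes the next character occupies. */
        size_t n = mbrtowc(NULL, s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* terminator reached */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

For plain ASCII text the result equals strlen(). In a multibyte locale selected with setlocale(), the count is smaller than the byte length whenever the string contains characters that occupy more than one byte.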

Adding Localization Calls to the Code

Xt support of internationalization is trivial in most applications: the only code required is a call to XtSetLanguageProc() just before the call to XtAppInitialize(). This one function call does all the set-up necessary for an Xt-based application. Some additional work is required if your application is to support internationalized text output or input (as explained in Input methods).

Builder Xcessory makes the XtSetLanguageProc() function call in the code generated if you set the Initialize Localization Support toggle in the Code Generation Preferences dialog.

Localization support is always enabled in ViewKit applications.

 

Locales

A locale is the language environment determined by the application at run time. The X/Open Portability Guide, Issue 3 (XPG3) defines locale as a means of specifying three characteristics of a language environment that might be required for localization:

· Language

· Territory

· Code set

These elements are represented by many systems in the following format:

<language>_<TERRITORY>.<code set>

For example, ja_JP.ujis is the representation for Japanese, as used in Japan with the UJIS code set.
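
As a minimal sketch of how such a name breaks down, the following C function splits a locale name into its language, territory, and code set parts. The structure and function names are hypothetical, and the sketch assumes the common <language>_<TERRITORY>.<code set> layout described above; systems that use other naming schemes would need different handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the three parts of a locale name. */
struct locale_parts {
    char language[32];
    char territory[32];
    char codeset[32];
};

/* Split "language_TERRITORY.codeset" into its parts. Missing parts are
 * left as empty strings. Returns 0 on success, -1 if a part is too long. */
int split_locale_name(const char *name, struct locale_parts *out)
{
    const char *underscore, *dot, *lang_end, *terr_end;
    size_t len;

    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof *out);        /* also terminates every string */

    dot = strchr(name, '.');
    underscore = strchr(name, '_');
    /* An underscore after the dot belongs to the code set. */
    if (underscore != NULL && dot != NULL && underscore > dot)
        underscore = NULL;

    lang_end = underscore ? underscore : (dot ? dot : name + strlen(name));
    len = (size_t)(lang_end - name);
    if (len >= sizeof out->language) return -1;
    memcpy(out->language, name, len);

    if (underscore != NULL) {
        terr_end = dot ? dot : name + strlen(name);
        len = (size_t)(terr_end - (underscore + 1));
        if (len >= sizeof out->territory) return -1;
        memcpy(out->territory, underscore + 1, len);
    }
    if (dot != NULL) {
        len = strlen(dot + 1);
        if (len >= sizeof out->codeset) return -1;
        memcpy(out->codeset, dot + 1, len);
    }
    return 0;
}
```

Given ja_JP.ujis, the function fills in ja, JP, and ujis. A bare name such as C yields only a language part.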

Using locales

At start-up, the application must find the correct UID file in either the DECW$USER_DEFAULTS or DECW$SYSTEM_DEFAULTS directories. See the DECwindows Motif Release Notes for more information.

As an example of using locales, assume you have an application with two different language versions. You set your XUSERFILESEARCHPATH (see Static output) to the following (all of which should appear on one line):

/usr/lib/X11/app-defaults/%l/%t/%c/%N:
/usr/lib/X11/app-defaults/%l/%t/%N:
/usr/lib/X11/app-defaults/%l/%N:
/usr/lib/X11/app-defaults/%L/%N:
/usr/lib/X11/app-defaults/%N

This search path allows your application to search from the most specific designation of the locale to the least specific, allowing you a high degree of flexibility in configuring your application. See the explanation of "XtResolvePathname" in Vol. 5 of the O'Reilly & Assoc. X Toolkit Intrinsics Reference Manual for an explanation of the various substitutions and how they relate to the locale designation.

Most vendors' implementations of X automatically include locale-specific searches in the default search path environment variables, so setting these variables is usually unnecessary.

Example

If you are using UIL for your application, you can use a similar mechanism to search for the compiled UIL (UID) files.

Set UIDPATH to the following (all of which should appear on one line):

<pathname>/%l/%t/%c/%U:<pathname>/%l/%t/%U:
<pathname>/%l/%U:<pathname>/%L/%U:
<pathname>/%U:

At start-up, the application finds the correct UID file based on how your UIDPATH is set. See the description of MrmOpenHierarchy in the OSF Motif 1.2 Programmer's Reference for further details.

Locales in Builder Xcessory

The application must have a way of recognizing the language environment in which it is running. Based on this information, the application can then make adjustments such as allowing the display of strings in the appropriate language. Builder Xcessory provides a code generation option to enable this support in your application.


Note: ViewKit applications always have localization support enabled by the VkApp object.

Enabling localization

To enable localization, follow these steps:

1. Select Code Generation Preferences from the Browser Options menu.

2. Click the Application tab on the Code Generation Preferences dialog (see Code Generation Preferences Dialog).

3. Set the Initialize Localization Support toggle, and dismiss the dialog.

When the main C or C++ file is written, the following line is included:

(void) XtSetLanguageProc (NULL,
(XtLanguageProc)NULL,
NULL);

Builder Xcessory inserts code into the main routine to initialize the toolkit I18N features. When the code is compiled and run, the toolkit examines the LANG environment variable to determine the current locale, and then initializes its internal routines to deal with locale-specific issues.

For example, if the user has set the LANG environment variable to ja_JP.ujis, the application automatically initializes support for Japanese language character input and display, as well as monetary and date output, and so forth.

Code Generation Preferences Dialog

Setting locales

If the Initialize Localization Support toggle is not set, you can manually set the locale in your application using the ANSI C function setlocale(). The call setlocale(LC_ALL, "") sets all locale-specific information to the default taken from the environment (typically the LANG environment variable). The call setlocale(LC_COLLATE, "ja_JP.ujis") sets only the collation order to that of the Japanese locale with the UJIS code set.
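
The two uses of setlocale() described above can be sketched as follows. This is a minimal illustration, not code generated by Builder Xcessory; the helper names are hypothetical, and the fallback to the "C" locale is an assumption about what an application might reasonably do when the requested locale is not installed.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Select the locale named by the environment (LANG and the LC_*
 * variables). If that locale is not installed, fall back to the
 * portable "C" locale. Returns the name of the locale now in effect.
 * init_locale_from_environment() is a hypothetical helper name. */
const char *init_locale_from_environment(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, "C");   /* always available */
    return name;
}

/* Change only the collation order, leaving other categories alone.
 * Returns 1 if the requested locale was accepted, 0 otherwise. */
int set_collation_locale(const char *locale_name)
{
    assert(locale_name != NULL);
    return setlocale(LC_COLLATE, locale_name) != NULL;
}
```

Checking the return value matters because setlocale() returns NULL and leaves the locale unchanged when it cannot honor the request.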

The function setlocale() is called from the default language procedure installed by XtSetLanguageProc(), described in detail in Asente, Converse, and Swick's X Window System Toolkit. XtSetLanguageProc() also initializes the toolkit internationalization techniques, connects to the input method (if necessary), and sets various defaults for the current locale, as specified by the LANG environment variable.


Note: The default language procedure can be replaced using XtSetLanguageProc() with arguments other than NULL. Refer to O'Reilly & Associates' X Window System, Volumes 4 and 5, for a more detailed discussion of Xt language procedures.

Locale-friendly coding

When coding an application to be run in multiple locales, consider the following issues:

· Use ANSI C or C++; Kernighan & Ritchie 1 implementations may not support locale-aware string manipulation.

· Use strcoll() rather than strcmp() for string comparisons.
The strcmp() routine assumes an ASCII character set when doing string comparisons; strcoll() has no such limitation and can deal with locale-specific character encoding and sorting order.

· Do not make assumptions about word order in strings, and do not make comparisons to specific, hardcoded, characters.

· Use strftime() rather than ctime() or asctime().
strftime() formats times and dates according to the locale; ctime() and asctime() always produce one fixed format.

· Use ctype(3) library routines to identify character ranges.

· Use wchar_t, rather than char, as the type for string processing.
The char type allocates a fixed size (8 or 16 bit, depending on the architecture) for a character. This is not enough to hold the characters of some locales, particularly the ideographic languages. The wchar_t type allocates size sufficient to hold the largest character in the current locale. This can be grossly inefficient, so you should only use wchar_t for operations that index arrays of characters.

· Do not hardcode decimal separators in parsing or arithmetic operations. Some languages use a comma rather than a period for a decimal separator.

· Do not embed pixmap graphics in code; specific colors and graphics can have different meanings from locale to locale.
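
Several of the guidelines above can be sketched in a few lines of standard C. The example below is an illustration only: the function names are hypothetical, and the date format ("%x", the locale's preferred date representation) is one reasonable choice among several that strftime() supports.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort() comparator that orders strings by the current locale's
 * collation rules instead of raw character codes. */
static int compare_by_locale(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sort an array of strings in locale order. */
void sort_strings(const char **strings, size_t count)
{
    assert(strings != NULL || count == 0);
    qsort(strings, count, sizeof *strings, compare_by_locale);
}

/* Format a date in the locale's preferred style ("%x").
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t format_date(char *buf, size_t size, const struct tm *when)
{
    assert(buf != NULL && when != NULL);
    return strftime(buf, size, "%x", when);
}

/* Return the locale's decimal separator rather than assuming ".". */
const char *decimal_separator(void)
{
    return localeconv()->decimal_point;
}
```

All three functions follow whatever locale was last selected with setlocale(), so the same binary sorts, formats dates, and parses numbers differently depending on the user's environment.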

Placing Resource Values in Resource or UIL Files

Your application should not explicitly code any language-dependent information. 2 This includes strings, fonts, and language-dependent pixmaps. In order to do this, the Open Group (formerly OSF) suggests that these resources be placed in message catalogs, resource files or UIL files.

Builder Xcessory allows resources to be placed in resource files. Once a resource is set, you can choose (on an individual resource or resource class basis) whether that resource is set in the code or in a resource file.

Individually, this choice is made with the Resource Placement menu, to the right of the text field used to enter the resource value (see Resource Editor Placement Settings).

Code

· Code indicates that the resource is hard-coded with calls to XtSetValues, and is the default.

App

· App indicates that the resource is placed in an app-defaults file, which can be edited to produce localized versions of the application.

Resource Editor Placement Settings

Resource values can also be placed in a resource file on a type basis by setting the resource's default resource placement. In the placement window, types can be specified to be put, by default, into Code or App (resource file).

Example

For example, if you want to place all compound strings and fonts into resource files, follow these steps:

1. Select Default Resource Placement from the Resource Editor Options menu to display the Default Resource Type dialog:

Default Resource Type Dialog

2. Scroll the dialog to find the Compound String and Font types, and set the App toggle for each (see Default Resource Type Settings).

Default Resource Type Settings

Generating Multiple UIL Files

Builder Xcessory also allows you to generate single or multiple UIL files, providing another language-independent way of specifying resource values that can be used to internationalize an application. To save resources in a UIL file, set the placement of each resource to Code and, when generating code, generate UIL instead of C or C++.


Note: You can only generate UIL files if generating an application in C (not in C++).

Once you generate a resource (or UIL) file, you must make copies of the file for each language supported and modify the contents accordingly. Then, using the locale of the machine and the environment variables LANG, UIDPATH, and XUSERFILESEARCHPATH, the different resource or UIL files are read and used by the application at run time.

For more information on this topic, refer to the OSF/Motif Programmer's Guide, Release 1.2, pages 11-26 to 11-31, and to X Window System Toolkit, pages 433-436.


1. Kernighan, Brian & Ritchie, Dennis. The C Programming Language, 2nd ed. (Englewood Cliffs, NJ: Prentice Hall PTR, 1988).

2. OSF/Motif Programmer's Guide, Release 1.2, page 11-21; PTR Prentice Hall, Englewood Cliffs, New Jersey 07632

Text Input

An internationalized program must be able to display all the characters used in the user's language, and must allow the user to specify all those characters as input. When there are more characters in a language than there are keys on a keyboard, some sort of "input method" is required to convert multiple keystrokes to single characters.

Input methods

An input method is a mapping between keyboard input and the text data passed to the application. Such a mapping exists even within the familiar context of ISO8859-1 where, for example, the combination of the <Ctrl> or <Alt> key and a letter translates into a letter with a special accent mark: ü, é, and so forth.


Note: Within the 7-bit ASCII characters, there are no accented characters. However, ISO8859-1 is a superset of ASCII extending the code set to 8 bits, and includes accented characters and symbols.

The concept of an input method is especially important for ideographic languages. Review Chapter 11 of the OSF/Motif Programmer's Guide, Release 1.2, for a detailed discussion of the different aspects of input methods and how they are supported by Motif.

Motif support of input methods

Builder Xcessory assumes that you have access to an input method. Input methods are available in Motif 1.2. Prior to Motif 1.2 and X11R5, input methods were proprietary additions, and no standard existed. Builder Xcessory supports the use of X11R5-style input methods exclusively.

Using input methods

Input methods allow your users to enter text in their native language. There are several input methods available from hardware vendors and third party software vendors. The X source code distribution also includes a few sample implementations. They run as separate processes alongside the internationalized applications.


Note: Multiple input methods can run simultaneously for any number of internationalized applications.

Integrating with an input method

To allow your application to use a currently running input method, follow these steps:

1. Make sure that your locale is set correctly. Many platforms have dialogs to simplify this process, but usually setting the LANG environment variable is sufficient.

2. Make certain your application calls XtSetLanguageProc() before it calls XtAppInitialize(). This ensures that X/Motif localization support is properly initialized.

3. Create an application defaults file for your application that sets the font list resources to the font set appropriate for your locale.

4. Start your application.

The input method, with the locale set and the correct X11R5 calls in your application, communicates with your application automatically.

Text Output

An internationalized application displays all text in the user's chosen language. This includes prompts, error messages, and text on buttons, menus, and other widgets. The simplest approach to this requirement is to remove all strings that are to be displayed from the source code of the application and store them in a file that will be read when the application starts up. That file can then be translated into various languages, with the appropriate version being read at start-up.

In addition, an internationalized application must display times, dates, numbers, and so on, in the format that the user expects. For example, Americans expect dates in the form month/day/year, the British expect day/month/year, and Germans expect day.month.year.

Setting Up Localized Output

Most languages use words from different languages, and some require the word's native character set to be used. For example, a Japanese application might require error messages with a mix of Hiragana, Katakana, and Kanji characters, as well as some technical terms that require a Latin character set. This means that one string could contain five words using Kanji characters and one word using Latin characters, requiring two different fonts.


Note: It is possible to display characters of multiple character sets within the same output string using compound strings and font lists.

Compound Strings

A compound string is a byte stream in ASN.1 encoding, consisting of tag-length-value segments.

Motif uses compound strings in many widgets. A compound string is used to set labels on label and button widgets as well as the contents of lists in Motif. These compound strings hold all information related to a string, including the text, direction and font used to display the string.

Compound string components

A compound string can contain the following components:

· Text of the string

· Separators (or line breaks)

· Font list tags

· Direction identifiers

Compound String Editor

The Compound String Editor allows strings to have multiple fonts and directions, and allows for a connection to an X input method. For more information on the Compound String Editor, refer to Compound String Editor.

Motif does not supply a String-to-XmString converter that understands font list tags or direction information. Builder Xcessory provides a String-to-XmString converter in the bxutils file, which supports Builder Xcessory style ASCII representations of compound strings.

Template code

The template code for the converter is in the file:

{BX}/xcessory/gen/common/bxutils.c

ASCII compound string format

The ASCII compound string format is:

:: [#tag][:t][:r]["xxxxx"]

where:

tag is font tag (or charset).

:t is set when a separator is requested.

:r is set when the string is displayed from right to left.

"xxxxx" is the text portion of the string.

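To make the format concrete, here is a simplified C sketch that parses a single segment written in this style. It is not the converter shipped in bxutils.c; the structure and function names are hypothetical, and the sketch assumes no whitespace between fields, no escape sequences inside the quoted text, and font tags that contain neither colons nor quotes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory form of one segment. */
struct cs_segment {
    char tag[64];        /* font tag, empty if none given */
    int  separator;      /* non-zero if :t was present */
    int  right_to_left;  /* non-zero if :r was present */
    char text[256];      /* text between the quotes, empty if none */
};

/* Parse one segment of the form ::[#tag][:t][:r]["text"].
 * Returns 0 on success, -1 on malformed or oversized input. */
int parse_cs_segment(const char *s, struct cs_segment *out)
{
    size_t len;

    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    if (strncmp(s, "::", 2) != 0)
        return -1;
    s += 2;

    if (*s == '#') {                       /* optional font tag */
        s++;
        len = strcspn(s, ":\"");
        if (len == 0 || len >= sizeof out->tag) return -1;
        memcpy(out->tag, s, len);
        s += len;
    }
    if (strncmp(s, ":t", 2) == 0) { out->separator = 1; s += 2; }
    if (strncmp(s, ":r", 2) == 0) { out->right_to_left = 1; s += 2; }

    if (*s == '"') {                       /* optional text */
        const char *end = strrchr(s + 1, '"');
        if (end == NULL) return -1;
        len = (size_t)(end - (s + 1));
        if (len >= sizeof out->text || end[1] != '\0') return -1;
        memcpy(out->text, s + 1, len);
        return 0;
    }
    return *s == '\0' ? 0 : -1;
}
```

Under these assumptions, a segment written as ::#bold:t"Hi" yields the tag bold, a separator request, left-to-right direction, and the text Hi.
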
Creating Multifont and Multidirectional Strings

A font list is a resource type that can be a single font or a font set. Font sets were introduced in X11R5 and Motif 1.2 (as the XFontSet). A font set is treated as a single entry in a font list, but contains all the fonts required to display all the characters of a locale. Internal X, Motif, and C routines are used to encode the font information for displaying a given string.

Font List Editor

The Font List Editor in Builder Xcessory supports the specification of font sets as well as regular fonts (see Font List Editor). For more information on the Font List Editor, refer to Font List Editor.

Font List Editor

You create the different font sets and tag them, so that they can be used in defining multifont strings.


Note: A font set is defined by the locale in which the program is running. For example, a font set that works in a Korean locale will not work in a Japanese locale.

Each font set member can use a different encoding, family, size, and so forth, and there are no limits on how many fonts can be defined for a given font set.

Once a font set has been added to a font list, it can be used like any other entry in the list. Motif and X control how to use the various fonts in the font set.

bxutils file

Although Motif 1.2 supports multiple entry font lists containing both fonts and font sets, Motif does not supply a String-to-XmString converter that understands font list tags or direction information. Builder Xcessory provides such a converter in the bxutils file. This converter is installed in your application by a call to RegisterBxConverters(). The converter supports a special textual representation of all the information encoded in a compound string: font tag, direction, and so on. This allows you to quickly create multifont and multidirectional strings and use them with the Motif widget set, even though they are not supported in Motif 1.2. This code is completely portable and OS independent.

Generating Localized Files

The following sections discuss the generation of files for an internationalized application. Static and dynamic output are considered for both UIL and C/C++ generation. In static output, strings are created when the source code is generated. In dynamic output, strings are incorporated at run time by reference to a separate source.

Generating UIL

Static and dynamic output

Static and dynamic output are handled in the same manner when generating UIL. Typically, App-defaults files are not used. Instead, the application maintains a list of messages in a separate UIL file for each locale. When the application is built, the appropriate UIL is compiled into a UID and then used.

Use the Constant Manager and File Manager to create each of the locale-specific UIL files:

· Code all output (for example, warning, error, and informational messages) as constants.

· Set the file placement for this output to the appropriate UIL file. The application then fetches the value of the constant from the UID at run time.

Generating C and C++

Your application handles static and dynamic output differently in C/C++.

Static output

When generating C/C++, save Strings and XmStrings, as well as other locale-specific information such as fonts and colors, in app-defaults. To do this, you can take advantage of the X environment variables XFILESEARCHPATH and XUSERFILESEARCHPATH.

The following table lists the substitutions X and Motif allow in these paths:

Platform   Substitution   Description
X          %N             Value of the filename parameter, or the application's class name if filename is NULL
           %T             Value of the type parameter
           %S             Value of the suffix parameter
           %L             Language string associated with the specified display
           %l             Language part of the display's language string
           %t             Territory part of the display's language string
           %c             Code set part of the display's language string
Motif      %U             Value of the UID filename parameter

See the section "Finding File Names" in X Window System Toolkit by Paul J. Asente and Ralph R. Swick (p. 860 in ISBN 1-55558-051-3) for further details.
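
The effect of the locale-related substitutions can be sketched with a small C function that expands one path template. This is an illustration of the idea only. It does not attempt to reproduce everything XtResolvePathname() does, it handles only %N, %L, %l, %t, and %c, and the function name and the class name MyApp used in the usage note are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Expand %N, %L, %l, %t and %c in one path template.
 * Other %-sequences are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_path_template(const char *tmpl, const char *app_class,
                         const char *language, const char *territory,
                         const char *codeset, const char *full_locale,
                         char *out, size_t out_size)
{
    size_t used = 0;

    assert(tmpl != NULL && out != NULL && out_size > 0);

    while (*tmpl != '\0') {
        const char *piece = NULL;
        char literal[3] = { 0, 0, 0 };
        size_t len;

        if (tmpl[0] == '%' && tmpl[1] != '\0') {
            switch (tmpl[1]) {
            case 'N': piece = app_class;   break;
            case 'L': piece = full_locale; break;
            case 'l': piece = language;    break;
            case 't': piece = territory;   break;
            case 'c': piece = codeset;     break;
            default:                       /* unknown: keep as written */
                literal[0] = '%';
                literal[1] = tmpl[1];
                piece = literal;
                break;
            }
            tmpl += 2;
        } else {
            literal[0] = *tmpl++;          /* ordinary character */
            piece = literal;
        }
        if (piece == NULL)
            piece = "";                    /* treat a missing value as empty */
        len = strlen(piece);
        if (used + len >= out_size)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the locale ja_JP.ujis and the class name MyApp, the template /usr/lib/X11/app-defaults/%l/%t/%c/%N expands to /usr/lib/X11/app-defaults/ja/JP/ujis/MyApp, and /usr/lib/X11/app-defaults/%L/%N expands to /usr/lib/X11/app-defaults/ja_JP.ujis/MyApp.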

Dynamic output

If you plan to use C or C++ instead of UIL to handle string output, write your application so that dynamic output uses a message catalog. A message catalog is a method for storing and fetching strings to/from external sources.

Message catalog options

You have three options for a message catalog:

· Use the MNLS standard

· Use the XPG standard

· Design your own


Note: To use one of the first two methods, your operating system must support it. Different operating systems require different storage locations for the catalog.

Designing Your Own Message Catalog

Designing your own message catalog is your only choice if you plan to run on a set of operating systems that do not share a common standard.

Example

The following procedure is one of many ways of designing your message catalog:

1. Use the Berkeley DBM library, which, given a key or a tag string, fetches the corresponding message string.

2. Represent the tag string in Builder Xcessory as constants.

3. Use the DBM string fetching routines, passing the tag string and a default message string as parameters. If the lookup of the message catalog yields no match for the tag string, display the default message string.
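
The lookup-with-default behavior in step 3 can be sketched as follows. To keep the example self-contained, an in-memory table stands in for the DBM database; the structure, function, and tag names are hypothetical, and a real implementation would replace the loop with a keyed DBM fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One catalog entry: a tag string and its translated message. */
struct message_entry {
    const char *tag;
    const char *message;
};

/* Look up tag in a catalog; return fallback when there is no match.
 * In a DBM-backed version, the loop would be replaced by a keyed fetch. */
const char *lookup_message(const struct message_entry *catalog, size_t count,
                           const char *tag, const char *fallback)
{
    size_t i;

    assert(tag != NULL && fallback != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(catalog[i].tag, tag) == 0)
            return catalog[i].message;
    }
    return fallback;
}
```

Passing the default message with every lookup means the application still displays sensible text when a catalog is missing or incomplete for the current locale.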

Suggested Reading

There are many considerations when designing an internationalized application. The following documents are useful references for I18N design issues:

· OSF/Motif Programmer's Guide, pp. 11-21. Prentice-Hall, 1993. (ISBN 0-13-643107-0)

· O'Donnell, Sandra Martin. Programming for the World: A Guide to Internationalization. Prentice Hall, 1994. (ISBN 0-13-722190-8)

· Scheifler, Robert W. and James Gettys. X Window System-Extension Libraries. Digital Press, 1997. (ISBN 1-55558-146-3)

· X/Open Portability Guide. 3rd ed. Prentice Hall, 1989. (ISBN 0-13-685868-6)

· Tuthill, Bill and David Smallberg. Creating Worldwide Software: Solaris International Developer's Guide. Prentice Hall, 1997. (ISBN 0-13-031063-8)

· Aldersey-Williams, Hugh. Nationalism and Globalism in Design. Rizzoli International Publications, 1992. (ISBN 0-84-781461-0)

· Nye, Adrian. Xlib Programming Manual. 3rd ed. O'Reilly and Associates, 1992. Volume 1 in the O'Reilly X Series. (ISBN 1-56-592002-3)

· Nye, Adrian and Tim O'Reilly. X Toolkit Intrinsics Programming Manual, Motif Edition. O'Reilly and Associates, 1992. Volume 4 in the O'Reilly X Series. (ISBN 1-56-592013-9)

· Heller, Dan and Paula M. Ferguson. Motif Programming Manual. 2nd ed. O'Reilly and Associates, 1994. Volume 6A in the O'Reilly X Series. (ISBN 1-56-592016-3)

· Asente, Paul, Donna Converse, and Ralph Swick. X Window System Toolkit. Digital Press, 1997. (ISBN 1-55558-178-1)

 
