Internationalisation

Internationalization :
Internationalizing a program means taking the necessary steps to make it aware of different languages and national standards. By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from using only English strings or other English-specific habits, and connected instead to generic ways of doing the same. Program developers may use various techniques to internationalize their programs, and some of them have been standardized. GNU gettext offers one of these standards.

Localization :
Localization takes place when an internationalized program is given the information needed to behave correctly with a certain language and set of cultural habits. By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all the information it needs to handle its input and output in a way that is correct for some native language and cultural habits.
This is a particularisation process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmer's disposal which allow this runtime configuration. The formal description of the specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values to special environment variables, prior to executing those programs, identifying which locale should be used.

1) What does the 'xgettext' command do?
For translating any application into the local language, we only need to translate all the message strings that occur in the source code. For this purpose the message strings have to be extracted first, and the command xgettext does precisely this. It scans all the source files given as arguments for message strings. It recognizes message strings as string literals enclosed in double quotes ("..."): with the -a option every such string is extracted, otherwise only those passed to a keyword such as gettext. By default it creates an editable file messages.po which contains all these message strings. The command, along with the options, is:

$ xgettext -a --C --force
where the various options used by us for xgettext are:
-a : extracts all strings.
-d NAME : writes the results to NAME.po instead of the default messages.po (for example, -d hello-world writes hello-world.po).
-k_ : instructs xgettext to also look for _ when searching for translatable strings (the default keywords gettext and gettext_noop are still looked for).
-s : generates sorted output and removes duplicates.
-v : tells xgettext to be verbose when it generates messages.
--C : recognizes C style comments.
--force : always writes the output file, even if no message is defined.
-j : joins messages with an existing file.

The function gettext:
When writing multilingual programs with this package, strings are "wrapped" in a function call instead of being coded directly in the source. The function is called gettext; it accepts exactly one string argument and returns a string. Despite its simplicity, gettext is very effective: the string passed as an argument is looked up in a table to find a corresponding translation.
If a translation is found, then gettext returns it; otherwise, the passed string is returned and the program will continue to use a default language. Before printing each string at runtime, we must pass it through gettext.

2) Construction of .po files :
po stands for Portable Object. The command xgettext is applied to all the C source files of the GNOME application to be translated, to obtain the .po files.
The next step is to fill the .po file with the translations. A .po file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given .po file usually pertain to a single project, and all translations are expressed in a single target language. One .po file entry has the following schematic structure:
white-space
#  translator-comments
#. automatic-comments
#: reference... (to the source code)
#, flag...
msgid untranslated-string
msgstr translated-string
The comments start with the # character and are of two kinds:
those which have some white space immediately following the #, which are created and maintained exclusively by the translator, and those which have some non-white character just after the #, which are created and maintained automatically by the command xgettext. All comments, of either kind, are optional. After white space and comments, entries show two strings: first the untranslated string as it appears in the original program sources, and then the translation of this string in the local language. The original string is introduced by the keyword msgid, and the translation by msgstr.
Maintaining the Message File:
If the source code changes, the corresponding .po file should be updated without losing any previous translation. Unfortunately, simply calling xgettext again does not work, because it overwrites the old .po file. In this case, the program tupdate comes in handy. It merges two .po files, keeping the translations already made as long as the new strings match the old ones. Its syntax is simple:
tupdate new.po old.po > latest.po
New strings will obviously still be empty in latest.po, but already translated ones will be there without the need for reprocessing. The .po files finally obtained have to be compiled into .mo files using the command msgfmt.

3) What does msgfmt do?
The .po files thus made must be compiled into a binary form. This is done by the msgfmt command, which makes the .mo files that are to be placed in the locale directory. If the environment variable LANG is set to some local language, then whenever a GNOME application is invoked, it uses the translations from the corresponding .mo file in the locale directory.
Usually the locale directory is /usr/share/locale/. If we export LANG as hi_IN, the .mo file is searched for in the directory /usr/share/locale/hi/LC_MESSAGES.
The msgfmt command, along with the options, used by us during the course of the project is:
$ msgfmt --output-file=
where '--output-file' specifies the output file.

4) Construction of .mo files :
mo stands for Machine Object. The command msgfmt is applied to all the .po files of the GNOME application to be translated, to obtain the .mo files. The .mo files are the binary, machine-readable form of the translations in the .po files.
Main changes required in any C file to make it compatible with internationalization (i18n) and localization (l10n):
We need to include the header file libintl.h, which declares gettext.
For every printf call in the C file, we need to add gettext. For example, we need to modify printf("This prints the translated text"); to printf(gettext("This prints the translated text")); To simplify this procedure, we define _(x) as a shorthand for gettext(x), so that we just need to add '_' instead of gettext.
The second most important step is to define the text domain by calling textdomain. It must be called at the beginning of the program, so that the system can select the proper .mo file according to the current locale variables. The final step is copying helloworld.mo to a suitable location, where it can be found by the gettext system. The default location is
/usr/share/locale/LL/LC_MESSAGES/ or
/usr/share/locale/LL_CC/LC_MESSAGES/, where LL is the language and CC is the country. For example, the Hindi (hi) translation should be placed in /usr/share/locale/hi/LC_MESSAGES/helloworld.mo.

Fonts:
For a clear presentation of Indian languages on the computer screen, it is essential to have appropriate fonts. ASCII is a single-byte code with only 128 characters (and a single byte can hold at most 256 values), so it cannot represent Indian scripts, which have more than 256 characters including matras.
Unicode, stored in a multibyte encoding such as UTF-8, is better suited to Indian languages.
Input Methods:
For the input methods we can use the following editors as standard (or any other editor that produces UTF-8 output):
1. gedit:- the default GNOME editor on Linux; it has very good support for writing Indian languages. Inscript keyboard is available.
2. yudit:- one of the editors that work very well on both MS Windows and GNU/Linux. Inscript, ITrans and Phonetic keyboards are available. (www.yudit.org, about 3 MB. Add Gargi (for Devanagari) and padmaa (for Gujarati) to the fonts directory.)
3. OpenOffice-1.1:- Inscript keyboard is available.

Translation Aid:
kbabel: a KDE application used to enter the translated strings, which also provides a way to build up a database of the translated parts. It is a management tool for managing authority, fuzzy translations, tracking how much work is done, providing dictionaries, etc. Inscript keyboard is available for typing, but the problem is that the display is not proper.
gtranslator: similar to kbabel. Inscript keyboard is available.
Dictionaries: OpenOffice, IIIT (Hyd), IndLinux
Anuvadak: a program on the net (www.parixa.com) that helps translate online, keeps translations up to date, provides dictionaries, etc. (www.parixa.com/anu)

Modes of Communication:
We can communicate through email or on paper (for those not connected to the Internet, the strings to be translated can be distributed on paper).