<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                   "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">

<book id="localedb">

<bookinfo>
   <title>X Locale Database Specification</title>
   <authorgroup>
      <author>
         <firstname>Yoshio</firstname><surname>Horiuchi</surname>
         <affiliation><orgname>IBM Japan</orgname></affiliation>
      </author>
   </authorgroup>
   <copyright><year>1994</year><holder>IBM Corporation</holder></copyright>
   <copyright><year>1994</year><holder>X Consortium</holder></copyright>


<legalnotice>

<para>
License to use, copy, modify, and distribute this software and its documentation for
any purpose and without fee is hereby granted, provided that the above copyright notice
appear in all copies and that both that copyright notice and this permission notice
appear in supporting documentation, and that the name of IBM not be used in advertising
or publicity pertaining to distribution of the software without specific, written
prior permission.
</para>
<para>
IBM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS, AND NONINFRINGEMENT OF THIRD PARTY RIGHTS,
IN NO EVENT SHALL IBM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
</para>

<para>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files
(the &ldquo;Software&rdquo;), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the following
conditions:
</para>

<para>
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
</para>

<para>
THE SOFTWARE IS PROVIDED &ldquo;AS IS&rdquo;, WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN
NO EVENT SHALL THE X CONSORTIUM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
</para>

<para>
Except as contained in this notice, the name of The Open Group shall not
be used in advertising or otherwise to promote the sale, use or other dealings
in this Software without prior written authorization from X Consortium.
</para>

<para>X Window System is a trademark of The Open Group.</para>

</legalnotice>
</bookinfo>

<chapter id="localeDatabase">
<title>LocaleDB</title>

<sect1 id="General">
<title>General</title>
<para>
An X Locale Database contains the subset of a user's environment in the
X Window System that depends on language.  It is made up of one or more
categories.  Each category consists of classes and subclasses.
</para>

<para>
It is provided as a plain ASCII text file, so a user can change its
contents easily.  It allows a user to customize the behavior of the
internationalized portion of Xlib without changing Xlib itself.
</para>

<para>
This document describes:
</para>

<itemizedlist>
  <listitem>
    <para>
Database Format Definition
    </para>
  </listitem>
  <listitem>
    <para>
Contents of the database in the sample implementation
    </para>
  </listitem>
</itemizedlist>

<para>
Since it is hard to define the set of required information for all
platforms, only a flexible database format is defined.
The entries available in the database are implementation dependent.
</para>

</sect1>
<sect1 id="Database_Format_Definition">
<title>Database Format Definition</title>
<para>
The X Locale Database contains one or more category definitions.
This section describes the format of each category definition.
</para>

<para>
A category definition consists of one or more class definitions.
Each class definition is either a pair of a class name and a class value,
or a class name followed by subclasses enclosed by the left brace ({) and
the right brace (}).
</para>

<para>
Comments are introduced by the number sign character (#).
A number sign at the beginning of a line indicates
that the entire line is a comment.  A number sign preceded by any
whitespace character indicates that the rest of the line
(from the number sign to the end of the line) is a comment.
A line can be continued by placing a backslash (\) character as the
last character on the line; this continuation character is
discarded from the input.  Comment lines cannot be continued on
a subsequent line using an escaped newline character.
</para>
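
<para>
The comment and continuation rules above can be sketched in Python as
follows.  This is only an illustration, not the Xlib implementation: the
function name 'logical_lines' is hypothetical, and the sketch assumes that
number signs never appear inside quoted strings or after an escaping
backslash, and that a line carrying a comment is never continued.
</para>
<literallayout class="monospaced"><![CDATA[
```python
import re

# A comment starts at '#' in the first column, or at a whitespace
# character that is immediately followed by '#'.
_COMMENT = re.compile(r"(^|[ \t])#")


def logical_lines(text):
    """Yield logical lines with comments removed and continuations joined."""
    pending = ""  # text collected from lines that ended in a backslash
    for raw in text.splitlines():
        match = _COMMENT.search(raw)
        if match:
            # Drop the comment; a line with a comment is not continued.
            line = pending + raw[:match.start()]
            pending = ""
        elif raw.endswith("\\"):
            # Discard the continuation character and keep collecting.
            pending += raw[:-1]
            continue
        else:
            line = pending + raw
            pending = ""
        if line.strip():
            yield line.strip()
    if pending.strip():
        yield pending.strip()


sample = "\n".join([
    "# whole-line comment",
    "ct_encoding   JISX0208.1983-0:GL;\\",
    "              JISX0208.1983-0:GR   # trailing comment",
    "length        2",
])
for line in logical_lines(sample):
    print(line)
```
]]></literallayout>
<para>
The sketch works on physical lines first and joins them afterwards, so
that a trailing comment ends the logical line it belongs to.
</para>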

<para>
The X Locale Database accepts only XPCS, the X Portable Character Set.
The reserved symbols are: the quotation mark ("), the number sign (#),
the semicolon (;), the backslash (\), the left brace ({) and
the right brace (}).
</para>

<para>
The format of a category definition is:
</para>

<informaltable frame="none">
  <tgroup cols='3' align='left'>
  <colspec colname='c1' colwidth="3*" colsep="0"/>
  <colspec colname='c2' colwidth="1*" colsep="0"/>
  <colspec colname='c3' colwidth="6*" colsep="0"/>
  <tbody>
    <row rowsep="0">
      <entry>CategoryDefinition</entry>
      <entry>::=</entry>
      <entry>CategoryHeader CategorySpec CategoryTrailer</entry>
    </row>
    <row rowsep="0">
      <entry>CategoryHeader</entry>
      <entry>::=</entry>
      <entry>CategoryName NL</entry>
    </row>
    <row rowsep="0">
      <entry>CategorySpec</entry>
      <entry>::=</entry>
      <entry>{ ClassSpec }</entry>
    </row>
    <row rowsep="0">
      <entry>CategoryTrailer</entry>
      <entry>::=</entry>
      <entry>"END" Delimiter CategoryName NL</entry>
    </row>
    <row rowsep="0">
      <entry>CategoryName</entry>
      <entry>::=</entry>
      <entry>String</entry>
    </row>
    <row rowsep="0">
      <entry>ClassSpec</entry>
      <entry>::=</entry>
      <entry>ClassName Delimiter ClassValue NL</entry>
    </row>
    <row rowsep="0">
      <entry>ClassName</entry>
      <entry>::=</entry>
      <entry>String</entry>
    </row>
    <row rowsep="0">
      <entry>ClassValue</entry>
      <entry>::=</entry>
      <entry>ValueList | "{" NL { ClassSpec } "}"</entry>
    </row>
    <row rowsep="0">
      <entry>ValueList</entry>
      <entry>::=</entry>
      <entry>Value | Value ";" ValueList</entry>
    </row>
    <row rowsep="0">
      <entry>Value</entry>
      <entry>::=</entry>
      <entry>ValuePiece | ValuePiece Value</entry>
    </row>
    <row rowsep="0">
      <entry>ValuePiece</entry>
      <entry>::=</entry>
      <entry>String | QuotedString | NumericString</entry>
    </row>
    <row rowsep="0">
      <entry>String</entry>
      <entry>::=</entry>
      <entry>Char { Char }</entry>
    </row>
    <row rowsep="0">
      <entry>QuotedString</entry>
      <entry>::=</entry>
      <entry>""" QuotedChar { QuotedChar } """</entry>
    </row>
    <row rowsep="0">
      <entry>NumericString</entry>
      <entry>::=</entry>
      <entry>"\o" OctDigit { OctDigit }</entry>
    </row>
    <row rowsep="0">
      <entry></entry>
      <entry>|</entry>
      <entry>"\d" DecDigit { DecDigit }</entry>
    </row>
    <row rowsep="0">
      <entry></entry>
      <entry>|</entry>
      <entry>"\x" HexDigit { HexDigit }</entry>
    </row>
    <row rowsep="0">
      <entry>Char</entry>
      <entry>::=</entry>
      <entry>&lt;XPCS except NL, Space or unescaped reserved symbols&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>QuotedChar</entry>
      <entry>::=</entry>
      <entry>&lt;XPCS except unescaped """&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>OctDigit</entry>
      <entry>::=</entry>
      <entry>&lt;character in the range of "0" - "7"&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>DecDigit</entry>
      <entry>::=</entry>
      <entry>&lt;character in the range of "0" - "9"&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>HexDigit</entry>
      <entry>::=</entry>
      <entry>&lt;character in the range of "0" - "9", "a" - "f", "A" - "F"&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>Delimiter</entry>
      <entry>::=</entry>
      <entry>Space { Space }</entry>
    </row>
    <row rowsep="0">
      <entry>Space</entry>
      <entry>::=</entry>
      <entry>&lt;space&gt; | &lt;horizontal tab&gt;</entry>
    </row>
    <row rowsep="0">
      <entry>NL</entry>
      <entry>::=</entry>
      <entry>&lt;newline&gt;</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

<para>
Elements separated by a vertical bar (|) are alternatives.  Curly
braces ({...}) indicate zero or more repetitions of the enclosed
elements.  Square brackets ([...]) indicate that the enclosed element
is optional.  Quotes ("...") are used around literal characters.
</para>
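
<para>
To show how the grammar nests, the following Python sketch reads one
category definition into a dictionary.  It is a simplified illustration
under stated assumptions, not the parser used by Xlib: the names
'parse_block' and 'parse_category' are hypothetical, comments and
continuation lines are assumed to have been resolved already, and quoted
strings and escape characters are not handled.
</para>
<literallayout class="monospaced"><![CDATA[
```python
def parse_block(lines, pos, closer):
    """Parse ClassSpec lines from lines[pos:] until a line starts with closer.

    Returns the parsed classes and the index of the closing line.
    """
    classes = {}
    while pos < len(lines):
        fields = lines[pos].split(None, 1)
        if not fields:  # blank line
            pos += 1
            continue
        if fields[0] == closer:
            return classes, pos
        name = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""
        if value == "{":
            # ClassValue ::= "{" NL { ClassSpec } "}"
            classes[name], pos = parse_block(lines, pos + 1, "}")
        else:
            # ClassValue ::= ValueList, with values separated by ";"
            classes[name] = [v.strip() for v in value.split(";")]
        pos += 1
    raise ValueError("missing %r" % closer)


def parse_category(text):
    """Return (category name, classes) for one category definition."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip())
    name = lines[start].strip()
    classes, end = parse_block(lines, start + 1, "END")
    if lines[end].split() != ["END", name]:
        raise ValueError("trailer does not match category name")
    return name, classes


SAMPLE = """XLC_FONTSET
fs0     {
        charset              ISO8859-1:GL
        font                 ISO8859-1:GL; JISX0201.1976-0:GL
}
END XLC_FONTSET
"""
print(parse_category(SAMPLE))
```
]]></literallayout>
<para>
Subclasses are stored as nested dictionaries and value lists as Python
lists, which mirrors the two alternatives of ClassValue.
</para>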

<para>
A backslash that is not the first character of a NumericString is
recognized as an escape character, so that the next character is
treated as a literal character.  For example, the two-character
sequence \" (a backslash followed by a quotation mark) is
recognized and replaced with a quotation mark character.
Any whitespace character that is not a Delimiter and is neither quoted
nor escaped is ignored.
</para>
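
<para>
The two roles of the backslash can be illustrated with a short Python
sketch.  The helper names 'numeric_value' and 'unescape' are hypothetical,
and the sketch assumes the single-backslash spelling of a NumericString
that appears in the example values of this document, such as \x1b.
</para>
<literallayout class="monospaced"><![CDATA[
```python
import re

# A NumericString: a backslash, a base letter, then one or more digits.
_NUMERIC = re.compile(r"\\(?:o([0-7]+)|d([0-9]+)|x([0-9a-fA-F]+))\Z")


def numeric_value(token):
    """Return the integer value of a NumericString such as \\x1b."""
    match = _NUMERIC.match(token)
    if match is None:
        raise ValueError("not a NumericString: %r" % token)
    octal, decimal, hexadecimal = match.groups()
    if octal is not None:
        return int(octal, 8)
    if decimal is not None:
        return int(decimal, 10)
    return int(hexadecimal, 16)


def unescape(text):
    """Replace each backslash-escaped character with the character itself."""
    return re.sub(r"\\(.)", r"\1", text, flags=re.S)


print(numeric_value(r"\x1b"), numeric_value(r"\o33"), numeric_value(r"\d27"))
print(unescape(r'say \"hi\"'))
```
]]></literallayout>
<para>
A caller would try 'numeric_value' first and fall back to 'unescape' only
for pieces that are not a NumericString, since the leading backslash of a
NumericString is not an escape character.
</para>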

</sect1>
<sect1 id="Contents_of_Database_">
<title>Contents of Database </title>
<para>
The available categories and classes depend on the implementation, because
different platforms require different sets of information.
For example, some platforms have a system locale and others do not.
Furthermore, functionality may differ even among platforms that
have a system locale.
</para>

<para>
In the current sample implementation, the categories listed below are available.
</para>

<informaltable frame="none">
  <tgroup cols='2' align='left'>
  <colspec colname='c1' colwidth="1*" colsep="0"/>
  <colspec colname='c2' colwidth="2*" colsep="0"/>
  <tbody>
    <row rowsep="0">
      <entry>XLC_FONTSET</entry>
      <entry>XFontSet-related information</entry>
    </row>
    <row rowsep="0">
      <entry>XLC_XLOCALE</entry>
      <entry>Character classification and conversion information</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

</sect1>
<sect1 id="XLC_FONTSET_Category">
<title>XLC_FONTSET Category</title>
<para>
The XLC_FONTSET category defines XFontSet-related information.
It contains the CHARSET_REGISTRY-CHARSET_ENCODING name and the character
mapping side (GL, GR, etc.), and is used by the Output Method (OM).
</para>

<informaltable frame="none">
  <tgroup cols='3' align='left'>
  <thead>
  <colspec colname='c1' colwidth="3*" colsep="0"/>
  <colspec colname='c2' colwidth="1*" colsep="0"/>
  <colspec colname='c3' colwidth="3*" colsep="0"/>
    <row>
      <entry>class</entry>
      <entry>super class</entry>
      <entry>description</entry>
    </row>
  </thead>
  <tbody>
    <row rowsep="0">
      <entry>fsN</entry>
      <entry></entry>
      <entry>Nth fontset (N=0,1,2, ...)</entry>
    </row>
    <row rowsep="0">
      <entry>charset</entry>
      <entry>fsN</entry>
      <entry>list of encoding names</entry>
    </row>
    <row rowsep="0">
      <entry>font</entry>
      <entry>fsN</entry>
      <entry>list of font encoding names</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

<variablelist>
  <varlistentry>
    <term>fsN</term>
    <listitem>
      <para>
Contains the encoding information for the Nth charset, where N is
the index number (0,1,2,...).  If 4 charsets are available
in the current locale, 4 fontsets, fs0, fs1, fs2 and fs3, should be
defined.
This class has two subclasses, 'charset' and 'font'.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>charset</term>
    <listitem>
      <para>
Specifies the encoding information to be used internally in Xlib
for this fontset.  The format of the value is:
      </para>
<informaltable frame="none">
  <tgroup cols='3' align='left'>
  <colspec colname='c1' colwidth="3*" colsep="0"/>
  <colspec colname='c2' colwidth="1*" colsep="0"/>
  <colspec colname='c3' colwidth="4*" colsep="0"/>
  <tbody>
    <row rowsep="0">
      <entry>EncodingInfo</entry>
      <entry>::=</entry>
      <entry>EncodingName [ ":" EncodingSide ]</entry>
    </row>
    <row rowsep="0">
      <entry>EncodingName</entry>
      <entry>::=</entry>
      <entry>CHARSET_REGISTRY-CHARSET_ENCODING</entry>
    </row>
    <row rowsep="0">
      <entry>EncodingSide</entry>
      <entry>::=</entry>
      <entry>"GL" | "GR"</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

<para>
For the detailed definition of CHARSET_REGISTRY-CHARSET_ENCODING, refer to
the "X Logical Font Descriptions" document.
</para>
<literallayout>
example:
     ISO8859-1:GL
</literallayout>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>font</term>
    <listitem>
      <para>
Specifies a list of encoding information that is used to search for an
appropriate font for this fontset.  The leftmost entry has the highest
priority.
      </para>
    </listitem>
  </varlistentry>
</variablelist>
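
<para>
As an illustration of the EncodingInfo format and of the priority order of
the 'font' list, the following Python sketch splits such values into an
encoding name and an optional side.  The function names are hypothetical,
and the sketch assumes that each entry of a 'font' list follows the same
EncodingInfo format as 'charset', as the sample values in this document
suggest.
</para>
<literallayout class="monospaced"><![CDATA[
```python
def parse_encoding_info(text):
    """Split 'NAME[:SIDE]' into (name, side); side is None when omitted."""
    name, separator, side = text.strip().partition(":")
    if not name:
        raise ValueError("missing encoding name in %r" % text)
    if separator and side not in ("GL", "GR"):
        raise ValueError("unknown encoding side in %r" % text)
    return name, (side if separator else None)


def parse_font_list(value):
    """Return the entries of a 'font' value, highest priority first."""
    return [parse_encoding_info(item) for item in value.split(";")]


print(parse_encoding_info("ISO8859-1:GL"))
print(parse_font_list("ISO8859-1:GL; JISX0201.1976-0:GL"))
```
]]></literallayout>
<para>
Because the leftmost entry has the highest priority, the returned list can
be searched from its first element onward when looking for a font.
</para>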

</sect1>
<sect1 id="XLC_XLOCALE_Category">
<title>XLC_XLOCALE Category</title>
<para>
The XLC_XLOCALE category defines character classification, conversion
and other character attributes.
</para>

<informaltable frame="none">
  <tgroup cols='3' align='left'>
  <colspec colname='c1' colwidth="3*" colsep="0"/>
  <colspec colname='c2' colwidth="1*" colsep="0"/>
  <colspec colname='c3' colwidth="3*" colsep="0"/>
  <thead>
    <row>
      <entry>class</entry>
      <entry>super class</entry>
      <entry>description</entry>
    </row>
  </thead>
  <tbody>
    <row rowsep="0">
      <entry>encoding_name</entry>
      <entry></entry>
      <entry>codeset name</entry>
    </row>
    <row rowsep="0">
      <entry>mb_cur_max</entry>
      <entry></entry>
      <entry>MB_CUR_MAX</entry>
    </row>
    <row rowsep="0">
      <entry>state_depend_encoding</entry>
      <entry></entry>
      <entry>state dependent or not</entry>
    </row>
    <row rowsep="0">
      <entry>wc_encoding_mask</entry>
      <entry></entry>
      <entry>for parsing wc string</entry>
    </row>
    <row rowsep="0">
      <entry>wc_shift_bits</entry>
      <entry></entry>
      <entry>for conversion between wc and mb</entry>
    </row>
    <row rowsep="0">
      <entry>csN</entry>
      <entry></entry>
      <entry>Nth charset (N=0,1,2,...)</entry>
    </row>
    <row rowsep="0">
      <entry>side</entry>
      <entry>csN</entry>
      <entry>mapping side (GL, etc)</entry>
    </row>
    <row rowsep="0">
      <entry>length</entry>
      <entry>csN</entry>
      <entry>length of a character</entry>
    </row>
    <row rowsep="0">
      <entry>mb_encoding</entry>
      <entry>csN</entry>
      <entry>for parsing mb string</entry>
    </row>
    <row rowsep="0">
      <entry>wc_encoding</entry>
      <entry>csN</entry>
      <entry>for parsing wc string</entry>
    </row>
    <row rowsep="0">
      <entry>ct_encoding</entry>
      <entry>csN</entry>
      <entry>list of encoding names for ct</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

<variablelist>
  <varlistentry>
    <term>encoding_name</term>
    <listitem>
      <para>
Specifies the codeset name of the current locale.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>mb_cur_max</term>
    <listitem>
      <para>
Specifies the maximum number of bytes in a multi-byte character.
It corresponds to MB_CUR_MAX of the "ISO/IEC 9899:1990 C Language Standard".
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>state_depend_encoding</term>
    <listitem>
      <para>
Indicates whether the current locale is state dependent.  The value should be
"True" or "False".
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>wc_encoding_mask</term>
    <listitem>
      <para>
Specifies a bit-mask for parsing a wide-character string.  A bitwise AND of
each wide character and this bit-mask is computed, and the result is
compared with 'wc_encoding' to classify the character into a unique charset.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>wc_shift_bits</term>
    <listitem>
      <para>
Specifies the number of bits to be shifted when converting from a multi-byte
character to a wide character, and vice versa.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>csN</term>
    <listitem>
      <para>
Contains the character set information for the Nth charset, where N is the
index number (0,1,2,...).  If 4 charsets are available in the current
locale, cs0, cs1, cs2 and cs3 should be defined.  This class has five
subclasses, 'side', 'length', 'mb_encoding', 'wc_encoding' and 'ct_encoding'.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>side</term>
    <listitem>
      <para>
Specifies the mapping side of this charset.  The format of this value is:
      </para>
      <literallayout>
   Side    ::=  EncodingSide[":Default"]
      </literallayout>
      <para>
The suffix ":Default" is optional.  It indicates that a character
belonging to the specified side is mapped to this charset in the initial state.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>length</term>
    <listitem>
      <para>
Specifies the number of bytes in a multi-byte character of this charset.
It should not include the length of any single-shift sequence.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>mb_encoding</term>
    <listitem>
      <para>
Specifies a list of shift sequences for parsing a multi-byte string.
The format of this value is:
      </para>
<informaltable frame="none">
  <tgroup cols='3' align='left'>
  <colspec colname='c1' colwidth="3*" colsep="0"/>
  <colspec colname='c2' colwidth="1*" colsep="0"/>
  <colspec colname='c3' colwidth="5*" colsep="0"/>
  <tbody>
    <row rowsep="0">
      <entry>MBEncoding</entry>
      <entry>::=</entry>
      <entry>ShiftType ShiftSequence</entry>
    </row>
    <row rowsep="0">
      <entry></entry>
      <entry>|</entry>
      <entry>ShiftType ShiftSequence ";" MBEncoding</entry>
    </row>
    <row rowsep="0">
      <entry>ShiftType</entry>
      <entry>::=</entry>
      <entry>"&lt;SS&gt;"|"&lt;LSL&gt;"|"&lt;LSR&gt;"</entry>
    </row>
    <row rowsep="0">
      <entry>ShiftSequence</entry>
      <entry>::=</entry>
      <entry>SequenceValue|SequenceValue ShiftSequence</entry>
    </row>
    <row rowsep="0">
      <entry>SequenceValue</entry>
      <entry>::=</entry>
      <entry>NumericString</entry>
    </row>
  </tbody>
  </tgroup>
</informaltable>

      <literallayout>
example:
     &lt;LSL&gt; \x1b \x28 \x4a; &lt;LSL&gt; \x1b \x28 \x42
      </literallayout>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>wc_encoding</term>
    <listitem>
      <para>
Specifies an integer value for parsing a wide-character string.
It is used to determine the charset of each wide character after
a bitwise AND with 'wc_encoding_mask' has been applied.
This value should be unique among all csN classes.
      </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>ct_encoding</term>
    <listitem>
      <para>
Specifies a list of encoding information that can be used for Compound
Text.
      </para>
    </listitem>
  </varlistentry>
</variablelist>
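
<para>
The interaction of 'wc_encoding_mask' and 'wc_encoding' can be sketched in
Python as follows.  The mask and the per-charset values are the ones used
for cs0, cs1 and cs2 in the sample database of this document; the function
name 'classify' and the wide-character values passed to it are hypothetical
and serve only as an illustration.
</para>
<literallayout class="monospaced"><![CDATA[
```python
# Values taken from the ja_JP.euc sample in this document.
WC_ENCODING_MASK = 0x00008080
WC_ENCODING = {
    "cs0": 0x00000000,
    "cs1": 0x00008080,
    "cs2": 0x00000080,
}


def classify(wide_char):
    """Return the csN class whose wc_encoding matches the masked value."""
    masked = wide_char & WC_ENCODING_MASK
    for name, encoding in WC_ENCODING.items():
        if masked == encoding:
            return name
    return None  # no charset claims this bit pattern


# Hypothetical wide-character values, chosen only to exercise each case.
for value in (0x0041, 0xB0A1, 0x00B1, 0x8041):
    print(hex(value), classify(value))
```
]]></literallayout>
<para>
The lookup is unambiguous only because each 'wc_encoding' value is unique
among the csN classes, which is why the description above asks for that
uniqueness.
</para>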
</sect1>

<sect1 id="Sample_of_X_Locale_Database">
<title>Sample of X Locale Database</title>
<para>
The following is a sample X Locale Database file.
</para>

<literallayout class="monospaced">
#  XLocale Database Sample for ja_JP.euc
#

#
#      XLC_FONTSET category
#
XLC_FONTSET
#      fs0 class (7 bit ASCII)
fs0     {
        charset              ISO8859-1:GL
        font                 ISO8859-1:GL; JISX0201.1976-0:GL
}
#      fs1 class (Kanji)
fs1     {
        charset              JISX0208.1983-0:GL
        font                 JISX0208.1983-0:GL
}
#      fs2 class (Half Kana)
fs2     {
        charset              JISX0201.1976-0:GR
        font                 JISX0201.1976-0:GR
}
#      fs3 class (User Defined Character)
# fs3     {
#        charset             JISX0212.1990-0:GL
#        font                JISX0212.1990-0:GL
# }
END XLC_FONTSET

#
#      XLC_XLOCALE category
#
XLC_XLOCALE

encoding_name             ja.euc
mb_cur_max                3
state_depend_encoding     False

wc_encoding_mask          \x00008080
wc_shift_bits             8

#      cs0 class
cs0     {
        side                 GL:Default
        length               1
        wc_encoding          \x00000000
        ct_encoding          ISO8859-1:GL; JISX0201.1976-0:GL
}
#      cs1 class
cs1     {
        side                 GR:Default
        length               2

        wc_encoding          \x00008080

        ct_encoding          JISX0208.1983-0:GL; JISX0208.1983-0:GR;\
                             JISX0208.1983-1:GL; JISX0208.1983-1:GR
}

#      cs2 class
cs2     {
        side                 GR
        length               1
        mb_encoding          &lt;SS&gt; \x8e

        wc_encoding          \x00000080

        ct_encoding          JISX0201.1976-0:GR
}

#      cs3 class
# cs3     {
#         side               GL
#         length             2
#         mb_encoding        &lt;SS&gt; \x8f
# #if HasWChar32
#         wc_encoding        \x20000000
# #else
#         wc_encoding        \x00008000
# #endif
#         ct_encoding        JISX0212.1990-0:GL; JISX0212.1990-0:GR
# }

END XLC_XLOCALE
</literallayout>
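
<para>
The 'mb_encoding' value of cs2 in the sample above follows the MBEncoding
format described for the XLC_XLOCALE category.  As a rough sketch, such a
value could be decoded in Python as shown below.  The function names are
hypothetical, and the sketch assumes that every SequenceValue denotes a
single byte.
</para>
<literallayout class="monospaced"><![CDATA[
```python
SHIFT_TYPES = ("<SS>", "<LSL>", "<LSR>")
_BASES = {"o": 8, "d": 10, "x": 16}


def _byte(token):
    """Convert one NumericString such as \\x8e into an integer."""
    if len(token) < 3 or token[0] != "\\" or token[1] not in _BASES:
        raise ValueError("not a NumericString: %r" % token)
    return int(token[2:], _BASES[token[1]])


def parse_mb_encoding(value):
    """Return a list of (shift type, shift sequence as bytes) pairs."""
    result = []
    for item in value.split(";"):
        fields = item.split()
        if len(fields) < 2 or fields[0] not in SHIFT_TYPES:
            raise ValueError("bad MBEncoding entry: %r" % item)
        sequence = bytes(_byte(token) for token in fields[1:])
        result.append((fields[0], sequence))
    return result


print(parse_mb_encoding(r"<SS> \x8e"))
print(parse_mb_encoding(r"<LSL> \x1b \x28 \x4a; <LSL> \x1b \x28 \x42"))
```
]]></literallayout>
<para>
The second call uses the example value given with the MBEncoding format;
both entries are locking shifts to the left side that differ only in the
final byte of the sequence.
</para>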
</sect1>

<sect1 id="Reference">
<title>Reference</title>
<para>
[1] <emphasis remap='I'>ISO/IEC 9899:1990 C Language Standard</emphasis>
</para>
<para>
[2] <emphasis remap='I'>X Logical Font Descriptions</emphasis>
</para>

</sect1>
</chapter>
</book>