MozExpTool 0.91 - A Localisation tool for the Mozilla 5.0 project

1. Introduction

MozExpTool is the tool that we are using at SoftCatalà to produce the Catalan version of Mozilla. This tool
has been written for our internal use and we are releasing the code because we think that it may be
useful for other people creating localised versions of Mozilla.

MozExpTool allows you to create text files (we call them glossaries) from dtd/xul files. You can give these
glossaries to translators rather than have them work directly with dtd/xul files. When the translators have
finished, you can reimport their translations into the dtd/xul files.

In addition to simplifying the translator's work, MozExpTool allows you to leverage (that is, recover) previous
translations. You can easily create a glossary file from the old version and then reimport it into the new set of
dtd/xul files. When leveraging, we take into account the context of each translation, that is, the file that it
comes from and the name of the entity.

If you want to download MozExpTool, these are the files:

mozexptool091src.zip - The complete Visual C++ 6 project
mozexptool091.zip  -  The binary file for win32 platform

2. Functionality

2.1 Exporting the localisable text

This function generates a plain text file with all the localisable text contained in the Mozilla resource files.
This is useful if you need to spell check the text once the translations are finished, or if you want to generate a
list of all the English terms as the basis for a glossary. Because this option reports the word count
for every file, it can also be used during the localisation analysis to identify the files that contain
localisable text.

For example, to export the localisable text of all the *.dtd files located in d:\testing\mozilla into
a file called export.txt you should type:

mozexptool -wd:\testing\mozilla\*.dtd -oexport.txt

If you want to export the localisable text in the file downloadprogress.xul into the file down.txt,
you should type:

mozexptool -xdownloadprogress.dtd -odown.txt
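
To give an idea of what exporting involves, here is a minimal Python sketch that pulls the entity
texts out of DTD content and counts their words. It is not MozExpTool's own code (the tool is a
C++ project); the names extract_entities and count_words are hypothetical, and the sketch assumes
simple <!ENTITY name "text"> declarations with no nested quotes of the same kind.

```python
import re

# Matches declarations such as: <!ENTITY browserCmd.label "New Browser Window">
# Assumption: the value is wrapped in double or single quotes and does not
# contain the quote character that wraps it.
ENTITY_RE = re.compile(r"""<!ENTITY\s+([\w.\-]+)\s+(?:"([^"]*)"|'([^']*)')\s*>""")


def extract_entities(dtd_text):
    """Return a list of (entity_name, text) pairs found in dtd_text."""
    pairs = []
    for match in ENTITY_RE.finditer(dtd_text):
        name = match.group(1)
        # Group 2 holds a double-quoted value, group 3 a single-quoted one.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((name, value))
    return pairs


def count_words(pairs):
    """Count whitespace-separated words over all entity texts."""
    return sum(len(text.split()) for _, text in pairs)


sample = """
<!ENTITY browserCmd.label "New Browser Window">
<!ENTITY closeCmd.label 'Close'>
"""

entities = extract_entities(sample)
for name, text in entities:
    print(name, "=", text)
print("words:", count_words(entities))
```

A per-file word count of zero in such a report would suggest that the file holds no localisable text.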

2.2 Simulating a translation

This option modifies all the given files to simulate a localised version of the program. The
function SimLocStr implements the string localisation routine. Right now it just adds some text
to each string, which makes it easy to identify hardcoded text. You can change this routine to implement a more
accurate simulation of your language.

mozexptool -s:d:\testing\mozilla\*.dtd
mozexptool -s:d:\testing\mozilla\*.xul

This is what my current build looks like: [screenshot not included]

Note: since we do not support property files yet, some of the strings that appear as not
localised may come from those files.
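
The idea behind the simulation can be sketched in a few lines of Python. This is only an
illustration: sim_loc_str is a hypothetical stand-in for SimLocStr, and the "[LOC]" marker is an
assumption, since the text that the real routine adds is not described here.

```python
def sim_loc_str(text, marker="[LOC]"):
    """Hypothetical stand-in for SimLocStr: tag a string as 'localised'.

    The marker value is an assumption made for this sketch.
    """
    if not text:
        return text  # leave empty strings untouched
    return marker + text


def find_hardcoded(ui_strings, marker="[LOC]"):
    """Return the strings that lack the marker.

    Strings without the marker did not come from the processed resource
    files, so they are candidates for hardcoded (or unsupported) text.
    """
    return [s for s in ui_strings if s and not s.startswith(marker)]


# Strings as they might appear in the user interface after a simulation run.
seen_in_ui = [sim_loc_str("Open"), sim_loc_str("Close"), "Preferences"]
print(find_hardcoded(seen_in_ui))
```

Any string that shows up without the marker after a simulation run deserves a closer look.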

2.3 Leveraging from an ASCII delimited glossary

If you have localised Netscape 4.x you can use this option to recover your previous translation. Use your
favourite localisation tool to generate an ASCII delimited glossary following this format:

"source term", "target term", "other stuff"

This file should contain one term per line: the first field is the source term and the second is the
localised one. Any fields after the second one are ignored. The double quote (") is assumed to be
the text delimiter where one is needed, for example around a term that contains a comma.

You can modify the source code to add support for your variation of the standard delimited
glossary format. See the function LoadDelimitedForLeverage.
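
As a rough illustration of the format, the Python sketch below reads such lines into a
source-to-target table. It is not a port of LoadDelimitedForLeverage; the function name
load_delimited_glossary is hypothetical, and keeping the first translation when a source term is
repeated is an assumption of this sketch.

```python
import csv


def load_delimited_glossary(lines):
    """Return {source_term: target_term} from delimited glossary lines.

    Fields after the second one are ignored, as described above.
    """
    glossary = {}
    # skipinitialspace lets the reader accept a space after each comma,
    # as in: "source term", "target term", "other stuff"
    for fields in csv.reader(lines, skipinitialspace=True):
        if len(fields) < 2:
            continue  # blank or incomplete line
        source, target = fields[0], fields[1]
        # Assumption: the first translation seen for a source term wins.
        glossary.setdefault(source, target)
    return glossary


sample_lines = [
    '"File", "Fitxer", "menu"',
    '"Cut, copy", "Retalla, copia"',
]
for source, target in load_delimited_glossary(sample_lines).items():
    print(source, "->", target)
```

The second sample line shows why the quotes matter: without them the comma inside the term would be
read as a field separator.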

For example, if you have your Netscape 4.x translations in a delimited ASCII glossary and
you want to leverage them over your copy of Mozilla located at d:\testing\mozilla you
should type:

mozexptool.exe -kglos46.txt -ld:\testing\mozilla\*.dtd

We recommend removing the hotkeys when generating the glossary.

2.4 Creating a glossary

In order to deliver files to translators we need a format that is more suitable than editing DTD or XUL
files directly. MozExpTool creates a text file from a single xul/dtd file or a group of them. This file can be
easily translated, and the translations can then be imported back into the original files.

The glossary file has the following format:

1,"browserCmd.label","bin\chrome\navigator\locale\en-US\viewSource.dtd"
"New Browser Window"
"New Browser Window"

The first line contains the context information that MozExpTool uses to reimport the translations.
The number at the beginning of the line is simply a sequential number that gives the translators an easy way
to identify each translation unit contained in the glossary. This number is there for reference purposes
only; it is not used as part of the context.

The second line contains the source term, usually in English. It is always included so that translators and
proof-readers can see the source term. The source term is also used as a reference during the leverage.

The third line contains the translated term, which by default is equal to the source term. The translator
should simply replace it with the proper translation.

This is what this translation unit will look like when it's translated:

1,"browserCmd.label","bin\chrome\navigator\locale\en-US\viewSource.dtd"
"New Browser Window"
"Nova finestra del navegador"

The translator can use all the characters required to write the target language. They can
be converted to the proper entities later, when we leverage from the translated glossary. See
entities.h for a complete list of supported entities.
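
The character-to-entity conversion can be pictured with the following Python sketch. It is an
assumption-laden illustration: it uses the HTML 4 entity names from Python's standard library,
which may not match the list in entities.h, and the function name to_entities is hypothetical.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace non-ASCII characters with named or numeric entities.

    Assumption: HTML 4 entity names are acceptable; the set actually
    supported by MozExpTool is the one listed in entities.h.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII is kept as-is
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")  # e.g. &agrave;
        else:
            out.append("&#%d;" % code)  # fall back to a numeric reference
    return "".join(out)


print(to_entities("Català"))
```

With this approach the translator types the accented characters directly and never has to deal
with entity names.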

It is very important that the translators do not delete the context information, the source term
or any other part of the translation unit. The translator should translate only the last line and
preserve the original format of the file.
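
To make the three-line structure concrete, here is a small Python sketch that reads translation
units and rejects a glossary whose structure has been damaged. It is an illustration only: the
names Unit, unquote and parse_glossary are hypothetical, and the sketch assumes that blank lines
between units can be ignored and that each text line is wrapped in one pair of double quotes.

```python
import csv
from collections import namedtuple

Unit = namedtuple("Unit", "number entity filename source target")


def unquote(line):
    """Strip one pair of surrounding double quotes, if present."""
    line = line.strip()
    if len(line) >= 2 and line[0] == '"' and line[-1] == '"':
        return line[1:-1]
    return line


def parse_glossary(text):
    """Parse glossary text into a list of Unit tuples.

    Raises ValueError when a unit is incomplete, which is what happens
    if a translator deletes the context line or the source term.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("glossary is damaged: expected 3 lines per unit")
    units = []
    for i in range(0, len(lines), 3):
        fields = next(csv.reader([lines[i]]))  # number, entity, filename
        if len(fields) != 3:
            raise ValueError("bad context line: %r" % lines[i])
        number, entity, filename = fields
        units.append(Unit(int(number), entity, filename,
                          unquote(lines[i + 1]), unquote(lines[i + 2])))
    return units


sample = r'''
1,"browserCmd.label","bin\chrome\navigator\locale\en-US\viewSource.dtd"
"New Browser Window"
"Nova finestra del navegador"
'''

for unit in parse_glossary(sample):
    print(unit.number, unit.entity, "->", unit.target)
```

A check like this can be run on a glossary as soon as it comes back from the translator, before any
reimport is attempted.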

Let's look at a real-world example. Suppose we want to create a glossary for a translator, taking as the source
group of files the dtd files located at d:\mozilla\bin, and store the glossary in a file called translator.txt.
If our original set of English Mozilla files is in d:\testing\mozilla.org\ we should type:

mozexptool.exe -ud:\testing\mozilla.org\*.dtd -xd:\mozilla\bin\*.dtd -otranslator.txt

It is very important that you always keep a copy of the original Mozilla resource files
so that you can use them as the source reference during the glossary generation. As soon as you install
the build that you want to localise, make a copy of all the xul/dtd files in another directory
at the same directory level and with the same directory structure.
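
One possible way to make that reference copy is sketched below in Python. This is not part of
MozExpTool; the function name backup_resources and the paths in the usage comment are hypothetical,
and the sketch assumes the destination is a sibling directory, not a directory inside the source
tree.

```python
import os
import shutil


def backup_resources(src_root, dst_root, extensions=(".dtd", ".xul")):
    """Copy all dtd/xul files from src_root to dst_root.

    The directory structure below src_root is reproduced below dst_root.
    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue  # not a localisable resource file
            source = os.path.join(dirpath, name)
            rel = os.path.relpath(source, src_root)
            target = os.path.join(dst_root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)
            copied.append(rel)
    return sorted(copied)


# Example call (hypothetical paths):
# backup_resources(r"d:\testing\mozilla", r"d:\testing\mozilla.org")
```

Because the relative paths are preserved, the copy and the working build keep identical directory
structures.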

2.5 Leveraging from exported glossaries

Once the translator has finished the translation you can reimport it easily. For example,
to import a file called trans.txt into the Mozilla files again, just type:

mozexptool -ld:\testing\mozilla\*.dtd -ltrans.txt

This assumes that d:\testing\mozilla is the directory where you keep the files.

When doing the leverage, MozExpTool uses the entity name and the filename to give
a context to every translation. If you wish, you can use the parameter -z so that the filename
is not used as part of the context; in that case only the entity name is used.
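
The effect of the context on the lookup can be sketched as follows in Python. This is an
illustration of the idea and not the tool's implementation: the names build_index and lookup are
hypothetical, the file paths are made up for the example, and keeping the first translation when
two units share a key is an assumption.

```python
def context_key(entity, filename, use_filename=True):
    """Build the lookup key; without the filename only the entity counts."""
    return (entity, filename) if use_filename else (entity,)


def build_index(units, use_filename=True):
    """Map context keys to translations.

    units is an iterable of (entity, filename, target) tuples.
    """
    index = {}
    for entity, filename, target in units:
        # Assumption: on a key collision the first translation is kept.
        index.setdefault(context_key(entity, filename, use_filename), target)
    return index


def lookup(index, entity, filename, use_filename=True):
    """Return the translation for this context, or None if there is none."""
    return index.get(context_key(entity, filename, use_filename))


old_units = [("browserCmd.label", r"navigator\viewSource.dtd",
              "Nova finestra del navegador")]

# The entity has moved to another file in the new build (made-up path).
strict = build_index(old_units, use_filename=True)
loose = build_index(old_units, use_filename=False)
print(lookup(strict, "browserCmd.label", r"navigator\navigator.dtd"))
print(lookup(loose, "browserCmd.label", r"navigator\navigator.dtd",
             use_filename=False))
```

Dropping the filename recovers more translations when files have been renamed or moved, at the
price of possibly reusing a translation in a context where the same entity name means something
else.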

2.6 Leveraging from one build to another

Imagine that you have a complete or partially translated build and you want to recover
the translations into a new build. You should follow these steps:

1. Make sure that both builds are in directories at the same level with identical
directory structures. Use the parameter -z if for any reason the names of the files
have changed or your directory structures do not match.

2. Generate a glossary from the old build following the steps described in section
2.4.

3. Now, leverage the glossary that you have just created from the old build into
the new build following the steps described in section 2.5.

Do not forget to repeat this process for both xul and dtd files if your build contains
localisable data in both. You can use the -a parameter if you want to leverage your localised
keyboard mapping settings.

3. Notes


4. Enhancements, bug fixes, new versions

Known issues:

- MozExpTool has only been tested with ISO-8859-1.
- It only compiles on Win32, but with minor changes you should be able to compile it on
other platforms, because I kept away from Win32-specific functionality.
- It has only been tested with Mozilla build 1999071417 and M9.
- There is no support for property files yet.
- To specify more than one file, the path should end with a wildcard of the form *.extension, for example
d:\test\*.xul, d:\test\*.ppr, etc.
 

We will continue to develop MozExpTool to cover our needs regarding the localisation of
Mozilla 5.0 into Catalan. We will always endeavour to fix any reported bug.

Jordi Mas
jmas@softcatala.org