Set up Translation with Memory
The Translation with Memory application in censhare Web requires configuration. This article takes you step by step through the required configuration.
Context
Translation with memory is available in censhare Web.
The configuration of Translation with memory is done in the censhare Admin Client.
Prerequisites
A license for "Translation with memory" for censhare Web is required.
Introduction
Translation with memory is a tool for professional translation. It searches for saved translation segments that match text segments in the content that you want to translate. Segments in the memory that are similar or identical to segments in your content appear as possible translations. The larger the database of saved segments, the better the system works. The primary task of the translator is to select the best matches and to edit suggestions that have a low match percentage.
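To illustrate the idea of a match percentage, here is a minimal sketch. This article does not describe censhare's matching algorithm, so the sketch substitutes Python's difflib.SequenceMatcher ratio as a stand-in similarity measure. The functions match_percentage and suggest, the threshold of 70, and the sample memory are hypothetical.

```python
from difflib import SequenceMatcher


def match_percentage(segment: str, stored: str) -> int:
    """Similarity from 0 to 100 (stand-in measure, not censhare's algorithm)."""
    ratio = SequenceMatcher(None, segment.lower(), stored.lower()).ratio()
    return round(ratio * 100)


def suggest(segment: str, memory: dict, threshold: int = 70) -> list:
    """Return (percentage, stored source, stored target) tuples, best match first."""
    hits = []
    for stored_source, stored_target in memory.items():
        pct = match_percentage(segment, stored_source)
        if pct >= threshold:
            hits.append((pct, stored_source, stored_target))
    return sorted(hits, reverse=True)


# Hypothetical memory: saved source segments mapped to confirmed translations.
memory = {
    "Close the lid before starting.": "Schließen Sie den Deckel vor dem Start.",
    "Open the lid carefully.": "Öffnen Sie den Deckel vorsichtig.",
}

for pct, src, tgt in suggest("Close the lid before starting the device.", memory):
    print(f"{pct}%  {src} -> {tgt}")
```

An identical segment scores 100; a segment with extra words still scores high enough to be offered, which is the case where the translator edits a suggestion with a lower match percentage.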
"Translation with memory" is an integrated translation-memory system. Preconfigured widgets provide the functionality that you need to complete several tasks:
Edit texts, segments, and terminology.
Display matches and statistics.
The following sections explain how the asset structure, workspace and censhare Admin Client are configured for the translation memory. This configuration is already created in censhare and usually does not need to be changed.
Key steps
Activate the Translation with Memory license in censhare Web. Without a license, only a simple translation tool is provided in censhare Web.
Activate the Inverted Index. Without the inverted index, censhare shows no suggestions from the translation memory.
Activate the Babelfish service. The Babelfish service sets up the internal translation pipeline for the Translation with Memory application. Without the Babelfish service, Translation with memory does not run.
Activate the Translation service. The service handles variables for date/time and numbers in segments to facilitate the reuse of segments. You can define your own rules. Without the translation service, Translation with memory does not run.
Add every source and target language that you want to use in translation. You can also define custom languages.
Add the respective permissions to the roles that work with Translation with Memory. Without the permissions, users do not see the translation tab in their workspace.
Create article templates for the Content Editor. Articles for Translation with Memory require a special asset structure and configuration. For example, the source asset must have a different language than the target asset. The "Asset structure" section below gives a first introduction.
Note: These steps are the minimum requirements for Translation with memory. Additional configuration can be necessary, for example, for the import and export of segments or terminology.
Asset structure
The translation memory works with text that is saved in XML format in assets. Text assets can be part of an asset structure, for example, an article or a website. Every language is created in its own text asset. The source language functions as the master for the other languages. You can link multiple languages to the source language. One language can be set up as both a source and a target language.
If a language is both a target language and a source language, the translation works as a cascade. The initial translation is from the source language into the first target language. The first target language then serves as the source language for the translation into the second target language. Both variants (one source language for all target languages, or a cascade of source and target languages) can be set up in the asset structure for Translation with memory.
The translation memory in censhare Web is available for text assets with the MIME type "XML" (ID = "text/xml") and "Adobe InCopy" (ID = "application/vnd.adobe.incopy-icml").
Translation with the censhare translation memory is directional. The translation is done from a master language into another language. This configuration is dictated by the asset structure. A "Variant with update flag" asset must be created for every language to be translated.
The following asset structure is required to work with "Translation with memory":
Asset structure with a source and two target languages. The text asset (1) contains the source language (4) for all target languages. From this asset, a "Variant with update flag" (2) is created for each target language. The translation is created in the linked text assets (3). The translation memory contains already translated segments and suggests translations for the text in the target language (5).
Asset structure with a source/target language cascade. The text asset (1) contains the source language (5) for the asset (2) with the target language (6). This works as the source language for the target language (7) in the asset (3). The assets are each linked as "Variants with update flag" (4).
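The two structures above differ only in how the "Variant with update flag" links are chained. The following sketch models that with a hypothetical TextAsset class (not a censhare API) and derives the translation directions from the links: a variant that has variants of its own acts as a source, which produces a cascade. The check that a variant's language differs from its source reflects the requirement stated in the key steps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextAsset:
    """Hypothetical stand-in for a censhare text asset (not a real censhare API)."""
    name: str
    language: str
    # Assets linked to this one as "Variant with update flag".
    variants: list[TextAsset] = field(default_factory=list)


def translation_pairs(master: TextAsset) -> list[tuple[str, str]]:
    """Follow the variant links and return the (source, target) language pairs."""
    pairs = []
    for child in master.variants:
        if child.language == master.language:
            raise ValueError(f"{child.name}: target language must differ from the source")
        pairs.append((master.language, child.language))
        # A variant with variants of its own acts as a source: a cascade.
        pairs.extend(translation_pairs(child))
    return pairs


# One source language for two target languages.
star = TextAsset("Article EN", "en", [TextAsset("Article DE", "de"), TextAsset("Article FR", "fr")])

# Cascade: English -> Scandinavian English -> English for Sweden.
cascade = TextAsset("Article EN", "en", [
    TextAsset("Article SCA", "en-SCA", [TextAsset("Article SE", "en-SE")]),
])

print(translation_pairs(star))
print(translation_pairs(cascade))
```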
For more information, see Article templates for Translation with memory.
Workspace
In censhare Web, translators work with the translation memory in a specially designed editor, not in the standard editor. The editor for Translation with memory has additional widgets that support translation tasks, for example, widgets that show terminology lists or translation suggestions.
The censhare translation memory is preconfigured. In censhare Web, the translation memory widgets are grouped on the "Translation" tab. To configure this tab, go to the "Translation memory" tab in the workspace container asset (resource key: censhare:workspace.container.translation_with_memory). The "Translation" tab contains the following widgets:
Segments: This widget contains the translation editor.
Memory: This widget lists all of the translation suggestions for the selected segments.
Terminology: This widget contains a list of terms that are found in the text.
Statistics: This widget contains reports that show the progress of the translation.
Note: The "Translation" tab and widgets are only visible to users who have the appropriate permissions.
Configuration in censhare Web
Activate Translation with memory for censhare Web
To use Translation with memory in censhare Web, you must activate the license in the System asset. If the license is not activated, users only see the "Translation without memory" application in censhare Web. For more information, see Activate the license for Translation with Memory.
Configuration in the censhare Admin Client
Activate the Translation service
Open the censhare Admin Client and go to "Configuration/Services/Translation".
To edit the service configuration, double-click the Configuration entry.
In the General setup section, select Service enabled.
If needed, define additional rules to detect certain formats of date/time and numbers. For more information, see Configure variables for Translation with memory.
You can also edit the segmentation rules for the segments. For most use cases, this is not necessary. For more information, see Translation with memory – segmentation and tag rules.
To save the configuration, click OK.
Synchronize the changed configuration with the Master server and, if present, the Remote servers.
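Why do date/time and number variables help reuse? If a segment differs from a stored one only in a number or a date, masking those values lets both map to the same memory entry. The sketch below illustrates this with two hypothetical regular-expression rules (RULES, mask_variables); the actual rule format of the Translation service is described in "Configure variables for Translation with memory" and is not reproduced here.

```python
import re

# Hypothetical rules for illustration only. Dates are handled before numbers
# so that the digits inside a date are not masked as separate numbers.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("NUMBER", re.compile(r"\b\d+(?:[.,]\d+)?\b")),
]


def mask_variables(segment: str) -> tuple[str, list[str]]:
    """Replace variable values with placeholders; return the masked text and the values.

    Values are collected rule by rule (all dates, then all numbers),
    not in their order of appearance in the segment.
    """
    found: list[str] = []
    for kind, pattern in RULES:
        def replace(match: re.Match, kind: str = kind) -> str:
            found.append(match.group(0))
            return "{" + kind + "}"
        segment = pattern.sub(replace, segment)
    return segment, found


masked_a, values_a = mask_variables("Ship 20 units on 2024-05-01.")
masked_b, values_b = mask_variables("Ship 350 units on 2025-01-15.")
print(masked_a, values_a)
print(masked_b, values_b)
```

Both segments reduce to the same masked text, so a translation confirmed for one can be suggested for the other with the values filled back in.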
Activate the Babelfish service
Open the censhare Admin Client and go to "Configuration/Services/Babelfish".
To edit the service configuration, double-click the Configuration entry.
In the General setup section, select Service enabled.
To save the configuration, click OK.
Synchronize the changed configuration with the Master server and, if present, the Remote servers.
Activate the Inverted Index
To search and read segments and terminology entries, censhare uses the censhare database (cdb). censhare only accesses the relational database (Oracle or PostgreSQL) to update or add segments or terminology entries. To find entries in the cdb, censhare uses an inverted index. This index must be activated in the censhare Admin Client:
In the censhare Admin Client, go to the directory "Configuration/Services/Inverted Index".
To edit the service configuration, double-click Configuration.
In the General setup section, select Service enabled.
To save the configuration, click OK.
Synchronize the changed configuration with the Master server and, if present, the Remote servers.
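The following toy example shows why an inverted index makes segment lookup fast: it maps each word to the segments that contain it, so a query only reads the entries for its own words instead of scanning every stored segment. SegmentIndex and tokenize are hypothetical names; the internals of the cdb inverted index are not described in this article.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a real index would use language-aware analysis."""
    return re.findall(r"\w+", text.lower())


class SegmentIndex:
    """Toy inverted index: maps each token to the IDs of the segments containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.segments: dict[int, str] = {}

    def add(self, seg_id: int, text: str) -> None:
        self.segments[seg_id] = text
        for token in tokenize(text):
            self.postings[token].add(seg_id)

    def candidates(self, query: str, min_shared: int = 1) -> list[int]:
        """IDs of segments sharing at least min_shared tokens with the query,
        most shared tokens first. Only the query tokens' posting sets are read."""
        counts: dict[int, int] = defaultdict(int)
        for token in set(tokenize(query)):
            for seg_id in self.postings.get(token, ()):
                counts[seg_id] += 1
        hits = [sid for sid, n in counts.items() if n >= min_shared]
        return sorted(hits, key=lambda sid: (-counts[sid], sid))


index = SegmentIndex()
index.add(1, "Close the lid.")
index.add(2, "Open the lid carefully.")
index.add(3, "Press the start button.")
print(index.candidates("Close the lid"))
```

The candidates found this way would then be scored for similarity; without the index, every stored segment would have to be compared, which is why censhare shows no suggestions while the service is disabled.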
Repair translation memory
A corrupted translation memory in censhare contains malformed segments, which cause errors when you export segments or update the Translation memory index. A corrupted memory can also contain duplicated segments. censhare provides a server action to detect both situations. For more information, see Repair translation memory (TMX).
Mismatch between cdb and translation memory
censhare stores the translation memory in the database. For fast access to the stored segment pairs, censhare uses the cdb. When the translation memory changes, the cdb is updated automatically.
Under some circumstances, the automatic update of the cdb fails or is not executed. As a result, there is a mismatch between translation memory stored in the database and segments stored in the cdb.
For example, there can be:
Segment pairs that exist in the translation memory but not in the cdb.
Segment pairs that were changed in translation memory but were not updated in the cdb.
Segment pairs that were deleted from the translation memory but still exist in the cdb.
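The three cases above can be expressed as simple set operations. The sketch below is only an illustration of what "mismatch" means; in censhare, the server restart or the manual index rebuild resolves it. The compare function and the sample data are hypothetical, and both dictionaries map a segment-pair ID to its content.

```python
def compare(tm: dict, cdb: dict) -> dict:
    """Classify differences between the translation memory (database) and the cdb copy."""
    return {
        # Pairs that exist in the translation memory but not in the cdb.
        "missing_in_cdb": sorted(tm.keys() - cdb.keys()),
        # Pairs that were changed in the translation memory but not updated in the cdb.
        "outdated_in_cdb": sorted(k for k in tm.keys() & cdb.keys() if tm[k] != cdb[k]),
        # Pairs that were deleted from the translation memory but still exist in the cdb.
        "stale_in_cdb": sorted(cdb.keys() - tm.keys()),
    }


tm = {101: ("Hello", "Hallo"), 102: ("Yes", "Ja"), 103: ("No", "Nein")}
cdb = {101: ("Hello", "Hallo"), 102: ("Yes", "Jawohl"), 104: ("Stop", "Halt")}
report = compare(tm, cdb)
print(report)
```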
To synchronize the translation memory and the cdb again, update the cdb.
There are two ways to do this:
Restart the censhare Server: Restarting the server automatically rebuilds the cdb and updates the translation memory index.
Rebuild the Translation Memory Index manually: The server action rebuilds the translation memory index. You do not have to shut down and restart the server for this action.
Set up additional or custom languages
Introduction
When you translate texts in censhare Web, each text asset must have a defined language. If you want to translate properties, each property must have a defined language. ISO-639 defines standard language codes for language names. For example, "en" stands for English and "de" stands for German. censhare Web uses these codes to store segments and terminology entries with the corresponding languages. These codes are also used to import and export segments in the TMX format and terminology entries in the TBX format.
Some languages have country-specific codes added. For example, "en-US" stands for American English and "en-GB" stands for British English.
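To show where these language codes end up in a TMX file, the following sketch builds a minimal TMX 1.4-style document with Python's standard xml.etree.ElementTree. Each translation unit (tu) holds one variant (tuv) per language, marked with the xml:lang attribute. This is not censhare's exporter; build_tmx is a hypothetical helper and the header values ("example-sketch" and so on) are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: list[tuple[str, str]], srclang: str, tgtlang: str) -> str:
    """Build a minimal TMX document from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # placeholder values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "example",
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


tmx_text = build_tmx([("Select a color.", "Select a colour.")], "en-US", "en-GB")
print(tmx_text)
```

Because the segments are stored under these codes, "en-US" and "en-GB" entries stay apart even though both are English.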
In some cases, there is no ISO-639 language code available for the desired language variant. For example, one team translates into English for Hong Kong and the other team translates into English for Taiwan. Both teams work in the same censhare domain. The segments and terminology entries from each team need to be kept separate. If both teams use the language code "en", all the segments and terminology entries are visible to both teams. The solution is to define a custom language code that is unique for each team.
Here is another use case: A company is active on the Scandinavian market. Before the documents are translated into English for Sweden and English for Norway, an intermediate step translates them into Scandinavian English. There are no standard language-country codes for English in Sweden or Norway, such as en-US for American English. However, there are country codes: SE for Sweden and NO for Norway. The custom language-country codes are therefore en-SE and en-NO.
Scandinavian English does not have a specific country code. The solution is to create a custom language code for Scandinavian English, for example, "en-SCA". The custom language codes ensure that the segments for the intermediate and target languages do not mix with other English entries in the database.
Define an additional or custom language
censhare comes with a basic set of languages. The default languages are defined in the Languages table in the censhare Admin Client. You can only assign a language to an asset or property if the language is defined in the Languages table. You must create the language entry before you can assign it to an asset.
A language definition in censhare consists of two parts:
A common part with the ID, names, descriptions and domains.
An Assignments part with specific language mappings for some applications.
The ID is the most important element in the language definition. This ID is also used as the language code. censhare enters this ID in the database, TMX files, or TBX files when it stores the language information.
If you add a commonly known language, use its standard ISO language code. ISO 639-1 defines the two-letter language codes; where relevant, add a two-letter country code (ISO 3166-1), for example, "en-US". If you want to define a custom language, you are free to choose a combination of characters that suits your needs. The maximum length is 256 characters.
Note: To conform with the XLIFF format, your custom language ID must follow the pattern xx-YY, where "xx" stands for two lowercase letters and "YY" for two uppercase letters.
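Before you create a custom language entry, you can check a candidate ID against the two constraints stated here: the 256-character maximum and the xx-YY pattern for XLIFF. The sketch below does this with a regular expression; check_language_id is a hypothetical helper and not part of censhare.

```python
import re

# Two lowercase letters, a hyphen, two uppercase letters (the xx-YY pattern).
XLIFF_ID = re.compile(r"[a-z]{2}-[A-Z]{2}")
MAX_ID_LENGTH = 256  # maximum length of a language ID


def check_language_id(lang_id: str) -> list[str]:
    """Return a list of problems with a custom language ID (an empty list means OK)."""
    problems = []
    if not lang_id:
        problems.append("ID is empty")
    elif len(lang_id) > MAX_ID_LENGTH:
        problems.append(f"ID is longer than {MAX_ID_LENGTH} characters")
    if not XLIFF_ID.fullmatch(lang_id):
        problems.append("ID does not follow the xx-YY pattern needed for XLIFF")
    return problems


for candidate in ("en-SE", "en-NO", "en-SCA", "EN-se"):
    print(candidate, check_language_id(candidate) or "OK")
```

Note that a valid custom ID such as "en-SCA" passes the length check but is flagged by the pattern check, so it is only suitable if you do not need XLIFF conformance.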
In the Assignments section, select your specific language entries. The language entries are mapped to the proprietary language keys of the applications. For more information, see the "Languages (Master Data)" article in the Related topics section.
Note: For technical reasons, you must define a mapping for the Translation memory application entry. The mapping is not used as a language code for the defined language in Translation with Memory. However, if the mapping is not defined, Translation with Memory in censhare Web does not work.
To create your own language definition:
Open the censhare Admin Client and then the Languages table in the Master data section.
To create a new entry, click the Plus icon in the upper toolbar.
Enter your ID for the language code, names, and descriptions.
Select the specific language entries for the different applications.
To save the language configuration, click OK.
Permissions
censhare has individual permission keys for Translation with memory. You can assign these permission keys to a permission set. For example, create a "Translation memory" permission set and assign the set to the "Translator" role.
The following permissions are available in censhare Web:
Permission key | Definition |
Translation memory all | This permission key has all of the individual permissions for the translation memory. Users with full access to the translation management tool need these permissions. |
Translation memory read-only | This permission key limits usage to read-only access to the translation memory. Assign this permission to users who do not have a translation function. Users with this permission level cannot change segments or confirm translated segments. |
Add translation memory segments | This permission key controls who can add text segments to the translation memory. Users need this permission to confirm translated segments and save translations. When a translation is confirmed, the corresponding segment is added to the translation memory. Users with this permission can create segments in the translation management tool manually. |
Delete translation memory segments | This permission key controls who can delete text segments from the translation memory. Users with access to the translation management tool need this permission to remove segments from the memory. |
Edit translation memory segments | This permission key controls who can edit text segments in the translation memory. There is always a pair of segments with two different languages that belong together. When you translate text you can edit the translation suggestions in the target language. Users with access to the translation management tool need this permission to edit segment pairs in the memory. |
Terminology all | This permission key has all of the individual permissions for terminology lists. Users with this permission key can edit, add, and delete entries in the terminology list. Users with access to the translation management tool need these permissions. |
Read-only terminology | This permission key gives users read-only access to terminology lists. Users need this permission to work with an existing terminology list. |
Note: The permissions Translation of content with memory all and Translation of content with memory read-only are reserved for work with the translation memory in the censhare Client. These permissions do not apply to censhare Web.
Result
The translation memory is ready to use.
Next steps
You have the following options:
Work with an empty memory:
The more texts you translate, the more likely it is that the translation memory finds suitable translation suggestions. If you begin with an empty memory, the translation memory does not have any suggestions for the first segments and cannot automatically translate any segments. As soon as you translate a segment into the target language and confirm the translation, censhare Web saves that segment. If the translation memory finds an identical or similar segment in another document, it suggests the saved segment as a translation.
Import text segments from another system:
If you have already worked with a translation memory system, you can import the segments from that system into censhare.
Import dictionaries from another system:
Just like segments, you can import dictionaries and glossaries from other systems into censhare.
Manually add segments and terminology:
Users with the appropriate permissions can create segments and terms in censhare manually. For more information, see the translation overview article in the "Related topics" section.
Change segmentation rules: If the automatic segmentation does not deliver the results you want, you can adjust the segmentation rules to fit your needs. For example, make the segments larger or smaller.
Change tag rules: XML elements and properties in a document have to be correctly recognized by the translation memory. Therefore, the translation memory stores information on the document structure. In the tag rules, set up how the structure and inline elements are processed for the best segmentation results.
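To give an idea of what segmentation rules decide, here is a generic sketch of break rules combined with exception (no-break) rules: text is split after sentence-ending punctuation followed by a capital letter, unless the preceding segment ends with a listed abbreviation. The rule set, BREAK, NO_BREAK_AFTER, and segment are hypothetical and do not reflect censhare's rule format.

```python
import re

# Hypothetical break rule: split after ., ! or ? when whitespace and a capital letter follow.
BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Hypothetical exception rules: never end a segment after these abbreviations.
NO_BREAK_AFTER = ("e.g.", "i.e.", "Dr.", "etc.")


def segment(text: str, no_break: tuple = NO_BREAK_AFTER) -> list[str]:
    """Split text into segments, re-joining pieces that end with an exception."""
    segments: list[str] = []
    for part in BREAK.split(text.strip()):
        if segments and segments[-1].endswith(no_break):
            # The previous piece ended with an abbreviation: continue that segment.
            segments[-1] += " " + part
        else:
            segments.append(part)
    return segments


print(segment("Close the lid. Press start! Is it on?"))
print(segment("See Dr. Smith first. Then wait."))
```

Adding or removing exceptions makes segments larger or smaller, which is the kind of adjustment the segmentation rules allow.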