Configure translation management - SysAdmin
Basic concepts of translation in marketing and product environments. Learn about translation tools in general and how to prepare your content for translation and localization.
Translation vs. (market) adaptation
censhare supports multi-language asset content and asset properties. Internal translation tools and connectors help users manage translations and multi-language content.
What is the goal of the translation?
When talking about the translation of content, the first question to answer is: “What should be achieved?”
The best possible/selling communication of a product, for example?
100% transport of the source information to the target (as in technical documentation)?
... or something in between, depending on the resources, the control over the target from central marketing …?
Should we translate or adapt the content?
Translation
When do we translate?
The target text should be a 1:1 reflection of the original text (content, style, rhythm)
The target text also reflects the demands and habits of the (foreign) language but stays close to the source [a sentence in the source language is a sentence in the target language]
The target text is, to a certain extent, independent of the original source [the sentence structure may vary to make the target text easier to understand]
Adaptation
When is a (market) adaptation necessary? Here are some examples:
Legal issues: different requirements for displaying technical values; direct comparison with competitors may not be allowed
Marketing and sales issues: The market positioning of a product may vary from country to country (more luxury in one market and sportier in another)
Customer-oriented approaches: In some markets, products are often sold as bundles or packages, while in others customers want more options to configure their “own” products
Product specifications: 110V plugs in the US, 220V plugs in Europe
Different product labels: An identical t-shirt might be labeled "M" in the US, "L" in Germany, and "XL" in France
Regional variations: Ivory painted cars are associated with taxis in Germany but not outside of Germany; “4” is an unlucky number in China while “13” is seen as unlucky in Europe/US
Cultural differences: Especially relevant for images and their connotations (showing models in swimsuits lying on a car bonnet might not be the best way of marketing cars in the Middle East)
Figurative expressions: Often cannot be translated; a different explanation is needed to convey the information
Text length: French text is about 40% longer than English text
Writing and reading direction: left to right or right to left
Word length: Some languages have longer words than others
Selling products in more than one market might lead to a vast amount of market adaptations for sales materials
Some definitions
To get a better understanding of the expressions used in the field of translation, here are some definitions:
Source = the text to be translated
Target = the translated text in another language or in the same language but in e.g. another dialect
Segment = the smallest unit to translate that is used in a translation memory (Standard: one sentence is one segment)
Master = an asset from which language variants are derived
TM = translation memory
CAT = Computer-Aided Translation (makes use of a TMS)
TMS = Translation Memory System
TMX/TBX/SRX = Exchange formats for translation memories/terminology databases/segmentation rules
Match (rate) = degree of correspondence between a segment to translate and existing segments in the TM
Alignment = feeding a TM with existing translations (normally supported by a tool)
Main parts of a TMS/CAT tool
A TMS is not Google Translate
It is important not to confuse a translation memory system (TMS) with automated translation tools like Google Translate. A TMS “learns” through usage. In the beginning, it is empty and has to be filled manually by translating texts or by importing an existing translation memory.
TMX/TBX/SRX
All these abbreviations stand for interchange formats between different TMSs. Most systems support these formats.
TMX = Translation Memory Exchange
TBX = TermBase Exchange (exchange of terminology databases)
SRX = Segmentation Rules Exchange
censhare supports these formats as they are open standards
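To make the idea of an exchange format more concrete, here is a minimal sketch that writes source/target pairs as a small TMX 1.4 document using only the Python standard library. The tool name, the header values, and the function name build_tmx are illustrative assumptions and do not describe how censhare produces TMX.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="de"):
    """Build a minimal TMX 1.4 tree from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",             # one sentence per segment
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")     # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = build_tmx([("Open the door.", "Öffnen Sie die Tür.")])
tmx_text = ET.tostring(tmx, encoding="unicode")
print(tmx_text)
```

Each tu element holds one bilingual pair, which is the same source/target pairing a TM stores internally.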
Segmentation
Let’s start with segmentation and why it is necessary: The goal of segmentation is to break the source text into chunks that are large enough to translate without losing context, yet small enough to produce matches with existing translations in your TMS.
To maximize matches, every individual word would ideally become its own segment, but this would sacrifice syntax and context. To preserve context, whole paragraphs would be desirable as segments, but this, in turn, would lead to fewer similar segments.
In short:
A segment almost always represents a sentence. These sentences or segments are stored as bilingual pairs (source and target) in the TM.
To know how to split the source into segments, segmentation rules are used. One main task is to distinguish a “.” that closes a sentence from a “.” that ends an abbreviation.
censhare uses the Okapi Framework for text segmentation. The framework is open source, so every customer can adapt the rules and also contribute to the framework. censhare itself does not provide any services for this.
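The abbreviation problem described above can be illustrated with a small sketch. This is not the Okapi implementation and not an SRX rule set; the abbreviation list and the function name segment are hypothetical, and real rule sets cover far more cases.

```python
import re

# Hypothetical abbreviation list; real segmentation rules are far more extensive.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "No."}


def segment(text):
    """Split text into sentence segments, ignoring periods that end a known abbreviation."""
    segments, start = [], 0
    # A candidate break is sentence punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1]
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)  # whatever remains after the last break
    return segments


parts = segment("Use a TM, e.g. for manuals. It saves time.")
print(parts)
```

Without the abbreviation check, the period in "e.g." would produce an extra, meaningless segment that could never match anything in the TM.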
Translation memory
The translation memory stores pairs of source and target segments. If a new source segment needs translating, the TMS checks whether there is already an identical or similar source segment in the TM. The degree of similarity between the new segment and a segment in the memory is calculated and shown as a match rate in percent. A 100% match means that an identical segment already exists in the TM.
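How a match rate can be derived is sketched below using the character-based similarity ratio from Python's difflib module. This is only one possible similarity measure and an assumption for illustration; commercial TMSs and censhare may calculate match rates differently. The names match_rate and best_matches and the 70% threshold are hypothetical.

```python
from difflib import SequenceMatcher


def match_rate(new_segment, stored_segment):
    """Return the similarity of two segments as a percentage (0 to 100)."""
    ratio = SequenceMatcher(None, new_segment, stored_segment).ratio()
    return round(ratio * 100)


def best_matches(new_segment, memory, threshold=70):
    """Return (rate, source, target) tuples at or above the threshold, best first."""
    hits = [(match_rate(new_segment, src), src, tgt) for src, tgt in memory]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)


memory = [
    ("Press the red button.", "Drücken Sie den roten Knopf."),
    ("Close the window.", "Schließen Sie das Fenster."),
]

exact = best_matches("Press the red button.", memory)    # identical segment
fuzzy = best_matches("Press the green button.", memory)  # similar segment
print(exact)
print(fuzzy)
```

The identical segment comes back as a 100% match, while the changed sentence yields a lower "fuzzy" match that the translator has to adjust.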
The TM does not know if the translation is correct or not. It's just a simple database. Therefore, it's the job of the translator to provide correct translations.
So if the same text has been translated twice in different ways, the TM shows two 100% matches whenever a new text contains that segment. The translator then has to decide which of the 100% matches to use.
Every time an asset in a target language is opened, the source is segmented and checked against the TM. In 2017.1, censhare also “remembers” the 100% matches used for the translation, so that translations that have already been used do not have to be reconfirmed.
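The situation with two competing 100% matches can be reproduced with a toy translation memory. The class below is a hypothetical in-memory model for illustration only and says nothing about how censhare stores its TM.

```python
from collections import defaultdict


class TranslationMemory:
    """Toy TM: maps each source segment to every target stored for it."""

    def __init__(self):
        self._entries = defaultdict(list)

    def store(self, source, target):
        # The TM does not judge correctness; it only skips exact duplicates.
        if target not in self._entries[source]:
            self._entries[source].append(target)

    def exact_matches(self, source):
        """Return all stored targets for an identical source segment."""
        return list(self._entries.get(source, []))


tm = TranslationMemory()
tm.store("Save your changes.", "Speichern Sie Ihre Änderungen.")
tm.store("Save your changes.", "Änderungen speichern.")

candidates = tm.exact_matches("Save your changes.")
print(candidates)
```

Both targets are returned as equally valid exact matches, so the choice between them stays with the translator.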
Terminology
A terminology database is an additional database within a TMS. In it, the translator stores words or expressions that must always be translated in the same way. The censhare terminology check verifies whether a given expression is used as often in the target as it appears in the source.
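The counting check described above could look roughly like the following sketch. It is a simplified assumption, not the censhare implementation: it compares whole-word, case-insensitive counts and ignores inflected forms, which a real terminology check would have to handle. The termbase entry and function names are hypothetical.

```python
import re


def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of a term in a text."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def check_terminology(source, target, termbase):
    """Return terms whose count in the target differs from the count in the source."""
    issues = []
    for src_term, tgt_term in termbase.items():
        expected = count_term(src_term, source)
        found = count_term(tgt_term, target)
        if expected != found:
            issues.append((src_term, tgt_term, expected, found))
    return issues


termbase = {"translation memory": "Translation Memory"}  # hypothetical entry
source = "The translation memory is empty. Fill the translation memory first."
target = "Das Translation Memory ist leer. Füllen Sie zuerst den Speicher."

issues = check_terminology(source, target, termbase)
print(issues)
```

Here the source uses the term twice but the target only once, so the pair is reported for review.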
And XLIFF?
Basically, XLIFF is also an exchange format. It is XML-based and is used to exchange translation information between TMSs. XLIFF always contains the source segments and the target segments (as pairs).
This is important when a new or different TMS has to work with an existing TM that was built with different segmentation rules.
This often happens if censhare is newly implemented alongside an existing TMS (e.g. SDL Trados).
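The pairing of source and target segments in XLIFF can be seen in the following sketch, which reads a hand-written XLIFF 1.2 sample with the Python standard library. The sample content and the function name read_pairs are assumptions for illustration; files exported by censhare or another TMS will carry more attributes and inline elements.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hand-written sample, not an export from any real system.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Open the door.</source>
        <target>Öffnen Sie die Tür.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Close the window.</source>
        <target>Schließen Sie das Fenster.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_pairs(xliff_text):
    """Return (id, source, target) for every trans-unit in an XLIFF 1.2 string."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        target = unit.findtext("x:target", default=None, namespaces=NS)
        pairs.append((unit.get("id"), source, target))
    return pairs


pairs = read_pairs(SAMPLE)
print(pairs)
```

Because the segments travel together with their translations, the receiving system does not have to re-segment the text with its own rules.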
Why and when to use a TM
Should you always use a TMS?
In most cases, yes, but it depends on:
How often identical or similar texts are used in the source
How strong the impact of (market) adaptation is on the target
How closely the translator should stick to the source text, or how free they are to find their own way of expressing the message within the text
In standard cases, a TMS is very helpful, but it also restricts the translator and the freedom to change translated text outside the TMS, because every change made outside the TMS that could improve the translation quality should also be stored in the TM. If this is not done, the TMS will keep showing the “old” translation instead of the reworked one.
Using a TMS has a deep influence on the whole process of text internationalization.
Are there any options?
censhare also allows text to be reused at the asset level. It may be worth thinking about the “direct” reuse of a product/feature description instead of translating it again with the help of a TMS. (Nevertheless, a TMS could be used to create the “first” translation.)
Also, not as an option but as an additional thought: if texts can be pre-translated before they are placed in a layout, this can provide a solid translation foundation before the reworking and adaptation to a specific page begins.
Adaptation starts with the master
Preparation of a master layout
A few ideas on how to prepare a master layout properly:
Make text boxes as large as necessary for the longest-running language (or use the respective function in Adobe InDesign to make your layout flexible)
Be aware that the length of the headlines may change the whole look of a page
Decide whether the master could simultaneously also be a published issue, or whether it’s better to have a “neutral” master that can be easily adapted (this applies to the text and the layout)
A layout perfect for print is not necessarily a layout ready for translation. Creating print layout masters for translation might be a lot more effort than only creating the final piece.
The master text
Not only the layout has to be prepared, but also the text. Especially if a TMS is used, keep some of these points in mind:
Expression variation is the friend of a good text but the enemy of a TM because every variation leads to a new entry in the segment list, and therefore to a new translation
Content has to be written in a consistent way: For the TM, there is a difference between e.g. “2$” and “2 $”, and also between “2.50$” and “2,50$”
Reusing existing source segments leads to more matches in the translation memory
A conflict of interest: Is the quality of a source text measured by its language quality or by the cost for its translation?
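The effect of inconsistent notation on the TM can be reduced by normalizing the source text before it is translated. The sketch below rewrites a few price notations into one house style; the chosen style ("2.50 $") and the function name normalize_price are assumptions, and the pattern is deliberately simple and does not cover thousands separators or other currencies.

```python
import re


def normalize_price(text):
    """Rewrite prices such as '2$', '2 $' or '2,50$' to one style ('2 $', '2.50 $')."""
    def repl(match):
        whole, frac = match.group(1), match.group(2)
        amount = whole if frac is None else f"{whole}.{frac}"
        return f"{amount} $"

    # Digits, an optional two-digit fraction after '.' or ',', optional spaces, then '$'.
    return re.sub(r"(\d+)(?:[.,](\d{2}))?\s*\$", repl, text)


variants = ["It costs 2,50$.", "It costs 2.50 $.", "It costs 2.50$."]
normalized = {normalize_price(v) for v in variants}
print(normalized)
```

Three sentences that the TM would treat as three different segments collapse into a single segment, so one stored translation covers all of them.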
The master text and formatting
Last but not least: be mindful of the format the text is stored in. The best choice would be plain text with no formatting and no control characters. Any unnecessary additional information embedded in the text (e.g. spacing, soft returns for line breaks within a sentence, bold, superscript) has to be filtered out. This keeps the segments of the text as similar as possible and does not force the translator to reproduce information that might not be necessary for the translation. The challenge is the filter: in one case it might be good to keep inline information, in other cases it may not. censhare uses a story normalizer to filter out unwanted control characters for the XLIFF export, but this is never a perfect solution.
From the TM perspective, the best way would be to have no control characters in the text.
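What such a filter does can be sketched in a few lines. This is not the censhare story normalizer, only a hypothetical stand-in: it drops simple inline tags, turns soft returns into spaces, and removes invisible control and format characters, without any of the case-by-case decisions a real filter needs.

```python
import re
import unicodedata

# Characters treated as soft breaks and replaced by a space (an assumed selection).
SOFT_BREAKS = {"\u2028", "\u000b", "\n", "\t"}


def normalize_story(text):
    """Strip inline tags and control characters, then collapse repeated spaces."""
    text = re.sub(r"</?[A-Za-z][^>]*>", "", text)  # drop simple inline tags like <b>
    cleaned = []
    for ch in text:
        if ch in SOFT_BREAKS:
            cleaned.append(" ")
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            continue  # control and format characters, e.g. the soft hyphen
        else:
            cleaned.append(ch)
    return re.sub(r" {2,}", " ", "".join(cleaned)).strip()


formatted = normalize_story("Press the <b>red</b>\u2028button.")
plain = normalize_story("Press the red button.")
print(formatted == plain)
```

After normalization, the formatted and the plain variant are the same segment, so both can reuse the same TM entry.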