A translator's perspective.
AI and machine learning are gaining relevance in the daily life of translators, and SAP linguists have always been at the forefront of computer-aided translation. Machine translation, which is a form of AI and, in its neural version, leverages machine learning, is not a new endeavour within SAP. The company embraced machine translation from the early days, moving from rule-based and statistical machine translation to neural machine translation.
Machine Translation (MT) has been hailed as the go-to solution where high volumes of non-communication-critical content need to be supported, where the source language can be standardised to a high level, leaving little room for interpretation, synonymy, and ambiguity, and where quality is not the main concern because out-of-the-box translation quality is sufficient for gisting.
It satisfies two very distinct needs:
First, it allows readers to understand foreign-language content in their own language. It is usually offered as a self-service for the gist translation of SAP Notes and SAP Knowledge Base Articles within the SAP ONE Support Launchpad. Should you want to access this service on the Launchpad, please refer to the blog post by Fabio Almeida on the SAP blog, which explains how to access and use the SAP Leonardo Machine Learning Foundation MT, a neural machine translation engine that leverages 25 years of SAP terminology and translation memory content developed by the linguists of SAP Globalization Services, ILT among them.
Second, it streamlines the SAP product translation process by supporting manual translation with a pre-translation step. The output is then revised and improved by SAP linguists to ensure that the language conforms to the technical and business requirements of the product. Its use is gaining momentum. With the introduction of the SAP Translation Hub, the use of the SAP MT engine has been extended to clients developing apps on the SAP HANA Platform or otherwise interested in using it to translate their own SAP customisations and developments. Although the engine is available on the SAP HANA Platform, the successful completion of an MT project depends on the expertise of an SAP Language Services Partner, because "in every translation project, post-editing the raw MT output is perhaps the most important stage in the process, since the quality expected from the final translations (after post-editing) must in principle meet the same high requirements as that of any human translation", as Dr Falko Schaefer puts it.
The introduction of the MT step within the translation workflow has dramatically changed the skills and expertise required from translators.
Mr Granuzzo, our Senior SAP-Linguist, will tell you what to look out for when working with machine translation technology within the SAP ecosystem. It can get very technical!
"My experience with MT-delivered translation is based on projects where light post-editing was requested and translation was carried out segment by segment in an SDL Trados Studio environment, with English as the source language and Italian as the target language.
As a post-editing translator, the main problems I encountered in a text rendered by MT were linked to the structural diversity of source and target languages. As a typical synthetic language, English makes use of lots of noun-noun combinations and does not use as many prepositions as analytical languages like Italian or other Romance languages.
We cannot deny that MT has improved dramatically and that new forms of MT have been developed that can learn and predict the likelihood of a sequence of words based on a neural network (NMT), a considerable step ahead compared to phrase-based, rule-based, or statistical MT engines.
Nevertheless, structural differences between source and target languages do not seem to have been overcome yet, and they still represent a minefield for MT endeavours. In my experience, post-editing is still the lifeline for delivering a good translation. The main issues that emerged can be classified as structural issues and context issues.
Regarding structure-based issues, we can define three main issue categories that can undermine the MT rendering:
MT has often failed to recognise the need for gender agreement, thus delivering adjectives and past participles not in concord with their corresponding nouns.
As a mainly synthetic language that makes rare use of prepositions, English uses noun agglomerations, sometimes even a sequence of 5 nouns, where the last one is the core element in the sequence. These combinations cannot be translated in the same way in an analytical target language like Italian, where the base noun must come first and be followed by all its specifiers, i.e. adjectives, specification complements, and so on.
MT has often failed to identify the core element of a noun-noun combination, so even the simple name of a button, a tab, or a field can create problems in the translation into a language like Italian, where the core element needs to be completed by its specifications, be it a noun or a prepositional complement.
The app name was not recognized and the rest of the text was translated wrongly.
In this second example, we can see how MT failed to recognize the presence of a noun in adjectival position, i.e. ‘Canada’ and the core element in the following noun-noun combination, that is ‘levels’, thus completely missing the intended complement ‘at company code levels and at tax jurisdiction levels’.
The following example shows how even a reliable neural MT can fail to identify the core element. It can learn what the Italian structure looks like (I guess that is why the verb has been put in second place), but it has not recognized the core element of the sentence, that is ‘dialogue box’. MT has simply followed and translated the standard SVO word order.
The standard position of the English adjective, always preceding the noun, is not always recognized by MT, all the more so because English adjectives are gender-neutral and some of them can easily be mistaken for nouns, and not only by MT!
Compare the following examples:
In this case, the term ‘financial instruments’ was translated correctly and ‘financial’ was recognized as the adjective preceding the noun, because ‘financial instruments’ is quite a common term and is offered by the Term Recognition function.
In this case, the adjective ‘potential’ was not identified and was therefore wrongly translated as a noun.
UI function of the term
Context is paramount for delivering a good translation. Especially in SAP software translation, you need to know what UI function the term refers to. If it is a button, it is likely to be a verb; if it is a field, it is likely to be a noun or an adjective. Critical terms are, for instance, Open, Process, Run, Check, Download, and Set.
The term ‘run’ is used in the source twice as a verb but has been misinterpreted in the first case and considered as a noun in the second case.
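The rule of thumb above (a button is likely a verb, a field is likely a noun or an adjective) can be sketched as a simple plausibility check. This is only a minimal illustration in Python, not part of any SAP or SDL Trados tooling: the mapping `EXPECTED_POS`, the function `needs_review`, and the part-of-speech labels are all hypothetical, and the part of speech of the MT rendering is assumed to be known already.

```python
# Hypothetical mapping from UI element type to the parts of speech
# a post-editor would normally expect (assumption, for illustration only).
EXPECTED_POS = {
    "button": {"verb"},
    "field": {"noun", "adjective"},
}


def needs_review(ui_element: str, mt_pos: str) -> bool:
    """Return True when the part of speech chosen by MT is unexpected
    for the given UI element type."""
    expected = EXPECTED_POS.get(ui_element)
    if expected is None:
        # Unknown element type: leave the decision to the post-editor.
        return True
    return mt_pos not in expected


# 'Run' on a button rendered as a noun is flagged; a noun in a field is not.
print(needs_review("button", "noun"))  # True
print(needs_review("field", "noun"))   # False
```

A check like this can only point the post-editor to suspicious segments; deciding on the right rendering still requires the context.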
The same term must be used consistently in the whole object.
The translation ‘Reporting Activity’ = ‘attività di conteggio’ is not aligned with the term required and used in the whole object, that is ‘reporting activities’ = ‘attività di reporting’.
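A consistency requirement of this kind lends itself to a mechanical pre-check. The following Python sketch assumes that segments are available as (source, target) pairs and that the required terms are held in a small glossary; both data structures and the function name `find_inconsistent_terms` are hypothetical and use plain substring matching, which is far cruder than real term recognition.

```python
from collections import defaultdict


def find_inconsistent_terms(segments, glossary):
    """Return {source_term: [segment indices]} for segments whose source
    contains a glossary term but whose target lacks the required rendering."""
    issues = defaultdict(list)
    for index, (source, target) in enumerate(segments):
        for term, required in glossary.items():
            if term.lower() in source.lower() and required.lower() not in target.lower():
                issues[term].append(index)
    return dict(issues)


# Illustrative data based on the example in the text.
segments = [
    ("Reporting activity", "Attività di conteggio"),
    ("Reporting activities", "Attività di reporting"),
]
glossary = {
    "reporting activity": "attività di reporting",
    "reporting activities": "attività di reporting",
}

print(find_inconsistent_terms(segments, glossary))
# {'reporting activity': [0]}
```

The first segment is flagged because its target does not contain the required term, while the second passes.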
Application Component Recognition
It is not easy for an MT tool to grasp the nuances of meaning that depend on the specific application component a translation object belongs to. MT can get the right logistics term from SAPTerm via the Term Recognition function for component LO, for instance, but it cannot distinguish between a term belonging to SAPTerm subcomponent LO-AGR-CC (SAP Agricultural Contract Management) and the same term belonging to SAPTerm subcomponent LO-CMM (Commodity Management in Logistics). For instance, the term ‘commodity’ is translated as ‘materia prima agricola’ for subcomponent LO-AGR-CC, but simply as ‘commodity’ for subcomponent LO-CMM, where the focus is on commodity management in logistics and the terms are closer to those of component FIN-FSCM-TRM, that is, a financial field.
If you do not have thorough knowledge of the domain or field of a translation object, polysemous words such as Cancel, Clear, and Post can be ‘lethal’ to general comprehension. Only a translator can choose the right term from those offered to him/her in the MultiTerm Term Recognition box in an SDL Trados Studio environment.
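To make the point concrete, a component-aware lookup could be sketched as below. This is a hedged illustration only: it does not reflect how SAPTerm or the Term Recognition function actually work. The table `TERMS`, the function `lookup_term`, and the fallback through the component hierarchy are assumptions; only the two ‘commodity’ renderings come from the text.

```python
# Hypothetical term table keyed by (source term, application component).
TERMS = {
    ("commodity", "LO-AGR-CC"): "materia prima agricola",
    ("commodity", "LO-CMM"): "commodity",
}


def lookup_term(term, component):
    """Look for the term under the most specific component first, then
    walk up the hierarchy (e.g. LO-AGR-CC -> LO-AGR -> LO).
    Return None when no component-specific entry exists."""
    parts = component.split("-")
    while parts:
        key = (term.lower(), "-".join(parts))
        if key in TERMS:
            return TERMS[key]
        parts.pop()  # drop the last level and try the parent component
    return None


print(lookup_term("commodity", "LO-AGR-CC"))  # materia prima agricola
print(lookup_term("commodity", "LO-CMM"))     # commodity
print(lookup_term("commodity", "LO"))         # None
```

When only the top-level component LO is known, the lookup returns nothing, which mirrors the situation described above: without the subcomponent, the choice has to be made by a translator.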
What is required from a post-editor
Prior knowledge of all these possible issues can help a translator correct MT-delivered target texts more quickly. A translator can focus his/her attention on critical parts of speech, for instance adjectives, and on spotting the core element of a noun-noun combination.
The presence of tags in the target text delivered by MT can alert a post-editor to possible noun or adjective misalignments. Tags indicate text formatting that often identifies the name or the title of tables, tabs, fields, checkboxes or apps.
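As a rough illustration of how tagged text could be pulled out for closer inspection, the Python sketch below extracts the content of paired inline tags from a target segment. The tag syntax (`<b>…</b>`) and the function name `tagged_spans` are assumptions made for the example; the actual tag representation in a translation environment will differ.

```python
import re

# Matches a simple paired inline tag such as <b>...</b>; the backreference
# \1 makes sure the closing tag has the same name as the opening one.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>")


def tagged_spans(segment):
    """Return the text found between paired tags, in order of appearance."""
    return [content for _tag, content in TAG_PATTERN.findall(segment)]


target = "Selezionare la scheda <b>Dati generali</b> e fare clic su <b>Salva</b>."
print(tagged_spans(target))  # ['Dati generali', 'Salva']
```

The extracted spans are likely UI element names, so they are good candidates for a consistency and agreement check by the post-editor.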
Attention and patience
It is important to remember that an automatically delivered translation that appears to be correct could still be misleading and contain inconsistent terms or terms that do not agree. It can be just one wrong vowel, but it makes all the difference. So, a post-editor needs to patiently check all terms and make sure they are correct and consistent. If you build a house from scratch you know the materials and techniques used, whereas if you have to refurbish it…
A post-editing translator must consider an MT tool as a support to translation and not as a ‘competitor’. Especially in a ‘light post-editing scenario’ a post-editor should try to retain as much of the raw machine translation output as possible and restrict his/her post-editing interventions to those corrections needed to deliver a semantically, grammatically and syntactically correct translation.
Some Facts about MT
What it can do
MT can definitely help the translation process. A fair percentage of segments were translated correctly by MT:
In this example, there are no adjectives or noun-noun combinations that could ‘mislead’ MT, and the only wrong term was ‘positions’, translated as ‘saldi’ instead of ‘posizioni’. In general, MT can offer a fully and correctly translated segment when a 1:1 equivalence applies, provided it is a documentation text and ‘light post-editing’ is requested. That does not apply to a marketing text, where even a correct text offered by MT still needs to be made ‘attractive’ by the translator and ‘full post-editing’ is necessary.
MT makes drafting the target text much easier for a translator as it already provides the translator with a textual framework on which to base his/her final translation. Besides, all the errors mentioned in the examples above will certainly be avoided next time, since MT can ‘learn its lesson’. As a matter of fact, in the second project where MT was employed the amount of post-editing decreased compared to the very first project.
What it cannot do
Nevertheless, I wouldn’t say the combination MT + post-editing can speed up the translation process. In my experience, the number of segments that did not require post-editing was balanced by those which required lots of corrections.
Sometimes, errors forced the translator to change text within tags that had been wrongly translated by the MT engine. Text between tags is usually very critical, since it may refer to a UI element that needs to be translated consistently in both the UI translation (short texts) and the documentation translation (long texts).
In other cases, the structure was so wrong that starting from scratch would have been a better option.
Post-editing is vital
It is imperative that texts undergo post-editing. An MT engine based on a learning algorithm can propose some context-based terms, but it certainly cannot capture all the syntactic and grammatical differences between two languages and, especially in SAP translation, it cannot pick the right application component-based term from those offered for selection. Of course, next time MT will propose the sentences as corrected, that is, the corrected equivalences, but it is still unlikely to cope with possible changes in context.
MT can propose localised culture-oriented terms, but what happens if the context and the application field change? Here is a bitterly funny outcome:
The source text ‘in two colors - brown for uncovered demand and green for covered demands’ was rendered by the MT as follows: ‘in due colori - Müller per i fabbisogni non coperti e verde per fabbisogni coperti’.
The colour ‘brown’ had been interpreted by the MT engine as a typical English surname and translated with the equally typical German surname Müller.
Context is absolutely crucial for a good translation.
At the current stage, and I hope I won’t be contradicted in the too-near future, even the most state-of-the-art MT, that is NMT, can foresee a context, but it cannot see the context!"