
Published in CS Science
Case study: scientific translations

Challenge

Translation is a hard problem.



There is nothing that a person could know, or feel, or dream, that could not be crucial for getting a good translation of some text or other. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being.

-- Martin Kay

In social scientific research, the translation problem is even more pronounced. Traditional questionnaire research is itself affected by the fact that language is not exact: the degree to which different people share an understanding of a specific concept varies. When data is gathered in several language contexts with the aim of treating the dataset as one, regardless of language, exact translation is needed. Exact translation, however, is an oxymoron. Researchers keep pointing out how much the understanding of emotions, personality concepts, and other social constructs differs between cultures.

This is a problem that all cross-cultural questionnaire research has to tackle as well as it can. In March we posted about such a project, the Sustainable Workforce study, for which we built the study software. In addition, we were tasked with coordinating the scientific translation of the questionnaires into 7 languages.

To make matters even more complicated, translating an online questionnaire also means translating parts of a software application. Software translation is itself yet another complex problem.


Approaches

In most cases, the obvious choice for translations is a professional translator. Professional translators have studied the complexity of their trade and have experience in maintaining meaning despite contextual differences. There are, of course, also specialists for various professions and sciences, such as medical, legal, or technical translators.

But what about translating psychological constructs? Can we trust people without a social-science background to translate these? Doesn’t it take a certain degree of knowledge about how measurement scales are built and validated? With scientific translations it is often necessary to have everything translated back into the original language, evaluate the differences, and adjust the translation accordingly.

Regardless of the answers to these questions, what if the budget for the study doesn’t allow for paying professionals for translations, let alone back translations? Master’s or doctoral students in psychology or sociology who are native speakers of the target language can be less expensive than professional translators, and they have the right background. What they may lack is actual expertise in translation.

Solution 

The solution we offered, in the past as well as for this project, was based on our extensive network of social scientists across Europe. For cost reasons, and because the project was already under time pressure when we came on board, we suggested replacing back translations with proofreading by a second translator. This seemed justified given that the surveys consisted largely of demographic questions rather than actual psychodiagnostics.

For each language we recruited two people from our network, all of them with either a Master’s Degree or a PhD in Psychology. The only exception was Portuguese, where one of the two was a professional translator. 

We then prepared the materials, briefed the translators, and coordinated the work between translators and proofreaders as well as the continuous updates coming from the project side. We also gave translators access to the system, where they could verify their own translations in the final context in which they would be shown. The result of this whole process was then to be verified by speakers of the respective languages within the research project.

Learnings

Managing translations that are split into many small parts because they are used in software, while at the same time dealing with constant increments and small changes, is a very challenging task. Additionally, sitting at the interface between all parties without being able to judge either the quality of the translators’ work or the validity and accuracy of the customer’s feedback is uncomfortable, to say the least.

At one point in the project, the state of some translations didn’t meet the expectations of the research team at all. A small project crisis ensued, which was then solved collaboratively. The interesting part is the analysis of the factors that may have contributed to the situation:

  • Translations are hard; scientific software translations are very hard.
  • There is more than one possible solution for any sufficiently complex translation problem. The perfect translation often doesn’t exist.
  • Due to the tight schedule of the whole project, texts that were already in translation were still changing.
  • It is easy to underestimate the verification and testing effort that such translation work requires from the customer. In this case, it was heavily underestimated. In other words, the implicit customer expectation was that the delivered translations would be as good as final. Our implicit assumption was that only the customer would be able to finalize the product in cycles of acceptance testing. Such different assumptions unavoidably lead to misunderstandings.
  • Involvement of the project team in the translation process is crucial for two reasons. First, only the creators of a questionnaire can verify its translation. Second, involvement creates commitment to and satisfaction with the resulting solution. This isn’t a trivial point, given that many correct translations of a given text exist.
  • A degree in psychology doesn’t guarantee good translation outcomes, due to a potential lack of experience and expertise.
  • A degree in translation doesn’t guarantee an outcome that is acceptable to researchers, but it should guarantee linguistic correctness.

How to do it better

After all the additional rounds of review and correction, the study went online as planned, and no serious problems with the questionnaires have been reported since. Even so, we definitely think that a few things could be improved for future projects of a similar nature.

  • Clarify expectations very carefully. True for every project.
  • Don’t start the translation process before the source is absolutely final. This means that the online questionnaire has to be adequately tested and language changes implemented, before the translation process starts.
  • Use professional translators for the translation, use scientists for proof-reading. 
  • If possible, the actual research team should do the back translation or proof-reading.
  • At the very least, the final editing and quality review cannot be outsourced and takes a significant amount of time. This has to be taken into account when planning the project.
  • Use a software environment where translations can be made in context from the start.
  • Use some kind of change tracking if possible.
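
As a rough illustration of what change tracking for questionnaire texts could look like, here is a minimal Python sketch. It is not what we used in the project. It assumes that every text in the software has a key, and that each translation stores a fingerprint of the source text it was made from. The keys, the data layout, and the function names are all hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Short, stable fingerprint of a source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def stale_keys(source: dict, translated: dict) -> list:
    """Keys whose translation is missing or was made from an older source text."""
    stale = []
    for key, text in source.items():
        entry = translated.get(key)
        if entry is None or entry["source_hash"] != fingerprint(text):
            stale.append(key)
    return sorted(stale)


# Hypothetical current source texts of the questionnaire software.
source = {
    "q1.label": "How many hours do you work per week?",
    "q2.label": "Do you work from home?",
    "nav.next": "Next",
}

# Hypothetical German translations, each remembering the source it was based on.
german = {
    "q1.label": {
        "text": "Wie viele Stunden arbeiten Sie pro Woche?",
        "source_hash": fingerprint("How many hours do you work per week?"),
    },
    "q2.label": {
        "text": "Arbeiten Sie zu Hause?",
        # Translated from an earlier wording of the question.
        "source_hash": fingerprint("Do you work at home?"),
    },
}

print(stale_keys(source, german))  # ['nav.next', 'q2.label']
```

A list like this tells translators and proofreaders exactly which texts need another look after the source has changed, instead of leaving them to re-read everything.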

Case study: freeing 60 years of Swiss cantonal election data from paper archives

Challenge

In a country like Switzerland, where bureaucracy has been practiced with heart and soul at every level of government for centuries, systematically collected data of all kinds lies unused. This data can be of unique value to science, because in very few regions of the world has comparable information been collected so meticulously.


We psychologists have so far shown little interest in such data. For economists and political scientists, however, it is worth its weight in gold. The catch is that the vast majority of it was collected before the computer age and is therefore stored only on paper.

How do you build a digital data set of impeccable quality from data that sits in large books in archives and, on top of that, looks different in every canton of Switzerland?
This is exactly the challenge that Prof. Dr. Mark Schelker and Dr. Lukas Schmid took on. Their goal was to digitize the results of the cantonal parliamentary elections of the past 60 years. Cloud solutions was able to support the researchers in developing and implementing an optimal technical solution.

Approaches

Automatic text recognition (OCR) is, of course, relatively advanced. For the cantonal election data, however, OCR was not an option for several reasons:

  • When thick books are scanned, content near the binding is often distorted, faded, or even slightly cut off. OCR software cannot cope with this.
  • Tables with many ruling lines are also a problem for OCR.
  • Older typefaces have a lower recognition rate.

Post-correcting poor OCR output would have been one option. However, this quickly becomes more laborious than entering the data directly from a simple scan, and it makes it very likely that misrecognized text ends up as errors in the data matrix.
That left only manual entry. Traditionally one would probably use Excel for this, but that brings several problems:

  • Work of this kind is repetitive and therefore error-prone. Excel offers no help in avoiding typical sources of error such as shifted rows, wrong entries, or wrong assignments.
  • Manually merging many individual Excel files is a further source of error.
  • With many Excel files spread across several data entry clerks, there is no ongoing control over the progress of the data entry or the quality of the data.

Implemented solution


The solution, developed jointly with the customer and programmed by CS, aimed to combine the respective strengths of technology and people in order to maximize data quality. The system had the following features:

  • Clearly structured, software-guided data entry.
  • Avoidance of redundant entry by splitting it into several entry levels (canton, district election year, candidates).
  • Certain pre-entered data that could be offered for selection in already correct form.
  • Data validation on entry.
  • Built-in quality checks (comparison of perfect, pre-entered records with the data entered).
  • Careful instruction and support of the data entry staff.
  • Additional manual spot checks by the research team.
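
To give an idea of how validation on entry and the comparison against pre-entered records can work in principle, here is a small Python sketch. It is an illustration under our own assumptions, not the code of the actual system: the record fields, the list of cantons, the year range, and the function names are all hypothetical, and the several entry levels are flattened into a single record for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateEntry:
    canton: str
    district: str
    year: int
    list_name: str
    name: str
    votes: int


# Assumptions for the example only: a small subset of cantons and a rough year range.
VALID_CANTONS = {"ZH", "BE", "SG"}
FIRST_YEAR, LAST_YEAR = 1950, 2015


def validate(entry: CandidateEntry) -> list:
    """Return a list of problems; an empty list means the entry is accepted."""
    problems = []
    if entry.canton not in VALID_CANTONS:
        problems.append(f"unknown canton: {entry.canton}")
    if not FIRST_YEAR <= entry.year <= LAST_YEAR:
        problems.append(f"year out of range: {entry.year}")
    if not entry.name.strip():
        problems.append("candidate name is empty")
    if entry.votes < 0:
        problems.append("votes must not be negative")
    return problems


def agreement_with_gold(entered: list, gold: list) -> float:
    """Share of pre-entered reference records that the entered data reproduces exactly."""
    if not gold:
        return 1.0
    entered_set = set(entered)
    hits = sum(1 for record in gold if record in entered_set)
    return hits / len(gold)


# Hypothetical reference records, entered and double-checked beforehand.
gold = [
    CandidateEntry("SG", "Wil", 1984, "List 1", "Anna Muster", 1200),
    CandidateEntry("SG", "Wil", 1984, "List 1", "Peter Beispiel", 950),
]

# The same page as typed by a data entry clerk, with two digits swapped.
entered = [
    gold[0],
    CandidateEntry("SG", "Wil", 1984, "List 1", "Peter Beispiel", 590),
]

bad = CandidateEntry("XX", "Wil", 1884, "List 1", " ", -1)

print(validate(gold[0]))                   # []
print(len(validate(bad)))                  # 4
print(agreement_with_gold(entered, gold))  # 0.5
```

Validation catches entries that cannot be right, while the comparison with reference records catches entries that are plausible but wrong, such as the swapped digits above. A low agreement rate for one person is a signal to take a closer look at the rest of their work.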


In this way, at the end of 2014 and the beginning of 2015, 30 data entry clerks recorded close to 190,000 candidates in highly satisfactory quality, spread across 60 years, 4,000 electoral districts, and 15,000 lists.
