Comparable Corpora And Computer-Assisted Translation

Author: Estelle Maryline Delpech
Publisher: John Wiley & Sons
ISBN: 1119002702
Category: Computers
Languages: en
Pages : 304

Computer-assisted translation (CAT) has always relied on translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance when the text belongs to an emerging field. To address this issue, CAT research has investigated the use of comparable corpora, i.e. sets of texts in two or more languages that deal with the same topic but are not translations of one another.

This work has two primary objectives. The first is to assess the contribution of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second is to identify the bilingual-lexicon-extraction methods that best match translators’ needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations.

The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. The book is organized in two parts: the first presents the applicative and scientific context of the research, and the second is devoted to efforts to improve compositional translation.

The research work presented in this book received the 2014 PhD Thesis Award from the French association for natural language processing (ATALA).
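The blurb does not spell out the method, but the basic idea of compositional translation can be sketched: split a complex term into components, translate each component with a bilingual dictionary, recombine the translations, and keep only the candidates attested in the target-language corpus, ranked by frequency. This Python sketch is a hypothetical illustration, not code or data from the book: it assumes English–German and German closed compounds, and the function names, toy dictionary and corpus counts are all invented. It ignores word reordering, German linking elements and fertile (one-to-many) translations, which are precisely the harder cases the blurb says the author works on.

```python
from itertools import product


def compositional_candidates(components, part_dict):
    """Translate each component independently and recombine every combination
    as a German closed compound (parts concatenated, noun capitalized)."""
    options = [part_dict.get(part, []) for part in components]
    if not all(options):
        return []  # some component has no dictionary entry
    return ["".join(combo).capitalize() for combo in product(*options)]


def rank_candidates(candidates, target_freq):
    """Keep only candidates attested in the target-language corpus and rank
    them by corpus frequency (ties broken alphabetically)."""
    attested = [c for c in candidates if target_freq.get(c, 0) > 0]
    return sorted(attested, key=lambda c: (-target_freq[c], c))


# Hypothetical toy data: a component dictionary and invented target-corpus counts.
part_dict = {"breast": ["brust", "mamma"], "cancer": ["krebs", "karzinom"]}
target_freq = {"Brustkrebs": 120, "Mammakarzinom": 40, "Brustkarzinom": 15}

candidates = compositional_candidates(["breast", "cancer"], part_dict)
print(rank_candidates(candidates, target_freq))
# ['Brustkrebs', 'Mammakarzinom', 'Brustkarzinom']
```

The corpus filter is what discards over-generated candidates absent from the toy counts, such as "Mammakrebs"; ranking by raw frequency is only one possible criterion.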


Using Comparable Corpora For Under-Resourced Areas Of Machine Translation

Author: Inguna Skadiņa
Publisher: Springer
ISBN: 3319990047
Category: Computers
Languages: en
Pages : 323

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.
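Comparability can be measured in several ways. As one simple illustration, and not necessarily a measure used in the book, this Python sketch scores a document pair by dictionary coverage: the share of source word types (among those listed in a bilingual dictionary) that have at least one translation in the target text, averaged over both directions. The function names, the toy English–French dictionary and the sentences are hypothetical; a realistic setup would also lemmatize, drop stop words and use a far larger dictionary.

```python
def coverage(src_tokens, tgt_tokens, bidict):
    """Share of dictionary-covered source word types that have at least one
    translation among the target word types."""
    tgt_vocab = set(tgt_tokens)
    covered = [w for w in set(src_tokens) if w in bidict]
    if not covered:
        return 0.0  # nothing to compare: treat as not comparable
    hits = sum(1 for w in covered if any(t in tgt_vocab for t in bidict[w]))
    return hits / len(covered)


def invert(bidict):
    """Build the reverse dictionary (target word -> list of source words)."""
    reverse = {}
    for src, translations in bidict.items():
        for tgt in translations:
            reverse.setdefault(tgt, []).append(src)
    return reverse


def comparability(src_tokens, tgt_tokens, bidict):
    """Symmetric score in [0, 1]: mean coverage in both directions."""
    forward = coverage(src_tokens, tgt_tokens, bidict)
    backward = coverage(tgt_tokens, src_tokens, invert(bidict))
    return (forward + backward) / 2


# Hypothetical toy data for illustration only.
en_fr = {
    "breast": ["sein"],
    "cancer": ["cancer"],
    "screening": ["dépistage"],
    "mortality": ["mortalité"],
    "cake": ["gâteau"],
}
en_doc = "breast cancer screening lowers mortality".split()
fr_related = "le dépistage du cancer du sein réduit la mortalité".split()
fr_unrelated = "la recette du gâteau au chocolat".split()

print(comparability(en_doc, fr_related, en_fr))    # 1.0
print(comparability(en_doc, fr_unrelated, en_fr))  # 0.0
```

Averaging both directions keeps the score from rewarding a short target text that happens to contain a few translated words.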


Building And Using Comparable Corpora

Author: Serge Sharoff
Publisher: Springer Science & Business Media
ISBN: 3642201288
Category: Computers
Languages: en
Pages : 335

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In multilingual NLP (for example, machine translation and terminology mining), this meant using parallel corpora. However, parallel resources are relatively scarce: native speakers of any given language produce far more texts every day than are translated. This situation created a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. This volume provides such a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.


Hybrid Approaches To Machine Translation

Author: Marta R. Costa-jussà
Publisher: Springer
ISBN: 3319213113
Category: Computers
Languages: en
Pages : 205

This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from a range of disciplines. Nowadays, most of the important developments in MT are achieved by combining data-driven and rule-based techniques. These combinations typically involve hybridizing different traditional paradigms, such as introducing linguistic knowledge into statistical approaches to MT, incorporating data-driven components into rule-based approaches, or applying statistical and rule-based pre- and post-processing to both types of MT architecture. The book is primarily of interest to MT specialists, but also to readers in the wider fields of Computational Linguistics, Machine Learning and Data Mining, and to translators and managers of translation companies and departments who are interested in recent developments in automated translation tools.


Corpus-Based Perspectives In Linguistics

Author: Yuji Kawaguchi
Publisher: John Benjamins Publishing
ISBN: 9789027233189
Category: Language Arts & Disciplines
Languages: en
Pages : 439

UBLI (Usage-Based Linguistic Informatics) has conducted field surveys since 2002 and built spoken language corpora for French, Spanish, Italian (Salentino dialect), Russian, Malaysian, Turkish, Japanese, and Canadian multilinguals. This volume features new research presented at the second UBLI workshop on Corpus Linguistics – Research Domain, held on September 14, 2006. The first part, consisting of eleven papers presented at this workshop, covers a wide range of subjects within corpus-based research, such as dictionaries, linguistic atlases, dialects, translation, ancient texts, non-standard texts, sociolinguistics, second language acquisition, and natural language processing. The second part comprises ten additional contributions on both written and spoken corpora by UBLI members and research assistants.


Corpus-Based Language Studies

Author: Tony McEnery
Publisher: Taylor & Francis
ISBN: 9780415286220
Category: Language Arts & Disciplines
Languages: en
Pages : 386

Covering the major approaches to the use of corpus data, this work gathers together influential readings from leading names in the discipline, including Biber, Widdowson, Sinclair, Carter and McCarthy.


Topics In Language Resources For Translation And Localisation

Author: Elia Yuste Rodrigo
Publisher: John Benjamins Publishing
ISBN: 9027216886
Category: Language Arts & Disciplines
Languages: en
Pages : 220

Language Resources (LRs) are sets of language data and descriptions in machine-readable form, such as written and spoken language corpora, terminological databases, computational lexica and dictionaries, and linguistic software tools. Over the past few decades, mainly within research environments, LRs have been used specifically to create, optimise or evaluate natural language processing (NLP) and human language technologies (HLT) applications, including translation-related technologies. LR infrastructures and exploitation tools are gradually coming to be seen as core resources in the language services industries and in localisation production settings. However, more effort is still needed to raise awareness of LRs in general, and of LRs for translation and localisation in particular, among a wider audience worldwide. Topics in Language Resources for Translation and Localisation sets out to establish the state of the art in this ever-expanding field and, from various perspectives, underscores the potential usefulness of LRs in creating, adapting, managing, standardising and leveraging content for more than one language and culture.


Translation-Driven Corpora

Author: Federico Zanettin
Publisher: Routledge
ISBN: 1317639847
Category: Language Arts & Disciplines
Languages: en
Pages : 244

Electronic texts and text analysis tools have opened up a wealth of opportunities for higher education and language service providers, but learning to use these resources continues to pose challenges to scholars and professionals alike. Translation-Driven Corpora aims to introduce readers to corpus tools and methods which may be used in translation research and practice. Each chapter focuses on specific aspects of corpus creation and use. An introduction to corpora and an overview of applications of corpus linguistics methodologies to translation studies are followed by a discussion of corpus design and acquisition. The different stages and tools involved in corpus compilation and use are outlined, from corpus encoding and annotation to indexing and data retrieval, and the various methods and techniques that allow end users to make sense of corpus data are described. The volume also offers detailed guidelines for the construction and analysis of multilingual corpora. Corpus creation and use are illustrated through practical examples and case studies, with each chapter outlining a set of tasks aimed at guiding researchers, students and translators to practise some of the methods and use some of the resources discussed. These tasks are meant as hands-on activities to be carried out using the materials and links available on an accompanying DVD. Suggested further readings at the end of each chapter are complemented by an extensive bibliography at the end of the volume. Translation-Driven Corpora is designed for use by teachers and students in the classroom or by researchers and professionals for self-learning. It is an invaluable resource for anyone interested in this fast-growing area of scholarly and professional activity.


Machine Learning In Translation Corpora Processing

Author: Krzysztof Wolk
Publisher: CRC Press
ISBN: 0429588836
Category: Computers
Languages: en
Pages : 264

This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. The most popular methodologies and tools are not well suited to Polish and therefore require adaptation, and parallel and monolingual language resources for Polish are scarce. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system that meets specific translation requirements, and to develop bilingual textual resources by mining comparable corpora.


Parallel Corpora For Contrastive And Translation Studies

Author: Irene Doval
Publisher: John Benjamins Publishing Company
ISBN: 9027262845
Category: Language Arts & Disciplines
Languages: en
Pages : 301

This volume assesses the state of the art of parallel corpus research as a whole, reporting on recent advances both in the development of parallel corpora (with some references to comparable corpora as well) and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. Most of these chapters pay attention to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.