Publication - Discovering Parallel Language Resources for Training MT Engines

Discovering Parallel Language Resources for Training MT Engines

Type: Article in proceedings
Year: 2018
Authors: Vassilis Papavassiliou; Prokopis Prokopidis; Stelios Piperidis
Book title: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Date: May
Abstract:
Web crawling is an efficient way to compile the monolingual, parallel and/or domain-specific corpora needed for machine translation and other HLT applications. These corpora can be automatically processed to generate second-order or synthesized derivative resources, including bilingual (general or domain-specific) lexica and terminology lists. In this submission, we discuss the architecture and use of the ILSP Focused Crawler (ILSP-FC), a system developed by researchers of the ILSP/Athena RIC for the acquisition of such resources and currently used within the European Language Resource Coordination (ELRC) effort. ELRC aims to identify and gather language and translation data relevant to public services and governmental institutions across 30 European countries participating in the Connecting Europe Facility (CEF).