Recent initiatives in language technology have led to the development of at least minimal language processing toolkits for all official EU languages, as well as for languages with a large number of speakers worldwide, such as Chinese and Arabic. This is a big step towards the automatic processing and/or extraction of information, especially from official documents and newspapers, where the standard, literary language is used.
Apart from those official languages, a large number of dialects and closely related language variants are in daily use, not only as spoken colloquial languages but also in written media and social networks.
Building language resources and tools from scratch is expensive, but the effort can often be reduced by making use of pre-existing resources and tools for related, resource-richer languages. Examples of language variants include the different variants of Spanish in Latin America, the Arabic dialects in North Africa and the Middle East, German in Germany, Austria and Switzerland, French in France and in Belgium, Dutch in the Netherlands and Flemish in Belgium, etc. Examples of pairs of related languages include Swedish-Norwegian, Bulgarian-Macedonian, Serbian-Bosnian, Spanish-Catalan, Russian-Ukrainian, Irish-Scottish Gaelic, Malay-Indonesian, Turkish-Azerbaijani, Mandarin-Cantonese, Hindi-Urdu, and many others.
This workshop intends to bring together specialists working on language technology (LT) applications that deal with various related language pairs, to discuss novel approaches in exploring language closeness, and to draw attention to this particular topic. A previous edition of this workshop was organised at RANLP 2013; it attracted great interest from communities worldwide and showed the need for further activities.
Topics of interest include but are not limited to the following:
- Adaptation of monolingual tools for closely related languages and language variants
- Case studies of using language resources and tools for standard languages on documents in language variants
- Machine translation among closely related languages
- Evaluation of language resources and tools for language variants and closely related languages
- Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends)
11 Nov 2013