Exploitation of multilingual resources and tools for Central and (South) Eastern European Languages

Workshop associated with the LREC 2010 conference (17-23 May 2010)

Reconciling differences in the availability of language resources and tools between more widely and less widely spoken languages has been the main concern of several European initiatives. Central and (South-) Eastern European languages can serve as a case study in this respect: the integration of diverse languages into a broad language community.

The main result of these initiatives has been the increased production in recent years of language resources, and especially language technology tools, for Central, Eastern and Southern European languages. While monolingual systems have achieved performance comparable to that for intensively studied languages, a lot of work still has to be invested in multilingual tools for applications such as machine translation or cross-lingual information retrieval. At least three major issues have a critical influence on the performance of such systems:

  • the availability of appropriate quantities of data for training and evaluation;
  • the analysis of structural linguistic differences among languages, in order to improve statistical methods with targeted linguistic knowledge;
  • the availability of knowledge bases for incorporation into language processing systems.

Identifying the key aspects of linguistic modelling and resource supply for multilingual technologies involving Central, Eastern and Southern European languages can have an impact not only on the local improvement of such systems but also on the overall development of multilingual technologies. The same holds for well-established or emerging linguistic knowledge representation frameworks, which can only benefit from embedding components for Central, Eastern and Southern European languages.

