The economic and political changes in Europe in recent years have led to a major shift of activity in the language technology area. On the one hand, access to global information for all citizens has become a real demand; on the other, the specific European context, with its large number of national and minority languages and the exponentially growing number of documents appearing every day, makes this desideratum a real challenge.
Language processing is now seen as the main technology capable of giving people access to information in their own languages, no matter where that information was produced. Despite important developments, the infrastructure of language resources (data and tools) for less widely spoken languages, especially Balkan and Slavic languages, is still far behind the standard achieved for the major Western European languages.
As most current LT applications rely on data-driven methods, one major drawback in developing language resources for these languages is the lack of training and evaluation data, as well as of reference systems against which results can be compared. Well-known corpora such as JRC-ACQUIS and OPUS are a significant step forward, but they:
- still do not cover all languages of the Balkan area
- are collections of documents in specialised language, so systems trained on this data perform worse when tested on other domains and registers.
To ease this bottleneck, it is necessary to develop, promote and make available data that can be used for training and evaluation. It is also important to know which systems have been developed for which applications, on which data they were tested, and with which evaluation results.
The aim of this workshop is to take a first step in this direction.
02 May 2009