During the course, participants will be introduced to Slavic corpora that can support language and culture learning, research, and translation work. The course gives a detailed insight into the types of corpora (e.g. national, reference, synchronic, diachronic, and web corpora) and into methods of working with them, mainly through online corpus managers. No prior knowledge of language technology is required. The course is best suited to students who have at least a basic knowledge of one Slavic language or who have completed another course in Slavic linguistics. Participants should be able to define their language, research, and study interests at the beginning of the course so that the selection of corpora discussed can be tailored to them.
Preliminary course calendar:
07.09
Introduction to corpus linguistics, strengths and limitations of working with corpora
08.09
Regular expressions & query structure. Fundamentals of working with a corpus manager. Part 1.
14.09
Discussion: criticism of corpus linguistics. Is negative evidence possible in corpus linguistics?
Regular expressions & query structure. Fundamentals of working with a corpus manager. Part 2.
15.09
NoSketch Engine, KonText and their functions. Fundamentals of working with a corpus manager. Part 3.
21.09
National corpora and other important monolingual corpora
22.09
National corpora and other important monolingual corpora
28.09
Web as Corpus: a useful approach for studying under-resourced Slavic languages
29.09
Slavic Treebanks
05.10
Multilingual corpora: parallel corpora (InterCorp, ParaSol, EU databases)
06.10
Finnish-Slavic corpora
12.10
Other interesting multilingual sources, comparable corpora
13.10
Digital sources for diachronic studies
19.10
Miscellany – dialects, heritage speakers, records, videos etc.
20.10
Final conclusions