£1.8m funding for large-scale online resource of contemporary Welsh language

As a leading authority on Welsh language technologies, Bangor University will be participating in a multi-institution project to develop the first mass corpus to capture and inform the past, present and future use of the Welsh language.

Led by Cardiff University’s School of English, Communication and Philosophy, the £1.8 million interdisciplinary, collaborative project is funded by the Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC). Entitled The National Corpus of Contemporary Welsh, or Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC), it will compile an initial data set of 10 million Welsh words.

Commencing in March 2016, the project will run for three and a half years. It will draw on expertise from Bangor, Cardiff, Swansea and Lancaster Universities and break new ground as both a language resource and a model of corpus construction.

Professor Enlli Thomas from Bangor University, already the home of the Welsh National Corpus Portal, will co-lead the development and evaluation of a dedicated resource for teachers and learners of Welsh, which will be built from the corpus as part of the research.

Professor Thomas said: "This work will be a major contribution to the field. It will be a substantial collection of different kinds of language patterns of Welsh speakers - both verbally and in writing - and a living record of the language."

The corpus - a large collection of texts, or a body of written or spoken material for linguistic analysis - will represent Welsh language use across all communication types. This will include spoken, written and digital language, encompassing different genres, language varieties (regional and social) and contexts.

Contributors will be drawn from the 562,000 Welsh speakers in the UK, who will contribute through digital crowdsourcing technologies and community collaboration.

Further detail on the project’s construction and the ways in which users will be able to participate will be shared once the project goes live in 2016.

Dr Dawn Knight, from Cardiff University’s School of English, Communication and Philosophy, who is leading the project, said: “What we hope to achieve is the development of the first large-scale living and evolving corpus, representing the Welsh language across communication types and informed by real, current, users of the language.

“We will be engaging with the public in a number of ways, and using new technologies to do so. This is a project about the past, present and future use of the Welsh language and will inform us about variation and change in real language use, such as regional differences or use of mutations over time.

“The project will have a positive impact on the work of translators, publishers, policy-makers, language technology developers and academics, and a bespoke toolkit will be constructed for teachers and learners, integrating basic corpus functionalities for the exploration of language use.”

The range of stakeholders for the project - including the Welsh Government, Welsh Joint Education Committee, Welsh for Adults, Gwasg y Lolfa and University of Wales Dictionary - is representative of the linguistic, cultural and social relevance of the project.

Publication date: 13 October 2015