Reaching Across the Isles: UK-LLM Brings AI to UK Languages With NVIDIA Nemotron
Trained on the Isambard-AI supercomputer, a new model developed by University College London, NVIDIA and Bangor University taps NVIDIA Nemotron open-source techniques and datasets to enable AI reasoning for Welsh and other UK languages for public services including healthcare, education and legal resources.
Celtic languages — including Cornish, Irish, Scottish Gaelic and Welsh — are the U.K.’s oldest living languages. To empower their speakers, the UK-LLM sovereign AI initiative is building an AI model based on NVIDIA Nemotron that can reason in both English and Welsh, a language spoken by about 850,000 people in Wales today.
Enabling high-quality AI reasoning in Welsh will support the delivery of public services including healthcare, education and legal resources in the language.
“I want every corner of the U.K. to be able to harness the benefits of artificial intelligence. By enabling AI to reason in Welsh, we’re making sure that public services — from healthcare to education — are accessible to everyone, in the language they live by,” said U.K. Prime Minister Keir Starmer. “This is a powerful example of how the latest AI technology, trained on the U.K.’s most advanced AI supercomputer in Bristol, can serve the public good, protect cultural heritage and unlock opportunity across the country.”
The UK-LLM project, established in 2023 as BritLLM and led by University College London, has previously released two models for U.K. languages. Its new model for Welsh, developed in collaboration with Wales’ Bangor University and NVIDIA, aligns with Welsh government efforts to boost the active use of the language, with the goal of achieving a million speakers by 2050 — an initiative known as Cymraeg 2050.
U.K.-based AI cloud provider Nscale will make the new model available to developers through its application programming interface.
“The aim is to ensure that Welsh remains a living, breathing language that continues to develop with the times,” said Gruffudd Prys, senior terminologist and head of the Language Technologies Unit at Canolfan Bedwyr, Bangor University’s center for Welsh language services, research and technology. “AI shows enormous potential to help with second-language acquisition of Welsh as well as for enabling native speakers to improve their language skills.”
This new model could also boost the accessibility of Welsh resources by enabling public institutions and businesses operating in Wales to translate content or provide bilingual chatbot services. This can help healthcare providers, educators, broadcasters, retailers, restaurant owners and other groups ensure their written content is as readily available in Welsh as it is in English.
Beyond Welsh, the UK-LLM team aims to apply the methodology behind its new model to develop AI models for other languages spoken across the U.K. — such as Cornish, Irish, Scots and Scottish Gaelic — as well as to work with international collaborators on models for languages spoken in Africa and Southeast Asia.
“This collaboration with NVIDIA and Bangor University enabled us to create new training data and train a new model in record time, accelerating our goal to build the best-ever language model for Welsh,” said Pontus Stenetorp, professor of natural language processing and deputy director of the Centre for Artificial Intelligence at University College London. “Our aim is to take the insights gained from the Welsh model and apply them to other minority languages, in the U.K. and across the globe.”
Tapping Sovereign AI Infrastructure for Model Development
The new model for Welsh is based on NVIDIA Nemotron, a family of open-source models that features open weights, datasets and recipes. The UK-LLM development team has tapped the 49-billion-parameter Llama Nemotron Super model and 9-billion-parameter Nemotron Nano model, post-training them on Welsh-language data.
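For a sense of what that post-training step can look like in practice, below is a minimal supervised fine-tuning sketch using the Hugging Face TRL library. The checkpoint ID, dataset path and hyperparameters are illustrative assumptions; the UK-LLM team’s actual Nemotron recipe, data mix and training setup are not shown here.

```python
# Minimal SFT sketch with Hugging Face TRL -- illustrative assumptions only.
# The model ID, dataset path and hyperparameters below are placeholders,
# not the UK-LLM team's actual recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"   # assumed base checkpoint
DATA_PATH = "welsh_sft_data.jsonl"               # assumed JSONL file with a "text" field

# Load the Welsh supervised fine-tuning data.
dataset = load_dataset("json", data_files=DATA_PATH, split="train")

training_args = SFTConfig(
    output_dir="nemotron-nano-welsh-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    bf16=True,
    logging_steps=50,
    model_init_kwargs={"trust_remote_code": True},  # some checkpoints require this
)

# TRL loads the causal LM from the model ID and trains on the dataset's "text" field.
trainer = SFTTrainer(
    model=MODEL_ID,
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```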
Compared with languages like English or Spanish, far less source data is available in Welsh for AI training. To create a sufficiently large Welsh training dataset, the team used NVIDIA NIM microservices for gpt-oss-120b and DeepSeek-R1 to translate NVIDIA Nemotron open datasets — more than 30 million entries — from English to Welsh.
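Because NIM microservices expose an OpenAI-compatible API, a translation pass like this can be scripted as ordinary chat-completion calls. The snippet below is a simplified sketch: the endpoint URL, model identifier and prompt wording are assumptions for illustration, not the project’s actual pipeline.

```python
# Sketch: translating English dataset entries to Welsh through an
# OpenAI-compatible NIM endpoint. The endpoint URL, model name and prompt
# are illustrative assumptions, not the UK-LLM project's actual pipeline.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM deployment
    api_key="not-needed-for-local-nim",
)

def translate_to_welsh(text: str) -> str:
    """Ask the hosted model to translate one dataset entry into Welsh."""
    response = client.chat.completions.create(
        model="openai/gpt-oss-120b",  # assumed NIM model identifier
        messages=[
            {"role": "system",
             "content": "Translate the user's text from English to Welsh. "
                        "Return only the translation."},
            {"role": "user", "content": text},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(translate_to_welsh("The library opens at nine o'clock tomorrow."))
```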
The researchers accessed a GPU cluster through the NVIDIA DGX Cloud Lepton platform and are harnessing hundreds of NVIDIA GH200 Grace Hopper Superchips on Isambard-AI — the U.K.’s most powerful supercomputer, backed by £225 million in government investment and based at the University of Bristol — to accelerate their translation and training workloads.
This new dataset supplements existing Welsh data from the team’s previous efforts.
Capturing Linguistic Nuances With Careful Evaluation
Bangor University, located in Gwynedd — the county with the highest percentage of Welsh speakers — is supporting the new model’s development with linguistic and cultural expertise.
Prys, from the university’s Welsh-language center, brings about two decades of experience with Welsh language technology to the collaboration. He and his team are helping verify the accuracy of the machine-translated training data and the manually translated evaluation data, as well as assess how the model handles nuances of Welsh that AI typically struggles with — such as initial consonant mutations, where the consonant at the beginning of a word changes based on neighboring words.
The model, along with the Welsh training and evaluation datasets, is expected to be made available for enterprise and public sector use, supporting additional research, model training and application development.
“It’s one thing to have this AI capability exist in Welsh, but it’s another to make it open and accessible for everyone,” Prys said. “That subtle distinction can be the difference between this technology being used or not being used.”
Deploy Sovereign AI Models With NVIDIA Nemotron, NIM Microservices
The framework used to develop UK-LLM’s model for Welsh can serve as a foundation for multilingual AI development around the world.
Benchmark-topping Nemotron models, data and recipes are publicly available for developers to build reasoning models tailored to virtually any language, domain and workflow. Packaged as NVIDIA NIM microservices, Nemotron models are optimized for cost-effective compute and run anywhere, from laptop to cloud.
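As a sketch of how a deployed Nemotron NIM endpoint can be consumed, the snippet below streams a chat completion from an OpenAI-compatible endpoint. The base URL, model identifier and reasoning-toggle system prompt are assumptions, and the forthcoming Welsh model may use different settings.

```python
# Sketch: querying a deployed Nemotron NIM microservice through its
# OpenAI-compatible endpoint. URL, model name and system prompt are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM deployment
    api_key="not-needed-for-local-nim",
)

stream = client.chat.completions.create(
    model="nvidia/llama-3.3-nemotron-super-49b-v1",  # assumed model identifier
    messages=[
        # Llama Nemotron models document a system-prompt toggle for reasoning;
        # treat the exact wording here as an assumption.
        {"role": "system", "content": "detailed thinking off"},
        {"role": "user", "content": "Beth yw prifddinas Cymru?"},  # "What is the capital of Wales?"
    ],
    max_tokens=256,
    stream=True,  # stream tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```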
Europe’s enterprises will also be able to run open, sovereign models through Perplexity’s AI-powered search engine.
Get started with NVIDIA Nemotron.