AI, Copyright and Student Work: Why the UK’s U‑Turn Matters (1/4)
For a while, it looked like AI developers might get a broad “free pass” to train on almost anything they could scrape. That idea has now been quietly parked. The UK government’s latest report on copyright and artificial intelligence signals a very different direction – one that matters a lot if you’re responsible for student work and assessment.
The rethink is at least in part a response to a changing legal landscape. For example, Encyclopedia Britannica and its Merriam-Webster subsidiary have sued OpenAI in Manhattan federal court, alleging that the company misused their reference materials to train its AI models.
In this post, I’ll try to translate that shift into plain language for schools, trusts and exam boards, explain why it matters for student work in particular, and be honest about how we’re thinking about it in RM Compare.
From “free text-and-data mining” to “not so fast”
The story so far, in brief:
- A few years ago, the UK floated a broad copyright exception that would have allowed text and data mining for any purpose, including commercial AI training, as long as the user had lawful access to the content.
- Creators, publishers and rights‑holder groups pushed back hard. They argued that this would effectively strip value from their work, with no realistic way to license or be paid.
- The government has now stepped away from that broad exception. Instead of a clear “yes, you can” or “no, you can’t”, they are gathering more evidence, consulting further, and exploring narrower options that put more emphasis on licensing and transparency.
So we don’t have a neat new rule. But we do have a very clear signal: the idea that AI companies can simply help themselves to copyrighted works for training, as long as they can see them, is not where the UK wants to land.
For anyone handling student work, that’s an important backdrop. The direction of travel is away from “free for all” and toward “who owns this, who can license it, and who should benefit?”
Three pillars we can’t ignore
You could summarise the report’s direction in three words that are very relevant to education: control, access, and transparency.
Control
The report leans toward a future where:
- The owners of copyrighted works retain meaningful control over whether their works are used for training.
- That control is often exercised through licensing – explicit agreements about who can do what, for how long, and on what terms.
- Special protection for purely computer‑generated works may be removed, while protection for AI‑assisted human works is preserved, to keep incentives focused on human creativity.
For pupil essays, portfolios and recordings, that implies a simple thing: you can’t treat student work as if it automatically becomes free training material for any AI system that touches it.
Access
At the same time, the government knows that AI systems need large, rich datasets to be useful. If licensing becomes impossibly heavy, innovation will slow or concentrate in a few very large players.
So they are trying to find a balance: create space for licensing deals and sector datasets, without blocking all training. For education, that opens up questions like:
- Should there be sector‑level datasets for training education models?
- Who would control them – vendors, trusts, government, or some combination?
Transparency
Finally, the report puts a lot of weight on transparency. Users and rights holders should be able to understand, at least in broad terms, what works were used to train a model. There is also talk of labelling requirements for AI‑generated content, and of technical tools to track and manage rights in training data.
In practice, for schools, that points toward being able to answer basic questions from parents and students: “Was my work used to train any models? Which ones? Under what conditions?”
Expect vendors to be far more specific about where student work goes and what it powers.
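To make that concrete, here is a minimal sketch of what a per-work provenance record could look like. Everything in it – the field names, the idea of a separate `TrainingUse` entry per licence – is our own illustrative assumption, not a requirement from the report or a description of any real product:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TrainingUse:
    """One explicitly licensed training use of a piece of student work."""
    model_name: str   # e.g. a feedback or grading model
    licence_ref: str  # the explicit agreement that permits this use
    date_added: date

@dataclass
class WorkProvenance:
    """Hypothetical provenance record for one piece of student work."""
    work_id: str  # opaque identifier, not the student's name
    owner: str    # usually the student, sometimes the institution by agreement
    # Assessment uses (marking, moderation, standard-setting) are recorded
    # separately from training uses, which stay empty unless licensed.
    assessment_uses: list[str] = field(default_factory=list)
    training_uses: list[TrainingUse] = field(default_factory=list)

    def answer_training_question(self) -> str:
        """Answer 'Was my work used to train any models? Which ones?'"""
        if not self.training_uses:
            return "No – this work was used for assessment only."
        models = ", ".join(u.model_name for u in self.training_uses)
        return f"Yes, under explicit licence: {models}"
```

The useful property of a record like this is that an empty training list is itself an auditable answer: a school could respond to a parent's question directly, rather than forwarding it to a vendor and hoping.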
Our best read, not the final word
We should say up front that none of this is settled. The law around AI and copyright is still evolving, and even the UK government has stepped back from its first ideas to gather more evidence and think again. What follows here is our best read of the direction of travel, not a legal opinion or a final answer. We expect to revise our thinking as policymakers, courts, schools and unions all weigh in. In the meantime, we’re trying to act on a few simple principles that already feel clear:
- Treat student work as creative property.
- Avoid hidden training on that work.
- Keep institutions in control of the intelligence their professional judgment helps to build.
What this means for student work in assessment
If you bring those three pillars back to a very concrete context – a pile of student scripts being judged online – a few things follow quite quickly.
- Student work is not “just data”
Essays, portfolios and recordings are copyrighted works. Students are usually the first owners. Moving them into a digital assessment system doesn’t change that basic fact, and doesn’t automatically grant anyone a training licence.
- Assessment use and training use are different
Using student work to mark, moderate and standard‑set – including running the statistics and analytics you need to do that fairly – is one category of use. Using the same work to train a model that will live on after the assessment – whether that is a feedback model, a grading model or a commercial foundation model – is a different act, and needs a different justification.
- Implied, blanket rights will be harder to defend
In a world where government is rowing back from broad exceptions and talking more about licences and transparency, it will get harder for anyone to say: “We can reuse anonymised student work for research, benchmarking and AI training, because it’s in our privacy policy.”
If you’re a school, MAT or exam board, this is the lens we think you’ll increasingly be expected to apply.
How RM Compare is approaching this (for now)
Against that backdrop, we’ve tried to keep our own approach to student work simple and conservative:
- Student work is processed in RM Compare to support assessment and moderation – that’s the primary purpose.
- We do not treat that work as default training fuel for general‑purpose models. Any move beyond assessment operations would need a separate, explicit agreement with the institution.
- Customers retain ownership of their data. We provide tenancy, isolation and export so that institutions can keep, move and reuse their own assessment assets, rather than losing them into a black box.
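As an illustration of what “export” could mean in practice, here is a minimal sketch of an export manifest. The structure and function name are hypothetical – this is not RM Compare’s actual API, just the kind of metadata that makes an export auditable:

```python
import json
from datetime import datetime, timezone

def build_export_manifest(tenant_id: str, items: list[dict]) -> dict:
    """Assemble a hypothetical export manifest for one institution's tenancy.

    Illustrative only: it records who owns the data, when it left the
    system, and what it contains, so the export can be audited later.
    """
    return {
        "tenant": tenant_id,          # one institution per tenancy
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "ownership": "institution",   # the customer keeps ownership
        "items": [{"work_id": i["work_id"], "format": i["format"]} for i in items],
    }

# Example: an institution pulls its own assessment assets out of the system.
manifest = build_export_manifest(
    "example-trust",
    [{"work_id": "w-001", "format": "pdf"}, {"work_id": "w-002", "format": "mp3"}],
)
print(json.dumps(manifest, indent=2))
```

The point of the sketch is the design choice, not the code: if institutions can always reconstruct their own assets from an export like this, their assessment history never disappears into a black box.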
This is not because we think training on student work is always wrong. Quite the opposite: we can see huge potential benefits in institutional and sector models trained on rich, well‑judged student work. But we don’t think those benefits justify cutting corners on ownership, consent or sovereignty.
It’s also worth saying that RM Compare doesn’t sit in isolation. Our parent organisation, RM Assessment, is actively exploring AI‑supported marking in other parts of the assessment landscape. Those projects face the same copyright and sovereignty questions we’ve outlined here, and we’re working together to apply the same basic commitments: respect student work as creative output, avoid hidden training on that work, and keep institutions in control of how their learners’ data powers any models we build.
Why some designs may find the next few years harder
It is worth noting that a different design pattern has already emerged in AI‑supported assessment: student scripts are turned into images or text and sent to large, external models for automated marking. Some vendors reserve broad rights to reuse anonymised extracts for research or product improvement, and in some designs AI does most of the judging work once a school flips an “AI on” switch, with relatively little visible governance around training and reuse.
Those systems may be compliant today. But as copyright policy hardens around control, licensing and transparency, and as education‑specific guidance matures, they will face some tough questions:
- Who, exactly, gave you the right to train on student work in that way?
- Can students and institutions realistically find out if their work was part of a training set?
- If we stop using your service, what happens to models trained in part on our students’ work and judgements?
We don’t think the sector has good answers to those questions yet. But they are coming.
Where we’ll go next in this series
This post has focused on the “why now?” – the UK’s U‑turn on a broad exception and the three pillars we think matter for education: control, access and transparency.
In the next posts, we’ll look at:
- Who actually owns student work when AI is in the loop, and what that means for everyday assessment practice.
- When training on student work might be legitimate and beneficial, and what safeguards would need to be in place.
- A practical checklist you can use when you’re choosing or reviewing AI‑enabled assessment tools.
We won’t pretend to have all the answers. But if we can help schools, trusts and exam boards ask better questions – and design systems that respect student work as creative property – that feels like a good place to start.