Training on Student Work: Saying “Yes” (and “No”) Safely (3/4)
So far in this series, we’ve argued that student work is creative property, not just “data”, and that assessment use and training use are not the same thing. That naturally leads to a harder question: is it ever acceptable to train models on student work – and if so, on what terms?
This post is an attempt to answer that honestly. There are real benefits on the table, but also real risks. The goal is not to ban training outright, nor to treat student work as free fuel, but to sketch a framework in which saying “yes” or “no” is a conscious, defensible choice.
Why everyone wants to train on student work
It’s worth admitting the attraction plainly. High‑quality student work and judgement data are some of the most valuable ingredients you could imagine for education AI.
If you want models that give robust feedback on writing, you need authentic scripts across abilities, genres and contexts, together with reliable judgements about quality. If you want to understand equity and bias, you need to see how different groups are judged and how they perform over time. If you want early‑warning systems that spot students who are struggling, you need patterns across tasks, cohorts and years.
You do not get any of that from synthetic examples or generic web text. You get it from real student work, judged by real professionals, in real contexts.
So the instinct to ask “can we train models on this?” is not wrong. But it needs to be balanced against a countervailing question: “does the fact that we can collect this work for assessment also mean we can repurpose it for training, indefinitely, on our own terms?” That is where caution – and design – come in.
A simple three‑tier model for training use
One way to make sense of this is to separate possible uses into three tiers and be very clear about which tier you are in.
The first tier is where most assessment systems should sit by default. In this tier, student work is used to run the assessment and moderation process: presenting work to markers or judges, calculating scores, quality‑controlling the outcomes, and giving the institution the reports and exports it needs. Work might be stored for a time for audit, appeals or research by the institution itself, but it is not treated as a standing training set for models that live beyond that context. You can think of this as “service only”: the work comes in to be judged; the outputs go back to the institution; and that is the end of the story.
The second tier is where an institution – a trust, a university, an exam board – decides that it wants to train its own models on its own data. Here, student work and judgement data are still not a vendor’s asset. They are an institutional resource. The institution decides, within its own governance and ethics frameworks, to use that resource to train models that will serve its own learners and staff. It might, for example, build a writing feedback assistant tailored to its own curriculum, or a moderation‑support tool for its own markers. The key point is that the decision and ownership sit with the institution, not with a supplier.
The third tier is where data from many institutions is pooled to train models with sector‑wide reach: national feedback models, cross‑trust comparators, tools aimed at improving equity or informing policy. This is where the potential benefits are greatest – and the stakes are highest. It is also where questions about licensing, consent and governance become inescapable. Who decides which data is included? Under what conditions can institutions or individual students opt out? Who owns the resulting models? How is benefit shared back to the sector, rather than simply captured by a vendor?
Nothing in current law gives a simple, one‑size‑fits‑all answer to those questions. But the three‑tier framing at least lets us say, explicitly, what we are doing, and to whom the training benefit is meant to accrue.
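For technically minded readers, here is a minimal sketch, in Python, of what making the tier an explicit, auditable property of a dataset could look like, rather than leaving it as an implicit assumption. Every name below is hypothetical; nothing here is drawn from a real product.

```python
# Hypothetical sketch: the tier as an explicit, checkable property of data.

from dataclasses import dataclass
from enum import Enum, auto

class TrainingTier(Enum):
    SERVICE_ONLY = auto()       # tier one: run assessment and moderation only
    INSTITUTION_OWNED = auto()  # tier two: the institution trains its own models
    SECTOR_POOLED = auto()      # tier three: pooled, sector-wide training

@dataclass(frozen=True)
class DataUsePolicy:
    tier: TrainingTier
    owner: str      # who holds the decision rights over the data
    purpose: str    # what the data may be used for

    def permits_training(self) -> bool:
        # Tier one never feeds a standing training set.
        return self.tier is not TrainingTier.SERVICE_ONLY

policy = DataUsePolicy(TrainingTier.SERVICE_ONLY,
                       owner="the institution",
                       purpose="assessment and moderation")
assert not policy.permits_training()
```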
Conditions for a legitimate “yes”
Within that framework, what would need to be true to make “yes, we will train on student work” feel legitimate?
The first condition is clarity of purpose. It should be obvious why training is being proposed and what problem it is meant to solve. “Because we might want a better model one day” is not good enough. “Because we want a sector‑wide, freely‑available feedback model that every school can use” is at least a meaningful proposition, even if you still decide to say no.
The second is meaningful permission. That doesn’t necessarily mean individual signatures on every script, but it does mean that students, parents and staff are not finding out after the fact that their work has been repurposed. At institutional level, it means boards and senior leaders taking an explicit decision, with a clear understanding of what they are licensing and on what terms. At learner level, it means at least being told clearly what is happening, and in some contexts being offered a genuine way to opt out.
The third is scope and reversibility. Any licence to train should be bounded: in time, in purpose, in who can use the resulting models. It should be possible, at least in principle, to stop future training on new work, or to withdraw from a pooled dataset for future projects. Perfection is not realistic here; some training use is irreversible. But a blanket, perpetual, all‑purpose licence buried in small print is the opposite of the direction the policy mood is moving in.
The fourth is benefit back to the people whose work is being used. In education, that is unlikely to be a royalty cheque. More often it will look like: better tools that are made available on fair terms to the institutions that contributed data; stronger insight into curriculum and equity; priority access to improvements. The point is to avoid a situation where student work and professional judgement are used to build models that are then sold back to the sector on whatever terms the vendor chooses.
If those conditions cannot be met, saying “no” – or “not yet” – is not anti‑innovation. It is an honest recognition that the groundwork is not in place.
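As a thought experiment, those four conditions could even be captured in a structured licence record, so that a “yes” is explicit and checkable rather than buried in small print. This is only a sketch under assumed requirements; every field name is hypothetical.

```python
# Hypothetical sketch: a bounded, revocable training licence that records
# the four conditions explicitly, instead of a perpetual blanket grant.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class TrainingLicence:
    purpose: str             # condition 1: clarity of purpose
    approved_by: str         # condition 2: an explicit institutional decision
    learners_informed: bool  # condition 2: learners told what is happening
    valid_until: date        # condition 3: bounded in time
    permitted_users: list = field(default_factory=list)  # condition 3: bounded in who
    benefit_terms: str = ""  # condition 4: benefit back to contributors
    revoked: bool = False    # condition 3: future training can be stopped

    def permits(self, use: str, user: str, on: date) -> bool:
        # A use is allowed only while every condition still holds.
        return (not self.revoked
                and on <= self.valid_until
                and use == self.purpose
                and user in self.permitted_users
                and self.learners_informed)

licence = TrainingLicence(
    purpose="sector-wide writing feedback model",
    approved_by="Example Trust board, minuted decision",
    learners_informed=True,
    valid_until=date(2028, 12, 31),
    permitted_users=["Example Trust"],
    benefit_terms="model available to contributing schools on fair terms",
)
assert licence.permits("sector-wide writing feedback model",
                       "Example Trust", date(2027, 1, 1))
```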
How we see RM Compare’s role in this
Seen through this lens, RM Compare currently sits firmly in the first tier. Student work comes into the system to be judged. Professional judgement and psychometrics are applied. Outcomes and analytics flow back to institutions. We provide tenancy and export so that those institutions retain control over their assessment assets. We do not, by default, treat that flow of work as a training pipeline for our own models.
Looking ahead, we can imagine playing a supporting role in the second and third tiers – but only under different governance.
At the institutional level, that might mean providing the infrastructure for a trust or board to use its own RM Compare data to train models it owns, hosted in environments it controls. Our role there would be technical and advisory: helping to curate, structure and connect the right data, not claiming rights over it.
At the sector level, it might mean contributing to frameworks where groups of institutions – or a public body – decide to license training on student work for clearly defined purposes: for example, a national, non‑proprietary writing feedback model. If that happens, we think the governance should sit with the sector, not with any single vendor. Again, our role would be as an infrastructure provider and a good citizen, not as the owner of the resulting models.
In all of those scenarios, the default would remain conservative. Training on student work would be something that happens because institutions, and ideally students, have knowingly said “yes” under clear conditions, not because a supplier has quietly decided it is the obvious thing to do.
In RM Compare, our first instinct is not to feed student work into a generic model at all, but to turn professional judgements into reusable rulers – standards that belong to the institutions that created them. Through Hub and our tenancy model, licence holders decide where those rulers live and who can use them. That way, the intelligence you generate is captured and redeployed under your control, long before anyone talks about training wider models.
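As a rough illustration of the “reusable rulers” idea, the sketch below turns a handful of pairwise judgements into a simple scale with an explicit owner. A real system would fit a proper psychometric model; a plain win proportion stands in for that here, and all names are invented.

```python
# Hypothetical sketch: pairwise judgements become a "ruler" -- a scale
# that belongs to the institution which produced the judgements.

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Ruler:
    owner: str                  # the institution that owns the standard
    scale: dict                 # script_id -> position on the scale
    shared_with: list = field(default_factory=list)  # the tenancy decision

def build_ruler(owner, judgements):
    wins, appearances = Counter(), Counter()
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Win proportion as a stand-in for a fitted psychometric scale.
    scale = {s: wins[s] / appearances[s] for s in appearances}
    return Ruler(owner=owner, scale=scale)

ruler = build_ruler("Example Trust", [("s2", "s1"), ("s2", "s3"), ("s1", "s3")])
# The owner, not a vendor, decides who else may reuse the standard.
ruler.shared_with.append("Partner School")
```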
There are also ways to enjoy the speed of AI without turning student work into a permanent, vendor‑owned training set. One we use in RM Compare is what we call the Validation Layer: AI can do a fast first pass, but human consensus, captured through comparative judgement, acts as the anchor and the “hallucination detector”. That keeps human‑created standards at the centre of the system, which is exactly what current policy thinking is trying to protect.
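Here is one way such a validation layer could work in outline: the AI’s provisional ranking is checked against the human pairwise consensus, and any pair where the two disagree is routed back for review. This is a minimal sketch of the pattern, not RM Compare’s actual implementation; names and data shapes are illustrative.

```python
# Minimal sketch of a validation layer: the AI produces fast provisional
# scores, but human pairwise comparative judgements act as the anchor.

from collections import Counter

def flag_disagreements(ai_scores, judgements):
    """Return script pairs where human consensus contradicts the AI.

    ai_scores  -- {script_id: provisional AI score, higher = better}
    judgements -- iterable of (winner_id, loser_id) human decisions
    """
    wins = Counter(judgements)                 # tally each ordered pair
    pairs = {tuple(sorted(p)) for p in wins}   # each unordered pair once
    flagged = []
    for a, b in pairs:
        if wins[(a, b)] == wins[(b, a)]:
            continue                           # no clear human consensus
        human_pref = a if wins[(a, b)] > wins[(b, a)] else b
        ai_pref = a if ai_scores[a] >= ai_scores[b] else b
        if human_pref != ai_pref:              # AI contradicts the judges:
            flagged.append((a, b))             # route for human review
    return flagged

# Example: judges prefer s2, the AI prefers s1, so the pair is flagged.
print(flag_disagreements(
    {"s1": 0.81, "s2": 0.44},
    [("s2", "s1"), ("s2", "s1"), ("s1", "s2")],
))  # -> [('s1', 's2')]
```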
This three‑tier way of thinking about training isn’t just something we apply inside RM Compare. RM Assessment’s wider AI marking work has to make the same distinctions: between running today’s exams and repurposing past scripts to train tomorrow’s models; between institution‑specific tools and sector‑scale systems. Internally, we are trying to hold a consistent line: any move into training on student work, in any product, should be deliberate, clearly governed, and grounded in explicit agreements with the institutions and learners whose work makes it possible.
Why some existing patterns look risky
Against this backdrop, certain design choices already visible in the market start to look exposed.
If a system routes large volumes of student work into external models for automated assessment, blurs the line between assessment and training use, reserves broad rights to reuse anonymised work for “improvement”, and offers institutions little visibility over where their data goes or what it ultimately powers, it may find the next few years uncomfortable. That is not a moral judgement. It is a simple observation about the direction of travel: toward stronger rights for creators, more emphasis on licensing and transparency, and more scrutiny over how children’s work is used in high‑stakes systems.
We do not think that direction will reverse. If anything, education, with its particular sensitivities around minors, exams and fairness, is likely to see stricter expectations than some other sectors.
Holding the line while the picture settles
As in the rest of this series, it is important to end with some humility. There is no settled template yet for “the right” way to train on student work. The government is still weighing options. Case law has not fully caught up. Different jurisdictions will take different paths.
In that uncertainty, our position is deliberately cautious. We want to explore the benefits of models built on rich, well‑judged student work. But we do not want to get ahead of the legal, ethical and sector consensus. Until that consensus is clearer, we are choosing to treat student work as creative property first and training resource second, and to separate assessment use from training use as sharply as we can.
In the final post in this series, we’ll turn this into something very practical: a short copyright‑focused checklist for anyone buying or reviewing AI‑enabled assessment tools. The aim is not to crown winners and losers, but to help you ask the questions that will matter more and more as AI and copyright continue their uneasy dance.