Post‑16 pathways reform: three assessment questions AOs and providers need to answer now

The government’s reforms to post‑16 qualifications at Level 3 and below are no longer abstract policy; they are now a concrete redesign of the 16–19 landscape. A Levels, T Levels, new V Levels and two reformed Level 2 pathways will replace a crowded field of overlapping qualifications. In that world, assessment quality, standards and progression evidence stop being technical details and become existential questions for awarding organisations and providers.

From my vantage point working on comparative judgement and assessment technology, I keep coming back to three questions that everyone designing or delivering these new pathways should be asking.

1. Standards: how will we set and maintain them in larger, synoptic programmes?

The reforms deliberately move towards a more tightly controlled set of qualification routes, with fewer overlapping options, clearer purposes, and much closer alignment to employer‑led occupational standards (particularly for technical and vocational routes). That is especially true for V Levels and the new Level 2 Occupational and Further Study pathways, where applied, synoptic tasks, projects and portfolios are likely to play a central role.

That creates an opportunity and a risk.

  • On the one hand, rich tasks and portfolios can capture the kinds of applied performance employers care about.
  • On the other, they are harder to mark consistently, harder to set standards on, and harder to compare over time than traditional paper‑based tests.

A practical question follows: how will you establish defensible standards on these components and show that they are being maintained as curricula, occupational standards and cohorts evolve?

Comparative judgement offers one useful approach. Instead of trying to define complex mark schemes upfront, you use the collective professional judgement of experts to compare pairs of student work and build a robust rank order. That rank order can then be translated into cut scores, grade boundaries or proficiency levels, and revisited over time as new cohorts and tasks emerge.
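
To make the mechanics concrete, here is a minimal sketch in Python of the Bradley–Terry model, one common statistical model underpinning comparative judgement. The judgement data are invented, and production tools (RM Compare included) may use different estimation methods; treat this as an illustration of the idea, not anyone’s actual implementation.

```python
# A minimal, illustrative Bradley-Terry fit to paired comparative-judgement
# decisions. All data here are invented; real CJ studies involve many more
# judgements than this.
from collections import defaultdict

# Each tuple records one expert decision: (winner, loser).
judgements = [
    ("script_A", "script_B"), ("script_A", "script_C"), ("script_A", "script_D"),
    ("script_B", "script_C"), ("script_B", "script_D"), ("script_C", "script_D"),
    ("script_B", "script_A"), ("script_C", "script_B"), ("script_D", "script_C"),
]

items = sorted({i for pair in judgements for i in pair})
wins = defaultdict(int)   # total wins per item
pairs = defaultdict(int)  # comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

# Zermelo's iterative algorithm for the Bradley-Terry maximum likelihood fit:
# repeatedly set p_i = W_i / sum_j( n_ij / (p_i + p_j) ), then renormalise.
p = {i: 1.0 for i in items}
for _ in range(200):
    updated = {
        i: wins[i] / sum(
            pairs[frozenset((i, j))] / (p[i] + p[j])
            for j in items if j != i and pairs[frozenset((i, j))] > 0
        )
        for i in items
    }
    total = sum(updated.values())
    p = {i: v / total for i, v in updated.items()}

# Higher strength = more often judged the better piece of work.
for item, strength in sorted(p.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {strength:.3f}")
```

The fitted strengths give a rank order; placing cut scores or grade boundaries on that scale is then a separate standard‑setting decision.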

For awarding organisations designing V Levels or reworking Level 2 pathways, that means:

  • using structured comparative judgement studies to set initial standards on new tasks
  • revisiting those standards periodically to check for drift
  • and gathering rich evidence about reliability and marker agreement along the way (one such reliability statistic is sketched after this list).
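
One statistic often reported for comparative judgement studies is the scale separation reliability (SSR): the share of observed score variance that is not attributable to estimation error. A minimal sketch, using entirely made‑up estimates and standard errors standing in for the outputs of a fitted CJ model:

```python
# Scale separation reliability (SSR) from hypothetical logit-scale
# estimates and standard errors. All numbers here are invented.
import statistics

estimates = [1.8, 1.1, 0.4, -0.2, -0.9, -1.6]
std_errors = [0.42, 0.38, 0.35, 0.36, 0.40, 0.45]

observed_var = statistics.pvariance(estimates)            # total spread of scores
error_var = statistics.mean(se ** 2 for se in std_errors)  # mean error variance
ssr = (observed_var - error_var) / observed_var

print(f"SSR = {ssr:.2f}")  # ~0.88 here; above ~0.8 is commonly treated as acceptable
```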

For providers, particularly those taking on a “pioneer” role, it means using similar approaches to align internal assessors, moderate across sites and bring new staff quickly into a shared understanding of what good looks like.

2. Progression: how will we know these pathways genuinely support the next step?

The reforms place progression at the centre of the policy story. Level 2 programmes are expected to offer a “clear line of sight” to Level 3 or skilled work; Level 3 programmes are expected to lead to higher education, apprenticeships or employment with real prospects. That is a higher bar than simply delivering a qualification and reporting a pass rate.

If progression is the promise, it needs an evidence base behind it.

There are at least three layers to think about:

  • Within‑programme progression: does assessment discriminate effectively across the ability range, and can you identify performance thresholds that indicate readiness for the next stage? (A toy discrimination check is sketched after this list.)
  • Between‑programme progression: for learners on different pathways (for example V Levels and more academic routes), can you say anything meaningful about equivalence of performance or readiness?
  • Longitudinal progression: do students who perform strongly on key assessment components actually make the transitions the qualification claims to support?
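
On the first of those layers, even a crude check is informative. The sketch below, using invented numbers, asks whether scores on one synoptic component track overall programme performance; a weak correlation would flag a component that discriminates poorly across the ability range.

```python
# A toy discrimination check with invented data: do scores on one synoptic
# component track overall programme performance?
# (statistics.correlation requires Python 3.10+.)
import statistics

component_scores = [12, 18, 9, 22, 15, 7, 20, 14, 11, 17]
overall_scores   = [45, 62, 38, 71, 55, 30, 68, 50, 41, 59]

r = statistics.correlation(component_scores, overall_scores)  # Pearson's r
print(f"component-total correlation: r = {r:.2f}")
```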

Comparative judgement can contribute to each. When you build rank orders of work from a range of learners and programmes, you begin to see common performance standards emerge. Over time, you can link those judgements to real destinations: HE offers, apprenticeship starts, sustained employment.
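
As a deliberately simple illustration of that linking step, the sketch below uses invented scores and destination flags and asks, for each candidate cut score: of the learners at or above it, what share actually progressed? Real linking work would be a proper statistical study, but the underlying logic is the same.

```python
# Invented CJ-scale scores for ten learners, paired with whether each
# learner actually progressed (1) or not (0) the following year.
scores     = [2.1, 1.7, 1.5, 1.2, 0.8, 0.5, 0.1, -0.4, -0.9, -1.5]
progressed = [1,   1,   1,   1,   1,   0,   1,    0,    0,    0]

# For each candidate cut score, report the progression rate above the cut.
for cut in (-1.0, 0.0, 1.0):
    above = [p for s, p in zip(scores, progressed) if s >= cut]
    print(f"cut {cut:+.1f}: {len(above)} learners at/above, "
          f"{sum(above) / len(above):.0%} progressed")
```

Even this crude view shows how a “work at this level reflects readiness for X” claim can be tied to a defensible point on the scale.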

For AOs, that makes it possible to design key components whose standards are explicitly tied to progression claims (“work at this level reflects readiness for X”). For providers, it opens up a way to talk about progression that is richer than “X% of students passed” – especially valuable in conversations with governors, parents, employers and inspectors.

3. Accountability: what evidence will we put in front of regulators and funders?

The shift to a smaller, tightly governed qualifications landscape raises the stakes for everyone. When there are fewer funded options in a route, the spotlight on each qualification’s purpose, quality and outcomes becomes harsher. At the same time, providers are being asked to produce formal transition plans and to show that they are ready to deliver reformed programmes well.

That naturally leads to an accountability question: what will you show when someone asks, “How do you know your standards are fair and your learners are being well‑served by this pathway?”

In practice, that might include:

  • documentation of how standards were originally set on key components
  • evidence of reliable marking and moderation across centres and over time
  • studies comparing performance on legacy and reformed qualifications to reassure stakeholders about continuity
  • and analyses that link assessment outcomes to real progression destinations.

This is where tools like RM Compare sit. Comparative judgement environments can generate:

  • audit trails of expert decisions used to set standards (a hypothetical record shape is sketched after this list)
  • reliability statistics that quantify the consistency of those decisions
  • and flexible studies that compare different tasks, cohorts or qualification versions without needing to force everything into a single, rigid mark scheme.
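
To give a feel for the first of those bullets, here is one hypothetical shape for a single audit‑trail record. The field names are illustrative only; they do not describe RM Compare’s actual data model.

```python
# A hypothetical shape for one audit-trail record of an expert decision.
# Field names are illustrative, not any real product's schema.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class JudgementRecord:
    judge_id: str    # anonymised assessor identifier
    left_item: str   # first piece of work shown
    right_item: str  # second piece of work shown
    winner: str      # which piece the judge preferred
    decided_at: str  # ISO-8601 timestamp of the decision
    study_id: str    # the standard-setting study this belongs to

record = JudgementRecord(
    judge_id="judge-017",
    left_item="portfolio-0042",
    right_item="portfolio-0108",
    winner="portfolio-0108",
    decided_at=datetime.now(timezone.utc).isoformat(),
    study_id="vlevel-engineering-2026-pilot",
)
print(json.dumps(asdict(record), indent=2))
```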

For regulators and funders, this kind of evidence is more interpretable than black‑box scores. For AOs and providers, it becomes part of a risk‑management strategy: you are not just asserting that your assessments are fair and effective; you are prepared to show how you know.

Bringing it together

If you are an AO, a provider, or somewhere in between (for example, a large college group working closely with awarding partners), you do not need to have all the answers now. But you probably do need to start asking, explicitly:

  • How will we establish and maintain standards on richer, synoptic assessments?
  • How will we demonstrate that our pathways really support progression?
  • What assessment evidence will we use to meet the new accountability expectations?

For me, comparative judgement and platforms like RM Compare are not the whole solution, but they are one of the more powerful and practical tools available to answer those questions with more than assertions.