The 90% Problem: Why "Dip Sampling" Can No Longer Protect Your Provision

The 2025 apprenticeship assessment reforms have shifted responsibility for quality assurance decisively toward training providers. With the launch of Skills England and new flexibility in assessment delivery, providers are no longer just preparing learners - they are increasingly validating them.

This is a milestone opportunity for the FE sector to own the entire learner journey. But it brings a critical, under-discussed risk: How do you prove consistency when your assessors are spread across the country?

For decades, the sector has relied on a method that was never designed for this level of scrutiny: dip sampling.

The Illusion of Safety

Internal Quality Assurance (IQA) in vocational education has traditionally worked like this: you check 10% of portfolios, spot a few errors, provide feedback to the assessor, and assume the other 90% is fine.

This model made sense when training providers were preparing learners and independent End Point Assessment Organisations (EPAOs) were validating them. The handover created a natural check. If a provider's standards drifted, the EPAO would catch it.

But as the line between training and assessment blurs, that safety net is disappearing. Providers are now marking components of high-stakes assessments themselves. And when the External Quality Assurer (EQA) arrives, they are not just checking your teaching - they are auditing your judgement.

In this new environment, dip sampling is no longer a safety net. It's a blindfold.

The "Postcode Lottery" Problem

Here is the uncomfortable truth: if you are a national provider or large college group, your assessor in Newcastle almost certainly marks differently from your assessor in Bristol.

This is not a training problem. It is a human judgement problem.

Vocational evidence - portfolios, video diaries, practical observations, professional discussions - does not fit neatly into tick-box rubrics. Even when assessors are trained on the same standard, interpretation varies. One assessor sees a "Distinction." Another sees a "Merit." Both believe they are right.

Traditional standardisation meetings happen once or twice a year. By the time you spot the drift, hundreds of learners have already been graded inconsistently. And when the EQA compares portfolios across your sites, the discrepancies are obvious.

The consequences are severe:

  • Funding clawback for inaccurate achievement claims
  • Direct Claims Status suspended or revoked
  • Reputational damage when learners appeal inconsistent grades
  • Regulatory sanctions from Awarding Organisations or Ofqual

You cannot afford to leave 90% of your quality assurance to chance.

Why Traditional Moderation Does Not Scale

Some providers respond by increasing moderation. More sampling. More meetings. More layers of approval.

But this approach has three fatal flaws:

  1. It is still statistical guesswork
    Even if you increase sampling to 20% or 30%, you are still extrapolating from a subset. You cannot prove that the unsampled work meets the same standard. In a high-stakes audit, "we checked 30%" is not a defence - the short illustration after this list shows why.
  2. It is logistically impossible
    Bringing assessors together for standardisation is expensive and slow. Travel costs, venue hire, lost delivery time - it adds up fast. And even when you do it, you can only standardise on a handful of sample portfolios. The moment assessors return to their sites, drift begins again.
  3. It does not generate defensible data
    When an EQA challenges a grade decision, what evidence do you show them? Meeting notes? A sample script with tick marks? These are not data - they are anecdotes. You need evidence that stands up to regulatory scrutiny.
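
Why "guesswork"? A quick back-of-the-envelope check makes the point. The figures below are purely illustrative assumptions - 50 portfolios per assessor, four of them graded inconsistently - not data from any provider; the question is how often a random dip sample contains none of the problem work.

```python
# Illustrative sketch only: how often does a random dip sample miss a problem?
# Assumed figures (hypothetical): 50 portfolios per assessor, 4 of them
# graded inconsistently, and sample rates of 10%, 20% and 30%.
from math import comb

def chance_sample_misses_everything(total, flawed, sampled):
    """Probability that a simple random sample of `sampled` portfolios
    contains none of the `flawed` ones (hypergeometric)."""
    return comb(total - flawed, sampled) / comb(total, sampled)

total_portfolios, flawed_portfolios = 50, 4
for rate in (0.10, 0.20, 0.30):
    sample_size = round(total_portfolios * rate)
    p = chance_sample_misses_everything(total_portfolios, flawed_portfolios, sample_size)
    print(f"{rate:.0%} sample ({sample_size} portfolios): {p:.0%} chance of spotting nothing")
```

On these assumed figures, a traditional 10% sample has roughly a two-in-three chance of seeing nothing wrong at all; even a 30% sample misses the drift about one time in four.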

The Solution: Total Visibility, Not More Sampling

At RM Compare, we have spent years helping global Awarding Organisations solve this exact problem for high-stakes exams. Now, we are bringing that same rigour to the training provider market.

Our approach is built on a simple insight: humans are terrible at absolute judgement, but excellent at relative judgement.

Ask an assessor to score a portfolio against a rubric, and you will get inconsistency. But ask them to compare two portfolios and decide "which is better," and they are remarkably accurate.

This is the foundation of Adaptive Comparative Judgement (ACJ) - the most reliable method for assessing complex, holistic work.
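
Our exact pairing and scaling engine is beyond the scope of this post, but the idea can be sketched with the standard Bradley-Terry / Rasch-style model that underpins comparative judgement generally: feed in a set of "A beats B" decisions and every portfolio lands on one shared quality scale. The portfolio IDs and judgements below are invented purely for illustration.

```python
# Minimal sketch of the principle behind comparative judgement (not RM Compare's
# own algorithm): fit a Bradley-Terry / Rasch-style model to pairwise decisions
# so that every portfolio gets a position on one shared quality scale.
from collections import defaultdict
from math import exp

def fit_quality_scale(judgements, epochs=500, lr=0.05):
    """judgements: list of (winner, loser) portfolio IDs.
    Returns portfolio ID -> estimated quality on a logit scale (higher = better)."""
    items = sorted({pid for pair in judgements for pid in pair})
    theta = {pid: 0.0 for pid in items}
    for _ in range(epochs):
        grad = defaultdict(float)
        for winner, loser in judgements:
            # Model: P(winner beats loser) = sigmoid(theta_winner - theta_loser)
            p_win = 1.0 / (1.0 + exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p_win   # push the winner up
            grad[loser] -= 1.0 - p_win    # push the loser down
        for pid in items:
            theta[pid] += lr * grad[pid]
        # Centre the scale: only relative positions are meaningful
        mean = sum(theta.values()) / len(theta)
        theta = {pid: t - mean for pid, t in theta.items()}
    return theta

# Hypothetical judging session: four portfolios, six pairwise decisions
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
for pid, score in sorted(fit_quality_scale(judgements).items(), key=lambda kv: -kv[1]):
    print(pid, round(score, 2))
```

An assessor whose individual decisions repeatedly disagree with the consensus positions on a scale like this is precisely the assessor who is out of sync with the national standard.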

How It Works for Training Providers

Instead of sampling 10% of portfolios and hoping they represent the whole, RM Compare allows you to:

  1. Run National Standardisation Windows Online
    Upload sample portfolios from every site. Assessors judge pairs of work anonymously (A vs B). Our algorithm builds a consensus "Quality Ruler" in minutes, instantly identifying which assessors are out of sync with the national standard.
    No travel. No logistics. No arguments over borderline scripts.
  2. Upskill New Staff Before They Mark Live Work
    Use RM Compare as a "training flight simulator." New assessors judge pre-seeded "Gold Standard" examples and receive instant feedback on their accuracy. Only certify them to mark once they hit a reliability score of 0.9 - the same threshold used by exam boards for high-stakes qualifications.
  3. Generate Audit-Proof Evidence
    When an EQA asks why a learner received a Distinction, you do not just show the portfolio. You show the data: "This work was ranked in the top 5% by a consensus of 12 independent professionals with a reliability score of 0.92."

That is an evidence base that protects your provision (the sketch below shows how a figure like that is derived from the judging data rather than asserted).
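
Where does a figure like 0.92 come from? In comparative judgement research the usual statistic is the Scale Separation Reliability (SSR): the proportion of the spread in the quality scale that is genuine signal rather than estimation noise. A minimal sketch follows, with invented numbers; it illustrates the principle rather than reproducing our exact reporting.

```python
# Sketch of the Scale Separation Reliability (SSR) used widely in comparative
# judgement research: how much of the spread in the quality scale is real
# signal rather than estimation noise. All figures below are invented.
from statistics import mean, pvariance

def scale_separation_reliability(estimates, standard_errors):
    """estimates: fitted quality scores; standard_errors: their standard errors."""
    observed_variance = pvariance(estimates)                   # total spread of the scale
    error_variance = mean(se ** 2 for se in standard_errors)   # average estimation noise
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance if observed_variance else 0.0

# Invented example: four portfolio estimates and their standard errors
print(round(scale_separation_reliability([1.4, 0.6, -0.2, -1.8],
                                          [0.3, 0.25, 0.3, 0.35]), 2))  # ~0.93
```

Whether a given reliability score describes a whole judging session or one assessor's agreement with the consensus, the principle is the same: the number is computed from the judging data, not asserted.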

The 2025 Reforms Are a Test

The apprenticeship assessment reforms are not just a policy change - they are a stress test for your quality assurance systems.

Providers with robust, data-driven IQA will thrive. They will secure Direct Claims Status, defend their grades with confidence, and scale quality across multiple sites without increasing cost.

Providers still relying on dip sampling and annual standardisation meetings will struggle. They will face clawbacks, sanctions, and the constant anxiety of not knowing whether their grades will hold up under scrutiny.

The question is not whether the reforms are coming. The question is whether your quality assurance is ready.

What We Have Built for You

Today, RM Compare has launched a dedicated Training & FE Sector, designed specifically to help providers:

  • Standardise judgement across distributed teams
  • Upskill new assessors with measurable reliability
  • Generate defensible evidence for Awarding Organisations and Ofqual

You can explore the new sector here: RM Compare for Training & FE

The 90% You Cannot See

The reforms have shifted the responsibility. The stakes have increased. And the old methods - designed for a different era - are no longer fit for purpose.

If you are still relying on dip sampling to protect your provision, you are not managing risk. You are ignoring it.

The 90% you do not check is where your vulnerabilities live. It is time to see them.