The 90% Problem: Why "Dip Sampling" Can No Longer Protect Your Provision

The 2025 apprenticeship assessment reforms have shifted responsibility for quality assurance decisively toward training providers. With the launch of Skills England and new flexibility in assessment delivery, providers are no longer just preparing learners - they are increasingly validating them.

This is a milestone opportunity for the FE sector to own the entire learner journey. But it brings a critical, under-discussed risk: How do you prove consistency when your assessors are spread across the country?

For decades, the sector has relied on a method that was never designed for this level of scrutiny: dip sampling.

The Illusion of Safety

Internal Quality Assurance (IQA) in vocational education has traditionally worked like this: you check 10% of portfolios, spot a few errors, provide feedback to the assessor, and assume the other 90% is fine.

This model made sense when training providers were preparing learners and independent End Point Assessment Organisations (EPAOs) were validating them. The handover created a natural check. If a provider's standards drifted, the EPAO would catch it.

But as the line between training and assessment blurs, that safety net is disappearing. Providers are now marking components of high-stakes assessments themselves. And when the External Quality Assurer (EQA) arrives, they are not just checking your teaching - they are auditing your judgement.

In this new environment, dip sampling is no longer a safety net. It's a blindfold.

The "Postcode Lottery" Problem

Here is the uncomfortable truth: if you are a national provider or large college group, your assessor in Newcastle almost certainly marks differently from your assessor in Bristol.

This is not a training problem. It is a human judgement problem.

Vocational evidence - portfolios, video diaries, practical observations, professional discussions - does not fit neatly into tick-box rubrics. Even when assessors are trained on the same standard, interpretation varies. One assessor sees a "Distinction." Another sees a "Merit." Both believe they are right.

Traditional standardisation meetings happen once or twice a year. By the time you spot the drift, hundreds of learners have already been graded inconsistently. And when the EQA compares portfolios across your sites, the discrepancies are obvious.

The consequences are severe:

  • Funding clawback for inaccurate achievement claims
  • Direct Claims Status suspended or revoked
  • Reputational damage when learners appeal inconsistent grades
  • Regulatory sanctions from Awarding Organisations or Ofqual

You cannot afford to leave 90% of your quality assurance to chance.

Why Traditional Moderation Does Not Scale

Some providers respond by increasing moderation. More sampling. More meetings. More layers of approval.

But this approach has three fatal flaws:

  1. It is still statistical guesswork
    Even if you increase sampling to 20% or 30%, you are still extrapolating from a subset. You cannot prove that the unsampled work meets the same standard. In a high-stakes audit, "we checked 30%" is not a defence - the short illustration after this list shows why.
  2. It is logistically impossible
    Bringing assessors together for standardisation is expensive and slow. Travel costs, venue hire, lost delivery time - it adds up fast. And even when you do it, you can only standardise on a handful of sample portfolios. The moment assessors return to their sites, drift begins again.
  3. It does not generate defensible data
    When an EQA challenges a grade decision, what evidence do you show them? Meeting notes? A sample script with tick marks? These are not data - they are anecdotes. You need evidence that stands up to regulatory scrutiny.
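
Why "guesswork"? A quick back-of-the-envelope check makes the point. The figures below are purely illustrative assumptions - 50 portfolios per assessor, four of them graded inconsistently - not data from any provider; the question is how often a random dip sample contains none of the problem work.

```python
# Illustrative sketch only: how often does a random dip sample miss a problem?
# Assumed figures (hypothetical): 50 portfolios per assessor, 4 of them
# graded inconsistently, and sample rates of 10%, 20% and 30%.
from math import comb

def chance_sample_misses_everything(total, flawed, sampled):
    """Probability that a simple random sample of `sampled` portfolios
    contains none of the `flawed` ones (hypergeometric)."""
    return comb(total - flawed, sampled) / comb(total, sampled)

total_portfolios, flawed_portfolios = 50, 4
for rate in (0.10, 0.20, 0.30):
    sample_size = round(total_portfolios * rate)
    p = chance_sample_misses_everything(total_portfolios, flawed_portfolios, sample_size)
    print(f"{rate:.0%} sample ({sample_size} portfolios): {p:.0%} chance of spotting nothing")
```

On these assumed figures, a traditional 10% sample has roughly a two-in-three chance of seeing nothing wrong at all; even a 30% sample misses the drift about one time in four.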

The Solution: Total Visibility, Not More Sampling

At RM Compare, we have spent years helping global Awarding Organisations solve this exact problem for high-stakes exams. Now, we are bringing that same rigour to the training provider market.

Our approach is built on a simple insight: humans are terrible at absolute judgement, but excellent at relative judgement.

Ask an assessor to score a portfolio against a rubric, and you will get inconsistency. But ask them to compare two portfolios and decide "which is better," and they are remarkably accurate.

This is the foundation of Adaptive Comparative Judgement (ACJ) - the most reliable method for assessing complex, holistic work.
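
Our exact pairing and scaling engine is beyond the scope of this post, but the idea can be sketched with the standard Bradley-Terry / Rasch-style model that underpins comparative judgement generally: feed in a set of "A beats B" decisions and every portfolio lands on one shared quality scale. The portfolio IDs and judgements below are invented purely for illustration.

```python
# Minimal sketch of the principle behind comparative judgement (not RM Compare's
# own algorithm): fit a Bradley-Terry / Rasch-style model to pairwise decisions
# so that every portfolio gets a position on one shared quality scale.
from collections import defaultdict
from math import exp

def fit_quality_scale(judgements, epochs=500, lr=0.05):
    """judgements: list of (winner, loser) portfolio IDs.
    Returns portfolio ID -> estimated quality on a logit scale (higher = better)."""
    items = sorted({pid for pair in judgements for pid in pair})
    theta = {pid: 0.0 for pid in items}
    for _ in range(epochs):
        grad = defaultdict(float)
        for winner, loser in judgements:
            # Model: P(winner beats loser) = sigmoid(theta_winner - theta_loser)
            p_win = 1.0 / (1.0 + exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p_win   # push the winner up
            grad[loser] -= 1.0 - p_win    # push the loser down
        for pid in items:
            theta[pid] += lr * grad[pid]
        # Centre the scale: only relative positions are meaningful
        mean = sum(theta.values()) / len(theta)
        theta = {pid: t - mean for pid, t in theta.items()}
    return theta

# Hypothetical judging session: four portfolios, six pairwise decisions
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
for pid, score in sorted(fit_quality_scale(judgements).items(), key=lambda kv: -kv[1]):
    print(pid, round(score, 2))
```

An assessor whose individual decisions repeatedly disagree with the consensus positions on a scale like this is precisely the assessor who is out of sync with the national standard.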

How It Works for Training Providers

Instead of sampling 10% of portfolios and hoping they represent the whole, RM Compare allows you to:

  1. Run National Standardisation Windows Online
    Upload sample portfolios from every site. Assessors judge pairs of work anonymously (A vs B). Our algorithm builds a consensus "Quality Ruler" in minutes, instantly identifying which assessors are out of sync with the national standard.
    No travel. No logistics. No arguments over borderline scripts.
  2. Upskill New Staff Before They Mark Live Work
    Use RM Compare as a "training flight simulator." New assessors judge pre-seeded "Gold Standard" examples and receive instant feedback on their accuracy. Only certify them to mark once they hit a reliability score of 0.9 - the same threshold used by exam boards for high-stakes qualifications.
  3. Generate Audit-Proof Evidence
    When an EQA asks why a learner received a Distinction, you do not just show the portfolio. You show the data: "This work was ranked in the top 5% by a consensus of 12 independent professionals with a reliability score of 0.92."

That is an evidence base that protects your provision (the sketch below shows how a figure like that is derived from the judging data rather than asserted).
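
Where does a figure like 0.92 come from? In comparative judgement research the usual statistic is the Scale Separation Reliability (SSR): the proportion of the spread in the quality scale that is genuine signal rather than estimation noise. A minimal sketch follows, with invented numbers; it illustrates the principle rather than reproducing our exact reporting.

```python
# Sketch of the Scale Separation Reliability (SSR) used widely in comparative
# judgement research: how much of the spread in the quality scale is real
# signal rather than estimation noise. All figures below are invented.
from statistics import mean, pvariance

def scale_separation_reliability(estimates, standard_errors):
    """estimates: fitted quality scores; standard_errors: their standard errors."""
    observed_variance = pvariance(estimates)                   # total spread of the scale
    error_variance = mean(se ** 2 for se in standard_errors)   # average estimation noise
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance if observed_variance else 0.0

# Invented example: four portfolio estimates and their standard errors
print(round(scale_separation_reliability([1.4, 0.6, -0.2, -1.8],
                                          [0.3, 0.25, 0.3, 0.35]), 2))  # ~0.93
```

Whether a given reliability score describes a whole judging session or one assessor's agreement with the consensus, the principle is the same: the number is computed from the judging data, not asserted.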

The 2025 Reforms Are a Test

The apprenticeship assessment reforms are not just a policy change - they are a stress test for your quality assurance systems.

Providers with robust, data-driven IQA will thrive. They will secure Direct Claims Status, defend their grades with confidence, and scale quality across multiple sites without increasing cost.

Providers still relying on dip sampling and annual standardisation meetings will struggle. They will face clawbacks, sanctions, and the constant anxiety of not knowing whether their grades will hold up under scrutiny.

The question is not whether the reforms are coming. The question is whether your quality assurance is ready.

What We Have Built for You

Today, RM Compare has launched a dedicated Training & FE Sector, designed specifically to help providers:

  • Standardise judgement across distributed teams
  • Upskill new assessors with measurable reliability
  • Generate defensible evidence for Awarding Organisations and Ofqual

You can explore the new sector here: RM Compare for Training & FE

The 90% You Cannot See

The reforms have shifted the responsibility. The stakes have increased. And the old methods - designed for a different era - are no longer fit for purpose.

If you are still relying on dip sampling to protect your provision, you are not managing risk. You are ignoring it.

The 90% you do not check is where your vulnerabilities live. It is time to see them.