Blog Series Introduction: Can We Trust AI to Understand Value and Quality?
 
- Introduction: Can We Trust AI to Understand Value and Quality?
- Blog 1: Why the Discrepancy Between Human and AI Assessment Matters—and Must Be Addressed
- Blog 2: Variation in LLM Perception on Value and Quality
- Blog 3: Who is Assessing the AI that is Assessing Students?
- Blog 4: Building Trust: From “Ranks to Rulers” to On-Demand Marking
- Blog 5: Fairness in Focus: The AI Validation Layer Proof of Concept Powered by RM Compare
- Blog 6: RM Compare as the Gold Standard Validation Layer: The Research Behind Trust in AI Marking
Artificial intelligence is becoming deeply embedded in how we teach, learn, and assess. But as large language models (LLMs) and automated marking tools step into spaces once reserved for humans, a fundamental question emerges: can AI truly understand what we mean by quality and value?
This question sits at the heart of a new six-part RM Compare blog series exploring what happens when AI joins the judging panel — and what safeguards we need to build trust in its decisions. Across the series, we move from exploring the limits of current models to proposing new frameworks for transparent, validated AI assessment.
First, we look at Why the Discrepancy Between Human and AI Assessment Matters—and Must Be Addressed. As AI-powered tools increasingly participate in educational assessment, a critical challenge comes into sharp focus: AI judges very differently from humans. This difference is more than technical; left unaddressed, it threatens the fairness, trust, and validity of assessment systems.
The conversation then moves to Variation in LLM Perception on Value and Quality, where a short study compares AI- and human-created items to test whether LLMs can grasp the same sense of value as human judges. The results reveal variation not only between human and AI perceptions, but also between different LLMs, highlighting how assessing subjective quality remains a uniquely human challenge.
In Who is Assessing the AI that is Assessing Students?, the focus shifts to accountability. If AI systems are contributing to marks and grades, who is making sure those judgements are fair, accurate, and representative? The piece considers the growing importance of “assessing the assessor” in education.
Next, Building Trust: From ‘Ranks to Rulers’ to On-Demand Marking shows how trust can be rebuilt through transparency and reliability. It explores the evolution of RM Compare technology and how on-demand, repeatable judgement can lay the foundation for trusted AI decision-making.
Fairness in Focus: The AI Validation Layer Proof of Concept Powered by RM Compare introduces a practical response — an AI Validation Layer designed to monitor, test, and certify AI decisions before they reach real-world learners. This concept creates a transparent and auditable bridge between AI performance and human trust.
Finally, RM Compare as the Gold Standard Validation Layer: The Research Behind Trust in AI Marking surveys the recent sector research and regulatory evidence. In 2025, the educational assessment sector experienced a step change in the evidence base supporting Comparative Judgement (CJ) as a validation layer, bolstering the RM Compare approach described throughout this series. Major independent studies, regulatory pilots, and industry-led deployments have converged on the effectiveness, reliability, and transparency that CJ-powered systems provide for AI calibration and human moderation alike.
Taken together, these blogs outline both the challenges and opportunities in building a world where AI supports — and is held to — the same standards of fairness and quality that underpin human assessment.
At RM Compare, we believe that trust in AI starts not with replacing human judgement, but with validating it. This series is an invitation to explore what that future could look like.