Introducing a New Four-Part Blog Series: Can We Trust AI to Understand Value and Quality?

Artificial intelligence is becoming deeply embedded in how we teach, learn, and assess. But as large language models (LLMs) and automated marking tools step into spaces once reserved for humans, a fundamental question emerges: can AI truly understand what we mean by quality and value?

This question sits at the heart of a new four-part RM Compare blog series exploring what happens when AI joins the judging panel — and what safeguards we need to build trust in its decisions. Across the series, we move from exploring the limits of current models to proposing new frameworks for transparent, validated AI assessment.

The conversation begins with Variation in LLM Perception on Value and Quality, where a short study compares AI- and human-created items to test whether LLMs can grasp the same sense of value as human judges. The results reveal variation not only between human and AI perceptions, but also between different LLMs — highlighting how subjective quality remains a uniquely human challenge.

In Who is Assessing the AI that is Assessing Students?, the focus shifts to accountability. If AI systems are contributing to marks and grades, who is making sure those judgements are fair, accurate, and representative? The piece considers the growing importance of “assessing the assessor” in education.

Next, Building Trust: From ‘Ranks to Rulers’ to On-Demand Marking shows how trust can be rebuilt through transparency and reliability. It explores the evolution of RM Compare technology and how on-demand, repeatable judgement can lay the foundation for trusted AI decision-making.
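For readers curious about the mechanics behind ‘Ranks to Rulers’, the sketch below shows in outline how a set of pairwise comparative judgements can be turned into a measurement scale. It fits a simple Bradley-Terry model in plain Python; the item names and judgement data are invented for illustration, and this is a minimal sketch of the general technique, not RM Compare's actual implementation.

```python
import math

# Hypothetical pairwise judgements: each tuple (winner, loser) records that
# a judge preferred the first item over the second in one comparison.
judgements = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("B", "C"), ("B", "A"), ("C", "B"),
]

items = sorted({item for pair in judgements for item in pair})
theta = {item: 0.0 for item in items}  # log-quality parameter per item

# Fit a Bradley-Terry model by gradient ascent on the log-likelihood,
# where P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j)).
for _ in range(500):
    grad = {item: 0.0 for item in items}
    for winner, loser in judgements:
        p_win = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
        grad[winner] += 1.0 - p_win
        grad[loser] -= 1.0 - p_win
    for item in items:
        theta[item] += 0.01 * grad[item]

# Centre the scale: Bradley-Terry parameters are only identified up to a
# constant shift, so anchor the mean at zero before reporting.
mean = sum(theta.values()) / len(theta)
scale = {item: value - mean for item, value in theta.items()}

for item, score in sorted(scale.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:+.2f}")
```

The point of the ‘ruler’ is that each item ends up with a position on a common scale rather than just a rank order, which is what allows repeated, on-demand judging sessions to be compared over time.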

Finally, Fairness in Focus: The AI Validation Layer Proof of Concept Powered by RM Compare introduces a practical response — an AI Validation Layer designed to monitor, test, and certify AI decisions before they reach real-world learners. This concept creates a transparent and auditable bridge between AI performance and human trust.

Taken together, these blogs outline both the challenges and opportunities in building a world where AI supports — and is held to — the same standards of fairness and quality that underpin human assessment.

At RM Compare, we believe that trust in AI starts not with replacing human judgement, but with validating it. This series is an invitation to explore what that future could look like.

Read the full White Paper: Beyond Human Moderation: The Case for Automated AI Validation in Educational Assessment
