Blog Series Introduction: Can We Trust AI to Understand Value and Quality?
 
- Introduction: Can We Trust AI to Understand Value and Quality?
- Blog 1: Why the Discrepancy Between Human and AI Assessment Matters—and Must Be Addressed
- Blog 2: Variation in LLM Perception on Value and Quality
- Blog 3: Who is Assessing the AI that is Assessing Students?
- Blog 4: Building Trust: From “Ranks to Rulers” to On-Demand Marking
- Blog 5: Fairness in Focus: The AI Validation Layer Proof of Concept Powered by RM Compare
- Blog 6: RM Compare as the Gold Standard Validation Layer: The Research Behind Trust in AI Marking
Artificial intelligence is becoming deeply embedded in how we teach, learn, and assess. But as large language models (LLMs) and automated marking tools step into spaces once reserved for humans, a fundamental question emerges: can AI truly understand what we mean by quality and value?
This question sits at the heart of a new six-part RM Compare blog series exploring what happens when AI joins the judging panel — and what safeguards we need to build trust in its decisions. Across the series, we move from exploring the limits of current models to proposing new frameworks for transparent, validated AI assessment.
First, we look at Why the Discrepancy Between Human and AI Assessment Matters—and Must Be Addressed. As AI-powered tools increasingly participate in educational assessment, a critical challenge comes into sharp focus: AI judges very differently from humans. This difference is more than technical; left unaddressed, it threatens the fairness, trust, and validity of assessment systems.
The conversation then moves to Variation in LLM Perception on Value and Quality, where a short study compares AI- and human-created items to test whether LLMs can grasp the same sense of value as human judges. The results reveal variation not only between human and AI perceptions, but also between different LLMs, highlighting how assessing subjective quality remains a uniquely human challenge.
In Who is Assessing the AI that is Assessing Students?, the focus shifts to accountability. If AI systems are contributing to marks and grades, who is making sure those judgements are fair, accurate, and representative? The piece considers the growing importance of “assessing the assessor” in education.
Next, Building Trust: From ‘Ranks to Rulers’ to On-Demand Marking shows how trust can be rebuilt through transparency and reliability. It explores the evolution of RM Compare technology and how on-demand, repeatable judgement can lay the foundation for trusted AI decision-making.
Fairness in Focus: The AI Validation Layer Proof of Concept Powered by RM Compare introduces a practical response — an AI Validation Layer designed to monitor, test, and certify AI decisions before they reach real-world learners. This concept creates a transparent and auditable bridge between AI performance and human trust.
Finally, RM Compare as the Gold Standard Validation Layer: The Research Behind Trust in AI Marking surveys the recent sector research and regulatory evidence. In 2025, the educational assessment sector experienced a step change in the evidence base supporting Comparative Judgement (CJ) as a validation layer, bolstering the RM Compare approach described throughout this series. Major independent studies, regulatory pilots, and industry-led deployments have converged on the effectiveness, reliability, and transparency that CJ-powered systems provide for AI calibration and human moderation alike.
Taken together, these blogs outline both the challenges and opportunities in building a world where AI supports — and is held to — the same standards of fairness and quality that underpin human assessment.
At RM Compare, we believe that trust in AI starts not with replacing human judgement, but with validating it. This series is an invitation to explore what that future could look like.