Why Evaluative Judgement is essential in the age of AI: Looking forward with RM Compare

As artificial intelligence (AI) and large language models (LLMs) revolutionise learning and work, evaluative judgement, the capability to decide what “good” really means, has never been more important. RM Compare is at the heart of this movement, guiding educators, students, assessors, and designers to thrive in an AI-rich future.
Grasping Evaluative Judgement
Evaluative judgement empowers learners and professionals to distinguish between superficial quality and genuine excellence. In the age of generative AI, this capacity is essential—not just for academic assessment, but for responsible participation in society and the workplace. As Professor Phill Dawson observes, “knowing if AI is giving you garbage requires an understanding of what quality is. This ‘evaluative judgement’ is perhaps the most crucial assessment capability in an AI world”.
Learning by Evaluating: Building Real Expertise
Research led by Scott Bartholomew, Prof Phill Dawson, and others has shown that “learning by evaluating”—where learners repeatedly compare and justify the quality of work—fosters deep reflection, sharper analytical skills, and a habit of justifying decisions with evidence. Through judicious use of comparative judgement platforms like RM Compare, learners move beyond checklists to develop a nuanced, contextual understanding of quality—preparing them for real-world decision-making.
You can learn more about the work of Dr Scott Bartholomew in this case study and in the video below.
Stakeholder Impact: Why It Matters
For Educators
Educators’ role is elevated, not diminished, by AI. Their guidance steers learners in constructing professional knowledge, recognising authentic quality, and resisting uncritical adoption of machine outputs.
For Students
Evaluative judgement empowers students as well-rounded citizens and future professionals, capable of making, defending, and taking responsibility for quality decisions in a world flooded with rapidly produced, AI-generated work. “AI has widened the gap between our capability to produce work, and our capability to evaluate the quality of that work… but knowing if it is good enough… requires expertise” (Xie, Zhang and Wilson, 2025).
For Assessors and Examiners
Expert human judgement remains key to fair, transparent assessment. As sophisticated as AI becomes, human arbiters ensure that “good” is contextual, authentic, and defensible—not just statistically likely or algorithmically inferred.
For Assessment Designers
Designers must embed opportunities for evaluative judgement across curricula—not just as discrete steps, but throughout entire assessment journeys. Tasks must foster authentic, iterative reflection and comparison, with opportunities for peer and self-critique.
Responsibility in an AI World
With LLMs generating content at scale, responsibility for the outputs lies firmly with their human users. Legal and ethical standards are evolving, but the central principle is clear: “It will not be a defence to argue ‘the machine told me to do it’ when things go wrong; human judgement will become paramount”. Those engaging with AI are expected to:
- Scrutinise, validate, and edit machine-generated content before submission.
- Detect biases, errors, and issues of integrity in both process and product.
- Evidence and take ownership of their inputs and decisions.
As Margaret Bearman, Phill Dawson, and colleagues state: “If evaluative judgement is to remain a uniquely human capability... the development of evaluative judgement is to remain a responsibility held by humans—both educators and learners”.
RM Compare: Your Partner for the Future
RM Compare empowers every user to hone the habits of thoughtful judgement, explicit justification, and shared understanding. Every comparison and reflection equips participants to take responsibility, make credible decisions, and thrive as discerning contributors in a world where AI is a powerful—yet not infallible—partner.