Beyond Formula: The New Frontier for Assessment in an AI World

This blog follows up on our earlier reflections on curriculum change and assessment backwash, bringing new insights and urgency to the conversation in the era of AI.
The Washback Risk: What Happens When AI Rewards Formulaic Thinking?
As AI-powered marking becomes more common in education, we must confront a critical risk: the “washback” effect, where the method of assessment shapes how and what students learn. Automated essay scoring powered by AI often rewards essays that adhere closely to expected templates—clear structure, formulaic vocabulary, and standard organization—regardless of whether content is genuinely insightful or original.
This risk isn’t hypothetical. Recent research confirms that AI systems, while remarkably efficient at processing large volumes of essays, fall short when it comes to appreciating creativity, novel reasoning, or unconventional approaches. They’re trained to spot patterns from vast troves of typical responses. As a result, they reward conformity—and can penalise precisely the kind of original thought our future economy and society need most.
What the Research Reveals
A wave of recent studies underscores the limitations of AI assessment:
- Surface-Level Marking: AI grading consistently rewards structure, grammar, and conformity over depth, originality, or creative argumentation. Essays that mimic high-scoring responses, even when the content is shallow, are often rated more highly than risk-taking, thoughtful work.
- Penalising Originality: Empirical studies show that AI tends to penalise or undervalue truly original or creative responses, especially when those responses deviate from "normed" patterns found in training data.
- Human vs AI Judgement: Direct comparisons in the last two years demonstrate that human markers recognise and reward nuance, complexity, and originality to a much greater degree. Newer "reasoning" AI models narrow the gap slightly, but the risk remains pronounced—particularly for tasks requiring genuine reasoning and analysis.
Lessons from Apple's Landmark Study
This challenge was powerfully highlighted by Apple’s recent study, The Illusion of Thinking. The research showed that even the most advanced AI models can easily be misled by complexity or unfamiliarity. While standard language models excelled at simple, template-based tasks, they broke down entirely on problems requiring deeper reasoning or adaptation. Even explicit instructions and examples failed to elevate their performance on genuinely complex challenges. In short, AI’s apparent “reasoning” often proved to be little more than pattern-matching, not true understanding.
This raises a vital question for educators: If AI can’t recognise or reward real thinking, what happens to student motivation, curriculum, and pedagogy?
Washback: Implications for Curriculum and Pedagogy
When assessment tools reward what is easiest for an algorithm to detect—rather than what matters most for learning or life—the effects ripple outward:
- Narrowed Curriculum: Teachers and learners, knowingly or not, adapt to the test. Lessons pivot towards producing what the assessment will reward: formulaic writing and safe, conventional answers.
- Stifled Creativity: The message to students becomes clear: “Play the game, don’t take risks.” Over time, this erodes both engagement and the broader competencies—creativity, critical thinking, problem-solving—needed in an AI-driven world.
- Missed Skills for an AI World: Paradoxically, by leaning into AI’s current strengths, we neglect developing the very skills that make us resilient and relevant. To flourish amid automation and rapid change, learners must cultivate the abilities that AI cannot (yet) replicate: genuine reasoning, creativity, ethical judgement, and adaptability.
If these washback effects are allowed to persist unchallenged, we risk preparing young people for the needs of 2010, not 2035.

The RM Compare Solution: Assessment for Complexity and Creativity
So how can we safeguard against the homogenising effect of AI assessment, while still harnessing its efficiency? The answer lies in approaches that blend human insight and technology—like RM Compare:
- Adaptive Comparative Judgement: Rather than marking against a rigid rubric, RM Compare enables judges (human or hybrid) to compare pairs of student work and decide which best meets the intended educational outcomes. This comparative process values the whole performance, capturing qualities like creativity, originality, and nuance that algorithms alone often miss (a simple illustration of how pairwise judgements become a rank order follows this list).
- Authentic Assessment: By reflecting real comparative judgement, the platform helps ensure that what is assessed is what truly matters—supporting positive washback on curriculum and teaching.
- Fairness and Consistency: RM Compare leverages technology to reduce traditional marking biases, while still preserving the human capacity to recognise and reward the exceptional, the original, and the insightful.
- Building Skills for the AI Age: Crucially, RM Compare supports the cultivation—and recognition—of the complex skills young people need to thrive, not just survive, in a world reshaped by artificial intelligence.
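To make the mechanics concrete, here is a minimal, purely illustrative sketch of how a batch of pairwise “which of these two is better?” decisions can be turned into a rank order, using a simple Bradley-Terry style estimate. This is not RM Compare’s implementation or API; the function name, sample data, and iteration count below are assumptions chosen for clarity.

```python
from collections import defaultdict

def bradley_terry(items, judgements, iterations=100):
    """Estimate a relative quality score for each item from pairwise judgements.

    judgements: list of (winner, loser) tuples, one per judging decision.
    Returns a dict mapping item -> strength (higher means more often preferred).
    """
    wins = defaultdict(int)          # comparisons won by each item
    pair_counts = defaultdict(int)   # how many times each pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}  # start everyone equal

    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Standard iterative update; items with no comparisons keep their score.
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so the average strength stays at 1.0 between iterations.
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}

    return strength

# Five judgements over three pieces of work: A is consistently preferred.
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
ranking = bradley_terry(["A", "B", "C"], judgements)
print(sorted(ranking.items(), key=lambda kv: kv[1], reverse=True))
```

In adaptive systems the next pair shown to a judge is also chosen to add as much information as possible, but the core idea is the same: many small comparative decisions are aggregated into a reliable scale, without ever asking a judge to apply a rubric.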
RM Assessment
RM Assessment, as the parent organisation, is actively embracing AI to enhance assessment by delivering reliable, consistent, and timely marking for large-scale qualifications. At the same time, RM Assessment recognises the vital importance of human judgement and the need to nurture skills beyond what AI alone can identify. That’s why they are investing in a broader ecosystem of tools, such as RM Compare, which harnesses adaptive comparative judgement to value the depth, originality, and creativity in learner responses. This balanced approach ensures that while AI supports efficiency and fairness at scale, other solutions are being developed to preserve and amplify the rich, nuanced qualities that are essential to both learning and teaching in the AI era.
A Call to Action
As educators, school leaders, and policymakers, we face a pivotal choice: allow our assessment systems to narrow the horizon of what’s possible, or reimagine them to unlock the full breadth of human potential—in partnership with, but not dictated by, AI.
The future demands more than formula. Let’s ensure our learners are prepared.