Restoring Trust in Meritocracy: How Organisations Are Fighting Back Against the Assessment Crisis (Blog 3 / 3)

In our first two blog posts, we documented the crisis: we examined groundbreaking research from Princeton and Dartmouth showing how generative AI has destroyed the signalling value of written applications, and explored why this threatens the very essence of meritocratic systems and demands new assessment approaches. The evidence is overwhelming. The consequences are severe. Traditional written signals no longer work.

But evidence of crisis means little without evidence of solutions. This final post in our series answers the critical question: what does it actually look like when institutions successfully fight back to restore fair, merit-based selection in the age of AI?

The answer comes, at least in part, from real institutions implementing Adaptive Comparative Judgement through RM Compare and achieving measurable improvements in fairness, reliability, stakeholder trust and, most importantly, their ability to identify genuine talent rather than AI-assisted performance. These aren't theoretical possibilities. They're documented outcomes from schools, universities, and organisations that refused to accept the erosion of meritocracy.

The Stakes: Why Getting This Right Matters Now

Before examining solutions, we must understand what's at stake when assessment fails to identify genuine merit. The research is unequivocal about the consequences:

  • Markets become systematically less meritocratic. The Galdin-Silbert study found that when written signals break down, high-ability workers are hired 19% less often while low-ability workers are hired 14% more often - a fundamental reversal of merit-based selection. This isn't a marginal effect. It's a complete inversion of what fair systems should achieve.
  • Trust in credentials evaporates. When students, parents, teachers, and employers perceive that grades and qualifications no longer reflect genuine learning or capability, the entire legitimacy of educational sorting crumbles. Research confirms that belief in school meritocracy serves a system-justifying function. People accept educational hierarchies as fair only when they believe success is earned. Destroy that belief, and you damage not just education but social cohesion itself.
  • Social mobility suffers. Educational meritocracy is the primary mechanism through which talented individuals from disadvantaged backgrounds can advance. When assessment can no longer reliably identify genuine ability - when AI access matters more than actual learning - opportunities increasingly flow to those with privilege rather than potential. The ladder of opportunity doesn't just get harder to climb; it disappears entirely for those who need it most.
  • The democratic bargain breaks. Citizens accept differential outcomes in education, employment, and life chances because they believe those differences reflect genuine merit, and that success is earned through effort and ability. When that perception fails - when people believe the system rewards gaming and privilege over authentic achievement - the foundational legitimacy of democratic societies weakens.

These aren't distant theoretical concerns. They're happening now, in classrooms and hiring decisions worldwide. Every day that institutions continue using compromised assessment methods is another day of unfair outcomes, eroded trust, and damaged opportunity.

The question isn't whether to act. It's whether we'll learn from institutions that already have.

Evidence from the Front Lines: Institutions Restoring Meritocracy

Case Study 1: Purdue University—Turning Assessment Into Learning

  • The Challenge: Purdue's Design Thinking in Technology course serves 550 first-year students annually. Traditional assessment created a disconnect. Students received grades but didn't truly understand what distinguished excellent work from mediocre work. This gap undermined learning and made assessment feel arbitrary rather than educational.
  • The Solution: Purdue implemented the largest-ever study of Adaptive Comparative Judgement, using RM Compare in a randomised controlled trial. Half of the students (the treatment group) used RM Compare to evaluate work samples from previous cohorts before completing their own assignments. They compared pairs of anonymised work, making judgments about which better met criteria. Their own completed work was then assessed by faculty using RM Compare's comparative judgement approach.
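How does a set of pairwise "which is better?" decisions become a rank order? RM Compare does not publish its internal algorithm, but comparative judgement tools typically fit a Bradley-Terry (or Thurstone) style model to the judgment data. The sketch below is a minimal, illustrative Bradley-Terry fit over hypothetical judgments - an assumption-laden toy, not RM Compare's implementation.

```python
from collections import defaultdict

def bradley_terry_rank(judgments, iterations=100):
    """Rank items from pairwise 'which is better?' decisions by fitting a
    simple Bradley-Terry model with the standard MM update.
    judgments: list of (winner, loser) tuples."""
    items = {i for pair in judgments for i in pair}
    strength = {i: 1.0 for i in items}
    # Half a pseudo-win against a virtual average opponent keeps estimates
    # finite for items that win (or lose) every comparison they appear in.
    wins = defaultdict(lambda: 0.5)
    for winner, _ in judgments:
        wins[winner] += 1.0
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 1.0 / (strength[i] + 1.0)  # the virtual comparison
            for winner, loser in judgments:
                if i == winner:
                    denom += 1.0 / (strength[i] + strength[loser])
                elif i == loser:
                    denom += 1.0 / (strength[i] + strength[winner])
            updated[i] = wins[i] / denom
        mean = sum(updated.values()) / len(updated)
        strength = {i: v / mean for i, v in updated.items()}  # keep scale stable
    return sorted(items, key=lambda i: strength[i], reverse=True)

# Hypothetical judgment data: the first item in each pair was judged better.
example = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "C"), ("B", "C")]
print(bradley_terry_rank(example))  # -> ['A', 'B', 'C']
```

In adaptive comparative judgement, the evolving estimates also inform which pairs judges see next, which is part of why sessions can reach a stable rank order in relatively few rounds.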

Read the full case study here.

The Results Were Striking:

  • Seven of the top ten performers came from the RM Compare group, despite random assignment ensuring equal baseline ability across groups.
  • Performance improvements appeared across all ability levels, not just high or low achievers, a rare equity of impact in educational interventions.
  • Students developed tacit knowledge of what quality looks like through the act of comparing and evaluating peer work, internalising standards in ways that rubrics alone could never achieve.
  • Teachers freed from routine marking could focus on facilitation, coaching, and supporting struggling students—using their pedagogical expertise more strategically.
  • Assessment time was cut in half compared to traditional marking while achieving higher reliability (regularly above 0.90).

Professor Scott Bartholomew, the study's lead researcher, explained: "We found that students who had used RM Compare from the outset performed significantly better than their peers, despite it being a small and easy to implement intervention. We see it as a great opportunity to help our students learn through evaluation - it turns the assessment process into a learning experience".

The Meritocracy Impact: By exposing students to diverse exemplars and helping them internalise quality standards, RM Compare restored the connection between genuine understanding and performance outcomes. Students couldn't game the system through AI-generated submissions because they were being assessed on their ability to recognise and produce authentic quality - a fundamentally different and more robust signal of capability.

Case Study 2: Caine College of the Arts - Fair Creative Admissions

The Challenge: Utah State University's Department of Art and Design runs a highly competitive, nationally accredited Interior Architecture and Design program. Traditional portfolio assessment faced multiple problems: individual marker bias, difficulty maintaining consistent standards, vulnerability to formulaic submissions optimised for rubric criteria, and the challenge of fairly evaluating unconventional or innovative approaches that didn't fit predetermined boxes.

In the AI era, these problems intensified. Applicants could use AI tools to enhance portfolio presentation, generate design concepts, or craft compelling narratives. The program needed an admissions process that could identify genuine creative potential rather than rewarding those best at AI-assisted portfolio optimisation.

The Solution: Caine College implemented Adaptive Comparative Judgement using RM Compare, enlisting 18 expert judges (faculty and distinguished alumni) from across the nation. Instead of scoring portfolios against rubrics, judges made holistic side-by-side comparisons: "Which portfolio shows the best overall qualifications for the program?"

The Results Were Transformative:

  • A reliability of 0.87 was achieved - substantially higher than traditional portfolio marking and remarkable given the inherent subjectivity of creative work.
  • 67 portfolios received 646 total judgments (each portfolio reviewed roughly 19 times) in just 18 hours of collective work, averaging around 16 minutes per portfolio and far exceeding traditional review efficiency; a quick arithmetic check of these figures follows below.
  • The final reliability level was reached by round 12 (approximately 12 comparisons per portfolio), suggesting even greater efficiency is possible in future iterations.
  • Anonymised assessment removed demographic cues from the process, allowing work quality alone to drive decisions.
  • Greater diversity in the admitted cohort emerged naturally when unconventional approaches weren't penalised by rigid criteria.
  • Judges reported deeper professional insight from seeing the full range of submissions and repeatedly making comparative decisions - tacit knowledge made explicit and shared.
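Because each judgment puts two portfolios side by side, the headline numbers above are easy to sanity-check. The snippet below is our own back-of-envelope arithmetic using the figures reported in the case study, not an RM Compare calculation:

```python
judgments = 646    # total pairwise judgments made across the session
portfolios = 67    # portfolios in the admissions pool
hours = 18         # total collective judging time

# Each judgment involves two portfolios, so:
appearances_per_portfolio = judgments * 2 / portfolios   # ~19.3 comparisons each
minutes_per_portfolio = hours * 60 / portfolios          # ~16.1 minutes each

print(round(appearances_per_portfolio, 1), round(minutes_per_portfolio, 1))
```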

The Meritocracy Impact: By shifting from rubric-based scoring to holistic comparative judgement of actual work, Caine College created an admissions process that AI cannot game. You cannot fake a portfolio's demonstration of creative thinking, technical skill, and artistic vision when expert judges are comparing your work directly against peers'. The process restored confidence that admitted students possess genuine creative merit rather than AI-enhanced presentation skills.

Read the full case study here.

Case Study 3: Amplify Trading - Merit-Based Hiring

The Challenge: Amplify Trading, a financial services firm, needed to identify candidates with genuine ability to analyse complex information and present actionable conclusions, skills that traditional CV and interview processes struggle to assess reliably. In the AI era, written applications and prepared interview responses are increasingly unreliable signals of actual workplace capability.

The Solution: Amplify partnered with RM Compare to use Adaptive Comparative Judgement in a peer assessment format. Candidates completed authentic work tasks demonstrating their analytical and communication capabilities. These work samples were then assessed comparatively by expert judges, creating a reliable rank order based on demonstrated ability rather than self-reported credentials.

The Results:

  • Authentic, innovative talent identification that surfaced genuine capability rather than rewarding those best at crafting AI-enhanced applications.
  • Reduced bias in hiring decisions through multi-assessor comparative evaluation.
  • Greater confidence in selection decisions based on actual demonstrated performance.
  • Improved candidate experience as applicants could showcase real capability rather than trying to optimise applications for algorithmic screening.

The Meritocracy Impact: By shifting from AI-vulnerable written applications to assessment of authentic work samples, Amplify Trading restored the connection between genuine capability and hiring outcomes, ensuring that talented candidates are identified based on what they can actually do.

Read the full case study here.

Common Themes: What Works in Practice

Across these diverse contexts, certain principles consistently emerge:

1. Authentic Performance Matters More Than Self-Reported Narratives

Every successful implementation shifted focus from what candidates say about themselves (easily AI-enhanced) to what they can actually demonstrate through authentic work, evaluation of peer work, or practical tasks. This shift is fundamental. You cannot fake the ability to recognise quality in peer work. You cannot AI-generate genuine creative insight that holds up under expert comparative evaluation. You cannot shortcut the development of tacit knowledge that emerges from evaluative practice.

2. Collective Expert Judgment Beats Individual Marking

Multi-assessor approaches consistently achieved higher reliability, reduced bias, and greater stakeholder confidence than single-marker methods. The collective wisdom of diverse expert judges - their individual biases averaged out, their combined perspective more robust than any one person's view - creates defensible consensus that stakeholders trust.

3. Efficiency and Quality Aren't Opposing Goals

Remarkably, every case study reported time savings alongside quality improvements. Comparative judgement is faster and easier than rubric application while achieving higher reliability. This isn't a trade-off between efficiency and fairness - it's a genuine improvement on both dimensions.

4. Transparency Builds Trust

When students, candidates, teachers, and stakeholders can see how consensus emerges from multiple diverse judgments - when the process is auditable and explainable - trust in outcomes increases. Comparative judgement isn't a black box. It's a transparent aggregation of professional expertise that stakeholders can examine, understand, and therefore trust.

5. The Process Itself Has Educational Value

Perhaps most importantly, the act of comparing and evaluating work builds professional capability, deepens understanding of quality, and develops the tacit knowledge essential for genuine learning. Assessment becomes learning, not just measurement.

Addressing Common Concerns: What About...?

"What About AI-Generated Work Samples?"

This is precisely why comparative judgement is resilient. While AI can generate plausible individual pieces of work, expert judges comparing submissions can identify patterns that reveal AI generation - generic language, lack of personal voice, superficial treatment of complex ideas, disconnection from taught content. Moreover, well-designed authentic tasks requiring integration of classroom experiences, application to specific contexts, or demonstration of process are fundamentally difficult for AI to complete convincingly.

"Is This Just Subjective Opinion?"

No. Comparative judgement achieves reliability coefficients regularly exceeding 0.90, higher than many "objective" marking schemes. The key is that while individual comparisons are judgments, the aggregation of many diverse judgments creates robust consensus that is statistically validated and demonstrably reliable.
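For readers who want the statistic behind that claim: the figure usually reported for comparative judgement sessions is a Scale Separation Reliability (SSR). A common formulation in the comparative judgement research literature - our summary, not a specification of how RM Compare computes its figure - is:

```latex
% Scale Separation Reliability: the share of the observed spread in the
% estimated quality parameters \hat{\theta}_i that is not measurement noise.
% SE_i is the standard error of each item's estimate.
\[
  \mathrm{SSR}
    = \frac{\operatorname{Var}(\hat{\theta}) - \overline{SE^{2}}}
           {\operatorname{Var}(\hat{\theta})}
\]
% Values above 0.9 indicate that repeating the judging with a different set
% of judges would be expected to reproduce essentially the same rank order.
```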

"What About Workload?"

Every case study reported workload reduction, not increase. The comparative process is faster and less cognitively demanding than rubric application. Judges report finishing work on time, reducing take-home marking, and redirecting saved time to strategic activities rather than mechanical scoring.

"Can This Scale?"

Yes! RM Compare operates a tenancy model, bringing together Licence Centres and Connectors to facilitate large-scale sessions and deployments.

The Broader Picture: Sector-Wide Momentum

These case studies aren't isolated examples. They represent growing momentum across education and employment sectors toward more robust, fair, and AI-resistant assessment:

  • Universities from Purdue to Utah State are adopting comparative judgement for admissions, program assessment, and learning interventions.
  • School partnerships and trusts across the UK and the world are implementing comparative judgement for moderation and collaborative assessment.
  • Professional bodies and qualifications are shifting toward comparative judgement for complex, open-ended assessment.
  • Employers in competitive sectors are using work-sample assessment and comparative evaluation to identify genuine talent.
  • Policy makers and regulators are validating comparative judgement approaches for high-stakes use.

This isn't a fringe movement. It's mainstream adaptation to a changed assessment landscape.

What This Means for Your Institution

If you're a school leader, university administrator, or HR professional reading this, the message is clear: you have proven tools available right now to restore fair, merit-based assessment in your context.

You don't need to wait for perfect solutions or sector-wide mandates. You can start with a pilot - a single subject, a specific qualification, or one hiring process - and gather your own evidence of improved outcomes.

The institutions profiled here weren't special. They simply recognised the assessment crisis earlier and acted decisively to address it. They're now reaping benefits: better student outcomes, fairer selection, improved stakeholder trust, reduced workload, and restored confidence that their assessment identifies genuine merit.

Practical Next Steps

For Schools:

  1. Start with a single subject or year group using comparative judgement for moderation or formative assessment.
  2. Engage teachers in comparing student work collaboratively to build shared standards.
  3. Measure reliability, gather teacher feedback, and document time savings.
  4. Scale successful implementations across subjects and year groups.

For Universities:

  1. Pilot comparative judgement for competitive program admissions or complex portfolio assessment
  2. Implement "learning by evaluating" approaches where students develop quality standards through peer comparison
  3. Use comparative judgement for interdisciplinary work or capstone projects that resist conventional rubrics
  4. Build evidence of improved fairness, reliability, and student outcomes

For Employers:

  1. Shift from CV and cover letter screening to work sample assessment for key hiring decisions
  2. Use comparative judgement to evaluate practical tasks, case studies, or portfolio submissions
  3. Reduce reliance on AI-vulnerable written applications and prepared interview responses
  4. Demonstrate commitment to merit-based hiring through transparent, evidence-based selection

The Choice Ahead

The research is definitive: traditional written assessment signals have lost their reliability in the AI era. The consequences of inaction are severe: eroded trust, systemic unfairness, diminished social mobility, and damaged meritocracy.

But this series has shown that the crisis is solvable. Institutions worldwide are successfully implementing Adaptive Comparative Judgement through RM Compare and achieving measurable improvements across all the dimensions that matter:

  • Fairness: Reduced bias, greater equity, transparent consensus
  • Reliability: Coefficients regularly exceeding 0.90
  • Efficiency: Time savings while improving quality
  • Validity: Assessment of authentic capability rather than AI-enhanced presentation
  • Trust: Stakeholder confidence in fair, merit-based outcomes
  • Educational value: Assessment becomes learning, not just measurement

The institutions profiled here aren't waiting for the perfect moment or the complete solution. They're acting now, building evidence, and reaping benefits. They're defending meritocracy when it matters most, when the easy path would be to accept compromised assessment and hope for the best.

Conclusion: From Crisis to Opportunity

This three-part series began with sobering evidence: the collapse of written signalling, the reversal of meritocracy, the urgent threat to fair assessment. We explored why these changes strike at the foundations of educational legitimacy and social mobility. And we've concluded with compelling evidence that institutions can successfully fight back - that proven tools exist to restore fair, reliable, merit-based assessment even in an AI-saturated world.

The assessment crisis is real. But it's also solvable. The question isn't whether we can preserve meritocracy in the age of AI. It's whether we'll choose to do so.

The institutions profiled here have made that choice. They're not waiting for others to act. They're building the future of fair assessment right now, one implementation at a time, accumulating evidence that genuine merit can still be identified, rewarded, and celebrated.

What will your institution choose?

The tools are available. The evidence is documented. The pathway is clear. All that remains is the decision to act: to refuse to accept compromised assessment, to demand better for your students or candidates, and to insist that genuine ability still matters more than AI-assisted presentation.

In the end, the fight for meritocracy isn't won by perfect solutions or complete transformations. It's won by practical decisions, incremental improvements, and the accumulation of evidence that fairer approaches work better than compromised alternatives.

The institutions profiled here didn't wait for permission. They saw the crisis, they adopted proven tools, and they generated evidence of success. Now they're sharing that evidence so others can follow.

The pathway to restoring trust in meritocracy runs through your next assessment decision. Will it be business as usual, accepting diminished reliability and fairness? Or will it be the first step toward something better, toward assessment that genuinely identifies and rewards merit, toward systems that stakeholders can trust, toward educational practices that fulfil rather than betray the promise of fair opportunity?

The choice, and the opportunity, are yours.

About This Series

This is the third and final post in our series examining how generative AI has disrupted traditional assessment signals and what institutions can do to restore fair, merit-based evaluation.

  1. When Written Applications No Longer Signal Ability: The research evidence showing how AI has destroyed the informational value of written work
  2. The AI Shift and the Future of Fair Assessment: Why traditional assessment methods are failing and why comparative judgement offers a robust alternative
  3. Restoring Trust in Meritocracy (this post): Real-world evidence from institutions successfully implementing fair assessment in the AI era