- Opinion
Meritocracy in selection - the efficiency paradox of selecting and hiring in the age of AI (1/3)
Recent analysis from the Financial Times, in their article "The AI Shift: Is hiring becoming less meritocratic?", raises a critical question that resonates across education, corporate recruitment, and competitive admissions: as generative AI becomes ubiquitous in application preparation, how can organisations maintain genuine meritocracy in selection? The answer lies not in fighting AI, but in fundamentally rethinking how we assess talent.
This is the first post in our series examining how generative AI has disrupted traditional assessment signals and what institutions can do to restore fair, merit-based evaluation.
The Problem: Traditional Assessment in an AI-Saturated World
For decades, assessment has relied on a set of trusted signals: CVs that reflect genuine experience, cover letters that showcase authentic voice and motivation, portfolios that demonstrate real creative work, and interviews that reveal true capability and fit. These signals worked because they were difficult to fake at scale and required genuine investment from applicants.
Today, that assumption has collapsed. Generative AI tools can produce convincing CVs tailored to specific job descriptions, craft compelling cover letters in minutes, generate portfolio concepts, and even help candidates prepare scripted answers to predictable interview questions. The democratisation of AI writing and design tools means that applicants with access to the right tools - not necessarily the most talented or qualified - can present themselves as strong candidates, regardless of their actual abilities.
This creates a cascade of problems:
The Erosion of Merit Signals: When AI can generate polished application materials that are indistinguishable from human work, traditional markers of quality become unreliable. Hiring managers and admissions officers can no longer confidently distinguish genuine talent from sophisticated AI mimicry. The signal-to-noise ratio in applications has degraded dramatically.
Increased Bias and Inequality: Paradoxically, while AI promised to democratise opportunity, it has instead created new forms of inequality. Those with access to premium AI tools, digital literacy, or knowledge of how to "prompt engineer" their applications gain an unfair advantage. Conversely, highly talented individuals from non-technical backgrounds or regions with limited AI access may be disadvantaged - not through any lack of ability, but because they cannot play the new game.
Algorithmic Gaming and Manipulation: Beyond AI-written content, applicants are increasingly using AI to optimise their submissions for algorithmic screening. Keyword stuffing, strategic formatting, and AI-generated "SEO-friendly" content are designed to pass automated filters. This means the first layer of screening - often AI-driven itself - rewards gaming over genuine quality, filtering out unconventional talent that doesn't fit algorithmic expectations.
Loss of Organisational Trust: Decision-makers report growing uncertainty about the authenticity of applications and the reliability of their own hiring decisions. This erodes confidence in the assessment process itself and raises questions about whether the candidates being selected truly represent the organisation's quality standards or are simply those best at AI manipulation.
The Specific Challenge for Creative and Complex Roles
These challenges are particularly acute in creative fields, competitive academic programs, and roles requiring nuanced judgement. Design, architecture, fine arts, and innovation-focused positions fundamentally cannot be assessed through AI-written text alone - they require evaluation of authentic work, creative thinking, and originality. Yet even portfolios, traditionally the stronghold of creative assessment, are now at risk as AI can generate plausible design concepts, assist in portfolio curation, and help present work in strategically compelling ways.
Traditional rubric-based assessment offers no defence here. Rigid criteria like "creativity (0-10 points)" or "technical skill (0-5 points)" cannot capture the nuanced, holistic qualities of exceptional work. Moreover, individual marker judgement - even by experts - is vulnerable to fatigue, bias, personal preference, and the difficulty of maintaining consistent standards across large cohorts of diverse submissions.
A Case Study in Meritocratic Assessment: Caine College
The Interior Architecture and Design program at Utah State University's Caine College faced exactly these challenges. A highly competitive, nationally accredited program receiving dozens of high-quality applications each year, it needed an admissions process that could:
- Fairly assess creative portfolios without bias or subjectivity
- Identify genuine creative potential, not just well-executed conventional work
- Scale efficiently without sacrificing quality
- Maintain transparency and defensibility in selection decisions
The program's solution: Adaptive Comparative Judgement using RM Compare.
Instead of applying a standardised rubric to each portfolio, the program enlisted a diverse panel of 18 expert judges, including faculty and alumni from across the nation, to make direct, side-by-side comparisons of portfolios. Each judge was asked a simple, holistic question: "Which portfolio shows the best overall qualifications for the USU Interior Architecture & Design program?" Rather than scoring each work independently, judges simply indicated which of two portfolios they believed was stronger.
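For readers curious about the mechanics, the statistics behind a session like this can be illustrated with a Bradley-Terry-style model: each pairwise decision is treated as a "win" for one portfolio, and an iterative procedure estimates a quality parameter for every portfolio that best accounts for all the wins and losses; ranking the parameters gives the consensus order. The sketch below is a minimal illustration of that idea, not RM Compare's actual implementation, and the portfolio identifiers and judgements in it are hypothetical.

```python
from collections import defaultdict

def bradley_terry(judgements, n_iter=500):
    """Estimate a relative quality score for each portfolio from pairwise decisions.

    judgements: iterable of (winner, loser) pairs, one per comparison.
    Returns a dict mapping portfolio id -> estimated strength (normalised to sum to 1).
    """
    wins = defaultdict(int)       # comparisons won by each portfolio
    pairings = defaultdict(int)   # number of comparisons for each unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        pairings[frozenset((winner, loser))] += 1

    items = {p for pair in pairings for p in pair}
    strength = {i: 1.0 for i in items}

    # Classic minorisation-maximisation updates for the Bradley-Terry model.
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in pairings.items() if i in pair
                for j in pair if j != i
            )
            # Tiny pseudo-count keeps portfolios with no wins from collapsing to zero.
            updated[i] = (wins[i] + 1e-6) / denom
        total = sum(updated.values())
        strength = {i: s / total for i, s in updated.items()}
    return strength

# Hypothetical judgements: each tuple records which of two portfolios a judge preferred.
sample = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B"), ("A", "C")]
ranking = sorted(bradley_terry(sample).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)  # portfolios ordered from strongest to weakest consensus position
```

The adaptive part of Adaptive Comparative Judgement lies in how the next pair shown to a judge is chosen, typically pairing portfolios whose current estimates are close so that each decision is maximally informative.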
The results were transformative:
Dramatically Improved Reliability: The session achieved a reliability coefficient of 0.87 - substantially higher than could be expected from traditional marking (how such a coefficient is typically calculated is sketched in the note after these results). This high reliability emerged despite the inherent subjectivity of creative assessment, because comparative judgement harnesses the collective wisdom of multiple diverse judges, averaging out individual bias and creating a robust consensus ranking.
Authentic Creative Expression: Removing rigid rubrics freed candidates to express themselves authentically across diverse media, styles, and approaches. The process didn't penalise unconventional thinking; instead, it allowed genuine creative talent to emerge through expert comparison. Candidates with innovative, non-formulaic approaches were no longer disadvantaged by narrow criteria.
Eliminated Bias and Increased Diversity: Anonymised assessment meant judges could not be influenced by candidate names, backgrounds, or other identity markers. The process focused purely on work quality. As a result, the admitted cohort reflected greater diversity of backgrounds and artistic approaches than had been typical under previous methods.
Efficiency at Scale: The entire process - 67 portfolios, 646 total judgements, each portfolio reviewed 19 times - was completed in just 18 hours of collective work, averaging 16 minutes per portfolio (the arithmetic is spelled out in the note after these results). Notably, the session achieved full reliability by round 12 (approximately 12 comparisons per portfolio), suggesting even greater efficiency is possible in future iterations. This is far less time than traditional portfolio reviews and standardisation meetings typically require.
Enhanced Professional Judgement: Judges reported that the comparative process itself deepened their understanding of quality and standards. By seeing the full range of submissions and repeatedly making comparative decisions, assessors developed clearer, more nuanced insights into what constitutes excellence in their discipline - tacit knowledge that was made explicit and shared across the group.
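Two of the figures above are worth unpacking. The efficiency arithmetic is straightforward: 646 judgements each involve two portfolios, so portfolios were seen 646 × 2 / 67 ≈ 19.3 times on average, and 18 hours of judging spread over 67 portfolios works out at roughly 16 minutes each. The reliability coefficient is less obvious. The case study does not state exactly how RM Compare calculates it, but comparative judgement sessions commonly report a scale separation reliability (SSR): the proportion of the variance in the estimated portfolio parameters that is not attributable to measurement error. A minimal sketch, assuming you have parameter estimates and standard errors from a model fit such as the one above:

```python
import statistics

def scale_separation_reliability(estimates, standard_errors):
    """Scale separation reliability as commonly reported for comparative judgement:
    (observed variance of estimates - mean squared standard error) / observed variance."""
    observed_var = statistics.pvariance(estimates)
    error_var = statistics.fmean([se ** 2 for se in standard_errors])
    return (observed_var - error_var) / observed_var

# Hypothetical parameter estimates and standard errors for five portfolios.
params = [1.9, 0.8, 0.1, -0.7, -2.1]
ses = [0.45, 0.40, 0.38, 0.41, 0.47]
print(round(scale_separation_reliability(params, ses), 2))  # ~0.9 for these made-up numbers
```

A value of 0.87 therefore indicates that the great majority of the spread in the final ranking reflects real differences between portfolios rather than noise in the judging.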
How Comparative Judgement Resists AI Disruption
The Caine College case study reveals why adaptive comparative judgement is uniquely resilient in an AI-disrupted assessment landscape:
Focuses on Authentic Work: Comparative judgement assesses actual portfolios, designs, or creative outputs, not self-reported narratives that can be AI-enhanced. You cannot fake a design portfolio; the work either demonstrates skill and creativity or it doesn't. This shifts assessment from easily manipulable signals back to genuine evidence of capability.
Leverages Expert Judgement Over Algorithms: Rather than relying on fixed criteria or automated scoring, comparative judgement depends on the nuanced, contextual judgement of human experts. Experts can recognise originality, understand intent, and appreciate unconventional approaches in ways that algorithms cannot. This human expertise is far more difficult to game than algorithmic filters.
Builds Consensus Through Diversity: Because multiple diverse judges evaluate each submission, no single assessor's biases can dominate; one judge's idiosyncratic preferences are balanced by the perspectives of others. This collective approach is inherently more resistant to both conscious bias and algorithmic manipulation.
Remains Transparent and Defensible: The process is auditable and explainable. Decision-makers can see the pattern of judgements, understand the reliability achieved, and identify any outlier judges (a simple version of such an audit is sketched below). This transparency makes it difficult to manipulate and easy to defend, which is increasingly important in high-stakes selection contexts.
Adapts to Complexity: Comparative judgement doesn't require assessors to reduce complex work to numerical scores on predefined dimensions. It allows holistic evaluation that captures nuance, creativity, and potential in ways that rigid rubrics cannot. This is essential for assessing work in fields where innovation and non-conformity are valuable.
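On the auditability point above, one simple check is to ask how often each judge's individual decisions agree with the final consensus ranking and to review any judge whose agreement rate is unusually low. The sketch below illustrates the idea with a hypothetical `judge_agreement` helper; production systems typically rely on more formal misfit statistics, but the principle of auditing judges against the consensus is the same.

```python
def judge_agreement(judgements_by_judge, final_ranking):
    """For each judge, the share of their decisions that match the consensus ranking.

    judgements_by_judge: dict judge_id -> list of (winner, loser) pairs.
    final_ranking: list of portfolio ids, strongest first (e.g. from a model fit).
    Returns a dict judge_id -> agreement rate in [0, 1].
    """
    position = {portfolio: rank for rank, portfolio in enumerate(final_ranking)}
    rates = {}
    for judge, decisions in judgements_by_judge.items():
        agreed = sum(1 for winner, loser in decisions if position[winner] < position[loser])
        rates[judge] = agreed / len(decisions) if decisions else 0.0
    return rates

# Hypothetical data: three judges, consensus ranking A > B > C.
decisions = {
    "judge_1": [("A", "B"), ("B", "C"), ("A", "C")],
    "judge_2": [("A", "B"), ("C", "B")],              # one decision against the consensus
    "judge_3": [("B", "A"), ("C", "A"), ("C", "B")],  # consistently against the consensus
}
print(judge_agreement(decisions, ["A", "B", "C"]))
# {'judge_1': 1.0, 'judge_2': 0.5, 'judge_3': 0.0} -> judge_3 would warrant review
```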
Broader Implications for Organisations
The shift toward comparative judgement has implications that extend far beyond creative admissions:
Corporate Recruitment: HR leaders can use comparative judgement to assess candidates based on work samples, project portfolios, or problem-solving tasks rather than relying on AI-susceptible CV screening and interview preparation. This is particularly valuable for roles requiring creativity, strategic thinking, or complex problem-solving.
Competitive Academic Programs: Universities can confidently select students based on authentic demonstration of ability and potential rather than test scores, grades, or application materials that may have benefited from AI assistance.
Professional Certification and Licensing: Professions that require demonstrated competence - medicine, law, engineering, design - can shift toward comparative assessment of actual work or case-based judgement, reducing reliance on standardised testing and the gaming it invites.
Talent Development and Internal Promotion: Organisations can use comparative judgement to fairly and transparently identify high-potential employees for development, leadership programs, or promotion, creating greater perceived fairness and reducing bias in internal talent management.
Regulatory Compliance and Fairness: As regulations around AI in hiring, algorithmic bias, and employment discrimination tighten, organisations that shift toward evidence-based, expert-led comparative judgement can demonstrate a commitment to meritocratic, auditable, and defensible decision-making.
The Broader Context: AI and Education for Collective Intelligence
It's worth noting that this evolution in assessment reflects a larger shift in how organisations think about capability and learning. Recent UNESCO research by Dr Imogen Casebourne and Professor Rupert Wegerif on "AI and Education for collective intelligence" suggests that as AI becomes ubiquitous, the most valuable human capability may not be individual knowledge or skill, but the ability to think, learn, and solve problems collectively. Comparative judgement - fundamentally a process of collective expert reasoning - aligns closely with this emerging reality.
The Path Forward
The Financial Times article raises urgent questions about meritocracy in an AI-saturated world. The Caine College case study provides a practical answer: by shifting from AI-vulnerable signals (self-reported narratives, algorithmic screening) to authentic work assessed through collective expert judgement, organisations can restore trust, fairness, and genuine merit-based selection.
RM Compare's adaptive comparative judgement approach is not a rejection of technology or innovation. Rather, it represents a thoughtful evolution of how we use assessment technology. Instead of replacing human judgement with algorithms, it amplifies human expertise, harnesses collective wisdom, and creates assessments that are fair, valid, transparent, and resilient to manipulation.
As AI continues to disrupt traditional assessment signals, forward-thinking organisations - from elite universities to innovative corporations - are embracing comparative judgement. The message is clear: in the age of AI, authentic work and collective expert judgement are becoming the gold standard for fair, meritocratic selection.
The question for your organisation is no longer whether to adapt your assessment practices, but how quickly you can embrace this more robust, fair, and future-proof approach.
About this series
This series examines how generative AI has disrupted traditional assessment signals and what institutions can do to restore fair, merit-based evaluation:
- When Written Applications No Longer Signal Ability (this post): The research evidence showing how AI has destroyed the informational value of written work
- The AI Shift and the Future of Fair Assessment: Why traditional assessment methods are failing and why comparative judgement offers a robust alternative
- Restoring Trust in Meritocracy: Real-world evidence from institutions successfully implementing fair assessment in the AI era