AI assessment has a dignity problem – here's how to fix it

AI assessment has a dignity problem. Not a technology problem, or even just a fairness problem, but a problem with how it treats people at precisely the moments they are most exposed and most human.

AI, dignity and “winning the argument”

Rory Sutherland (Vice Chair, Ogilvy) makes a sharp observation about how decisions get made in organisations: most of the time we are not optimising for solving the problem, we are optimising for winning the argument. To get anything through a governance process, you need a clean, logical, ideally numerical case. That subtly changes the question from “What is the right thing to do here?” to “What can I most easily prove is right?”

In assessment, that creates a built‑in bias towards things that are easy to measure. Reliability coefficients. Validity studies. Turnaround times. Cost per script. Number of candidates per assessor hour. These are all important, but they have an unfair advantage in the meeting because they are neat, legible and chart‑friendly.

The things that matter just as much to the humans on the receiving end – dignity, being taken seriously, feeling seen as a person rather than a datapoint – don’t show up so cleanly. You can’t put “this left me feeling humiliated” into a bar chart. So, when those two worlds collide, the hard‑to‑measure human qualities quietly lose the argument to the easy‑to‑measure operational ones.

“Computer says no” in high definition

That is how we end up with AI‑mediated processes that feel like a slick, data‑driven version of “computer says no”. The Little Britain catchphrase was funny because it nailed a familiar experience: a system that is technically correct but utterly uninterested in you as a person. You are reduced to a case that either fits the rule or doesn’t.

When that mentality creeps into assessment, the stakes are much higher. You are not just being told you can’t book a flight or change a tariff; you are being judged on your worth, your potential, your future. If that judgement arrives via a process that feels opaque, robotic and indifferent, the injury is not just practical, it is to your dignity.

Recent reporting around AI interviews makes this real. Candidates describe the experience as awkward and humiliating, a one‑way interaction with a machine that can’t see nuance, answer clarifying questions or recognise context. Some walk away from processes entirely because the experience feels dehumanising. None of that shows up in the metrics that win the internal argument – but it has consequences in the real world. This is something we have written about before, highlighting the ongoing Mobley v. Workday case in the US.

Why dignity in assessment really matters

This is not just about recruitment. Any time we assess people – students, professionals, grant applicants, creators – we are doing something profoundly human: we are saying “this is what you’re worth in this context” and “this is what your work amounts to.” That always touches identity. It shapes whether people feel they belong, whether they believe they are “the kind of person who can do this.”

When assessment protects dignity:

  • People are more willing to step forward, to take risks, to show their best work.
  • They are more able to hear and act on difficult feedback, because it comes in a context where they feel respected.
  • They are more likely to trust the institution doing the judging – the university, the employer, the professional body.

When assessment erodes dignity:

  • People disengage or self‑select out. Those with other options simply walk away.
  • The experience leaves a residue of resentment or shame that colours their view of the institution.
  • Over time, systems start selecting not just for competence, but for willingness to tolerate being treated as a case.

At scale, that is corrosive. We end up with assessment systems that look efficient on paper but feel inhumane to the people who pass through them. We also risk losing precisely the kinds of people and performances we say we want: unconventional thinkers, those from non‑standard backgrounds, those whose strengths are more holistic and less easily captured in a rigid rubric.

Treasuring what we can measure vs measuring what we treasure

This is where the measurement trap bites. If you only ever get promoted, praised or funded for improving the things that are easiest to measure, you slowly start to treasure what you can measure. Speed, cost per candidate, test reliability, automation rates – these become the de facto definition of “good” assessment, simply because they are the numbers on the slide.

The irony is that most of us, if asked directly, would say we treasure something much richer:

  • Assessments that feel authentic and meaningful.
  • Judgements that reflect holistic quality, not just box‑ticking.
  • Processes that protect dignity, especially when people fall short.

The shift we need – and the one RM Compare is built around – is to start measuring what we treasure, not just treasuring what we can currently measure. That means deliberately designing systems and evidence around the things we actually value: nuanced human judgement, rich performances, and yes, the dignity of the people being assessed.

Where RM Compare fits

RM Compare and Adaptive Comparative Judgement were created for exactly the spaces where rubrics and fully automated scoring struggle: complex, tacit “what good looks like” domains such as portfolios, extended writing, creative work, oracy, professional performance and nuanced recruitment tasks.

A few things are deliberate here:

  • Human at the core: The engine is structured human comparison, not machine replacement of human judgement. Technology organises, scales and analyses human decisions; it doesn’t erase them (there is a short sketch of what that aggregation looks like after this list).
  • Rich evidence, not thin proxies: Instead of reducing performance to a handful of machine‑detectable features, ACJ lets judges work with whole pieces of work in context, capturing quality that is hard to codify but easy for humans to recognise.
  • Experience as a first‑class concern: Well‑designed comparative judgement sessions can feel more like a serious, interactive conversation with a community of professionals than a one‑shot, one‑way interaction with a black box.
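To make that concrete, here is a minimal sketch of the kind of statistics that can sit underneath comparative judgement. It is illustrative only – it is not RM Compare’s actual engine, and the adaptive pairing and reliability monitoring of ACJ are not shown. The item names, the judgement data and the simple Bradley‑Terry fit below are assumptions for the example. The point is the shape of the approach: humans make holistic pairwise decisions, and the maths merely aggregates them into a rank order.

```python
# Illustrative sketch: turning human pairwise judgements into a rank order
# with a Bradley-Terry model. Hypothetical data and item names throughout;
# this is not RM Compare's implementation.
from collections import defaultdict

# Each judgement is (winner, loser): a judge saw two pieces of work
# side by side and chose the better one.
judgements = [
    ("portfolio_A", "portfolio_B"),
    ("portfolio_A", "portfolio_C"),
    ("portfolio_B", "portfolio_C"),
    ("portfolio_C", "portfolio_B"),
    ("portfolio_A", "portfolio_B"),
    ("portfolio_B", "portfolio_A"),
]

items = sorted({x for pair in judgements for x in pair})
wins = defaultdict(int)   # total wins per item
pairs = defaultdict(int)  # number of comparisons per unordered pair

for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

# Iterative maximum-likelihood fit (Zermelo's algorithm): each item's
# strength is re-estimated from its wins and its opponents' strengths.
strength = {i: 1.0 for i in items}
for _ in range(200):
    new = {}
    for i in items:
        denom = sum(
            pairs[frozenset((i, j))] / (strength[i] + strength[j])
            for j in items
            if j != i and frozenset((i, j)) in pairs
        )
        new[i] = wins[i] / denom if denom else strength[i]
    total = sum(new.values())
    strength = {i: s / total for i, s in new.items()}  # normalise

# Higher strength = judged better, more often, against stronger opposition.
for i in sorted(items, key=strength.get, reverse=True):
    print(f"{i}: {strength[i]:.3f}")
```

In ACJ proper, the system also chooses which pair each judge sees next and monitors reliability as the rank order stabilises, but the division of labour stays the same: the machine aggregates; the humans judge.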

None of this is anti‑AI. It’s pro‑human. It says: use machine learning and automation to handle the logistics, to support consistency, to make large‑scale human judgement feasible and robust. But don’t outsource the fundamentally human act of saying “this is good work”, “this shows potential”, or “this meets the standard” to a machine that cannot care.

If we do nothing

If we don’t tackle the dignity problem now, the path of least resistance is pretty clear:

  • Automated systems will spread wherever they make the neatest business case, regardless of whether they make the best human sense.
  • Over time, the dominant definition of “good assessment” will narrow to what fits neatly into metrics dashboards.
  • More and more people will experience key life moments – exams, interviews, professional judgements – as “computer says no” experiences.

We will have built systems that are technically impressive, marginally cheaper and faster, and quietly at odds with what we say we believe about people and potential.

RM Compare exists because we think there’s a better way: one where we use powerful technology to scale and strengthen human judgement, while measuring what we actually treasure – including the dignity of the people being assessed.