Opinion
What does a four‑year‑old Clumber Spaniel named Bruin have in common with Olympic breakdancing and modern examinations?
[Image: a Clumber Spaniel. Photo by Lara, originally posted to Flickr as Clumber Spaniel, CC BY-SA 2.0, https://commons.wikimedia.org/w/index.php?curid=8708645]
On the surface, nothing at all. Bruin is a gentle, long‑backed gundog who has just trotted his way to Best in Show at Crufts, padding calmly down the famous green carpet while the NEC holds its breath. Breakers will spin and freeze their way across an Olympic floor to pounding music. Examiners sit alone with stacks of scripts and detailed mark schemes. Three very different worlds, three very different kinds of performance.
Look at how they are judged, though, and you start to see the same pattern.
All three live on reassuring talk of “standards”. Dog shows have breed standards: pages of prose describing the ideal Clumber Spaniel – proportions, movement, coat, temperament. Olympic breaking has official criteria for difficulty, creativity, musicality and execution. Exam systems are built on mark schemes and grade descriptors that promise to turn messy human work into neat numerical decisions. On paper, it all looks comfortingly objective.
But when the pressure is on, nobody is really working their way line‑by‑line through a checklist.
In the Crufts ring, the Best in Show judge stands back and looks at seven champions together. Bruin is not being inspected in isolation; he is being seen against six other group winners who are all, in their own ways, close to perfect. The judge sends them round the ring, watches how they move, comes back to one or two to look again at balance, presence and temperament, and quietly builds a mental rank order. In the end, Bruin is not “9 out of 10 against the Clumber standard” so much as “the best of this exceptional seven tonight”.
Olympic breakdancing will be no different. There will be handbooks and score sheets, but when two breakers finish a battle, the judges will still be doing what humans always do with rich performances: asking themselves which dancer, in this moment, felt stronger. Which routine flowed better with the music, which risks paid off, whose style felt more convincing. The numbers they record are a translation of a fundamentally comparative act.
Modern examinations work hard to hide the same move.
We train examiners to apply mark schemes faithfully, to match snippets of work to levels and descriptors, to avoid “thinking comparatively” across a script batch. We moderate and standardise to keep everyone in line. All of that matters. But in the marking room, the cognitive reality is much closer to Bruin and the breakers than we like to admit. Examiners constantly compare: this essay feels stronger than the last one; that answer is right on the boundary; this script sits just below the exemplar in my mind.
Donald Laming’s old argument – that there is no such thing as absolute judgement, only comparisons – is alive in all three worlds: in Bruin’s steady trot around the Crufts ring, in a b‑girl’s freeze on the Paris floor, and in a tired examiner deciding whether to nudge a script into the higher grade.
Once you see that, two questions follow.
The first is uncomfortable: if our supposedly “objective” systems are built on comparative human judgements, what happens when those comparisons are narrow, inconsistent or opaque? In the show ring, Best in Show rests on a single person’s taste and experience. On the dance floor, a small panel’s decisions can make or break an athlete’s Olympic dream. In the exam system, one examiner’s internal ladder of scripts can set a student’s life chances. When people feel that their own internal ranking doesn’t match the official outcome, trust evaporates quickly.
The second question is more hopeful: if comparison is how humans naturally judge complex quality, how do we design systems that lean into that fact instead of fighting it?
Dog shows and breaking give one answer: embrace expertise and accept that not everything can be boiled down to a simple rubric. Comparative judgement in assessment offers another answer: keep the human comparisons, but spread them across many judges and many pairings, and use a model to turn lots of small, simple “which is better?” decisions into stable, transparent outcomes.
Adaptive Comparative Judgement
That basic idea sits underneath Adaptive Comparative Judgement and the technology we've been building in RM Compare. Instead of pretending that examiners can make perfectly absolute decisions against a mark scheme, we ask them to do the thing they are already good at: look at two pieces of work and choose the better one for a given construct. We then repeat that process at scale, across many judges and many pairings, and let the system infer a robust rank order and scale. The human expertise doesn't go away; it is amplified and organised.
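To make that inference step concrete, here is a minimal sketch of how a pile of "which is better?" decisions can be turned into a scale. It fits a simple Bradley–Terry model with the classic iterative update; the function name and toy data are hypothetical, and this illustrates the general technique rather than RM Compare's actual model or its adaptive pairing logic.

```python
# Illustrative sketch only: fit a Bradley-Terry model to pairwise
# "which is better?" judgements and recover a strength for each script.
# Not RM Compare's implementation; names and data are made up.
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """comparisons: list of (winner, loser) script-id pairs.
    Returns a dict mapping each script id to an estimated strength."""
    wins = defaultdict(float)         # total wins per script
    pair_counts = defaultdict(float)  # times each unordered pair was compared
    items = set()
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[frozenset((w, l))] += 1
        items.update((w, l))

    strength = {i: 1.0 for i in items}  # start everyone equal
    for _ in range(iterations):
        new = {}
        for i in items:
            # Standard minorise-maximise update for Bradley-Terry:
            # strength_i = wins_i / sum_j n_ij / (strength_i + strength_j)
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())       # normalise to keep the scale stable
        strength = {i: len(items) * s / total for i, s in new.items()}
    return strength

# Toy usage: a handful of judges' decisions across three scripts.
decisions = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
for script, s in sorted(bradley_terry(decisions).items(), key=lambda kv: -kv[1]):
    print(f"{script}: {s:.2f}")
```

In a real adaptive system, the "Adaptive" part also chooses which pair each judge sees next, so that every comparison is as informative as possible; but the core move, turning many small comparative decisions into one stable scale, looks like this.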
This isn’t a new idea. Twenty years ago, at an international assessment conference, Alastair Pollitt stood up and suggested that we should stop marking exams in the traditional way and move towards comparative approaches for complex performances. At the time it sounded radical: the theory was persuasive, but the infrastructure wasn’t ready. Two decades, a pandemic, an AI boom and a lot of quiet engineering later, we are much closer to being able to treat comparative judgement not as a curiosity, but as ordinary assessment practice.
Bruin, the breakers and our students are all, in their own domains, relying on us to get that design right. The spaniel in the green ring may only be chasing rosettes, but the same invisible machinery of human judgement is at work when we hand out medals and grades. The more honest we are about how that machinery really operates – comparative, contextual, human – the better chance we have of building systems that are not just technically defensible, but genuinely worthy of trust.