How to enjoy fries on the beach, undisturbed by seagulls: surprising truths from a recent ACJ study
A recent study (2025) by Jeffrey Buckley (Technological University of the Shannon) and Caiwei Zhu (Delft University of Technology) set out to answer a critical question for anyone seeking fairness and efficiency in educational assessment: how feasible is Adaptive Comparative Judgement (ACJ) when deployed in real classrooms?
To find out, they recruited 20 industrial designers to evaluate over 200 anonymized design portfolios created by primary and secondary school pupils tackling a real-world problem - how to enjoy fries on the beach, undisturbed by seagulls! Using an online ACJ platform, judges compared pairs of student work, making binary holistic judgements while their decision times and patterns were carefully recorded. The study aimed to uncover whether difficult comparisons took longer, whether judge fatigue set in across assessment rounds, and what these findings could mean for scaling ACJ - especially for platforms like RM Compare.
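The paper doesn't publish the platform's code, but the data being captured is easy to picture. Here is a minimal, hypothetical sketch (the `Judgement` record and `run_round` helper are illustrative names, not from the study or from RM Compare) of how a round of pairwise judging might log the binary choice and the decision time for each pair:

```python
import time
from dataclasses import dataclass

@dataclass
class Judgement:
    left_id: str     # portfolio shown on the left
    right_id: str    # portfolio shown on the right
    winner_id: str   # the judge's binary holistic choice
    seconds: float   # decision time, the signal the study analysed

def run_round(judge, pairs):
    """Collect one round of pairwise judgements, timing each decision.

    `judge` is any callable taking (left_id, right_id) and returning the
    id of the preferred portfolio, standing in for the human assessor.
    """
    log = []
    for left, right in pairs:
        start = time.monotonic()
        winner = judge(left, right)
        log.append(Judgement(left, right, winner, time.monotonic() - start))
    return log
```

Timestamped records like these are what let the researchers ask whether harder pairs took longer and whether judges slowed down late in a session.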
The results? Six key lessons - and each has a clear implication for anyone using RM Compare to deliver reliable, scalable, and transparent assessment.
Six key lessons
1. Difficult Judgements Don’t Take Longer
Learning: Judges spent the same amount of time on “easy” and “difficult” pairings, whether student work was similar in quality or obviously different.
Implication: RM Compare sessions can be scheduled with predictable timing, without bloated buffers for complex comparisons. Reliable planning gets easier.
2. Judge Fatigue Rarely Limits Assessment Quality
Learning: Across dozens of consecutive judgements, most assessors were consistent in speed and reliability. Only a handful displayed minor pacing shifts or signs of fatigue, and even then the trends varied by individual.
Implication: RM Compare is robust for classroom, department, or school-wide rollout. Fatigue is not a fundamental blocker for scalability.
3. Session Timing Is Predictable and Manageable
Learning: With judges averaging around 50 seconds per judgement, ACJ sessions are even faster than previous studies suggested, making the process manageable for busy educators.
Implication: RM Compare lets teachers and leaders plan sessions and reporting workflows with clear, evidence-based expectations, reducing guesswork.
4. Algorithmic Efficiency Sets the Ceiling for Scale
Learning: The true limit for scaling ACJ isn't slow human decision-making; it's the number of pairings needed to reach robust rankings (see the quick arithmetic sketch after the six lessons).
Implication: RM Compare’s ongoing technical work focuses on refining its pairing and ranking algorithms, delivering faster results and lowering costs for users.
5. Intuition Drives Effective Judgement
Learning: Assessors relied on fast, instinctive choices rather than laborious analysis, even for challenging calls. Professional heuristics powered reliable outcomes.
Implication: RM Compare continues to design for speed, prioritising intuitive prompts and interfaces that support teachers’ expertise, rather than demanding exhaustive evidence for every decision.
6. Transparency Earns Trust (and Adoption)
Learning: Many teachers and stakeholders prefer traditional rubrics, making ACJ’s less familiar processes a barrier unless the system is clearly explained.
Implication: RM Compare will keep investing in onboarding, help guides, and transparent communication - making algorithms and results understandable for both experts and newcomers. We are also working hard toward a refreshed user experience (news coming soon!).
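To see why the pairing budget, rather than judging speed, sets the ceiling (lesson 4), it helps to run the study's own numbers: roughly 200 portfolios and about 50 seconds per judgement. The sketch below compares an exhaustive round-robin with an adaptive scheme; the figure of ~15 judgements per item is an illustrative assumption drawn from the wider ACJ literature, not a result from this paper:

```python
n_items = 200            # portfolios in the study
secs_per_judgement = 50  # average decision time reported

# Exhaustive round-robin: every portfolio meets every other one.
round_robin = n_items * (n_items - 1) // 2   # 19,900 pairings

# Adaptive pairing: assume ~15 judgements per item (illustrative
# assumption; each pairing counts towards two items' totals).
judgements_per_item = 15
adaptive = n_items * judgements_per_item // 2  # 1,500 pairings

for label, pairings in [("round-robin", round_robin), ("adaptive", adaptive)]:
    hours = pairings * secs_per_judgement / 3600
    print(f"{label}: {pairings:,} pairings, about {hours:.0f} judge-hours")
```

Under these assumptions the adaptive scheme needs roughly 21 judge-hours against nearly 280 for a round-robin, which is why refining the pairing algorithm, not speeding up judges, is where the scaling gains live.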
Bottom line
Buckley and Zhu’s research shows that ACJ is not only feasible and efficient but also ready for the real demands of today’s assessment environments when paired with smart technology and strong user support. RM Compare is built to put these lessons into practice, helping educators deliver fast, fair, and transparent judgement at every scale.
Finally, while these results are promising, the report notes that "they should be interpreted cautiously. The findings are based on a specific task and sample, and replication in additional studies with varied design contexts, age groups, and judgement volumes is necessary to confirm their generalisability."