ACJ in Action: What Recent (2025) Research Reveals About Reliable Writing Assessment

Recent research (Gurel et al., 2025) looks at how adaptive comparative judgement (ACJ) performs when used to assess writing, and confirms that the approach is both robust and practical for organisations looking for fairer, more insightful assessment methods. So what is really important about these findings, and how do they connect to using ACJ tools like RM Compare?
How Did the Study Work?
The researchers wanted to see how reliable ACJ is, not just under perfect conditions, but across a range of real-world scenarios:
- Simulations with different group sizes: They created test situations involving 250, 500, and 1000 scripts, simulating how ACJ would operate from a small class to a large cohort.
- Testing when to stop: Within these simulations, they ran more or fewer rounds of judging, exploring whether requiring more comparisons would make the results even more dependable (a rough sketch of this kind of simulation appears after this list).
- A real-world classroom test: Finally, they brought ACJ into the classroom, having 50 actual student writing samples judged both by ACJ and by experts using traditional mark schemes. This allowed for a direct comparison between the two approaches.
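To make the simulation idea concrete, here is a minimal, hypothetical sketch in Python of how an ACJ-style simulation can work. It is not the authors' code: each script is given a hidden "true" quality, judges compare random pairs (a real ACJ engine pairs scripts adaptively), and a simple Bradley-Terry-style update builds a ranking whose agreement with the true qualities can then be checked for different cohort sizes and numbers of rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_acj(n_scripts, rounds, lr=0.1):
    """Toy ACJ-style simulation: random pairing plus Bradley-Terry updates."""
    true_quality = rng.normal(0.0, 1.0, n_scripts)  # hidden quality of each script
    est = np.zeros(n_scripts)                       # estimated ability per script

    for _ in range(rounds):
        order = rng.permutation(n_scripts)          # pair every script once per round
        for a, b in zip(order[0::2], order[1::2]):
            # the better script is more likely, not certain, to win the comparison
            p_a_wins = 1.0 / (1.0 + np.exp(-(true_quality[a] - true_quality[b])))
            winner, loser = (a, b) if rng.random() < p_a_wins else (b, a)
            # simple logistic (Bradley-Terry) gradient step on the estimates
            p_hat = 1.0 / (1.0 + np.exp(-(est[winner] - est[loser])))
            est[winner] += lr * (1.0 - p_hat)
            est[loser] -= lr * (1.0 - p_hat)

    # how closely does the recovered ranking track the hidden qualities?
    return np.corrcoef(est, true_quality)[0, 1]

for n_scripts in (250, 500, 1000):
    for rounds in (6, 12, 18):
        r = simulate_acj(n_scripts, rounds)
        print(f"{n_scripts} scripts, {rounds} rounds: agreement = {r:.2f}")
```

The cohort sizes mirror the paper's 250-, 500- and 1000-script scenarios; the learning rate, round counts and update rule are illustrative choices, not details taken from the study.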
What Did They Find?
- Reliable results every time: No matter the group size or the number of rounds, ACJ consistently delivered high reliability in ranking student work.
- Higher effort means higher confidence, but it's flexible: Doing more rounds of judging nudged reliability a little higher, but the researchers noted that, in practice, it's up to educators to decide when the results feel “good enough.” The study also confirmed that ACJ rankings align closely with traditional expert judgements, so stopping earlier doesn't mean compromising on accuracy.
- User control, not rigid automation: Unlike some research systems that automatically stop once a statistical goal is hit, ACJ platforms like RM Compare don’t impose a one-size-fits-all stopping rule. Instead, users get clear feedback after each round (reliability scores and confidence intervals), letting teachers and leaders decide how much judging is right for their unique context.
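As a simple illustration of that user-in-the-loop pattern (a hypothetical sketch, not RM Compare's actual interface or API), the loop below surfaces a reliability figure after every round and leaves the decision to continue or stop with the assessor rather than with a hard-coded statistical cutoff:

```python
def run_session(run_round, estimate_reliability, ask_user):
    """Collect judging rounds until the human assessor decides to stop."""
    round_no = 0
    while True:
        round_no += 1
        run_round()                              # gather one round of judgements
        reliability = estimate_reliability()     # e.g. a reliability score for the ranking
        print(f"Round {round_no}: reliability = {reliability:.2f}")
        if not ask_user(round_no, reliability):  # human decision, no fixed threshold
            break

# Toy usage: reliability creeps up each round; this "assessor" stops at 0.90 or above.
rels = iter([0.71, 0.84, 0.90, 0.94])
run_session(
    run_round=lambda: None,
    estimate_reliability=lambda: next(rels),
    ask_user=lambda rnd, rel: rel < 0.90,
)
```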
Why Does This Matter?
- Fairness and trust: ACJ produces results that assessors can trust and candidates can understand. The rankings remain robust even when judging conditions vary.
- Transparency and empowerment: RM Compare and similar platforms put data in assessors’ hands—reliability and confidence updates are always visible, supporting informed decisions about when to finish judging.
- Alignment with good assessment practice: The study shows ACJ is not just an experimental technique—it matches up with established ways of judging, while removing much of the subjectivity risk and workload headaches.
Conclusion
This research, and ACJ itself, show how new technology can bring transparency, flexibility, and genuine professional control to the assessment process. It’s about keeping educators in charge, giving clear evidence for decisions, and supporting the fairest possible outcomes for learners—all without getting lost in complexity or statistics.