ACJ in Action: What Recent (2025) Research Reveals About Reliable Writing Assessment

Recent research (Gurel et al., 2025) looks at how adaptive comparative judgement (ACJ) performs when used to assess writing, and confirms that the approach is both robust and practical for organisations looking for fairer, more insightful assessment methods. So what is really important about these findings, and how do they connect to using ACJ tools like RM Compare?
How Did the Study Work?
The researchers wanted to see how reliable ACJ is, not just under perfect conditions, but across a range of real-world scenarios:
- Simulations with different group sizes: They created test situations involving 250, 500, and 1000 scripts, simulating how ACJ would operate from a small class to a large cohort.
- Testing when to stop: Within these simulations, they ran more or fewer rounds of judging, exploring whether requiring more comparisons would make the results even more dependable (a rough sketch of this kind of simulation appears after this list).
- A real-world classroom test: Finally, they brought ACJ into the classroom, having 50 actual student writing samples judged both by ACJ and by experts using traditional mark schemes. This allowed for a direct comparison between the two approaches.
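To make the simulation idea concrete, here is a minimal, hypothetical sketch in Python of how an ACJ-style simulation can work. It is not the authors' code: each script is given a hidden "true" quality, judges compare random pairs (a real ACJ engine pairs scripts adaptively), and a simple Bradley-Terry-style update builds a ranking whose agreement with the true qualities can then be checked for different cohort sizes and numbers of rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_acj(n_scripts, rounds, lr=0.1):
    """Toy ACJ-style simulation: random pairing plus Bradley-Terry updates."""
    true_quality = rng.normal(0.0, 1.0, n_scripts)  # hidden quality of each script
    est = np.zeros(n_scripts)                       # estimated ability per script

    for _ in range(rounds):
        order = rng.permutation(n_scripts)          # pair every script once per round
        for a, b in zip(order[0::2], order[1::2]):
            # the better script is more likely, not certain, to win the comparison
            p_a_wins = 1.0 / (1.0 + np.exp(-(true_quality[a] - true_quality[b])))
            winner, loser = (a, b) if rng.random() < p_a_wins else (b, a)
            # simple logistic (Bradley-Terry) gradient step on the estimates
            p_hat = 1.0 / (1.0 + np.exp(-(est[winner] - est[loser])))
            est[winner] += lr * (1.0 - p_hat)
            est[loser] -= lr * (1.0 - p_hat)

    # how closely does the recovered ranking track the hidden qualities?
    return np.corrcoef(est, true_quality)[0, 1]

for n_scripts in (250, 500, 1000):
    for rounds in (6, 12, 18):
        r = simulate_acj(n_scripts, rounds)
        print(f"{n_scripts} scripts, {rounds} rounds: agreement = {r:.2f}")
```

The cohort sizes mirror the paper's 250-, 500- and 1000-script scenarios; the learning rate, round counts and update rule are illustrative choices, not details taken from the study.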
What Did They Find?
- Reliable results every time: No matter the group size or the number of rounds, ACJ consistently delivered high reliability in ranking student work.
- Higher effort means higher confidence, but it's flexible: Doing more rounds of judging nudged reliability a little higher, but the researchers noted that, in practice, it's up to educators to decide when the results feel “good enough.” The study also confirmed that ACJ rankings align closely with traditional expert judgements, so stopping earlier doesn't mean compromising on accuracy.
- User control, not rigid automation: Unlike some research systems that automatically stop once a statistical goal is hit, ACJ platforms like RM Compare don’t impose a one-size-fits-all stopping rule. Instead, users get clear feedback after each round (reliability scores and confidence intervals), letting teachers and leaders decide how much judging is right for their unique context.
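As a simple illustration of that user-in-the-loop pattern (a hypothetical sketch, not RM Compare's actual interface or API), the loop below surfaces a reliability figure after every round and leaves the decision to continue or stop with the assessor rather than with a hard-coded statistical cutoff:

```python
def run_session(run_round, estimate_reliability, ask_user):
    """Collect judging rounds until the human assessor decides to stop."""
    round_no = 0
    while True:
        round_no += 1
        run_round()                              # gather one round of judgements
        reliability = estimate_reliability()     # e.g. a reliability score for the ranking
        print(f"Round {round_no}: reliability = {reliability:.2f}")
        if not ask_user(round_no, reliability):  # human decision, no fixed threshold
            break

# Toy usage: reliability creeps up each round; this "assessor" stops at 0.90 or above.
rels = iter([0.71, 0.84, 0.90, 0.94])
run_session(
    run_round=lambda: None,
    estimate_reliability=lambda: next(rels),
    ask_user=lambda rnd, rel: rel < 0.90,
)
```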
Why Does This Matter?
- Fairness and trust: ACJ produces results that assessors can trust and candidates can understand. The rankings remain robust even when judging conditions vary.
- Transparency and empowerment: RM Compare and similar platforms put data in assessors’ hands—reliability and confidence updates are always visible, supporting informed decisions about when to finish judging.
- Alignment with good assessment practice: The study shows ACJ is not just an experimental technique—it matches up with established ways of judging, while removing much of the subjectivity risk and workload headaches.
Conclusion
This research, and ACJ itself, show how new technology can bring transparency, flexibility, and genuine professional control to the assessment process. It’s about keeping educators in charge, giving clear evidence for decisions, and supporting the fairest possible outcomes for learners—all without getting lost in complexity or statistics.