Tackling Reliability in Adaptive Comparative Judgement: What RM Compare Users Need to Know

If you’ve been following the evolution of digital assessment, you’ll know that Adaptive Comparative Judgement (ACJ) is transforming how we judge quality—especially with platforms like RM Compare. But you might also have heard about concerns over “inflated reliability statistics.” Is this something to worry about? Let’s look at what the research says, and why RM Compare users can be reassured.
Where Did the Concern Come From?
The question of reliability in ACJ was brought into sharp focus by Tom Bramley’s influential research. In his 2015 Cambridge Assessment report, Bramley demonstrated through simulations that the adaptive algorithms used in ACJ could artificially inflate the Scale Separation Reliability (SSR) statistic—even when the underlying data was random. This meant that, in some scenarios, the reliability numbers could look much better than they truly were, especially if the adaptive process started too early or with too few comparisons per item.
Bramley’s work was a crucial wake-up call for the field, highlighting that while adaptivity made ACJ efficient, it could also introduce “spurious separation” among scripts, making SSR alone an unreliable indicator of true reliability.
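For readers who like to see the mechanics, the sketch below shows the standard separation-reliability calculation that SSR is based on: the observed spread of the script estimates compared with the measurement error attached to them. This is an illustrative Python sketch, not RM Compare's internal code, and the function name is ours; the point is simply that anything which stretches the estimates apart faster than their standard errors shrink (which is what spurious separation does) pushes SSR towards 1.

```python
import numpy as np

def scale_separation_reliability(estimates, standard_errors):
    """Illustrative Scale Separation Reliability (SSR) calculation.

    estimates: parameter estimates (e.g. logit values), one per script
    standard_errors: the standard error attached to each estimate

    SSR = (observed variance - mean error variance) / observed variance,
    i.e. the proportion of the observed spread that is not measurement error.
    """
    observed_variance = np.var(estimates, ddof=1)          # spread of the script estimates
    error_variance = np.mean(np.square(standard_errors))   # average squared standard error
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance

# If adaptivity spuriously stretches the estimates apart, observed_variance grows
# relative to error_variance and SSR rises -- even when the underlying data are random.
```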
How Did the Field Respond?
Professor Richard Kimbell, a leading figure in ACJ research and development, took these findings seriously. In his 2022 paper, Kimbell openly acknowledged the issue, describing how the problem was identified and then addressed in collaboration with software developers—including those behind RM Compare. The adaptive algorithm was refined to mitigate the risk of inflated reliability, and new guidance was put in place to ensure that SSR is interpreted in context, not in isolation.
Kimbell’s perspective is pragmatic and reassuring: innovation in assessment is a journey, and the willingness to identify and fix problems is a hallmark of a robust, transparent system.
What Does the Latest Research Show?
The most recent and comprehensive evidence comes from Wang & Zheng (2025), who used RM Compare to assess spoken language proficiency. Their study went beyond SSR, validating reliability with split-half methods and cross-checking results using established tools like FACETS. The findings were clear:
- High SSR values (≥ 0.90) were matched by strong split-half reliability and robust agreement with traditional scoring methods.
- ACJ rankings using RM Compare closely tracked expert judgements and rubric-based scores.
- The study confirmed that the platform’s reliability is not an artifact of the algorithm, but reflects genuine consensus among judges.
Wang & Zheng also highlighted that RM Compare’s ACJ implementation addresses the reliability inflation concerns raised by Bramley, making it suitable even for high-stakes assessment
What Does This Mean for RM Compare Users?
- You can trust the results. The algorithms in RM Compare have been refined and validated by independent research, including the latest work by Wang & Zheng (2025).
- Reliability is multi-faceted. RM Compare supports a range of reliability and validity checks, not just SSR, giving you a fuller picture of assessment quality.
- Continuous improvement. The RM Compare team remains committed to staying at the forefront of research and innovation, ensuring the platform evolves with the evidence.
Looking Ahead
Bramley’s early critique was vital in making ACJ—and RM Compare—even stronger. Thanks to ongoing research and development, users can now be confident that RM Compare delivers results that are both efficient and genuinely reliable.
So, whether you’re running a classroom project or a large-scale assessment, you can be assured that RM Compare stands on a foundation of robust, transparent, and continually improving science.
Stay tuned to the RM Compare blog for more insights and updates as we continue to lead the way in digital assessment.
References
- Bramley, T. (2015). Investigating the reliability of Adaptive Comparative Judgment. Cambridge Assessment Research Report.
- Kimbell, R. (2022). Examining the reliability of Adaptive Comparative Judgement (ACJ) as an assessment tool in educational settings. International Journal of Technology and Design Education, 32(3), 1515-1529.
- Wang, Z., & Zheng, Y. (2025). Assessing intelligibility as conceptualised within the CEFR-companion volume (CV) framework using Adaptive Comparative Judgement.