Adaptivity - the Comparative Judgement secret sauce


Adaptivity in Learning and Assessment

Most assessments that students encounter are known as "fixed length assessments" - every candidate gets the same set of questions. Adaptive assessments, on the other hand, are almost always computer-delivered and adjust as they are completed. Questions are presented based on previous answers; generally, a series of correct answers will result in slightly harder questions until the candidate reaches their 'level'. Such methods, it is claimed, are able to pinpoint student proficiency levels with a high degree of efficiency and accuracy. Adaptivity is one of the upsides energising the worldwide drive to move from paper to on-screen assessment. Of course, it isn't just assessment where adaptivity offers transformative potential: students and teachers can also benefit from personalised experiences in the learning process.
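
To make the idea concrete, here is a minimal sketch of the kind of selection loop an adaptive assessment might run. It is purely illustrative: the 0-1 difficulty scale, the step size and the function names are assumptions made for the example, not a description of RM Compare or any particular testing platform.

```python
# Illustrative only: a toy difficulty-adjustment loop for an adaptive test.
# The 0-1 difficulty scale, step size and test length are invented for the
# example, not taken from RM Compare or any real platform.

def run_adaptive_test(item_bank, answer, start_difficulty=0.5, num_items=20):
    """Present items one at a time, nudging the working difficulty up after a
    correct answer and down after an incorrect one."""
    remaining = list(item_bank)
    difficulty = start_difficulty
    responses = []
    for _ in range(min(num_items, len(remaining))):
        # Pick the unused item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda i: abs(i["difficulty"] - difficulty))
        remaining.remove(item)
        correct = answer(item)                      # candidate's response: True/False
        responses.append((item["id"], correct))
        difficulty += 0.05 if correct else -0.05    # drift toward the candidate's level
        difficulty = max(0.0, min(1.0, difficulty))
    return difficulty, responses
```

The point is simply that each question asked depends on the answers already given.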

Comparative Judgement and the problem of scale

The principles and benefits of Comparative Judgement are well understood - you can read about them here. Even without software such as RM Compare, instructors and assessors have been comparing and ranking items by hand forever. In fact, prior to 1792, a team of Cambridge examiners convened at 5pm on the last day of examining, reviewed the 19 papers each student had sat and published their rank order at midnight. It wasn't until the new Proctor of Examinations, William Farish, turned up that marking was introduced. And the rest is history.

One of the reasons for the move away from a comparative approach to marking was the need to scale: there were simply too many students and too many papers for the original process to work. Adaptivity removes this bottleneck.

Adaptivity in Comparative Judgement - an example

If you sign up for the RM Compare Free Trial you are presented with a couple of completed demo sessions, including their reports. Understanding these helps you appreciate the fundamentals of the adaptivity algorithm powering RM Compare. We will be using the reports from the 'James Rank Order Session'. In this session there were 36 Judges and 131 Items, and Judging took place over 16 rounds. A quick reminder:

  1. Judge: The assessor or person responsible for making Judgements or evaluations within RM Compare. For example, in Moderation this would be Teachers, and in Peer Learning it would be the Students.
  2. Item: The Work being assessed by the Judges.
  3. Round: Sessions consist of separate Rounds of Judgement. A Round is complete once each Item has been seen by a Judge at least once. At the end of each Round RM Compare creates a Rank Order of the Items, which is used to inform the pairs presented to the Judges in the next Round (a process sketched in the example below). The transition between Rounds is not evident to the Judges, but can be tracked by the session Admin.
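
To see how a Rank Order might emerge from nothing more than pairwise Judgements, here is a small sketch. RM Compare's internal model isn't documented here, so the Bradley-Terry style fit below is an assumption borrowed from the wider Comparative Judgement literature, used only to illustrate the idea.

```python
# Illustrative only: turning pairwise Judgements into a Rank Order.
# RM Compare's internal model is not documented here; a Bradley-Terry style
# fit is a common choice in Comparative Judgement research and is used
# purely as an example.
from collections import defaultdict

def rank_order(judgements, n_iter=100):
    """judgements: list of (winner_id, loser_id) pairs collected so far.
    Returns item ids ordered from strongest to weakest."""
    items = {i for pair in judgements for i in pair}
    wins = defaultdict(int)
    met = defaultdict(int)          # how often each unordered pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        met[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):         # Bradley-Terry minorisation-maximisation updates
        new = {}
        for i in items:
            denom = sum(met[frozenset((i, j))] / (strength[i] + strength[j])
                        for j in items if j != i and met[frozenset((i, j))])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())
        strength = {i: v * len(items) / total for i, v in new.items()}  # normalise
    return sorted(items, key=strength.get, reverse=True)
```

With something like this in place, the Rank Order at the end of each Round is simply the result of fitting all the Judgements collected so far.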

The image below shows the rank order of items after the first round of judging. At this stage the pairs of items have been presented entirely at random and we have no understanding of relative value.

Round 1 rank

By round 5, however, we are starting to build some understanding. This allows the adaptivity to kick in: instead of presenting random pairs, we can select them intelligently. For example, we already know enough to stop pairing items that sit at opposite ends of the rank. A sketch of both the random and the adaptive pairing follows the image below.

The rank order after 5 rounds
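
The sketch below contrasts the two pairing strategies described above: purely random pairs, as in round 1, and pairs drawn from neighbouring positions in the current rank order once the adaptivity kicks in. The window size and the use of simple rank positions are assumptions made for the illustration, not RM Compare's documented algorithm.

```python
# Illustrative only: two ways of choosing the next pairs for Judges.
# The window size and use of plain rank positions are assumptions for the sketch.
import random

def random_pairs(item_ids):
    """Round 1 style: shuffle the items and pair neighbours at random.
    (With an odd number of items the leftover would need its own pair.)"""
    shuffled = random.sample(item_ids, len(item_ids))
    return list(zip(shuffled[0::2], shuffled[1::2]))

def adaptive_pairs(ranked_ids, window=5):
    """Later rounds: pair each item with another that sits close to it in the
    current rank order, so Judges see informative, hard-to-call comparisons
    rather than obvious mismatches from opposite ends of the rank."""
    pairs = []
    for pos, item in enumerate(ranked_ids):
        lo = max(0, pos - window)
        hi = min(len(ranked_ids), pos + window + 1)
        neighbours = [ranked_ids[p] for p in range(lo, hi) if p != pos]
        pairs.append((item, random.choice(neighbours)))
    return pairs
```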

The adaptivity dramatically speeds up our efforts to form the rank. In this case, even by the end of round 8 we are seeing high levels of reliability and confidence.

Round 8 rank

In most RM Compare sessions very high levels of confidence can be achieved in about 10 - 12 rounds. In this case we have achieved a Reliability of 0.86 (+/- 0.03) by the end of the 12th round. RM Compare allows Admins to stop sessions at any time, for example when they have reached a desired level of reliability.

Round 12 rank
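
How might a Reliability figure like 0.86 be calculated? In the Comparative Judgement literature a common choice is the Scale Separation Reliability (SSR), which compares the spread of the item estimates with the size of their standard errors. The sketch below follows that general approach; whether RM Compare computes exactly this statistic is an assumption.

```python
# Illustrative only: an SSR-style reliability statistic, common in Comparative
# Judgement research. Whether RM Compare reports exactly this figure is an
# assumption of the sketch.
from statistics import pvariance, mean

def scale_separation_reliability(estimates, standard_errors):
    """estimates: item parameter estimates (e.g. from a Bradley-Terry fit);
    standard_errors: the corresponding standard errors of those estimates.
    Returns a value near 1 when items are well separated relative to the
    measurement error, and near 0 when they are not."""
    observed_var = pvariance(estimates)                   # spread of the estimates
    error_var = mean(se ** 2 for se in standard_errors)   # average error variance
    true_var = max(observed_var - error_var, 0.0)         # spread not explained by error
    return true_var / observed_var if observed_var else 0.0
```

An Admin's stopping rule then amounts to ending the session once this figure reaches the desired threshold.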

Can the RM Compare Adaptivity algorithm be trusted?

Some concerns were raised in the past about the reliability of Adaptive Comparative Judgement, especially around issues of potential bias. Two important papers have explored this in depth.

Addressing the issue of bias in the measurement of reliability in the method of Adaptive Comparative Judgement - Camila Rangel-Smith, Declan Lynch, 2018

Examining the reliability of Adaptive Comparative Judgement (ACJ) as an assessment tool in educational settings - Richard Kimbell, Goldsmiths, University of London, 2021

What next?

We know something about the effect of adaptivity on learners. Gradually increasing the difficulty of the judging process seems to engage higher-order thinking. This is something being explored by a number of research partners with Learning by Evaluating.