Adaptivity - the Comparative Judgement secret sauce

By Mark House

19th May 2022

Adaptivity in Learning and Assessment

Most assessments that students encounter are "fixed length assessments": every candidate gets the same set of questions. Adaptive assessments, on the other hand, are almost always computer-delivered and adjust as they are completed, with each question chosen on the basis of previous answers. Generally, a series of correct answers will result in slightly harder questions until the candidate reaches their 'level'. Such methods, it is claimed, can pinpoint student proficiency levels with a high degree of efficiency and accuracy. Adaptivity is one of the upsides energising the worldwide drive to move from paper to on-screen assessment. Of course, it isn't just in assessment that adaptivity offers transformative potential. Students and teachers can also benefit from personalised experiences in the learning process.
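
To make the idea concrete, here is a minimal sketch of an adaptive test in Python. It assumes a very simple "staircase" rule (one step harder after a correct answer, one step easier after an incorrect one). The function name, the 1 to 10 difficulty scale and the simulated candidate are all hypothetical; real adaptive assessments typically use more sophisticated statistical models.

```python
def run_adaptive_test(answers_correctly, start_level=5, min_level=1,
                      max_level=10, num_questions=8):
    """Staircase rule: step up after a correct answer, down after a wrong one."""
    level = start_level
    history = []
    for _ in range(num_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(max_level, level + 1)  # slightly harder next time
        else:
            level = max(min_level, level - 1)  # slightly easier next time
    return history


# Hypothetical candidate who can answer anything up to difficulty 7.
history = run_adaptive_test(lambda level: level <= 7)
levels_asked = [level for level, _ in history]
print(levels_asked)
```

With this simulated candidate the questions climb from the starting level and then hover around levels 7 and 8, which is the 'level' the test is homing in on.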

Comparative Judgement and the problem of scale

The principles and benefits of Comparative Judgement are well understood - you can read about them here. Even without software such as RM Compare, instructors and assessors have been comparing and ranking items by hand forever. In fact, prior to 1792, a team of Cambridge examiners convened at 5pm on the last day of examining, reviewed the 19 papers each student had sat and published their rank order at midnight. It wasn't until the new Proctor of Examinations, William Farish, turned up that marking was introduced. And the rest is history.

One of the reasons for the move away from a comparative approach to marking was the need to scale. There were just too many students and too many papers to make the original process work. Adaptivity removes this obstacle.

Adaptivity in Comparative Judgement - an example

If you sign up for the RM Compare Free Trial you are presented with a couple of completed demo sessions, including the reports. Understanding these helps you appreciate the fundamentals of the adaptivity algorithm powering RM Compare. We will be using the reports from the 'James Rank Order Session'. In this session there were 36 Judges and 131 Items. Judging took place over 16 rounds. A quick reminder:

Judge: The assessor or person responsible for making Judgements or evaluations within RM Compare. For example, in Moderation this would be the Teachers, and in Peer Learning it would be the Students.
Item: The Work being assessed by the Judges.
Round: Sessions consist of separate Rounds of Judgement. At the end of each Round RM Compare creates a Rank Order of the Items. This is used to inform the pairs presented to the Judges in the next Round. A Round is defined as the point at which each Item has been seen at least once by a Judge. The transition between Rounds is not evident to the Judges, but can be tracked by the session Admin.
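
To get a feel for the scale problem, here is a rough back-of-envelope calculation using the session numbers above. It assumes each judgement shows exactly two Items and reads the Round definition as every Item appearing in at least one pair. The per-round figure is therefore only a lower bound, not the actual number of judgements made in the demo session.

```python
import math

items = 131   # Items in the demo session (from the text above)
rounds = 16   # Rounds of judging (from the text above)

# Every possible pairing of two distinct Items: n * (n - 1) / 2.
all_possible_pairs = math.comb(items, 2)

# Assumption: one judgement covers two Items, so covering every Item
# at least once needs at least ceil(n / 2) judgements per Round.
min_judgements_per_round = math.ceil(items / 2)
min_judgements_total = min_judgements_per_round * rounds

print(all_possible_pairs, min_judgements_per_round, min_judgements_total)
```

Judging every possible pair would mean 8,515 comparisons, which is why an exhaustive approach stops being practical as the number of Items grows.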

The image below shows the rank order of items after the first round of judging. At this stage the pairs of items have been presented entirely at random and we have no understanding of relative value.

By round 5, however, we are starting to get some understanding. This allows the adaptivity to kick in: instead of presenting random pairs, we can select them intelligently. We already know enough, for example, to stop pairing items that are at opposite ends of the rank.
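
The switch from random to informed pairing can be sketched as follows. This is an illustration only and not RM Compare's actual algorithm, which is not described here. It assumes each Item has a current score estimate and simply pairs neighbours in the rank order; the function names and the example scores are hypothetical.

```python
import random


def random_pairs(items, rng):
    """Early rounds: shuffle the items and pair them off at random."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return list(zip(shuffled[0::2], shuffled[1::2]))


def adaptive_pairs(scores):
    """Later rounds: sort by current score estimate and pair rank neighbours."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # With an odd number of items the lowest-ranked one is left unpaired.
    return list(zip(ranked[0::2], ranked[1::2]))


# Hypothetical score estimates after a few rounds of judging.
scores = {"A": 2.1, "B": 1.7, "C": 0.2, "D": -0.1, "E": -1.5, "F": -2.4}

early = random_pairs(scores, random.Random(0))
later = adaptive_pairs(scores)
print(later)
```

Pairing neighbours means Judges spend their time on close calls, where a judgement tells us the most, rather than on comparisons such as A against F whose outcome is already fairly predictable.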

The adaptivity dramatically speeds up our efforts to form the rank. In this case, even by the end of round 8 we are seeing high levels of reliability and confidence.

In most RM Compare sessions very high levels of confidence can be achieved in about 10 to 12 rounds. In this case we have achieved a Reliability of 0.86 (+/- 0.03) by the end of the 12th round. RM Compare allows Admins to stop sessions at any time, for example when they have reached a desired level of reliability.
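
A stopping rule of this kind can be sketched in a few lines. The example below assumes the reliability statistic is Scale Separation Reliability (SSR), a measure commonly used in Comparative Judgement; this post does not say exactly how RM Compare calculates its Reliability figure. The function names, the target of 0.85 and the example values are hypothetical.

```python
from statistics import pvariance


def scale_separation_reliability(estimates, standard_errors):
    """SSR = (observed variance - mean squared error) / observed variance."""
    observed_variance = pvariance(estimates)
    mean_squared_error = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed_variance - mean_squared_error) / observed_variance


def should_stop(reliability, target=0.85):
    """Stop the session once the reliability reaches the desired level."""
    return reliability >= target


# Hypothetical item estimates and their standard errors.
estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_errors = [0.5, 0.5, 0.5, 0.5, 0.5]

ssr = scale_separation_reliability(estimates, standard_errors)
print(ssr, should_stop(ssr))
```

As more judgements come in, the standard errors shrink relative to the spread of the estimates, so the statistic rises towards 1.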

Can the RM Compare Adaptivity algorithm be trusted?

Some concerns were raised in the past about the reliability of Adaptive Comparative Judgement, especially around issues of potential bias. Two important papers have explored this in depth.

Addressing the issue of bias in the measurement of reliability in the method of Adaptive Comparative Judgement - Camila Rangel-Smith, Declan Lynch, 2018

Examining the reliability of Adaptive Comparative Judgement (ACJ) as an assessment tool in educational settings - Richard Kimbell, Goldsmiths, University of London, 2021

What next?

We know something about the effect of adaptivity on learners. Gradually increasing the difficulty of the judging process seems to engage higher order cognitive thinking. This is something being explored by a number of research partners with Learning by Evaluating.
