Treasure what you measure - grading with ACJ

Counting what counts

Perhaps the most common request we receive from users concerns applying grades to session results. This is especially the case in places where assessment is closely tied to accountability (in England, for example). As simple as it sounds, there is complexity here, and we are working with a number of groups to pin down what the needs are. We know that these will vary considerably, so a flexible approach is going to be the way forward. Let's take a look at some current approaches.

Creating a linear scale suitable for grading

Without getting too deep into the statistical weeds, the challenge is to convert the rank order of parameter scores ('true scores') we get from a typical RM Compare session into a linear format ('scaled scores') that can be used for grading. This transformation requires some consideration, and we anticipate that its application may need some adjustment to meet end-user needs.

The underlying principles are described HERE, including the necessary functionality for you to try it for yourself.
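As a rough illustration of the idea, the sketch below maps a handful of parameter values onto a 0 to 100 scale with a simple min-max rescale. This is an assumption made for the example, not a description of how RM Compare performs the transformation; the function name `to_scaled_scores`, the sample values, and the choice of a 0 to 100 range are all hypothetical.

```python
def to_scaled_scores(parameter_values, scale_min=0.0, scale_max=100.0):
    """Linearly map parameter values onto the range [scale_min, scale_max].

    Assumption for this sketch: the lowest parameter value maps to scale_min
    and the highest to scale_max. A real session may need different anchors.
    """
    lowest, highest = min(parameter_values), max(parameter_values)
    if highest == lowest:
        # Every item has the same value, so place them all at the midpoint.
        midpoint = (scale_min + scale_max) / 2
        return [midpoint for _ in parameter_values]
    span = highest - lowest
    width = scale_max - scale_min
    return [scale_min + (value - lowest) / span * width for value in parameter_values]


# Hypothetical parameter values for five items, lowest to highest.
values = [-2.0, -0.5, 0.0, 1.0, 2.0]
scaled = to_scaled_scores(values)
print(scaled)
```

Because the mapping is linear, the rank order and the relative spacing of the items are preserved; only the units change. Where the end points of the scale should sit is exactly the kind of adjustment that may need to vary by use case.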

Grading by Script Seeding

Seeding scripts is a simple benchmarking method that is frequently used to apply grades to a rank order of items produced through Comparative Judgement. Typically, before starting an RM Compare judging session, we bring together some standardised exemplars for benchmarking purposes. These might be pieces of work from a previous cohort that have already been assessed and graded by an expert, for example.
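One way seeded scripts could be turned into grades is sketched below. It assumes that the lowest-placed seeded script for each grade marks that grade's boundary, and that any item at or above a boundary receives that grade. This boundary rule, the function names, the fallback grade "U", and the sample data are all assumptions made for illustration rather than a documented RM Compare procedure.

```python
def grade_boundaries(seeds):
    """Return {grade: lowest parameter value among seeded scripts with that grade}.

    seeds is a list of (parameter_value, grade) tuples for the seeded exemplars.
    """
    boundaries = {}
    for value, grade in seeds:
        if grade not in boundaries or value < boundaries[grade]:
            boundaries[grade] = value
    return boundaries


def assign_grade(value, boundaries, fallback="U"):
    """Give an item the grade of the highest boundary it meets or exceeds."""
    # Check boundaries from the highest threshold down to the lowest.
    ordered = sorted(boundaries.items(), key=lambda pair: pair[1], reverse=True)
    for grade, threshold in ordered:
        if value >= threshold:
            return grade
    return fallback  # Below every seeded boundary.


# Hypothetical seeded exemplars: (parameter value after the session, known grade).
seeds = [(1.8, "A"), (1.2, "A"), (0.3, "B"), (-0.9, "C")]
boundaries = grade_boundaries(seeds)

# Hypothetical ungraded items from the current cohort.
items = {"script_1": 1.5, "script_2": 0.4, "script_3": -0.2, "script_4": -1.6}
grades = {name: assign_grade(value, boundaries) for name, value in items.items()}
print(grades)
```

Other rules are equally plausible, such as placing each boundary midway between adjacent seeded scripts, and the choice affects items that fall close to a boundary.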

Calibrate the scores from two assessments onto the same scale

Standard RM Compare sessions employ an adaptivity algorithm to intelligently surface pairs of items for judgement; this is called Adaptive Comparative Judgement (ACJ). This approach has a number of benefits, including a dramatic improvement in efficiency. However, a Simplified Pairs session, the approach used for this kind of calibration, needs to remove the adaptivity and instead take control of both the pairing process and the judge allocation.
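To make "taking control of the pairing process and the judge allocation" more concrete, here is a minimal sketch of a non-adaptive plan. It assumes that every pair contains one script from each of the two assessments, that partners are chosen at random, and that pairs are dealt to judges in rotation. These design choices, the function name `simplified_pairs`, and its parameters are assumptions for illustration and are not taken from RM Compare.

```python
import random


def simplified_pairs(scripts_a, scripts_b, judges, pairs_per_script=2, seed=0):
    """Build a fixed judging plan of (judge, script_from_a, script_from_b) tuples.

    Each script from assessment A is paired with pairs_per_script distinct,
    randomly chosen scripts from assessment B. Nothing adapts to earlier
    judgements: the whole plan is decided before judging begins.
    """
    if pairs_per_script > len(scripts_b):
        raise ValueError("pairs_per_script cannot exceed the number of B scripts")
    rng = random.Random(seed)  # Seeded so the plan is reproducible.
    pairs = []
    for script in scripts_a:
        for partner in rng.sample(scripts_b, pairs_per_script):
            pairs.append((script, partner))
    rng.shuffle(pairs)  # Avoid giving one judge a run of the same script.
    # Deal pairs to judges in rotation so workloads differ by at most one.
    return [(judges[i % len(judges)], a, b) for i, (a, b) in enumerate(pairs)]


# Hypothetical script and judge identifiers.
plan = simplified_pairs(
    scripts_a=["a1", "a2", "a3"],
    scripts_b=["b1", "b2", "b3", "b4"],
    judges=["judge_1", "judge_2"],
    pairs_per_script=2,
    seed=1,
)
for judge, script_a, script_b in plan:
    print(judge, script_a, script_b)
```

Since every judgement in this plan compares work across the two assessments, the outcomes carry information about how the two sets of scores relate to each other, which is what the calibration needs.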

What next?

We recognise that the 'nuts and bolts' described here are going to be of limited interest to the majority of educators, who quite rightly are more concerned with matters of learning and progress. A key benefit of grading is that it communicates outcomes more simply - the concept of an A being better than a B is a lot easier to understand than differences in parameter scores. However, we are also conscious that the drive for simplicity through transforming data (for example, true scores > scaled scores > reading ages) can produce challenges of its own, not least the difficulty of properly interpreting the transformed grades.

Applying grades to Adaptive Comparative Judgement sessions continues to be an area of considerable focus for our users and for us. Get in touch if you would like to get involved.