How the concept of Misfit maintains the integrity of the comparative judgement process

RM Compare - setting new standards

In the context of comparative judgement, the term "misfit" refers to a judge or an item that consistently deviates from the expectations of the underlying statistical model, such as the Rasch model used by RM Compare. Misfit can occur in two main contexts: judge misfit and item misfit.

It's important to note that misfit doesn't necessarily indicate a problem; it could simply reflect a different perspective or interpretation. However, significant misfit can affect the reliability of the comparative judgement process and may need to be addressed.

Judge Misfit

A judge is considered a misfit when their decisions consistently disagree with those of other judges, particularly when comparing items of similar quality or difficulty. This could indicate that the judge is interpreting the assessment criteria differently, or that they have a different understanding of the quality standards being applied. A high misfit score means that the judge's decisions differ significantly from those of other judges. This isn't always negative and can prompt useful professional discussions about judges' competencies.

Misfit figures for judges are typically standardized with a mean of 0 and a standard deviation of 1. Judges whose misfit figures are more than two standard deviations above the mean (shown by the dashed red line in the report below) are considered to be performing at odds with the other judges and might warrant further investigation. This could suggest that they are judging a slightly different construct, or that their judging behavior is somewhat erratic. In the diagram below we can see one judge (marked by the orange dot) who falls into this category.
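
The arithmetic behind this flagging can be sketched in a few lines. The snippet below is a minimal illustration only, not RM Compare's actual implementation: it assumes a simple Rasch-style pairwise model, hypothetical judge and item names, and an outfit-style mean-square statistic that is then standardized across judges so the two-standard-deviation rule can be applied.

```python
import numpy as np

def judge_misfit(decisions, theta):
    """Standardized misfit per judge under a simple Rasch-style pairwise model.

    decisions: iterable of (judge, winner, loser) tuples.
    theta: dict mapping item id -> estimated quality parameter (logits).
    Returns {judge: z-score}, standardized across judges to mean 0, SD 1.
    """
    sq_resid = {}
    for judge, winner, loser in decisions:
        # Model-expected probability that the chosen winner beats the loser.
        p = 1.0 / (1.0 + np.exp(theta[loser] - theta[winner]))
        # Squared standardized residual of the observed outcome (always 1 here):
        # large when the judge picked a winner the model thought unlikely.
        sq_resid.setdefault(judge, []).append((1.0 - p) ** 2 / (p * (1.0 - p)))
    # Outfit-style mean square per judge, then z-scored across judges.
    raw = {j: float(np.mean(r)) for j, r in sq_resid.items()}
    vals = np.array(list(raw.values()))
    return {j: float((m - vals.mean()) / vals.std()) for j, m in raw.items()}

# Hypothetical data: judge J3 keeps picking against the model's expectation.
theta = {"A": 1.5, "B": 0.5, "C": -0.5, "D": -1.5}
decisions = [("J1", "A", "C"), ("J1", "B", "D"), ("J2", "A", "D"),
             ("J2", "B", "C"), ("J3", "D", "A"), ("J3", "C", "B")]
misfit = judge_misfit(decisions, theta)
print(sorted(misfit.items(), key=lambda kv: -kv[1]))  # J3 tops the list
flagged = [j for j, z in misfit.items() if z > 2.0]   # the two-SD rule of thumb
```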

In RM Compare we recommend paying particularly close attention to any judge above the Critical Misfit line (marked in solid red), as this strongly suggests a significant lack of consensus with the other judges.

RM Compare report showing judge misfit

Item Misfit

Item misfit refers to items that consistently attract unexpected decisions from judges. This could suggest that the item is problematic in some way, such as being unclear or ambiguous, or not aligning well with the assessment criteria. Like judge misfit, item misfit can be identified by examining the extent to which the decisions about an item deviate from the model's expectations (see 'Digging deeper' below).
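
The same residual arithmetic can be pooled per item instead of per judge. Continuing the hypothetical sketch above (again a toy Rasch-style model, not RM Compare's internals), each surprising decision counts as evidence of misfit for both items involved in the comparison:

```python
def item_misfit(decisions, theta):
    """Item-level analogue of judge_misfit from the sketch above: squared
    standardized residuals are pooled over every comparison an item
    appears in, then z-scored across items."""
    sq_resid = {}
    for _judge, winner, loser in decisions:
        p = 1.0 / (1.0 + np.exp(theta[loser] - theta[winner]))
        r2 = (1.0 - p) ** 2 / (p * (1.0 - p))
        # An unexpected decision counts against both items in the pair.
        sq_resid.setdefault(winner, []).append(r2)
        sq_resid.setdefault(loser, []).append(r2)
    raw = {i: float(np.mean(r)) for i, r in sq_resid.items()}
    vals = np.array(list(raw.values()))
    return {i: float((m - vals.mean()) / vals.std()) for i, m in raw.items()}

print(item_misfit(decisions, theta))  # A and D stand out in this toy data
```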

In the diagram below we can see an Item Misfit Report from an RM Compare session. A few items marked in yellow might be worth looking at more closely; however, no items in this case sit above the Critical Misfit line.

You may have spotted the two green dots mirroring each other at either end of the scale. Why might this be? More on this in a later post!

RM Compare report showing item misfit

Digging deeper

With RM Compare reporting we can carefully interrogate the performance of items and judges throughout the session. In the example below we have filtered on Teacher 40, as they were shown to be misfitting in the earlier report. We can see, for example, every one of this judge's decisions.

We can gain further insights if judges have been asked to provide Judgement Feedback or Item Feedback. As a reminder, all data can be exported from RM Compare sessions for even deeper analysis and interrogation.
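
RM Compare's export schema isn't documented in this post, so the snippet below is purely illustrative: it assumes a hypothetical CSV export with one row per decision and judge, won, lost, and feedback columns.

```python
import pandas as pd

# Hypothetical export: one row per decision (columns: judge, won, lost, feedback).
df = pd.read_csv("session_export.csv")

# Pull out every decision made by the judge flagged in the misfit report.
teacher_40 = df[df["judge"] == "Teacher 40"]

# Reading the optional Judgement Feedback alongside the decisions themselves
# can help explain why this judge disagrees with the consensus.
print(teacher_40[["won", "lost", "feedback"]])
```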

An RM Compare Report showing Judge Reporting

Sources

  1. Critiquing the rationales for using comparative judgement: a call for clarity (Kelly, Richardson & Isaacs, 2022)
  2. A comparative judgement approach to teacher assessment (McMahon & Jones, 2015)
  3. How do judges in Comparative Judgement exercises make their judgements? (Leech & Chambers, 2022)
  4. Validity of comparative judgement to assess academic writing (van Daal, Lesterhuis, Coertjens & Donche, 2016)