How the concept of Misfit maintains the integrity of the comparative judgement process
RM Compare - setting new standards
In the context of comparative judgement, the term "misfit" refers to a judge or an item that consistently deviates from the expectations of the underlying statistical model, such as the Rasch model used by RM Compare. Misfit can occur in two main contexts: judge misfit and item misfit.
It's important to note that misfit doesn't necessarily indicate a problem; it could simply reflect a different perspective or interpretation. However, significant misfit can affect the reliability of the comparative judgement process and may need to be addressed.
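To make this concrete, under a Rasch-style paired-comparison model each item has an estimated quality parameter, and every decision can be compared with the probability the model assigned to it. The sketch below is illustrative only (it is not RM Compare's implementation); the function names and parameter values are assumptions.

```python
import math

def expected_win_probability(theta_a: float, theta_b: float) -> float:
    """Rasch-style paired comparison: probability that item A beats item B,
    given their estimated quality parameters (thetas)."""
    return 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))

def standardized_residual(a_won: int, theta_a: float, theta_b: float) -> float:
    """How surprising a single decision was. a_won is 1 if A won, 0 otherwise.
    Values near 0 fit the model; large absolute values are unexpected."""
    p = expected_win_probability(theta_a, theta_b)
    return (a_won - p) / math.sqrt(p * (1.0 - p))

# Two items of similar estimated quality: either outcome is unsurprising.
print(standardized_residual(1, theta_a=0.2, theta_b=0.1))   # ~0.95
# A clearly weaker item beating a much stronger one is highly unexpected.
print(standardized_residual(1, theta_a=-2.0, theta_b=2.0))  # ~7.39
```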
Judge Misfit
A judge is considered a misfit when their decisions consistently disagree with the decisions of other judges, particularly when comparing items of similar quality or difficulty. This could indicate that the judge is interpreting the assessment criteria differently, or that they have a different understanding of the quality standards being used. A high misfit score means that the judge's decisions are significantly different from those of other judges. This isn't always negative: it can lead to useful professional discussions about judges' competencies.
Misfit figures for judges are typically standardized with a mean of 0 and a standard deviation of 1. Judges whose misfit figures are more than two standard deviations above the mean (the dashed red line in the report below) are considered to be performing at odds with the other judges and might warrant further investigation. This could suggest that they are judging a slightly different construct, or that their judging behavior is somewhat erratic. In the diagram below we can see one judge (marked by the orange dot) who falls into this category.
In RM Compare we recommend paying particularly close attention to any judge above the Critical Misfit line (marked in solid red), as this is strong evidence of a significant lack of consensus.
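As a rough sketch of how this flagging might work: square each judge's standardized residuals, average them into a mean-square fit statistic, then standardize across the judging pool so the figures have mean 0 and standard deviation 1. The aggregation method and the Critical Misfit threshold below are assumptions for illustration, not RM Compare's exact calculation.

```python
import statistics
from collections import defaultdict

def judge_misfit(decisions):
    """decisions: iterable of (judge_id, standardized_residual) pairs.
    Returns each judge's mean-square fit, standardized across the pool.
    Only meaningful with a reasonably large pool of judges."""
    squares = defaultdict(list)
    for judge_id, z in decisions:
        squares[judge_id].append(z * z)
    mean_squares = {j: statistics.mean(zs) for j, zs in squares.items()}
    mu = statistics.mean(mean_squares.values())
    sigma = statistics.stdev(mean_squares.values())
    return {j: (ms - mu) / sigma for j, ms in mean_squares.items()}

def flag_judges(misfit, warning=2.0, critical=3.0):
    """Flag judges above the 2-SD warning line; the critical value here
    stands in for the Critical Misfit line (assumed for illustration)."""
    return {j: ("critical" if m > critical else "warning")
            for j, m in misfit.items() if m > warning}
```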
Item Misfit
Item misfit refers to items that consistently lead to unexpected decisions by judges. This could suggest that the item is problematic in some way, such as being unclear or ambiguous, or not aligning well with the assessment criteria. Like judge misfit, item misfit can be identified by examining the extent to which the decisions about an item deviate from the model's expectations (see 'Digging Deeper' below).
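The same residuals can be aggregated per item rather than per judge. A minimal sketch, assuming each decision record carries the ids of both items compared along with its standardized residual:

```python
import statistics
from collections import defaultdict

def item_misfit(decisions):
    """decisions: iterable of (winner_id, loser_id, standardized_residual)
    tuples. Every decision contributes to the fit of both items involved."""
    squares = defaultdict(list)
    for winner, loser, z in decisions:
        squares[winner].append(z * z)
        squares[loser].append(z * z)
    # Mean-square fit per item; standardize across items as for judges.
    return {item: statistics.mean(zs) for item, zs in squares.items()}
```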
In the diagram below we can see an Item Misfit Report from an RM Compare session. A few items marked in yellow might be worth looking at more closely; however, in this case there are no items above the Critical Misfit line.
You may have spotted the two green dots mirroring each other at either end of the scale. Why might this be? More on this in a later post!
Digging deeper
With RM Compare reporting we can carefully interrogate the performance of Items and Judges throughout the session. In the example below we have filtered on Teacher 40, who was shown to be misfitting in the earlier report. We can see, for example, every one of this judge's decisions.
We can gain further insights if we have asked Judges to provide Judgement Feedback or Item Feedback. A reminder, too, that all data is exportable from RM Compare sessions for even deeper analysis and interrogation.
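For example, an exported session file could be interrogated with a few lines of pandas. The file and column names below are assumptions for illustration, not RM Compare's actual export schema:

```python
import pandas as pd

# Load a session export (illustrative file name).
decisions = pd.read_csv("session_export.csv")

# Pull out every decision made by the judge flagged in the misfit report.
teacher_40 = decisions[decisions["judge"] == "Teacher 40"]

# Compare their agreement with the model against the rest of the pool,
# assuming a boolean column indicating whether each decision matched
# the model's expected winner.
print(teacher_40["agreed_with_model"].mean())
print(decisions["agreed_with_model"].mean())
```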