Reading RM Compare Reports: A Practical, 5-Part Blog Series

By Mark House

27th Nov 2025

If you have ever opened an RM Compare standard report, seen a lot of graphs and a reliability number, and thought “Is this good? What do I do now?”, this series is for you. Many users invest time and energy in running sessions, only to feel deflated when reliability looks lower than they hoped or when misfit graphs appear to suggest something has gone wrong. The aim of this series is to turn those moments of frustration into confident, informed decisions about how to run and improve your sessions.

Why Focus on “Session Health”?

RM Compare is built on adaptive comparative judgement, which uses many pairwise comparisons to estimate a shared scale of quality rather than assigning traditional marks. The standard report offers three powerful lenses on what happened in your session: the rank order of items, judge misfit, and item misfit, along with reliability statistics that summarise how consistently the model can distinguish between pieces of work. Taken together, these allow you to judge the overall “health” of a session: whether the story it tells about quality is strong enough for your purpose, and where you might want to refine design or practice next time.
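For readers who like to see the idea in code, here is a minimal sketch of how pairwise judgements can be turned into a shared scale. It assumes a simple Bradley-Terry (Rasch-style) model fitted by gradient ascent. It is not RM Compare's actual implementation, it leaves out the adaptive choice of pairs, and the function name `fit_bradley_terry` and the example data are hypothetical.

```python
import math


def fit_bradley_terry(n_items, judgements, iterations=200, lr=0.1, ridge=0.01):
    """Estimate one quality parameter per item from (winner, loser) pairs.

    Illustrative only: plain gradient ascent on the Bradley-Terry
    log-likelihood, with a small ridge penalty (an assumption of this
    sketch) so that an item that wins every comparison stays finite.
    """
    theta = [0.0] * n_items
    for _ in range(iterations):
        # Start each gradient from the ridge term, which pulls values toward 0.
        grad = [-ridge * t for t in theta]
        for winner, loser in judgements:
            # Model probability that the winner would beat the loser.
            p_win = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            # A surprising result (low p_win) moves the estimates more.
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        theta = [t + lr * g for t, g in zip(theta, grad)]
    # Centre the scale on zero; only differences between items matter.
    mean = sum(theta) / n_items
    return [t - mean for t in theta]


# Hypothetical session: three items, eight judgements as (winner, loser).
judgements = [(0, 1), (0, 1), (1, 0), (1, 2), (1, 2), (2, 1), (0, 2), (0, 2)]
theta = fit_bradley_terry(3, judgements)

# Rank order, strongest item first.
rank_order = sorted(range(3), key=lambda i: theta[i], reverse=True)
print(rank_order, [round(t, 2) for t in theta])
```

The point of the sketch is that no judge ever assigns a mark. The scale emerges from who beat whom, and the gaps between parameter values reflect how consistently one item was preferred over another.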

Thinking in terms of health is more helpful than pass/fail. A session with moderate reliability can still provide valuable insight and rich professional discussion if you understand where judges differed and which items were complex. Equally, a high‑reliability session might still raise questions if, for example, a few judges did most of the work or certain items proved especially contentious. This series will show you how to go beyond a single headline number and read the full story your report is telling.

Who This Series Is For

This series is designed for everyday RM Compare users rather than psychometric specialists. It will be most relevant if you are, for example:

A teacher or department lead running classroom or departmental ACJ sessions.
A school or trust assessment lead using RM Compare for moderation or standard setting.
A quality assurance or assessment professional using RM Compare as part of a wider assessment strategy.
A researcher using RM Compare data and wanting a clearer practical sense of what the reports are saying.

The posts assume you know how to set up and run a session in RM Compare, but not that you are comfortable interpreting all the graphs and statistics. Where technical ideas are needed (for example, why a Rasch‑style model is used), they will be explained in plain language and linked directly to what you see on screen.

What the Series Will Cover

The series is organised around the three main elements of the standard report, followed by a design‑focused finale:

Part 1 – Understanding RM Compare Reports: Building Assessment Health, Not Chasing a Magic Number
Introduces the idea of session health, explains what rank order, judge misfit, item misfit and reliability represent at a high level, and reframes reports as diagnostic tools rather than verdicts.
Part 2 – Reading the Rank Order: What Story Is Your Session Telling?
Shows you how to read the rank order graph, interpret parameter values and reliability, and decide whether the emerging story about quality is strong enough for your context. This includes guidance on what different reliability ranges usually mean in practice and how reliability grows as judgements accumulate.
Part 3 – Judge Misfit: Making Sense of Agreement and Difference
Explains what judge misfit actually measures, why some misfit is expected and often useful, and how to respond when a judge is flagged. You will see how to use misfit data to support calibration, professional dialogue and judge support, instead of reading it as a simple “good/bad” label.
Part 4 – Item Misfit: Listening to What the Work Is Telling You
Unpacks item misfit as feedback on both the work and the task. You will learn to spot items that are genuinely complex, ambiguous or boundary‑pushing, and how to use them to refine tasks, identify exemplars and deepen shared understanding of what good looks like.
Part 5 – Designing Healthy Sessions from the Start
Brings everything together into practical design principles: choosing item numbers, recruiting and briefing judges, setting expectations for reliability, and using real‑time indicators to decide when to continue or stop. This part will also include a simple checklist you can use when planning future sessions.

Each part will include concrete examples, screenshots or sketches of key graphs, and “try this now” suggestions so you can apply the ideas immediately to your own reports.

How to Use the Series

You can dip into individual posts as needed, but there are two recommended routes:

If you are relatively new to RM Compare reporting: Start at Part 1 and read in order. This will give you a coherent mental model before you zoom into judge and item misfit.
If you are already running regular sessions: Scan Part 1 for the “session health” framing, then jump straight to the area where you feel least confident – often judge misfit or item misfit – before coming back to the design ideas in Part 5.

In all cases, the goal is not to turn you into a statistician. It is to help you look at a report and say, with confidence: “I understand what happened in this session, I know whether it is healthy enough for my purpose, and I know what I’d tweak next time.”
