Reading RM Compare Reports: A Practical, 5 Part Blog Series

If you have ever opened an RM Compare standard report, seen a lot of graphs and a reliability number, and thought “Is this good? What do I do now?”, this series is for you. Many users invest time and energy in running sessions, only to feel deflated when reliability looks lower than they hoped or when misfit graphs appear to suggest something has gone wrong. The aim of this series is to turn those moments of frustration into confident, informed decisions about how to run and improve your sessions.

Why Focus on “Session Health”?

RM Compare is built on adaptive comparative judgement, which uses many pairwise comparisons to estimate a shared scale of quality rather than traditional marks. The standard report offers three powerful lenses on what happened in your session: the rank order of items, judge misfit, and item misfit, along with reliability statistics that summarise how consistently the model can distinguish between pieces of work. Taken together, these allow you to judge the overall “health” of a session. It allows us to understand whether the story it tells about quality is strong enough for your purpose, and where you might want to refine design or practice next time.

Thinking in terms of health is more helpful than pass/fail. A session with moderate reliability can still provide valuable insight and rich professional discussion if you understand where judges differed and which items were complex. Equally, a high‑reliability session might still raise questions if, for example, a few judges did most of the work or certain items proved especially contentious. This series will show you how to go beyond a single headline number and read the full story your report is telling.

Who This Series Is For

This series is designed for everyday RM Compare users rather than psychometric specialists. It will be most relevant if you are for example

  • A teacher or department lead running classroom or departmental ACJ sessions.
  • A school or trust assessment lead using RM Compare for moderation or standard setting.
  • A quality assurance or assessment professional using RM Compare as part of a wider assessment strategy.
  • A researcher using RM Compare data and wanting a clearer practical sense of what the reports are saying.​

The posts assume you know how to set up and run a session in RM Compare, but not that you are comfortable interpreting all the graphs and statistics. Where technical ideas are needed (for example, why a Rasch‑style model is used), they will be explained in plain language and linked directly to what you see on screen.

What the Series Will Cover

The series is organised around the three main elements of the standard report, followed by a design‑focused finale:

Each part will include concrete examples, screenshots or sketches of key graphs, and “try this now” suggestions so you can apply the ideas immediately to your own reports.

How to Use the Series

You can dip into individual posts as needed, but there are two recommended routes:

  • If you are relatively new to RM Compare reporting: Start at Part 1 and read in order. This will give you a coherent mental model before you zoom into judge and item misfit.
  • If you are already running regular sessions: Scan Part 1 for the “session health” framing, then jump straight to the area where you feel least confident – often judge misfit or item misfit – before coming back to the design ideas in Part 5.

In all cases, the goal is not to turn you into a statistician. It is to help you look at a report and say, with confidence: “I understand what happened in this session, I know whether it is healthy enough for my purpose, and I know what I’d tweak next time.”