From Steady State to Rulers: how RM Compare is building the future of shared standards

Assessment systems often talk about standards, but too often those standards remain abstract. Teachers, examiners and assessors are expected to align to a wider benchmark, yet in day-to-day practice they usually see only the work directly in front of them: their own class, their own cohort, their own centre. That gap matters. It helps explain why judgements can drift between schools, institutions and organisations even when everyone involved is knowledgeable, careful and acting in good faith.

This is why Richard Kimbell’s 2022 presentation, Sharing and securing learners’ performance standards across schools, remains so relevant. It identifies a problem that still sits at the heart of modern assessment: if people cannot routinely see and work with a shared sample of performance, then “national” or system-level standards remain difficult to interpret and even harder to apply consistently. RM Compare matters today because it has taken the central insight of that work and begun to operationalise it in product form through on-demand rulers.

Kimbell, R. (2022). Sharing and securing learners' performance standards across schools. In K. Burns (Ed.), Research Conference 2022: Reimagining assessment: Proceedings and program. Australian Council for Educational Research. https://doi.org/10.37517/978-1-74286-685-7-6

The assessment problem Kimbell identified

Kimbell’s starting point is simple but powerful. Teachers constantly assess learners in classrooms, often informally and formatively, and their local standards are usually sufficient for that purpose. But as soon as those same teachers are asked to make judgements that are meant to reflect a broader benchmark - across schools, across a region, or nationally - they run into a structural problem: they rarely have access to enough external work to know what that wider standard really looks like.

That means variation between schools is not only a problem of process; it is a problem of visibility. In the paper, Kimbell notes that awarding and regulatory systems often respond with alignment and standardisation processes, but these typically occur after judgements have already been made. They may improve system outcomes, but they do little to help teachers build a living understanding of shared standards as part of normal professional practice.

What Kimbell’s Steady State work adds

Kimbell’s work builds on the success of Adaptive Comparative Judgement (ACJ) as a reliable way to rank complex performances holistically. Earlier ACJ studies showed that judges comparing pairs of portfolios could produce highly reliable rank orders, and teachers found the process educational because it exposed them to a much wider range of work than they would normally encounter. Seeing many examples side by side sharpened their sense of quality and helped them reflect on how their own students’ work related to a broader field of performance.
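To make the mechanics concrete, here is a minimal sketch of the statistical core of such an exercise, assuming a Bradley-Terry model of pairwise outcomes (the model family typically behind ACJ-style engines). Everything here is illustrative rather than RM Compare’s implementation: a real ACJ engine also chooses each next pair adaptively to maximise information, and uses more robust estimation than plain gradient ascent.

```python
import math
from collections import defaultdict

def fit_bradley_terry(judgements, n_iters=200, lr=0.1):
    """Estimate a quality parameter per script from pairwise judgements.

    judgements: list of (winner_id, loser_id) tuples from paired comparisons.
    Returns a dict mapping script id -> estimated position on a logit scale.
    """
    theta = defaultdict(float)  # every script starts at 0 on the scale
    for _ in range(n_iters):
        grad = defaultdict(float)
        for winner, loser in judgements:
            # P(winner beats loser) under the Bradley-Terry model
            p = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for script, g in grad.items():
            theta[script] += lr * g
    return dict(theta)

# Three portfolios, five judge decisions
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
params = fit_bradley_terry(judgements)
print(sorted(params, key=params.get, reverse=True))  # ['A', 'B', 'C']
```

The output to notice is the scale itself: every script ends up with a position relative to every other script, which is exactly what gives judges that wider view of the field.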

The limitation, however, was that traditional ACJ was essentially single-cohort. A school or class could generate a rank, and another school could generate a different rank, but the two were not naturally linked unless they were combined in a single large comparative exercise. This made it hard to create a durable standard that could survive beyond one cohort or one assessment event.

Steady State is Kimbell’s answer to that limitation. The idea is to take an initial multi-school ACJ rank and treat it as a fixed ruler. New work from an additional school is then judged against the work already in that ruler, and only the new work moves until it finds its stable position on the scale. In effect, the rank becomes a reference instrument: a shared scale that can absorb new performances without having to rebuild the whole system from scratch each time.
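Under the same modelling assumptions, Steady State placement reduces to a one-parameter estimation problem: the anchored scripts keep their calibrated values, and only the newcomer moves. A hedged sketch (the function name and shape are hypothetical, not the platform’s API):

```python
import math

def place_on_ruler(ruler, judgements, new_id, n_iters=100, lr=0.2):
    """Locate one new piece of work on a fixed ruler.

    ruler: dict of script id -> calibrated position; these stay frozen.
    judgements: (winner_id, loser_id) pairs, each pitting new_id against
    an anchored script. Only the new script's parameter is updated, so
    the shared standard itself never shifts.
    """
    theta_new = 0.0
    for _ in range(n_iters):
        grad = 0.0
        for winner, loser in judgements:
            if winner == new_id:   # the new work won this comparison
                p = 1.0 / (1.0 + math.exp(ruler[loser] - theta_new))
                grad += 1.0 - p
            else:                  # the new work lost this comparison
                p = 1.0 / (1.0 + math.exp(ruler[winner] - theta_new))
                grad -= p
        theta_new += lr * grad
    return theta_new

ruler = {"weak": -1.2, "solid": 0.1, "strong": 1.4}  # existing multi-school ruler
judged = [("new", "weak"), ("solid", "new"), ("new", "solid"), ("strong", "new")]
print(place_on_ruler(ruler, judged, "new"))  # settles near 'solid'
```

Because the anchors never move, each placement is cheap: a handful of comparisons against the ruler rather than a full re-ranking of everything.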

That is the breakthrough insight in the presentation. It moves comparative judgement from being only a way to produce a result in one session to being a way to maintain and extend a standard over time. Kimbell also makes clear that this is not only a technical advance. It has a pedagogical dimension too: if teachers can routinely judge work from multiple schools against a shared ruler, they can become more confident, more aligned and more articulate about what quality looks like across the scale.

Where RM Compare is now

RM Compare has taken that Steady State concept and translated it into a broader product idea: rulers. In practical terms, rulers are reusable scales derived from expert comparative judgement that can be stored, governed and then applied on demand to fresh work. That matters because it changes the value of comparative judgement. Instead of every assessment being a one-off ranking exercise, a high-quality judgement process can produce an asset that is useful again and again.
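One way to picture a ruler as a stored asset is as a small, governable record: a named scale, the construct it measures, and a set of calibrated anchor exemplars. The shape below is purely illustrative and assumes nothing about RM Compare’s actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ruler:
    """A reusable scale produced by a comparative-judgement exercise.

    Field names here are assumptions for illustration, not product fields.
    """
    name: str
    construct: str        # e.g. "Year 6 narrative writing"
    anchors: dict         # exemplar id -> calibrated position on the scale
    owners: tuple = ()    # who may govern and share this ruler

    def nearest_anchor(self, value):
        """Name the anchored exemplar closest to a newly placed piece."""
        return min(self.anchors, key=lambda a: abs(self.anchors[a] - value))

writing = Ruler(
    name="Trust writing standard 2024",
    construct="Year 6 narrative writing",
    anchors={"weak": -1.2, "solid": 0.1, "strong": 1.4},
    owners=("trust-assessment-lead",),
)
print(writing.nearest_anchor(0.3))  # -> 'solid'
```

The design point is that the asset outlives the exercise that produced it: the same record can be shared, versioned and applied to fresh work long after the original judging session ends.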

This is where the RM Compare ecosystem becomes strategically important. Studio-style activity is used to create high-quality ranks and convert them into rulers; governance patterns in the ecosystem make those rulers available to the right people; and on-demand assessment enables new work to be placed onto trusted scales whenever needed. The product is therefore not just helping users answer “Which is better?” but increasingly helping them answer “Where does this new piece belong on the scale we already trust?”

That is already powerful across multiple contexts. In schools, it supports shared standards for complex outcomes such as writing, oracy and project work. In awarding, it points towards more defensible standard-setting and more coherent use of exemplars and grade boundaries. In higher education, recruitment and training, it creates the possibility of applying previously established standards to new submissions, performances or work samples without having to redesign the whole assessment model every time.

Why this matters beyond assessment operations

The most important implication is that rulers make standards visible and reusable. Once a trusted ruler exists, it can support much more than a single assessment event. It can be used formatively to show where current work sits on a shared scale, at interim points to monitor progress or drift, and summatively to anchor final decisions to a common standard. In that sense, rulers create continuity across assessment types and stages rather than forcing institutions to rely on disconnected instruments.

This has consequences for curriculum as well. If teachers, leaders or system designers can see progression on a real shared ruler, curriculum design can be thought of more clearly as movement along a continuum of performance, not just coverage of content or completion of tasks. And it matters for pedagogy, because comparative judgement and ruler-based approaches help teachers and learners talk more concretely about what stronger performance looks like. They support richer professional conversations and can even create structured opportunities for learners themselves to improve by evaluating work against shared standards.

Why this is only part way to full Steady State

For all that progress, the current position is still only part of the full Steady State vision. Today’s strength lies in using an existing ruler to place new pieces of work onto a trusted scale. That is already a major step forward because it brings continuity across cohorts, centres and repeated use cases. But it still assumes that the ruler is anchored in an original task set or assessment context.

The next major step is to allow new items or tasks to be ingested into an existing ruler. Once that becomes possible, the ruler stops being only a reusable scale for new responses and starts becoming a living domain instrument that can evolve while preserving continuity. New tasks can be introduced, curricula can shift, fresh examples can be brought into the system, and yet performance can still be interpreted against the same underlying standard.
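In estimation terms, that roadmap step looks like anchored calibration: judgements that mix new exemplars with existing anchors position the new items, while the anchors stay frozen so the scale’s meaning is preserved. The sketch below extends the earlier placement idea to several new items at once; the function and its signature are assumptions about how such a step could work, not a description of the product.

```python
import math
from collections import defaultdict

def ingest_items(ruler, judgements, new_ids, n_iters=300, lr=0.1):
    """Calibrate new exemplars onto an existing ruler without moving it.

    Values in `ruler` stay frozen; parameters for `new_ids` are estimated
    jointly from judgements mixing new-vs-anchor and new-vs-new pairs.
    """
    theta = {i: 0.0 for i in new_ids}

    def value(script):  # anchors read from the ruler, newcomers from theta
        return ruler[script] if script in ruler else theta[script]

    for _ in range(n_iters):
        grad = defaultdict(float)
        for winner, loser in judgements:
            p = 1.0 / (1.0 + math.exp(value(loser) - value(winner)))
            if winner in theta:
                grad[winner] += 1.0 - p
            if loser in theta:
                grad[loser] -= 1.0 - p
        for i in new_ids:
            theta[i] += lr * grad[i]
    return {**ruler, **theta}  # extended ruler, original anchors untouched

extended = ingest_items(
    {"weak": -1.2, "solid": 0.1, "strong": 1.4},
    [("task2_ex", "weak"), ("task2_ex", "solid"), ("strong", "task2_ex")],
    new_ids=["task2_ex"],
)
```

Once new exemplars have stable positions they can serve as anchors themselves, which is what would let a ruler evolve with the curriculum while still encoding the same standard.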

That is the deeper promise of the roadmap. It would move RM Compare closer to a true Steady State model in which standards are not rebuilt from scratch, nor frozen in time, but continuously maintained and extended. In schools, that could support evolving local tasks within trust-wide progression frameworks. In awarding, it could support more dynamic standard-setting across exam cycles. In professional and organisational settings, it could link fresh authentic tasks back to stable performance scales that remain meaningful over time.

From research concept to assessment infrastructure

Seen in the round, the story is now quite clear. Kimbell’s presentation identified the problem of inaccessible shared standards and introduced Steady State as a way to build a ruler from comparative judgement and then place new work onto that ruler efficiently over time. RM Compare has taken that central idea and turned it into the language and product model of rulers: reusable, governable standards that can be applied on demand across cohorts, centres and sectors. The next phase is to make those rulers even more powerful by allowing new items to be brought onto the scale, helping the platform move from a strong implementation of the Steady State concept towards a fuller realisation of it.

That is why this is more than a feature story. It is a story about how assessment can shift from isolated events to durable shared standards; from one-off ranking exercises to reusable measurement assets; and from opaque moderation processes to more visible, discussable and governable conceptions of quality. Kimbell’s work gives the intellectual foundation. RM Compare is now building the product and ecosystem that can carry that foundation into everyday assessment practice.