Race to Understanding: Your Item’s “Official Rating” in RM Compare

Ever wanted a single, fair number to sum up how good something is? RM Compare gives each item a Parameter Value. For this blog, think of it as the item’s Official Rating, just like a racehorse’s rating at the track.

Horse Racing and RM Compare: “Official Ratings” Explained

At the racetrack, every horse has an “official rating”—a number built from many results on the turf. The more races a horse runs, the more confident we become in that rating. The gap between two horses’ ratings helps us predict outcomes: big gaps mean obvious results, but we don’t learn much from those.

RM Compare is similar. Every essay or project gets its “official rating” (the parameter value) from repeated paired judgements—mini head-to-head “races.” The more comparisons, the more robust (reliable) the rating, and the more insight the rank order provides.

Calculating Probabilities and Learning from Results

Official ratings do more than rank; they allow us to calculate the probability of one entry beating another.

  • Horse Racing: Bookmakers use official ratings to set the odds. If Horse A is rated much higher than Horse B, Horse A’s probability of winning is much greater. If the ratings are nearly the same, the outcome is close to a coin toss, about 50/50.
  • RM Compare: Parameter values work the same way. Two items with similar parameter values have close to a 50/50 chance of winning a head-to-head decision. A bigger gap means a higher probability that the top-rated item will be picked (see the sketch after this list).
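
On the logit scale described in the glossary, this rating-gap-to-probability relationship is the standard logistic (Bradley-Terry style) curve. Here is a minimal sketch, assuming parameter values are expressed in logits; the function name and example numbers are illustrative, not RM Compare’s actual code:

```python
import math

def win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that item A beats item B under a logistic
    (Bradley-Terry style) model, with ratings on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))

print(win_probability(0.5, 0.5))  # equal ratings: 0.5, a coin toss
print(win_probability(1.5, 0.5))  # one-logit gap: ~0.73 for the stronger item
```

Note that only the gap between the two ratings matters, exactly as the glossary entry for “logit” says.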

When predictions are correct:
If the predicted winner takes the contest, little changes—the system already “expected” that outcome. Both official ratings (parameter values) are only slightly updated.

When predictions are wrong:
When the underdog wins—or, in RM Compare, the “weaker” item is chosen—this matters much more! The system uses this surprise to adjust ratings more significantly. Upsets ensure that the model keeps learning and doesn’t get stuck, reflecting new trends or unexpected improvements. Over time, both systems—racing and RM Compare—become “smarter”, better capturing true quality as more evidence comes in.
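
RM Compare’s exact update rule isn’t given here, so as an illustration only, here is an Elo-style sketch in which the size of each adjustment is proportional to the surprise, i.e. the gap between the actual result and the predicted probability:

```python
import math

def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 0.1):
    """Elo-style illustration: both ratings shift in proportion to the
    surprise (actual result minus predicted win probability)."""
    p_a = 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))
    surprise = (1.0 if a_won else 0.0) - p_a
    return rating_a + k * surprise, rating_b - k * surprise

print(update_ratings(2.0, 0.0, a_won=True))   # expected result: a tiny nudge
print(update_ratings(2.0, 0.0, a_won=False))  # upset: a much larger correction
```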

How Reliability and Confidence Increase with More "Races"

Think of the start of a racing season: odds are uncertain. After a season’s worth of races, you know which ratings and odds you can trust. The same is true in RM Compare. With just a handful of pairwise judgements, certainty is low and estimates jump around. But as more data comes in, parameter values settle, reliability increases, and confidence in the results grows. In other words, things become more stable (no pun intended!).
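
A toy simulation (not RM Compare’s estimator) makes the point: re-estimating a known one-logit gap from n simulated judgements, the spread of the estimates narrows as n grows:

```python
import math
import random
import statistics

random.seed(0)

def estimate_gap(true_gap: float, n_judgements: int) -> float:
    """Re-estimate a known rating gap from n simulated head-to-head
    results by inverting the logistic model (a deliberately crude estimator)."""
    p_win = 1.0 / (1.0 + math.exp(-true_gap))
    wins = sum(random.random() < p_win for _ in range(n_judgements))
    share = min(max(wins / n_judgements, 0.05), 0.95)  # clamp away from 0 and 1
    return math.log(share / (1.0 - share))             # logit of the win share

for n in (3, 7, 50):
    spread = statistics.stdev(estimate_gap(1.0, n) for _ in range(1000))
    print(f"{n:>2} judgements per pair: estimate spread ~ {spread:.2f} logits")
```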

In the image below, we can see how confidence grows as more judgements are made, in this case from approximately 3 judgements per item (Round 3) to 7 judgements per item (Round 7).

Accelerating Learning in Uncertain Worlds

Both horse racing and adaptive comparative judgement through RM Compare thrive in complex environments—full of uncertainty and endless possible matchups. Both use smart pairings and rapid evidence-gathering to reach trustworthy answers faster. Rather than trying to know everything at the start, both systems learn and adapt, using every informative contest to get sharper, faster.

RM Compare uses AI to handle these high levels of complexity and to accelerate learning. It is important to understand, however, that it is always humans who make the judgements.
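
One way to sketch the “smart pairing” idea: under the logistic model, the expected (Fisher) information from a single judgement is p × (1 − p), which peaks when two items are evenly matched. This is a generic adaptive strategy, not RM Compare’s published algorithm, and the item names and ratings below are invented:

```python
import math
from itertools import combinations

def pair_information(theta_a: float, theta_b: float) -> float:
    """Expected (Fisher) information from one judgement under the
    logistic model: p * (1 - p), maximal for an even matchup."""
    p = 1.0 / (1.0 + math.exp(-(theta_a - theta_b)))
    return p * (1.0 - p)

# Hypothetical items with logit ratings, invented for this sketch.
ratings = {"essay_1": 1.8, "essay_2": 1.1, "essay_3": -0.4}
next_pair = max(combinations(ratings, 2),
                key=lambda pair: pair_information(ratings[pair[0]], ratings[pair[1]]))
print(next_pair)  # ('essay_1', 'essay_2'): the closest-rated pair is most informative
```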

A Simple Story: Why Pair Gaps Matter

Imagine three horses/items:

  • Uncle Dick (rating 110)
  • Sunny Lad (rating 103)
  • Tonto Boy (rating 90)

If Uncle Dick races Tonto Boy, the result is all but certain and we learn little; if Uncle Dick races Sunny Lad, it’s a valuable contest; and the more Sunny Lad and Tonto Boy race, the more we learn about both. Early surprises move ratings a lot. Later, as evidence stacks up, ratings become stable and fair.
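
Putting illustrative numbers on the story, and treating a 10-point rating gap as roughly one logit (that conversion factor is purely an assumption for this example):

```python
import math

def win_probability(points_gap: float, points_per_logit: float = 10.0) -> float:
    """Win probability for the higher-rated entry, converting a points
    gap to logits first (the conversion factor is illustrative only)."""
    return 1.0 / (1.0 + math.exp(-points_gap / points_per_logit))

print(round(win_probability(110 - 90), 2))   # Uncle Dick v Tonto Boy: 0.88, predictable
print(round(win_probability(110 - 103), 2))  # Uncle Dick v Sunny Lad: 0.67, informative
print(round(win_probability(103 - 90), 2))   # Sunny Lad v Tonto Boy: 0.79
```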

Analogy Table: Horse Racing vs. RM Compare

Horse Racing ('Official Rating') RM Compare ('Parameter Value')
Built from season of race results Built from paired judgement outcomes
Ratings/odds improve over time Confidence increases as comparisons accrue
Big gap = predictable, little learned Big gap = predictable, little learned
Close contest = maximum learning Close contest = maximum learning
Probability predicts winner Probability predicts winner
Updates if prediction is overturned Updates if prediction is overturned
Used to rank, handicap, predict Used to rank, report, and support decision

Glossary

Official Rating: A single number summarising a horse’s ability, based on many race results.

Parameter Value: In RM Compare, the “official rating” for an item, based on paired judgements.

Pair Gap: The difference between two ratings. Moderate gaps teach us the most.

Comparative Judgement: Making many paired decisions to build up a ranking.

Information Theory: The study of how much each result reduces uncertainty; close contests carry the most information.

Logit: The scale used for RM Compare’s ratings; all that matters is the gap between the numbers—not their precise values.

Probability: How likely one entry is to win (“beat” another); calculated from the difference in their ratings.

Takeaway

Whether at the track or in RM Compare, the secret to fair, meaningful, and confident ranking lies in pairing, learning from every contest, and letting every result—especially the surprises—make the outcomes smarter and more reliable over time.
