Transforming 'True Scores' to scaled scores to facilitate grading

The principles of generating a linear scale from the parameter values to enable the application of values / grades

What's the challenge here?

RM Compare admins will be familiar with the rank order report. Perhaps the most common question we receive is how to use the data in it to apply grades and / or values. This article explains some of the challenges, our current approach, and how you can try it for yourself.

How can we convert true scores to scaled scores?

To understand this process, you must appreciate how values are calculated and the units in which they are reported.

The core of the ACJ system is the parameter value (the 'true score'). The output of the complex calculation / estimation process is a set of parameter values, one per item. To create a rank, we simply sort the parameter values in descending order. You could use these ordinal positions as a crude "score", but the parameter values carry a lot more information than that.
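As a minimal sketch of the ranking step, the snippet below sorts a handful of parameter values in descending order and attaches a rank to each. The item names and values are hypothetical, made up for illustration, and `rank_items` is not part of RM Compare.

```python
# Hypothetical parameter values (logits), one per item. Not real session data.
params = {"script_a": 1.8, "script_b": -0.4, "script_c": 0.6}


def rank_items(param_values):
    """Return (rank, item, value) tuples, highest parameter value first."""
    ordered = sorted(param_values.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, item, value) for rank, (item, value) in enumerate(ordered, start=1)]


for rank, item, value in rank_items(params):
    print(rank, item, value)
```

The rank alone tells you that script_a beat script_c, but not by how much; the gap between the parameter values does.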

The parameter value is expressed in a unit called the "logit". This is just a convention meaning that the numbers are the log of the odds of an item "winning" a comparison. It's a convenient unit because it makes calculating the specific odds between 2 items trivial. All you have to do is look at the difference between the 2 parameter values: that difference is the log of the odds that one item will win over the other, and the odds in turn give you the probability.
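To make the logit arithmetic concrete, here is a small sketch under the standard log-odds interpretation described above. The function names `win_odds` and `win_probability` are illustrative, not RM Compare functions.

```python
import math


def win_odds(param_a, param_b):
    """Odds that item A wins a comparison against item B.

    The difference between two logits is the log of the odds,
    so exponentiating the difference recovers the odds.
    """
    return math.exp(param_a - param_b)


def win_probability(param_a, param_b):
    """Probability that item A wins, derived from the odds: odds / (1 + odds)."""
    odds = win_odds(param_a, param_b)
    return odds / (1.0 + odds)


# Hypothetical example: two items one logit apart.
print(win_odds(1.5, 0.5))         # about 2.72 to 1
print(win_probability(1.5, 0.5))  # about 0.73
```

Two items with the same parameter value have odds of 1 to 1, which is a 50% chance either way.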

Ideally our score should capture the relative differences between values and act as an analogue of quality. However, the raw parameter value is on a logarithmic scale (natural logs). In theory it can range from negative to positive infinity. In practice it's effectively bounded between plus and minus 10. (Values slightly outside this range are possible but not common.)

We don't want the range of our scores to be open-ended. We will usually have a fairly clear concept of the maximum score (100%) and an agreed definition of a minimum score (zero, 1, DNF...). Also, the score is likely to be interpreted on a linear scale (e.g., anyone scoring 80% has performed twice as well as someone only scoring 40%).

Given these requirements, our first pass at creating a score is to take the parameter values and squash them into a fixed range. We can use a sigmoid function for this, and in our case the logistic function makes most sense: 1 / (1 + e^(-x)), where x is the parameter value. This "translates" our parameter values into a range from 0 to 1. Once we have this fixed range, we can then scale it to any values we like.
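The squashing step can be sketched as follows. This is a generic logistic function written in a numerically stable form; it is an illustration of the formula in the text, not RM Compare's own code.

```python
import math


def logistic(x):
    """Map a parameter value (logit) to the open interval (0, 1).

    Equivalent to 1 / (1 + e^(-x)); the two branches avoid overflow
    in math.exp for parameter values of very large magnitude.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical parameter values spanning the practical range of about -10 to +10.
for value in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(value, logistic(value))
```

A parameter value of 0 maps to exactly 0.5, and values near plus or minus 10 land very close to 1 and 0 respectively.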

As you might anticipate, the actual scaling and translating process will be very context specific, so at this stage we're arbitrarily setting the range from 1 to 100 (it is possible to change this to an alternative scale as required).
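One plausible way to stretch a value that has already been squashed into the 0 to 1 range onto the 1 to 100 scale is a plain linear map. The exact mapping RM Compare applies is not spelled out here, so treat `rescale` and its defaults as assumptions.

```python
def rescale(unit_value, low=1.0, high=100.0):
    """Linearly map a value in the range 0..1 onto the range low..high.

    Assumption: a simple linear stretch. The defaults mirror the
    1 to 100 range mentioned in the text.
    """
    if not 0.0 <= unit_value <= 1.0:
        raise ValueError("unit_value must lie between 0 and 1")
    return low + (high - low) * unit_value


print(rescale(0.5))             # midpoint of the 1..100 scale: 50.5
print(rescale(0.5, 0.0, 20.0))  # the same value on a hypothetical 0..20 scale: 10.0
```

Changing `low` and `high` is all that is needed to move to a different scale, which is why the squashing and scaling steps are kept separate.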

How can I try this for myself?

We have added a new column ('Params as a score') to the Item download file available in the reports area. This applies the principles outlined above to automatically generate the linear values, which in turn allows the user to set their own scales and apply grade boundaries.
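Once the 'Params as a score' values have been read out of the download file, applying grade boundaries is a simple lookup. The sketch below assumes the scores are already in a Python list; the boundaries and grade labels are hypothetical and should be replaced with whatever your context requires.

```python
from bisect import bisect_right

# Hypothetical boundaries: the minimum score needed for each grade above the lowest.
BOUNDARIES = [40.0, 55.0, 70.0, 85.0]
GRADES = ["U", "D", "C", "B", "A"]  # one more label than there are boundaries


def grade_for(score):
    """Return the grade whose band contains the score.

    A score exactly on a boundary is awarded the higher grade.
    """
    return GRADES[bisect_right(BOUNDARIES, score)]


# Hypothetical values standing in for the 'Params as a score' column.
scores = [12.3, 40.0, 68.9, 91.5]
print([(score, grade_for(score)) for score in scores])
```

Because the boundaries live in one list, they can be adjusted and the grades regenerated without touching the scores themselves.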

Anything else I should be thinking about here?

While we are excited about the potential that the principles outlined here offer, we recognise that there is still some learning to be done. For example, the 'quality' of the linear data depends on the session data itself. We can see in the examples below how different the graphs look as the session progresses.

Linear data taken from round 16 of an RM Compare session with 132 scripts (Reliability: 0.88 +/- 0.02)

Linear data taken from round 8 of an RM Compare session with 132 scripts (Reliability: 0.83 +/- 0.09)

This is an area we are at an early stage of investigating, and we look forward to working with our research partners to explore and describe it more accurately over time.
