Robots, Roberts and The Dignity of Human Assessment

Robots and assessment

I've been thinking a lot lately about human dignity - the belief that all people hold a special value tied solely to their humanity. Unsurprisingly, I am particularly interested in what this means for educational assessment.

Dignity in assessment

The issues of noise and bias are a constant when judgements and assessments are made. Great effort goes into reducing both so that candidates can be 'fairly' marked and graded, for example through rounds of moderation and standardisation. Indeed, one of the compelling values of RM Compare is that, by involving multiple judges and an adaptive algorithm, it can deliver more efficient, reliable, valid and fair assessments.
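
Under the hood, comparative judgement builds a rank order from many pairwise decisions rather than absolute marks. As a rough illustration only, and not RM Compare's actual implementation, the sketch below fits a simple Bradley-Terry model to a handful of hypothetical pairwise judgements and returns a rank order; the item names, the function name and the fixed iteration count are assumptions made for the example.

# A minimal sketch, NOT RM Compare's actual algorithm: fitting a simple
# Bradley-Terry model to pairwise judgements to recover a rank order.
# Item names and the iteration count are illustrative assumptions.
from collections import defaultdict

def rank_from_judgements(judgements, iterations=100):
    """judgements: list of (winner, loser) pairs from comparative judging."""
    items = {item for pair in judgements for item in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # number of comparisons per pair of items
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}   # start every item as equal
    for _ in range(iterations):
        updated = {}
        for item in items:
            denom = 0.0
            for pair, count in pair_counts.items():
                if item in pair:
                    other = next(x for x in pair if x != item)
                    denom += count / (strength[item] + strength[other])
            updated[item] = wins[item] / denom if denom else strength[item]
        total = sum(updated.values()) or 1.0    # renormalise each round
        strength = {item: s * len(items) / total for item, s in updated.items()}

    return sorted(strength, key=strength.get, reverse=True)

# Hypothetical judgements over three pieces of work, A, B and C
print(rank_from_judgements([("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]))
# -> ['A', 'B', 'C'], most- to least-preferred

In practice the 'adaptive' part lies in choosing which pair each judge sees next so that every comparison adds as much information as possible; that scheduling step is deliberately left out of this sketch.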

Automation of assessment continues to accelerate across industries and contexts. Algorithmic hiring, for example, is now well established in the jobs market, helping busy hiring managers to assess the match between resumes and job descriptions. However, being excluded because you don't meet the rules driving the algorithm, despite having the right qualities for the job, can feel anything but fair.

During the pandemic, several Governments resorted to algorithms to predict the summer grades of millions of students who were unable to sit traditional exams. As expected, most students received accurate results; for a small number, however, the predictions were wildly inaccurate. Again, this was to be expected. What did seem to come as a surprise to Government Ministers was the public outcry, which was so vociferous that assessment policy subsequently moved away from the algorithmic approach and toward teacher assessment.

As we know, teacher assessment is by its very nature riddled with noise and bias, and it produces grade inflation if left unchecked. Yet for all stakeholders this human approach to assessment, with all its frailties and challenges, was seen as 'fairer'. In this case the dignity of human assessment clearly outweighed the accuracy produced by the algorithm.

We see this trade-off in all assessment environments. There comes a point, particularly where the stakes of a judgement are high, where we want a Robert rather than a Robot to make the call. Perhaps the most obvious example is the criminal justice system: as the seriousness of the crime increases, so does the human element of any judgement. This is despite startling evidence of the outrageous unfairness it creates, with wildly different sentences handed down in similar cases by different judges (Austin and Williams, 1977, a survey of 47 judges).

What might this mean for RM Compare?

So, can Adaptive Comparative Judgement pass the 'Dignity' test? We know that users understand the underlying principle and its strength in reducing noise and bias. But we also know that there are concerns around judge anonymity in a 'wisdom of the crowd' approach. Right now we are still very much at the Discover Stage with this and, as always, we rely on our users to help us clarify our thinking.