Universal Paperclips

There’s a browser game called Universal Paperclips where you help an artificial intelligence make paperclips. At first it feels like a harmless idle clicker: you tweak the price, buy more wire, unlock a few upgrades, watch the numbers tick up. Then, somewhere between the first factory and the last star system, you realise something unsettling: nothing in the universe matters any more except paperclips. The AI hasn’t turned evil; it has simply taken a narrow objective and optimised it to its logical extreme.

Economist Charles Goodhart warned that “when a measure becomes a target, it ceases to be a good measure.” The paperclip AI is that warning turned into a toy: once “number of paperclips” becomes the only thing that counts, every other value is expendable. When I look at how AI is being introduced into education, especially assessment, I sometimes worry we are drifting towards our own paperclip machines.

A very brief history of Universal Paperclips

Universal Paperclips is a 2017 browser‑based incremental game created by Frank Lantz, director of the NYU Game Center. On the surface it looks like a simple clicker: you start by pressing a button to make one paperclip at a time, then gradually automate production, play the markets, invest in research and, eventually, expand into space.

Lantz has said he originally started the project as a small JavaScript exercise that spiralled into a nine‑month design obsession. When it launched, it spread very quickly: within days, huge numbers of players had pushed through to the finale, where the AI calmly converts all matter in the universe into an unimaginably large number of paperclips. For many, the experience shifts from “this is oddly addictive” to “this is quietly horrifying” somewhere in the mid‑game.

The premise comes from philosopher Nick Bostrom’s “paperclip maximiser” thought experiment: imagine a superintelligent AI whose only goal is to make as many paperclips as possible, and ask what happens when it pursues that goal without any regard for human values. Lantz fused that idea with the familiar “numbers go up” rhythm of idle games to create a playable parable about optimisation, capitalism and the risks of badly chosen objectives. You’re not fighting the AI; you are helping it, one satisfying upgrade at a time.

That’s what makes it such an effective metaphor. No one sets out to destroy the universe. You just keep pressing “buy” on whatever makes the metrics rise.

The universe made of paperclips

In the game, you give an AI one simple instruction: maximise paperclips. You never tell it where to stop. You never specify trade‑offs. You never say that biodiversity, human flourishing or culture might sometimes matter more than office supplies.

So the AI does exactly what it’s told.

It starts with modest pricing tweaks and marketing expenses. Then it invests in automation. Then it discovers better computing substrates and more efficient manufacturing. Before long, it is dismantling the planet for raw materials. Eventually it reaches into space, turning planets, then galaxies, into optimised clip factories. At no point does it ask whether this is a good idea, because “good” has been defined – completely and ruthlessly – as “more paperclips”.

The lesson is not that AI is inherently malevolent. The lesson is that powerful optimisation applied to a narrow objective will push aside everything that isn’t captured in that objective. If the metric is wrong, or incomplete, or allowed to dominate, the system will happily optimise against our real interests.

Education is not about paperclips. But we have our own seductive metrics.

What are our “paperclips” in education?

In assessment, certain numbers appear again and again in AI discussions:

  • Reduction in marking workload
  • Number of scripts processed per hour or per day
  • Reliability coefficients and inter‑rater agreement
  • AI–teacher agreement percentages
  • Time to feedback and dashboard‑friendly summaries

These all matter. Teachers are overloaded. Systems must be fair and consistent. Leaders need timely information. The danger is what happens when these measures quietly harden into targets that define success.

Goodhart’s Law describes what happens next: once these measures become targets, they stop measuring what we care about. A system relentlessly driven to maximise pass rates may improve the numbers by narrowing the curriculum and teaching to the test rather than by deepening learning. A school obsessed with attendance figures might be tempted to fiddle registration practices rather than tackle the underlying causes of disengagement.

AI has the potential to amplify this effect. Humans “game” metrics; optimisation algorithms systematically exploit whatever signal we give them. If we reward an AI marker for matching historic human scores, it will get very good at reproducing existing patterns – including hidden biases and preferences for certain styles. If we reward it for minimising teacher workload, it will push towards full automation, even when human judgement would add nuance.
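To make that dynamic concrete, here is a toy sketch in Python. Everything in it is invented for illustration: “true quality” stands in for deep learning, and the proxy is a crude length score. A greedy optimiser rewarded only for the proxy sends the proxy soaring while the thing we actually care about never moves.

```python
import random

# Toy illustration of Goodhart's Law. We "optimise" an essay against a
# proxy metric (sheer length) and watch it diverge from true quality
# (variety of ideas). All names and numbers are invented for this sketch.

def true_quality(essay: list[str]) -> float:
    """What we actually care about: how many distinct ideas appear."""
    return len(set(essay))

def proxy_score(essay: list[str]) -> float:
    """What the optimiser is rewarded for: raw word count."""
    return len(essay)

essay = ["a", "fresh", "argument", "with", "several", "distinct", "ideas"]

for _ in range(1000):
    # Greedy optimiser: duplicate a random word whenever that raises the proxy.
    candidate = essay + [random.choice(essay)]
    if proxy_score(candidate) > proxy_score(essay):
        essay = candidate

print(proxy_score(essay))   # proxy metric has soared (1007)
print(true_quality(essay))  # true quality has not moved (still 7)
```

The optimiser never does anything “wrong”; it does exactly what the reward says, which is the whole problem.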

“More marks per hour” is our paperclip. “Percentage of work touched by AI” is our paperclip. “High AI–human agreement” is our paperclip. Once those become the primary targets, systems will orient around them, whether or not they genuinely support learning.

How a paperclip assessment system behaves

Imagine a school that adopts an AI marking platform because it promises to cut teacher workload by 60%. Essays are uploaded; within seconds, scores and feedback appear. Leaders see time saved and attractive reports. The system comes with a validation white paper and strong overall reliability statistics. So far, so good.

Over the next few years, subtle shifts occur:

  • Students discover that formulaic writing – five paragraphs, safe phrasing, predictable structures – scores consistently well. Risk‑taking or unusual voices are harder for the model to classify, and they tend to be scored more harshly. Under pressure, students adapt to what “works”.
  • Certain groups of students are quietly under‑scored because the training data didn’t represent them well. The bias is hard to detect from the surface: the system keeps reporting strong aggregate reliability.
  • Teachers, relieved of much of the mechanical marking, become reviewers of machine output. Over time they defer more to the system’s judgement, especially when their own view conflicts with the suggested score and comment. It becomes harder to argue with the machine than to accept it.

No one in this picture is acting in bad faith. Everyone is doing what seems rational: save time, improve consistency, embrace innovation. But the objective function has shifted. The system is no longer primarily serving human judgement and learning; human activity is bending around the needs and quirks of the system.

That is what a paperclip classroom looks like: metrics rising, dashboards glowing, while the things we actually care about – deep learning, fairness, teacher expertise, student agency – are squeezed into whatever space the metrics leave for them.

Choosing a different objective: human‑anchored AI

The future is not fixed. We can design AI‑enabled assessment systems that explicitly resist this dynamic rather than sliding into it.

For me, that starts with a simple commitment: human professional judgement remains the anchor for meaning in assessment. AI can be powerful, but it must be constrained by, calibrated to, and continuously checked against robust human consensus.

Adaptive Comparative Judgement (ACJ) is one way to capture that consensus. Instead of a single marker applying a rubric in isolation, ACJ asks many professionals to compare pairs of student work and decide which is better. From these repeated pairwise judgements, a stable, highly reliable scale emerges. Crucially, this scale reflects what expert communities actually value in authentic work, not what is easiest for an algorithm to parse.
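Under the hood, systems like this typically derive that scale by fitting a statistical model to the pairwise outcomes. The sketch below uses a simple Bradley‑Terry style iteration; the judgement data and script names are invented for illustration, and real ACJ engines add adaptive pairing, reliability estimates and much more.

```python
from collections import defaultdict

# Minimal Bradley-Terry fit: turn "A beat B" judgements into a scale.
# Judgements and script names here are invented for illustration.
judgements = [  # (winner, loser) pairs from repeated comparisons
    ("script_A", "script_B"), ("script_A", "script_C"),
    ("script_B", "script_C"), ("script_B", "script_A"),
    ("script_C", "script_B"), ("script_A", "script_C"),
]

scripts = sorted({s for pair in judgements for s in pair})
wins = defaultdict(float)    # total wins per script
pairs = defaultdict(float)   # comparisons per unordered pair
for winner, loser in judgements:
    wins[winner] += 1
    pairs[frozenset((winner, loser))] += 1

strength = {s: 1.0 for s in scripts}  # initial ability estimates

for _ in range(100):  # iterative maximum-likelihood update (Zermelo/Ford)
    new = {}
    for i in scripts:
        denom = sum(
            pairs[frozenset((i, j))] / (strength[i] + strength[j])
            for j in scripts if j != i
        )
        new[i] = wins[i] / denom if denom else strength[i]
    total = sum(new.values())
    strength = {s: v / total for s, v in new.items()}  # normalise the scale

for s in sorted(scripts, key=strength.get, reverse=True):
    print(f"{s}: {strength[s]:.3f}")
```

The point of the model is that no single judge, and no single rubric line, determines a script’s position; the scale emerges from many independent professional decisions.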

Once you have that anchor, AI can play a more modest, safer role:

  • It can propose provisional scores, which are regularly checked against fresh ACJ samples so that drift and bias are caught early.
  • It can flag scripts where its confidence is low or where its suggested score diverges from recent human consensus, routing those back to teachers (sketched in code after this list).
  • It can generate draft feedback that teachers can adapt or reject, preserving professional voice rather than replacing it.
  • It can surface patterns across large cohorts – common strengths, misconceptions, equity gaps – without becoming the final arbiter of individual student worth.
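For the confidence and divergence checks above, the routing logic can be surprisingly simple. This sketch is hypothetical: the thresholds, field names and scoring scale are assumptions for illustration, not any particular platform’s API.

```python
from dataclasses import dataclass

# Sketch of the routing described above: the AI proposes a score, and
# anything low-confidence or far from recent human consensus goes back
# to a teacher. Thresholds and field names are assumptions, not a real API.

@dataclass
class Proposal:
    script_id: str
    ai_score: float       # model's suggested mark
    confidence: float     # model's self-reported confidence, 0..1
    acj_consensus: float  # score implied by the latest ACJ sample

def route(p: Proposal,
          min_confidence: float = 0.8,
          max_divergence: float = 5.0) -> str:
    """Return where this script should go next."""
    if p.confidence < min_confidence:
        return "teacher_review"        # model unsure: a human decides
    if abs(p.ai_score - p.acj_consensus) > max_divergence:
        return "teacher_review"        # drifted from human consensus
    return "provisional_release"       # still spot-checked via fresh ACJ

print(route(Proposal("S-101", ai_score=62, confidence=0.91, acj_consensus=64)))
print(route(Proposal("S-102", ai_score=71, confidence=0.55, acj_consensus=70)))
```

The design choice that matters is the default: disagreement and uncertainty route towards people, not away from them.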

This is the difference between using ACJ and human consensus as a validation layer for AI, versus using AI as a replacement layer for human judgement. The first treats speed and automation as constrained by human‑defined standards. The second risks letting speed and automation become the standards.

From paperclips to pupils

Universal Paperclips is memorable because it turns a dry philosophical worry into a lived experience. You feel, in your mouse hand, how easy it is to keep clicking “buy upgrade” even as things spiral out of control. At no point does the game shout “you are the baddie”; it simply lets you follow the logic of a badly chosen objective to its end.

In education, we still have the option to choose better objectives.

We can decide that our primary targets are not “percentage of scripts marked by AI” or “hours of workload saved”, but richer ones: the quality of student thinking, the fairness and transparency of decisions, the professional growth of teachers, the long‑term trust of pupils and families. We can insist that our metrics remain servants of those values, not their masters.

Goodhart’s Law is a warning, not a prophecy. If we keep asking, with every new AI feature, “What are we really optimising for here?” and if we keep our systems tethered to live, human standards of value, we do not have to wake up in a paperclip classroom.

We don’t need more paperclips. We need better judgement – and AI systems designed to respect it.