Resilience Is No Longer Enough: Why Curriculum, Assessment, and AI Need Antifragility

By Mark House

22nd May 2026

For years, we’ve been told that education systems need to be more “resilient”. Resilient curriculum. Resilient assessment. Resilient students. In a world of AI, that sounds reassuring, but it quietly sets the wrong ambition. Resilience, in the way we usually use it, means being able to withstand shocks and stay the same. The problem is that AI is not a one‑off storm. It is a permanent change in the climate.

In Nassim Nicholas Taleb’s terms, resilience is only the middle rung of a ladder. Below it sits fragility: systems that break under stress. Above it sits something more interesting: antifragility. Antifragile things do not simply survive volatility; they gain from it. They are structured in such a way that small shocks, variability, and randomness make them better over time, not worse.

That is the bar we now have to clear in curriculum and assessment design.

The Oak Tree Problem: Strong, Resilient, and Still Fragile

The easiest way to see the difference between resilience and antifragility is to step away from education and look at nature.

An oak tree is a good symbol of resilience. It is sturdy, deep‑rooted, and able to withstand repeated storms. It looks solid and dependable, and for long periods it is. But in Taleb’s language, it is also fragile, particularly to Black Swan events it was never optimised for, such as a new disease, a sudden change in climate, or a fire severe enough to overwhelm its defences. When that happens, the oak has no way to reconfigure itself, explore alternatives, or change strategy. It cannot move. It cannot experiment. It cannot increase its surface area of exposure to upside optionality.

That is uncomfortably close to how much of our curriculum and assessment behaves today.

We design highly specified programmes of study, optimise teaching to those specifications, and reinforce everything with high‑stakes, one‑shot exams and tight rubrics. In good weather, this looks impressively robust. Results are predictable. Accountability data is tidy. But when the environment changes, the same strengths can reveal themselves as structural fragilities.

Generative AI is exactly that kind of environmental change. It will steadily eat into the kinds of tasks and performances we have relied on to certify knowledge and skill. Patching the old system to be more “resilient”, for example through stricter proctoring, better cheat detection, or clever tweaks to question formats, is, at best, oak‑tree thinking. We might survive this round of storms, but we are not becoming any less fragile to the next ones.

The Beehive: Built‑In Exploration and Optionality

If the oak is our cautionary tale, the beehive offers a more hopeful metaphor.

In a honey bee colony, not every bee does the same job in the same way. Some foragers act largely as collectors, returning to known food sources and using social cues – the famous waggle dance – to coordinate exploitation of those sources. Others behave as scouts, flying out into the unknown to search for new patches of nectar and pollen. The colony implicitly “knows” that this mix of exploration and exploitation is vital. If every bee only ever went where the last waggle dance pointed, the hive would be dangerously dependent on a small set of known patches. If every bee only explored, the colony would waste energy and starve.

By maintaining a proportion of bees that range more widely and test new possibilities, the hive continually increases its exposure to potential upside. Most exploratory flights fail quietly. A few find something dramatically better, such as a nearer patch, a richer field, or a newly flowering tree. In those moments, the waggle dance updates the behaviour of many other bees, and the whole colony pivots.

That is what optionality looks like in practice: lots of low‑cost local experiments, small individual downside when they fail, and the possibility of a disproportionate upside when they succeed. It is a pattern much closer to antifragility than the fixed, optimised oak.

The question is: does our curriculum and assessment behave more like an oak or a hive?

When “AI‑Resilient Assessment” Eventually Fails

Right now, much of the sector’s response to AI is focused on making the existing model of assessment “AI‑resilient”. We are seeing three common moves:

Adding layers of surveillance: remote proctoring, webcams, keystroke logging, identity checks.
Applying forensic tools: cheat detection, text analysis, attempts to spot machine‑generated prose.
Tweaking formats and conditions: more in‑person exams, tighter time limits, different question types.

Each of these may patch a specific vulnerability in the short term. They may even be necessary in some contexts. But taken together they amount to a strategy of building higher and higher walls around essentially the same structure.

In Taleb’s terms, they add complexity and hidden fragilities to a system that still has large downside risk (fail one high‑stakes event and doors close) and almost no upside from volatility. They make a fragile system slightly more resistant to yesterday’s attacks, while creating new points of failure of their own. It is finger‑in‑the‑dyke engineering: constantly repairing cracks in a wall that the tide will eventually overtop.

If the underlying logic of assessment remains “one format, one shot, one narrow way of showing what you know”, AI will keep finding ways to help students produce plausible surface‑level responses. We can chase that with detection and policing, but there is no reason to think this arms race will end in our favour.

We need a different ambition.

Why Countries Cannot Afford Oak‑Tree Systems

This isn’t just a school‑level design problem. At a national level, the stakes are economic.

AI is a general‑purpose technology that will reshape productivity, employment, and growth across sectors. The macro‑economics are still uncertain, but one point is already clear in OECD, IMF and other analyses: countries that adapt their human capital systems fastest (building flexible, creative, continuously learning workforces) are likely to capture more of the upside and less of the downside. In other words, education systems that behave like oak trees, optimised for one stable climate, are a macro‑economic liability.

International work on skills shortages is already pointing in the same direction. Across advanced economies, the hardest gaps to fill are not narrow bits of codified knowledge, but complex bundles of skills: active learning, critical thinking, judgement and decision‑making, creativity, collaborative problem‑solving. These are exactly the kinds of capabilities that grow when learners are repeatedly exposed to complex tasks, uncertainty, feedback and iteration, and that wither in systems that train them to play safe for standardised tests.

From this vantage point, the question “Can our curriculum and assessment cope with AI?” becomes “Can our curriculum and assessment create the kind of optionality and antifragile human capital our economy now needs?” A country that continues to pour time and money into making old formats slightly more “AI‑resilient” is locking itself into concave bets: large downside if the exams stop meaning what we claim, and limited upside even if they “work”. A country that redesigns curriculum and assessment around big ideas, exploration and holistic judgement is doing the opposite, increasing its surface area of exposure to upside optionality in the labour market.
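The difference between concave and convex bets comes down to a piece of arithmetic (Jensen's inequality): when a payoff curves upwards, more volatility raises its average; when it curves downwards, more volatility lowers it. The short Python sketch below is illustrative only and is not taken from Taleb or from this article. `convex_payoff` and `concave_payoff` are hypothetical stand-ins for an option-like bet (capped downside) and a capped-upside bet, and shocks are assumed to be normally distributed around zero.

```python
import random
import statistics


def convex_payoff(shock: float) -> float:
    # Option-like bet: losses are capped at zero, gains are not.
    return max(shock, 0.0)


def concave_payoff(shock: float) -> float:
    # The mirror image: gains are capped at zero, losses are not.
    return min(shock, 0.0)


def average_payoff(payoff, spread: float, trials: int = 100_000, seed: int = 0) -> float:
    """Average payoff when shocks are centred on zero with the given spread (std dev)."""
    rng = random.Random(seed)
    return statistics.fmean(payoff(rng.gauss(0.0, spread)) for _ in range(trials))


if __name__ == "__main__":
    for spread in (0.5, 1.0, 2.0):
        convex = average_payoff(convex_payoff, spread)
        concave = average_payoff(concave_payoff, spread)
        print(f"spread={spread}: convex {convex:+.2f}, concave {concave:+.2f}")
```

Because the shocks average out to zero, neither bet gains anything when nothing varies. As the spread grows, the convex bet's average rises (to roughly 0.4 times the spread) while the concave bet's average falls by roughly the same amount. That is the sense in which a system with capped downside and open upside gains from volatility.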

Curriculum as an Engine of Options, Not a Canal

Antifragility in education starts upstream, with curriculum.

The kinds of curriculum frameworks that talk about “big ideas”, enduring understandings, and what matters most (for example, Understanding by Design® or the Curriculum for Wales) already move us in the right direction. They encourage schools to design learning around concepts that can be explained, applied, and transferred, rather than treating curriculum as a march through content.

But the antifragile move is to take this one step further.

An antifragile curriculum:

Exposes students regularly to complex, open‑ended tasks, such as projects, investigations, performances, designs and portfolios, where there isn't a single predictable “right” route.
Encourages multiple ways of demonstrating understanding: different media, structures, and perspectives are not only allowed but valued.
Builds in iteration and reflection: students are expected to try, fail partially, see stronger examples, revise, and try again.
Allows for local experimentation: schools and teachers can adapt tasks, contexts, and combinations of disciplines based on what they learn about their students and communities.

This is curriculum as a generator of options, not a canal. It continually sends some “scouts” (students, teachers, ideas) into unfamiliar problems and domains, because that is how new strengths, interests, and opportunities are discovered.

However, curriculum cannot do this alone. If assessment stays narrow, high‑stakes, and brittle, it will eventually drag curriculum back toward safety and standardisation. That is why the next step matters.

Why Holistic Assessment Is Non‑Negotiable

If we want to move from oak to hive, we have to change how we judge and value student work.

Traditional, rubric‑driven assessment has many strengths. It can clarify expectations, support feedback, and provide a shared language for teachers. But as tasks become more open, multimodal, and conceptually rich, conventional analytic rubrics start to strain. They push us toward fragmenting performances into tick‑box items. They reward what is easy to describe and defend, not always what is most important in the work.

This has two unintended consequences:

Students learn to perform to the rubric rather than fully engage with the idea. Safe, formulaic responses are rewarded; thoughtful risk‑taking and unusual approaches become dangerous.
Systems become resistant to variation. Novel, surprising, or boundary‑pushing work is hard to place. Under pressure, it is easier to mark it down than to revisit the descriptors.

In an AI world, that dynamic becomes even more problematic. Narrow, predictable formats are precisely where generative tools are strongest. If we continue to rely on those formats, and then try to bolt on detection, we are fighting on the most unfavourable ground.

Antifragile assessment needs to do three things differently:

Welcome diverse responses
It must be able to recognise quality across different formats and solution paths, because that is where genuine exploration lives.
Judge holistically at the level of understanding
Instead of slicing performances into ever‑smaller criteria, it must allow professional judgement to operate “in the round”, asking whether the work truly realises the intended understanding or quality.
Work iteratively
It must support cycles of comparison, feedback, and revision so that volatility in student performance becomes a source of learning, not just a mark against them.

This is exactly the level at which Adaptive Comparative Judgement operates.

ACJ and RM Compare: From Concept to Infrastructure

Adaptive Comparative Judgement (ACJ) is a way of assessing complex work that starts with a very human question: given two pieces of student work, which better demonstrates the intended quality or understanding?

Instead of asking teachers to attach absolute scores to isolated artefacts, ACJ asks them to make a series of pairwise comparisons. Each judgement is holistic and contextual. Over many comparisons, a reliable rank order emerges. The result is not just a mark; it is an ordered set of real responses that reveal what stronger and weaker work look like across the cohort.
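To make "a rank order emerges" concrete: comparative judgement tools usually turn the pile of pairwise decisions into a score per script with a statistical model of paired comparisons. The Bradley-Terry model (closely related to the Rasch formulation used in early ACJ work) is a common choice. This article does not say which model RM Compare uses, so treat the Python sketch below as a generic illustration, not RM Compare's implementation. The function name `fit_bradley_terry`, the plain gradient-ascent fitting, the `ridge` penalty (which stops a script that wins every comparison from drifting to an infinite score) and the example judgements are all assumptions.

```python
import math
from collections import Counter


def fit_bradley_terry(judgements, iterations=2000, ridge=0.01):
    """Estimate a quality score for each script from (winner, loser) pairs.

    Fits a Bradley-Terry model, P(a beats b) = 1 / (1 + exp(score_b - score_a)),
    by plain gradient ascent on the log-likelihood. A small ridge penalty keeps
    scripts that win (or lose) every comparison at a finite score.
    """
    if not judgements:
        return {}
    scripts = sorted({script for pair in judgements for script in pair})
    scores = {s: 0.0 for s in scripts}
    # Step size scaled by the most-compared script so the updates stay stable.
    appearances = Counter(script for pair in judgements for script in pair)
    step = 1.0 / max(appearances.values())
    for _ in range(iterations):
        gradient = {s: -ridge * scores[s] for s in scripts}
        for winner, loser in judgements:
            # The model's current probability of the outcome the judge chose.
            p_observed = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            gradient[winner] += 1.0 - p_observed
            gradient[loser] -= 1.0 - p_observed
        for s in scripts:
            scores[s] += step * gradient[s]
    return scores


# Hypothetical judgements on four scripts: each tuple is (preferred, other).
example_judgements = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"),
    ("C", "D"), ("C", "D"), ("D", "C"),
]

if __name__ == "__main__":
    scores = fit_bradley_terry(example_judgements)
    for script in sorted(scores, key=scores.get, reverse=True):
        print(script, round(scores[script], 2))
```

Scripts preferred more often, especially against strong opposition, end up higher, and sorting by score gives the rank order. The sketch leaves out things a real system would need, such as a reliability estimate and checks on how consistently each judge decides.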

RM Compare takes this idea and makes it usable at scale.

By orchestrating many judgements across multiple judges, and adapting which pairs are shown based on previous decisions, RM Compare allows schools and systems to:

Assess rich, open‑ended tasks fairly
Essays, presentations, prototypes, performances, portfolios in different media can all be judged on the same scale, because what matters is the quality of understanding, not the format.
Respect professional expertise
Teachers make the kinds of comparative, holistic judgements they are naturally good at, rather than forcing their insights through ever finer rubric descriptors.
Turn moderation into learning
Because judgements are distributed, teachers see a wide range of work. Conversations about “why this one is stronger” surface tacit knowledge about quality and help build shared standards.
Generate exemplars and trajectories
The ordered set of responses becomes a powerful library for planning, feedback, and student self‑assessment. When used across drafts, it can also show how individual pieces move up the scale as students respond to feedback and exemplars.
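The "adaptive" part of ACJ is the choice of which two pieces a judge sees next. The article does not describe RM Compare's pairing rules, so the Python sketch below is a hypothetical heuristic in the spirit of ACJ's core idea of pairing work whose current estimates are close. Such a comparison is the least predictable, so it adds the most information. The function `next_pair` and its tie-break on how often each script has already been judged are assumptions made for illustration.

```python
from itertools import combinations


def next_pair(scores, times_judged):
    """Pick the next two scripts to show a judge (hypothetical heuristic).

    Prefer the pair whose current scores are closest, because that is where the
    outcome is least predictable; break ties in favour of scripts seen less often.
    Requires at least two scripts.
    """
    def priority(pair):
        a, b = pair
        closeness = -abs(scores[a] - scores[b])
        exposure = -(times_judged[a] + times_judged[b])
        return (closeness, exposure)

    return max(combinations(sorted(scores), 2), key=priority)


if __name__ == "__main__":
    current_scores = {"A": 2.0, "B": 1.9, "C": 0.0, "D": -2.0}
    seen = {"A": 5, "B": 5, "C": 3, "D": 3}
    print(next_pair(current_scores, seen))  # ('A', 'B')
```

In a live session the loop would be: choose a pair, record the judge's decision, refit the scores (for instance with a Bradley-Terry fit), and repeat. A practical system would also avoid showing the same pair over and over and would spread the work across judges; both are omitted here.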

Seen through an antifragility lens, ACJ and RM Compare provide something the old system lacks: a way of turning variation, uncertainty, and even disagreement into assets. The more diverse the work, the more useful information the process generates. The more cycles of judging and revision you run, the more students and teachers gain from exposure to what others have attempted.

From Oak to Hive: An Invitation

If we continue to treat AI as a threat to be contained, our best case is a slightly more fortified version of the same fragile structures. We will spend increasing amounts of time and money propping up formats that are steadily losing their meaning, and we will ask students to jump through hoops we know are misaligned with the world they are entering.

There is another option.

We can treat AI as a forcing function that pushes us towards tasks, judgements, and systems that are harder to fake precisely because they are richer, more human, and more open. We can design curriculum that regularly sends students to explore big ideas in unpredictable ways. And we can back that curriculum with assessment that welcomes diversity of response, operates holistically, and uses comparison and iteration to turn volatility into learning.

Resilience helped our systems survive the last era. In this one, resilience without optionality is just a very sturdy way of being fragile. If we want education that actually gets better from the shocks ahead, we need more hives and fewer oaks – and we will need tools like RM Compare to make that possible in practice.
