Escaping the Text Trap: Why the Future of Assessment is Spatial

If you look at the current headlines in education, you’d be forgiven for thinking human intelligence is made entirely of words.

We are locked in an arms race over "Text Intelligence." We worry about Large Language Models (LLMs) writing essays for students, and we counter by building AI tools to grade those essays. We are obsessed with the Linguistic Bottleneck - the idea that the only way to prove you understand the world is to write a description of it.

But what if we are assessing the wrong intelligence entirely?

What if the future of assessment isn't about better ways to read text, but better ways to see action?

The Rise of "World Models"

Fei-Fei Li, the "Godmother of AI," recently noted that we are entering a new era. While the last few years were about AI learning to read and write (LLMs), the next era is about Spatial Intelligence. Her new venture, World Labs, is building "Large World Models" designed to understand 3D space, physics, object permanence, and action.

Why? Because meaningful human intelligence isn't just about describing a cup; it's about the ability to navigate a room, reach out, and pick that cup up without spilling it.

For educators, this highlights a critical flaw in our current system. We often force spatially brilliant students - future engineers, artists, surgeons, and athletes - to translate their 3D understanding into 2D text just to get a grade. We assess their ability to write, not their ability to do.

The AI Judge vs. The Human Eye: Why We Still Need People

The temptation is obvious: if students are uploading images of their work, why not just let an AI grade them? Computer vision is getting better every day, right?

This line of thinking misses a fundamental difference between how a machine "sees" a drawing and how a human does. It’s the difference between identifying pixels and understanding intent.

The AI's View: A Statistical Guess. When a current AI model looks at a student's diagram of a bridge, it doesn't see a structure. It sees a grid of pixel values. It compares the patterns of lines and shapes against the statistical regularities it learned from millions of images labelled "bridge" (a minimal sketch of this appears after the list below).

  • It might grade a drawing highly because the lines are straight.
  • It might fail a brilliant, unconventional design because it doesn't match the statistical average of its training data.
  • Crucially, the AI has no concept of gravity. It cannot look at a joint in a drawing and intuitively know, "That's going to snap." It is grading the image, not the idea.
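To make that list concrete, here is a minimal sketch of the "statistical guess" in code, using an off-the-shelf pretrained classifier (torchvision's ResNet-50). The file name is a placeholder and the printed output is illustrative, not a real result:

```python
# A minimal sketch of what "an AI looking at a drawing" usually means in
# practice: a pretrained classifier returning the statistically closest
# label. Assumes torchvision is installed; the file path is a placeholder.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("student_bridge_sketch.png").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)  # the drawing, reduced to a grid of pixel values

with torch.no_grad():
    probs = model(batch).softmax(dim=1)[0]

top = int(probs.argmax())
print(f"{weights.meta['categories'][top]}: {probs[top]:.1%}")
# e.g. "suspension bridge: 72.4%" - a pattern match against training data.
# Nothing in this pipeline checks whether the drawn joints could bear a load.
```

Every step here is pattern-matching against training data; there is no structural model in the pipeline for a concept like gravity to live in.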

The Human's View: A Mental Simulation. A human judge - an art teacher, a design engineer, a peer - does something miraculous when they look at that same drawing. Their brain engages in mental simulation.

By looking at a 2D sketch, their spatial intelligence constructs a 3D mental model. They unconsciously apply their "tacit knowledge" of physics and the real world. They don't just see lines on a page; they "feel" the weight of the structure. They can instantly spot if a perspective is "off" not because it breaks a rule, but because their brain tells them the object couldn't exist in 3D space.

We Don't Need to Wait for "Spatial AI"

While Silicon Valley spends billions trying to teach computers to understand the physical world, we have found something fascinating at RM Compare: Human judges are already experts at this.

When we move assessment away from rubrics (which are text-based) and towards Adaptive Comparative Judgement (ACJ) of artefacts (images and video), we unlock the human capacity for Spatial Intelligence. In ACJ, judges see two pieces of work at a time and simply decide which is better; accumulated across many judges and pairings, those decisions are fitted to a statistical model that yields a reliable rank order. Two of our projects show exactly how this works.
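First, a brief technical aside for readers who want to see that statistical step. The sketch below uses the classic Bradley-Terry model, a standard choice in the comparative judgement literature - it is an illustration of the idea, not a description of RM Compare's internal engine, and the toy data is invented:

```python
# Turning pairwise "A beats B" judgements into a rank order with the
# Bradley-Terry model, fitted by Zermelo's iterative (MM) algorithm.
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """wins[i, j] = times judges preferred item i over item j.
    Returns one positive strength per item, where
    P(i beats j) = p[i] / (p[i] + p[j])."""
    n = wins.shape[0]
    p = np.ones(n)                   # start with all items equal
    meetings = wins + wins.T         # total comparisons for each pair
    for _ in range(iters):
        for i in range(n):
            others = np.arange(n) != i
            # MM update: total wins divided by expected "exposure"
            denom = np.sum(meetings[i, others] / (p[i] + p[others]))
            if denom > 0:
                p[i] = wins[i].sum() / denom
        p /= p.sum()                 # fix the scale so strengths sum to 1
    return p

# Toy data: three artefacts; item 0 beat item 1 four times, and so on.
wins = np.array([[0, 4, 5],
                 [1, 0, 3],
                 [2, 1, 0]])
strengths = bradley_terry(wins)
print(np.argsort(-strengths))  # rank order, strongest artefact first
```

Notice what the judges never touch in this process: a rubric. They make holistic, spatial comparisons, and the mathematics turns those comparisons into a scale.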

Seeing 3D in 2D: The Art Project. In our Multi-School Art Assessment Project, there was a legitimate fear: Can you really judge a 3D sculpture or a textured clay tile just by looking at a photograph?

The results showed that inter-judge reliability was remarkably high. Why? Because of a cognitive process called Amodal Completion - the brain's ability to construct a complete 3D object from partial 2D information. The judges didn't need a written rubric to tell them if the texture was "rough" or "smooth" - their spatial intelligence allowed them to "feel" it through the image, accurately judging the 3D artefact via a 2D proxy.
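So how high is "high"? In the comparative judgement literature, reliability is commonly reported as Scale Separation Reliability (SSR): the proportion of the spread in the final scale that reflects genuine differences between artefacts rather than estimation noise, with values near 1.0 indicating that judges are separating the work consistently. A minimal sketch with invented numbers - an illustration of the statistic, not our project data:

```python
# Scale Separation Reliability (SSR) from fitted scores and their standard
# errors. All numbers below are invented for illustration.
import numpy as np

def scale_separation_reliability(scores, std_errors):
    """Share of observed score variance that is true spread between items
    rather than estimation noise. Near 1.0 = highly consistent judging."""
    observed_var = np.var(scores, ddof=1)        # total spread of the scale
    error_var = np.mean(np.square(std_errors))   # average estimation noise
    return (observed_var - error_var) / observed_var

scores = np.array([-1.8, -0.6, 0.1, 0.9, 1.4])         # per-artefact estimates
std_errors = np.array([0.25, 0.22, 0.21, 0.23, 0.26])  # their uncertainties
print(round(scale_separation_reliability(scores, std_errors), 2))  # 0.97
```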

Assessing Physics in Motion: The PE Project. We saw a similar breakthrough in our work assessing Physical Education at scale.

Physical education is the ultimate "Spatial" subject. You cannot write an essay about a forward roll that proves you can do one. By using video evidence in RM Compare, judges were able to assess the students' movement through space, applying their own "intuitive physics engine" to the biomechanics, flow, and momentum of each performance. They were assessing the process of movement, not just a description of it.

The Future is an Artefact

As AI commoditises text, the value of the written word as a proxy for understanding is dropping. Conversely, the value of proven capability - the artefact, the video, the portfolio, the build - is rising.

RM Compare allows us to escape the text trap. By allowing judges to compare "anything" - from a video of a science experiment to a photo of a design prototype - we stop asking students to describe the world and start asking them to show us how they shape it.

The technology behind spatial assessment is complex, but the principle is simple:

Don't write about it. Show it.