Spatial intelligence may teach AI how objects move through time. But it does not automatically teach AI what time means to humans who wait, age, remember, depend, suffer, hope, and live with irreversible consequences.
Fei-Fei Li’s framing of world models is important because it clarifies where AI is moving next.
Language models gave machines a way to reason through words. World models aim to move beyond words into spatial intelligence: rendering scenes, simulating physical environments, and planning actions inside those environments.
That is a major step.
But there may be a missing layer.
World models can learn time as motion, sequence, physics, prediction, and planning horizon. They can learn that an object falls, that a cup moves when pushed, that a robot arm should close before lifting, or that an action produces a later state.
That is operational time.
It is not yet human time.
Human beings do not experience time only as a sequence of state transitions. We experience time as memory, waiting, aging, urgency, fear, fatigue, grief, hope, trust, regret, development, dependency, and irreversible consequence.
A model may learn how a body moves through space without understanding what delay means to a frail person waiting for help.
A model may learn how to complete a task sequence without understanding that a new event has changed the moral priority of the moment.
A caregiving robot may be instructed to finish feeding an elderly patient gently and safely. But if a smoke alarm sounds, or the patient suddenly needs urgent assistance, the system must understand that the meaning of “continue” has changed. Time is no longer just sequence. It is urgency, vulnerability, and consequence.
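To make the caregiving example concrete, here is a minimal sketch of how a new event could change what "continue" means. It is an illustration only: the `Urgency` levels, the `Event` type, and the `next_action` function are hypothetical names invented for this example, and a fixed three-level ranking is a deliberate oversimplification of the judgment the scenario actually calls for.

```python
from dataclasses import dataclass
from enum import IntEnum


class Urgency(IntEnum):
    # Hypothetical ranking, assumed for illustration only.
    ROUTINE = 0    # e.g. the ongoing feeding task
    URGENT = 1     # e.g. the patient suddenly needs assistance
    EMERGENCY = 2  # e.g. a smoke alarm sounds


@dataclass(frozen=True)
class Event:
    name: str
    urgency: Urgency


def next_action(current_task: str, events: list[Event]) -> str:
    """Decide whether 'continue' still means continuing the current task."""
    if not events:
        return f"continue: {current_task}"
    # Pick the event with the highest urgency level.
    most_urgent = max(events, key=lambda e: e.urgency)
    if most_urgent.urgency == Urgency.ROUTINE:
        return f"continue: {current_task}"
    # Anything above routine preempts the task sequence.
    return f"pause: {current_task}; respond to: {most_urgent.name}"
```

The sketch captures only the operational half of the problem: it can reorder tasks once an urgency level is supplied. Deciding what counts as urgent for this person, at this moment, is the part the essay argues is still missing.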
This is not a small edge case. It is central to embodied AI.
In human life, timing often changes everything. A delay of ten seconds may not matter in one context and may be catastrophic in another. Repetition may be useful in training but exhausting or harmful in caregiving. A small interaction repeated over weeks may create dependency. A decision that looks efficient in a simulation may produce human distress when imposed at the wrong moment.
This is where world models may still fall short.
They may become excellent at representing physical environments, predicting motion, and generating action plans. But spatial intelligence does not automatically produce temporal judgment.
An AI system can model the passage of time without living through it.
It can describe aging without aging.
It can describe memory without having human memory.
It can describe urgency without feeling the pressure of irreversible consequence.
That gap matters wherever AI systems interact with children, elders, patients, workers, families, institutions, or physical environments over time.
This is also why cognitive AI approaches deserve serious attention alongside world models and large language models. Peter Voss’s work on concept-based cognitive architectures points toward a different path: systems organized around concepts, learning, reasoning, continuity, and human-like adaptability rather than statistical scale alone.
That direction may have a better chance of addressing continuity, memory, conceptual development, and temporal understanding than language-only or spatial-only systems.
But even cognitive AI should not be assumed to solve human time automatically.
Human time is not only cognitive. It is embodied, biological, social, emotional, developmental, and mortal.
So perhaps the next frontier is not only spatial intelligence, but temporal responsibility.
World models may teach AI how objects, bodies, and environments move through time. Cognitive models may help AI organize concepts and continuity more deeply. But neither automatically guarantees understanding of what time means to beings who wait, age, suffer, remember, depend, recover, decline, hope, and live with consequences.
That is why world models, even if successful, will still need execution boundaries.
Renderers show what might be seen.
Simulators predict what might happen.
Planners choose what might be done.
Cognitive systems may reason more deeply about meaning and continuity.
But another layer is still needed to determine what must not be allowed to execute — especially when human time, human vulnerability, and irreversible consequence are involved.
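One way to picture such a layer is as a check that sits between the planner and the actuators. The sketch below is a hedged illustration, not a proposed standard: `ProposedAction`, its fields, and `may_execute` are hypothetical names, and the single rule shown (irreversible actions affecting a vulnerable person require human approval) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    # All fields are illustrative assumptions, not an established schema.
    description: str
    irreversible: bool               # cannot be undone once executed
    affects_vulnerable_person: bool  # e.g. a child, elder, or patient
    human_approved: bool = False


def may_execute(action: ProposedAction) -> bool:
    """Execution boundary: runs after planning, before anything is executed."""
    if action.irreversible and action.affects_vulnerable_person:
        # The planner's preference is not sufficient here.
        return action.human_approved
    return True
```

The design point is that the boundary is separate from the planner: however good the plan looks in simulation, the decision about what must not execute is made by a different layer with its own criteria.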