Research
World Models

World-model research asks whether AI systems can predict how a state changes under an action. For software teams, the world is not only pixels or physics. It includes code, tests, browsers, services, customer workflows, logs, tickets, and production behavior.
What the field studies
CWM (Code World Model) is one concrete version of this idea for code. Instead of learning only from static source files, it is trained on observation-action trajectories from Python interpreters and agentic Docker environments. The model sees code, predicts execution states, and learns how actions change the computational environment.
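To make "observation-action trajectory" concrete, here is a minimal sketch that records, for each executed line of a Python function, the local variables visible just before that line runs. It uses only the standard library's `sys.settrace`. The helper name `record_trajectory` and the (line, locals) snapshot format are our own illustration and are not CWM's actual data format or training pipeline.

```python
import sys
from types import FrameType
from typing import Any, Callable


def record_trajectory(func: Callable[..., Any], *args: Any):
    """Run func(*args) and record (relative line, locals) before each executed line.

    Illustrative only: a toy interpreter trace, not CWM's real trace format.
    """
    trajectory: list[tuple[int, dict[str, Any]]] = []
    code = func.__code__

    def tracer(frame: FrameType, event: str, arg: Any):
        # Ignore frames that do not belong to the target function.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Snapshot taken *before* the line executes; 0 is the `def` line.
            rel_line = frame.f_lineno - code.co_firstlineno
            trajectory.append((rel_line, dict(frame.f_locals)))
        return tracer  # keep tracing this frame line by line

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any tracer (debugger, coverage) already active
    return result, trajectory


def accumulate(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


result, trajectory = record_trajectory(accumulate, 3)
print("result:", result)
for line, local_vars in trajectory:
    print(line, local_vars)
```

In this framing, a world model would be asked to predict the next snapshot from the code and the current snapshot, rather than only the final return value.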
That matters because programming is dynamic. A line of code changes variables, control flow, files, test outcomes, and sometimes the behavior of an entire repository. A world model is useful when it can forecast those state changes, not merely produce plausible patches.
Deja Vu points at the production version of the same idea. It asks whether a system can simulate a proposed change against customer workflows, incidents, deployments, configurations, and runtime signals, then predict which failures will become real customer tickets.
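The closest thing that runs today without a learned model is differential replay: take recorded customer workflows, run them against the current and the proposed code, and flag customers whose observable outcome changes. The sketch below assumes recorded workflows are available as request payloads plus each customer's configuration; `Workflow`, `exposed_customers`, and the pricing functions are hypothetical. It is not Deja Vu's method, which is not described here; a world model aims to predict this result without executing every workflow.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical handler signature: (request payload, customer config) -> result.
Handler = Callable[[dict, dict], object]


@dataclass
class Workflow:
    """A recorded customer path: the config it ran under and the requests it sent."""
    customer: str
    config: dict
    requests: list[dict]


def outcome(handler: Handler, wf: Workflow) -> list[object]:
    """Replay each request, capturing either the result or the exception type name."""
    results: list[object] = []
    for req in wf.requests:
        try:
            results.append(handler(req, wf.config))
        except Exception as exc:  # a crash is an observable outcome, not a harness error
            results.append(type(exc).__name__)
    return results


def exposed_customers(current: Handler, proposed: Handler,
                      workflows: list[Workflow]) -> list[str]:
    """Customers whose replayed outcomes differ between current and proposed code."""
    return [wf.customer for wf in workflows
            if outcome(current, wf) != outcome(proposed, wf)]


# Toy change: the proposed version requires a config key older customers never set.
def current_total(req: dict, config: dict) -> float:
    return req["amount"] * (1 - config.get("discount", 0.0))


def proposed_total(req: dict, config: dict) -> float:
    return req["amount"] * (1 - config["discount"]) + config.get("fee", 0.0)


workflows = [
    Workflow("acme", {"discount": 0.1}, [{"amount": 100.0}]),
    Workflow("globex", {}, [{"amount": 50.0}]),
]
print(exposed_customers(current_total, proposed_total, workflows))  # ['globex']
```

The diff looks harmless in review; only replaying it under `globex`'s configuration reveals the `KeyError` that would surface as a ticket.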
Why it matters
Static review asks whether the diff looks wrong. World modeling asks whether the software world will break after the diff lands.
That distinction is important for AI employees. Many failures are not obvious logic bugs. They are correct code in the wrong environment: a customer-specific configuration, a migration state, a feature flag combination, a downstream integration, or a transient deployment condition.
An agent that only reads code can miss those failures. An agent with a world model can ask a stronger question: if this change happened, what state would the program, repository, user workflow, or production system move into next?
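Asking what state the system would move into requires knowing which environments the change can meet. For a handful of boolean feature flags, those environments can be enumerated exhaustively, as in the sketch below. `render_checkout`, its flag names, and `breaking_flag_states` are invented for illustration.

```python
import itertools
from typing import Callable


def render_checkout(flags: dict[str, bool]) -> str:
    """Toy code under change: correct in most environments, broken in one."""
    if flags["new_pricing"] and not flags["tax_service"]:
        raise RuntimeError("price computed without tax service")
    return "ok"


def breaking_flag_states(func: Callable[[dict[str, bool]], object],
                         flag_names: list[str]) -> list[tuple[dict[str, bool], str]]:
    """Try every on/off combination of the flags and report the ones where func fails."""
    failures = []
    for values in itertools.product([False, True], repeat=len(flag_names)):
        flags = dict(zip(flag_names, values))
        try:
            func(flags)
        except Exception as exc:
            failures.append((flags, type(exc).__name__))
    return failures


print(breaking_flag_states(render_checkout, ["new_pricing", "tax_service"]))
# [({'new_pricing': True, 'tax_service': False}, 'RuntimeError')]
```

Brute force grows as 2^n in the number of flags and ignores configuration, migration, and deployment state entirely, which is why a model that predicts which environment states are worth checking is more useful than enumeration.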
Our approach
We treat world models as the foundation for autonomous QA. The model should connect source code to the environments where that code actually acts: interpreters, tests, browsers, services, customer workflows, logs, tickets, and production traces.
The near-term goal is not a single omniscient simulator. It is an agent that can build enough local world state to make better decisions: which scenario to replay, which invariant to check, which customer path is exposed, and which regression is likely before users report it.
For software teams, the useful output is practical: predict issues before they happen, generate targeted tests from that prediction, verify fixes against realistic trajectories, and keep learning as tickets and incidents reveal new production states.
For the broader argument about why world models matter beyond language fluency, read "From Language Models to World Models."