What an LLM cannot do, and a world model would
Predict the consequences of action, update from prediction error, hold state across long horizons. "World model" names the function that next-token prediction was never designed to provide.
Keon Kim
When Browser Harnesses Help, and When They Hurt
Jina MCP topped WebVoyager and AssistantBench, but the deeper lesson is about model-interface design. Browser harnesses are compression layers, and the same harness that lifts Haiku by 20 points can drop Opus by 30.
Keon Kim and Krish Chelikavada
Om Labs Tops ScreenSpot-Pro: 80.9% with a Free Confidence Signal
We found a hidden confidence signal in how AI models click on screens and used it to reach #1 on ScreenSpot-Pro, ahead of teams from Alibaba, H Company, Lenovo, and Tsinghua.
Keon Kim and Krish Chelikavada