Many researchers think that world models will prove essential to the future of robotics. Li, the World Labs founder, has written about how they could facilitate the development of robots that explore the deep sea and assist health-care providers, but for now, the applications are more modest. The makers of Pokémon Go, for instance, are using billions of images collected by the game’s players to build the first pieces of a world model that, they hope, could help guide delivery robots.
Google DeepMind and World Labs are currently focusing their efforts on building models that can generate interactive, 3D virtual environments from a combination of text, images, and, in the case of World Labs, video prompts. Such tools could be used to streamline the design of video games and immersive VR experiences, but compared with large language models, they seem to have a limited range of applications. The real breakthroughs are likely to come from integrating such systems into flexible, intelligent agents that can represent their environments, predict the consequences of their actions, and then decide what to do.

