Niantic has built a visual positioning system from 30 billion Pokémon Go city images, achieving centimeter-level accuracy, and has tested it with roughly 1,000 delivery robots.
Niantic, developer of the globally popular augmented reality game Pokémon Go, has an AI company called Niantic Spatial that leverages billions of city images captured by players over the years to create a “visual positioning system” and an AI world model capable of understanding the real world. This technology can accurately locate devices in urban environments where GPS signals are unstable, and it has already been tested with delivery robots, opening new possibilities for real-world navigation by robots and AI.
Since its launch in 2016, Pokémon Go has been a worldwide hit, with players capturing Pokémon through their phone cameras in real-world settings. The well-known AR game still attracts over 100 million active players a year, even years after its release.
During gameplay, players constantly point their phones at city buildings and landmarks, unintentionally accumulating a vast amount of image data.
Niantic’s AI company, Niantic Spatial, recently announced that it has collected and organized approximately 30 billion photos from urban environments worldwide. Each image carries a precise geographic location along with capture metadata such as phone orientation, movement speed, and camera angle. This data is now being used to train AI to build a “world model” that understands real-world spaces.
According to NewsForce, Niantic Spatial’s latest technology is a Visual Positioning System (VPS). This AI model analyzes photos of buildings or landmarks to determine the user’s location with centimeter-level accuracy.
The company states that its database now covers over one million landmark locations worldwide. For each site, thousands of images taken at different times, from different angles, and in different weather conditions have been accumulated. By matching a new photo against the visual features of these reference images, the AI can infer the device’s position and viewing direction, producing highly accurate localization results.
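The matching step described above can be sketched in a simplified form: represent each reference photo as a compact descriptor vector tied to a geo-location, then localize a query image by finding its nearest descriptor in the database. This is an illustrative toy, not Niantic’s actual pipeline; the function name, descriptor size, and sample coordinates are all invented for the example, and a real VPS would refine the retrieved match into a full 6-DoF pose.

```python
import numpy as np

def locate(query_desc, db_descs, db_locations):
    """Return the geo-location of the reference image whose global
    descriptor is nearest (by L2 distance) to the query descriptor."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    return db_locations[best], float(dists[best])

# Toy database: three reference images, each a 4-D descriptor
# paired with an illustrative (lat, lon).
db_descs = np.array([
    [0.9, 0.1, 0.0, 0.2],   # facade of landmark A
    [0.1, 0.8, 0.3, 0.0],   # facade of landmark B
    [0.0, 0.2, 0.9, 0.4],   # facade of landmark C
])
db_locations = [(40.7580, -73.9855), (51.5007, -0.1246), (35.6595, 139.7005)]

query = np.array([0.85, 0.15, 0.05, 0.18])  # visually closest to landmark A
loc, dist = locate(query, db_descs, db_locations)
print(loc)  # location of landmark A
```

In practice the descriptors come from a learned image-retrieval model rather than hand-set values, and the nearest-neighbor search runs over millions of entries with an approximate index.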
Brian McClendon, CTO of Niantic Spatial, explains that this approach differs from traditional GPS, which relies on satellite signals. Instead, VPS uses “what it sees” to determine location:
In dense urban environments with tall buildings, GPS signals often drift, leading to errors of tens of meters or even wrong directions.
While such errors may not significantly impact everyday users, they pose serious issues for robots requiring precise navigation. Therefore, image-based localization technology is a key focus for robotics companies.
Niantic Spatial has partnered with Coco Robotics to test this technology. Coco has deployed about 1,000 delivery robots in multiple cities across the US and Europe, mainly for food and grocery delivery. These robots are roughly the size of small suitcases and can carry up to eight large pizzas or four grocery bags.
The company reports that, although its robots have completed over 500,000 deliveries, GPS inaccuracies sometimes make it difficult for them to stop precisely at restaurant or customer doors:
With Niantic’s visual positioning model, robots can analyze their surroundings using four onboard cameras to more accurately determine their location and direction, improving delivery reliability.
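One way to combine readings from several cameras, as described above, is to fuse each camera’s heading estimate weighted by its match confidence. The sketch below is an assumption about how such fusion might look, not Coco’s or Niantic’s actual method; it uses a confidence-weighted circular mean, which correctly handles headings that wrap around north (e.g. 359° and 1°).

```python
import math

def fuse_headings(estimates):
    """Fuse (heading_degrees, confidence) estimates from multiple
    cameras into one heading, using a weighted circular mean so that
    values near 0°/360° average correctly."""
    sx = sum(c * math.cos(math.radians(h)) for h, c in estimates)
    sy = sum(c * math.sin(math.radians(h)) for h, c in estimates)
    return math.degrees(math.atan2(sy, sx)) % 360

# Four cameras all see landmarks consistent with facing roughly north,
# but their raw estimates straddle the 0°/360° boundary.
cams = [(358.0, 0.9), (2.0, 0.8), (1.0, 0.7), (359.0, 0.6)]
fused = fuse_headings(cams)
print(fused)  # very close to north (near 0°/360°)
```

A naive arithmetic mean of these four headings would give about 180°, i.e. due south; the circular mean avoids that failure mode, which matters when a robot must face a doorway precisely.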
John Hanke, CEO of Niantic Spatial, states that the initial goal of developing visual positioning technology was to support AR glasses and augmented reality applications. However, with rapid growth in the robotics industry, the company has shifted focus toward robot navigation.
He mentions that they are building a system called “Living Map,” a highly detailed and continuously updated digital world model that adapts as the real environment changes.
In the future, delivery robots, smart devices, and even AR headsets could serve as data sources, constantly transmitting environmental information to keep the digital map aligned with the dynamic real world.
In recent years, AI research has increasingly emphasized the concept of a “world model.” While large language models (LLMs) excel at processing text and knowledge, they still face limitations in understanding physical space and real-world environments.
By integrating maps, images, and environmental data, world models aim to enable AI to comprehend objects, spatial relationships, and environmental changes. Companies like Google DeepMind are developing models capable of generating virtual worlds for training AI agents.
Niantic Spatial takes a different approach by using vast amounts of real-world image data to gradually reconstruct a digital model of the physical environment. As data accumulates, this system could become a crucial infrastructure for robots and AI to understand the real world.