Last month, the artificial intelligence company DeepMind introduced new software that can take a single image of a few objects in a virtual room and, without human guidance, infer what the three-dimensional scene looks like from entirely new vantage points. Given just a handful of such pictures, the system, dubbed the Generative Query Network, or GQN, can successfully model the layout of a simple, video game-style maze.
There are obvious technological applications for GQN, but it has also caught the eye of neuroscientists, who are particularly interested in the training algorithm it uses to learn how to perform its tasks. From the presented image, GQN generates predictions about what a scene should look like — where objects should be located, how shadows should fall against surfaces, which areas should be visible or hidden from a given vantage point — and uses the differences between those predictions and its actual observations to sharpen the predictions it will make in the future. “It was the difference between reality and the prediction that enabled the updating of the model,” said Ali Eslami, one of the project’s leaders.
According to Danilo Rezende, Eslami’s co-author and DeepMind colleague, “the algorithm changes the parameters of its [predictive] model in such a way that next time, when it encounters the same situation, it will be less surprised.”
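The learning rule Eslami and Rezende describe — predict, compare against reality, then adjust the model so the same situation is less surprising next time — can be sketched in miniature. The toy below is not GQN itself, just an illustrative one-parameter predictor trained by its own prediction error; the function names and learning rate are assumptions made for the example.

```python
# Illustrative sketch only: a toy linear predictor (y = w * x) updated by
# its prediction error, in the spirit of the rule described above.
# This is NOT DeepMind's GQN; `learning_rate` and the setup are assumptions.

def update(weight, inputs, targets, learning_rate=0.1):
    """One pass of error-driven learning over the observations."""
    for x, y in zip(inputs, targets):
        prediction = weight * x
        surprise = y - prediction               # difference between reality and prediction
        weight += learning_rate * surprise * x  # nudge the model to be less surprised next time
    return weight

# Observations generated by a hidden "true" rule, y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0
for _ in range(50):
    w = update(w, xs, ys)
# After repeated passes, w settles near 2: the model's predictions
# now match its observations, so the "surprise" term vanishes.
```

Each update moves the parameter in whatever direction shrinks the gap between prediction and observation, which is the same logic, vastly scaled up, that lets GQN refine its internal model of a scene.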