MuZero could soon be put to practical use too. Dr Silver said DeepMind was already using it to try to invent a new kind of video compression.
“If you look at data traffic on the internet, the majority of it is video, so if you can compress video more effectively you can make massive savings,” he explained.
“And initial experiments with MuZero show you can actually make quite significant gains, which we’re quite excited about.”
As Google owns YouTube, the world’s biggest video-sharing platform, better compression has the potential to be a big money-saver.
The firm believes it has been successful because MuZero only tries to model aspects of the environment that are important to its decision-making process, rather than taking a wider approach.
The Nature paper reports that MuZero proved to be slightly better than AlphaZero at playing Go, despite doing less tree-search computation per move.
It adds that MuZero also outperformed R2D2 – the leading Atari-playing algorithm that does not model the world – at 42 of the 57 games tested on the old console.
Moreover, it did so after completing just half the number of training steps. Both achievements indicate that MuZero can squeeze more insight out of less data than had been possible before, explained Dr Silver.
“Imagine you’ve got a robot and it’s wandering about in the real world and it’s expensive to run,” he said.
“So, you want it to learn as much as possible from the small number of experiences it has. MuZero is able to do that.”