A common way to train a language network is to feed it lots of text from websites and news outlets with some of the words masked out and ask it to guess the masked-out words.
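For readers who code, here is a minimal sketch of that idea using the open-source Hugging Face transformers library; the model name and the example sentence are stand-ins chosen for illustration, not details taken from any particular training run.

```python
# A tiny sketch of masked-word training: hide a word, ask the network to guess it.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# One training example with a single word masked out.
inputs = tokenizer("The cat sat on the [MASK].", return_tensors="pt")
labels = tokenizer("The cat sat on the mat.", return_tensors="pt")["input_ids"]
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100  # only score the hidden word

outputs = model(**inputs, labels=labels)  # the loss measures how wrong the guess was
outputs.loss.backward()                   # one tiny step of the training loop
```

Real training repeats a step like this over billions of masked-out words, which is where the energy goes.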
What makes language models even more costly to build is that this training process happens many times during development.
This is because researchers want to find the best structure for the network.
Each candidate structure may yield only a marginal gain, so to squeeze out even a 1% improvement in accuracy, a researcher might train the model thousands of times, each time with a different structure, until the best one is found.
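The sketch below shows that search loop in miniature: every candidate structure gets its own full training run, so each extra option multiplies the compute and the energy bill. The tiny synthetic dataset, toy model and search space are illustrative assumptions, not any real experiment.

```python
# A minimal sketch of structure search: train many variants, keep the best one.
import itertools
import torch
from torch import nn

# Tiny synthetic dataset standing in for real training data.
X = torch.randn(256, 32)
y = (X.sum(dim=1) > 0).long()

def train_and_score(hidden_size: int, num_layers: int) -> float:
    layers, in_dim = [], 32
    for _ in range(num_layers):
        layers += [nn.Linear(in_dim, hidden_size), nn.ReLU()]
        in_dim = hidden_size
    model = nn.Sequential(*layers, nn.Linear(in_dim, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(50):                     # every candidate is trained from scratch
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return (model(X).argmax(dim=1) == y).float().mean().item()

# Each extra option in the grid means another full training run.
best = max(
    itertools.product([16, 64, 256], [1, 2, 4]),   # hidden sizes x layer counts
    key=lambda cfg: train_and_score(*cfg),
)
print("best structure (hidden_size, num_layers):", best)
```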
Researchers at the University of Massachusetts Amherst estimated the energy cost of developing AI language models by measuring the power consumption of common hardware used during training.
They found that training BERT once has the carbon footprint of a passenger flying a round trip between New York and San Francisco.
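As a rough illustration of how such estimates are put together, the arithmetic boils down to measured power draw multiplied by training time and by the carbon intensity of the electricity grid. The numbers below are placeholders chosen for the sake of the example, not the figures from the Amherst study.

```python
# Back-of-the-envelope energy and carbon estimate (all values are illustrative assumptions).
gpu_power_watts = 300        # assumed average draw of one accelerator
num_gpus = 8                 # assumed size of the training rig
training_hours = 80          # assumed wall-clock training time
grid_kg_co2_per_kwh = 0.4    # assumed carbon intensity of the local grid

energy_kwh = gpu_power_watts * num_gpus * training_hours / 1000
carbon_kg = energy_kwh * grid_kg_co2_per_kwh
print(f"{energy_kwh:.0f} kWh -> about {carbon_kg:.0f} kg of CO2")
```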
AI models are trained on specialized hardware such as graphics processing units (GPUs), which draw more power than traditional CPUs.
All of this means that developing advanced AI models adds up to a large carbon footprint.
Unless we switch to 100% renewable energy sources, AI progress may stand at odds with the goals of cutting greenhouse gas emissions and slowing down climate change.
In my lab’s research, we have been looking at ways to make AI models smaller by sharing weights, that is, using the same weights in multiple parts of the network.
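The snippet below sketches that idea in a few lines of PyTorch: one block of weights is applied several times instead of giving every layer its own copy, so the stored model is several times smaller. The layer sizes and the number of passes are illustrative assumptions, not values from our published models.

```python
# A minimal sketch of weight sharing: reuse one block instead of stacking independent layers.
import torch
from torch import nn

class SharedLayerNet(nn.Module):
    def __init__(self, dim: int = 128, num_passes: int = 6):
        super().__init__()
        self.shared_block = nn.Sequential(  # one set of weights...
            nn.Linear(dim, dim), nn.ReLU(),
        )
        self.num_passes = num_passes        # ...used in several parts of the network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_passes):
            x = self.shared_block(x)        # the same weights are applied at every pass
        return x

model = SharedLayerNet()
params = sum(p.numel() for p in model.parameters())
print(f"parameters stored: {params}")       # far fewer than six independent layers would need
```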