Gebru, a widely respected leader in AI ethics research, is known for coauthoring a groundbreaking paper that showed facial recognition to be less accurate at identifying women and people of color, which means its use can end up discriminating against them.
Jeff Dean, the head of Google AI, told colleagues in an internal email (which he has since put online) that the paper “didn’t meet our bar for publication.”
“On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” lays out the risks of large language models—AIs trained on staggering amounts of text data.
The paper, which builds on the work of other researchers, presents the history of natural-language processing and an overview of four main risks of large language models.
Training one version of Google’s language model, BERT, which underpins the company’s search engine, produced 1,438 pounds of CO2 equivalent by Strubell’s estimate, nearly as much as one passenger’s round-trip flight between New York City and San Francisco.
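For a rough sense of scale, here is a minimal back-of-the-envelope comparison in Python. The 1,984-pound figure for a per-passenger New York to San Francisco round trip is the benchmark reported in Strubell and colleagues’ paper; it does not appear in this article and is included here only as an assumption for the comparison.

```python
# Back-of-the-envelope comparison of BERT's training emissions with one
# passenger's round-trip flight, using figures from Strubell et al. (2019).
# All values are in pounds of CO2 equivalent.
BERT_TRAINING = 1_438      # one BERT training run, per the article
FLIGHT_ROUND_TRIP = 1_984  # 1 passenger, NY <-> SF round trip (Strubell et al.)

ratio = BERT_TRAINING / FLIGHT_ROUND_TRIP
print(f"One BERT training run = {ratio:.0%} of one passenger's round trip")
# Output: One BERT training run = 72% of one passenger's round trip
```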
Gebru’s draft paper points out that the sheer resources required to build and sustain such large AI models mean they tend to benefit wealthy organizations, while climate change hits marginalized communities hardest.
Because such models require enormous amounts of text, researchers have sought to collect all the data they can from the internet, so there’s a risk that racist, sexist, and otherwise abusive language ends up in the training data.
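To illustrate why that risk is hard to engineer away, here is a toy sketch, not drawn from the paper: a naive keyword blocklist of the kind sometimes applied to scraped web text. The documents and the blocklist term are invented for illustration; the point is that keyword matching catches only overt abuse while politely phrased bias slips into the corpus.

```python
# Toy illustration (invented example): a keyword blocklist applied to
# scraped web text catches overt abuse but misses subtler harmful content.
scraped_docs = [
    "A helpful tutorial on sorting algorithms.",
    "An overtly slur-laden rant.",                       # caught by the blocklist
    "A subtly demeaning stereotype, phrased politely.",  # slips through
]

BLOCKLIST = {"slur-laden"}  # hypothetical blocklist term

def passes_filter(doc: str) -> bool:
    """Keep a document only if it contains no blocklisted term."""
    return not any(term in doc.lower() for term in BLOCKLIST)

training_corpus = [doc for doc in scraped_docs if passes_filter(doc)]
print(training_corpus)  # the politely phrased stereotype is still in the corpus
```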