A survey of research into the community’s dataset collection and use practices, “Data and its (dis)contents: A survey of dataset development and use in machine learning,” was published earlier this month by researchers at the University of Washington. The paper concluded that large language models have the capacity to perpetuate prejudice and bias against a range of marginalized communities and that poorly annotated datasets are part of the problem.
After Google fired Timnit Gebru, an incident Googlers refer to as a case of “unprecedented research censorship,” the company began carrying out reviews of research papers on “sensitive topics,” and on at least three occasions authors have been asked not to put Google technology in a negative light, according to internal communications and people familiar with the matter.
And yet a Washington Post profile of Gebru published this week revealed that Google AI chief Jeff Dean had asked her this fall to investigate the negative impacts of large language models.
The resulting paper examined how the use of large language models can affect marginalized communities.
Nearly 2,000 papers were published at NeurIPS this year, including work related to failure detection for safety-critical systems; methods for faster, more efficient backpropagation; and the beginnings of a project that treats climate change as a machine learning grand challenge.