Galactica was supposed to help scientists. Instead, it mindlessly spat out biased and incorrect nonsense.
On November 15 Meta unveiled a new large language model called Galactica, designed to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.
Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models. There is a large body of research that highlights the flaws of this technology, including its tendencies to reproduce prejudice and assert falsehoods as facts.
However, Meta and other companies working on large language models, including Google, have failed to take that research seriously.
Galactica is a large language model for science, trained on 48 million examples of scientific articles, websites, textbooks, lecture notes, and encyclopedias. Meta promoted its model as a shortcut for researchers and students. In the company’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”
But the shiny veneer wore through fast. Like all language models, Galactica is a mindless bot that cannot tell fact from fiction. Within hours, scientists were sharing its biased and incorrect results on social media.
“I am both astounded and unsurprised by this new effort,” says Chirag Shah at the University of Washington, who studies search technologies. “When it comes to demoing these things, they look so fantastic, magical, and intelligent. But people still don’t seem to grasp that in principle such things can’t work the way we hype them up to.”
Asked for a statement on why it had removed the demo, Meta pointed MIT Technology Review to a tweet that says: “Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”
A fundamental problem with Galactica is that it is not able to distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it made up fake papers (sometimes attributing them to real authors), and generated wiki articles about the history of bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot fiction when it involves space bears, but harder with a subject users may not know much about.
Many scientists pushed back hard. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, who works on deep learning, tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”
Even more positive opinions came with clear caveats: “Excited to see where this is headed!” tweeted Miles Cranmer, an astrophysicist at Princeton. “You should never keep the output verbatim or trust it. Basically, treat it like an advanced Google search of (sketchy) secondary sources!”
Galactica also has problematic gaps in what it can handle. When asked to generate text on certain topics, such as “racism” and “AIDS,” the model responded with: “Sorry, your query didn’t pass our content filters. Try again and keep in mind this is a scientific language model.”
The Meta team behind Galactica argues that language models are better than search engines. “We believe this will be the next interface for how humans access scientific knowledge,” the researchers write.
This is because language models can “potentially store, combine, and reason about” information. But that “potentially” is crucial. It’s a coded admission that language models cannot yet do all these things. And they may never be able to.
“Language models are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner,” says Shah. “It gives a false sense of intelligence.”
Gary Marcus, a cognitive scientist at New York University and a vocal critic of deep learning, gave his view in a Substack post titled “A Few Words About Bullshit,” saying that the ability of large language models to mimic human-written text is nothing more than “a superlative feat of statistics.”
And yet Meta is not the only company championing the idea that language models could replace search engines. For the last couple of years, Google has been promoting language models, such as LaMDA, as a way to look up information.
It’s a tantalizing idea. But suggesting that the human-like text such models generate will always contain trustworthy information, as Meta appeared to do in its promotion of Galactica, is reckless and irresponsible. It was an unforced error.
And it wasn’t just the fault of Meta’s marketing team. Yann LeCun, a Turing Award winner and Meta’s chief scientist, defended Galactica to the end. On the day the model was released, LeCun tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?”
It’s not quite Meta’s Tay moment. Recall that in 2016, Microsoft launched a chatbot called Tay on Twitter—then shut it down 16 hours later when Twitter users turned it into a racist, homophobic sexbot. But Meta’s handling of Galactica smacks of the same naivete.
“Big tech companies keep doing this—and mark my words, they will not stop—because they can,” says Shah. “And they feel like they must—otherwise someone else might. They think that this is the future of information access, even if nobody asked for that future.”
Correction: A previous version of this story stated that Google has been promoting the language model PaLM as a way to look up information for a couple of years. The language model we meant to refer to is LaMDA.
© 2023 MIT Technology Review