A billionaire-backed movement is recruiting college students to fight killer AI, which some see as the next Manhattan Project.
On that last front, Edwards thought young people would be worried about immediate threats, like AI-powered surveillance, misinformation or autonomous weapons that target and kill without human intervention — problems he calls “ultraserious.” But he soon discovered that some students were more focused on a purely hypothetical risk: That AI could become as smart as humans and destroy mankind.
Science fiction has long contemplated rogue AI, from HAL 9000 to the Terminator’s Skynet. But in recent years, Silicon Valley has become enthralled by a distinct vision of how super-intelligence might go awry, derived from thought experiments at the fringes of tech culture. In these scenarios, AI isn’t necessarily sentient. Instead, it becomes fixated on a goal — even a mundane one, like making paper clips — and triggers human extinction to optimize its task.
To prevent this theoretical but cataclysmic outcome, mission-driven labs like DeepMind, OpenAI and Anthropic are racing to build a good kind of AI programmed not to lie, deceive or kill us. Meanwhile, donors such as Tesla CEO Elon Musk, disgraced FTX founder Sam Bankman-Fried, Skype founder Jaan Tallinn and Ethereum co-founder Vitalik Buterin — as well as institutions like Open Philanthropy, a charitable organization started by billionaire Facebook co-founder Dustin Moskovitz — have worked to push doomsayers from the tech industry’s margins into the mainstream.
More recently, wealthy tech philanthropists have begun recruiting an army of elite college students to prioritize the fight against rogue AI over other threats. Open Philanthropy alone has funneled nearly half a billion dollars into developing a pipeline of talent to fight rogue AI, building a scaffolding of think tanks, YouTube channels, prize competitions, grants, research funding and scholarships — as well as a new fellowship that can pay student leaders as much as $80,000 a year, plus tens of thousands of dollars in expenses.
At Stanford, Open Philanthropy awarded Luby and Edwards more than $1.5 million in grants to launch the Stanford Existential Risk Initiative, which supports student research in the growing field known as “AI safety” or “AI alignment.” It also hosts an annual conference and sponsors a student group, one of dozens of AI safety clubs that Open Philanthropy has helped support in the past year at universities around the country.
Critics call the AI safety movement unscientific. They say its claims about existential risk can sound closer to a religion than research. And while the sci-fi narrative resonates with public fears about runaway AI, critics say it obsesses over one kind of catastrophe to the exclusion of many others.
“The conversation is just hijacked,” said Timnit Gebru, former co-lead of Ethical AI at Google.
Gebru and other AI ethicists say the movement has drawn attention away from existing harms — like racist algorithms that determine who gets a mortgage or AI models that scrape artists’ work without compensation — and drowned out calls for remedies. Other skeptics, like venture capitalist Marc Andreessen, are AI boosters who say that hyping such fears will impede the technology’s progress.
Open Philanthropy spokesperson Mike Levine said harms like algorithmic racism deserve a robust response. But he said those problems stem from the same root issue: AI systems not behaving as their programmers intended. The theoretical risks “were not garnering sufficient attention from others — in part because these issues were perceived as speculative,” Levine said in a statement. He compared the nonprofit’s AI focus to its work on pandemics, which also was regarded as theoretical until the coronavirus emerged.
The foundation began prioritizing existential risks around AI in 2016, according to a blog post by co-chief executive Holden Karnofsky, a former hedge funder whose wife and brother-in-law co-founded the AI start-up Anthropic and previously worked at OpenAI. At the time, Karnofsky wrote, there was little status or money to be gained by focusing on risks. So the nonprofit set out to build a pipeline of young people who would filter into top companies and agitate for change from the inside.
Colleges have been key to this growth strategy, serving as both a pathway to prestige and a recruiting ground for idealistic talent. Over the past year and a half, AI safety groups have cropped up on about 20 campuses in the United States and Europe — including Harvard, Georgia Tech, MIT, Columbia and New York University — many led by students financed by Open Philanthropy’s new university fellowship.
The clubs train students in machine learning and help them find jobs in AI start-ups or one of the many nonprofit groups dedicated to AI safety.
Many of these newly minted student leaders view rogue AI as an urgent and neglected threat, potentially rivaling climate change in its ability to end human life. Many see advanced AI as the Manhattan Project of their generation.
Among them is Gabriel Mukobi, 23, who graduated from Stanford in June and is transitioning into a master’s program for computer science. Mukobi helped organize a campus AI safety group last summer and dreams of making Stanford a hub for AI safety work. Despite the school’s ties to Silicon Valley, Mukobi said it lags behind nearby UC Berkeley, where younger faculty members research AI alignment, the term for embedding human ethics into AI systems.
“This just seems like a really, really important thing,” Mukobi said, “and I want to make it happen.”
When Mukobi first heard the theory that AI could eradicate humanity, he found it hard to believe. At the time, Mukobi was a sophomore on a gap year during the pandemic. Back then, he was concerned about animal welfare, promoting meat alternatives and ending animal agriculture.
But then Mukobi joined Stanford’s club for effective altruism, known as EA, a philosophical movement that advocates doing maximum good by calculating the expected value of charitable acts, like protecting the future from runaway AI. By 2022, AI capabilities were advancing all around him — wild developments that made those warnings seem prescient.
Last summer, he announced the Stanford AI Alignment group (SAIA) in a blog post with a diagram of a tree representing his plan. He’d recruit a broad group of students (the soil) and then “funnel” the most promising candidates (the roots) up through the pipeline (the trunk). To guard against the “reputational hazards” of toiling in a field some consider sketchy, Mukobi wrote, “we’ll prioritize students and avoid targeted outreach to unaligned AI professors.”
Among the reputational hazards of the AI safety movement is its association with an array of controversial figures and ideas, like EA, which is also known for recruiting ambitious young people on elite college campuses.
EA’s drive toward maximizing good initially meant convincing top graduates in rich countries to go into high-paying jobs, rather than public service, and donate their wealth to causes like buying mosquito nets to save lives in malaria-racked countries in Africa.
But from the start EA was intertwined with tech subcultures interested in futurism and rationalist thought. Over time, global poverty slid down the cause list, while rogue AI climbed toward the top. Extreme practitioners began to promote an idea called “longtermism,” prioritizing the lives of people potentially millions of years in the future, who might be a digitized version of human beings, over present-day suffering.
In the past year, EA has been beset by scandal, including the fall of Bankman-Fried, one of its largest donors. Another key figure, Oxford philosopher Nick Bostrom, whose 2014 bestseller “Superintelligence” is essential reading in EA circles, met public uproar when a decades-old diatribe about IQ surfaced in January.
“Blacks are more stupid than whites,” Bostrom wrote, calling the statement “logically correct,” then using the n-word in a hypothetical example of how his words could be misinterpreted as racist. Bostrom apologized for the slur but little else.
After reading Bostrom’s diatribe, SAIA stopped giving away copies of “Superintelligence.” Mukobi, who identifies as biracial, called the message “sus” but saw it as Bostrom’s failure — not the movement’s.
Mukobi did not mention EA or longtermism when he sent an email to Stanford’s student listservs in September touting his group’s student-led seminar on AI safety, which counted for course credit. Programming future AI systems to share human values could mean “an amazing world free from diseases, poverty, and suffering,” while failure could unleash “human extinction or our permanent disempowerment,” Mukobi wrote, offering free boba tea to anyone who attended the 30-minute intro.
Students who join the AI safety community sometimes get more than free boba. Just as EA conferences once meant traveling the world and having one-on-one meetings with wealthy, influential donors, Open Philanthropy’s new university fellowship offers a hefty direct deposit: undergraduate leaders receive as much as $80,000 a year, plus $14,500 for health insurance, and up to $100,000 a year to cover group expenses.
The movement has successfully influenced AI culture through social structures built around swapping ideas, said Shazeda Ahmed, a postdoctoral research associate at Princeton University’s Center for Information Technology Policy. Student leaders have access to a glut of resources from donor-sponsored organizations, including an “AI Safety Fundamentals” curriculum developed by an OpenAI employee.
Interested students join reading groups where they get free copies of books like “The Precipice,” and may spend hours reading the latest alignment papers, posting career advice on the Effective Altruism forum, or adjusting their P(doom), a subjective estimate of the probability that advanced AI will end badly. The grants, travel, leadership roles for inexperienced graduates and sponsored co-working spaces build a close-knit community.
Edwards discovered that shared online forums function like a form of peer review, with authors changing their original text in response to the comments.
“It’s really readable writing, which is great,” Edwards said, but it bypasses the precision of vetting ideas through experts. “There’s a kind of alternate universe where the academic world is being cut out.”
Edwards’s first book was on the military origins of AI and he recently served on the United Nations’ chief climate panel, leaving him too rooted in real-world science and politics to entertain the kind of dorm-room musings accepted at face value in the forums.
Could AI take over all the computers necessary to end humanity? “Not happening,” Edwards said. “Too many humans in the loop. And there will be for 20 or 30 years.”
Since the launch of ChatGPT in November, discussion of AI safety has exploded at a dizzying pace. Corporate labs that view advanced artificial intelligence as inevitable and want the social benefits to outweigh the risks are increasingly touting AI safety as the antidote to the worst feared outcomes.
At Stanford, Mukobi has tried to capitalize on the sudden interest.
After Yoshua Bengio, one of the “godfathers” of deep learning, signed an open letter in March urging the AI industry to hit pause, Mukobi sent another email to Stanford student listservs warning that AI safety was being eclipsed by rapid advances in the field. “Everyone” is “starting to notice some of the consequences,” he wrote, linking each word to a recent op-ed, tweet, Substack post, article or YouTube video warning about the perils of unaligned AI.
By then, SAIA had already begun its second set of student discussions on introductory and intermediate AI alignment, which 100 students have completed so far.
“You don’t get safety by default, you have to build it in — and nobody even knows how to do this yet,” he wrote.
In conversation, Mukobi is patient and more measured than in his email solicitations, cracking the occasional self-deprecating joke. When told that some consider the movement cultish, he said he understood the concerns. (Some EA literature also embraces nonbelievers. “You’re right to be skeptical of these claims,” says the homepage for Global Challenges Project, which hosts three-day expenses-paid workshops for students to explore existential risk reduction.)
Mukobi feels energized about the growing consensus that these risks are worth exploring. He heard students talking about AI safety in the halls of Gates, the computer science building, in May after Geoffrey Hinton, another “godfather” of AI, quit Google to warn about AI. By the end of the year, Mukobi thinks the subject could be a dinner-table topic, just like climate change or the war in Ukraine.
Luby, Edwards’s teaching partner for the class on human extinction, also seems to find these arguments persuasive. He had already rearranged the order of his AI lesson plans to help students see the imminent risks from AI. No one needs to “drink the EA Kool-Aid” to have genuine concerns, he said.
Edwards, on the other hand, still sees things like climate change as a bigger threat than rogue AI. But ChatGPT and the rapid release of AI models have convinced him that there should be room to think about AI safety.
Interest in the topic is also growing among Stanford faculty members, Edwards said. He noted that a new postdoctoral fellow will lead a class on alignment next semester in Stanford’s storied computer science department.
The course will not be taught by students or outside experts. Instead, he said, it “will be a regular Stanford class.”