Sora builds on the tech behind OpenAI’s image-generating DALL-E tool. It interprets a user’s prompt, expanding it into a more detailed set of instructions, and then uses an AI model trained on video and images to create the new video.
The quality of AI-generated images, audio and video has rapidly increased over the past year, with companies like OpenAI, Google, Meta and Stability AI racing to make more capable tools and find ways to sell them. At the same time, democracy advocates and AI researchers have warned that the tools are already being used to trick and lie to voters.
This isn’t the first time such videos or audio have been created, and other companies have built their own text-to-video AI generators. Google is testing one called Lumiere, Meta has a model called Emu, and AI start-up Runway has already been building products to help filmmakers create videos. But AI experts and analysts said the length and quality of the Sora videos went beyond what has been seen up to now.
“I didn’t expect this level of sustained, coherent video generation for another two to three years,” said Ted Underwood, a professor of information science at the University of Illinois at Urbana-Champaign. While he cautioned that OpenAI probably chose videos that show the model at its best, he said that “it seems like there’s been a bit of a jump in capacity” from other text-to-video tools.
In Pakistan, former prime minister Imran Khan has used AI to create a digital version of himself giving speeches, even though he is in prison. An ad supporting Florida Gov. Ron DeSantis’s now-defunct campaign for the Republican presidential nomination used an AI audio generator to mimic the voice of former president Donald Trump.
The tech companies building the tools say they are monitoring how their tools are used and have instituted some policies against using them to produce political content. But enforcement is spotty. In January, OpenAI suspended a developer who had made a bot impersonating Democratic presidential candidate Dean Phillips, but only after a report in The Washington Post. The developer had made similar bots of political candidates in the fall.
The rapid improvement in the technology is sending people in a wide variety of industries, from filmmaking to the news business, scrambling to understand how it might affect their work.
AI video generators have already caused a stir in Hollywood. Making films is expensive, time-consuming, and requires dozens or hundreds of people. Some technologists have theorized that AI could allow a single person to make a film with the same visual complexity as a Marvel blockbuster.
“Look where we’ve come just in a year of image generation. Where are we going to be in a year?” said Michael Gracey, a film director and visual effects expert who has been following AI’s impact on the industry closely. Gracey predicts that soon AI tools like Sora will allow filmmakers to carefully control their output, creating all sorts of videos from scratch.
“They won’t need a team of 100 or 200 artists over a three-year period to make their animated feature,” he said. “To me, that’s exciting.”
At the same time, Gracey said, the fact that AI tools are trained on the work of real-life artists without compensating them is a big problem. “It’s not great when it’s taking other people’s creativity and work and ideas and execution, and not giving them the due credit and financial remuneration which they deserve.”
Mutale Nkonde, a visiting policy fellow at the Oxford Internet Institute, said the idea that anyone can readily turn text into video is exciting. But she worries about how these tools might embed societal biases, their impacts on people’s livelihoods, and their ability to turn hateful texts or descriptions of harrowing real-world events into distressingly realistic footage.
Recent strikes by writers and actors guilds, Nkonde said, began to address questions about the use of AI language tools in screenwriting and the use of actors’ likenesses in AI-generated scenes. But she said tools like Sora raise new questions, such as whether human extras will even be needed. “From a policy perspective, do we need to start thinking about ways we can protect humans that should be in the loop when it comes to these tools?”
The quality of the Sora videos, especially the ones meant to look like real life, is higher than what most other AI companies have been able to produce so far.
Arvind Narayanan, a professor of computer science at Princeton University, said Sora “appears to be significantly more advanced than any other video generation tool,” based on the videos that OpenAI released Thursday. He said that is likely to result in “deepfake” videos that are harder for people to recognize as AI-generated.
If you look closely at some of the videos, he said, you can still spot numerous inconsistencies. For instance, he pointed out in a post on X that a woman’s right and left legs switch places in the video of a Tokyo street and people in the background disappear after something passes in front of them.
Still, a casual viewer might not notice such details, he added. “Sooner or later, we need to adapt to the fact that realism is no longer a marker of authenticity.”