On Thursday, OpenAI announced Sora, a new model that creates high-definition videos of up to a minute from text prompts. Sora, which means “sky” in Japanese, won’t be available to the general public anytime soon. Instead, OpenAI is making it available to a small group of academics and researchers who will assess its potential for harm and misuse.
“Sora is capable of creating complex scenes with multiple characters, specific types of movement, and precise details of subject and background,” the company said on its website. “The model not only understands what the user requested in the prompt, but also how those things exist in the physical world.”
One of the videos created by Sora, which OpenAI shared on its website, shows a couple walking through a snowy Tokyo as cherry blossom petals and snowflakes drift around them.
Another shows realistic-looking woolly mammoths walking across a snowy meadow against a backdrop of snow-capped mountain ranges.
Prompt: “Several giant woolly mammoths approach, walking across a snowy meadow, their long woolly fur blowing lightly in the wind as they walk, snow-covered trees and dramatic snow-capped mountains in the distance, mid-afternoon light with wispy clouds and a sun high in the distance… pic.twitter.com/Um5CWI18nS
— OpenAI (@OpenAI) February 15, 2024
OpenAI says the model works as a result of “deep language understanding,” which allows it to accurately interpret text prompts. Still, like almost all AI image and video generators we’ve seen, Sora isn’t perfect. In one example, a prompt asking for a video of a Dalmatian looking out a window and people “walking and cycling on canal streets” yields a video that omits the people and streets entirely. OpenAI also warns that the model may struggle to understand cause and effect: for example, it may generate a video of a person eating a cookie in which the cookie shows no bite marks afterward.
Sora isn’t the first text-to-video model. Other companies, including Meta, Google and Runway, have either developed text-to-video tools or made them available to the public. Still, no other tool can currently create videos as long as 60 seconds. Sora also generates entire videos at once, rather than stitching them together frame by frame like other models, which helps keep the subjects in a video consistent even when they are temporarily out of sight.
The rise of text-to-video tools has raised concerns that they will make it easier to create realistic-looking fake footage. “I am absolutely terrified that this kind of thing will sway a narrowly contested election,” Oren Etzioni, a professor of artificial intelligence at the University of Washington and founder of True Media, an organization working to identify disinformation in political campaigns, told The New York Times. And generative AI more broadly has faced backlash from artists and creative professionals concerned about the technology being used to replace jobs.
OpenAI said it is working with experts in areas such as disinformation, hateful content and bias to test the tool before releasing it to the public. The company is also building tools that can detect Sora-generated videos and will embed metadata in generated videos to make them easier to identify. The company declined to tell the Times how Sora was trained, except to say that it used both “public videos” and videos licensed from copyright holders.