Sora is an OpenAI artificial intelligence model capable of creating hyper-realistic videos from a text description. We’ll tell you everything you need to know about this tool.
One of the most significant innovations OpenAI has introduced this year is Sora, an artificial intelligence model that generates hyper-realistic videos from a text description. Even in its initial version, Sora is powerful enough to prompt a reevaluation of the authenticity of online content, and it has the potential to profoundly transform the content creation and entertainment industries.
Sora works much like other generative AI tools such as DALL-E: users of the new OpenAI platform simply enter a text description, or prompt, and the application handles the rest. With Sora, however, the output is not a still image but a video up to 60 seconds long in Full HD (1080p).
Sora impresses with the quality and detail of its videos and can interpret extremely precise and technical instructions, but what truly sets it apart is how quickly OpenAI has advanced the capabilities of AI-generated clips. Just a year ago, creating such realistic visual content from text seemed implausible.
As YouTuber Marques Brownlee noted some time ago, the contrast between the AI-generated video of Will Smith eating spaghetti and what Sora can now accomplish is striking. Other early attempts, such as the effort to recreate Heidi with AI or the beer advertisement described as “fantastically horrendous,” further underscore how much progress has been made.
- What is Sora?
- How to use Sora?
- How Sora has been trained: Between Secrecy and Controversy
- AI seeks a place in Hollywood
- When will Sora be Available?
What is Sora?
Sora was announced in mid-February, taking the public by surprise. OpenAI had just come through an extremely turbulent end to 2023, following the dismissal and subsequent reinstatement of Sam Altman as CEO, and nobody expected the startup to shake up the market again just a few months later with an app this impressive.
At the end of January, researchers from Google Research and other institutions presented Lumiere, an AI for generating videos and animated images from text. It could create clips up to 5 seconds long (80 frames at 16 frames per second) at a resolution of 1024 x 1024 pixels. The results were… interesting. But Sora’s arrival just a few weeks later exposed Lumiere’s limitations and showed that OpenAI’s technology was playing in another league.
Since Sora’s announcement, several applications have appeared that try to position themselves as viable alternatives, but most leave a lot to be desired. So far, the only tools presented as potential contenders are Haiper and Stable Video 3D. However, the former can only create videos 2 or 4 seconds long, and the latter specializes in generating multiple 3D views of the same object.
How to use Sora?
As noted at the beginning, Sora works like other generative AI applications and tools, whether or not they come from OpenAI. Users write a text description, and the model converts it into a video up to one minute long at 1080p resolution.
Following the announcement, Sam Altman showed off Sora’s capabilities on X (formerly Twitter), creating clips from prompts suggested by his followers. OpenAI’s CEO boasted that the technology could handle any request, no matter how complex.
This is because the video generator uses the same recaptioning technique as DALL-E 3: the visual material used to train Sora was paired with extremely detailed text descriptions, which helps the model follow user instructions more faithfully.
As OpenAI explains: “Like GPT models, Sora uses a transformer architecture, unlocking superior scaling performance. We represent videos and images as collections of smaller data units called patches, each of which is similar to a token in GPT. By unifying the way we represent data, we can train diffusion transformers on a broader range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios.”
While Sora’s main feature is creating videos from a text description, it is not limited to that. The model can also animate an existing still image and generate additional frames to extend the duration of a video. The latter has great potential in content creation and production environments, and Adobe intends to integrate this technology into Premiere Pro.
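To make the patch idea more concrete, here is a minimal Python sketch of cutting a video into non-overlapping spacetime patches and flattening each one into a vector, the rough equivalent of a token. OpenAI has not published Sora’s code, so this is purely illustrative: the function name `patchify`, the patch sizes, and the single-channel video stored as nested lists are all hypothetical assumptions, and OpenAI’s technical report describes patching a compressed latent representation of the video rather than raw pixels.

```python
from typing import List

# Hypothetical representation (an assumption, not Sora's format):
# a single-channel video indexed as video[frame][row][col].
Video = List[List[List[float]]]


def patchify(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a video into non-overlapping spacetime patches of pt frames x ph rows x pw columns.

    Each patch is flattened into a 1-D list, playing the role of a token.
    """
    num_frames, height, width = len(video), len(video[0]), len(video[0][0])
    if num_frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")

    tokens = []
    # Walk the patch grid in time, then row, then column order.
    for t0 in range(0, num_frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Gather every pixel inside this spacetime block into one flat vector.
                tokens.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return tokens


if __name__ == "__main__":
    # A toy 4-frame, 4 x 6 clip whose pixel values encode their (frame, row, col) position.
    clip = [[[float(100 * t + 10 * y + x) for x in range(6)] for y in range(4)] for t in range(4)]
    tokens = patchify(clip, pt=2, ph=2, pw=2)
    print(len(tokens), len(tokens[0]))  # 12 tokens of 8 values each
```

The number of tokens simply scales with the input: the 4-frame, 4 x 6 toy clip yields 12 tokens of 8 values each, while a clip with a different length or aspect ratio just yields a different count. That flexibility is what the quote refers to when it mentions training on varied durations, resolutions, and aspect ratios.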
How Sora has been trained: Between Secrecy and Controversy
The speed with which Sora has advanced AI video generation is raising suspicions. OpenAI has already been accused online of using copyrighted material to train the model, although this has not been proven.
Speculation resurfaced in March after Mira Murati, the startup’s chief technology officer, gave an interview to The Wall Street Journal. Asked whether Sora had been trained on videos taken from YouTube, Facebook, and Instagram, she said she didn’t know. She then tried to sidestep the question by asserting that the company had used publicly available and licensed data.
Even so, details about where Sora’s training data came from remain scarce. The only certainty is that OpenAI has partnered with Shutterstock, one of the world’s leading providers of stock photos and videos.
Meanwhile, YouTube issued a warning to Sam Altman’s team: using its videos to train Sora would break the rules. Neal Mohan, the platform’s CEO, acknowledged that he had no firsthand knowledge of whether OpenAI was violating its guidelines. Even so, he took the opportunity to remind the San Francisco company that YouTube’s terms of use prohibit using full videos to train AI models, and that they also forbid scraping clip fragments or audio transcriptions.
AI seeks a place in Hollywood
One of OpenAI’s main goals is for Sora to carve out a place in Hollywood. The company has already begun lobbying major film and TV studios to give artificial intelligence a chance, and members of the startup have also met with talent agencies and other industry executives to promote the tool.
In addition, Sam Altman reportedly gave a small number of top filmmakers and performers access to Sora so they could test the technology and assess its potential for future series or films. Of course, the use of AI in large productions remains controversial and fueled heated debate during the 2023 Hollywood writers’ and actors’ strikes.
This is a topic that promises to generate much more discussion throughout 2024.
When will Sora be Available?
The million-dollar question: when will Sora be available? Although the public cannot access it yet, this could change in the not-too-distant future. Mira Murati has said that OpenAI plans to launch the video generator this year, although she did not give a specific date; in March, she said the launch would come in “a few months.”
For now, the tool remains in testing. The Californian firm is working with designers, filmmakers, and visual artists to fine-tune the technology as much as possible. Major AI companies are under intense scrutiny over the use of their technology to create deepfakes, so releasing something as powerful as Sora to the general public is not a decision to be taken lightly.
When Sora becomes available, the model is expected to include safeguards that prevent it from generating videos of public figures such as politicians, singers, actors, activists, and business leaders. On the technical side, the developers hope to enable the model to generate clips with audio, something it cannot yet do.
We’ll see what new developments come on this front. But prepare your wallet, because Sora is unlikely to be free. Murati warned that training it is much more expensive than training other AI models, and OpenAI’s idea would be to offer access at a price similar to that of the DALL-E API. We’ll see whether this materializes.