As the holiday season unfolds, OpenAI has announced a new campaign called the “12 Days of OpenAI,” featuring daily announcements, launches, and demonstrations in artificial intelligence. Among these, the release of Sora Turbo, the next generation of its AI video model, has emerged as a highlight of the event. This article takes you through the significance of Sora Turbo and its implications for the AI community.
The Festive Unveiling of Sora Turbo
On day three of the “12 Days of OpenAI” campaign, which began on December 9th, OpenAI announced Sora Turbo, an advanced version of its video generation model. This successor to the model first introduced in February is faster, cheaper, and more capable. These new possibilities in AI-driven video creation are open to ChatGPT Plus and Pro users.
Sora Turbo creates videos from text prompts or user-provided assets. It can generate high-quality video at up to 1080p resolution, and its easy-to-use interface lets people experiment with aspect ratios and video lengths, making it versatile for different creative needs. The model also ships with a feature-rich explore page, where users can browse and learn from other people’s work in real time.
Credits: https://www.youtube.com/@OpenAI
The Evolution of AI Video Generation
Sora Turbo is the next big step in video generation technology. It can produce spatially coherent 360-degree panoramas that viewers can explore. In addition, OpenAI has added a storyboard feature that lets users make frame-by-frame adjustments to create seamless sequences.
During the live demo, Sora Turbo impressed with its ability to create realistic, visually stunning video outputs. Whether one is crafting a cinematic sequence or remixing existing footage, Sora represents the growing capabilities of generative AI for video production.
OpenAI has put strict safeguards in place to prevent misuse of Sora Turbo. The model cannot generate sexual deepfakes or CSAM. All Sora-generated videos carry metadata and watermarks that indicate the content’s origin. Limitations still persist (for example, the model struggles with long, complex actions), but OpenAI expects to address these in future versions.
The launch of Sora Turbo coincides with other exciting announcements during the campaign, such as OpenAI’s expansion of its Reinforcement Fine-Tuning Research Program. This initiative allows developers to fine-tune AI models for domain-specific tasks, paving the way for more tailored and efficient AI applications.
Measures to Ensure Responsible AI Deployment
Before its release, Sora underwent exhaustive safety testing. OpenAI worked closely with a specialized group of red teamers with expertise in areas such as misinformation, hateful content, and bias. These experts tested the model adversarially, exposing vulnerabilities and helping to mitigate risk. The company is also creating advanced tools to detect misleading content, including a detection classifier that can determine whether a video was created by Sora. OpenAI has committed to transparency and plans to include C2PA metadata in future releases should the model be used in other OpenAI products.
OpenAI is building on the safety mechanisms it developed for earlier models, such as DALL·E 3, adapting them for Sora. For instance, OpenAI’s text classifiers screen and reject input prompts that violate usage policies, such as requests for extreme violence, sexual imagery, or the use of celebrities’ likenesses. Beyond that, robust image classifiers check each frame of a generated video before a user can access it, ensuring it meets OpenAI’s safety standards.
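To illustrate the screening step described above, here is a minimal sketch of a prompt filter. It is a toy stand-in only: OpenAI’s real classifiers are trained models, not keyword lists, and the `BLOCKED_TERMS` set and `screen_prompt` function below are hypothetical names invented for this example.

```python
# Toy stand-in for a policy classifier. A production system would use a
# trained text classifier; this sketch uses a simple keyword check.
BLOCKED_TERMS = {"violence", "gore", "deepfake"}  # hypothetical categories

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may proceed to video generation."""
    words = set(prompt.lower().split())
    return not (words & BLOCKED_TERMS)

print(screen_prompt("a cat surfing a wave at sunset"))  # True
print(screen_prompt("graphic violence in a city"))      # False
```

The point is the pipeline shape, not the filter itself: every prompt passes through a screening gate before any video is generated, and rejected prompts never reach the model.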
Reaching Out to the Global Community
OpenAI is also reaching out to policymakers, educators, and artists around the world to discuss concerns and identify positive use cases for Sora. Despite its best efforts in research and testing, OpenAI acknowledges that it cannot predict all possible applications of the technology, both good and bad. It therefore believes that learning from real-world usage is crucial for the ongoing improvement and safe deployment of AI systems. This feedback loop will help refine the technology over time and ensure it continues to improve responsibly.
The Technology of Sora
Sora is built on a diffusion model, which generates videos by starting from pure noise and gradually refining it through a series of steps until a coherent video emerges. It not only creates videos from scratch but also extends them, producing longer sequences from the original content. One of the most impressive features of Sora is its ability to maintain continuity in a video even when the subject temporarily goes out of view, an issue that has long challenged video generation models.
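The iterative denoising idea can be sketched in a few lines. This is not Sora’s actual implementation: a real diffusion model uses a trained neural network to predict and remove noise at each step, while the `denoise_step` below simply blends a noisy tensor toward a fixed target to show the loop structure.

```python
import numpy as np

def denoise_step(frames, step, total_steps):
    """Toy stand-in for a learned denoiser.

    A real diffusion model would predict the noise with a neural network
    and subtract it; here we just blend toward a fixed 'clean' video.
    """
    target = np.zeros_like(frames)          # pretend this is the coherent video
    alpha = 1.0 / (total_steps - step)      # blend harder as steps run out
    return (1 - alpha) * frames + alpha * target

def generate_video(num_frames=8, height=4, width=4, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    frames = rng.normal(size=(num_frames, height, width))  # start from noise
    for step in range(steps):                              # refine step by step
        frames = denoise_step(frames, step, steps)
    return frames

video = generate_video()
print(video.shape)          # (8, 4, 4)
print(np.abs(video).max())  # near zero: the noise has been refined away
```

The essential pattern survives the simplification: generation starts from random noise and each step removes a little more of it, until only structured content remains.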
Similar to the GPT models, Sora uses a transformer architecture, which allows it to scale efficiently. The model encodes videos and images as sequences of smaller units called patches, analogous to tokens in GPT. This unified representation lets Sora handle a vast array of visual data spanning varying lengths, resolutions, and aspect ratios, which makes it versatile.
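A minimal sketch of that patch-based representation follows. The patch sizes and the `video_to_patches` helper are illustrative choices, not Sora’s actual parameters; the point is how a video tensor becomes a flat sequence of patch vectors, the visual analogue of a token sequence.

```python
import numpy as np

def video_to_patches(video, pt=2, ph=4, pw=4):
    """Split a (T, H, W, C) video into a sequence of spacetime patches.

    Each patch covers pt frames x ph x pw pixels, flattened into one
    vector, so the video becomes a token-like sequence a transformer
    can consume.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    return (video
            .reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
            .transpose(0, 2, 4, 1, 3, 5, 6)   # group pixels by patch
            .reshape(-1, pt * ph * pw * C))   # one row per patch

video = np.zeros((8, 16, 16, 3))   # 8 frames of 16x16 RGB
tokens = video_to_patches(video)
print(tokens.shape)                # (64, 96): 64 patches of 96 values each
```

Because any video, whatever its length, resolution, or aspect ratio, reduces to such a sequence, the same transformer can train on and generate heterogeneous visual data.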
Sora also borrows techniques from earlier research on the DALL·E and GPT models. For example, it uses the recaptioning technique from DALL·E 3, in which highly descriptive captions are generated for the visual training data; this helps Sora adhere more closely to the user’s text instructions when creating videos and improves its fidelity and responsiveness.
Apart from creating videos from text prompts, Sora can also animate a still image with impressive detail. It can likewise extend an existing video or fill in missing frames while keeping the content consistent and coherent. This makes Sora a powerful tool for producing dynamic video content from static images or incomplete video sequences.
Sora – A Basis for Future AI Models
Sora acts as the base for future AI models that aim to understand and replicate reality. It is an essential step toward Artificial General Intelligence (AGI), an objective to which OpenAI is devoted. Sora brings us one step closer to a world in which AI can produce and transform video content from both visual and text inputs, simulating realistic environments and interactions in an adaptive manner.