
Introducing the New Version of Stable Diffusion: What’s New and Improved?

Our steadfast companion has undergone a makeover. The new version of Stable Diffusion has a new haircut and a mysterious smile. With its latest iterations, including Stable Cascade, Stable Video Diffusion, and the still-popular SDXL 1.0, this tool not only changes the production pipeline of digital art but also invites us to rethink our relationship with technology and the entertainment industry.

What is the New Version of Stable Diffusion?

Stable Cascade is a text-to-image model that uses a three-stage approach to generate images from various types of prompts, such as text, sketches, or images. It is exceptionally easy to train and finetune on consumer hardware, and produces remarkable outputs with fewer resources than other models.

Stable Video Diffusion is an image-to-video model that transforms still images into high-resolution videos. It represents a significant advancement in AI-driven content creation, with applications in sectors like advertising, education, and entertainment.

SDXL 1.0 is a text-to-image model that produces realistic and diverse images from a text prompt. According to human evaluations, it is the best open model for image generation: it can render high-quality images in virtually any art style and leads open models in photorealism.


The Würstchen Architecture

Stable Cascade is the latest AI image generation model from Stability AI, built on the Würstchen architecture. The model is extremely easy to run and train on consumer-grade hardware, and one of its most significant advantages is its low training cost, achieved without compromising quality or speed.

Stable Cascade consists of three models, Stage A, Stage B, and Stage C, which form a cascade for generating images. It works in a highly compressed latent space, encoding a 1024×1024 image into 24×24 latents, a spatial compression factor of roughly 42 (compared to 8 for Stable Diffusion's VAE). This is what lets it achieve remarkable outputs while using fewer resources than models like Stable Diffusion.

Stable Cascade also performs well in both prompt alignment and aesthetic quality, according to human evaluations. The model can generate images from various types of prompts, such as text, sketches, or images. It can also be customized with extensions like finetuning, LoRA, ControlNet, and more.

Stable Cascade is currently in research preview and is available for non-commercial use only. You can find more information and examples on the Stability AI website or the GitHub repository.

Three Stages, One Vision

  1. Stage C: The Latent Generator
    • Generation actually begins here. Stage C is the text-conditional latent generator: it transforms user prompts into compact 24×24 latents, the condensed building blocks for everything that follows.
    • Because it decouples text-conditional generation from pixel-space decoding, Stage C can be trained and fine-tuned on its own, at a reported 16x cost reduction compared to training a similarly sized Stable Diffusion model.
  2. Stage B: Latent Decoding
    • Stage B is a diffusion model that decompresses the tiny Stage C latents back toward a richer latent representation, restoring the detail that such aggressive compression leaves out.
  3. Stage A: Pixel Decoding
    • Finally, Stage A decodes that representation into the full 1024×1024 image. The results are nothing short of remarkable: high-quality outputs that defy hardware limitations. (The full C → B → A flow is sketched in code below.)
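To make the three-stage flow concrete, here is a minimal sketch using Hugging Face's diffusers library (assuming a recent release that ships the Stable Cascade pipelines). The prior pipeline runs Stage C, and the decoder pipeline runs Stages B and A:

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

# Stage C: the text-conditional prior that produces the compact 24x24 latents.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
# Stages B + A: decode those latents back into a full-resolution image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
)

prompt = "an anthropomorphic fox wearing a suit, studio lighting"

prior.enable_model_cpu_offload()  # helps fit on consumer GPUs
prior_output = prior(prompt=prompt, height=1024, width=1024, num_inference_steps=20)

decoder.enable_model_cpu_offload()
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
image.save("stable_cascade.png")
```

Notice that the prompt conditions Stage C, which does the heavy lifting; this is also why fine-tuning efforts (and the cost savings) concentrate on the prior.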

Ease of Training and Customization

Stable Cascade doesn’t just dazzle with its architecture; it’s also remarkably user-friendly. Training and fine-tuning become accessible even on consumer hardware. Stability AI generously provides training and inference code on their GitHub page, empowering users to customize the model to their heart’s content.

Research Preview and Beyond

As of now, Stable Cascade is in research preview. Its hierarchical compression of images opens up new possibilities, making it a compelling choice for creators, researchers, and visionaries alike.

In conclusion, Stable Cascade isn’t just another model—it’s a leap forward. Keep an eye on its progress, and who knows? Perhaps your next creative project will be powered by Stable Cascade’s latent magic.

What Is Stable Video Diffusion?

Stable Video Diffusion is a part of Stability AI’s suite of generative models, capable of transforming still images into high-resolution videos. It represents a significant advancement in AI-driven content creation, offering applications in various sectors like advertising, education, and entertainment.

Stable Video Diffusion is based on the image model Stable Diffusion, which uses a diffusion process to generate realistic and diverse images from text or image prompts. Stable Video Diffusion extends this process to the temporal domain, generating 2-4 second videos conditioned on an input image.
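To make the image-to-video step concrete, here is a minimal sketch using Hugging Face's diffusers library (assuming a recent release that includes StableVideoDiffusionPipeline and a GPU with enough VRAM; the input path is a placeholder):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video checkpoint in half precision to save memory.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # offload submodules to fit on consumer GPUs

# The video is conditioned on a single still image.
image = load_image("input.png").resize((1024, 576))

frames = pipe(image, decode_chunk_size=4).frames[0]  # a list of PIL frames
export_to_video(frames, "output.mp4", fps=7)         # ~25 frames, a few seconds of video
```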

Stable Video Diffusion is currently available for use under a non-commercial community license, which includes the use and content restrictions found in Stability’s Acceptable Use Policy. You can find more information and examples on the Stability AI website or the Hugging Face documentation.

Stable Video Diffusion: AI Video Will Transform the Film Industry

Discover how Stable Video Diffusion, powered by AI, is set to revolutionize the film industry. Create stunning videos with just a few clicks.

The Architecture

  1. Temporal Consistency
    • Stable Video Diffusion maintains temporal consistency throughout the video generation process. Each frame transitions smoothly into the next, ensuring a coherent visual narrative.
    • Imagine a serene landscape transforming from dawn to dusk—the sun rising, shadows shifting, and leaves rustling—all seamlessly stitched together.
  2. Latent Dynamics
    • The heart of Stable Video Diffusion lies in its latent space. Think of it as a hidden dimension where the dynamics of the scene reside.
    • By capturing these latent dynamics, the model orchestrates graceful transitions, whether it’s a flower blooming, a wave crashing, or a dancer twirling.
  3. Fine-Tuning Flexibility
    • Stability AI, the brains behind this innovation, has made fine-tuning accessible. Creators can shape their videos with precision, adjusting parameters to match their artistic vision.
    • Whether you’re an animator, a filmmaker, or a content creator, Stable Video Diffusion empowers you to breathe life into your visual stories.

Applications and Impact

  1. Artistic Expression
    • Poets have long painted vivid imagery with words. Now, imagine turning a poetic stanza into a moving canvas. Stable Video Diffusion allows artists to animate their visions, blurring the lines between stillness and motion.
  2. Visual Effects
    • Hollywood thrives on visual effects. Stable Video Diffusion adds a new tool to the filmmaker’s arsenal. From mind-bending illusions to realistic simulations, it’s a game-changer.
    • Picture a dragon taking flight, its scales shimmering, or a futuristic cityscape bustling with hovercars—all brought to life by Stable Video Diffusion.
  3. Educational Content
    • Learning becomes dynamic when concepts unfold before our eyes. Whether it’s explaining the water cycle, the solar system, or historical events, Stable Video Diffusion enhances educational content.
    • Students can watch molecules collide, continents drift, or historical figures come alive—an immersive learning experience.

What Is Stable Zero123?

Stable Zero123 is an advanced AI model specialized in generating 3D objects. Developed by Stability AI, it stands out for its ability to accurately interpret how objects should appear from various perspectives, a significant advancement in 3D visualization.

Stable Zero123 is Stability AI's model for view-conditioned image generation, based on Zero123, an earlier model for synthesizing novel views of an object from a single image. Combined with Score Distillation Sampling, it can produce high-quality 3D models from any input image (or from text, via an intermediate image), and it supports threestudio for open research in 3D object generation. You can find more information and examples on the Stability AI website or the Hugging Face documentation.

  1. View-Conditioned Image Generation:
    • Stable Zero123 builds upon the foundation of Zero123, an earlier model for novel view synthesis.
    • It focuses on view-conditioned image generation, meaning it produces novel views of 3D objects based on specific angles or perspectives.
  2. Improved Data Rendering and Conditioning:
    • Stability AI enhanced the data rendering process and fine-tuned the model conditioning strategies.
    • As a result, Stable Zero123 demonstrates improved performance compared to the original Zero123 and its subsequent iteration, Zero123-XL.
  3. Score Distillation Sampling (SDS):
    • By using SDS along with Stable Zero123, high-quality 3D models can be generated from any input image (a simplified sketch of the SDS idea follows this list).
    • The process can also extend to text-to-3D generation by first generating a single image with SDXL Turbo and then using SDS on Stable Zero123 to create the 3D object.
  4. Open Research and Threestudio Integration:
    • Stability AI supports open research in 3D object generation.
    • Stable Zero123 is compatible with threestudio, allowing researchers and creators to explore and experiment with 3D mesh generation.
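For the curious, here is a heavily simplified sketch of what a single SDS optimization step looks like in PyTorch. Everything here is illustrative rather than Stability's actual code: render, unet, cond, and alphas_cumprod stand in for a real differentiable renderer (for example, one from threestudio) and a view-conditioned diffusion model such as Stable Zero123:

```python
import torch

def sds_step(scene_params, render, unet, cond, alphas_cumprod, optimizer):
    """One illustrative Score Distillation Sampling step (hypothetical helpers)."""
    x = render(scene_params)                    # differentiable render, shape (B, C, H, W)
    t = torch.randint(20, 980, (1,))            # random diffusion timestep
    a_bar = alphas_cumprod[t].view(1, 1, 1, 1)  # cumulative noise schedule at t
    noise = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise  # noise the render

    with torch.no_grad():                       # no backprop through the diffusion model
        eps_pred = unet(x_t, t, cond)           # model's estimate of the added noise

    w = 1 - a_bar                               # a common SDS weighting choice
    grad = w * (eps_pred - noise)               # the SDS gradient on the rendered image
    x.backward(gradient=grad)                   # push it into the 3D scene parameters
    optimizer.step()
    optimizer.zero_grad()
```

The key trick is that the diffusion model is never differentiated through; it only scores how plausible the noised render looks, and that score steers the 3D scene.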

Licensing and Usage

  • Stable Zero123 comes in two versions:
    1. Stable Zero123: Includes some CC-BY-NC 3D objects and is suitable for non-commercial and research purposes.
    2. Stable Zero123C (“C” for “Commercially-available”): Trained only on CC-BY and CC0 3D objects, it can be used commercially with an active Stability AI membership.

Stable Zero123 and Stable Video Diffusion come pre-installed in WebUI Forge. Learn how to install it below and try them out.

How to Install WebUI Forge: A Faster Way to Use Stable Diffusion

Installing WebUI Forge for Stable Diffusion requires solid groundwork. If you’ve been following our guide series, you’ve likely laid down this essential foundation. This tutorial builds upon the preparatory steps detailed in our previous post, so you can learn how to install WebUI Forge for Stable Diffusion.

SDXL Turbo is a new text-to-image model that can generate realistic and diverse images from a text prompt in a single network evaluation. It is based on a novel distillation technique called Adversarial Diffusion Distillation (ADD), which leverages a large-scale image diffusion model as a teacher signal and combines it with an adversarial loss to ensure high image fidelity. SDXL Turbo is developed by Stability AI and is currently available for non-commercial research use only. You can test SDXL Turbo on Stability AI’s image editing platform Clipdrop.
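Because SDXL Turbo collapses generation into one step, using it is unusually simple. Here is a minimal sketch with Hugging Face's diffusers (assuming a recent release; the prompt is just an example). Note that guidance is disabled and only a single inference step is used:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# A single network evaluation: one step, no classifier-free guidance.
image = pipe(
    prompt="a cinematic photo of a lighthouse at dusk",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("turbo.png")
```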

SDXL 1.0 is a text-to-image generative model developed by Stability AI that produces realistic and diverse images from a text prompt. Unlike SDXL Turbo, it is a full latent diffusion model rather than a distilled one, and it is one of the largest and best open models for image generation, according to human evaluations. You can test SDXL 1.0 on Stability AI’s image editing platform Clipdrop.

SDXL 1.0 has one of the largest parameter counts of any open access image model, boasting a 3.5B parameter base model and a 6.6B parameter model ensemble pipeline. The full model consists of a mixture-of-experts pipeline for latent diffusion: In the first step, the base model generates (noisy) latents, which are then further processed with a refinement model specialized for the final denoising steps. Note that the base model can also be used as a standalone module.
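Here is what that handoff looks like in practice, sketched with Hugging Face's diffusers (the 80/20 split is a common choice, not a fixed requirement): the base model runs the first 80% of the denoising schedule and emits latents, which the refiner then finishes:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "a stylish woman with a black hat reflected in a mirror, surrounded by lights"

# Base model: the first 80% of the schedule, handing off noisy latents.
latents = base(
    prompt=prompt, num_inference_steps=40, denoising_end=0.8, output_type="latent"
).images
# Refiner: specialized for the final denoising steps.
image = refiner(
    prompt=prompt, num_inference_steps=40, denoising_start=0.8, image=latents
).images[0]
image.save("sdxl.png")
```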

According to human evaluations, SDXL 1.0 is the best open model for image generation. It produces high-quality images in virtually any art style and sets the standard among open models for photorealism. It can also handle concepts that are notoriously difficult for image models to render, such as hands, legible text, and spatially arranged compositions.

SDXL 1.0 is openly released. You can test it on Stability AI’s image editing platform Clipdrop, and you can find more information and examples on the Stability AI website or the Hugging Face documentation.

Key Features of Stable Diffusion XL

SDXL doesn’t just push the boundaries of photorealism; it shatters them, creating rich visuals with a heightened level of aesthetics. The model also offers enhanced image composition and face generation capabilities, transforming brief text prompts into vividly descriptive imagery. SDXL’s ability to produce legible text is yet another leap forward.

But SDXL isn’t confined to just text-to-image prompting. The model demonstrates impressive versatility, with features like image-to-image prompting (creating variations of a given image), inpainting (reconstructing missing parts of an image), and outpainting (extending an existing image seamlessly).
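As one example of these modes, here is a minimal inpainting sketch with diffusers, using the SDXL inpainting checkpoint published on the Hugging Face Hub (the file names are placeholders; the mask should be white wherever the image is to be regenerated):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Placeholder paths: white areas in the mask get repainted.
image = load_image("photo.png").resize((1024, 1024))
mask = load_image("mask.png").resize((1024, 1024))

result = pipe(
    prompt="a vintage red hat",
    image=image,
    mask_image=mask,
    strength=0.99,             # how aggressively to repaint the masked region
    num_inference_steps=20,
).images[0]
result.save("inpainted.png")
```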

SDXL in Action

Stability AI’s premium consumer imaging application, DreamStudio, is powered by SDXL, as are popular third-party apps like NightCafe Studio. The model has already made waves in the beta testing community, with users posting awe-inspiring imagery online and in community forums.

For those eager to harness the power of SDXL, the model’s weights are open-sourced. Like all of Stability AI’s open models, SDXL is designed for accessibility, opening the door for millions of talented individuals to create incredible things with this state-of-the-art model.

Learn More and Test Drive SDXL

Want to experience Stable Diffusion XL first-hand? Head over to DreamStudio, or get access to the Stable Diffusion XL API. To try out and install SDXL, follow the guide below.

How to Install SDXL 1.0 for Automatic1111: A Step-by-Step Guide

Welcome to this step-by-step guide on how to install SDXL 1.0 for Automatic1111. This post streamlines the installation process so you can quickly harness the power of this cutting-edge image generation model from Stability AI.
