Learn and Master the AnimateDiff User Interface in Automatic1111
What is AnimateDiff and How Does it Work?
To start, let me explain what AnimateDiff is and how it works. AnimateDiff is an extension for Automatic1111 that simplifies the creation of GIF animations from static images or text prompts. You can activate it within the Automatic1111 interface. Customizing your animations is made easy through several configurable parameters:
Motion Module:
This module, trained on an extensive dataset of images and videos, generates motion for your animations.
Number of Frames and Frames Per Second:
These parameters allow you to control the length and playback speed of your animations.
AnimateDiff operates through a neural network model, the motion module, which infuses your animations with lifelike and expressive movements. It can transfer motion between sources, for example applying the motion of a video to a still image, or adding motion to an image generated from a text prompt. The output of AnimateDiff is a GIF file composed of a sequence of frames that collectively bring your animation to life.
In the following sections, I’ll explain each of these parameters in detail and walk you through the AnimateDiff User Interface.
To access the AnimateDiff module, you should be in either the txt2img or img2img tab. Scroll down, and you’ll find a menu labeled “AnimateDiff.” Click on it to expand the user interface.
Motion module: This is the model that generates the motion for your animation. You can choose from different motion modules, each with its own style and effect. For example, mm_sd_v14.ckpt is the best of the official models, while TemporalDiff v1.0 adds some temporal consistency to the animation.

You can choose from either Motion Models or Motion LoRAs.
The difference between them is that motion models are the pre-trained neural networks that actually generate the motion for your animation from your image or text prompt, while motion LoRAs are small add-on weights, each trained for a specific camera movement, that you load on top of a motion model to steer the motion manually.
Motion Models are the core component of AnimateDiff, and they determine the style and quality of your animation. There are different motion models that you can choose from, each with its own strengths and weaknesses. For example, mm_sd_v14.ckpt is the best of the official models, but it may produce some artifacts or glitches in some cases.
mm_sd_v14.ckpt
mm_sd_v15.ckpt
mm_sd_v15_v2.ckpt
Motion LoRAs are a feature that lets you customize the camera motion of your animation. Each LoRA is trained for one movement, such as zooming, panning, tilting, or rolling, and you apply it with the standard LoRA prompt syntax, for example <lora:v2_lora_ZoomIn:0.8>, where the number controls the intensity of the motion. Motion LoRAs are designed for the v2 motion module (mm_sd_v15_v2.ckpt), and they override or enhance the motion that the model generates on its own.
To summarize, the motion model does the automatic work of creating motion, while motion LoRAs give you manual control over camera movement on top of it. You can use a motion model alone or combine it with one or more motion LoRAs, depending on your preference and creativity. I hope this clarifies the difference between them.
v2_lora_ZoomIn.ckpt
v2_lora_ZoomOut.ckpt
v2_lora_PanLeft.ckpt
v2_lora_PanRight.ckpt
v2_lora_TiltUp.ckpt
v2_lora_TiltDown.ckpt
v2_lora_RollingClockwise.ckpt
v2_lora_RollingAnticlockwise.ckpt
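To apply one of these, you load it in your prompt using the standard LoRA syntax, with the weight at the end controlling the intensity of the camera motion. The prompt text and weight below are only illustrative:

```
a sailboat drifting on a calm sea <lora:v2_lora_PanLeft:0.8>
```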
To understand more about the capabilities of these models, visit my dedicated blog on motion models, as there’s a substantial amount of information to explore.
Download and learn about the power of AnimateDiff Motion Models! Unlock the potential of motion dynamics and LoRA control for captivating animations.
Enable AnimateDiff is the checkbox that turns the extension on, so always make sure it is checked when you want to generate animated GIFs.
The ‘Number of frames’ parameter in AnimateDiff specifies how many frames will be generated in the output. The motion module is trained on 16-frame clips, so it gives the best results when the number of frames is set to 16.
Together with the FPS, the number of frames determines the length of the video and how smooth it appears: at a fixed FPS, more frames means a longer animation, and for a fixed duration, more frames means smoother motion. A higher number of frames also takes longer to generate; a lower number produces a shorter, less smooth video faster.
For example:
If you set the number of frames to 8, AnimateDiff will generate a video with 8 frames. This video will be shorter and less smooth than a video generated with 16 frames, but it will also be generated faster.
The standard film frame rate is 24 frames per second (FPS). If you want to determine the total number of frames for your video, simply multiply your chosen FPS by the duration in seconds.
For example, if you’re aiming for a 6-second video at 24 FPS, the calculation would be 24 (FPS) x 6 (seconds), resulting in 144 frames, which becomes your “Number of frames.”
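The same calculation in Python, handy as a quick sanity check (the variable names are mine, not the extension's):

```python
fps = 24          # target playback speed in frames per second
duration_s = 6    # desired clip length in seconds

number_of_frames = fps * duration_s
print(number_of_frames)  # 144 -> the value to enter as "Number of frames"
```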
Here are some examples of how different FPS values affect the generated GIFs:
Person running with FPS=8:
The animation is very choppy and unrealistic. The person seems to be teleporting from one position to another.
Person running with FPS=24:
The animation is smooth and realistic. The person seems to be running naturally and fluidly.
Person sleeping with FPS=8:
The animation is slow and dreamy. The person seems to be breathing softly and peacefully.
Person sleeping with FPS=24:
The animation is too fast and unnatural. The person seems to be hyperventilating and restless.
Display Loop Number is a parameter that you can use to control how many times the generated GIF will play. A value of 0 means the GIF will loop infinitely, while a positive value means the GIF will play that many times and then stop. For example, if you set Display Loop Number to 3, the GIF will play three times and then freeze on the last frame.
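For context, the loop count is stored inside the GIF file itself. Here is a minimal sketch using Pillow (my illustration, not the extension's code) showing how that value is set when saving a GIF:

```python
from PIL import Image

# Three solid-color frames stand in for real animation frames.
frames = [Image.new("RGB", (64, 64), c) for c in ("red", "green", "blue")]

frames[0].save(
    "out.gif",
    save_all=True,
    append_images=frames[1:],
    duration=125,  # milliseconds per frame (8 FPS)
    loop=0,        # 0 = loop forever; a positive value limits the replays
)
```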
Closed Loop lets you control how the animation loops back to the first frame. This can make your animation look more natural and smooth, especially if you have a high number of frames.
To use ‘Closed loop’, select one of the four modes listed below. This tells the extension how (and whether) to try to make the last frame of your animation lead back into the first frame:
N: No loop:
The animation will not loop back to the first frame.
R-P: Reduce loop – Prompt:
The extension will try to reduce the number of closed-loop contexts. The prompt travel will not be interpolated into a closed loop.
R+P: Reduce loop + Prompt:
The extension will try to reduce the number of closed-loop contexts, and the prompt travel will be interpolated into a closed loop.
A: Auto:
The extension will automatically decide whether to make a closed loop or not based on the prompt and the motion module.
When should you use it?
N: No loop:
You should use this option when you do not want your animation to loop back to the first frame. For example, if you want to animate a text prompt like “A rocket launching into space”, you might not want the rocket to come back to the ground after reaching the sky. In this case, you can use the N option to make your animation end with the rocket in space.
R-P: Reduce loop – Prompt:
You should use this option when you want to reduce the number of closed-loop contexts in your animation. This means that the extension will try to avoid making the animation look like it is repeating itself. For example, if you want to animate a text prompt like “A bird flying in the sky”, you might not want the bird to fly in a circle or a figure-eight pattern. In this case, you can use the R-P option to make your animation look more natural and random.
R+P: Reduce loop + Prompt:
You should use this option when you want to reduce the number of closed-loop contexts in your animation and also make the prompt travel a closed loop. This means that the extension will try to avoid making the animation look like it is repeating itself, while also making sure that the prompt ends where it started. For example, if you want to animate a text prompt like “A car driving around a city”, you might not want the car to drive in a straight line or a zigzag pattern, but you might want it to return to its original location. In this case, you can use the R+P option to make your animation look more realistic and consistent.
A: Auto:
You should use this option when you want the extension to automatically decide whether to make a closed loop or not based on the prompt and the motion module. This means that the extension will use its own logic and creativity to make your animation look as good as possible. For example, if you want to animate a text prompt like “A unicorn dancing on a rainbow”, you might not have a clear idea of how to make it loop or not. In this case, you can use the A option to let the extension surprise you with its output.
The context batch size determines how many frames are processed simultaneously. A higher context batch size will result in faster processing times, but it will also require more memory. Conversely, a lower context batch size will result in slower processing times but will require less memory.
How Context Batch Size affects AnimateDiff
Context Batch Size sets how many frames are fed through the motion module in a single pass. This group of frames is the temporal context the module uses to keep the motion coherent: the more frames it sees at once, the smoother and more consistent the result tends to be, but the more time and memory it takes. The best value depends on the motion module you are using, because each module is trained with a particular context length.

Each type of motion module works best with a certain number of frames. For example, SD1.5 modules work best with 16 frames, while SDXL and HotShotXL modules work best with 8 frames. You can choose the number yourself, or let the extension pick a suitable context length for your video.
Below is a test of Context Batch Size using SD 1.5
A low Context Batch Size can make the new video look choppy and inconsistent.

The movement will not be smooth and natural, and it may change suddenly or randomly. For example, with a low Context Batch Size, a video of a person dancing may look like the dancer is skipping or glitching, with moves that don't match the music. A low Context Batch Size can also make the new video look blurry or noisy: details and colors lose clarity and sharpness, and unwanted dots or lines can appear. In a landscape, the trees and mountains may look fuzzy and distorted, with visible artifacts. In short, a low Context Batch Size is faster to generate but lower in quality.

A high Context Batch Size can make the new video look smooth and consistent.

The movement will be fluid and natural and will follow the input video. The dancer will move gracefully and rhythmically, in time with the music. A high Context Batch Size can also make the new video look sharp and clean: details and colors stay clear and vivid, with no unwanted dots or lines, so the trees and mountains look crisp and realistic, free of artifacts and noise. In short, a high Context Batch Size gives better quality but is slower to generate.
The “Stride” parameter lets you adjust how much motion there is between frames in a video or animation. It works like this:
Stride 1:
This means the frames are very close together, and there is no gap or loop in the motion.
Stride 2:
This means the frames are two steps away from each other, and they move smoothly together.
Stride 4:
This means the frames are four steps away from each other, and they still match well.
In other words, Stride changes how big the steps are between the frames that are kept consistent with each other. Smaller stride values make the motion more fluid and detailed, while larger stride values make the motion more varied and simpler.
When do you use it? You would use the “Stride” parameter when you want to change how the motion looks in your video or animation. Here are some reasons why you might use it:
Fine-Tuning Animation Smoothness:
If you want your animation to be very smooth and consistent, you can use a Stride of 1 to make sure every frame follows the previous one.
Balancing Performance and Quality:
Changing the Stride can also help you balance the performance and the quality of your video or animation. Smaller Stride values might need more computing power but make the motion smoother, while larger Stride values might need less computing power but make the motion less smooth.
Creating Different Visual Effects:
You can also use different Stride values to create different visual effects in your animation. For example, a larger Stride can make the frames jump more, which might be good for some artistic or stylistic purposes.
Dealing with Limited Resources:
If you have limited computing resources, you might use a larger Stride to lower the processing demand while still getting good results.
To sum up, the Stride parameter helps you control how smooth and consistent the motion is between frames in videos or animations. You can use it depending on your goals, resources, and the visual effects you want in your project.
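To make this concrete, here is a small Python illustration (my own sketch, not the extension's actual scheduler) of which frames fall into a single context window at different stride values:

```python
batch_size = 8  # frames per context window

def frames_in_context(start, stride):
    # With a larger stride, a window of the same size spans frames
    # that are farther apart in time.
    return [start + i * stride for i in range(batch_size)]

print(frames_in_context(0, stride=1))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(frames_in_context(0, stride=2))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(frames_in_context(0, stride=4))  # [0, 4, 8, 12, 16, 20, 24, 28]
```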
The “Number of frames to overlap in context” parameter lets you control how many frames consecutive contexts (the groups of frames that provide reference information for generating new frames) share with each other. The aim is to give the generator a smooth and steady reference for making new frames.

Overlap setting: If you don’t give a value for “Overlap,” the default value is -1, which means the overlap will be “Context batch size // 4.” In other words, by default the number of shared frames is one-fourth of the context batch size.
Now, let’s understand the conditions and effects of this parameter:
The “Number of frames to overlap in context” parameter only comes into play under certain conditions: when the “Number of frames” (the total frames in your video or animation) is larger than the “Context batch size”; when “ControlNet” is enabled; or when the number of frames in the source video is larger than the “Context batch size.”

Basically, this parameter helps you control how consecutive contexts share frames, providing a steady reference for generating new frames whenever you have more frames than fit in a single context. The default value is one-fourth of the context batch size, but you can change it to fit your specific needs for smooth and steady frame generation.
When do you use it?
Suppose you want to create an animation from a single image of a flower. You use AnimateDiff to apply motion to the image, and you set the “Number of frames” to 100, the “Context batch size” to 20, and the “ControlNet” to off. This means that you want to generate 100 frames for your animation, and you want to use 20 frames as a reference or context for each new frame.
By default, the “Overlap” parameter is -1, which means that the number of frames to overlap in the context is one-fourth of the context batch size, or 5 frames. This means that for every new frame, the system will use the previous 20 frames as a context, and make sure that the last 5 frames in the context match with the first 5 frames in the next context.
For example, once the system moves past frame 20, the next context will start at frame 16 rather than 21, so frames 16-20 are shared between the two contexts. This shared overlap gives the system a smooth and consistent reference for generating new frames across the boundary.
However, you might want to change the “Overlap” parameter to suit your needs. For example, you might want to increase the overlap to make the animation smoother, or decrease the overlap to make the animation more varied. You can do this by specifying a value for the “Overlap” parameter.
For example, if you set the “Overlap” to 10, the system will make sure that the last 10 frames in the context match with the first 10 frames in the next context. This will increase the smoothness and consistency of the animation, but it might also reduce the variation and diversity of the motion. On the other hand, if you set the “Overlap” to 2, the system will make sure that the last 2 frames in the context match with the first 2 frames in the next context. This will decrease the smoothness and consistency of the animation, but it might also increase the variation and diversity of the motion.
In summary, the “Number of frames to overlap in context” parameter lets you control how much the frames in the context match with each other, and how smooth and consistent the reference for generating new frames is. You can adjust it to fit your specific needs and preferences for your animation.
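Here is a minimal Python sketch (my illustration, not the extension's actual code) of the sliding-window behavior from the flower example, using 0-based frame indices:

```python
num_frames, batch_size = 100, 20

overlap = -1                   # the UI default
if overlap == -1:
    overlap = batch_size // 4  # -> 5 frames, as described above

windows, start = [], 0
while start + batch_size <= num_frames:
    windows.append(list(range(start, start + batch_size)))
    start += batch_size - overlap  # advance, re-using `overlap` frames

print(windows[0])  # frames 0-19
print(windows[1])  # frames 15-34: the last 5 frames of window 0 reappear
```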
The ‘Save Format’ feature in AnimateDiff allows you to specify the format of the output file. You can choose from the following options:
GIF:
This option generates an animated GIF file.
MP4:
This option generates an MP4 video file.
WEBP:
This option generates an animated WEBP file.
PNG:
This option generates a sequence of PNG image files.
TXT:
This option generates an infotext file that provides additional information about the generated video.
The Reverse section allows you to modify the frames of the generated animation by adding or removing some frames. As of the 10/26/2023 update, the Reverse section has been removed for some reason; I will update this section if necessary.
Here is what each option does:
Add Reverse Frame:
This option appends the reversed frames to the end of the original frames, creating a loop effect.
For example: If the original frames are [1, 2, 3, 4], adding reverse frames will make the animation with frames [1, 2, 3, 4, 4, 3, 2, 1].
Remove head:
This option removes the first frame of the animation when repeating. This can create a smoother transition between the loops.
For example: If the original frames are [1, 2, 3, 4], removing head will make the animation with frames [1, 2, 3, 4, 4, 3, 2].
Remove tail:
This option removes the last frame of the animation when repeating. This can also create a smoother transition between the loops.
For example: If the original frames are [1, 2, 3, 4], removing tail will make the animation with frames [1, 2, 3, 4, 3, 2, 1].
You can combine these options to create different effects.
For example, if you enable both Remove head and Remove tail, you will get an animation with frames [1, 2, 3, 4, 3, 2]. You can also adjust the duration and loop number of the animation in the Duration section.
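As a sanity check, here is a small Python sketch (my illustration, not the extension's code) that reproduces the frame lists above:

```python
def reverse_frames(frames, remove_head=False, remove_tail=False):
    # Append the reversed sequence; optionally drop the duplicated
    # endpoints so the loop transition is smoother.
    rev = frames[::-1]    # [4, 3, 2, 1]
    if remove_tail:
        rev = rev[1:]     # drop the duplicate of the last frame
    if remove_head:
        rev = rev[:-1]    # drop the duplicate of the first frame
    return frames + rev

frames = [1, 2, 3, 4]
print(reverse_frames(frames))                                     # [1, 2, 3, 4, 4, 3, 2, 1]
print(reverse_frames(frames, remove_head=True))                   # [1, 2, 3, 4, 4, 3, 2]
print(reverse_frames(frames, remove_tail=True))                   # [1, 2, 3, 4, 3, 2, 1]
print(reverse_frames(frames, remove_head=True, remove_tail=True)) # [1, 2, 3, 4, 3, 2]
```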
Frame Interpolation is a technique that generates extra frames between two input images to create a smooth animation. FILM stands for Frame Interpolation for Large Motion, which is an algorithm that can handle large scene motion and produce realistic results.
‘Off’:
No frame interpolation will be performed.

‘FILM’:
Enables the FILM algorithm (Frame Interpolation for Large Motion), a learned interpolation model from Google Research that synthesizes the in-between frames rather than simply blending neighboring ones.
Interp X is a parameter that controls how many frames FILM generates. The default value is 10, which means each input frame is replaced by 10 frames after interpolation, multiplying the total frame count. The higher the value, the more frames are generated, and the more slow-motion the animation will appear at the same FPS.
You can enable Frame Interpolation by clicking on FILM, and set the Interp X value according to your preference.
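A quick back-of-the-envelope calculation for FILM's effect on length, assuming the "replace each frame with X frames" reading of Interp X described above:

```python
num_frames, interp_x, fps = 16, 10, 8

out_frames = num_frames * interp_x  # 160 frames after interpolation
print(out_frames / fps)             # 20.0 seconds if played back at 8 FPS,
                                    # i.e. the motion appears 10x slower
```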
My guide on using AnimateDiff for Automatic1111 will teach you everything about frame interpolation. Learn more below:
Learn how to use Frame Interpolation in AnimateDiff to create stunning, smooth videos. Fill in the gaps between frames for a fluid and artistic result.
Download Flowframes for Interpolation
Flowframes, an application for effortless frame interpolation, is available for download.
You can access it by clicking the button below; please consider supporting the developer.
ControlNet Video-to-Video
ControlNet V2V is a feature that allows you to use video guidance for video style rendering. It uses ControlNet models to inject additional information into the generation process, such as depth, edge, normal, or pose. You can use ControlNet V2V with AnimateDiff extension to create more realistic and stable animations.
Multi-controlnet is a feature that allows you to use multiple ControlNet models simultaneously for a single generation. For example, you can use openpose and canny models to specify the pose and the edge of the output image. You can enable multi-controlnet in the settings of AnimateDiff extension and choose the models you want to use. Multi-controlnet can improve the quality and diversity of the generated images or animations by using the Video Source, Video Path, or batch photo sequences.
Video Source
Video Source is an optional parameter that allows you to generate video to video (ControlNet V2V) animations using ControlNet. ControlNet is a technique that uses a source image or video to control the motion and appearance of the target image or video.
For example, you can use a video of a person walking as the source to animate a painting of a person.
The recommended size for the video source is 512×512 resolution and 16 frames, which aligns with the motion module's training settings. However, you can also try different sizes and frame counts depending on your VRAM and preference. For example, some tutorials suggest using 768×768 resolution and 25 or 40 sampling steps. You may need to adjust the VRAM batch size and other parameters accordingly.
Video Path
Video Path is an optional parameter that allows you to use a folder that contains the frames of the source video for video to video generation. You can use a local path or a cloud storage path, such as Google Drive or Dropbox, as long as the folder is accessible by the extension.
You cannot use a YouTube video link as the Video Path, because YouTube videos are not stored as frames in a folder. However, you can download a YouTube video and convert it into frames using a tool like ffmpeg or youtube-dl, and then use the folder that contains the frames as the Video Path. Alternatively, you can use the Video Source parameter to directly input a YouTube video link, but you will need to enable ControlNet and specify a ControlNet model for video to video generation.
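For example, here is one way to turn a downloaded video into a folder of frames with ffmpeg, called from Python to keep the examples in one language (the file names and FPS value are placeholders):

```python
import subprocess
from pathlib import Path

frames_dir = Path("frames")  # this folder becomes your Video Path
frames_dir.mkdir(exist_ok=True)

# Extract the clip to numbered PNG frames at 8 frames per second.
subprocess.run(
    [
        "ffmpeg",
        "-i", "downloaded_video.mp4",  # placeholder input file
        "-vf", "fps=8",                # sample 8 frames per second
        str(frames_dir / "%05d.png"),
    ],
    check=True,
)
```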
ControlNet Batch
The batch tab is a useful option that lets you process image sequences in ControlNet. You can find the batch tab in the ControlNet tab, not in the AnimateDiff tab.
To use the batch mode, you need to specify an input directory where your photo sequences or video files are located. You can also leave the input directory empty to use the default img2img batch controlnet input directory. Then, you can choose the ControlNet model and the parameters you want to use, and press generate. The output will be saved in the output directory that you specify or the default one.
You MUST enable ControlNet for Video Source and Video Path to work. The video will serve as the control source for ALL ControlNet units that you enable without submitting a control image or a path in the ControlNet panel. You can, of course, submit one control image via the Single Image tab or an input directory via the Batch tab, which will override this video source input and work as usual.
Learn how to generate mind-blowing animations with AnimateDiff and ControlNet V2V.
Move Motion Module to CPU (Default if LowVRAM)
Move Motion Module to CPU allows you to run the motion module on the CPU instead of the GPU. This can save some VRAM (video random access memory) for the stable diffusion model, which can improve the quality and speed of the image generation. However, running the motion module on the CPU can also slow down the animation generation, since the CPU is usually less efficient than the GPU for deep learning tasks. Therefore, you should only use this option if you have a low VRAM GPU or if you encounter any errors or crashes when using the motion module on the GPU.
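Conceptually, the option amounts to keeping the motion module's weights in system RAM instead of VRAM; here is a minimal PyTorch sketch of the idea (my illustration, not the extension's implementation):

```python
import torch

motion_module = torch.nn.Conv2d(320, 320, kernel_size=3)  # stand-in module

motion_module.to("cpu")  # the motion module now computes on the slower CPU

# The Stable Diffusion model keeps the GPU, with more VRAM free for it.
sd_device = "cuda" if torch.cuda.is_available() else "cpu"
```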
Here are some examples of when to use this option:
If you have a GPU with less than 8 GB of VRAM,
You may want to use this option to avoid running out of memory when generating animations. This can prevent your program from crashing or freezing due to insufficient VRAM.
If you encounter any errors or warnings related to CUDA
(The platform for GPU computing) or PyTorch (the framework for deep learning) when using the motion module on the GPU, you may want to use this option to avoid potential bugs or compatibility issues. For example, if you see messages like “CUDA kernel errors might be asynchronously reported at some other API call” or “Compile with TORCH_USE_CUDA_DSA to enable device-side assertions” in your console output, you may want to switch to CPU mode and see if that solves the problem.
If you want to generate animations with a large number of frames
(More than 100) or a high resolution (more than 512×512), you may want to use this option to reduce the VRAM usage of the motion module. This can allow you to allocate more VRAM for the stable diffusion model, which can improve the image quality and diversity. However, this may also increase the generation time significantly, so you should balance the trade-off between quality and speed.
Remove Motion Module allows you to free up some memory by deleting the motion module after generating the animation. This can be useful if you want to switch to a different motion module or if you want to save some memory for other tasks. However, this also means that you will have to reload the motion module every time you want to use it again, which can take some time. You can enable or disable this option in the Settings/AnimateDiff section of the WebUI.
Prompt Travel is a feature that allows you to steer how your animation evolves over time within AnimateDiff. It’s extremely handy for creating dynamic animations that exhibit progressively shifting characteristics or actions. The frame numbers correspond to the number of frames generated, and the yellow-highlighted portion indicates precisely when prompt interpolation and changes begin. This transition incorporates a gradual tapering, meaning the prompts introduced at that frame number will start earlier and extend later, creating a seamless transition between each interpolation.
Upon installing AnimateDiff in A1111, you might not immediately see any Prompt Travel controls. Instead, the feature operates through the Prompt box, where you provide the necessary prompt interpolations. This process begins with a Head Prompt, followed by Prompt Interpolation, and concludes with a Tail Prompt. Although both the Head Prompt and Tail Prompt are optional, they help provide a more comprehensive description of your scene.
How Prompt Travel works in A1111:
Head Prompt
The Head Prompt serves as the starting point for your animation, allowing you to establish the initial state and style. In this section, you can provide one or more initial prompts to set the stage for your animation. For instance, a Head Prompt could include descriptors like “cute blonde woman with braids in a dark theme red room,” influencing the animation’s starting appearance. It’s basically writing a prompt for an image but tailored for animation.
Prompt Interpolation
(Second and Third Lines): The Prompt Interpolation section is the backbone of Prompt Travel, designed to structure your animation’s transformation over time. Structured as “frame number: prompt,” these lines define when and how your animation evolves. The frame number denotes the position in the animation, starting at 0 for the first frame.
Here, you can specify different prompts for various frames, giving you precise control over your animation’s progression. Notably, prompts may include weights such as “(x:1.3)” to influence the animation’s characteristics at specific frames.

Tail Prompt
(Last Line, Optional): Much like the Head Prompt, the Tail Prompt offers the opportunity to influence your animation’s content and style, particularly as it nears its conclusion. This section enables you to guide the final moments of your animation with one or more prompts, creating an ending.
By tactically combining the Head Prompt, Prompt Interpolation, and Tail Prompt within the AnimateDiff framework, you can create dynamic animations that evolve over time, conveying intricate emotions, actions, and narratives.
Example: Here’s a practical example of Prompt Travel in action, demonstrating how to create a dynamic animation:
Prompt: cute blonde woman with braids

Negative Prompt: render, cartoon, cgi, render, illustration, painting, drawing, (hands:1.3)
Frame 0:
The animation begins with the character having closed eyes and a closed mouth, intensified with a factor of 1.3.
Frame 5:
As the animation progresses, the character is prompted to “bun hair” with a factor of 1.3.
Frame 15:
At this point, the character’s hair transitions to red hair.
Frame 30:
A sudden twist in the animation prompts the character to cry with an emphasis on tears.
Frame 30 and beyond:
As the animation concludes with a tail prompt, I added fox ears.
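Putting the pieces together, the Prompt box for this example would look roughly like the following. The format (a head prompt, then "frame number: prompt" lines, then a tail prompt) follows the description above; the exact wording and some weights are my reconstruction:

```
cute blonde woman with braids
0: (closed eyes, closed mouth:1.3)
5: (bun hair:1.3)
15: red hair
30: (crying, tears:1.3)
fox ears
```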
There is a fair amount of trial and error involved in achieving the ideal transitions, as seen below. It didn’t produce precisely what I wanted, but upon closer inspection, you’ll notice the changes when examining it frame by frame.
For a comprehensive guide on how to use Prompt Travel, check out the link below:
Prompt Travel in Automatic1111 might not be immediately evident, as it requires a certain formula to implement in your Text-2-Image Prompt. New users may find it challenging to grasp right away. To aid in this process, I’ve developed a guide that provides step-by-step instructions on how to use Prompt Travel in Automatic1111.
This example illustrates how Prompt Travel enables you to create animations that evolve and convey nuanced emotions or actions over time, providing a powerful storytelling tool within AnimateDiff.
Warning: If your Number of Frames is set to 32 and you set your last prompt interpolation at frame 32, you may run into problems, likely because frame numbering starts at 0, so the last valid frame is 31. I’ve experienced this a few times; changing it to 30 could fix the problem.
Source:
– Continue-revolution/sd-webui-animatediff
– AnimateDiff Github
– Guoyww/AnimateDiff