How to Train Stable Diffusion Model with DreamBooth on Runpod
Before you can train a Stable Diffusion model with DreamBooth on RunPod, you should have already completed the RunPod setup, as detailed in the previous guide.
DreamBooth training is a way of refining Stable Diffusion models. It lets you teach an existing model your own subject, making it not just generative but personalized, so it responds to text prompts with contextually rich visual outputs of that subject. By integrating DreamBooth into your RunPod environment, you put RunPod's compute to work learning from your images. Each cell in the Jupyter Notebook guides you through downloading, configuring, and training until you have a fine-tuned model.
In this detailed guide, I will walk you through the step-by-step process of setting up RunPod for DreamBooth and configuring it specifically for Stable Diffusion. This tutorial is designed to give you a comprehensive understanding of how to use RunPod's GPU cloud to fine-tune your Stable Diffusion models.
Assuming your RunPod setup is complete, you can access it via the Pods menu, conveniently located under the Manage section. Within this menu, you should find the “RunPod Fast Stable Diffusion” Pod.
Clicking the play button opens Jupyter Notebook, which serves as the gateway to downloading and configuring Stable Diffusion, essential steps for establishing your training environment for DreamBooth.
Upon launching Jupyter Notebook, navigate to the “RNDP-Dreambooth-v1.ipynb” file, and click to open the dedicated notebook for Stable Diffusion. Within this notebook, you’ll encounter distinct cells, each serving a specific purpose:
Dependencies
Download the model
Create/Load a Session
Instance Images
Manual Captioning
Concept Images
Dreambooth
Test Your Trained Model
Upload The Trained Model to Hugging Face
Free up space
These cells can be categorized as either documentation cells or code cells, with code cells distinguished by their light grey background and square [ ] brackets on the left.
To interact with these cells, you have two options:
Click on the cell and press Shift+Enter
Use the Play button located above the notebook
Executing a code cell runs the Python code contained within it.
For first-time users, some of these cells need to be executed to download and prepare your notebook for the upcoming tasks. While documentation cells primarily display text, code cells carry out essential functions.
In this guide, we will take a deliberate approach and execute one cell at a time to ensure a thorough understanding of the process.
Dependencies:
Executing this cell installs all the necessary dependencies required for the subsequent notebooks.
Download the Model:
In this section, executing the cell defaults to the base Stable Diffusion 1.5 model and downloads it if no specific link is provided. You have three options for downloading your model:
Using a path from HuggingFace
From storage
From a Model link (e.g., Civitai.com)
Using a fine-tuned model from Civitai.com
This step is optional but a popular one. If you prefer utilizing a fine-tuned model on Civitai instead of the SD1.5 base model, you will need to provide a direct link to the model. You can skip this step if you intend to use the base model.
For this exercise, I'm using the epiCPhotoGasm model. If you would like more details about this model, click below.
To obtain the necessary download link, right-click the download button and select "Copy Link Address" to acquire the URL, which will appear in this extended format:
https://civitai.com/api/download/models/165888?type=Model&format=SafeTensor&size=pruned&fp=fp16
However, the extended format of this link may result in errors when pasted into the RunPod notebook. To address this, simply trim the link down to the base portion before the question mark, which is the direct model download link and is all that is required.
Trimmed link below:
Model_link = "https://civitai.com/api/download/models/165888"
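If you would rather trim the link programmatically than by hand, here is a minimal sketch using only Python's standard library (the URL is the example from above; the helper name is just illustrative):

```python
from urllib.parse import urlsplit, urlunsplit

def trim_download_link(url: str) -> str:
    """Drop the query string so only the direct download path remains."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

full_link = ("https://civitai.com/api/download/models/165888"
             "?type=Model&format=SafeTensor&size=pruned&fp=fp16")
Model_link = trim_download_link(full_link)
print(Model_link)  # -> https://civitai.com/api/download/models/165888
```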
Execute the cell by pressing Shift+Enter (or clicking the play button), and the model will be downloaded accordingly.
Create/Load a Session:
Here, you must modify the “Session_Name=” to something specific to your fine-tuned model. It’s essential to choose a name that reminds you of what this model was trained on and its purpose.
A suggested naming convention is to include:
The Stable Diffusion version
Token name
Relevant settings
Descriptor (e.g., “No CAPTIONS”).
Example:
Session_Name= “Name Here”
In my instance, I aim to choose a name that's easily memorable and recognizable. I'm training on images of the talented Korean actress Go Youn-Jung and have assigned the token name "gynjng".
As for my session name: Session_Name= "sdv15_gynjng512_gsm".
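To keep the convention consistent across sessions, you could even assemble the name from its parts. A trivial sketch, using only the values from this example:

```python
# Compose a session name from the naming-convention pieces above.
sd_version = "sdv15"   # Stable Diffusion version used as the base
token = "gynjng"       # rare token chosen for the subject
resolution = "512"     # training resolution of the instance images
descriptor = "gsm"     # shorthand for the base model (epiCPhotoGasm)

Session_Name = f"{sd_version}_{token}{resolution}_{descriptor}"
print(Session_Name)    # -> sdv15_gynjng512_gsm
```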
DreamBooth originated as a Google research project designed to enhance text-to-image diffusion models. Essentially, it enables you to take an existing model, like Stable Diffusion, and customize it to generate content relevant to your specific prompts. These diffusion models are trained on vast datasets from the internet, making them proficient at generating recognizable images. However, they might not be able to generate images of lesser-known subjects, like your next-door neighbor, Fred.
Instance Images:
After executing this cell, you'll see two controls:
Choose Image
Upload button
This is where you upload the training images you prepared on your local file system. The process of preparing these images was covered in detail in a previous guide, which you can reference if you haven't already (Link Here). Once you've uploaded your images, it's important to verify their successful upload.
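If you want to double-check the upload from a code cell, here is a small sketch. The folder layout is an assumption about where the notebook stores instance images for a session, so adjust the path to match your own workspace:

```python
import os

# Assumed path: one folder per session under the Fast-Dreambooth workspace;
# the "instance_images" subfolder name is an assumption here.
instance_dir = "/workspace/Fast-Dreambooth/Sessions/sdv15_gynjng512_gsm/instance_images"

images = sorted(f for f in os.listdir(instance_dir)
                if f.lower().endswith((".png", ".jpg", ".jpeg")))
print(f"{len(images)} instance images uploaded")
for name in images[:5]:
    print(" ", name)
```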
Fine-tuning SD with DreamBooth + Captions
When providing the model with a set of images, it may not explicitly understand what to focus on, potentially leading to unintended learning outcomes. Therefore, it’s advisable to include a diverse range of backgrounds and, in the case of people, images with varying clothing styles.
However, even with such diversity, the model might pick up on other artifacts within the images.
For instance, if your intention is for the model to learn only a person's facial likeness, you can pair the images with captions to provide guidance. This concept applies across various subjects, not just people. Whether you're training a model on objects or any other imagery, these considerations remain equally important.
Manual Caption and Concept Captioning can be a bit complex, warranting a separate guide for a more in-depth understanding. However, here’s a brief overview of their functions.
Manual Caption: This cell handles image captioning, an essential component of the training process.
Concept Caption: An advanced topic that we'll explore in another session.
For a comprehensive guide on Captions, Click Below.
The captions and datasets guide is intended for those who want to deepen their knowledge of captioning training datasets for Stable Diffusion. It will help you prepare and structure the captions for your training datasets.
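As a quick illustration of what image-level captions look like, here is a hypothetical sketch that writes one plain-text caption per image. The filenames and caption text are made up, and the one-caption-file-per-image layout is a common convention rather than the notebook's own mechanism (its Manual Captioning cell provides its own interface):

```python
# Hypothetical captions keyed by image filename. Each caption keeps the
# token ("gynjng") and describes what we do NOT want memorized (clothing,
# background), nudging the model to focus on the face itself.
captions = {
    "gynjng_01.jpg": "gynjng woman, close-up portrait, plain background",
    "gynjng_02.jpg": "gynjng woman outdoors, wearing a blue jacket",
}

for image_name, caption in captions.items():
    txt_name = image_name.rsplit(".", 1)[0] + ".txt"   # e.g. gynjng_01.txt
    with open(txt_name, "w", encoding="utf-8") as f:
        f.write(caption)
```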
Dreambooth:
Executing this cell involves utilizing Dreambooth to create a fine-tuned model based on your input images. While there are several configuration parameters available, it’s advisable to begin by experimenting with just a few of them.
Resume_Training= False:
One important parameter is “Resume Training,” which enables you to continue training on an existing model. However, it’s recommended to keep this setting as “false” initially, especially if you’re still getting acquainted with the process.
Below, there are parameters related to training different components of Stable Diffusion, namely Unet and Text Encoder. Each of these components has associated values for training steps and learning rate.
Learning rate:
It’s generally recommended to stick with the default values, as they have proven effective in many scenarios. However, when it comes to the number of Unet Training Steps, it’s advised to calculate this based on the number of images in your training dataset. A common guideline is to use approximately 100 steps for each image in your training set.
Example: I am using 64 images of Go Youn-Jung cropped at 512×512. You can download the set below.
(64 images × 100 Unet Training Steps = 6400)
Keep in mind that the number of training steps is proportional to training time, so for larger datasets, you may opt for 60-80 steps per image rather than 100.
Additionally, remember that longer training doesn’t always guarantee better results, as there’s typically an optimal duration for training. These are general guidelines, so experimentation is key to finding what works best for your specific case.
Text Encoder Training Steps: A common choice is 350.
While there may not be extensive documentation on some of these parameters, this value has proven effective in various use cases. It’s worth noting that some references suggest setting Text Encoder Training Steps to roughly 30 percent of the number of Unet Training Steps. However, starting with 350 as a value is a good point of reference.
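Putting those guidelines into numbers for this example (the variable names simply mirror the fields discussed above):

```python
# Back-of-the-envelope step calculation for a 64-image dataset.
num_images = 64

Unet_Training_Steps = num_images * 100        # ~100 steps per image -> 6400
Text_Encoder_Training_Steps = 350             # common starting value

# Alternative rule of thumb mentioned above: ~30% of the Unet steps.
alt_text_encoder_steps = int(Unet_Training_Steps * 0.30)   # -> 1920

print(Unet_Training_Steps, Text_Encoder_Training_Steps, alt_text_encoder_steps)
```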
Save Your Training Model Incrementally
Save_Checkpoint_Every_n_Steps= False
This feature allows you to periodically save your model based on the number of Unet training steps completed.
Example: If you enable this setting (set it to "True") and specify an interval of 500, it will automatically save a fine-tuned version of the model every 500 steps. This means you'll have multiple models available, each trained for a different duration, giving you the flexibility to experiment with them and compare their performance.
However, it’s important to note that each of these saved models will consume approximately 2 gigabytes of storage space. Given the limited disk space available on RunPod, this is a consideration worth keeping in mind.
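A rough estimate of what that costs in disk space, using the numbers from this example:

```python
# Intermediate checkpoints: one save every 500 steps over a 6400-step run.
unet_steps = 6400
save_every = 500
checkpoint_gb = 2          # approximate size of each saved model

num_checkpoints = unet_steps // save_every   # -> 12 intermediate saves
print(f"~{num_checkpoints} checkpoints, "
      f"~{num_checkpoints * checkpoint_gb} GB of extra disk space")
```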
In Summary, based on 64 images, your settings should be:
Resume_Training= False
Unet_training_Steps= 6400
Text_Encoder_Training_Steps= 350
As for “External Caption,” we will touch on this aspect later in the process.
PRESS SHIFT + ENTER, EXECUTE and TRAIN
After you execute the training cell, it will commence with the text encoder training and then proceed to the Unet training phase. This process typically takes a few minutes, and upon completion, you'll receive a message in the console indicating that the model has been successfully created.
At this point, you can navigate to the workspace:
Fast-Dreambooth folder > Sessions Folder > Your Fine-Tuned Model Folder
Here, you will find your fine-tuned model ready for use.
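You can also locate the checkpoint from a code cell. A minimal sketch, assuming the session folder layout described above (the exact path and file extension depend on your session name and save format):

```python
import glob

# Assumed session path on the pod; substitute your own session name.
session_dir = "/workspace/Fast-Dreambooth/Sessions/sdv15_gynjng512_gsm"

checkpoints = (glob.glob(f"{session_dir}/*.ckpt")
               + glob.glob(f"{session_dir}/*.safetensors"))
for path in checkpoints:
    print(path)
```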
You can simply right-click on the model and download it to your local system, making it available for use in your Stable Diffusion WebUI.
Test Your Trained Model
In the “Test the Trained Model” work cell, simply press Shift+Enter to execute it, which will launch the Stable Diffusion WebUI Automatic1111.
From there, you can thoroughly test the model using the trained token.
Once you've finished testing and are satisfied with the results, you can proceed to download the fine-tuned model and place it in the "models > Stable-diffusion" folder of your local WebUI installation.
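On your local machine, placing the file is just a copy. A hypothetical sketch (both paths are examples; adjust them to your download location and your Automatic1111 WebUI install):

```python
import shutil
from pathlib import Path

# Hypothetical paths: where the model was downloaded and where the
# Automatic1111 WebUI looks for checkpoints.
downloaded_model = Path("~/Downloads/sdv15_gynjng512_gsm.ckpt").expanduser()
webui_models_dir = Path("~/stable-diffusion-webui/models/Stable-diffusion").expanduser()

shutil.copy(downloaded_model, webui_models_dir / downloaded_model.name)
```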
Free Up Space
In this final cell, you have the option to free up space by tidying up some of the assets you've generated. Deleting them manually through the folder browser isn't possible in the Notebook and will result in a failure message.
However, you can use the code cell designed for freeing up space to accomplish this task. Upon execution, it will present you with a list of sessions, allowing you to specify which sessions you wish to remove from your workspace.
Conclusion:
We’ve successfully demonstrated the process of fine-tuning a Stable Diffusion model using a limited set of images, in this case, featuring Go Youn-Jung. As a result, our model is now capable of generating images resembling her likeness. This achievement highlights the remarkable versatility of Stable Diffusion, which can create images of various subjects and objects worldwide.
However, what sets this process apart is its ability to generate images of highly personalized or previously unknown subjects. Whether it’s replicating other individuals, objects, landscapes, or anything unique to you, Stable Diffusion offers a level of personalization that extends beyond its general capabilities. While Stable Diffusion excels at generating a wide array of images, it may struggle to capture the likeness of something deeply personal to you.
In such cases, crafting a text prompt that enables the generation of these specific images may prove challenging.