Preparing Training Data Sets for DreamBooth – A Simple Foundation for Amazing Results
Training Data Sets for DreamBooth | Setting the Foundation
Imagine these settings as the first stepping stones toward creating a powerful Stable Diffusion training model. Think of them as your initial building blocks for training data sets for DreamBooth.
However, just as in constructing a masterpiece, these settings may need to be adjusted to fit the unique characteristics of your data and environment. Achieving a stellar model often involves a bit of trial and error. In this guide, we’ll begin by covering the fundamentals to get you started. As you gain experience, you can move on to the more advanced options covered in my other guides.
Preparing Your Training Data Sets for DreamBooth Training
The heart of any good training model is the dataset. It’s like the foundation of a house; without a strong one, your model won’t stand. Here’s what makes a dataset excellent:
High quality:
Use high-resolution images with no visible compression.
Clear view:
Make sure your subject is in focus and not obstructed.
Simplicity:
Keep the composition simple.
Variety:
If you’re training a person, include diverse backgrounds, lighting conditions, clothing, facial expressions, and poses.
Don’t use low-resolution images, images with multiple subjects, or images that are duplicates or near-duplicates of each other. Your DreamBooth model learns everything in the image, so keep the focus on your subject.
Be cautious of what makes a dataset bad:
Low Resolution:
Avoid images with resolutions lower than your training resolution, and steer clear of those with compression issues.
Bad Cropping:
If your inputs are close-ups, your model’s outputs will be too.
Multiple Subjects:
Stick to a single subject per image.
Duplicates or High Similarity:
Don’t include identical or very similar images.
Distracting Backgrounds:
Your model learns everything in the image, so focus on your subject.
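One of the checks above, finding identical images, is easy to automate. The following is only a minimal sketch using the Python standard library: the function name and the list of file extensions are my own assumptions, and it only catches byte-for-byte duplicates. Images that are merely very similar would need a perceptual comparison, which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image formats; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_exact_duplicates(dataset_dir):
    """Return groups of image file names whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    # Keep only hashes shared by more than one file.
    return [names for names in groups.values() if len(names) > 1]
```

Calling `find_exact_duplicates("path/to/dataset")` returns a list of groups, so you can keep one file from each group and remove the rest.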
Bucketing
You can use images of different resolutions and formats. The system will automatically adjust them to your chosen resolution. This is done for better performance and quality. You can see how this works in the Testing tab.
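To give a feel for the idea behind bucketing, here is a simplified sketch that assigns an image to the bucket with the closest aspect ratio. This is not the extension’s actual algorithm: the function name and the bucket list are hypothetical, chosen only to illustrate the concept around a 512px training resolution.

```python
# Hypothetical buckets around a 512px training resolution (width, height).
BUCKETS = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the (width, height) bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))
```

With these assumed buckets, a square image lands in the 512x512 bucket, while a wide landscape image lands in one of the wider buckets instead of being cropped to a square.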
Captioning
If you want to train with captions, you need to associate each dataset image with its exact caption. This means using the same words in your captions as you want your model to produce. Captions can be a powerful tool, but they should match your training objectives.
Don’t use captions for style training. For specific subjects, such as a person or an object, captions are a good idea.
To add captions, create text files with the same name as your images (e.g., cat1.png & cat1.txt) and put the captions in the text file.
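Before training with captions, it can help to confirm that every image really has a matching text file. The sketch below does that with the standard library; the function name and the extension list are assumptions of mine, not part of the DreamBooth extension.

```python
from pathlib import Path

# Assumed set of image formats; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def images_missing_captions(dataset_dir):
    """Return the names of images that have no same-named .txt caption file."""
    missing = []
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            # cat1.png is expected to sit next to cat1.txt.
            if not path.with_suffix(".txt").is_file():
                missing.append(path.name)
    return missing
```

An empty result means every image in the folder is paired with a caption file.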
Understanding the Settings and Configurations
This is where you define the model you want to train. Let’s break it down:
Create Tab:
Here, you name your model and choose the source model. This source model is like a starting point for your custom model.
Select Tab:
In the Model dropdown, you pick which model you’re training. It loads important information but doesn’t change other settings. To reuse settings from a previous model, select that model and click “Load Settings.”
The Input tab features all the important settings for DreamBooth fine-tuning. Here, we’ll break down each parameter, making it easy to understand:
The Performance Wizard:
This tool can assist you by automatically configuring some fields, but it’s not always perfect.
List of the available methods, ranked from best to worst quality:
Dreambooth (default, requires >= 10GB VRAM)
LoRa Extended (requires >= 8GB VRAM)
LoRa (requires >= 6GB VRAM)
Imagic (VRAM requirement not specified)
Alternative Training Methods: You can choose different training methods in this section, with LoRa being one option. These methods allow you to adapt your training for various devices and objectives, balancing quality and resource requirements.
Intervals: Think of training as cooking. Learning Rate (LR) is like the temperature, and the number of Epochs is like the cooking time. Going too fast can lead to model “burnout,” while going too slow may result in undercooked training. To keep the overall amount of training balanced, compensate a change in one with the opposite change in the other: for example, if you double the LR, halve the number of Epochs.
Training Steps Per Image (Epochs):
This setting determines how many times each image in your training and class datasets will be processed. The recommended default is 100.
Set Amount of Time to Pause Between Epochs:
To optimize training efficiency, set this to 0.
Save Model Frequency:
This parameter dictates how often a new model checkpoint will be created during training.
Save Preview(s) Frequency (Epochs):
Specifies the frequency of generating sample images during training.
Batch Size:
This determines how many dataset images are processed simultaneously. Increasing it can speed up training but at the cost of higher VRAM usage.
Gradient Accumulation Steps:
A feature for advanced users.
Class Batch Size:
This batch size setting impacts class image generation but doesn’t directly affect the training process.
Enable Set Gradients to None When Zeroing and Gradient Checkpointing:
These options offer advanced control over gradient handling during training.
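To see how Epochs, Batch Size, and Gradient Accumulation Steps interact, a little arithmetic helps. The function below is a rough sketch under my own assumptions: it ignores class images, its names are hypothetical, and the extension may count steps differently.

```python
import math


def training_plan(num_images, epochs, batch_size=1, grad_accum_steps=1):
    """Estimate how much work a run involves (class images not included)."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    total_batches = batches_per_epoch * epochs
    return {
        # How many times an image is shown to the model in total.
        "image_passes": num_images * epochs,
        # How many batches are processed on the GPU.
        "batches": total_batches,
        # How many times the weights are actually updated.
        "optimizer_updates": math.ceil(total_batches / grad_accum_steps),
        # Images contributing to each weight update.
        "effective_batch_size": batch_size * grad_accum_steps,
    }
```

For example, 20 images at 100 epochs means 2,000 image passes. With a batch size of 4 and 2 gradient accumulation steps, those passes are grouped into 500 batches and 250 weight updates, each based on 8 images.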
A Learning Rate (LR) of 2e-6 is a good starting point.
You can increase it for faster training at potentially reduced quality, or lower it for improved quality at a slower pace.
LoRa Learning Rate: For LoRa, separate LR fields are used because LoRa typically requires much higher LR values than standard Dreambooth. Defaults for LoRa are 1e-4 for UNET and 5e-5 for Text.
LR Scheduler Settings: These settings allow you to control how the Learning Rate changes during training. The default is “constant_with_warmup” with 0 warmup steps. For beginners, I recommend setting the number of warmup steps to 500, while advanced users may explore different schedulers and settings.
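If the idea of warmup is new to you, the plain-Python sketch below shows what a constant-with-warmup schedule generally does: the LR ramps up linearly from zero during the warmup steps and then stays at its configured value. The function is only an illustration, using the 2e-6 starting LR and the 500 warmup steps mentioned in this guide as defaults; the extension’s own scheduler may differ in detail.

```python
def constant_with_warmup(step, base_lr=2e-6, warmup_steps=500):
    """Return the learning rate used at a given training step."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # After warmup (or with no warmup at all) the LR stays constant.
    return base_lr
```

With 0 warmup steps the function returns the full LR from the very first step, which is what the default setting described above amounts to.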
Image Processing: If you’re new to DreamBooth, it’s best to leave these settings unchanged. They’re more advanced and can be adjusted for specific needs.
Memory Attention:
Set to “Default” if you’re using Torch2 or have 16GB of VRAM or more.
Cache Latents:
Generally, it’s advisable to enable this option.
Step Ratio of Text Encoder Training:
Set to 0. Advanced users may want to refer to the advanced guide for more details.
Prior Loss Weight:
Important when using class images. It determines how much weight is given to the class images. If your training isn’t progressing as desired, consider reducing this weight. Conversely, if your concept is “bleeding” beyond the desired tokens, increase the Prior Loss Weight.
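The role of the Prior Loss Weight is easier to see as a formula. In the usual DreamBooth prior-preservation setup, the loss on your own images and the loss on the class images are combined roughly as in the sketch below. The function name and the default weight of 1.0 are hypothetical, and the extension’s actual implementation may differ.

```python
def combined_loss(instance_loss, prior_loss, prior_loss_weight=1.0):
    """Combine the loss on your concept with the weighted loss on class images."""
    # A lower weight lets the concept dominate; a higher weight pulls the
    # model back toward what it already knows about the class.
    return instance_loss + prior_loss_weight * prior_loss
```

This is why reducing the weight can help when training stalls, and increasing it can help when the concept bleeds into unrelated prompts.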
Advanced: Advanced settings are best left untouched if you’re new to DreamBooth. These are for experienced users who require specific configurations.
In this tab, you’ll specify how your model should understand and generate images:
Directories:
Set the Dataset Directory to the location of your training images. The Classification Dataset Directory can be left blank unless you want to reuse previous class images.
Training Prompts:
This section allows you to provide guidance to your model based on the training objectives. If you’re not using class images, you don’t need to set a Class Prompt or Class Token.
Instance Prompt:
This describes the concept you’re training. If you’re using captions, you can use [filewords], which will be replaced by the captions.
Class Prompt:
This is a description of the images that don’t belong to your training concept.
Classification Image Negative Prompt:
This prompt is used solely for class image generation, not during the training process.
Filewords:
If you’re not using [filewords] in your Instance Prompt or Class Prompt, you can skip this section.
Instance and Class Tokens:
These tokens are mixed with your prompts to provide context. The Instance Token represents your trained concept, while the Class Token describes the concept’s class.
Example: In training images of a cat named Rufus, the Instance Token could be “Rufus,” and the Class Token might be “cat.”
Sample Prompts:
These prompts are used for generating sample images during and after training. You can also utilize [filewords] in this section to enhance diversity.
Image Generation and Sample Image Generation:
Configure settings for class and sample image generation.
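To make the [filewords] placeholder more concrete, here is a deliberately simplified illustration of the substitution: the placeholder in a prompt is replaced by the caption of the image being trained. The function name is hypothetical, and the extension’s real handling (including how Instance and Class Tokens are mixed in) is more involved than this.

```python
def build_prompt(template, caption):
    """Replace the [filewords] placeholder with an image's caption text."""
    return template.replace("[filewords]", caption.strip())
```

For example, with the Instance Prompt `[filewords]` and a caption file containing `Rufus the cat sitting on a sofa`, the prompt used for that image would simply be the caption itself.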
Saving Tab: Managing Data Saving.
This tab controls how data is saved during and after training:
General and Checkpoints:
Defaults should work well for most users.
Diffusion Weights:
Advanced users can configure the saving of training snapshots. It’s recommended to leave snapshots disabled unless you have experience with this feature, as they consume significant hard-drive space.
When LoRa is enabled, use these settings to control the quality of LoRa output:
LoRa UNET and Text Encoder Rank:
These settings act as quality sliders for LoRa output. Higher ranks result in higher quality but larger output files. Starting with both set to 32 is a good choice, generating approximately 100MB output files.
LoRa Weights:
These settings manage the output to the models/Lora directory. To use smaller LoRa models, install the a1111-sd-webui-locon extension.
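Why does a higher rank produce a larger file? LoRa stores two small matrices per adapted layer, and their size grows in direct proportion to the rank. The sketch below shows that relationship for a single layer; the function name and the layer size of 768 are illustrative assumptions, and the calculation does not attempt to reproduce the file sizes quoted above.

```python
def lora_param_count(in_features, out_features, rank):
    """Number of extra parameters LoRa adds to one linear layer."""
    # One matrix of shape (rank, in_features) plus one of (out_features, rank).
    return rank * (in_features + out_features)


# Doubling the rank doubles the added parameters for the same layer.
params_rank_32 = lora_param_count(768, 768, rank=32)
params_rank_64 = lora_param_count(768, 768, rank=64)
```

Because the count scales linearly, halving both ranks should roughly halve the size of the output file.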
This comprehensive guide provides an in-depth exploration of configuring a Stable Diffusion training model, equipping you with the knowledge to achieve remarkable results. Keep in mind that this is an ongoing learning process, and I’ll continually update this resource as I gather more insights. It serves as a foundational reference guide to comprehending each facet of DreamBooth extension tabs.
For more detailed guidance on utilizing these settings, refer to my other guides, where I’ll walk you through their practical applications.
This captions and data sets guide is intended for those who seek to deepen their knowledge of Captioning for Training Data Sets in Stable Diffusion. It will assist you in preparing and structuring your captions for training datasets.