Newsletter

Exploring the Limitations and Functions of ControlNet: A Comprehensive Overview

Title: ControlNet Plugin: Enhancing Precision and Control in AI Drawing

Subtitle: A Closer Look at Installation, Model Download, and Basic Information

Image-to-Image: A Useful Tool with Limitations

While the image-to-image (img2img) feature lets us guide Stable Diffusion results with reference pictures as well as text descriptions, it has its limitations. Sometimes we only want to borrow specific elements from a reference image, such as composition, spatial relationships between characters and objects, color matching, body movements, or the outline of an appearance. This is where ControlNet comes in. With ControlNet, we can selectively extract the key points to tell the AI more clearly which elements of the reference should constrain or influence the generated image.

Installation: Adding the ControlNet Plugin

To install the ControlNet plugin, please refer to our previous article: AI Drawing-Stable Diffusion 007- Install, Update, Remove, and Backup Extensions.

Model Download: Obtaining the Necessary Models

You can download the ControlNet models from the following URL: https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main. Simply download all the “.pth” files; the corresponding “.yaml” files with the same names are installed automatically with the plugin, so there is no need to download them separately. Place the downloaded .pth model files in the ..\extensions\sd-webui-controlnet\models folder.
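
If you prefer to script the download, here is a minimal sketch using the huggingface_hub package (my own addition; the article only describes downloading by hand). The destination path mirrors the folder above; adjust it to your own WebUI install:

```python
from pathlib import Path

from huggingface_hub import snapshot_download

# Destination inside the WebUI install; adjust the base path to your setup.
models_dir = Path("extensions/sd-webui-controlnet/models")
models_dir.mkdir(parents=True, exist_ok=True)

# Fetch only the .pth weights; the matching .yaml configs ship with the
# plugin itself, so they do not need to be downloaded again.
snapshot_download(
    repo_id="lllyasviel/ControlNet-v1-1",
    allow_patterns=["*.pth"],
    local_dir=models_dir,
)
```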

Basic Information: Models and Naming Conventions

The ControlNet plugin provides 14 official models, 11 of which are finished models, while 3 are still experimental. The official models follow the naming convention shown in the figure below, so you can read off a model’s details from its file name (a small parsing sketch follows the list below).

[Image Source: https://github.com/lllyasviel/ControlNet-v1-1-nightly]

– “control”: Indicates that the model is an official ControlNet model.
– “v11”: Represents the version (e.g., ControlNet version 1.1).
– “p”: Represents a finished model, while “e” represents a model in the testing stage and “u” represents an uncompleted model.
– “sd15”: Indicates that the model is developed based on Stable Diffusion 1.5.
– “canny”: Refers to the constraint type of the model (shown in green).
– “pth”: Represents the model’s extension, and “yaml” is the model’s configuration file. Each pth model must have a corresponding yaml file with the same name in the ControlNet model folder.
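
As an illustration of these naming rules, here is a small, hypothetical parser for official v1.1 file names (the pattern covers the standard sd15 names described above):

```python
import re

# Official v1.1 naming: control_<version><status>_<base>_<constraint>.<ext>
# e.g. "control_v11p_sd15_canny.pth" -> version 1.1, finished, SD 1.5, Canny.
NAME_RE = re.compile(
    r"control_v(?P<version>\d+)(?P<status>[peu])_"
    r"(?P<base>sd\d+)_(?P<constraint>\w+)\.(?P<ext>pth|yaml)"
)

def parse_model_name(filename: str) -> dict:
    """Split an official ControlNet file name into its named parts."""
    match = NAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an official ControlNet model name: {filename}")
    return match.groupdict()

print(parse_model_name("control_v11p_sd15_canny.pth"))
# {'version': '11', 'status': 'p', 'base': 'sd15', 'constraint': 'canny', 'ext': 'pth'}
```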

Once you have installed the plugin and downloaded the models required by the constraint types you plan to use, go to Settings/ControlNet, where you can raise the maximum number of ControlNet models (units) that can be enabled simultaneously.

ControlNet Interface Parameters: Enhancing Control over AI Drawing

ControlNet extracts specific elements or features from a reference image and lets Stable Diffusion use one or more of these features (with multiple ControlNet units enabled at once) as guides or constraints on the result, giving us much stronger control over the generated images. Here, we will use the OpenPose function (extracting characters’ poses and actions) as an example to walk through the interface parameters.

1. Upload the reference image.
2. Enable: Check this box to activate ControlNet’s influence on the image; nothing takes effect until it is enabled.
– Low VRAM: If your graphics card has little video memory (below 8 GB), check this box. Generation is slower, but it can run on a low-VRAM setup without exhausting video memory.
– Pixel Perfect: Check this box to let the plugin choose the preprocessing resolution automatically. If unchecked, a Preprocessor Resolution option appears: lower values use less video memory but extract image features less accurately; it is usually set between 512 and 1024.
– Allow Preview: Opens a preview window showing the preprocessor’s output.

3. Control Type: Choose the constraint type; to extract a character’s pose and actions, choose OpenPose.
– Preprocessor: Select a preprocessor matching the constraint type; most types offer one or more preprocessors with different extraction effects (IP2P is the exception and needs none). The first time a preprocessor runs, its annotator files are downloaded in the background, which can take a while; if the download fails, fetch the files manually and place them in the folder indicated in the console window.
– Explosion icon: Click it to preprocess the reference image and extract its features.
– Model: The raw feature map means nothing to Stable Diffusion on its own, so choose a model matching the constraint type/preprocessor to interpret it. If a matching model exists in the model folder, it is usually selected automatically when you pick the preprocessor.

4. Control Weight: The lower the value, the weaker ControlNet’s constraint/influence on the final image.
– Starting/Ending Control Step: Set when ControlNet’s guidance starts and stops within the sampling steps; 0.1 means 10% of the total steps. Earlier steps shape the overall composition and later steps refine details, so for a more pronounced ControlNet effect, let it intervene from the earliest steps.

5. Control Mode: Choose whether the prompt or ControlNet takes precedence, or balance the two.
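
The numbered settings above map one-to-one onto the ControlNet unit fields exposed through the WebUI’s txt2img API. The sketch below shows that mapping; the field names follow the sd-webui-controlnet API as I understand it and may differ between extension versions, and the file name pose_reference.png is hypothetical:

```python
import base64

import requests

# Encode the reference image for the API.
with open("pose_reference.png", "rb") as f:
    reference_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "a businessman in a suit with an office background",
    "steps": 20,
    # The ControlNet extension registers itself under alwayson_scripts;
    # each dict in "args" is one ControlNet unit.
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "enabled": True,                        # 2. Enable
                    "image": reference_b64,                 # 1. reference image
                    "module": "openpose",                   # 3. Preprocessor
                    "model": "control_v11p_sd15_openpose",  # 3. Model
                    "pixel_perfect": True,                  # 2. Pixel Perfect
                    "weight": 1.0,                          # 4. Control Weight
                    "guidance_start": 0.0,                  # 4. starting control step
                    "guidance_end": 1.0,                    # 4. ending control step
                    "control_mode": "Balanced",             # 5. Control Mode
                }
            ]
        }
    },
}

# Assumes the WebUI is running locally with the --api flag enabled.
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()
```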

Utilizing ControlNet: Controlling Body Poses through OpenPose

For instance, by using the OpenPose ControlNet function, we can extract the body movements of the characters in a reference image. The resulting character skeleton diagram can then be used to control the poses of characters in newly generated images. This type of pose control is difficult to achieve with textual prompts alone.
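
Outside the WebUI, the same skeleton-extraction step can be reproduced with the controlnet_aux package; a minimal sketch, assuming a local reference photo named person.jpg:

```python
from controlnet_aux import OpenposeDetector
from PIL import Image

# Download the OpenPose annotator weights on first use, then extract a
# skeleton map from the reference photo.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
reference = Image.open("person.jpg")
skeleton = detector(reference)
skeleton.save("pose_skeleton.png")
```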

Example Prompt Words:
– A businessman in a suit with an office background. (Note that there is no need to describe the character’s pose at all; the skeleton map supplies it.)

Another Prompt:
– A man in a swimming suit with a beach background.
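
For completeness, here is a sketch of the same experiment using the diffusers library rather than the WebUI: one skeleton map drives both prompts, so only the text changes between runs. The model IDs are my assumptions, not something the article specifies:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Pair the OpenPose ControlNet with a Stable Diffusion 1.5 base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# One skeleton map drives both prompts; only the text changes.
skeleton = load_image("pose_skeleton.png")
prompts = [
    "a businessman in a suit with an office background",
    "a man in a swimming suit with a beach background",
]
for i, prompt in enumerate(prompts):
    image = pipe(prompt, image=skeleton, num_inference_steps=20).images[0]
    image.save(f"openpose_example_{i}.png")
```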

ControlNet Constraint Types: Expanding Possibilities

ControlNet offers various constraint types, which can be grouped into five main categories by function (summarized in the small lookup table after this list):
– Line/edge constraints: Canny, Lineart, SoftEdge, Scribble, MLSD.
– Depth constraints (spatial distance/objects’ three-dimensional sense): Depth, Normal.
– Pose constraints (human body poses and actions): OpenPose.
– Person/item type constraints (semantic segmentation): Seg.
– Other special effects: Shuffle, Tile, Inpaint, IP2P, Reference.
– T2IA: third-party preprocessors and models. They follow the same basic logic as the official ones, and their names indicate which constraint function each provides, so interested readers can test them after mastering the official models.
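
To keep these categories straight when scripting, one might collect them in a small lookup table; this is just my own summary of the list above, not an official API:

```python
# My own summary of the constraint categories listed above,
# keyed by function; T2IA is the third-party add-on set.
CONSTRAINT_CATEGORIES = {
    "line/edge": ["Canny", "Lineart", "SoftEdge", "Scribble", "MLSD"],
    "depth": ["Depth", "Normal"],
    "pose": ["OpenPose"],
    "semantic segmentation": ["Seg"],
    "special effects": ["Shuffle", "Tile", "Inpaint", "IP2P", "Reference"],
    "third-party": ["T2IA"],
}

# Example lookup: which constraint types control spatial depth?
print(CONSTRAINT_CATEGORIES["depth"])  # ['Depth', 'Normal']
```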

Conclusion

This introduction provides a basic understanding of the ControlNet plugin, focusing on its installation, model download, and essential parameters. In subsequent discussions, we will explore the individual limitations and impact of each ControlNet model, as well as the differences between preprocessors, further highlighting their unique contributions to AI drawing precision and control.

#AIDrawing #StableDiffusion #ControlNet #Preface