AI Image Manipulation with Stable Diffusion ControlNet on Vultr Cloud GPU
Introduction
ControlNet is a neural network structure used with diffusion models to control final image generation through various techniques such as pose, edge detection, and depth maps. It gives the model an efficient way to detect which parts of an input image should change, enabling Stable Diffusion models to follow additional input when manipulating an image and to generate output closer to the desired result.
This article describes how to perform image manipulation with Stable Diffusion ControlNet on a Vultr Cloud GPU server. You manipulate images using several ControlNet methods, each generating different results based on your prompts.
Prerequisites
Deploy an Ubuntu A100 Cloud GPU server with at least:
- 1/3 GPU
- 20 GB VRAM
- 3 vCPUs
- 30 GB Memory
Use SSH to access the server as a non-root user with sudo privileges
Switch to the non-root sudo user account
# su user-example
ControlNet Methods
Below are the image manipulation ControlNet methods applied in this article:
- Canny Edge Detection
- Depth Map
- Multiscale Line Segment Detector (M-LSD) Detection
- Normal Map
- Pose Detection
- Image Segmentation
Set Up the Server
To run ControlNet models on the server, install the necessary Python dependency packages and set up Jupyter Notebook to run the model in a graphical environment session as described in the following steps.
Install PyTorch
$ pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118
The above command installs PyTorch with the pre-built CUDA 11.8 libraries. To download the latest version, visit the PyTorch documentation.
Install Jupyter Notebook
$ pip3 install notebook
Allow the default Jupyter Notebook port 8888 through the firewall to accept incoming connections.
$ sudo ufw allow 8888/tcp
Restart the firewall to update the changes.
$ sudo ufw reload
Launch Jupyter Notebook.
$ jupyter notebook --ip=0.0.0.0
The above command starts Jupyter Notebook and accepts connections from all addresses as declared by the --ip=0.0.0.0 option. In case the command fails to run, quit your SSH session and re-establish a connection to your server to activate the Jupyter library.
In a new web browser session, access the Jupyter Notebook interface using your generated token.
http://SERVER_IP:8888/tree?token=GENERATED-TOKEN
Within the Jupyter Notebook interface, click New, and select Notebook from the dropdown list
Install Required Libraries
In this section, install the required ControlNet model libraries used by each method in your Jupyter Notebook file. To run each of the commands described below, press Ctrl + Enter on your keyboard, or click the run button on the main taskbar.
Update Jupyter and Ipywidgets
!pip3 install --upgrade jupyter ipywidgets
Install the libraries
!pip3 install diffusers accelerate safetensors transformers pillow opencv-contrib-python controlnet_aux matplotlib mediapipe
Import the model libraries
from PIL import Image
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
Below is what each imported library does:
- PIL: Provides various image processing capabilities such as opening, manipulating, and saving images in different formats
- torch: Provides tools and functionalities for building and training neural networks
- diffusers: Imports the Diffusers Python packages
- StableDiffusionControlNetPipeline: Controls the Stable Diffusion process
- ControlNetModel: The main ControlNet algorithm
- UniPCMultistepScheduler: Handles scheduling and organizes multiple steps in a unidirectional manner
- diffusers.utils: Imports the load_image package that allows the model to load an image
Irrespective of the model you are running, you must install and import the above libraries. When all libraries and packages are available, the server is able to run ControlNet models as described in the following sections.
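Optionally, you can confirm that the GPU is visible to PyTorch before loading any models. Below is a minimal check using the torch library installed earlier:
import torch

# Confirm that the CUDA-enabled PyTorch build detects the GPU
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", round(torch.cuda.get_device_properties(0).total_memory / 1024**3, 1))
else:
    print("No GPU detected; the pipelines will fall back to the CPU and run slowly")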
Implement the ControlNet Canny Edge Detection Method
Canny edge detection uses a multi-stage algorithm to detect edges in an image; the resulting edge map guides the model when editing the image. In this section, implement the Canny Edge Detection method in ControlNet as described in the steps below.
To clear GPU memory and start running the model, navigate to the Kernel menu option in your Jupyter Notebook, and click Restart Kernel.
Import the required libraries you installed earlier
Import the additional libraries required by the model
import cv2
import numpy as np
Below is what the additional libraries do:
- cv2: Provides the functions and capabilities of OpenCV, including reading and writing images, image manipulation, and image processing techniques such as edge detection, feature detection, and matching
- numpy: Used for numerical computing. Together, cv2 and numpy enable efficient and flexible image processing
Import the base image to the image variable.
image = load_image("https://example.com/image.png")
To apply the image to the model, verify that your URL has a valid file extension such as .png, .jpg, or .jpeg.
Convert the image into a NumPy array
image = np.array(image)
Many Computer Vision (CV) tasks run on NumPy arrays because of their efficiency and ease of manipulation. The above command converts a Python Imaging Library (PIL) image to a NumPy array.
Define the thresholds.
low_threshold = 100
high_threshold = 200
low_threshold and high_threshold define the sensitivity and robustness of the edge detection process. A lower low_threshold allows detection of weaker edges but introduces more noise into the result, while a larger high_threshold keeps only the strongest edges and can miss weaker ones.
Process the image.
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)
Below is the processing performed by the above code:
- cv2.Canny(image, low_threshold, high_threshold): Applies Canny edge detection to the image NumPy array using the cv2.Canny() function
- image[:, :, None]: Adds a new axis to the NumPy array image, converting it from a 2D grayscale image to a 3D image with a single channel, because the Canny function generates a binary image
- np.concatenate([image, image, image], axis=2): Concatenates the image array along the third axis to create a 3-channel image. This is necessary because most image processing operations require three-channel (RGB) images, but Canny edge detection only produces a single-channel image
- Image.fromarray(image): Converts the NumPy array image back into a PIL image to save, display, or perform other PIL operations on the image
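To see how the thresholds change the edge map, you can experiment with other values before settling on 100 and 200. Below is a minimal sketch, assuming the same placeholder image URL as above, that counts the edge pixels produced by two different threshold pairs:
import cv2
import numpy as np
from diffusers.utils import load_image

# Compare the edge maps produced by a loose and a strict threshold pair
test = np.array(load_image("https://example.com/image.png"))

loose = cv2.Canny(test, 50, 150)    # lower thresholds keep weaker edges (more detail, more noise)
strict = cv2.Canny(test, 150, 250)  # higher thresholds keep only the strongest edges

print("Edge pixels (50/150):", int(np.count_nonzero(loose)))
print("Edge pixels (150/250):", int(np.count_nonzero(strict)))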
Define the model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
In the above code:
- controlnet: Initializes the ControlNetModel model using the pre-trained lllyasviel/sd-controlnet-canny weights
- pipe: Initializes the StableDiffusionControlNetPipeline pipeline for Stable Diffusion-based image processing. It uses the previously created ControlNet model to control the diffusion process. Setting the safety_checker parameter to None disables safety checks in the pipeline
- pipe.scheduler: Initializes the UniPCMultistepScheduler scheduler to control and schedule the execution of the diffusion process in a stable manner
- enable_model_cpu_offload(): Enables CPU offloading for the models used in the pipeline, which runs parts of the computational workload on the CPU instead of the GPU to reduce GPU memory use
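CPU offloading trades some speed for lower GPU memory use. If your GPU has enough VRAM to hold the full pipeline, an alternative is to keep every component on the GPU instead. Below is a minimal sketch of that alternative; use one approach or the other, not both:
# Alternative to enable_model_cpu_offload(): keep the whole pipeline on the GPU
# Only do this when enough VRAM is available, and skip the CPU offload call above
pipe = pipe.to("cuda")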
Using a prompt, execute the defined model on the image
image = pipe("YOUR_TEXT_PROMPT_HERE", image, num_inference_steps=20).images[0]
To apply a prompt such as "make their hair blond" or "make their shirt green", edit the command as below.
image = pipe("make their hair blond", image, num_inference_steps=20).images[0]
The above code executes the StableDiffusionControlNetPipeline on the input image. The text prompt defines the changes required in the image. The num_inference_steps parameter specifies the number of diffusion steps the pipeline performs. It's set to 20 here, but you can reduce or increase it depending on how much of the original image you want to keep.
View the generated image.
image
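The num_inference_steps value is a quality/speed trade-off. As a hedged experiment, you can run the sketch below in place of the single execution step above, while the image variable still holds the processed Canny control image. It generates the same prompt at two step counts with a fixed seed so the outputs are directly comparable:
import torch

control = image  # the processed Canny control image, before it is overwritten

# Same prompt and seed, different numbers of diffusion steps
generator = torch.Generator(device="cpu").manual_seed(42)
fast = pipe("YOUR_TEXT_PROMPT_HERE", control, num_inference_steps=10, generator=generator).images[0]

generator = torch.Generator(device="cpu").manual_seed(42)
detailed = pipe("YOUR_TEXT_PROMPT_HERE", control, num_inference_steps=30, generator=generator).images[0]

fast.save("steps-10.png")
detailed.save("steps-30.png")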
Depth Map Method
The ControlNet depth map method provides spatial information about the image, indicating the depth or distance of different parts of the scene from the camera's perspective, and helps to create a non-RGB 3D image for further processing.
To run the model, restart your Jupyter Notebook kernel
Import the required libraries you installed earlier
Import the additional libraries for this model
from transformers import pipeline
import numpy as np
In the above code, transformers provides a wide range of pre-trained models for tasks such as text classification and depth estimation. The pipeline function is a transformers library API that loads a pre-trained model for a specific task.
Define the pipeline with the task argument set to depth-estimation.
depth_estimator = pipeline('depth-estimation')
Load the image
image = load_image("https://example.com/image.png")
Retrieve the Depth Map and convert the image into a NumPy array.
image = depth_estimator(image)['depth']
image = np.array(image)
Process the image
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)
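Optionally, preview the depth map before defining the model to confirm the estimator captured the scene geometry. Below is a minimal sketch using matplotlib, which was installed earlier:
import matplotlib.pyplot as plt

# Display the 3-channel depth map produced above
plt.imshow(image)
plt.axis("off")
plt.title("Estimated depth map")
plt.show()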
Define the model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
Execute the model on the image with a text prompt
image = pipe("YOUR_TEXT_PROMPT_HERE", image, num_inference_steps=20).images[0]
View the generated image
image
M-LSD Detection Method
M-LSD is an algorithm that detects straight line segments in an image, producing an outline of its structure. In this section, use M-LSD detection with ControlNet as described below.
Restart your Jupyter Notebook kernel to clear GPU memory
Import the required libraries
Import the additional library for this model
from controlnet_aux import MLSDdetector
import matplotlib
Initialize the model with a pre-trained model
mlsd = MLSDdetector.from_pretrained('lllyasviel/ControlNet')
Import the image and apply the MLSDdetector.
image = load_image("https://example.com/image.png")
image = mlsd(image)
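To check what the detector extracted, you can optionally place the original photo and the detected line map side by side before running the diffusion step. Below is a minimal sketch using PIL, assuming the same placeholder image URL:
from PIL import Image
from diffusers.utils import load_image

original = load_image("https://example.com/image.png")
lines = image.resize(original.size)  # the M-LSD line map produced above

# Paste both images onto one canvas for a quick visual comparison
canvas = Image.new("RGB", (original.width * 2, original.height))
canvas.paste(original, (0, 0))
canvas.paste(lines, (original.width, 0))
canvas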
Define the model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-mlsd"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
Execute the model on the image with a text prompt.
image = pipe("YOUR_TEXT_PROMPT_HERE", image, num_inference_steps=20).images[0]
View the generated image
image
Normal Map Method
In a normal map, each pixel encodes the direction of the surface normal as an RGB color. By encoding the surface normal as color information, the normal map provides a way of storing and representing detailed surface information without increasing the model complexity.
To run the model, restart your Jupyter Notebook kernel
Import the required libraries
Import the additional libraries required for this model.
from transformers import pipeline
import numpy as np
import cv2
Import the base image
image = load_image("https://example.com/image.png").convert("RGB")
When the image is imported into the image variable, it's converted to the standard RGB color representation.
Import the model and assign the image to the model.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas" ) image = depth_estimator(image)['predicted_depth'][0]
Convert the image into a NumPy array
image = image.numpy()
Process the image
image_depth = image.copy()
image_depth -= np.min(image_depth)
image_depth /= np.max(image_depth)

bg_threhold = 0.4

x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
x[image_depth < bg_threhold] = 0

y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
y[image_depth < bg_threhold] = 0

z = np.ones_like(x) * np.pi * 2.0

image = np.stack([x, y, z], axis=2)
image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
image = Image.fromarray(image)
Below is what the code does:
- image_depth: Stores a copy of the image NumPy array
- image_depth -= np.min(image_depth): Subtracts the minimum pixel value from the image_depth array to ensure all pixel values are non-negative
- image_depth /= np.max(image_depth): Scales the pixel values to the range [0, 1] by dividing the entire image_depth array by its maximum value
- bg_threhold = 0.4: Defines a threshold value (0.4 in this case) to identify background pixels in the depth map. Pixels with a value less than the threshold are treated as background
- x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3): Applies the Sobel operator to the original image along the x-axis to compute the gradient in the x-direction. The resulting x array represents the x-component
- x[image_depth < bg_threhold] = 0: Removes the gradients from the background regions. The same process runs for the y-direction
- z = np.ones_like(x) * np.pi * 2.0: Initializes an array z with the same shape as x and fills it with the constant value 2.0 * np.pi. The z component represents the orientation angle of the gradient
- np.stack([x, y, z], axis=2): Combines the x, y, and z arrays along the third dimension to create a 3-channel image
- np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5: Normalizes the combined gradient image to unit length. It divides each pixel's gradient vector by its magnitude, ensuring the gradient vectors have a length of 1
- (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8): Scales and shifts the gradient image to the range [0, 255] for visualization as an 8-bit image
- image = Image.fromarray(image): Converts the NumPy array image to a PIL Image object for visualization or other uses
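To make the unit-length normalization step concrete, the short example below applies the same math to a single made-up gradient vector:
import numpy as np

# One pixel's (x, y, z) gradient vector -- made-up values for illustration
v = np.array([3.0, 4.0, 2.0 * np.pi])

length = np.sum(v ** 2.0) ** 0.5   # Euclidean length of the vector (about 8.03)
unit = v / length                  # normalized vector with length 1

print(np.sum(unit ** 2.0) ** 0.5)                            # 1.0
print((unit * 127.5 + 127.5).clip(0, 255).astype(np.uint8))  # mapped to the 0-255 RGB range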
Define the model
controlnet = ControlNetModel.from_pretrained(
    "fusing/stable-diffusion-v1-5-controlnet-normal"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
Execute the model on the image with a text prompt
image = pipe("YOUR_PROMPT_HERE", image, num_inference_steps=20).images[0]
View the image
image
Pose Detection Method
This method detects the pose in an image and makes changes based on the retrieved pose information. The pose is always retained in the generated output, irrespective of the changes the prompt requires, as described below.
Restart the Jupyter Notebook kernel to run the model
Import the required libraries
Import the openpose detector used to detect the pose in an image
from controlnet_aux import OpenposeDetector
Initialize the model with the pre-trained model and store it in a variable
openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
The OpenposeDetector performs pose estimation, a computer vision task that involves detecting and localizing human body key points in an image. It's initialized with a pre-trained model for faster execution.
Load the image and pass it to the initialized openpose variable.
image = load_image("https://example.com/image.png")
image = openpose(image)
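The detector typically returns a rendered skeleton image of the detected key points. Before defining the model, you can optionally display or save it to confirm the pose was picked up; a minimal sketch:
# Preview the detected pose skeleton before running the diffusion model
display(image)                     # render the skeleton inline in the notebook
image.save("detected-pose.png")    # optionally keep a copy for later inspection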
Define the model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
Using a prompt, execute the model on the image
image = pipe("YOUR_PROMPT_HERE", image, num_inference_steps=20).images[0]
View the generated image
image
Image Segmentation Method
Image segmentation is a method that partitions an image into multiple parts or regions based on the characteristics of its pixels. Apply the image segmentation ControlNet method as described in the steps below.
Restart your Jupyter Notebook kernel
Import the required libraries
Import the additional libraries required for this model
from transformers import AutoImageProcessor, UperNetForSemanticSegmentation
import numpy as np
Using numpy, define a color palette.
palette = np.asarray([ [0, 0, 0], [120, 120, 120], [180, 120, 120], [6, 230, 230], [80, 50, 50], [4, 200, 3], [120, 120, 80], [140, 140, 140], [204, 5, 255], [230, 230, 230], [4, 250, 7], [224, 5, 255], [235, 255, 7], [150, 5, 61], [120, 120, 70], [8, 255, 51], [255, 6, 82], [143, 255, 140], [204, 255, 4], [255, 51, 7], [204, 70, 3], [0, 102, 200], [61, 230, 250], [255, 6, 51], [11, 102, 255], [255, 7, 71], [255, 9, 224], [9, 7, 230], [220, 220, 220], [255, 9, 92], [112, 9, 255], [8, 255, 214], [7, 255, 224], [255, 184, 6], [10, 255, 71], [255, 41, 10], [7, 255, 255], [224, 255, 8], [102, 8, 255], [255, 61, 6], [255, 194, 7], [255, 122, 8], [0, 255, 20], [255, 8, 41], [255, 5, 153], [6, 51, 255], [235, 12, 255], [160, 150, 20], [0, 163, 255], [140, 140, 140], [250, 10, 15], [20, 255, 0], [31, 255, 0], [255, 31, 0], [255, 224, 0], [153, 255, 0], [0, 0, 255], [255, 71, 0], [0, 235, 255], [0, 173, 255], [31, 0, 255], [11, 200, 200], [255, 82, 0], [0, 255, 245], [0, 61, 255], [0, 255, 112], [0, 255, 133], [255, 0, 0], [255, 163, 0], [255, 102, 0], [194, 255, 0], [0, 143, 255], [51, 255, 0], [0, 82, 255], [0, 255, 41], [0, 255, 173], [10, 0, 255], [173, 255, 0], [0, 255, 153], [255, 92, 0], [255, 0, 255], [255, 0, 245], [255, 0, 102], [255, 173, 0], [255, 0, 20], [255, 184, 184], [0, 31, 255], [0, 255, 61], [0, 71, 255], [255, 0, 204], [0, 255, 194], [0, 255, 82], [0, 10, 255], [0, 112, 255], [51, 0, 255], [0, 194, 255], [0, 122, 255], [0, 255, 163], [255, 153, 0], [0, 255, 10], [255, 112, 0], [143, 255, 0], [82, 0, 255], [163, 255, 0], [255, 235, 0], [8, 184, 170], [133, 0, 255], [0, 255, 92], [184, 0, 255], [255, 0, 31], [0, 184, 255], [0, 214, 255], [255, 0, 112], [92, 255, 0], [0, 224, 255], [112, 224, 255], [70, 184, 160], [163, 0, 255], [153, 0, 255], [71, 255, 0], [255, 0, 163], [255, 204, 0], [255, 0, 143], [0, 255, 235], [133, 255, 0], [255, 0, 235], [245, 0, 255], [255, 0, 122], [255, 245, 0], [10, 190, 212], [214, 255, 0], [0, 204, 255], [20, 0, 255], [255, 255, 0], [0, 153, 255], [0, 41, 255], [0, 255, 204], [41, 0, 255], [41, 255, 0], [173, 0, 255], [0, 245, 255], [71, 0, 255], [122, 0, 255], [0, 255, 184], [0, 92, 255], [184, 255, 0], [0, 133, 255], [255, 214, 0], [25, 194, 194], [102, 255, 0], [92, 0, 255], ])
Define the image processor and segmentor
image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small")
image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small")
In the above code:
- AutoImageProcessor: Handles different image processing tasks such as resizing, normalization, and padding. The upernet-convnext-small model is a lighter version of UperNet, a semantic segmentation model
- UperNetForSemanticSegmentation: A class designed for semantic segmentation tasks using the UperNet architecture. The upernet-convnext-small model is trained on a large dataset and segments images into different classes that represent objects or regions with the same semantic meaning
Both functions use a pre-trained model to perform semantic segmentation on input images.
Load the image and convert it to RGB.
image = load_image("https://example.com/image.png").convert('RGB')
Process the image
pixel_values = image_processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = image_segmentor(pixel_values)

seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]

color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)  # height, width, 3

for label, color in enumerate(palette):
    color_seg[seg == label, :] = color

color_seg = color_seg.astype(np.uint8)
image = Image.fromarray(color_seg)
The above code performs the following operations:
- image_processor: Pre-processes the input image. It takes the image as an argument and returns it as a PyTorch tensor. The return_tensors="pt" argument sets the output to a PyTorch tensor
- torch.no_grad(): Temporarily disables gradient computation
- pixel_values: Stores the pixel values of the image processed by image_processor
- image_segmentor: Takes the processed image as input and produces segmentation outputs
- seg: Calls the image_processor.post_process_semantic_segmentation function, which takes the model outputs and the target size of the image [image.size[::-1]] as arguments and returns the segmented image
- color_seg: An empty NumPy array with the dimensions of the input image
- for label, color in enumerate(palette): Iterates over each label and its corresponding color in the palette
- color_seg[seg == label, :] = color: Assigns the corresponding color to each pixel in color_seg based on the class label from the segmentation seg
- color_seg.astype(np.uint8): Converts the segmented image to a NumPy array of unsigned 8-bit integers
- Image.fromarray(color_seg): Converts the NumPy array color_seg to a PIL Image object
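To see which classes the segmentor found, you can optionally count the unique labels in seg before defining the model. Below is a minimal sketch, assuming seg is the tensor produced by the post-processing step above:
import numpy as np

# List the segmentation class labels present in the image and their pixel counts
labels, counts = np.unique(np.asarray(seg), return_counts=True)
for label, count in zip(labels, counts):
    print(f"class {int(label)}: {int(count)} pixels, color {palette[int(label)].tolist()}")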
Define the model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
Execute the model on the image with a text prompt
image = pipe("YOUR_PROMPT_HERE", image, num_inference_steps=20).images[0]
View the generated image
image
Additional Model Parameters
Each model supports additional parameters that allow you to fine-tune the output, save the generated image, and change the precision mode as described in the optional steps below.
To save the generated image, view your working directory
pwd
Save your image to the working directory path with a filename. For example, in your user home directory:
image.save("/home/user/generated-image.png")
To save the image to a different path, create the target directory first and set it as the output path. For example, when you create a model-images directory, use:
image.save("/home/user/model-images/generated-image.png")
To view the processed image before the model executes, run the image variable to view its value.
image
When you run the following prompt, the model executes on the variable, and running the variable again shows the final generated output instead.
image = pipe("YOUR_PROMPT_HERE", image, num_inference_steps=20).images[0]
To view the final generated output, run the variable after the model executes on the image, as below.
image = pipe("YOUR_PROMPT_HERE", image, num_inference_steps=20).images[0]
image
In summary, running the variable before the prompt returns the processed image, while running it after returns the final generated output.
To run the model in half precision for faster execution, set torch_dtype to torch.float16 in the pipeline as below.
controlnet = ControlNetModel.from_pretrained(
    "THE_MODEL_NAME_HERE",
    torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16
)
It's important to note that using half precision reduces the model's numerical precision.
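To verify the effect of half precision on your server, you can check how much GPU memory the run actually used with PyTorch's built-in memory counters, as in the minimal sketch below:
import torch

# Peak GPU memory allocated by this process so far, in gigabytes
print(round(torch.cuda.max_memory_allocated() / 1024**3, 2), "GB")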
To optimize the model for faster speed and reduced memory consumption, enable xformers before executing the model on the image.
pipe.enable_xformers_memory_efficient_attention()
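Note that xformers is a separate package (installed, for example, with !pip3 install xformers) and its builds are tied to specific PyTorch and CUDA versions. If it isn't available, a hedged alternative is the pipeline's built-in attention slicing, which also reduces memory use at a small speed cost:
# Fallback when xformers is not installed: slice the attention computation
pipe.enable_attention_slicing()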
Conclusion
In this article, you implemented image manipulation with Stable Diffusion ControlNet models on a Vultr Cloud GPU server. You set up the server, installed common libraries, and performed image processing using the available model methods.
More Information
For more information about ControlNet methods, visit the following model card pages.