AI Generated Images with Stable Diffusion XL and Vultr Cloud GPU

Updated on November 11, 2023

Introduction

Stable Diffusion XL (SDXL) is a deep learning text-to-image diffusion model developed by Stability AI. It can generate new images from a text prompt or enhance existing images guided by a text prompt. Compared to earlier Stable Diffusion models, SDXL generates higher-resolution images with more realistic detail in complex subjects such as faces and objects.

This guide explains how to use the Stable Diffusion XL (SDXL) model on a Vultr Cloud GPU instance. You will install the required packages, generate an image using the SDXL base model, increase image quality using the Refiner model, combine the base and Refiner models, and replace masked regions of an image using inpainting.

Prerequisites

Before you begin:

Install Required Packages

  1. Install diffusers and other required packages

     $ pip install diffusers transformers accelerate safetensors matplotlib ipywidgets

    The above command installs the following packages:

    • transformers: Provides pre-trained models for tasks such as Natural Language Processing (NLP), Named Entity Recognition (NER), machine translation, and sentiment analysis. Diffusers uses it to load the SDXL text encoders.
    • diffusers: Provides pre-trained diffusion models in the form of prepackaged pipelines, along with tools for building and training diffusion models. It also includes many core neural network models that serve as building blocks for new pipelines.
    • accelerate: Enables PyTorch to run across any distributed configuration. It uses accelerators like GPUs and TPUs to improve efficiency and scalability.
    • safetensors: A safe and fast file format for storing and loading model tensors. It avoids the arbitrary code execution risk of pickle-based checkpoint files.
    • matplotlib: Enables you to display the images in Jupyter Notebook.
    • ipywidgets: Provides interactive widgets for Jupyter Notebook, such as the progress bars displayed while the pipelines run.

  2. Create a new directory to save generated images

     $ mkdir /home/jupyter/notebooks/generated_images

    The above command creates a new directory to save the generated images.
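To confirm that the packages from step 1 are available before you open a notebook, a short check like the one below can help. This is a minimal sketch that uses only the Python standard library; the helper name installed_versions is an illustrative assumption and not part of any of the installed packages.

```python
from importlib import metadata

# Package names taken from the pip install command in step 1.
REQUIRED = [
    "diffusers",
    "transformers",
    "accelerate",
    "safetensors",
    "matplotlib",
    "ipywidgets",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one line per package so that missing packages are easy to spot.
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any package is reported as not installed, run the pip command from step 1 again in the same Python environment that your Jupyter kernel uses.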

Image Generation using Stable Diffusion XL Model

Stable Diffusion XL (SDXL) is a pre-trained text-to-image generation model with 3.5 billion parameters, capable of generating realistic images with resolutions of up to 1024 x 1024 pixels.

To generate images with Stable Diffusion XL, import the required modules such as StableDiffusionXLPipeline from diffusers, torch, and matplotlib.pyplot. Then, initialize the model using the provided model_id and set it up for GPU acceleration by assigning it to the CUDA device.

  1. Open a new Notebook and set its name to Stable Diffusion XL Base

    Image of new notebook

  2. To use the model, import the following modules

     import torch
     import matplotlib.pyplot as plt
     from diffusers import StableDiffusionXLPipeline

    Below is what each module does:

    • StableDiffusionXLPipeline class provides an interface to the model for generating images.
    • torch enables support for tensor computations. In this context, it's used for GPU acceleration.
    • matplotlib library allows you to display the generated images.

  3. Declare the model

     model_id = "stabilityai/stable-diffusion-xl-base-1.0"
     pipe = StableDiffusionXLPipeline.from_pretrained(
         model_id, 
         torch_dtype=torch.float16, 
         variant="fp16", 
         use_safetensors=True
     )
     pipe.to("cuda")

    By calling the from_pretrained method, the pipeline takes care of the necessary setup to generate images from text.

    The parameters passed to the from_pretrained() method are:

    • The model_id of a pipeline. The function call above loads the "stabilityai/stable-diffusion-xl-base-1.0" model. The model ID can also be the path to a local directory containing model weights or a path (local or URL) to a checkpoint file.
    • torch_dtype is the Torch data type of the tensors used for pipeline computations. float16 is specified explicitly so that model computations use 16-bit floating point numbers, which reduces memory usage on systems with less GPU RAM. You can let the library derive the data type from the model weights using torch_dtype="auto".
    • variant="fp16" loads the 16-bit floating point version of the model weight files.
    • use_safetensors=True loads the model weights from safetensors files.

  4. Generate an image by providing a prompt as below

     prompt = "Astronaut in a jungle"
     image = pipe(prompt=prompt).images

    Replace Astronaut in a jungle with your desired text prompt.

    The above code passes the prompt to the previously declared pipeline and stores the images attribute of the result, which is a list of generated images. A different image is generated each time you run the code. You can improve the output image by providing a more detailed prompt.

  5. Render the generated image

     plt.imshow(image[0])

    The prompt Astronaut in a jungle generates an image like the one below:

    AI Generated Image Output
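The effect of torch_dtype=torch.float16 on GPU memory can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes the 3.5 billion parameter count quoted in this guide and counts model weights only, so memory for activations, the CUDA context, and other overhead is not included. The function name weight_memory_gb is an illustrative assumption.

```python
# Bytes needed to store one parameter in each floating point format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Parameter count quoted in this guide for SDXL.
SDXL_PARAMS = 3.5e9


def weight_memory_gb(num_params, dtype):
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(f"{dtype}: about {weight_memory_gb(SDXL_PARAMS, dtype):.1f} GB of weights")
```

Under these assumptions, half precision halves the memory needed for the weights, which is why the guide loads the fp16 variant.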

Increase Image Quality using Stable Diffusion XL Refiner Model

Image-to-Image is a pre-trained diffusion pipeline from the Diffusers library. Used with the SDXL refiner checkpoint, which specializes in denoising the final low-noise stages, it produces images with improved high-frequency detail.

The following steps explain how to refine an image using this pipeline together with the refiner checkpoint.

  1. Open a new Notebook and set its name to Stable Diffusion XL Image to Image.

    Image of new notebook

  2. To clear GPU memory before running the model, navigate to the Kernel menu in your Jupyter Notebook, and click Shut Down All Kernels.

    Notebook image

  3. To use the model, import the following packages

     import torch
     import matplotlib.pyplot as plt
     from diffusers import StableDiffusionXLImg2ImgPipeline
     from diffusers.utils import load_image

    Below is what each module does:

    • StableDiffusionXLImg2ImgPipeline class provides an interface to the model for refining images.
    • load_image function from the utils module within the diffusers library loads image data from a given source.

  4. Declare the model

     model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
     pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
         model_id, 
         torch_dtype=torch.float16, 
         variant="fp16", 
         use_safetensors=True
     )
     pipe = pipe.to("cuda")

    By calling the from_pretrained method, the pipeline takes care of the necessary setup to refine the generated images.

    The model_id parameter identifies the pipeline to load. The function call above loads the "stabilityai/stable-diffusion-xl-refiner-1.0" model. The model ID can also be the path to a local directory containing model weights or a path (local or URL) to a checkpoint file.

  5. Load the image

     init_img_url = "https://i.imgur.com/UikG1MN.png"
     init_image = load_image(init_img_url).convert("RGB")

    init_img_url contains the URL of the image to refine. load_image(init_img_url).convert("RGB") then loads the image and converts it to the RGB color space.

  6. Generate an image by providing a prompt as below

     prompt = "A photo of an astronaut riding a horse on mars"
     image = pipe(prompt, image=init_image).images

    Use the same prompt that generated the original image, or a prompt that describes your input image. For example, the image at the above URL was generated with the same prompt, A photo of an astronaut riding a horse on mars.

  7. Render the generated image

     plt.imshow(image[0])

    The prompt A photo of an astronaut riding a horse on mars generates an image like the one below:

    AI Generated Image Output

    You can also use this refiner checkpoint as a "second-step" pipeline after running the base checkpoint to improve image quality.

  8. You can also access the above code in the following Notebook

    Stable Diffusion XL Image to Image notebook

Combine Refiner and Stable Diffusion XL Model

Diffusers also allows you to use two pipelines together. With this approach, you can generate an image using the Stable Diffusion XL base checkpoint and improve its quality using the refiner checkpoint.

When you use the base and refiner models together to generate an image, this is known as an ensemble of expert denoisers. The ensemble of expert denoisers approach requires fewer denoising steps in total compared to passing the base model's output to the refiner model, making it significantly faster to run. However, it's important to note that you won't be able to inspect the base model's output because it still contains a significant amount of noise.

  1. Open a new Notebook and set its name to Stable Diffusion XL Base + Refiner

    Image of new notebook

  2. To clear GPU memory before running the model, navigate to the Kernel menu in your Jupyter Notebook, and click Shut Down All Kernels

    Notebook image

  3. To use the model, import the following packages

     import torch
     import matplotlib.pyplot as plt
     from diffusers import DiffusionPipeline

  4. Declare the base model

     base = DiffusionPipeline.from_pretrained(
         "stabilityai/stable-diffusion-xl-base-1.0", 
         torch_dtype=torch.float16, variant="fp16", 
         use_safetensors=True
     )
     base.to("cuda")

  5. Declare the refiner model

     refiner = DiffusionPipeline.from_pretrained(
         "stabilityai/stable-diffusion-xl-refiner-1.0",
         text_encoder_2=base.text_encoder_2,
         vae=base.vae,
         torch_dtype=torch.float16,
         use_safetensors=True,
         variant="fp16",
     )
     refiner.to("cuda")

    Because the refiner uses the same second text encoder and variational autoencoder (VAE) as the base model, you can pass the already loaded components instead of loading them again for the refiner.

  6. Define the number of inference steps and the fraction of those steps that run in the high-noise denoising stage (that is, on the base model)

     n_steps = 40
     high_noise_frac = 0.8

    The base model is trained on timesteps 0-999, and the refiner is fine-tuned from the base model on the low-noise timesteps 0-199 inclusive. For that reason, use the base model for the first 800 timesteps (high noise) and the refiner for the last 200 timesteps (low noise). Setting high_noise_frac to 0.8 makes the base model perform timesteps 200-999 (the first 80% of the denoising timesteps) and the refiner model perform timesteps 0-199 (the last 20%).

    Remember, the denoising process starts at high value (high noise) timesteps and ends at low value (low noise) timesteps.

  7. Run both pipelines

     prompt = "A majestic lion jumping from a big stone at night"
    
     base_image = base(
         prompt=prompt,
         num_inference_steps=n_steps,
         denoising_end=high_noise_frac,
         output_type="latent",
     ).images
    
     refined_image = refiner(
         prompt=prompt,
         num_inference_steps=n_steps,
         denoising_start=high_noise_frac,
         image=base_image,
     ).images

    Make sure to set denoising_end and denoising_start to the same values and keep num_inference_steps constant. Also remember that the output of the base model should be in latent space.

  8. Render the generated image

     plt.imshow(refined_image[0])

    The image below, generated with the prompt A majestic lion jumping from a big stone at night, compares the output of the base model alone with the output of the combined base and refiner models.

    In the combined pipeline, you can't inspect the base model's intermediate output because it still contains a large amount of noise.

    AI Generated Image Output

    The base model image above was generated separately using the Stable Diffusion XL base model with a manual seed. The same manual seed was then used with the combined pipeline so that the two images can be compared. To reproduce images using a manual seed, follow the steps below.

  9. Create a new generator

    Run this code in the Stable Diffusion XL Base notebook, following the same steps you used earlier to generate an image with the base model.

     prompt = "A majestic lion jumping from a big stone at night"
     generator = torch.Generator("cuda").manual_seed(3078)
     image = pipe(prompt=prompt, generator=generator).images

    The above code block creates a new generator and passes it to the pipeline. Calling manual_seed() with a fixed number makes the model produce consistent output. You can pick any integer as the manual seed. Without a manually seeded generator, the pipeline uses different random values on each run and generates a unique image every time.

  10. Run both pipelines with the same manual seed

    Run this code in the Stable Diffusion XL Base + Refiner notebook, following the same steps you used earlier to generate the combined image.

     prompt = "A majestic lion jumping from a big stone at night"
     generator = torch.Generator("cuda").manual_seed(3078)
    
     base_image = base(
         prompt=prompt,
         generator=generator,
         num_inference_steps=n_steps,
         denoising_end=high_noise_frac,
         output_type="latent",
     ).images
    
     refined_image = refiner(
         prompt=prompt,
         num_inference_steps=n_steps,
         denoising_start=high_noise_frac,
         image=base_image,
     ).images

    The above code block uses the same manual seed that was used to create the base model image, so both runs start from the same initial noise. It then refines the image to improve its quality, allowing you to compare the two images.

  11. You can also access the above code in the following Notebook

    Stable Diffusion XL Base + Refiner Notebook
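The split between the base and refiner models described in step 6 can be worked through with plain arithmetic. The sketch below is a simplified illustration, not the scheduler code used by Diffusers: it assumes the denoising timesteps are evenly spaced over the 1000 training timesteps, while real schedulers may space or offset them differently. The function name split_denoising_steps is an illustrative assumption.

```python
def split_denoising_steps(n_steps, high_noise_frac, num_train_timesteps=1000):
    """Split evenly spaced denoising timesteps between the base and refiner models.

    Simplifying assumption: timesteps are evenly spaced and start from the
    high-noise end. Real schedulers may use a different spacing or offset.
    """
    # Timesteps at or above the cutoff are high noise and go to the base model.
    cutoff = round(num_train_timesteps * (1 - high_noise_frac))
    stride = num_train_timesteps // n_steps
    # Denoising runs from high timesteps (high noise) down to low timesteps.
    timesteps = [i * stride for i in reversed(range(n_steps))]
    base_steps = [t for t in timesteps if t >= cutoff]
    refiner_steps = [t for t in timesteps if t < cutoff]
    return cutoff, base_steps, refiner_steps


cutoff, base_steps, refiner_steps = split_denoising_steps(n_steps=40, high_noise_frac=0.8)
print(f"cutoff timestep: {cutoff}")
print(f"base model steps: {len(base_steps)} (timestep {base_steps[0]} down to {base_steps[-1]})")
print(f"refiner steps: {len(refiner_steps)} (timestep {refiner_steps[0]} down to {refiner_steps[-1]})")
```

With n_steps = 40 and high_noise_frac = 0.8, this simplified model assigns 32 steps to the base model and 8 steps to the refiner, which matches the 80% and 20% split described above.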

Mask Images by Inpainting

Inpainting fills in missing or damaged regions of an image by predicting those regions from the surrounding pixels. The Diffusers library provides it as a pre-trained diffusion pipeline.

Inpainting uses the same Stable Diffusion XL base model. The diffusion pipeline restores or replaces the masked portions of an image while preserving the style and content of the rest of the image.

To generate inpainting images, you’ll need the original image and a mask of what you want to replace in the original image. Create a prompt to describe what you want to replace in the masked area.

  1. Open a new Notebook and set its name to Stable Diffusion XL Inpaint

    Image of new notebook

  2. To clear GPU memory before running the model, navigate to the Kernel menu in your Jupyter Notebook, and click Shut Down All Kernels

    Notebook image

  3. To use the model, import the following packages

     import torch
     import matplotlib.pyplot as plt
     from diffusers import StableDiffusionXLInpaintPipeline
     from diffusers.utils import load_image

  4. Declare the model

     model_id = "stabilityai/stable-diffusion-xl-base-1.0"
     pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
         model_id, 
         torch_dtype=torch.float16, 
         variant="fp16", 
         use_safetensors=True
     )
     pipe.to("cuda")

  5. Load the original image and the mask image

     init_img_url = "https://i.imgur.com/AsJ1lPf.png"
     mask_img_url = "https://i.imgur.com/liBhpAv.png"
     init_image = load_image(init_img_url).convert("RGB")
     mask_image = load_image(mask_img_url).convert("RGB")

  6. Generate an image by providing a prompt as below

     prompt = "A majestic tiger sitting on a bench"
     image = pipe(
         prompt=prompt,
         image=init_image,
         mask_image=mask_image,
         num_inference_steps=50,
         strength=0.80,
     ).images

    Replace A majestic tiger sitting on a bench with your desired prompt.

  7. Render the generated image

     plt.imshow(image[0])

    Below is an example of the image showing how the model works:

    AI Generated Image Output

  8. You can also access the above code in the following Notebook

    Stable Diffusion XL Inpaint Notebook
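The inpainting call in step 6 passes both num_inference_steps=50 and strength=0.80. As far as the Diffusers documentation describes it, strength controls how much noise is added to the original image, and the number of denoising steps that actually run scales with it. The sketch below shows that relationship as plain arithmetic. It is an approximation of the documented behavior rather than the library's code, and the function name effective_denoising_steps is an illustrative assumption.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Approximate the number of denoising steps that run for a given strength.

    Assumption: the pipeline runs roughly num_inference_steps * strength steps,
    capped at num_inference_steps. Check the documentation of your installed
    Diffusers version for the exact behavior.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Values used in the inpainting step of this guide.
print(effective_denoising_steps(50, 0.80))

# Lower strength keeps more of the original image and runs fewer steps.
for strength in (0.25, 0.5, 1.0):
    print(strength, effective_denoising_steps(50, strength))
```

Under this assumption, the call in step 6 runs about 40 of the 50 requested steps. A strength of 1.0 ignores most of the original content in the masked area and runs all 50 steps.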

Save Generated Images

  1. Specify the directory you created earlier to save the generated images

     save_directory = "/home/jupyter/notebooks/generated_images" 

    The directory must already exist on disk. Verify that you created it as described in the Install Required Packages section.

  2. Using a for loop, save the images

     # image is the list of generated images returned by the pipeline
     for i, img in enumerate(image):
         img.save(f"{save_directory}/image_{i}.png")

    The above code saves all generated images to the predefined save_directory path. It uses the save() method to save each image. Each file is named in the format image_{i}.png, where {i} is the index of the image in the list.

  3. In your terminal session, verify that the images are successfully saved to the directory.

     $ ls /home/jupyter/notebooks/generated_images

    To download a copy of the generated images, you can use a secure transfer tool such as SFTP or rsync over SSH to fetch the files to your computer. Plain FTP is not encrypted.
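Because the save step fails when the target directory does not exist, it can help to create the directory and build the file paths in one place. The following is a minimal sketch using only the Python standard library; the helper name image_paths and its prefix parameter are illustrative assumptions, and the demonstration writes to a temporary directory rather than to /home/jupyter/notebooks/generated_images.

```python
import tempfile
from pathlib import Path


def image_paths(save_directory, count, prefix="image"):
    """Create save_directory if needed and return one PNG path per image."""
    directory = Path(save_directory)
    # parents=True and exist_ok=True make this safe to call repeatedly.
    directory.mkdir(parents=True, exist_ok=True)
    return [directory / f"{prefix}_{i}.png" for i in range(count)]


# Demonstration with a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    paths = image_paths(Path(tmp) / "generated_images", count=2)
    print([p.name for p in paths])

# In a notebook, where image is the list returned by the pipeline:
#     for path, img in zip(image_paths(save_directory, len(image)), image):
#         img.save(path)
```

Passing a different prefix for each notebook, for example "inpaint", keeps later runs from overwriting images saved by earlier ones.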

Additional Parameters

Below is what each parameter used in the model pipelines does:

  • prompt: The input text prompt that guides the image generation process
  • negative_prompt: Describes what to avoid in the generated image. Alternatively, you can pass negative_prompt_embeds. It's ignored when guidance is not used, that is, when guidance_scale is less than 1
  • generator: An instance of the torch.Generator class that allows you to control the random number generation
  • seed: The integer passed to the generator's manual_seed() method. A fixed seed helps to ensure that the results are reproducible
  • guidance_scale: Controls how closely the image follows the text prompt, which also affects sample quality. Values between 7 and 8.5 usually work well, and the default value depends on the pipeline
  • add_watermarker: Whether to use the invisible_watermark library to watermark output images. If not defined, it defaults to True if the package is installed, otherwise no watermarker is used
  • images: The attribute of the pipeline output that holds a list of all generated image objects
  • height: Sets the height in pixels of the generated image in the pipeline
  • width: Sets the width in pixels of the generated image in the pipeline
  • num_inference_steps: The number of denoising steps in the inference process. A value of 50 is recommended to balance generation speed and result quality. Fewer steps generally give faster results at lower quality, and more steps give slower results at higher quality
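The guidance_scale parameter is the weight used in classifier-free guidance, where the model's noise prediction for the text prompt is pushed away from its prediction for the unconditional (or negative) prompt. The sketch below illustrates that formula on plain Python lists. The pipelines apply it to tensors internally, so the function name apply_guidance and the sample values are illustrative assumptions only.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Made-up noise predictions for a two-element "image".
noise_uncond = [0.0, 1.0]  # prediction for the empty or negative prompt
noise_text = [1.0, 3.0]    # prediction for the text prompt

for scale in (1.0, 5.0, 7.5):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

With a scale of 1.0, the result equals the text-conditioned prediction, so the unconditional or negative prompt has no effect. Larger values move the result further in the direction of the text prompt.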

Conclusion

In this guide, you generated images using the Stable Diffusion XL (SDXL) model on a Vultr A100 Cloud GPU server. Additionally, you refined generated images using the Refiner model and replaced masked regions of an image using the inpainting pipeline.

More Information

For more information, please visit the following resources: