AI Generated Images using Stable Diffusion Turbo Models

Updated on July 25, 2024
AI Generated Images using Stable Diffusion Turbo Models header image

Introduction

Stable Diffusion Turbo models, Stable Diffusion (SD) Turbo and Stable Diffusion XL (SDXL) Turbo, are revolutionary real-time text-to-image generation models developed by Stability AI. They are based on the earlier Stable Diffusion 2.1 and Stable Diffusion XL (SDXL) models and are trained with a new technique called Adversarial Diffusion Distillation (ADD).

ADD allows these models to synthesize image outputs in one to four steps, compared to the conventional 50 steps. This capability makes both models much faster than previous text-to-image and image-to-image models, which typically require 50 or more steps to generate an image.

This guide explains how to generate AI images using Stable Diffusion Turbo models on a Vultr Cloud GPU server. You will install the required packages to create the development environment, then compare the models' generation times, performance benefits, and limitations while generating images.

Prerequisites

Before you begin, you should:

Adversarial Diffusion Distillation (ADD)

Adversarial Diffusion Distillation (ADD) is an advanced training method for large-scale image diffusion models such as Stable Diffusion Turbo. It uses score distillation with a pre-trained Stable Diffusion model as a guide to distill valuable knowledge for high-quality output. The inclusion of an adversarial loss mechanism ensures realism, which makes ADD a potent tool for real-time image generation.

ADD speeds up the Turbo models by reducing the number of refinement steps through knowledge distillation from a pre-trained "teacher" model. This dual approach leverages the teacher's knowledge, resulting in real-time image generation, lower computational costs, and quicker experimentation.
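
Stability AI has not released the ADD training code, but the core idea can be illustrated with a short sketch. The example below is conceptual only: the tiny stand-in networks, the noised-latent tensor, and the 0.5 loss weight are illustrative placeholders, not the actual implementation.

python
import torch
import torch.nn as nn

# Conceptual ADD-style training objective (illustrative, not official code).
# Tiny stand-in networks keep the example runnable end to end.
student = nn.Linear(16, 16)                                  # stands in for the fast student model
teacher = nn.Linear(16, 16)                                  # stands in for the frozen teacher model
discriminator = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

x_noisy = torch.randn(4, 16)                                 # a batch of noised latents

x_student = student(x_noisy)                                 # one-step student prediction
with torch.no_grad():
    x_teacher = teacher(x_noisy)                             # teacher target, no gradients

distill_loss = ((x_student - x_teacher) ** 2).mean()              # score distillation term
adv_loss = -torch.log(discriminator(x_student) + 1e-8).mean()     # adversarial realism term

loss = distill_loss + 0.5 * adv_loss                         # 0.5 is an illustrative weight
loss.backward()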

Install the Model Dependency Packages

  1. Open a terminal within the Jupyter lab interface.

    Image of terminal

  2. Install the required dependency packages.

    console
    $ pip install diffusers transformers accelerate matplotlib ipywidgets --upgrade
    

    The above command installs the following packages:

    • diffusers: Provides tools for building and training diffusion models. The library includes many core neural network models that are used as building blocks to create new pipelines.
    • transformers: Consists of multiple pre-trained models used for Natural Language Processing (NLP), Named Entity Recognition (NER), machine translation, and sentiment analysis.
    • accelerate: Enables PyTorch to run across any distributed configuration. It uses accelerators such as GPUs and TPUs to improve efficiency and scalability, speed up natural language processing (NLP) workflows, and enhance performance.
    • matplotlib: Displays the generated images in your Jupyter Notebook.
    • ipywidgets: Enables interactive widgets, such as progress bars, in Jupyter Notebooks.

    You can verify the installation with the quick check below.
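
    python
    import torch
    import diffusers

    # Quick sanity check that the GPU is visible before loading any model
    print(f"diffusers version: {diffusers.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
    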

Generate Images using Stable Diffusion Turbo

Stable Diffusion Turbo is available as a pre-trained model through the Hugging Face Diffusers library. The model can generate images from text prompts (text-to-image) and transform existing images (image-to-image). In this section, generate images with the model using both input types as described in the steps below.

Text to Image Generation

To generate images from text prompts, import the required modules such as AutoPipelineForText2Image from diffusers, torch, and matplotlib.pyplot. Then, initialize the model using its model_id and assign it to the CUDA device for GPU acceleration.

  1. Open a new Notebook session, and set its name to Stable Diffusion Turbo Text2Image.

    Image of new notebook

  2. To use the model, import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForText2Image
    

    Below is what each module does:

    • AutoPipelineForText2Image: Provides an interface to the model for generating images.
    • torch: Enables support for tensor computations. In this context, it's used for GPU acceleration.
    • matplotlib: Allows you to display the generated images.
  3. Declare the model.

    python
    model_id = "stabilityai/sd-turbo"
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    

    The from_pretrained method loads the pipeline and applies the necessary setup to generate images from text. The parameters passed to the method include:

    • model_id: Loads the stabilityai/sd-turbo model. The model ID can also be the path to a local directory containing model weights or a URL to a checkpoint file.
    • torch_dtype: Defines the Torch datatype of the tensors used for pipeline computations. float16 is specified explicitly to run model computations with 16-bit floating-point numbers. To let the system choose the optimal data type, use torch_dtype = "auto" as in the sketch below.
    • variant: Loads the model weights stored in 16-bit floating-point format, which reduces download size and memory usage.
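
    For example, a minimal alternative load that derives the data type from the saved weights:

    python
    # Alternative: let diffusers derive the dtype from the stored weights
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype="auto")
    pipe.to("cuda")
    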
  4. Generate an image using a text prompt. For example, Sunset on a beach.

    python
    prompt = "Sunset on a beach"
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images
    

    The above code declares the prompt and feeds it to the previously declared pipeline, storing the result in the images attribute. A different image is generated each time you run the cell. To enhance the image output, enter a more detailed prompt that matches your desired results.
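
    To make the output reproducible, you can optionally pass a seeded generator to the pipeline; the seed value 42 below is arbitrary.

    python
    # Optional: fix the random seed for repeatable output
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0, generator=generator).images
    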

  5. View the generated image.

    python
    plt.imshow(image[0])
    

    Verify that the generated image matches your Sunset on a beach prompt, or fine-tune the prompt to generate a more desirable result.

    AI Generated Image Output
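
    To keep the result, you can save it to disk; the pipeline returns standard PIL images, and the filename below is an example.

    python
    # Optional: save the generated PIL image to disk
    image[0].save("sunset_on_a_beach.png")
    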

Image to Image Generation

  1. Open a new Notebook and set its name to Stable Diffusion Turbo Image2Image.

    Image of new notebook

  2. Navigate to the Kernel menu option in your Jupyter Notebook, and click Shut Down All Kernels to clear GPU memory.

    Notebook image
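
    Alternatively, you can free GPU memory from within a notebook that still holds a loaded pipeline, instead of shutting down its kernel; this assumes a pipe variable is still defined in that session.

    python
    # Optional alternative: release the previous pipeline's GPU memory in-place
    del pipe
    torch.cuda.empty_cache()
    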

  3. In a new code cell, import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image
    

    Below is what each module does:

    • AutoPipelineForImage2Image: Provides an interface to the model for refining images.
    • load_image: Loads image data from a declared source.
  4. Declare the model.

    python
    model_id = "stabilityai/sd-turbo"
    pipe = AutoPipelineForImage2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Load the input image. Replace https://i.imgur.com/wFC9Yw6.png with your desired source image URL.

    python
    init_image = load_image("https://i.imgur.com/wFC9Yw6.png").resize((512, 512))
    
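    load_image also accepts a local file path, so you can use an image stored on the server instead of a URL; the path below is a placeholder example.

    python
    # Alternative: load a local file (placeholder path) instead of a URL
    init_image = load_image("/path/to/your-image.png").resize((512, 512))
    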
  6. Add a supporting text prompt to guide the image generation process. For example, A fantasy landscape, trending on artstation. The strength parameter controls how much the input image is transformed. Because SD Turbo runs very few steps, make sure that num_inference_steps multiplied by strength is at least 1; here, 4 steps at a strength of 1 results in 4 denoising steps.

    python
    prompt = "A fantasy landscape, trending on artstation"
    image = pipe(prompt, image=init_image, num_inference_steps=4, strength=1, guidance_scale=0.0).images
    
  7. View the generated image.

    python
    plt.imshow(image[0])
    

    Verify that the model generates a new image that combines your input image with the supporting prompt.

    AI Generated Image Output

Time Difference Compared to Stable Diffusion 2.1

Text to Image Generation Time Difference

In this section, compare the text-to-image generation times between SD Turbo and Stable Diffusion 2.1. The image generation steps are similar, but you import the time module to measure each model's generation duration as described in the steps below.

  1. Open a new Notebook and set its name to Stable Diffusion text-to-image Time Comparison.

    Image of new notebook

  2. Clear the GPU memory to start running the model.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForText2Image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sd-turbo"
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Generate an image using the Stable Diffusion Turbo model with a prompt such as Sunset on a beach.

    python
    import time
    prompt = "Sunset on a beach"
    start_time = time.time()
    
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images
    end_time = time.time()
    plt.imshow(image[0])
    
  6. View the model generation time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 0.7282497882843018 seconds

    Keep note of the time value.

  7. Generate an image with the Stable Diffusion 2.1 Model.

    The Stable Diffusion 2.1 model steps are similar to SD Turbo text-to-image generation. In the previous code examples, replace the model_id value and remove the num_inference_steps and guidance_scale arguments so the pipeline runs with its default settings, as described below.

    Replace the model_id, and run the model declaration cell again to load the new model.

    python
    model_id = "stabilityai/stable-diffusion-2-1"
    

    Generate an image using a prompt such as Sunset on a beach.

    python
    import time
    prompt = "Sunset on a beach"
    start_time = time.time()
    
    image = pipe(prompt=prompt).images
    end_time = time.time()
    plt.imshow(image[0])
    
  8. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 6.339369297027588 seconds

In comparison with the earlier SD Turbo time of roughly 0.73 seconds, Stable Diffusion 2.1 is much slower at roughly 6.34 seconds. This is because SD Turbo generates an image in just one step, while Stable Diffusion 2.1 requires 50 steps by default.
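
A single-run timing also includes one-off costs such as CUDA kernel initialization, so the first call is often slower than later calls. For a fairer comparison, you can average several runs after a warm-up pass. Below is a minimal sketch that assumes a pipe variable is already loaded as in the steps above; the benchmark helper is illustrative, not part of the diffusers library.

python
import time

def benchmark(pipe, prompt, runs=5, **kwargs):
    # Warm-up call excludes one-off startup costs from the measurement
    pipe(prompt=prompt, **kwargs)
    start = time.time()
    for _ in range(runs):
        pipe(prompt=prompt, **kwargs)
    return (time.time() - start) / runs

avg = benchmark(pipe, "Sunset on a beach", num_inference_steps=1, guidance_scale=0.0)
print(f"Average generation time: {avg:.2f} seconds")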

Image to Image Generation Time Difference

  1. Open a new Notebook and set its name to Stable Diffusion image-to-image Time Comparison.

    Image of new notebook

  2. Clear the GPU memory to start running the model.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sd-turbo"
    pipe = AutoPipelineForImage2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Load the input image, then generate an image with the SD Turbo Model.

    python
    import time
    # Load the same input image used in the earlier image-to-image example
    init_image = load_image("https://i.imgur.com/wFC9Yw6.png").resize((512, 512))
    prompt = "A fantasy landscape, trending on artstation"
    start_time = time.time()
    
    image = pipe(prompt, image=init_image, num_inference_steps=4, strength=1, guidance_scale=0.0).images
    end_time = time.time()
    plt.imshow(image[0])
    
  6. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 0.8927862644195557 seconds

    Keep note of the generation time.

  7. Generate an image with the Stable Diffusion 2.1 Model.

    Replace the model_id, and run the model declaration cell again to load the new model.

    python
    model_id = "stabilityai/stable-diffusion-2-1"
    

    Generate an image with a prompt such as A fantasy landscape, trending on artstation.

    python
    import time
    prompt = "A fantasy landscape, trending on artstation"
    start_time = time.time()
    
    image = pipe(prompt, image=init_image).images
    end_time = time.time()
    plt.imshow(image[0])
    
  8. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 2.755378007888794 seconds

Based on the image generation times, SD Turbo is much faster than Stable Diffusion 2.1. SD Turbo creates an image in four steps, while the Stable Diffusion 2.1 image-to-image pipeline runs 40 denoising steps with its default settings (50 steps at a strength of 0.8).

Generate Images using Stable Diffusion XL Turbo

SDXL Turbo is a larger version of the SD Turbo model, capable of generating higher-quality and more detailed images. Like SD Turbo, it can also generate images from images.

Text to Image Generation

The SDXL Turbo model steps are similar to the SD Turbo text-to-image generation process. In the previous code examples, replace the model_id value as shown below.

  1. Open a new Notebook and set its name to Stable Diffusion XL Turbo Text2Image.

    Image of new notebook

  2. To clear GPU memory and start running the model, navigate to the Kernel menu option in your Jupyter Notebook, and click Shut Down All Kernels.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForText2Image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sdxl-turbo"
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Generate an image by providing a prompt as shown below.

    python
    prompt = "A baby racoon wearing a robe"
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images
    

    Replace A baby raccoon wearing a robe with your desired prompt.

  6. Render the generated image.

    python
    plt.imshow(image[0])
    

    The prompt A baby raccoon wearing a robe generates an image like the one below:

    AI Generated Image Output
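
    To generate several candidates from one prompt and pick the best result, you can optionally use the num_images_per_prompt parameter; the value 4 and the filenames below are examples.

    python
    # Optional: generate 4 candidate images in a single call
    images = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0, num_images_per_prompt=4).images
    for i, img in enumerate(images):
        img.save(f"raccoon_{i}.png")  # example filenames
    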

  7. You can also access the above code in the following Jupyter Notebook.

    Stable Diffusion XL Turbo Text2Image Notebook

Image to Image Generation

The Stable Diffusion XL Turbo model steps are similar to the SD Turbo image-to-image generation process. In the previous code examples, replace the model_id variable with the SDXL Turbo model to generate images as described in the steps below.

  1. Open a new Notebook and set its name to Stable Diffusion XL Turbo Image2Image.

    Image of new notebook

  2. Navigate to the Kernel menu option in your Jupyter Notebook, and click Shut Down All Kernels to clear GPU memory.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sdxl-turbo"
    pipe = AutoPipelineForImage2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Load the input image. Replace https://i.imgur.com/5OiqkNA.png with your desired source image URL.

    python
    init_image = load_image("https://i.imgur.com/5OiqkNA.png").resize((512, 512))
    
  6. Generate an image using a text prompt such as Astronauts in a jungle.

    python
    prompt = "Astronauts in a jungle"
    image = pipe(prompt, image=init_image, num_inference_steps=4, strength=1, guidance_scale=0.0).images
    
  7. View the generated image.

    python
    plt.imshow(image[0])
    

    Verify that your generated image matches your input prompt Astronauts in a jungle.

    AI Generated Image Output

SDXL Turbo and SDXL Text to Image Generation Time Differences

  1. Open a new Notebook and set its name to Stable Diffusion XL text-to-image Time Comparison.

    Image of new notebook

  2. Clear the GPU memory to start running the model.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForText2Image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sdxl-turbo"
    pipe = AutoPipelineForText2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Generate an image with the SDXL Turbo Model.

    python
    import time
    prompt = "A baby racoon wearing a robe."
    start_time = time.time()
    
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images
    end_time = time.time()
    plt.imshow(image[0])
    
  6. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 0.8553969860076904 seconds
  7. Generate an image with the SDXL Model.

    Replace the model_id, and run the model declaration cell again to load the new model.

    python
    model_id = "stabilityai/stable-diffusion-xl-base-1.0"
    

    Generate an image using a prompt such as A baby raccoon wearing a robe.

    python
    import time
    prompt = "A baby racoon wearing a robe"
    start_time = time.time()
    
    image = pipe(prompt=prompt).images
    end_time = time.time()
    plt.imshow(image[0])
    
  8. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 13.608830213546753 seconds

The single-step SDXL Turbo run is more than an order of magnitude faster than the 50-step default of the SDXL base model.

Image to Image Generation Time Differences

  1. Open a new Notebook and set the filename to Stable Diffusion XL image-to-image Time Comparison.

    Image of new notebook

  2. Clear the GPU memory to start running the model.

    Notebook image

  3. Import the following packages.

    python
    import torch
    import matplotlib.pyplot as plt
    from diffusers import AutoPipelineForImage2Image
    from diffusers.utils import load_image
    
  4. Declare the model.

    python
    model_id = "stabilityai/sdxl-turbo"
    pipe = AutoPipelineForImage2Image.from_pretrained(
        model_id, 
        torch_dtype=torch.float16, 
        variant="fp16"
    )
    pipe.to("cuda")
    
  5. Load the input image, then generate an image with the SDXL Turbo Model.

    python
    import time
    # Load the same input image used in the earlier SDXL Turbo image-to-image example
    init_image = load_image("https://i.imgur.com/5OiqkNA.png").resize((512, 512))
    prompt = "Astronauts in a jungle"
    start_time = time.time()
    
    image = pipe(prompt, image=init_image, num_inference_steps=4, strength=1, guidance_scale=0.0).images
    end_time = time.time()
    plt.imshow(image[0])
    
  6. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 1.2138450145721436 seconds
  7. Generate an image with the SDXL Model.

    Replace the model_id, and run the model declaration cell again to load the new model.

    python
    model_id = "stabilityai/stable-diffusion-xl-base-1.0"
    

    Generate an image using a prompt such as Astronauts in a jungle.

    python
    import time
    prompt = "Astronauts in a jungle"
    start_time = time.time()
    
    image = pipe(prompt, image=init_image, strength=0.8, guidance_scale=10.5).images
    end_time = time.time()
    plt.imshow(image[0])
    
  8. Print the time.

    python
    elapsed_time = end_time - start_time
    print(f"Image generation time: {elapsed_time} seconds")
    

    Output:

    Image generation time: 4.103722095489502 seconds

As before, SDXL Turbo is noticeably faster than the base SDXL model for image-to-image generation.

Performance Benefits Compared to Other Diffusion Models

To evaluate SD and SDXL Turbo, Stability AI conducted comparisons among various model variants, including StyleGAN-T++, OpenMUSE, IF-XL, SDXL, LCM-Lora1.5, and LCM-XL, by generating outputs with the same prompt. The goal was to determine which model generated outputs that most closely aligned with a given prompt. Human evaluators were presented with two randomly selected outputs and asked to choose the one that best followed the prompt's direction. A similar test was conducted to assess image quality.

Image of performance comparison SDXL Turbo

In these blind tests, both SD and SDXL Turbo performed exceptionally well. SDXL Turbo outperformed a 4-step configuration of LCM-XL with just a single step, and it surpassed a 50-step configuration of SDXL with only 4 steps. In a recent benchmark, it generated images with an average CLIP score of 0.42, significantly higher than the average CLIP score of 0.28 for other text-to-image models. These results show that both models deliver superior performance compared to multi-step models while substantially reducing computational requirements, without compromising the quality of the generated images.

Limitations of Stable Diffusion Turbo Models

While the SD Turbo models are powerful, they have some limitations. The models are still under development and may not be as accurate as some other text-to-image models. Neither model uses guidance_scale or negative_prompt, which is why guidance_scale is set to 0.0 in the examples above, and both generate images at a resolution of 512x512. Higher image sizes are possible, but they may not perform as well as the default 1024x1024 resolution of the base SDXL model.

Conclusion

In this guide, you generated images using Stable Diffusion Turbo models on a Vultr Cloud A100 GPU server. You installed the necessary packages to create the environment and generated images using both Turbo models. To choose the best model to implement, you also compared generation times against previous models to understand their performance benefits and limitations.

More Information

For more information, please visit the following resources: