Generating Images with Stable Diffusion | Generative AI Series
Introduction
Stable Diffusion is a text-to-image diffusion model developed by Stability AI. It is capable of generating high-quality, photorealistic images from text descriptions, and it produces consistent results even when the input text description is complex or open-ended.
In this guide, you'll set up the Stable Diffusion environment and query the model using a web user interface. Then, you'll create a REST API to generate responses from the model and access the API through a Jupyter Notebook.
Prerequisites
Before you begin:
Deploy a new Ubuntu 22.04 A100 Vultr Cloud GPU Server with at least:
- 80 GB GPU RAM
- 12 vCPUs
- 120 GB Memory
Create a non-root user with `sudo` rights and switch to the account.
Install Dependency Packages
Stable Diffusion requires some dependency packages to work. Install the packages using the following commands:

```console
$ sudo apt update
$ sudo apt install -y wget git python3 python3-venv libgl1 libglib2.0-0
```
Run Stable Diffusion in a Web Interface
You can run the Stable Diffusion model in a web interface. Follow the steps below to download an automated script that installs all the necessary packages, then load the model:
Create a new `sd` directory and navigate to it.

```console
$ mkdir sd
$ cd sd
```
Download the Stable Diffusion `webui.sh` file.

```console
$ wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
```
Add execute permissions to the `webui.sh` file.

```console
$ sudo chmod +x ./webui.sh
```
Allow port `7860` through the firewall.

```console
$ sudo ufw allow 7860
$ sudo ufw reload
```
Run the `webui.sh` script to download the model and start the web interface.

```console
$ ./webui.sh --listen
```
Output:

```
...
Model loaded in 18.6s (calculate hash: 10.6s, load weights from disk: 0.2s, create model: 1.9s, apply weights to model: 5.4s, apply half(): 0.1s, calculate empty prompt: 0.2s).
```
Visit the URL below. Replace `PUBLIC_IP_ADDRESS` with the public IP address of your GPU instance.

```
http://PUBLIC_IP_ADDRESS:7860
```
Type the following example queries and review the generated images:

- A cute white cat sitting next to a computer keyboard
- Taj Mahal during sunset, photo realistic, high quality
Create a REST API for the Stable Diffusion Model
The `bentoml` library provides support for deploying and serving the Stable Diffusion model through an API. Follow the steps below to create and run an API:
Use `pip` to install the required libraries.

```console
$ pip install bentoml diffusers transformers accelerate pydantic
```
Navigate to the `sd` directory you created earlier.

```console
$ cd ~/sd
```
Create a new `fetch_sd.py` file.

```console
$ nano fetch_sd.py
```
Enter the following information into the `fetch_sd.py` file.

```python
import bentoml

# Download the Stable Diffusion 2.1 weights from Hugging Face and
# register them in the local BentoML model store under the tag "sd2.1"
bentoml.diffusers.import_model(
    "sd2.1",
    "stabilityai/stable-diffusion-2-1",
)
```

Save and close the file.
Create a new `service.py` file.

```console
$ nano service.py
```
Enter the following information into the `service.py` file. The following script defines a BentoML service that uses the Stable Diffusion model to convert text to an image.

```python
import bentoml
from bentoml.io import Image, JSON

from sdargs import SDArgs

# Load the imported model and wrap it in a runner for serving
bento_model = bentoml.diffusers.get("sd2.1:latest")
sd21_runner = bento_model.to_runner(name="sd21-runner")

svc = bentoml.Service("stable-diffusion-21", runners=[sd21_runner])


@svc.api(input=JSON(pydantic_model=SDArgs), output=Image())
async def txt2img(input_data):
    # Pass the validated request fields to the diffusers pipeline
    # and return the first generated image
    kwargs = input_data.dict()
    res = await sd21_runner.async_run(**kwargs)
    images = res[0]
    return images[0]
```
Save and close the file.
Create a new `sdargs.py` file.

```console
$ nano sdargs.py
```
Enter the following information into the `sdargs.py` file. The following script defines an `SDArgs` Pydantic model that allows extra fields in the input data. The script handles data validation in the application.

```python
import typing as t

from pydantic import BaseModel


class SDArgs(BaseModel):
    prompt: str
    negative_prompt: t.Optional[str] = None
    height: t.Optional[int] = 512
    width: t.Optional[int] = 512

    class Config:
        # Accept fields beyond those declared above so extra
        # pipeline parameters pass through to the model
        extra = "allow"
```

Save and close the file.
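Because the model sets `extra = "allow"`, callers can include any additional keyword arguments that the underlying diffusers pipeline accepts. Below is a minimal sketch of this pass-through behavior, using the pipeline's `guidance_scale` parameter as an example:

```python
from sdargs import SDArgs

# "guidance_scale" is not declared on SDArgs, but extra = "allow"
# keeps it, so it reaches the diffusers pipeline unchanged
args = SDArgs(prompt="a black cat", guidance_scale=7.5)
print(args.dict())
```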
Create a `service.yaml` file.

```console
$ nano service.yaml
```
Enter the following information into the file.

```yaml
service: "service.py:svc"
include:
  - "service.py"
python:
  packages:
    - torch
    - transformers
    - accelerate
    - diffusers
    - triton
    - xformers
    - pydantic
docker:
  distro: debian
  cuda_version: "11.6"
```
Save and close the file.
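The `docker` section of `service.yaml` only takes effect when you package the service as a Bento, which is optional for this guide. If you later want a deployable artifact, the following is a sketch of the packaging commands, where the `-f` flag points `bentoml build` at `service.yaml` instead of the default `bentofile.yaml`:

```console
$ bentoml build -f service.yaml
$ bentoml containerize stable-diffusion-21:latest
```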
Run the `fetch_sd.py` file to download the model from Hugging Face. This file allows the `bentoml` library to download the Stable Diffusion model weights and make them available locally. The command takes around 10 minutes to complete.

```console
$ python3 fetch_sd.py
```
Output:

```
Downloading (…)rocessor_config.json:... ...
```
List the models to verify that the download succeeded.

```console
$ bentoml models list
```
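If you prefer to verify from Python instead of the CLI, the local model store can also be queried programmatically. Below is a minimal sketch:

```python
import bentoml

# Print the tag of every model in the local BentoML store;
# the sd2.1 model imported by fetch_sd.py should appear here
for model in bentoml.models.list():
    print(model.tag)
```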
Allow port `3000` through the firewall.

```console
$ sudo ufw allow 3000
$ sudo ufw reload
```
Run the `bentoml` service.

```console
$ bentoml serve service:svc
```
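The API now listens on port 3000. Optionally, you can verify the endpoint from a second terminal session before moving to the notebook. This is a minimal check; the prompt and output filename are arbitrary examples:

```console
$ curl -X POST http://127.0.0.1:3000/txt2img \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "a red bicycle", "width": 512, "height": 512}' \
    --output test.png
```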
Access Stable Diffusion API from a Jupyter Notebook
After setting up a Stable Diffusion API in the previous section, you can now run a Python script to access the API using a Jupyter Notebook. Follow the steps below:
In a new terminal session, start a Jupyter Lab service and retrieve your access token from the output.

```console
$ jupyter lab --ip 0.0.0.0 --port 8890
```
Allow port `8890` through the firewall.

```console
$ sudo ufw allow 8890
$ sudo ufw reload
```
Access Jupyter Lab in a web browser. Replace `YOUR_SERVER_IP` with the public IP address of the GPU instance and `YOUR_TOKEN` with the access token you retrieved earlier.

```
http://YOUR_SERVER_IP:8890/lab?token=YOUR_TOKEN
```
Click Python 3 ipykernel under Notebook and paste the following Python code. The following script accesses the REST API to run inference on the Stable Diffusion model. The script also provides a text prompt to the model with extra values like `height` and `width` to generate a response.

```python
import requests

from IPython.display import Image, display

# The BentoML endpoint created in the previous section
url = "http://127.0.0.1:3000/txt2img"
headers = {"Content-Type": "application/json"}
data = {
    "prompt": "a black cat",
    "height": 768,
    "width": 768
}

# Send the prompt to the API and display the returned image
response = requests.post(url, headers=headers, json=data)
display(Image(response.content))
```
Output: The notebook displays the generated 768 × 768 image of a black cat.
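To keep the generated image, you can decode and save the response bytes in a follow-up notebook cell. The filename below is an arbitrary example:

```python
from io import BytesIO

from PIL import Image as PILImage

# Decode the image bytes returned by the API and write them to disk
img = PILImage.open(BytesIO(response.content))
img.save("black-cat.png")
```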
Conclusion
In this guide, you used the Stable Diffusion model to generate images from text inputs. You ran the model through a web interface, exposed it through a REST API, and accessed the API from a Jupyter Notebook.