Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from CompVis, Stability AI, and LAION. It was trained on 512x512 images from a subset of the LAION-5B database. The model uses a frozen CLIP ViT-L/14 text encoder to condition generation on text prompts. With its 860M-parameter UNet and 123M-parameter text encoder, the model is relatively lightweight and runs on most GPUs. If you want to learn more, continue reading here.
You need to accept the model license before downloading or using the weights. In this tutorial we'll use model version v1-4, so you'll need to visit its card, read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to this section of the documentation.
Now we will log in to 🤗 Hugging Face. You can use the notebook_login function to log in.
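A minimal sketch of the login step, assuming the huggingface_hub package is installed (it is pulled in as a dependency of diffusers):

```python
# Log in to the Hugging Face Hub. In Colab or Jupyter, notebook_login
# shows an interactive widget where you paste your access token.
from huggingface_hub import notebook_login

notebook_login()
```

Outside a notebook, the same can be done from a terminal with the huggingface-cli login command.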
After this we will get started with the Image2Image pipeline.
Load the pipeline.
Download an initial image and preprocess it so we can pass it to the pipeline.
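One way to do this, using a sample sketch from the CompVis repository (any RGB image resized to dimensions divisible by 8 works):

```python
from io import BytesIO

import requests
from PIL import Image

url = (
    "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/"
    "assets/stable-samples/img2img/sketch-mountains-input.jpg"
)
response = requests.get(url)

# Convert to RGB and resize to a resolution the model handles well.
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
```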
Define the prompt and run the pipeline.
Here, strength is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values approaching 1.0 allow for many variations but will also produce images that are not semantically consistent with the input.
In Colab you can print out the image by just typing:
And there you have it! A beautiful AI generated artwork from a simple sketch.
Furthermore, you can tune the parameters and test what works best for your use case. As you can see, when using a lower value for strength, the generated image stays closer to the original init_image: