Stable Diffusion 101

Hitchhiker's Guide to Text-To-Image Generation

Hardip Patel

Presentation in blog format

Intro

  • Full-time: Backend-heavy full-stack developer
  • Spare time: Working on my Accountability site
  • Hobbies:
    • Snooker (very recent)
    • Box Cricket
    • Trying new hobbies
  • Currently Reading

Why this topic? ...even though you're a GenAI "noob"

  • Provide a beginner's perspective
  • Help lower the barrier to entry

Inspiration ...for getting into it

Journey Overview

  • Tried Midjourney on Discord very early on
  • Tested Automatic1111 after watching Overpowered
  • Hit the limits of what the UI offered, so wanted to try it with code
  • Tried ComfyUI for this talk, and it is quite awesome, to say the least

What is Stable Diffusion?

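  • Text-to-image model, a combination of...
    • A language model (text encoder) that transforms text into a latent representation (embeddings)
    • A generative image model that produces an image conditioned on that representation
  • Based on diffusion (probabilistic) models
    • A class of latent variable generative models
    • Stable Diffusion runs the diffusion process in a compressed latent space rather than on raw pixels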
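
A diffusion model learns to undo a gradual noising process. The sketch below shows only the forward (noising) side in plain Python, assuming the DDPM-style closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise and a linear beta schedule from 1e-4 to 0.02 over 1000 steps (Stable Diffusion's own scheduler uses different values). The function names are illustrative, and the four-number list stands in for a real latent.

```python
import math
import random


def linear_beta_schedule(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Noise variance added at each step, increasing linearly (assumed DDPM-style values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative product of (1 - beta): how much of the original signal survives up to step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(x0: list[float], t: int, a_bars: list[float], rng: random.Random) -> list[float]:
    """Jump straight from x_0 to x_t: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * gaussian noise."""
    a_bar = a_bars[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    a_bars = alpha_bars(linear_beta_schedule())
    x0 = [0.5, -0.2, 0.9, 0.1]  # toy stand-in for a latent
    rng = random.Random(0)
    for t in (0, 250, 500, 999):
        print(t, round(a_bars[t], 4), [round(v, 3) for v in add_noise(x0, t, a_bars, rng)])
```

As t grows, alpha_bar shrinks toward 0 and x_t becomes almost pure noise; generation runs this in reverse, with the U-Net predicting the noise to remove at each step.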

UI Tools for No-Code

  • Automatic1111
  • ComfyUI
  • Invoke AI
  • DiffusionBee

Automatic1111

Installation Link

  • Widely used
  • Good extension support
  • Most compatible
  • But unstable...

ComfyUI

Installation Link
Tutorial/Guide

  • Gaining traction lately
  • Intuitive UI
  • Very stable

Terminology (1/5)

  • PyTorch
    • Deep learning framework, originally based on the Torch library
  • Base Model
    • Foundational model upon which specific model variants are built
    • For example, v1.5, v2, XL 0.9, XL 1.0
  • Checkpoint (Model)
    • Pretrained weights
    • Reflects the types of images the model was trained on
    • For example, Juggernaut XL, Anything v3.0, epicRealism, etc.

Terminology (2/5)

  • Guidance Scale (CFG, classifier-free guidance)
    • Controls how closely the generation process follows the text prompt; higher values follow it more strictly
  • LoRA (Low-Rank Adaptation)
    • Adds specific styles or characters while keeping file sizes manageable
  • PEFT (Parameter-Efficient Fine-Tuning)
    • Adapts pre-trained models (e.g. pre-trained language models, PLMs) by fine-tuning a small set of extra parameters while keeping the original parameters frozen
    • Used to create LoRAs (LoRA is one PEFT technique)
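
Both classifier-free guidance and LoRA boil down to short formulas. CFG runs the noise predictor with and without the prompt and extrapolates: eps = eps_uncond + scale * (eps_cond - eps_uncond). LoRA keeps a frozen d x k weight matrix W and trains only a low-rank update B @ A (B is d x r, A is r x k), scaled by alpha / r, so one matrix needs r * (d + k) trainable values instead of d * k. A minimal sketch with plain lists standing in for tensors; the helper names and the 768 x 768, rank 4 example are assumptions for illustration. In diffusers, the CFG scale is the `guidance_scale` argument of the pipeline call.

```python
Matrix = list[list[float]]


def cfg_combine(eps_uncond: list[float], eps_cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: move from the unconditional toward (and past) the conditional prediction."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product, good enough for toy sizes."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A). W stays frozen; only A and B would be trained."""
    r = len(a)  # rank = number of rows of A
    delta = matmul(b, a)
    return [[w[i][j] + (alpha / r) * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d: int, k: int, r: int) -> tuple[int, int]:
    """(full fine-tune parameters, LoRA parameters) for one d x k weight matrix."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    print(cfg_combine([0.1, 0.2], [0.3, 0.1], scale=7.5))
    print(lora_param_count(768, 768, 4))  # (589824, 6144)
```

This is why LoRA files stay small: for that single 768 x 768 matrix, rank 4 trains about 1% of the parameters a full fine-tune would.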

Terminology (3/5)

  • Weights
    • Numerical values associated with the connections between neurons in a neural network
    • Visualize
  • Prompt
    • Text-based instruction describing the image to generate
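
To make weights concrete: a single artificial neuron multiplies each input by a weight, adds a bias, and applies an activation function. A checkpoint is, at heart, hundreds of millions to billions of such numbers saved to a file. A toy sketch with made-up values and a ReLU activation chosen for illustration:

```python
def relu(x: float) -> float:
    """Activation function: passes positive values through, clips negatives to 0."""
    return max(0.0, x)


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One neuron: weighted sum of the inputs plus a bias, then the activation."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


if __name__ == "__main__":
    # Training adjusts `weights` and `bias`; inference simply reuses the stored values.
    print(neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2))
```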

Terminology (4/5)

  • Text encoder
    • Transformer language model
    • Encodes the tokenized prompt into embeddings that are fed into the U-Net
  • U-Net
    • Takes the encoded text (plain text processed into a format it can understand) and a noisy array of numbers as inputs
    • Predicts the noise to remove at each denoising step
  • VAE (Variational Autoencoder)
    • Encodes and decodes images to and from a smaller latent space
  • Visualize
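
A quick shape walk-through shows what each component hands to the next. Assuming Stable Diffusion v1.x defaults (a CLIP text encoder producing 77 token embeddings of size 768, and a VAE that shrinks each side by 8 into 4 latent channels), a 512 x 512 RGB image becomes a 4 x 64 x 64 latent, which is what the U-Net actually denoises. The constants below are those v1.x defaults; some of them differ for other base models such as SDXL.

```python
from math import prod

# Assumed Stable Diffusion v1.x defaults (other base models differ).
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4
TEXT_TOKENS, TEXT_DIM = 77, 768


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Shape (channels, h, w) of the VAE latent for an RGB image of the given size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    return LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE


def compression_ratio(height: int, width: int) -> float:
    """How many pixel values (3 RGB channels) map onto one latent value."""
    return prod((3, height, width)) / prod(latent_shape(height, width))


if __name__ == "__main__":
    print("text embedding:", (TEXT_TOKENS, TEXT_DIM))
    print("latent:", latent_shape(512, 512))      # (4, 64, 64)
    print("ratio:", compression_ratio(512, 512))  # 48.0
```

Working on roughly 48x fewer values than the pixel image is a large part of why Stable Diffusion can run on consumer GPUs.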

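Terminology (5/5)

  • Pipeline
    • Runs diffusion models for inference by bundling all the necessary components (text encoder, U-Net, VAE, scheduler)
    • Provides flexibility: components such as the scheduler can be swapped out
  • Seed
    • Number that initializes the random noise the generation starts from; the same seed with the same settings typically reproduces the same image
  • Fine-Tuning
    • Further train a model pretrained on a wide dataset on a narrower dataset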
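
The pipeline and seed ideas fit together in one loop: encode the prompt, draw starting noise from the seed, repeatedly predict and remove noise (with classifier-free guidance), then decode. In the toy sketch below every component is a hypothetical stand-in function rather than a real model, and the update step is deliberately crude rather than a real scheduler; it only demonstrates the control flow and that the same seed and settings give the same output. With diffusers, the real counterparts are `StableDiffusionPipeline` and passing `generator=torch.Generator().manual_seed(seed)` to the pipeline call.

```python
import random
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


@dataclass
class ToyPipeline:
    """Bundles hypothetical stand-in components, the way a real pipeline bundles models."""
    encode_text: Callable[[str], Vector]                     # stand-in for the text encoder
    predict_noise: Callable[[Vector, int, Vector], Vector]   # stand-in for the U-Net
    decode: Callable[[Vector], Vector]                       # stand-in for the VAE decoder

    def __call__(self, prompt: str, seed: int, steps: int = 10, guidance_scale: float = 7.5) -> Vector:
        rng = random.Random(seed)  # the seed fixes the starting noise
        latents = [rng.gauss(0.0, 1.0) for _ in range(4)]  # toy latent of 4 values
        cond, uncond = self.encode_text(prompt), self.encode_text("")
        for t in reversed(range(steps)):
            eps_c = self.predict_noise(latents, t, cond)
            eps_u = self.predict_noise(latents, t, uncond)
            eps = [u + guidance_scale * (c - u) for u, c in zip(eps_u, eps_c)]  # CFG
            latents = [x - e / steps for x, e in zip(latents, eps)]  # crude update, not a real scheduler
        return self.decode(latents)


def fake_encode(text: str) -> Vector:
    """Hypothetical: a deterministic vector derived from the text (not a real encoder)."""
    h = sum(ord(ch) for ch in text)
    return [((h * (i + 1)) % 7) / 7 for i in range(4)]


def fake_unet(latents: Vector, t: int, text_emb: Vector) -> Vector:
    """Hypothetical noise prediction that depends on the latents and the text embedding."""
    return [0.1 * x - 0.05 * e for x, e in zip(latents, text_emb)]


def fake_decode(latents: Vector) -> Vector:
    """Hypothetical decoder: just rounds the latents."""
    return [round(x, 4) for x in latents]


if __name__ == "__main__":
    pipe = ToyPipeline(fake_encode, fake_unet, fake_decode)
    first = pipe("a red fox in the snow", seed=42)
    again = pipe("a red fox in the snow", seed=42)
    print(first == again)  # True: same seed, same settings
```

Changing only the seed changes the starting noise, and therefore the output, while the prompt and settings stay fixed.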

Code demo

ComfyUI with trained model

Further capabilities of Stable Diffusion

  • Inpainting
    • Restore or repair parts of an image by regenerating a masked region
  • Outpainting
    • Extend the image beyond its original canvas
  • Image To Image
    • Generate a new image from an input image plus a text prompt
    • The new image follows the composition and colors of the input image
  • Depth To Image
    • Uses the depth of the input image to shape the composition of the new image
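
Most of these capabilities reuse the same denoising loop and differ in where it starts or which pixels it keeps. A pure-Python sketch on tiny grids, with hypothetical function names: image-to-image noises the input part of the way and runs only the remaining steps (the `strength` arithmetic roughly mirrors how diffusers' img2img pipeline uses its `strength` argument); inpainting keeps original pixels where the mask is 0 and generated pixels where it is 1; outpainting pads the canvas and marks the new border in the mask so that it gets generated.

```python
Grid = list[list[float]]


def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually run: strength 1.0 behaves like text-to-image, 0.0 keeps the input."""
    return min(int(num_inference_steps * strength), num_inference_steps)


def inpaint_blend(original: Grid, generated: Grid, mask: Grid) -> Grid:
    """Keep the original pixel where mask == 0 and the generated pixel where mask == 1."""
    return [
        [m * g + (1 - m) * o for o, g, m in zip(row_o, row_g, row_m)]
        for row_o, row_g, row_m in zip(original, generated, mask)
    ]


def outpaint_canvas(image: Grid, pad: int, fill: float = 0.0) -> tuple[Grid, Grid]:
    """Pad the image on every side; return (padded image, mask marking the new border as 1)."""
    h, w = len(image), len(image[0])
    canvas = [[fill] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    mask = [[1.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            canvas[i + pad][j + pad] = image[i][j]
            mask[i + pad][j + pad] = 0.0  # original pixels are kept, not regenerated
    return canvas, mask


if __name__ == "__main__":
    print(img2img_steps(50, 0.75))  # 37
    canvas, mask = outpaint_canvas([[0.2, 0.4], [0.6, 0.8]], pad=1)
    for row in mask:
        print(row)
```

With 50 steps and strength 0.75, only 37 denoising steps run, so the result stays closer to the input image than a full text-to-image run would.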

THAT'S ALL FOLKS!

Credits