Stable Diffusion 101
Hitchhiker's Guide to Text-To-Image Generation
Hardip Patel
Presentation in blog format
Intro
Full-time: Backend-heavy full-stack developer
Spare time: Work on my Accountability site
Hobbies:
Snooker (very recent)
Box Cricket
Try new hobbies
Currently Reading
Make by Pieter Levels
Why this topic? ...even though you're a GenAI "noob"
Provide beginner's perspective
Wanted to help lower the barrier to entry
Inspiration ...for getting into it
Want to create a dynamically updating hero picture for my Accountability site
Pieter Levels (check Photo AI)
Sayak Paul
Overpowered
Abhishek Thakur
Journey Overview
Tried Midjourney on Discord very, very early
Tested Automatic1111 after watching Overpowered
Reached saturation with the UI, so wanted to try with code
So hopped on to Google Colab
Tried ComfyUI for this talk, and it is quite awesome, to say the least
What is Stable Diffusion?
Text-to-image model, a combination of...
Language model, to transform text into a latent representation
Generative image model, to produce an image conditioned on that representation
Based on diffusion (probabilistic) models
A class of latent variable generative models
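To make the "diffusion" part a bit more concrete, here is a minimal sketch of the forward (noising) step that such a model learns to reverse. It assumes the common DDPM-style closed form, where a clean signal is scaled down and mixed with Gaussian noise according to a schedule. The function names, the linear schedule, and its values are illustrative assumptions, not something taken from the talk, and Stable Diffusion applies this idea to latents rather than raw pixels.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t by mixing the clean signal with Gaussian noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = alpha_bar_schedule()
rng = random.Random(0)
values = [0.5, -0.25, 1.0, 0.0]                        # stand-in for image (or latent) values
slightly_noisy = add_noise(values, 10, alpha_bars, rng)   # early step: mostly signal
mostly_noise = add_noise(values, 999, alpha_bars, rng)    # last step: almost pure noise
```

Generation runs the other way: start from pure noise and repeatedly remove the noise the model predicts.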
UI Tools for No-Code
Automatic1111
ComfyUI
Invoke AI
DiffusionBee
Automatic1111
Installation Link
Widely used
Good extension support
Most compatible
But unstable...
ComfyUI
Installation Link
Tutorial/Guide
Getting slack lately
Intuitive UI
Very stable
Terminologies (1/5)
PyTorch
deep learning framework based on Torch
Base Model
Foundational model upon which specific model variants are built
For example, v1.5, v2, XL 0.9, XL 1.0
Checkpoint (Model)
Pretrained Weights
Reflects the types of images the model was trained on
For example, Juggernaut XL, Anything v3.0, epicRealism, etc...
Terminologies (2/5)
Guidance Scale (CFG)
Controls how closely the image generation process follows the text prompt
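As a rough sketch of what the guidance scale does under the hood: with classifier-free guidance, the model makes one noise prediction without the prompt and one with it, and the scale controls how far the result is pushed toward the prompted one. The function name and the toy numbers below are illustrative assumptions.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move the prediction toward the prompted one."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_text)
    ]


noise_uncond = [0.10, 0.20, -0.30]   # toy prediction with an empty prompt
noise_text = [0.30, 0.10, -0.10]     # toy prediction with the text prompt

print(apply_guidance(noise_uncond, noise_text, 0.0))  # ignores the prompt entirely
print(apply_guidance(noise_uncond, noise_text, 1.0))  # roughly the prompted prediction
print(apply_guidance(noise_uncond, noise_text, 7.5))  # exaggerated move toward the prompt
```

Higher values follow the prompt more literally, usually at the cost of variety.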
LoRA (Low-Rank Adaptation)
Adds specific styles or characters while keeping file sizes manageable
PEFT (Parameter-Efficient Fine-Tuning)
Adapts a pre-trained model (originally pre-trained language models, PLMs) by fine-tuning a small set of extra parameters while keeping the original parameters frozen
Used to create LoRA
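Why LoRA files stay small can be sketched in a few lines: instead of retraining a full weight matrix W, only two thin matrices B and A are trained, and their product is added on top of the frozen W. This is a toy illustration in plain Python. The helper names, the `alpha / r` scaling convention, and the 768x768 layer size are assumptions chosen for the example.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(weight, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a LoRA adapter."""
    return d_out * d_in, rank * (d_out + d_in)


# Toy 3x3 frozen weight with a rank-1 adapter.
frozen_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]      # shape (d_out, r)
lora_a = [[0.5, 0.0, -0.5]]         # shape (r, d_in)
merged = lora_merge(frozen_w, lora_b, lora_a, alpha=1.0)

full, lora = param_counts(d_out=768, d_in=768, rank=8)
print(full, lora)  # 589824 12288
```

For this hypothetical layer, the adapter trains about 2% of the parameters that full fine-tuning would touch, which is where the small file size comes from.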
Terminologies (3/5)
Weights
Numerical values associated with the connections between neurons in neural network architecture
Visualize
Prompt
Text-based instruction describing the image to generate
Terminologies (4/5)
Text encoder
Transformer language model
Turns the tokenized prompt into embeddings that are fed into the U-Net
U-Net
Takes the encoded text (plain text processed into a format it can understand) and a noisy array of numbers as inputs, and predicts the noise to remove
VAE
Encodes and decodes images to and from a smaller latent space
Visualize
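To get a feel for how much smaller the VAE's latent space is, here is a small sketch of the shape arithmetic. It assumes the commonly used Stable Diffusion VAE settings (8x spatial downsampling, 4 latent channels). Both are parameters here because other models may differ, and the function names are made up for illustration.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for an RGB image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("height and width should be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """How many times fewer numbers the U-Net works on compared to raw RGB pixels."""
    pixel_values = 3 * height * width
    channels, lat_h, lat_w = latent_shape(height, width, downsample, latent_channels)
    return pixel_values / (channels * lat_h * lat_w)


print(latent_shape(512, 512))        # (4, 64, 64)
print(compression_ratio(512, 512))   # 48.0
```

The U-Net does its denoising on this small array, and the VAE decoder turns the final latent back into a full-size image.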
Terminologies (5/5)
Pipeline
Bundles all the components needed to run a diffusion model for inference
Provides flexibility
Seed
Fine-Tuning
Further train a model that was trained on a wide dataset using a narrow dataset
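On the Seed entry above: the seed is the number that initializes the random noise the denoising process starts from, so the same seed with the same settings gives the same starting point. A minimal sketch using Python's standard library as a stand-in (real pipelines typically use a framework generator such as a PyTorch one; `initial_noise` is a hypothetical helper):

```python
import random


def initial_noise(seed, size):
    """Generate the starting noise for a given seed (stand-in for a latent tensor)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = initial_noise(42, 4)
b = initial_noise(42, 4)
c = initial_noise(43, 4)

print(a == b)  # True: same seed, same starting noise
print(a == c)  # False: different seed, different starting noise
```

This is why sharing a seed along with the prompt and settings lets someone else reproduce an image.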
Code demo
Inference Code
Model Fine-Tuning code
Prepare images for training using Birme
Don't use a token that the model has already been trained on
Inference code with trained model
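For readers without access to the demo notebooks, the inference steps can be sketched roughly as follows with the Hugging Face Diffusers library. This is not the talk's actual code. It assumes a Stable Diffusion v1.x/v2.x-style checkpoint (SDXL checkpoints need `StableDiffusionXLPipeline` instead), a CUDA GPU, and LoRA weights from your own fine-tuning run. `build_call_kwargs` and `generate` are hypothetical helper names, and the default step count and guidance scale are just common starting values.

```python
def build_call_kwargs(prompt, steps=30, guidance_scale=7.5, negative_prompt=None):
    """Collect the generation settings passed to the pipeline call."""
    kwargs = {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


def generate(model_id, prompt, seed=0, lora_path=None, device="cuda", **options):
    """Load a pipeline, optionally apply trained LoRA weights, and return one image."""
    # Heavy imports live inside the function so the helper above works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    if lora_path is not None:
        pipe.load_lora_weights(lora_path)  # the "trained model" case

    # A seeded generator makes the result reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(generator=generator, **build_call_kwargs(prompt, **options))
    return result.images[0]


# Example (needs a GPU and downloaded weights; model id and paths are placeholders):
# image = generate("<model id or local path>", "a photo of a snooker table", seed=42)
# image.save("output.png")
```

Passing `lora_path` is the only difference between plain inference and inference with the fine-tuned model in this sketch.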
ComfyUI with trained model
Further capabilities of Stable Diffusion
Inpainting
Restore/Repair image
Outpainting
Extend canvas of the image
Image To Image
Generates a new image from an input image and a text prompt
The new image follows the composition and colors of the input image
Depth To Image
Uses the depth information of the input image to guide the composition of the new image
THAT'S ALL FOLKS!
Credits
Towards Data Science
Hugging Face
Google Colab
BIRME
Automatic1111
ComfyUI