ComfyUI Course - Learn ComfyUI From Scratch | Full 5 Hour Course (Ep01)
FULL TRANSCRIPT
Learning Comfy UI is like opening a
technical book at the last page.
Everything is there, but nothing makes
sense yet. This course starts from page
one. Before we go any further, I want to
be very clear about how this course
works. This is not a shortcuts course.
It is not about copying workflows
without understanding them. Each chapter
builds on the previous one. We start
simple, repeat the important ideas and
only add complexity when it actually
makes sense. You do not need any coding
knowledge. You do not need to be
technical. If you can think visually,
you can learn Comfy UI. If you want to
understand how AI image generation
really works locally and how to use
Comfy UI without feeling lost, this
course is for you. My name is Pixaroma
and on this channel I focus on creating
and teaching Comfy UI workflows in a
simple and practical way. I am a graphic
designer not a programmer and that is
actually a good thing. Developers are
great at writing code but they often
explain things in a very technical way.
This course is designed from a visual
thinker's perspective. My goal is to
explain Comfy UI logically and visually
without needing any coding knowledge.
Even if Comfy UI looks confusing right
now, that is completely normal. We will
start from the absolute basics and build
up step by step. But before we talk
about Comfy UI itself, we first need to
understand what AI image generation
actually is. Today, AI is not just one
thing. There are many different AI
models that can run locally on your own computer, such as Stable Diffusion, Flux, Qwen, and many others. It is also
important to understand that Comfy UI is
not limited to image generation. Comfy
UI is a general interface for running
many different types of AI models
locally. While it is most popular for
image generation, it can also be used
for audio, music, video, animation, 3D,
and more. As long as a model can be
connected through nodes, Comfy UI can be
used as the interface to control it.
These models by themselves are like an
engine. They are very powerful, but you
cannot really use them directly. To work
with them, we need an interface. An
interface is what allows us to send
prompts, images, and settings to the
model and then receive results back.
There are many free interfaces that let
us interact with these models. Some
popular ones are Forge UI, Swarm UI, Invoke, Fooocus, and of course, Comfy UI.
They often use similar models but they
work in very different ways. In this
course, we are going to focus on Comfy
UI. Comfy UI is different because it is
node-based. Instead of hiding everything
behind buttons and menus, it shows you
exactly what is happening step by step.
You can see how prompts, models,
samplers, and images are connected
together like building a system. Think
of it like this. The AI model is the
brain. The interface is how you talk to
that brain. Comfy UI is like building
your own control panel exactly the way
you want. Do not worry if this still
feels complex. Understanding comes from
seeing things connect, not from
memorizing nodes. In this course, I will
explain what each node does, why it
exists, and how everything connects
together. Before we install anything, we
need to talk about how you will actually
run Comfy UI. There is more than one way
to use Comfy UI and the right choice
depends on your system and your
expectations. Let's go to the official
Comfy UI website to see the available
options. The official website is
Comfy.org.
If we go to the products section, you
can see that there are two main options,
Comfy UI Cloud and Local Comfy UI. Comfy
UI Cloud runs online on their servers
and it is a paid service. This option
can be useful if your computer is too
old or not powerful enough to run AI
models locally. Local Comfy UI is free
and runs directly on your own computer,
assuming you have a reasonably capable
system. This is the option we will focus
on in this course. So, let's click on
Local Comfy UI. Here you can see three
main installation options. Download for
Windows, for Mac OS, and install from
GitHub. All of these options install
Comfy UI, but there are important
differences between them. In this
course, I will focus on Windows
operating system using the portable
version of Comfy UI. All the workflows,
tools, and installers I show are tested
on Windows using an Nvidia graphics
card. On AMD graphics cards and on Mac
OS, performance is usually slower, and
some features or custom nodes may not
work exactly the same way. So, if you
are using Windows with an Nvidia card,
it will be much easier to follow this
course step by step as I show it.
Because there are many different AI
models, hardware requirements can vary a
lot. Some models are small and can run
on a graphics card with 6 to 8 GB of
VRAM. Other models are much larger and
may require more than 24 GB of VRAM. For
this first episode, I tested the
workflows on two different systems. One
system uses an RTX 2060 with 6 GB of
VRAM and 64 GB of system RAM. The second
system uses an RTX 4090 with 24 GB of
VRAM and 128 GB of system RAM. For the
workflows in this episode, a graphics
card with 6 to 8 GB of VRAM should be
enough to follow along. In later episodes, we will explore newer and
larger models that may require more
powerful hardware. Now, let's talk about
which version of Comfy UI you should
install. As I mentioned earlier, I am
using the portable version of Comfy UI.
If we click on install from GitHub, we
are taken to the official Comfy UI
GitHub page. Here you can find detailed
installation instructions, but they
require more manual steps and setup.
Over the past year, I have been using a
portable version of Comfy UI that
includes additional tools to make the
installation process much easier. This
installer installs the original Comfy
UI, but it also adds helpful tools so
you can get up and running much faster.
This installer is called Comfy UI Easy
Install. You can find it on this GitHub
page. You can find the creator on our Discord community under the username Ivo. Thanks to Ivo for this installer. This
entire course is built around this
version of Comfy UI. You can still use
Comfy UI Desktop or Comfy UI Cloud, but
some things may look different or behave
differently compared to what you see in
this course. If you want the exact same
setup that I use and the easiest way to
follow along, I recommend using the same
version. Let me show you how to install
Comfy UI. So, we are on the easy install
GitHub page. This is the complete link.
If we scroll down, you can read more
about this installer. Even if you might
not understand what each of these things
means yet, it will make sense later as
you learn more about it. I will talk
later about the Pixaroma Discord server
where you can get more help and answers
to your questions. So, this installer
will install Git, which is a tool that
tracks changes to files in Comfy UI. It
helps developers safely update the main
app and custom nodes, fix bugs without
breaking everything, and lets you update
or roll back to an earlier working
version if a new update causes problems.
Then it will install the Comfy UI
portable version. A portable version
means the program is fully
self-contained in one folder, does not
need a normal system install, and can be
run, moved, backed up, or deleted
without affecting the rest of your computer. In Comfy UI portable, this means Python, libraries, models, and
settings all live inside the Comfy UI
folder. So, you can copy it to another drive or PC, update it safely, and
avoid breaking your system Python or
Windows setup. Python embedded means
Comfy UI comes with its own built-in
copy of Python already included inside
the Comfy UI folder instead of using the
Python installed on your system. Then it
will install all the nodes that are
useful and that I tested over the last
year. It might not make sense for you
yet if you are a beginner, but do not
worry. Take it as general knowledge for
now and it will make sense later. Then
it will add an add-ons folder with more
advanced stuff we can use later to speed
up our generation, plus some extra tools
that can be useful and then more
technical stuff explained for each one.
But all you need to know for now is
where to download it from and how to run
the installer. It is important not to
run it as administrator. That means you
just double-click to run it, not
right-click and run as administrator,
but you will see that in a minute. Also,
avoid system folders and make sure your
NVIDIA drivers are up to date since some
things work only with more recent NVIDIA
drivers. Okay, let us go back to where
it says Windows installation and let us
download the latest release from here.
Then depending on how your browser is
configured, it will either download it
to the downloads folder or ask you where
to download it and you can decide where
to put it. As you will see over time, it
needs a lot of space if you download big
models. So I suggest downloading and
installing your Comfy UI on a hard disk
that has a lot of free space and
preferably on a solid state drive
because it will load the models faster.
I will go to my D drive and I will
create a new folder called Comfy UI, but
this does not really matter. The name
can be anything easy to remember.
Sometimes I put Comfy UI followed by the
month so I know when I downloaded and
installed it. So I will save this zip
archive in that Comfy UI folder.
Now let us go to the place where we
saved the file. Since this is a zip
archive, we need to unzip it. You
right-click on it and depending on what
you prefer, you can use the Windows
integrated option and select extract
all. I like to delete the folder name at
the end so I do not end up with a folder
inside another folder. When I click
extract, it will extract these two
files. Let me delete it really quick and
show you. If you have WinRAR like me,
you just choose extract here and it does
the same thing. Once we extracted
everything, you can delete the easy
install zip file. Now we are left with
two files. A BAT file that is the
installer and a zip file that contains
extra resources that it will use. When
you run it, you might get a security
warning. That usually happens with BAT
or executable files because they install
files on your system. This one is safe.
I installed and tested it and I
personally know Ivo, the creator of this installer. You can right-click and scan
it with your antivirus and you will see
it is clean. So let us double-click on the BAT file, then press run, and it will
start installing. If you already have
git installed it will update it. If not
you will get a window like this and you
have to press yes to continue the
installation. After that it will
continue the installation of Comfy UI
and everything it needs to run. You can
take a break for 3 to 5 minutes
depending on your internet speed and
your computer. So how do you know when
it is ready? You will see a message that
says installation complete along with
the time it took. On my PC, it took 247
seconds. After that, you can press any
key to exit. So, let us recap really
quick. From the GitHub page, you
download the zip archive of the easy
installer. You create a folder named
Comfy UI and place the downloaded zip
archive in that folder. You extract the
contents of the archive. You run the
Comfy UI easy install BAT file and if it
asks to install Git, you press yes. In a
few minutes the installation is complete
and you get this screen. Now after the
installation we can see that inside our
Comfy UI folder a new folder appeared
called Comfy UI easy install. This is
portable which means you can copy this
entire folder and move it to a different
drive or folder and it will still run
Comfy UI. Basically, after you install
any Comfy UI portable version, you
should end up with a similar folder
structure to this. Since Comfy UI is
based on the Python programming
language, you will see many Python files
and BAT files inside these folders,
which are used to run those Python
files. The easy installer will also
create some shortcuts on your desktop.
If you use other versions of Comfy UI,
they might not do this, and you would
need to create the shortcuts manually.
That is one of the reasons I prefer the
easy installer. It makes everything
easier for us. Basically, we just
extracted an archive and ran a BAT file
and we now have Comfy UI. If we right
click on this shortcut and go to
properties, we can see the target of the
shortcut. If we open the file location,
you can see that it is connected to this
BAT file that starts Comfy UI. In other
versions of Comfy UI, the name might be
different like run NVIDIA GPU or
something similar. Are you ready for
your first Comfy UI launch? To start
Comfy UI, you either use this BAT file
called Start Comfy UI or from the desktop you use this shortcut called Comfy UI EZ installer, where EZ stands for easy and I stands for installer. So double-click on it and when it starts it looks
like this. The first time it will be a
little slower, but after that it will
start much faster. If you are curious by
nature, you can find all kinds of
information about your Comfy UI and your
system when it runs. For example, you
can see what operating system I am
using, what Python version is running,
and where that Python is located, what
the path to the Comfy UI folder is,
where the user directory is, how much
VRAM you have, how much system RAM you
have, and the PyTorch and CUDA versions
that are running. When it starts, this
command window will be minimized to your
taskbar, and Comfy UI will open in your
default browser. The first time it
opens, it will show you some templates
made by Comfy UI that you can load. If
you have run a workflow before, it will
open that workflow by default. So the
workflow you see open is the last one
you used. A workflow is a set of
connected nodes that tells Comfy UI what
to do step by step. Let us close this
for now. You can open these templates or
workflows from here later. Comfy UI is
made of a few main areas. You do not
need to memorize them. I am naming them
so we can talk about the same things
later. If we go to the top, you can see
it says unsaved workflow. Basically, it
is like a document that is empty at the
moment since we did not add any nodes
yet. You can have multiple documents
open similar to what we have in
Photoshop and other programs. We can
click on this plus icon to create a new
blank workflow. All these tabs on top
are open workflows and we can close,
save and edit each one. Now this grid
like empty space is called the canvas.
Instead of drawing on it, we will
arrange blocks or nodes like using Lego
pieces and connect things together to
create a working workflow. You can use
your mouse wheel to zoom in and out on
the canvas. Then we have this top bar.
Depending on what extensions you have
installed, it might look different and
have more options. Things like the
manager or the run button, which lets us
run workflows, are usually here. On the
bottom right, we have view controls. For
example, we have a select tool that lets
us select nodes and a hand tool that
lets us navigate the canvas. You can fit
a workflow in view, but right now the
canvas is empty. We also have different
zoom controls that you can use if you do
not want to use the mouse wheel or if
you do not have one. For me, the mouse
wheel is the fastest and the one I
prefer. Then we have the show mini map
option. This shows a small map that we
can use to navigate when we have very
large workflows. There is also hide
links, but since we do not have any
nodes or links yet, we will see that
later. An important one is the main menu
which you open by clicking on the letter
C, the Comfy UI logo. We also have more
options on the left sidebar for nodes,
models, and workflows, which we will
explore soon. Back to the main menu. If
we click on the C, we get this menu. New
creates a new workflow, but it is faster
to use the plus sign from the top bar.
For file, it allows us to open, save,
and export the workflows we create. For
edit, you can undo actions like moving
nodes or changing something in the
workflow, clear the workflow and unload
models. For view, we can enable and
disable different panels. And we also
have zoom in and zoom out controls. Just
like in Photoshop, we can do the same
things in multiple ways. It is the same
with Comfy UI. For theme, you can change
how it looks, but at the beginning, I
suggest leaving it on default so it is
easier to follow tutorials. Nodes 2 is
in beta at the moment of this recording
and still has some bugs, so I suggest
leaving it off until it is more stable.
You can browse templates and open
settings, which we can explore later.
For now, the default settings work fine.
Templates and settings can also be
accessed from here. So again, there are
multiple ways to access the same things.
In some newer versions, some people
might use a newer manager and it might
appear somewhere else instead of here.
For now, I am using the old manager
which appears here. Under help, you can
also find help options, but you will see
later in this video how to ask questions
and get help. We also have a console
sometimes called the bottom panel where
you can see exactly what has happened
since we opened Comfy UI. If we look at
the taskbar and open the command window
from there, it shows the same
information. One view is at the bottom
and the other is in the taskbar. To
close Comfy UI, I recommend opening the
command window from the taskbar and
closing it. You will then see a
reconnecting message in the browser
because it cannot find Comfy UI running
anymore. After that, you can close the
browser window. You can also close the
browser window first and then close the
command window. It is time to test our
first ready-made workflow. Later, I will
explain in detail what nodes are and
what they do. So let us start Comfy UI,
wait for it to finish loading and get
the interface. To open a workflow, you
have different options. You can drag a
workflow directly onto the canvas, or
you can go to the menu, then file, and
choose open. All workflows for Comfy UI
have the extension .json.
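To get a feel for what such a file contains, here is a small Python sketch. The sample below only mimics the general shape of a saved workflow, a top-level list of nodes; real files also store positions, links, and widget values, and the node names here are just examples.

```python
import json

# A tiny sample mimicking the general shape of a saved workflow file.
# Real workflow files also record node positions, links, and widget values.
sample = '''
{
  "nodes": [
    {"id": 1, "type": "CheckpointLoaderSimple"},
    {"id": 2, "type": "KSampler"},
    {"id": 3, "type": "SaveImage"}
  ]
}
'''

# Because it is plain JSON text, any tool that reads JSON can inspect it.
workflow = json.loads(sample)
node_types = [node["type"] for node in workflow["nodes"]]
print(node_types)  # → ['CheckpointLoaderSimple', 'KSampler', 'SaveImage']
```

You never need to do this yourself; it simply shows that a workflow is readable text, not a binary file.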
JSON means JavaScript Object Notation.
It is a simple text format used to store
and share data in a way that both humans
and computers can easily read. In Comfy
UI, JSON is important because workflows
are saved as JSON files. These files
store all your nodes, connections,
settings, and prompts so you can reload,
share, or edit a workflow later. I will
include these workflows for free on
Discord for those who use a different
Comfy UI version. For example, I can
open this first workflow and you can see
that it opens with all the nodes and
links ready to be used. You can use your
mouse wheel to zoom in and out to see
the entire workflow. You can click
outside the nodes somewhere on the
canvas and drag to move around. You can
also use this hand tool, which I
personally never use. With the hand
tool, you can pan around the canvas.
With the normal mouse cursor, you can
select nodes and move them around. We
will talk more about that later. Now we
have the workflow open in this tab and
you can see its name at the top. With
the X button, you can close the workflow
and go back to a new empty one. If we go
to the sidebar and click on workflows, I
can open this folder called getting
started which I prepared for you for
this episode. Only the easy installer
comes with these workflows. So, if you
are using a different version, you can
get the workflows from Discord. You can
also make the sidebar wider if you want
to see the full text. I added a few
workflows here to test in this episode.
This one is just a help file with notes
and useful information that we will use
later in the video. Let us close it and
open the one with number one in front
called Juggernaut Reborn. If I click on
workflows again, the sidebar collapses.
Now let us move around using the mouse.
Left click and drag to see it better.
Each of these blocks is called a node.
All nodes are connected to each other
using links. Those small cables that go
from one node to another. Usually a
workflow is built from left to right.
When you run a workflow, it processes
from left to right. If something does
not work or the workflow is broken,
Comfy UI tells you where the problem is.
Think of it like a car dashboard. If a
door is open or a light bulb is not
working, you get a warning icon. It is
the same here. Errors look like this. It
says prompt execution failed and it also
tells you something like value not in
the list. These are some of the simplest
errors to fix. It is like the car
telling you a light bulb is missing. In
Comfy UI, it means a specific value,
object, or file could not be found. In
our case, it could not find the
checkpoint name, which is the model
name, the brain as we called it. Comfy
UI workflows include all the nodes and
settings, basically the interface, but
they do not include the models
themselves. Those brain or engine files
are not included. Since workflows are
just JSON text files, they cannot
include images or large files like
models. In this node called load
checkpoint, the checkpoint is just a model file, the brain we talked about.
Even if I click here, I cannot select
anything because it is not in the list.
That means the model is not downloaded
yet or it is downloaded but placed in
the wrong folder. Since I did not
download any models yet, it is clear I
do not have it. That is why when I share
a workflow with you, I include a note
that tells you exactly what you need to
download for the workflow to work. Not
everyone on the internet does this, but
most good workflow creators do. The way
I organize it is like this. I tell you
where the model needs to be downloaded
and which node loads it. It says load
checkpoint, which is the node name. Then
it tells you the model name you need to
download. There is a button that says
here and then it tells you exactly which
folder to place it in and which folder
to create if it does not exist. That is
enough theory. Let us download the
model. You already saw where it needs to
be placed, but how do you find that
folder? You need to find your Comfy UI
folder. This depends on where you
installed it, on which drive, and in
which folder. You navigate until you
find the Comfy UI folder. In our case,
it is inside the Comfy UI easy install
folder. If we go inside, we see many
folders that Comfy UI needs to run. We
have an output folder where generated
images are saved. We have an input
folder where input images are stored. We
also have a models folder where all
downloaded models go. Inside the models
folder, you can see many subfolders for
different types of models. Over time,
you will learn what each one is for.
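As a sketch of how this folder layout works, the short Python script below builds a miniature models folder and then lists every model file inside it, including files placed in subfolders, which is how a model inside an SD15 subfolder still gets found. The folder and file names are examples for illustration only.

```python
import os

# Build a miniature, illustrative models folder:
# models_demo/checkpoints/SD15/juggernaut_reborn.safetensors
base = "models_demo"
os.makedirs(os.path.join(base, "checkpoints", "SD15"), exist_ok=True)
open(os.path.join(base, "checkpoints", "SD15",
                  "juggernaut_reborn.safetensors"), "w").close()

# Walk every subfolder of checkpoints, so models in SD15 (or any other
# subfolder you create for organization) are found as well.
found = []
for root, _dirs, files in os.walk(os.path.join(base, "checkpoints")):
    for name in files:
        if name.endswith((".safetensors", ".ckpt")):
            found.append(os.path.join(root, name))

print(found)
```

This is why organizing models into subfolders is safe: the subfolder changes nothing about whether the model is found, only how tidy your list looks.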
That is why I included the note so you
know exactly where to put the model
without guessing. For this workflow, the
model goes into the checkpoints folder.
We could just save it directly there and
it would work. But from my previous
tutorials, I learned that over time, you
will download many models and it becomes
hard to keep track of them. That is why
I like to organize models in subfolders.
In this case, I know this model is based
on stable diffusion 1.5. So, I will
create an SD15 folder and place the
model inside it. Now, we wait for the
model to download. Some models are a few
gigabytes in size. After that, we go
back to Comfy UI. You can see I placed
the model exactly where the instructions
said, so Comfy UI can recognize it. If
Comfy UI was closed, reopening it would
automatically detect the model. But
since Comfy UI is already open, it will
not see the new model yet. We need to
refresh it. To do that, press the R key.
You will see that the node definitions
update. Now, when I click here, I can
see the model name and select it. Right
now, there is only one model, but later
you will have a drop-down list with many
options. Now, the model is selected and
it is time to run the workflow again. By
the way, you can move the run button
anywhere you want on the canvas using
the small dots on its side. If you
prefer it docked, you can dock it back
to the top bar. Let us run it again and
see if it works. Everything turns green.
Each node runs from left to right and no
red nodes appear. That means the
workflow ran successfully and we
generated our first image. The model we
use in this chapter is quite old and
small. Later we will use smarter and
more advanced models. For practice, this
one is good enough because it is fast
and can run on smaller computers that do
not have a lot of VRAM. Each time I
press run, I get a new image because we
have a random seed here. Do not worry
about this yet. I will explain it later.
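The full explanation comes later, but the core idea behind a seed can be sketched in a few lines of Python: the same seed always reproduces the same "random" numbers, which is why a fixed seed regenerates the same image while a random seed gives a new one each run.

```python
import random

# Seeding the generator makes randomness repeatable.
random.seed(42)
first_run = [random.randint(0, 9) for _ in range(5)]

random.seed(42)  # same seed again...
second_run = [random.randint(0, 9) for _ in range(5)]

# ...produces exactly the same sequence.
print(first_run == second_run)  # → True
```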
So now we can generate an image with
Comfy UI and all of this comes from a
simple text called a prompt. Basically
we used a few nodes with specific
settings and a model trained for this
type of image generation. We can change the prompt, for example, photo of a cat closeup. Now when I run the workflow, I should get a cat. The more VRAM you have, the faster it will generate. We can see
the generated images here, but they are
also saved locally. If we look at the
output folder, we have a shortcut to it
on the desktop. Inside that folder, we
can see all the images we generated so
far. Let us go back to Comfy UI and
close this workflow. I do not want to
save it because I liked the prompts and
settings it had before. So, I choose no.
Now, we are left with an empty workflow, or you can click on the plus sign to
create a new blank workflow. Before we
move to the next chapter, I recommend
taking a short break. Research shows
that short pauses help your brain
process and retain new information. Grab
a coffee, get some water, or take a
quick bathroom break, then come back and
continue. This chapter is about
understanding the building blocks of
Comfy UI and how they connect to form a
workflow. We are in Comfy UI and we have
this blank canvas and workflow. To add a
node, you double-click on the canvas and
it will open a search box that lets you
search for a node. For example, if I
type the word load, it shows me load
image, load checkpoint and all kinds of
nodes that let us load something. If we
click on the load image node, it will be
added to the workflow. The position
where it is added depends on where you
double-click on the canvas. You can also move it after. You just left-click on a
node, hold the left mouse button, drag
it to where you want it, and then
release the button. To deselect a node,
you just click anywhere on the empty
canvas. For me, that is the fastest way
to add a node. But there are other
methods. For example, I can right-click
on the canvas in an empty area and get
this menu. From here, I can go to add
node. Then I see different categories.
If I click on the image category, I can
find the load image node. It is right
here. And if I click on it, it gets
added to the canvas. After that, you can
move it and arrange it wherever you
want. Another method is to use the node
library in the left sidebar. Here we
have all these categories. If I click on
the image category, I can see the load
image node. This is a good option if you
do not know exactly which node you are
looking for. You can also search for a
node here to filter the list, then add
the node or drag it onto the canvas. Out
of all these methods, my favorite is
still the double-click on the canvas.
Once a node is selected, you also have
the option to delete it using this icon.
You can also use the delete key or the
backspace key to delete a node after you
select it. The load image node is how we
bring an existing image into Comfy UI so
other nodes can work with it. Each node
has a title at the top that tells you
what it does. Below that, it has
controls, inputs, and outputs that
connect it to the rest of the workflow.
Let us double-click on the canvas again
and add another node. This time, I will
search for crop. And we get this node
called image crop. You can probably
guess what it does. It crops the image
that we loaded. You can change the image
using this button and upload any image
you want. If something goes into a node,
it is called an input. If something
comes out of a node, it is called an
output. The load image node has two
outputs but no inputs because the image
comes directly from your computer, not
from another node. The image crop node
has one image input and one image
output. It receives an image, modifies
it, and then sends out a new image. If
we left-click on one of the outputs from
the load image node, we can drag a
connection or a cable to the next node
and connect it. Because the output and
the input have the same color and the
same name, it is easy to see that they
belong together. In most cases,
connections work between the same colors
and different colors usually do not
connect. There are a few special cases,
but we will talk about those later. Now,
if I try to connect the green output, it
does not connect. That is because the
green output is a mask and the input on
this node expects an image which is
blue. In the beginning, this color
system helps you quickly understand
which nodes can be connected. If two
nodes cannot connect, it usually means
they are not meant to be connected.
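As a rough sketch of this color and type rule, imagine each output and input carrying a type name, and a connection being allowed only when the two types match. The type names below are illustrative, not Comfy UI's internal code.

```python
# Sketch of the connection rule: an output can only link to an input
# when both carry the same data type (shown as colors in the interface).
def can_connect(output_type, input_type):
    return output_type == input_type

# The load image node has a blue IMAGE output and a green MASK output;
# the image crop node expects a blue IMAGE input.
print(can_connect("IMAGE", "IMAGE"))  # → True: blue matches blue
print(can_connect("MASK", "IMAGE"))   # → False: green does not match blue
```

This is all the color system is doing visually: showing you at a glance which pairs would pass this check.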
Sometimes you will also find nodes that
act like adapters or converters. These
nodes take one type of output and
convert it into a different type so it
can be used by another node. Now
basically we have a workflow but is the
workflow complete? How can we test it?
It is simple. We run it and see what
message we get. In this case it says the
prompt has no output. Even if you do not
understand exactly what that means yet
try to figure it out from the words. We
do not have an output node. So let us
close this message. If we look at the
workflow, the image is loaded from the
computer. Then it goes into the image
crop node which crops the image. But
after that nothing happens. There is no
output. Think of this like editing a
photo in Photoshop. You load an image,
crop it, but if you never save or export
it, the work exists, but you do not get
a file. In Comfy UI, the save image node
is the export step. So let us
double-click on the canvas again and
search for save. We have this save image
node. We can see that the image output
color matches. So we can connect it.
Even if the label says image or images,
it still works. Now let us make the
connection. If we run the workflow, it
will process from left to right. The
image is loaded, cropped, and then saved
in the output folder. We can also see it
directly inside the save image node. All
nodes can be resized using the corners.
You will see small arrow indicators on
the corners. You can click and drag to
resize a node. In this case, I want to
see the image preview bigger, so I
resize the node. To remove a connection,
you can left-click on the output dot,
drag the cable out onto the canvas, and
it will disconnect. You can also
left-click on the small dot in the middle
of the connection and choose delete. You
also have the option to add a reroute. A
reroute is like an extension cable or a
cable organizer. It does not change the
data at all. It only helps you route
connections more cleanly and keep your
workflow readable. From that reroute
node, you can add another link to
another node if you want. You can also
have multiple reroutes on the same link,
so you can arrange nodes, links, and
reroutes in a way that looks visually
clean or helps you see faster which node
connects to which node. This is very
helpful when you have a lot of nodes in
a workflow. To remove a reroute, you
just select it and press the delete key.
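The load, crop, and save chain we built earlier can be sketched as plain Python functions, where the output of each "node" feeds the input of the next, just like the links on the canvas. The functions below are stand-ins for illustration, not real image processing.

```python
# Each function plays the role of one node in the workflow.
def load_image(path):
    return f"pixels from {path}"          # stand-in for real image data

def crop(image, width, height):
    return f"{image}, cropped to {width}x{height}"

def save_image(image):
    return f"saved: {image}"

# Running the workflow: data flows left to right through the chain,
# exactly as the links carry it from node to node on the canvas.
result = save_image(crop(load_image("cat.png"), 512, 512))
print(result)
```

If you removed the save step, the work would still happen but nothing would come out, which is exactly the "prompt has no output" error we saw.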
Let me arrange them and remove all links
so we can see it better. So in Comfy UI,
we have different types of nodes. First,
we have nodes that only have outputs.
These nodes usually load something from
outside Comfy UI like a file or some
text. In this case, the load image node
loads an image from your computer. So it
does not need any inputs, only outputs.
Then we have nodes that have both inputs
and outputs. These nodes are usually
placed in the middle of a workflow. They
receive something from one node, process
it, and then pass the result to the next
node. Finally, we have nodes that
usually sit at the end of a workflow.
These nodes only have inputs, and their
job is to show or save the result, for
example, by previewing an image or
saving it to disk. There are also nodes
that do not have any inputs or outputs
at all. For example, if I search for a
note node and add it to the canvas, you
can see that it is only informational.
These nodes are used to write notes and
make workflows easier to understand and
remember. They do not affect the
workflow at all and are just for
organization and clarity. On the top
left side of a node, next to the title,
you have a small gray dot. If you click
it, the node collapses, similar to
minimizing it. I often do this for nodes
where I already know the settings and do
not need to change them. Collapsing
nodes helps save space and makes the
workflow easier to read. If you right
click on a node, you get a menu called
the node context menu. This menu shows
options related to that specific node.
Each node has a slightly different menu
depending on what that node does. In
this case, we have options like opening,
saving, and copying the image, different
properties, resize, and colors. We can
also collapse the node from here instead
of using the gray dot. And there are
many more options. Try a few of them. If
you do not like what you did, you can
undo it with Ctrl + Z. You can also
change the title of a node. If you
double click on the node title, you can
rename it to anything you want. This
does not change how the node works at
all. It is only for your own
organization and to make the workflow
easier to understand. You can also
right-click on a node and choose title to
rename it. This is another way to change
the node name. We already know that we
can move a node around once it is
selected. But when a node is selected,
you will also see a small floating bar
at the top. From here, you can delete
the node using this icon. You can also
click on this dot to change the node
color. This lets you choose from
different colors, which is useful for
organizing your workflow or grouping
nodes by function. Changing the color
does not affect how the node works. By
default, there is no color, the gray
one. This small eye icon is the node
info. If you click it, a properties
panel opens on the right with more
information about the node. Here you can
see what the node is supposed to do and
what the values mean. From this icon,
you can also close the properties panel.
All nodes, especially the default comfy
UI nodes, should have some kind of info
unless it is a custom node and the
creator did not add any documentation.
You can drag this side panel and resize
it the way you like. Here you can see
all the information about the image crop
node and what each setting does. Let us
close it for now. If you hover over an
icon, you can see more information about
what it does. For example, because this
node works with images, it lets you open
it in the mask editor. If we click on
it, you can see that it opens in the
mask editor. This will be useful later
when we do inpainting and image editing,
but that is for another episode. For
now, it is enough to know that it is
here and you will learn more about it
over time. These numbers are just
settings that you can change in Comfy
UI. They are called parameters.
Parameters control how a node behaves
and how it processes its input. Let me
reconnect the nodes. So, we have a
working workflow again. Now, if I run
it, you can see what these parameters
actually do. We are cropping a 512 x 512
pixel area from the original image,
starting from the X and Y coordinates
set to zero. That means the crop starts
from the top left corner of the image.
So basically we are taking this small
corner from the original image. The
original image was 1,024 x 1,024 pixels,
and now the result is 512 x 512 pixels.
Even if it looks bigger here in the node
preview, it is not actually larger. That
is just the preview size. The real image
resolution is smaller. So let us change
some parameters, or settings, or
whatever is easier for you to remember
them by; values is also fine.
And run it again. Now we have a
different crop. Let me speed up the
video while I try different values so
you can see how the result changes.
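The crop we just experimented with can be sketched in plain Python. This is only a conceptual illustration of what the x, y, width, and height parameters do, not ComfyUI's actual code; the `crop` function and the tiny 4 x 4 grid are made up for the example.

```python
# A conceptual sketch of what an image crop node does, not ComfyUI's
# actual implementation: take a width x height window starting at (x, y).
def crop(pixels, x, y, width, height):
    # pixels is a list of rows; each row is a list of pixel values
    return [row[x:x + width] for row in pixels[y:y + height]]

# A tiny 4 x 4 "image" stands in for the larger photo.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
corner = crop(image, 0, 0, 2, 2)  # x = 0, y = 0: the top left corner
print(corner)  # [[0, 1], [4, 5]]
```

Changing x and y slides that window around the image, exactly like changing the numbers on the node and running again.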
Let us remove this middle node, the
image crop node. Once it is removed,
something interesting happens. Comfy UI
tries to keep the workflow connected and
automatically reconnects the nodes
directly. That happens because the
output of the first node and the input
of the last node use the same type. If
the first and last nodes had different
input and output types, the connection
would disappear when the middle node is
removed. Now let us double-click on the
canvas. You should remember by now that
every time you see this search bar, it
means I double-clicked on the canvas. Let
us search for invert and select invert
image. This node has an image input and
an image output. But it does not have
any parameters. That is because this
node is designed to do one specific
thing: invert the image. Even without
parameters, the node still performs a
function. Let us connect this node into
the workflow. Watch what happens when I
connect it to the input. You can see
that the previous connection is removed
automatically. That is because an input
can only have one connection at a time.
An output on the other hand can connect
to multiple nodes. You can think of it
like electricity. An output is like a
power strip. It can send power to many
devices. An input is like a wall socket.
It can only accept one plug at a time.
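The power strip rule can be sketched in a few lines of Python. This is a made-up data model, not how ComfyUI stores links internally; it only illustrates that plugging a new cable into an occupied input replaces the old one, while one output can feed as many inputs as you like.

```python
# Illustrative sketch of the connection rule: an output may fan out to
# many inputs, but each input remembers only one source at a time.
connections = {}  # input name -> the output currently feeding it

def connect(output, input_name):
    # Plugging into an occupied input replaces the old cable.
    connections[input_name] = output

connect("load.IMAGE", "crop.image")    # one output...
connect("load.IMAGE", "invert.image")  # ...feeding two inputs is fine
connect("invert.IMAGE", "crop.image")  # crop's old link is replaced
print(connections)
```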
If we run the workflow now, we can see
that the result is an inverted image. So
until now with these small workflows, we
did not use any AI. We only used simple
nodes like simple code to modify images.
We will see more in the next chapter
when we build a bigger workflow that
uses stable diffusion to generate an
image from text. But these small steps
help you understand how things work. At
least I hope they do. You can always ask
any questions you have on Discord as we
will have a special section for this
episode on the Discord forum. So we
learned that save image is usually the
last node in a workflow, since it does
not have any outputs; its result is an
image that goes to disk, not to another
node. But that does not
mean we cannot continue the workflow. It
only means we cannot continue from that
node. We can still continue from the
previous node which has the same image
just not saved to disk yet. Let us clone
this node and use it again. One simple
way is to press the alt key and drag a
copy of this node where you want it. Let
us delete it and try again. Now we will
use Ctrl + C to copy. And when I use
Ctrl + V, it will paste that node
where the mouse cursor is. Let us delete
it again. And now let me show you
another shortcut. Press the control key
and make a marquee selection over the
nodes you want to select. Now we
selected two nodes. With Ctrl + C, we
copy all selected nodes. And with Ctrl
+V, we paste them. If you click and drag
from any of the selected nodes, you can
move them together. If you press delete
while both are selected, it will remove
both. If we use controll + shift +v, it
will paste the nodes together with the
links they had in the workflow. Now we
have this extra link here. So basically
from one image we got two invert image
nodes, and both do the same thing:
invert the image. If the invert image node had
more parameters we could change the
settings in one and get different
results. Let us delete those again.
Practice this a few times. Press control
and select the nodes. Press Ctrl + C to
copy and Ctrl + V to paste. Move them
into position. Now look at what I am
doing. I am continuing the workflow from
the last invert image node and then I
save the result. A workflow can have
many branches like a tree. The root
starts with the image. Then it inverts
the image and from there on another
branch it inverts it again. Can you
guess what happens when I press run? The
image is inverted again and looks like
the normal image. The original image is
inverted. Then that inverted image is
inverted again and we get the original
result. We can continue the workflow
even more. From the image that was
inverted twice, we connect it to an
image crop node. Now, instead of double
clicking and searching for a node, we
can drag a connection and release it.
When we do that, a context menu appears
with suggested nodes. From here, I can
easily pick the save image node and it
is added already connected. Let us
delete it and try again. Drag the
connection and release it. Then select
search. When I select the save image
node, it is added already connected. Let
us place the nodes properly and run the
workflow. You can see how many
operations are now in this workflow.
With a single image, we can invert it,
invert it again, crop it, and save it.
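The double inversion we just saw has a simple arithmetic explanation, sketched here in plain Python. This is a conceptual illustration, not the node's real code: each pixel value v in the 0 to 255 range becomes 255 - v, so inverting twice restores the original.

```python
# Conceptual sketch of the invert branches: v becomes 255 - v,
# so applying invert twice cancels out.
def invert(pixels):
    return [[255 - v for v in row] for row in pixels]

image = [[0, 60], [120, 255]]
once = invert(image)   # [[255, 195], [135, 0]]
twice = invert(once)   # inverted again
print(twice == image)  # True: the two inversions cancel out
```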
This is similar to a small program or an
action in Photoshop, but with more
control and much more flexibility. There
are nodes for images, audio, 3D, and
many other things. This is where you
start to see the power of comfy UI. Now
let us select everything. Hold control
and drag to select all nodes. You can
press delete to remove everything or let
us cancel that and do it another way. Go
to the menu then edit and choose clear
workflow. It will ask if you want to
clear it. Click okay. And now we have a
blank workflow again. Do you like math?
I know you do not like it, but I just
want to show something quick to see the
different things it can do and help you
understand Comfy UI better. Double click
on the canvas and search for math. You
will see a few math nodes. If we look on
the right, you can see different names
like Comfy UI core, KJNodes, Easy-Use.
These names are the names of custom
nodes or extensions. By default, Comfy
UI comes only with the nodes you see
under Comfy UI core. With the easy
installer, you also get a few extra
custom nodes already installed. We will
talk more about custom nodes later when
we get to the manager. If you use the
easy installer like I showed at the
beginning of this video and install the
same version, you should have the same
nodes. So again, Comfy UI core nodes are
made by the Comfy UI team. Let us get
back to the math nodes. We will start
with something simple called math int.
Int comes from the word integer, which
means whole numbers. This node works
only with whole numbers like 1, 2, 10,
and so on, not decimals. All custom nodes
have an extra label on the top right
that shows which custom node pack they
belong to. This makes them easy to spot
compared to built-in nodes. These math
nodes are used for simple calculations
similar to a calculator. I personally do
not use math nodes very often, but they
can be very useful for automation. For
example, you might load an image, read
its width or height, and then use math
nodes to calculate a new value based on
that size. This allows you to
automatically adjust things like
resolution without manually changing
numbers every time. In this case, we
have letter A, letter B, and an
operation. The default operation is add.
Let us set A to five and B to three.
For the operation, we will leave it on
add for now. Let us add another node
that I use often called preview as text.
You can see it comes with comfy UI. This
is one of those special nodes I
mentioned earlier that can be connected
to almost anything. Even if other nodes
cannot connect directly, this node will
convert the value to text and display
it. If I run the workflow, you can see
the result. Even though they look the
same, one is actually a number and the
other is a text display of that number.
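What the math int node computes can be sketched like this. The names and the operation list here are illustrative, not ComfyUI's exact API; the point is that the node takes two whole numbers and an operation and outputs one number, which the preview node then shows as text.

```python
# A hedged sketch of a math int node: two whole numbers in, one out.
def math_int(a, b, operation="add"):
    ops = {"add": a + b, "subtract": a - b, "multiply": a * b}
    return ops[operation]

result = math_int(5, 3)               # the node's output, a number
print(f"preview as text: {result}")   # the preview displays the text "8"
```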
This makes more sense if you have coding
experience, but we will not get into
technical details here. What is
important to remember is that we can use
this node to see a result as text. It
also has options for how the preview is
displayed. Let us move this node down
and make a copy of the math int node.
Now remember, we have a result in this
node, but it is not visible unless we
use a node to preview or save it. Here
is something interesting. We do not see
any inputs on this node, but when we
drag a link to it, two input dots
appear. This means we can actually
connect values directly to these fields.
You will see this behavior with many
nodes that have number fields or text
fields. We can copy a value from an
output and feed it into these fields to
use it in the workflow. Notice how the
field for letter A is grayed out. That
means it is no longer using a manual
value. Instead, it is taking the value
from the previous node which is 8. Now
let us change the operation to multiply.
We now have 8 multiplied by 3. Let us
add another preview as text node to see
the result. When we run it, the result
is 24. As expected, let us remove that
preview node and arrange the layout.
This small workflow does something
simple. It adds two numbers and then the
result is multiplied by three. That
three could also come from another node
and so on until you build more complex
workflows. If I change the value to four
and run it again, we get the correct
result for that formula. I hope this was
not too much math. Now let us select the
middle node that does the multiplication
and right click on it. We have a
function called bypass. When we enable
bypass, the node is temporarily ignored
as if it is not part of the workflow. By
the way, you can also access bypass
quickly from this icon when the node is
selected. Now, if I run the workflow,
you can see it ignored that node and
only did the addition. If I enable it
again and run the workflow, it takes
that node into account again. You can
see that the node changes color and
becomes purple and semi-transparent.
This visual change tells us that the
node is deactivated. Bypassing a node is
useful when you want to test a workflow
without removing the node completely.
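Bypass can be sketched in a couple of lines. This is an illustrative model, not ComfyUI's implementation: a bypassed node keeps the link alive and simply passes its input straight through.

```python
# Hedged sketch of node modes: active runs the node, bypass passes
# the value through untouched, as if the node were not there.
def run_node(func, value, mode="always"):
    if mode == "bypass":
        return value      # link stays live, node is ignored
    return func(value)

triple = lambda x: x * 3
print(run_node(triple, 8))                 # 24: node active
print(run_node(triple, 8, mode="bypass"))  # 8: only the addition result
```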
Hold the control key and select all
three nodes. Now, if we right-click on an
empty area of the canvas, we have the
option to add a group. If we choose add
group, it will create an empty group.
But since we already selected the nodes,
it is better to choose add group for
selected nodes. This creates a group
that contains all those nodes. You can
think of a group like a folder that
holds multiple nodes together. One very
important thing to remember is that if
you want to move the group with all the
nodes inside, you need to drag it using
the group's top bar. If you select and
move an individual node, it will move
outside of the group. Groups can also be
resized. You can see a small triangle in
the corner that lets you change the size
of the group. If you right-click on a
group, you also have the option to
bypass all the nodes inside it. This is
very useful when you have multiple
workflows on the same canvas. For
example, you can deactivate one workflow
and enable another so only one workflow
runs at a time. This becomes important
as workflows get more complex and models
get larger because running multiple
workflows at once can require more
resources than your system can handle.
If you double click on the group title,
you can change the group name. Enough
with math. Let us work a little with
text as well. When we use AI, we give it
prompts. And sometimes it helps to
combine text from different sources to
get a better prompt.
Now I am searching for concat. And you
can see that there are multiple nodes
with similar names. That is because
concatenate is a general concept and it
exists for different data types. This
one here, concatenate, works with
strings, which means text. It simply
takes multiple pieces of text and joins
them together into a single string. That
is why I added this cat made from
multiple pieces joined together to make
it easier to remember. Even if you
search for cat, you can easily find the
concatenate node. Let us add it to the
canvas. For example, for string A, I add
the word home and for string B, the word
car. When I connect them, the output
becomes a single piece of text, a
string. Let us drag a connection from
that string and search for a node that
can preview it. We can use the same
preview as text node again. Now, because
the first workflow is bypassed, it will
only run this workflow with concatenate.
You can see how it joined those two
words, first home, then car. We can also
use a delimiter. For example, I can add
a space here and run it again. Now the
result has a space between the words or
I can add a comma and a space. And now
the result looks like proper text with
separation. Let us move this node down
and hold alt and drag a copy of the
concatenate node. You can move nodes
around to make room so the connections
are easier to see. Here we have these
text fields. And like you saw with the
math nodes, we can connect outputs
directly into these fields if they are
the same type. When I drag a link from
the string output, you will see input
dots appear showing that I can connect
there. I can connect to the first field,
the second field or even the delimiter.
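The chain we are building can be sketched in Python. The function name and keyword here are illustrative, not the node's real code; it only shows how joining with a delimiter works and how one node's string output becomes the next node's first field.

```python
# A rough sketch of a string concatenate node: join the text inputs
# with an optional delimiter.
def concatenate(*strings, delimiter=""):
    return delimiter.join(strings)

first = concatenate("home", "car", delimiter=", ")
# Feeding one node's string output into the next node's first field:
combined = concatenate(first, "flower", delimiter=", ")
print(combined)  # home, car, flower
```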
Let us connect it to the first field. So
now the home and car result becomes the
first input. And for the second field,
let us add the word flower. I will hold
alt again and drag another duplicate
since that is faster. Then I connect it.
Can you guess the result? We now get
home, car, and flower. So basically this
is how people create workflows. You
connect nodes like Lego pieces. Some
nodes can be connected together because
they share the same input and output
types and you get a result. Over time,
you can build more complex workflows
that can save you a lot of time. Let's
add another node. Double click on the
canvas and search for primitive. This
node is called primitive because it
represents the most basic types of
values. Things like numbers, text, and
true or false values are considered
primitive values. The primitive node is
used to manually create a value inside
comfy UI instead of getting it from
another node. You can use it to type in
a number, write some text, or define a
simple value that can then be connected
to other nodes. Think of it like writing
a value by hand and injecting it into
the workflow. You can see here it says
connect to the widget input. So we can
drag a link from there. Now you can see
we have a lot of inputs where we can
connect this value. If we look at this
text, notice what happens when the
connection is complete. It changes to
the type of value that was connected, a
string. Now we can manually insert any
text value there. When we run the
workflow, the result will include that
value. This is useful because sometimes
you want to use the same value in
multiple nodes. Instead of typing it
manually each time, you add a primitive
value once and connect it to all the
inputs that need it. Let us right click
on this group. Usually nodes have a
bypass option to disable or enable them.
But for groups, this option is called
set group nodes to always. Now the nodes
inside the group are enabled and we can
run that workflow if we want. Let us add
another primitive node to see how it
adapts. Last time when we connected a
primitive node to a text field, it
automatically converted the value into a
string because that input expected text.
Now if we connect a primitive node to a
math int node, it adapts differently.
This time it is converted into an
integer value. You can see that now we
can only enter numbers. It does not
allow text because this node expects an
integer. Right now the value is set to
five. Let me resize the node so we can
see it better. You can clearly see that
this one is an int and the other one is
a string. If we change this number and
run the workflow, Comfy UI will rerun
all the workflows on the canvas using
the new values. In this simple example,
it runs almost instantly. But later when
we use larger models, you will see that
some workflows can take minutes to
generate instead of seconds. So if you
look at all the nodes in these
workflows, you can clearly see that we
use this Easy-Use custom node and all the
rest do not have that label. That means
they come with comfy UI by default. Let
us go to the menu then edit and choose
clear workflow. Now I want to do a quick
recap just to make sure you have absorbed
some of the basics. We double-click on
the canvas to bring up this search
option. So we can search for nodes. You
type a word to search like load and then
you can select a node. For example, the
load image node. This node is used to
load an image from your computer. If we
click choose file to load, we can
navigate our computer and load an image.
By the way, I asked EVO to include some
images for this first episode in the
input folder. So you can have the same
images I am using. The path is Comfy UI
easy install Comfy UI input. Let us say
I select this helmet but it can be any
image. We can choose open or we can
double click on the image and it will
open. Now that the image is loaded, let
us add another node, the image crop
node,
and connect it from left to right from
output to input. To remove a connection,
you just drag from the output and
release it somewhere on the canvas. It
is like unplugging a cable and leaving
it on the floor. Let us redo the
connection. You can also click on the
small dot in the middle of the
connection and choose delete and it does
the same thing as unplugging. Let us
connect it again. We can hold control
and select multiple nodes by dragging
with the left mouse button pressed. This
lets us select multiple nodes at once.
You can also hold control and click on
the nodes one by one. If you select a
node by mistake, just click on it again
to deselect it. Once nodes are selected,
we can move them together. If you plan
to move them a lot, you can also add
them to a group. If you click on the
canvas and drag, you are moving the
canvas itself. This is useful when you
have long workflows and want to see
different parts of the workflow. Let us
move these nodes to the left. Now,
double click on the canvas and add a
node called preview. This time, it is
not preview as text. We could add that
too, but it would show numbers. Here we
add preview image. This node is similar
to save image, but it does not save the
image in the output folder. It is useful
when you just want to preview parts of a
workflow and do not need to save the
image. If you like the image, you can
still save it. You can right-click on the
image and choose save image. Then save
it anywhere you want on your hard disk.
Let us cancel that and remove the
preview image node. This time let us add
a save image node so we can save the
result and then run the workflow again.
Now the result is saved. Let us go to
the desktop. The easy installer comes
with a shortcut to the output folder. We
double click on it and now we are in the
output folder. You can see the path at
the top. Here you can see all the images
generated with Comfy UI. These images
are saved in this folder. You can delete
them, move them to different folders, or
organize them however you want. I
usually pick the images I need, move
them into the project folder, and then
delete the rest because over time you
will end up with thousands of images.
Here we can see the helmet image we just
generated, but not the previous one from
the preview image node. If we go back
one folder to the main Comfy UI folder,
we can see a temp folder. Comfy UI uses
this folder to store preview images
temporarily. The contents of this temp
folder are deleted when you start Comfy
UI again. So, you can still recover
preview images even if you did not save
them right away, as long as you did not
close Comfy UI yet. We can collapse a
node using the top left gray dot and
click it again to expand it. Once a node
is selected, it has multiple options at
the top. One very useful option is the
info icon, which gives you more
information about the node. You can
close the properties panel from here. We
can change the color of the node. And
this symbol here is for subgraphs.
Subgraphs are a bit more complex. So
maybe in a later chapter or another
episode, we will talk more about them.
This arrow lets you bypass the node. And
if you click on these dots, it shows
even more options for the node. For
example, you can change the shape,
change the color, or pin the node so it
is fixed and cannot be moved. If we move
these nodes apart to see the links, we
can also hide the links from here. Be
careful with that because it can look
like no nodes are connected. I never use
this option because I like to see how
nodes are connected. It helps me
understand the workflow better. If you
do not like how the links look, there
are ways to change their shape. I like
the default look, but some people prefer
other options. If we go to the bottom
left or open the menu and go to
settings, we can change this. Let us
click on settings. Here we have many
settings we can change. Let us search
for link since we want to change how
links are displayed. You can see the
current one is called spline under link
render mode. If we change it from spline
to straight and close the settings, you
can see the links are now straight, but
they still adapt when you move the
nodes. Let us go to settings again and
change it to linear. Now the links are
always straight lines.
Let us change it back to the default
which is spline. Now let us remove all
the nodes.
I personally prefer the spline view
because it reminds me of sci-fi scenes
with lots of cables hanging around.
Let us double click on the canvas and
add a load image node again. Now let us
add another node and search for upscale
image. We will select the node called
upscale image by and move it closer so
we do not waste space on the canvas.
Then we connect it to the workflow and
add a save image node at the end. So we
have a complete workflow. If we run this
now we get the same image as before.
That happens because some nodes have
default values that do not change
anything. They only start doing
something once you change their
parameters. In this case, the upscale
value is set to one. Scaling by one is
like multiplying a number by one. You
get the same result. Now, let us
increase the scale by value to two. When
we run it again, the image is upscaled
by two times. So, we get double the
resolution. This is similar to resizing
an image in Photoshop. In this case, it
does not use AI to add new detail. It
simply enlarges the image. These upscale
methods are different ways of resizing
an image. Nearest exact copies pixels
exactly, so it is very fast and keeps
hard edges, but it can look blocky.
Bilinear smooths pixels together, giving
a softer result that can look slightly
blurry. Area is mainly meant for
downscaling and is not ideal for
upscaling images. Bicubic uses more
surrounding pixels to produce smoother
and better looking results. Lanczos
preserves detail and sharpness the best,
but it is slower than the others.
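The nearest method, and why scaling by one changes nothing, can be sketched in plain Python. This is a conceptual illustration only; real resamplers like bilinear or Lanczos do weighted math over neighboring pixels instead of plain copying.

```python
# Conceptual sketch of "upscale image by" with the nearest method:
# every pixel is copied scale x scale times, so scale 1 changes nothing.
def upscale_nearest(pixels, scale):
    out = []
    for row in pixels:
        wide = [v for v in row for _ in range(scale)]  # repeat across
        for _ in range(scale):                         # repeat down
            out.append(list(wide))
    return out

image = [[10, 20], [30, 40]]
print(upscale_nearest(image, 1) == image)  # True: scale 1 keeps the image
print(upscale_nearest(image, 2))           # twice the width and height
```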
Let us say you do a lot of changes to a
node like titles, values, and colors,
and you forget how the default values
were. You can add the same node again
and redo all the values and connections,
or you can right-click on the node and
choose fix node, recreate. If you select
that, the node goes back to its default
state. If you right-click again, you also
have the option to clone the node and
move that clone wherever you want. You
can also do this faster by holding alt
and dragging the node. You can remove a
node from this menu, but pressing the
delete key is faster. We also have this
pin option. If you use it, a pin appears
at the top of the node. And now if you
click and drag, the node does not move.
It is pinned to the canvas. To move it
again, you need to right-click and select
unpin. Sometimes when you get workflows
from the internet, some people stack
many nodes on top of each other and pin
them. It can look like there are only
two nodes, even if there are 10 nodes
behind one. I do not recommend doing
this. If you do not want people to use
your workflow, just do not share it.
We also saw that we need to change
values on some nodes for them to work.
If we bypass a node, the workflow still
runs and ignores that node. The links
are still there and the connection
passes through the node. This is
important because there is another mode
where the connection does not pass
through. Let us enable the node again
using bypass. Now right click on it and
go to mode. You will see the option
always which means the node is active.
There is also an option called never.
This mutes the node and makes it behave
as if it does not exist. You can see
that the node is now gray, not purple
like bypass. When we run the workflow,
we get an error. That is because the
node is not passing anything through. So
the next node does not receive the image
it expects. It is like cutting the cable
where that node was. Let us remove that
node and delete the link as well. If we
run the workflow again, the result is
the same. The image is missing because
there is no connection. Let us go to the
menu then edit and click undo. You can
also use Ctrl + Z multiple times until
you get back to the state you want. I
will undo until everything is active
again. Let us zoom out with the mouse
wheel and add a group. Name your group
in a way that explains what the nodes
do. Do not name it something generic
like my group. You can move the group
around and you will notice that it works
like a magnet. If a node is inside the
group area, it stays inside. Let us make
the group larger and move it around. So
you can see that nodes are sticking to
it. Now adjust the group size and move
the nodes so it looks cleaner. Workflows
can get quite large, so I like to
optimize the space to make them easier
to read and navigate. Once the nodes are
positioned, hold control and select all
the nodes. You will see that only the
nodes are selected, not the group. Now
right-click and choose fit group to
nodes. The group will resize to fit the
nodes tightly. Now we can move the group
and it does not take much space. Groups
can have more options especially when
using custom nodes like rgthree. If we go to
settings in Comfy UI, we have general
settings, but we also have settings for
custom nodes. For example, for rgthree, we
have extra settings here. I can click
this button to open them. rgthree is
installed when you use the easy
installer.
If you install Comfy UI manually, you
need to install rgthree from the manager. We
will talk more about custom nodes later.
You can also access the same rgthree
settings directly from here, which is
faster. If we scroll down, we have
settings for groups. For example, there
is an option called show fast group
toggles in group headers. Let us enable
it. You can choose when to show it
always or on hover. I will leave it on
hover and save the settings.
Now when I hover over the group, you can
see extra buttons in the top right. From
here, we can bypass all the nodes in the
group easily. We also have an option to
mute the group. This is similar to
setting nodes to never. When a group is
muted, the nodes inside it do not run at
all and the workflow behaves as if they
do not exist. Bypass still lets the
workflow run through the nodes. Mute
does not. Let us make the group active
again. We can also run the workflow
using the play button on the group. We
can change values, for example, use
smaller values to get a smaller image,
maybe half the size. There are many more
things you can do with groups and switch
style nodes, but we will cover those in
later episodes. I told you at the
beginning to leave the Nodes 2.0 option
turned off. At the moment of this
recording, it is still in beta and has
some bugs. Maybe over time they will fix
everything and it will become stable. If
I activate it, you can see that it
changes how the nodes look. For most
nodes, things still work in a similar
way, but this change exists. So, they
can add more functionality to nodes.
With the current system they use, there
are limitations in what nodes can do.
And the new node version should give
them more possibilities to build better
nodes. Instead of the gray dot, you get
an arrow that points down and then to
the right when the node is collapsed.
The inputs are placed on the edge of the
node and some nodes have more options.
For example, in load image, you can see
previews of images from the input
folder. And you can also browse for
another image on your disk. However, for
older workflows, this can slightly
change node sizes and mess up the
layout. Some nodes might not work yet
until the node creators update them.
Because of that, until everything is
fixed and stable, I recommend leaving
Nodes 2.0 turned off. Just a quick
reminder that from the mode menu you can mute a
node by setting it to never. This is
useful when a workflow is big and has a
lot of branches. You can mute a branch
of that workflow and it will still run
without errors as long as there are no
nodes after the muted ones that expect
an input. To turn the node back on, you
go to mode and choose always. We also
have shape options for nodes, but these
are only decorative. They just change
the corners of the node. By default, the
corners are rounded. There is also the
card option which rounds only two
corners. Personally, I do not think it
is worth spending much time on this. To
remove a group, you can select it and
press the delete key, or you can right
click on it, choose edit group, and then
remove. This only removes the group
container, not the nodes inside it,
unlike folders in other systems. Let us
select these two nodes while holding
control and press delete to remove them.
Then add an image crop node and connect
the link to it. After that, let us add a
preview image node. Since we are only
testing with these settings, we get a
crop from the top left corner of the
image. Let us arrange the nodes. Then
hold control, select both nodes, and
press Ctrl + C to copy and then Ctrl +
Shift + V to paste them with the links.
Since we have them copied, let us paste
again to get a third branch. Right now,
all the settings are the same. So, all
three give the same result. We can
change the x coordinate on one to get
the top right corner of the image.
For another one, let us change the
y-coordinate to get the bottom left
corner. Now, when we run it, we split
the image into three pieces. You could
add another one for the bottom right
corner to get the missing part. That is
homework for you to figure out the
correct coordinates. Now if we change
the input image and run it again, you
can see how useful this can be. In a
later episode, we will learn how to load
multiple images from a folder and
automate this process so we can apply it
to all images in a folder. Now select
all the nodes in the workflow using the
shortcut Ctrl + A and then press delete to
remove everything. It is time for
another short break. This has been a
long chapter and I want to make sure you
have time to absorb the information.
Take a few minutes, press pause, get a
drink, or step away from the screen, and
then come back. Now, I do not know what
learning method works best for you, but
I can tell you one method that usually
works very well for video tutorials.
First, watch the entire tutorial from
start to finish without stopping too
much. This helps you build a general
understanding of what is possible and
how things fit together. Then watch it a
second time, pause the video and follow
along step by step inside Comfy UI.
After that, try to repeat the same steps
without the tutorial playing just from
memory. Once you are comfortable, start
experimenting.
Try changing nodes, parameters, or
settings that were not covered in the
tutorial. And if something does not work
or you get stuck, that is completely
normal. You can always go back to the
tutorial, rewatch a part or ask
questions on Discord. Learning Comfy UI
is not about speed. It is about
understanding.
It is time to build a workflow from
scratch. But first, let us go to
workflows. Open the getting started
folder and open workflow number one, the
one we used in a previous chapter. What
I want to do now is give you an analogy
so you can understand what is happening
here with all these nodes connected. So
it makes more sense. I will use a note
node and add some info next to each
node. You do not have to do that. Just
watch and pay attention. I will open the
same workflow but with those notes added
next to each node and I will explain
each one in detail. You probably noticed
by now that when we generate images with
AI, we usually download a file called a
model. Sometimes people call it a model
and sometimes they call it a checkpoint.
In practice, they usually mean the same
thing. A model is the trained AI itself.
It contains everything the AI learned
during training like styles, shapes, and
how images are formed. The word
checkpoint comes from machine learning.
During training, the model is saved at
different points in time called
checkpoints. Those checkpoints are what
we download and use. So when you hear
model or checkpoint, you can think of
them as the same thing, the trained AI
file that does the image generation. In
Comfy UI, you will often see the term
load checkpoint, but what you are really
doing is loading the model you want to
use. We can think of the model as the
photographer we want to hire. The load
checkpoint node is the step where we
actually hire that photographer.
Depending on what the photographer
learned during training, they will be
good at different types of photos. That
is why there are so many different
models available. Just like in real
life, some photographers specialize in
portraits, others in landscapes, macro
photography, or food photography. AI
models work in a very similar way. The
better and more complex the training of
a photographer, the more expensive they
usually are. In our case, that cost is
not money, but computer power. Larger
and more advanced models usually need
more VRAM and a stronger graphics card
to run properly. To keep things simple
for now, we are hiring one photographer.
We will use a model called Juggernaut
Reborn. And this is the photographer
that will generate our images. So now
that we hired the photographer, what
comes next? We need to give instructions
to that photographer about what we want
to get and what we want to avoid. These
instructions are called prompts. We
usually use a positive prompt to
describe what we want to see in the
image and a negative prompt to describe
what we want to avoid. In Comfy UI, we
use the same node for both. I just
colored one green for the positive
prompt and one red for the negative
prompt so they are easier to recognize.
The node we use is called clip text
encode. This node takes our written text
and translates it into a form that the
model can understand. In simple terms,
clip text encode acts like a translator
between human language and the AI. It
turns words into instructions that the
photographer can follow during the photo
shoot. Besides giving instructions on
how the photo should look, we also need
to decide how big the photo will be. For
that, we use the empty latent image
node. This node is like choosing an
empty photo paper before taking the
photo. Here is where we decide the width
and height of the image. We are defining
the size of the photo before it even
exists. At this stage, there is still no
image. It is just an empty space where
the photo will be created. Once the
photo shoot happens, the final image
will always respect the size we set
here. Now, it is time for the photo
shoot. The K sampler node is the
photo shoot itself. This is where the
photographer follows the instructions
from the prompts and uses the empty
photo paper to take the photo. Each
different seed is like taking a new
photo of the same scene. The idea is the
same, but the result is slightly
different every time. The K sampler
controls how the image is generated. It
decides how many steps the photographer
takes, how much randomness is allowed,
and how closely the final photo follows
the instructions. You do not need to
understand every parameter right now.
What matters is that the K sampler is
the core of the workflow where the
actual image creation happens.
Everything before the K sampler prepares
the photo shoot. Everything after it
finishes the photo. After the photo
shoot, the image is created, but it is
not visible yet. That is because the K
sampler does not produce a normal image.
It produces something called a latent,
which you can think of as a hidden
version of the photo. It contains the
information of the image, but it is not
in a format we can actually view. This
is where VAE decode comes in. The VAE
decode node is like the dark room in
photography. The photo already exists,
but it still needs to be developed to
become visible. So, the VAE decode takes
that latent result and converts it into
a real image that we can see, preview,
and save. Without this node, the
workflow can still generate something,
but you would not be able to view the
final photo because it is still in that
hidden latent form. And finally, the
save image node is where the finished
photo is delivered to the client. After
the VAE decode step, we usually add a
node that either previews or saves the
image. Preview nodes let us see the
result inside Comfy UI, while the save
image node writes the final image to
disk. Without one of these output nodes,
the workflow has no final result. In our
photo studio analogy, this is the moment
where the developed photo is either
shown to the client or delivered as the
final file. Now, let us zoom out and
look at the entire process. First, we
load a model from our disk. This is like
hiring a photographer. Then, we give
instructions. The positive prompt
describes what we want. For example, a
close-up portrait of a pet. The negative
prompt describes what we want to avoid.
For example, saying we do not want dogs.
Next, we decide how big the photo should
be using the empty latent image node.
This is where we choose the size of the
photo before it is taken. Now, let us
run the workflow. You can see that all
these instructions are passed into the K
sampler where the image is actually
created. The K sampler is the photo
shoot. It uses steps and different
settings similar to camera settings like
shutter speed or aperture to decide how
the photo is taken. After that, the
image goes through VAE decode where it
is converted from latent space into
actual pixels. This is like developing
the photo in a dark room and finally we
save the image. This is when the
photographer delivers the finished photo
to the client. Every image generation
workflow in Comfy UI follows this same
basic idea even when it becomes more
complex. Let us do some quick
experiments. What happens if I change
the negative prompt, the instructions
where we say what we want to avoid? For
example, if I say I do not want a cat,
it will probably give me another pet
that is not a cat and we might get a dog
instead. If we run it again, it is like
taking another photo of a pet because
the seed is random. Now we can change
the seed to be fixed. When the seed is
fixed, each time we use the same prompt,
the same settings, and the same seed, we
should get the exact same image. If I
try to run it again, you can see that
nothing happens. The result would be the
same. So, Comfy UI does not even bother
to generate it again. If we change a
setting like the seed, then it lets us
generate again and we get a different
image. If we go back to the previous
seed, we are back to the same image we
had before generated with that seed. Now
that we kind of understand how it works,
let us click on this plus sign and build
the same workflow from scratch. Double
click on the canvas and search for load.
Usually it is either load checkpoint or
load diffusion model. But in some cases
there are special loaders for specific
models. Now that we have the node, we
select the model. Since we did not
download more models yet, we only have
one, Juggernaut Reborn. So we hired our
photographer. Now let us give it
instructions. Search for prompt. And we
can find this clip text encode node.
Let us move it next to the other node. I
like to change the color to green for
the positive prompt. Right click and
clone the node or just hold alt and drag
the node to make a copy. For this second
one, let us change the color to red.
Again, this does not influence how it
works. It is the same node. It is just
visual. For the positive prompt, I will
add closeup portrait of a pet. For the
negative prompt, I will add cat. Not all
models use negative prompts. Some older
models like this one still use it, but
you will see later that some newer
models are smarter and do not need a
negative prompt and they work better
when the negative prompt is disabled.
Can you guess how these are connected?
We have clip on both input and output.
So, we can only connect clip to clip. If
we try to drag from the model, you can
see it does not work. And the same if we
try from the VAE. So, let us connect the
clip output from the model to both of
the text encoders. Now we have the
instructions for how the image should
look but we still need to define the
size. Let us search again using the word
empty and add the empty latent image
node. There is also a newer one that we
will use later for newer models but for
this workflow we will use this simple
one. I like to change the color of this
node to purple but you can leave it as
it is if you want. Now we have width and
height. Because we work with computers,
most models work better with values that
are multiples of 64 or 8. That is why we
see values like 512 instead of 500. I
know that this model was trained with
square images at 512 x 512 pixels. So I
use these values to get better results.
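To make the multiples idea concrete, here is a small Python sketch. The snap helper is my own illustration, not a ComfyUI function, but it shows why you see values like 512 instead of 500:

```python
def snap(value: int, multiple: int = 64) -> int:
    """Round an image dimension to the nearest multiple of `multiple`."""
    return max(multiple, round(value / multiple) * multiple)

# Why 512 shows up in the width and height fields instead of 500:
print(snap(500))  # 512
print(snap(777))  # 768
```

The same idea works with a multiple of 8, which many models also accept.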
Some newer models are trained with
larger images and can generate bigger
images. But that comes at a cost. Just
like printing a big photo costs more
than a small one, a bigger image takes
more time to generate and sometimes your
PC cannot handle it. More about that
later. Now, let us add the most
important part where the magic happens,
the K sampler. As you can see, this node
has four inputs where it takes all the
instructions and one output. First, we
connect the model since it has the same
color and name. Then, we connect the
conditions. The instructions are yellow.
Even if the names are different, we
connect the positive output to positive
and the negative output to negative.
That is how it knows which one is
positive and which one is negative even
though they come from the same type of
node. The last input is the empty latent
image which defines the size of the
image we want. Now we have everything
needed to generate the image but it is
still in latent format. We need pixels
to actually see it. So let us drag a
link from the output and you can see
that it suggests VAE decode. We select
it and now the image is decoded like a
dark room where the photo is developed.
Here we also have a VAE input. In this
case the VAE model is included inside
the main model which is why we can
connect it directly. In some cases the
VAE comes as a separate file and then we
use a load VAE node. You will see that
later. Now the last step is to save the
image. So we add the save image node.
Let us run the workflow and see if it
works or if we forgot something. If
everything turns green, it worked
without errors. There are cases where
the image does not look right. Even if
there are no errors, that usually means
some settings are not ideal. People who
create AI models usually provide
recommended settings, especially for the
K sampler. Just like in photography,
macro and landscape use different camera
settings. The same idea applies here. If
we look at the previous workflow, we can
see recommended settings for this model.
Steps 35, CFG 7, this sampler, and this
scheduler.
So let us change steps to 35 and CFG to 7.
For the sampler, we use DPM++ 2M, and for
the scheduler we use Karras. Now let us run it again.
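As a recap of the whole chain, here is a toy Python sketch of the data flow we just built. Every function here is a placeholder I made up to mirror the nodes and the order data moves between them; none of this is the real ComfyUI API:

```python
# Illustrative sketch of the text-to-image data flow, not real ComfyUI code.

def load_checkpoint(name):
    # "Hire the photographer": one file provides model, clip, and vae parts.
    return {"model": name, "clip": name + "-clip", "vae": name + "-vae"}

def clip_text_encode(clip, text):
    # The translator: turns human text into conditioning.
    return {"clip": clip, "text": text}

def empty_latent_image(width, height):
    # The blank photo paper: a sized space with no pixels yet.
    return {"width": width, "height": height, "data": None}

def k_sampler(model, positive, negative, latent, seed, steps, cfg,
              sampler_name, scheduler):
    # The photo shoot: fills the latent using all the instructions.
    # (positive/negative are accepted but unused in this toy version.)
    return dict(latent, data=f"latent(seed={seed}, steps={steps})")

def vae_decode(vae, latent):
    # The dark room: latent -> visible pixels.
    return f"image {latent['width']}x{latent['height']} from {latent['data']}"

ckpt = load_checkpoint("juggernaut_reborn")
pos = clip_text_encode(ckpt["clip"], "closeup portrait of a pet")
neg = clip_text_encode(ckpt["clip"], "cat")
latent = empty_latent_image(512, 512)
result = k_sampler(ckpt["model"], pos, neg, latent, seed=42, steps=35,
                   cfg=7.0, sampler_name="dpmpp_2m", scheduler="karras")
image = vae_decode(ckpt["vae"], result)
print(image)
```

The point is only the order: load, encode, size, sample, decode, save.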
For this seed, we get some small
deformations, but for next seed, it
looks fine. We will see later how to
improve the results even more. Let me
show you what happens when we try to
generate an image that is much bigger
than what the model was trained to
handle. For this example, I will double
the image size. On the first try, I did
not even get a pet. Sometimes you might
get something that looks okay, but most
of the time you will see problems. You
can get strange deformations, things
that do not make sense, or visible
mutations. If I increase the size even
more, these problems become even more
obvious. It also takes more processing
power and more time to generate the
image. The reason this happens is
because this model was trained mainly on
512 x 512 pixel images. When we ask it to
generate a much larger image, it
struggles to understand the full space.
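On the cost side, models in this family denoise a latent that is 8 times smaller per side than the final image, so doubling the width and height roughly quadruples the work. A quick sketch of that arithmetic, using the common 8x downscale as an assumption:

```python
def latent_dims(width: int, height: int, downscale: int = 8) -> tuple:
    # SD-style models work on a latent 8x smaller per side than the image.
    return width // downscale, height // downscale

for side in (512, 1024, 2048):
    w, h = latent_dims(side, side)
    print(f"{side}x{side} image -> {w}x{h} latent, "
          f"about {(side / 512) ** 2:.0f}x the work of 512x512")
```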
You can think of it like the model
trying to generate the image in parts.
One part might look okay, but then it
tries to continue the image next to it,
almost like stitching pieces together,
and that is where things break. That is
why you sometimes see double heads,
repeated objects, or strange structures
in large images. Bigger images are not
always better if the model was not
trained for that size. But if a model is
trained to handle larger images, you can
get more details and better results. Let
us say I add ugly to the negative
prompt. So we push the result toward
more beautiful images. For the positive
prompt, let us be more specific. We want
a dog and we want it to be beautiful.
Now when we run it, we get a more
beautiful dog. Because this model is
really old, like I told you, it is good
for practice. But today, we have much
bigger and more accurate models. They
produce better results with fewer
deformations, but they are larger
and need more VRAM to run properly. Our
desktop computers are very similar to
Comfy UI because they are both built
around the idea of connecting
specialized components together where
each one does a specific job. The CPU
acts like the central processor just
like the sampler or the model does the
main work in Comfy UI. The monitor is
like preview and output nodes that show
results. The keyboard and mouse are
inputs just like prompts and parameters.
Printers and speakers are output devices
like save image or audio nodes. Routers
handle communication similar to data
links between nodes. The reason we
design systems this way is because
breaking complex tasks into smaller
connected parts makes them easier to
understand, easier to control, easier to
upgrade, and more flexible. That is
exactly why Comfy UI uses nodes instead
of hiding everything behind a single
button. Now that we know how to create a
workflow, we also need to learn how to
save it. If you look at the top, you can
see it says unsaved workflow. That means
none of these settings or nodes are
stored yet. If you want to reuse the
same workflow later without recreating
everything from scratch, you need to
save it. If I click on this arrow next
to the workflow name, you can see there
are several save options. Personally, I
prefer using the main menu. So, I go to
file and here we have save, save as, and
export. When you click save and the
workflow has never been saved before,
Comfy UI will ask you to give it a name
and choose where to save it. If the
workflow was already saved and you just
made changes, clicking save will
overwrite the existing file with the
same name. Save as lets you save the
same workflow under a different name.
This is the option I use the most,
especially when I want to create
variations of a workflow. Export is very
useful because it is not limited to the
Comfy UI workflow folder. It allows you
to save the workflow anywhere on your
computer, even outside the Comfy UI
folder. The API option is mainly used
when working with online or cloud-based
workflows. So, we will not use it here.
So, let us click export. Now, it asks
for a name. Choose a name that makes
sense to you. Click confirm. Then,
choose where to save it. For example, I
can save it on my desktop. You can see
that the file is saved with the .json
extension. This JSON file contains all
the nodes, connections, and settings of
your workflow. This file is your
workflow, and you can open it anytime,
share it with others, or modify it
later. JSON files are simple text files.
You can open them with any text editor
like Notepad. JSON stands for JavaScript
Object Notation, and it is just a
structured way of writing text so both
humans and computers can read it. In
Comfy UI, the JSON file stores things
like node types, connections,
parameters, and settings, all written as
text. That is why workflow files are
small in size and easy to share. They do
not include images or models, only
instructions. If we go to workflows, you
can see I have that folder with
workflows saved there. You can do that,
too.
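Because a workflow is plain JSON, you can also inspect one with a few lines of Python. The fragment below is a simplified, made-up example of the structure; real files store many more fields per node:

```python
import json

# A trimmed-down, hypothetical workflow fragment for illustration only.
workflow_text = """
{
  "nodes": [
    {"id": 1, "type": "CheckpointLoaderSimple"},
    {"id": 2, "type": "CLIPTextEncode"},
    {"id": 3, "type": "KSampler"},
    {"id": 4, "type": "VAEDecode"},
    {"id": 5, "type": "SaveImage"}
  ]
}
"""

workflow = json.loads(workflow_text)
for node in workflow["nodes"]:
    print(node["id"], node["type"])
```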
Go to the menu. Go to file. Choose save
as. Give it a name
and confirm. Now if we go to workflows,
we can see the workflow is saved there.
Right now it is not organized into any
folder. It is just in the main list. But
you can add a folder name in front of
the workflow name when you save it. For
example, folder name, then a forward
slash, then the workflow name. Let us
see where it is saved. Go to your Comfy
UI folder. Then inside the Comfy UI
folder, go to user,
then default, then workflows. Here you
can see your saved workflow and also the
folder I created for this course that
comes with the easy installer. You can
create your own folder manually. For
example, I can create a folder called my
workflows, then drag that workflow into
it. Now, if we go back to Comfy UI,
nothing changes immediately because
Comfy UI usually reads this when it
starts. But we can refresh using this
refresh button. Now our folder appears
there and we can see the workflow inside
it. I suggest organizing your workflows
like this because over time you will
have a lot of workflows and it becomes
hard to keep track of everything. By the
way, you can also use the search bar to
search for a workflow by name. We also
have a bookmark icon. If we click it,
the workflow is added to the bookmarks
at the top. So the ones you use the most
stay there. If you click the bookmark
again, it is removed from the favorites
list. Let us collapse this and I will
show you one more thing. If we go to the
desktop and open the shortcut for the
output folder or if we go directly to
the output folder, you can see all the
images generated so far with Comfy UI.
The last one is this dog. You probably
did not think about this yet, but if you
open an image generated with Comfy UI in
Notepad, you can actually see some code
at the beginning. Just like with
workflows, this happens because Comfy UI
attaches the workflow to the image when
it saves it. After that, there is the
image data which we cannot really read.
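That readable part at the beginning lives in the PNG's text chunks. As an illustration, here is a small standard-library reader for those chunks; it is a sketch, and it assumes the workflow is stored under a tEXt keyword such as workflow:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_text_chunks(data: bytes) -> dict:
    """Return the tEXt chunks of a PNG as a {keyword: text} dictionary."""
    assert data[:8] == PNG_SIGNATURE, "not a PNG file"
    chunks = {}
    pos = 8
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = payload.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks

# Hypothetical usage on a saved image:
# with open("ComfyUI_00001_.png", "rb") as f:
#     print(read_text_chunks(f.read()).get("workflow"))
```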
This means that every image has the full
workflow embedded in it, including all
the settings and prompts. Let me drag
this image onto the Comfy UI canvas so
you can see what happens. Now you can
see that it loads as a workflow with the
file name. If we generate again, we get
exactly the same image because it uses
the same seed and settings. Let us go
back to the output folder and drag a
different image. For example, this
robot. Now it loads that workflow. And
if we run it, we get the exact same
robot. This is very useful. Another
thing you might notice is that all
images start with the word ComfyUI
followed by a number. This happens
because in the save image node, the
prefix is set to that value. We can
change it. For example, I can set it to
pixa. And now when I run the workflow,
the image file name will start with that
word followed by a number. As you can
see here, if you hover over the prefix
field, you can get more information
about how to format it. You can include
things like the date and other values in
the file name. Now, let us change it
again. I will add a folder name. For
example, my images, then a forward
slash, then the image prefix.
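To see how such a prefix maps to a file path, here is a tiny sketch. It is my own approximation of the naming scheme, not ComfyUI's actual code:

```python
from pathlib import Path

def save_path(output_dir: str, prefix: str, counter: int) -> Path:
    # A prefix like "my images/pixa" puts files in a subfolder of output.
    return Path(output_dir) / f"{prefix}_{counter:05d}_.png"

print(save_path("output", "my images/pixa", 1).as_posix())
```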
When we run the workflow now, the images
will be saved inside that folder. Let us
go to the output folder. You can see we
now have a folder called my images. And
inside it, we have the images that start
with the prefix we set followed by a
number. Now, I will go back to the
workflows folder that we created earlier
and delete it.
Back in Comfy UI, if we refresh the
workflows list, you can see that the
folder is gone. We are left only with
the getting started folder we used for
this episode. When you create your own
folders and organize your workflows, I
suggest naming them in a way that makes
sense. You can name them by base model
like SDXL workflows or flux workflows or
by function like text to image
workflows, inpainting workflows or video
workflows. Choose whatever makes the
most sense to you, but organizing your
workflows early will save you a lot of
time later.
In this chapter, I want to show you how
Comfy UI is organized on your disk. This
is important because sooner or later,
you will need to know where to place
models, images, workflows, and custom
nodes. Do not worry if this looks
overwhelming at first. You do not need
to understand everything right now. I
will focus only on the folders you
actually need as a user. This is the
main Comfy UI easy install folder. Think
of this as the main workspace that
contains everything Comfy UI needs to
run. The most important things here are
the Comfy UI folder, the Python embedded
folder, the add-ons folder, and the
batch files used to start or update
Comfy UI. In normal usage, you will
mostly work inside the Comfy UI folder.
If you have a different version of Comfy
UI, you will not have the add-ons folder
and some of the BAT files will be named
differently, but pretty much everything
else should be similar. When we open the
Comfy UI folder, we see many files and
folders. Most of these are internal
files used by Comfy UI itself. As a
beginner, you do not need to touch most
of these. The important folders for us
are models, input, output, custom nodes,
and the user folder. The models folder
is where all AI models live. This
includes checkpoints, LoRA files, VAEs,
control nets, upscalers, and more.
Inside the models folder, everything is
organized by type. For example,
checkpoints or diffusion models for main
image generation models, loras for LoRA
files, vae for VAE files, controlnet for
control net models. When a workflow
tells you to download a model, it will
also tell you exactly which subfolder to
place it in. If a model is not placed in
the correct folder, Comfy UI will not
see it. The input folder is where you
place images that you want to load into
Comfy UI. For example, images used for
image to image, control net, masks, or
reference images. Any image you place
here will be visible inside Comfy UI
when using a load image node. The output
folder is where Comfy UI saves generated
images by default. Every time you use a
save image node, the result will appear
here. This makes it very easy to find
all your generated images in one place.
The custom nodes folder is where all
custom nodes are installed. These are
extra features added by the community.
Each folder here represents a custom
node package. For example, we already
used the RG3 node and we will use more
later. When you install nodes using the
manager, they usually end up here
automatically. If a custom node is
missing or broken, this is usually the
first folder you should check. Inside
the user folder, we have user specific
data. The most important part for us is
the workflows folder. This is where
Comfy UI stores workflows that you save
from inside the interface. These
workflow files are saved as JSON files.
The add-ons folder is specific to the
easy install version. It contains extra
tools, optimizations, and helper
scripts. You usually do not need to
touch this folder unless a tutorial
specifically mentions it. You do not
need to memorize this right now, but
this structure might change as new tools
are created by IVO. For example, this
BAT file lets you link a folder with
models from another Comfy UI
installation. This one installs the
Nunchaku node and this one installs Sage
Attention. There are also different
torch pack versions for more advanced
users who need a specific version for
certain custom nodes. You will also find
extra tools like one for Windows 10 that
enables long paths so Comfy UI can
download models even if the path is very
long. There is also an update folder
with BAT files, but as you will see
later, for the easy install version, I
recommend using different update BAT
files. The Python embedded folder
includes a self-contained Python
installation. This helps avoid conflicts
with other software and makes Comfy UI
easier to run and update. As you use
Comfy UI more, this folder structure
will start to make sense naturally. In
the next chapters, I will always tell
you exactly where things need to go. Let
us talk a little bit about updates and
custom nodes. What you are seeing here
is the Comfy UI easy install folder.
This setup already includes everything
needed to update Comfy UI safely. The
most important rule is this. Always
close Comfy UI before updating. Never
update while it is running in the
browser. Start Comfy UI BAT. This only
launches Comfy UI. It does not update
anything. Update Comfy UI BAT. This
updates the core Comfy UI code. Use this
when you want the latest features or
fixes. Update Comfy UI and Nodes BAT.
This updates Comfy UI and all installed
custom nodes. Update Easy Install BAT.
This updates the easy install system
itself. When should you update? Update
when something is broken. Update when a
node requires a newer version. Update
when you want new features. Do not
update right before an important
project. Updates can sometimes break
workflows. If something breaks after an
update, you can usually fix it by
updating again or removing the last
custom node you installed. One important
reminder, Comfy UI moves fast. Stability
comes from not updating every single
day. If everything works, it is okay to
stay on your current version. At some
point, you will mess up Comfy UI. Maybe
a node breaks or some dependencies get
messed up or an update has bugs. But
remember, you can always do a fresh
install when that happens. Just create a
new folder and reinstall using the easy
installer. Let us double-click on update
easy install. This updates only the easy
installer and adds extra tools and
add-ons. As we move forward in this
series, more models will appear, new
nodes will be added, and IVO likes to
create scripts that make these
installations easier. When you see that
the installation is complete, you can
read more about the new release using
this link or press any key to exit. You
may not see any changes immediately, but
if we go to the add-ons folder, you can
see that we now have more BAT files than
we had in the first chapter. Now, let us
go back to the main folder and try to
update Comfy UI to see if everything
still works or if we break some nodes.
Nodes sometimes break after an update
because Comfy UI itself changes how
things work internally. Many custom
nodes are made by independent
developers, not by the Comfy UI team.
These custom nodes often rely on
specific Comfy UI behavior, internal
APIs, or extra Python libraries and
dependencies. When Comfy UI updates,
those assumptions can change and the
node stops working until its creator
updates it to match the new version. So,
Comfy UI started after the update, but
let us open the command window to see if
everything worked correctly. Usually,
after startup, you can see import times
for custom nodes. As you saw before, all
custom nodes are inside the custom nodes
folder. But look what happened here.
After the update, one of the installed
custom nodes failed to
import. That means if you have a
workflow that uses that node, it will
not work. If you do not use that node,
you can ignore it and try updating Comfy
UI again in a few days to see if it gets
fixed. I will close Comfy UI now and try
something else. Sometimes there are
newer versions of the custom nodes and
if the author fixed the issue, updating
the nodes can fix the problem. So this
BAT file updates only Comfy UI and this
one updates both Comfy UI and the custom
nodes. This process can take a while
because it updates all the nodes. So I
will speed it up. Comfy UI started. So
let us check the command window to see
if the issue was fixed. The node was
still not fixed. This means that at the
time I recorded this video, the update
from that day broke that node. When you
watch this video, it might already be
fixed and work for you. Either because
Comfy UI fixed a bug, the node creator
patched the node, or a new developer
created a replacement node. There is one
more thing I want to try. We can go back
to an older version of Comfy UI that did
work, a version that had the right
conditions for that node. The downside
is that if Comfy UI released new
features or nodes for newer models,
those might not work on the older
version. So, it is always a compromise.
You have to choose between keeping a
specific custom node working or using
the latest Comfy UI updates. Ivo hid
this option so beginners do not
accidentally mess up their Comfy UI. Let
us go to the add-ons folder, then to
tools, and here we have the version
switcher. When we run this BAT file,
Comfy UI is downgraded to a previous
version. In my case, it went from
version 0.7 back to version 0.6.
If you run this script again, it
upgrades Comfy UI back to the latest
master branch. Let us press any key to
close this. Now that we are on an older
version, it is time to check if that
node works. Let us start Comfy UI. Wait
for the interface to load. Then open the
command window and check the custom
nodes. Now it is fixed and there are no
errors with the nodes. In a few days I
will try updating again to see if it
gets fixed in the newer version. But
this is basically how you update and
downgrade Comfy UI using the easy
installer. Other Comfy UI versions might
require you to run commands manually,
but I keep pushing Ivo to create BAT
scripts for these tasks. I want to spend
my time generating, not typing lines of
code. You will see that we have the
manager here. In other versions, you
might find it somewhere else in the
menu. Let us open the manager and see
what we have here. We also have update
and update Comfy UI. These are similar
to the BAT files, but the BAT files have
something extra. They take into account
some dependencies needed for certain
custom nodes to work, which Comfy UI
itself does not handle when updating.
For example, for the nunchaku node to
work, it needs specific dependencies,
like a certain version of a library. The
BAT file updates Comfy UI, but then
adjusts or downgrades those dependencies
to the versions required by the custom
nodes we use. Ivo tries to maintain
these BAT files and keep them updated so
they stay compatible with the versions
needed to run the workflows shown in
these video tutorials. Because I am
using the easy installer, I did not
touch these update buttons inside the
manager. I only use the BAT files. If
you have a different version of Comfy
UI, you will need to use these update
options or use a BAT file from the
update folder instead. In the manager,
you can also find the latest Comfy UI
news, such as what was fixed, what is
new, and recent changes. At the bottom,
you can see the Comfy UI version and the
manager version. Most of the time the
manager is used for managing custom
nodes. If we go to the custom nodes
manager, we can see all the available
custom nodes created by different
developers. There are a lot of them. I
personally try to keep the number of
installed nodes to a minimum and install
only what is essential or what I use
most often. Some people install hundreds
of nodes, but the more nodes you
install, the harder it becomes to keep
everything compatible because each node
can have its own dependencies and
requirements. If I filter by installed,
you will usually not see many nodes here
besides the manager itself. However, I
asked Ivo to include a few essential
nodes that I use most often. One example is the rgthree custom node pack, which includes the image comparer node that is very useful for comparing images side by
side. Each custom node has a title and a
version number. You can switch versions
if needed, for example, when an older
workflow only works with a specific
version of a node. For each node, you
also have several actions available.
Update only that node, switch the
version, temporarily disable it, or
uninstall it. You can also see how many
individual nodes are included in that
custom node package along with a short
description. Some nodes mention possible
conflicts with other nodes. If you click
on that yellow warning text, you can
read more details about those conflicts.
These conflicts usually matter only if
you use both conflicting nodes in the
same workflow. You can also see the
author of the node and the number of
stars it has on GitHub. Stars are given
by users and usually indicate how
popular or trusted a project is. Some
developers are well-known and consistently release high-quality nodes.
That said, there have been cases in the
past where certain nodes had security
issues, so it is still a good idea to be
careful. You can also see when the node
was last updated. To switch versions,
you click the version selector, choose a
version from the list, click select, and
then follow the steps shown. We will not
do that right now. As you remember,
every custom node that gets installed
ends up in the custom nodes folder. Here
you can see all the custom nodes that
come with the easy install version at
the time of this recording. Now, let us
install one node as a test just to see
how the process works. Open the manager.
Go to custom nodes manager and search
for a node called align. We will use
this as a test because it does not
require special dependencies. So in
theory it should not affect Comfy UI too
much. Each node entry has a title. If
you click on it, it opens the GitHub
page for that node. On GitHub, you can
see the code because every custom node
is basically Python code and supporting
files. You can also check the issues tab
where users report problems and
sometimes solutions are discussed. If
you scroll down, you usually find
important information like required
Comfy UI versions, Python versions, or
other dependencies. These are the
dependencies I mentioned earlier, things
the developer relied on when creating
the node. You also see installation
instructions either through the manager
which we are doing now or manually using
commands like git clone which simply
copies the code into the custom nodes
folder. Before installing any custom
node, it is a good habit to read this
information. Some nodes require things
your system might not have and then they
will not work. Now let us install this
node. Click the install button. You will
be asked to choose a version. So select
the latest version. The button changes
and installation begins. When it
finishes, you will see a restart button.
Comfy UI needs to restart for the node
to become available. Click restart and
confirm. Comfy UI shuts down. You will
see the browser trying to reconnect
while Comfy is restarting. After a few
moments, you get a confirmation message.
Click confirm. The node is now
installed. If you go back to the
manager, open custom nodes manager and
search for the align node, you will see
that it now shows an uninstall button.
If installation had failed, you would
see an import failed message instead. If
you look inside the custom nodes folder
on disk, you will now see a new folder
for this node. It is simply the same
code you saw on GitHub copied locally.
This code is what adds new nodes to the
Comfy UI interface. If you deleted this
folder manually, that would also
uninstall the node. However, let us
uninstall it properly using the manager.
Go back to the manager, click uninstall
and confirm again. You will be asked to
restart Comfy UI. Confirm. Wait for the
restart and then confirm the browser
reload.
Now, go back to the custom nodes manager
and search for the align node again. You
will see the install button again which
means the node is no longer installed.
If you check the custom nodes folder,
you will also see that the folder for
this node has been removed. This is the
basic workflow for installing, updating,
and uninstalling custom nodes using the
manager. Sometimes when you download a
workflow from other people on the
internet, you will have missing nodes
because they used custom nodes that you
do not have installed. When you do not
know what nodes they used, you can use
the install missing custom nodes button.
This will give you a list of missing
nodes and the option to install them.
That said, I personally prefer to
install nodes manually so I have full
control over what gets installed. That
is why I usually include a note node in
my workflows explaining exactly which
custom nodes are required. Now let us
look at templates. If we open templates,
we can see different workflows created
by the Comfy UI team. If we filter by
image generation workflows and select
something like a Z-Image Turbo text to
image workflow, Comfy UI will first tell
us that we have missing models. These
are the AI models required for the
workflow to generate images. Usually, it
tells you exactly which folder the model
needs to go into and gives you the model
name along with a download link or a
download button. In this example, you
can see it needs a VAE model and a few
other models. Once you download those
models and place them in the correct
folders, the workflow should work,
assuming you have enough VRAM to run it.
In this case, there are no missing
nodes. So, let us close this. Now, let
us go to menu, then file, then open, and
open a workflow that I know uses missing
custom nodes. You will see a message
saying the workflow uses custom nodes
that are not installed. At first, you
might not see any red nodes on the
canvas. That is because this workflow
uses subgraphs. Subgraphs are basically
nodes that contain other nodes inside
them. If you have experience with
Photoshop, you can think of them like
smart objects. When you see an icon with
a square and an arrow, you can click it
to enter the subgraph. Once inside, you
can see the red node that is missing. If
we now open the manager and click
install missing custom nodes, Comfy UI
detects that node and offers to install
it. For many nodes, this works
perfectly. However, some nodes like
Nunchaku require additional dependencies
and extra setup. We will talk about
those in a future episode. The important
thing to know is that for many
workflows, install missing custom nodes
can quickly fix the problem. Let us
close this for now. If we open the
manager again, you will also see a
models manager. This lets you browse and
download models by type. Personally, I
rarely use this because a model without
a workflow is not very useful. In my
tutorials and on my Discord server,
every workflow comes with notes
explaining exactly which models you need
and where to put them. The Comfy UI
templates also clearly list required
models and folders. So, let us do a
quick recap.
Use update Comfy UI.bat to update only
Comfy UI. Use update Comfy UI and
nodes.bat to update Comfy UI and all
custom nodes. Use update easyinstall.bat
to update the easy install system and
helper scripts. The update folder exists
for users with other Comfy UI versions.
The add-ons folder only exists in the
easy install version. Inside add-ons,
the tools folder includes the version
switcher, which lets you downgrade or
upgrade Comfy UI if needed. This is
useful when a new update breaks a node
you rely on. Inside the Comfy UI folder,
the custom nodes folder contains all
installed custom nodes. If you delete a
folder from here, you uninstall that
node. Sometimes if a node fails to
install correctly, deleting its folder
and reinstalling can fix the issue. I
know this is a lot of information. Do
not worry if it does not all stick right
away. Practice, experiment, and come
back to this tutorial in a month. You
will be surprised how many things
suddenly make sense that you missed the
first time. Regarding the TeaCache node,
after a few days, Comfy UI was updated
again and the problem was still not
fixed. There is now version 0.8 and even if you downgrade to version 0.7, it is
still not fixed. Comfy UI keeps adding
updates and at some point some custom
nodes will stop working. If that node is
not important for you, you can delete it
or uninstall it. You can also just
disable it from the manager or drag the
TeaCache folder into the disabled folder so
it is disabled. You can move it back out
of the disabled folder anytime you want
to try it again. In this chapter, I will
try to simplify this complex world of
diffusion and AI a little. Do not worry
if you do not understand everything that
is happening. Like I said before, you do
not have to be a mechanic and know all
the engine parts to know how to drive a
car. This is the core idea behind
diffusion image generation. The model
does not draw an image all at once. It
starts from pure random noise. This
noise looks like static on a television.
The model then runs a sequence of small
refinement steps. At each step, a small
amount of noise is removed. Early steps
reveal very rough shapes. Later steps
reveal clearer forms. Final steps add
fine details and texture. Image
generation is therefore a gradual
process. It goes from noise to less
noise to recognizable shapes and finally
to a finished image. This slide is a
simplified visualization. The real
process is more complex. In practice,
most diffusion models work in a
compressed latent space rather than
directly on pixels. A neural network
predicts what noise should be removed at
each step. Even though the real math is
more advanced, this simplified view is
enough to understand how diffusion
works. It's like sculpting. You start
with a rough block and remove material
until the shape appears. or like a foggy
window clearing up step by step. You
don't instantly get a sharp scene. It
resolves gradually. Let's open Comfy UI.
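The gradual denoising described above can be sketched as a toy loop in Python. This is only an illustration of the idea; a real diffusion model uses a trained neural network to predict the correction at each step, and the flat "target" here is a made-up stand-in:

```python
import random

def toy_denoise(seed, steps, size=8):
    """Toy illustration of iterative denoising: start from random
    noise and, at each step, remove a fraction of the difference
    to a 'clean' target. A real model predicts this correction
    with a neural network; here the target is just a flat image."""
    rng = random.Random(seed)          # the seed fixes the starting noise
    image = [rng.random() for _ in range(size)]
    target = [0.5] * size              # stand-in for the learned result
    for _ in range(steps):
        # each step removes part of the remaining "noise"
        image = [x + 0.3 * (t - x) for x, t in zip(image, target)]
    return image

rough = toy_denoise(seed=10, steps=1)    # one step: still mostly noise
clean = toy_denoise(seed=10, steps=35)   # many steps: close to the target
```

With one step the values are still far from the target, just like a one-step generation that is still mostly static; with many steps almost all of the noise is gone.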
Go to workflows and from the getting
started folder, pick workflow 1, which
is the basic text to image example. Even
if we cannot fully see what is happening
inside the KSampler step by step, we can
still get a good idea of the overall
process. Remember what we see here is a
simplified representation of what is
actually happening under the hood. First
we want a fixed seed. We will see later
that each seed starts with different
noise. Right now we are using 35 steps
which is enough for this model to
produce a clear image like this robot.
If we change the steps to one, you can
see that the model does not have enough
time to remove the noise. So the image
is very unclear with these settings. If
we add another step, the change is
subtle. Adding another one, you can
start to see something forming. By step
four, we can almost see a face. We can
automate this process to see the changes
faster. Double-click on the canvas and
add a primitive node. Like you saw in an
earlier chapter, we can adapt this node
for different fields. Drag a connection
from the primitive node and connect it
to steps. Now we have control over the
steps including what happens after each
generation. Instead of fixed or random,
choose increment. After each run, the
value increases by one. So now we have
five steps. If we run it again, we get
six steps and the image starts to change
more. As more steps are added, more
noise is removed and the image becomes
clearer. Next to the run button, there
is a small down arrow. From here, select
run instant. This means we can click run
once and it will keep running until we
stop it. You can see the workflow now
runs automatically. On each run, more
steps are added and the image keeps
refining. You may also notice that as
the number of steps increases, it
becomes harder for the computer. Just
like climbing many stairs, more steps
mean more effort. So, generation becomes
slower and slower. Soon we reach around
35 steps which is recommended for this
model to get a nice clear image.
Although some results already look good
around 20 steps. Now we want to stop
this. Click the arrow again and switch
back to run. After the current
generation finishes, it will stop. There
is also another way to see a small
preview of what is happening inside the
KSampler. From the menu, you can go to
settings, but it is faster to access the
settings from here. In the settings
search bar, type preview. You will see
an option called live preview method. By
default, it does not show anything. But
if we set it to auto, we can see a small
preview during generation. Let's delete
the primitive node. Then change the seed
to random. Now when we run the workflow,
we can see a small preview of what the
image might look like before it even
finishes generating. Let us change the
steps to 30 and run again. You can now
quickly see what is happening in the
diffusion process. Even though this
preview is low resolution, you can
clearly see how the image becomes more
and more defined as noise is removed.
Now, let me try something more drastic.
I will use a very large image size. On
some computers, this might crash Comfy
UI or take a very long time to generate.
I will run it again with these settings.
You can see that generation is now very
slow. But the preview lets us observe
how the image slowly starts to appear.
This is a bit too slow. So I will cancel
the generation here. Instead, I will try
a slightly smaller image, still larger
than what the model is comfortable with,
just so we can see the preview updating
more slowly. Now we can clearly see the
diffusion process updating every few
seconds. The speed of this preview is
also influenced by the sampler and the
scheduler. As you may remember, models
are trained on specific image sizes. If
a model was not trained on large images,
it treats them more like multiple
smaller images stitched together. For
example, our Juggernaut model was
trained on 512 pixel images only.
Personally, I prefer not to keep the
live preview enabled all the time
because it can slightly slow down
generation. So I will go back to
settings and set the preview option back
to default. I will also reset the image
width and height. You may notice that
the preview is still visible. This can
happen because something remains in
memory. To fix this, I will press F5 to
refresh the browser. Keep in mind that
refreshing the browser will reload only
the current workflow. If you had other
workflows open and did not save them,
they will be lost. Now everything is
back to normal without the preview.
There are still more useful things to
learn. This slide explains how a
diffusion model is trained. This is not
image generation yet. During training,
the model is shown millions of images
paired with text descriptions. For
example, images of cats, people,
objects, lighting styles, and
environments. The training process uses
something called forward diffusion.
Forward diffusion means gradually adding
noise to a clean image. At first, only a
small amount of noise is added. Then
more noise is added step by step.
Eventually, the image becomes almost
pure noise. At each step, the model is
trained to predict what noise was added.
In other words, it learns how images
break down as noise increases. By
repeating this process across millions
of images, the model learns patterns. It
learns what shapes look like. It learns
what objects look like. It learns how
lighting and structure behave. The goal
of training is not to memorize images.
The goal is to learn how to reverse this
process later. Training a diffusion
model requires massive data sets and
powerful hardware. In Comfy UI, we are
only using the result of that training.
Now that the model has learned how noise
works during training, we can use that
knowledge in reverse to generate images.
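The forward diffusion just described can be sketched in a few lines. Here the image is just a list of numbers and the noise is Gaussian; the point is that each step produces a training pair of the noisy image plus the exact noise that was added, which is what the model learns to predict (a toy sketch, not real training code):

```python
import random

def forward_diffusion(clean, steps, noise_scale=0.2, seed=0):
    """Toy forward diffusion: add a little noise to a clean 'image'
    (a list of numbers), step by step. Each step yields a training
    pair: the noisy image and the noise that was just added."""
    rng = random.Random(seed)
    image = list(clean)
    pairs = []
    for _ in range(steps):
        noise = [rng.gauss(0, noise_scale) for _ in image]
        image = [x + n for x, n in zip(image, noise)]
        pairs.append((list(image), noise))   # what the model trains on
    return pairs

clean = [0.2, 0.8, 0.5, 0.1]
pairs = forward_diffusion(clean, steps=50)  # image drifts toward pure noise
```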
This slide shows the difference between
training and image generation. During
training, the model starts with a clean
image. Noise is added step by step until
the image becomes pure noise. This is
called forward diffusion. This process
teaches the model how images break down
when noise is added. During generation,
the process is reversed. We start from
pure random noise. The model removes
noise step by step to create an image.
It is important to understand this
clearly. During generation, we do not
add noise like in training. We only
remove noise using what the model
learned before. This slide explains an
important concept that is often
misunderstood. The model does not store
images in memory. During training, the
model never saves photos that it has
seen. Instead, it learns patterns and
relationships.
It learns what shapes look like. It
learns what objects look like. It learns
how parts of an image relate to each
other. For example, it learns that faces
usually have eyes in a certain position.
It learns that animals have specific
structures. It learns how lighting,
shadows, and perspective usually behave.
All of this knowledge is stored as
probabilities inside the model, not as
pictures, but as learned rules. You can
think of it like learning a language.
You do not memorize every sentence you
read. You learn grammar and structure.
The model works the same way. It learns
visual grammar, not individual images.
When the model generates an image, it is
not copying anything it has seen before.
It is using learned patterns to guide
the noise removal process. That is why
results can look familiar but are still
new images. This is why changing the
prompt changes the result. The prompt
activates different learned patterns
inside the model. That is also why the
same model can generate many different
images even though it was trained only
once. So far we talked about diffusion
in a simplified way as if it happens
directly on images. In reality, most
modern diffusion models do not work
directly on pixel images. Instead, they
work in something called latent space.
Pixel space is the image as we normally
see it. It is made of pixels with width,
height, and color values. Latent space
is a compressed representation of that
image. It keeps the important structure
and information but removes unnecessary
detail. You can think of latent space as
a simplified version of the image that
is easier for the model to work with. To
move between pixel space and latent
space, the model uses a VAE. VAE stands
for variational autoencoder. The VAE
has two main jobs. First, it encodes a
pixel image into latent space. Second,
it decodes a latent image back into
pixels. During image generation,
diffusion happens in latent space. After
the denoising process is finished, the
VAE decodes the result back into a
visible image. Working in latent space
makes diffusion much faster. It also
uses less memory and less computing
power. This is why models like stable
diffusion can run on consumer graphics
cards. Without latent space, image
generation would be much slower and more
expensive. In Comfy UI, this is why we
see nodes like VAE encode and VAE
decode. When we generate images from
text, the model works in latent space
and VAE decode converts the result into
pixels we can see and save. This also
explains why image resolution and VAE
selection can affect results.
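To make the pixel space versus latent space idea concrete, here is a toy "encoder" and "decoder". A real VAE is a trained neural network, not simple averaging; this only shows that the latent is a smaller representation that can be expanded back into pixels:

```python
def toy_encode(pixels, factor=4):
    """Toy 'VAE encode': compress by averaging blocks of pixels."""
    return [sum(pixels[i:i + factor]) / factor
            for i in range(0, len(pixels), factor)]

def toy_decode(latent, factor=4):
    """Toy 'VAE decode': expand each latent value back into pixels."""
    return [value for value in latent for _ in range(factor)]

pixels = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
latent = toy_encode(pixels)      # 8 values compressed to 2
restored = toy_decode(latent)    # back to 8 values, fine detail smoothed
```

Diffusion runs on the small latent, which is why it needs far less memory; the decode step at the end is what turns the result back into a visible image.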
Now we look at how text prompts
influence image generation. The prompt
does not act only once at the beginning.
During diffusion, the prompt is used at
every denoising step. At each step, the
model checks whether the image is moving
closer to what the text describes. You
can think of the prompt as guidance. It
gently nudges the image in the right
direction while noise is being removed.
This happens repeatedly, step by step,
until the final image is formed. CFG
stands for classifier free guidance. CFG
controls how strongly the prompt
influences the denoising process. With a
low CFG value, the model follows the
prompt loosely and allows more
randomness. With a high CFG value, the
model follows the prompt more strictly
and forces the image to match the text
more closely. Here is a quick example.
You can find CFG here in the KSampler.
Too low CFG can produce images that
ignore the prompt. Too high CFG can
produce images that look unnatural or
oversharpened. CFG is like telling the
model how strict it should be about your
instructions. The prompt does not
generate the image by itself. The prompt
only guides the noise removal process.
The image is still created by diffusion
in latent space. As you can see with
CFG 1, the cat is still a cat, but it is not red like we asked. With CFG 7, the
result is much closer to the prompt.
That said, this also depends on the
model we are using. Smarter or better
trained models tend to follow the prompt
more accurately. In fact, there are some
models where we intentionally use a
fixed CFG value of one, which
effectively ignores the negative prompt.
However, pushing CFG too high can damage
the image. It can introduce artifacts or
make the result look unnatural. Because
of that, we always try to find a
balance. The goal is to use settings
that give us the quality we want in the
shortest amount of time without hurting
the final image.
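The usual classifier-free guidance formula combines two predictions: one made with the prompt and one made without it. A simplified scalar version (real models apply this to latent tensors at every denoising step):

```python
def apply_cfg(uncond, cond, cfg):
    """Classifier-free guidance: start from the unconditional
    prediction and push toward the prompt-conditioned one.
    cfg = 1 returns the conditioned prediction unchanged (which is
    why CFG 1 effectively ignores the negative prompt); higher
    values exaggerate the prompt's pull, which can cause artifacts."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.5, 0.5]   # prediction with no prompt (or the negative prompt)
cond = [0.9, 0.1]     # prediction guided by the positive prompt
loose = apply_cfg(uncond, cond, cfg=1.0)   # equals cond
strict = apply_cfg(uncond, cond, cfg=7.0)  # pushed well past cond
```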
Now we talk about seeds. Seeds are very
important for understanding consistency
and variation. A seed defines the
starting noise used to generate an
image. You can think of it as the
initial random pattern the model starts
from. When diffusion begins, the model
always starts from noise. The seed
decides exactly what that noise looks
like. If you use the same prompt, the
same settings, and the same seed, you
will get the same image every time. If
you change the seed, you change the
starting noise, and the final image will
be different. The prompt guides the
process, but the seed decides the
starting point. Different starting noise
leads to different results, even when
everything else stays the same. You can
think of the seed like rolling a die
before starting. If you roll the same
number, you start from the same
situation. If you roll a different
number, the outcome changes. This is a
simplified explanation. The seed
controls a random number generator used
internally by the model. You do not need
to understand the math behind it. You
only need to know that seeds control
repeatability. Let us put it into
practice. The seed is this number here.
It can start from zero and go up to a
very large number. So each seed can
produce a slightly different result. If
you also change the prompt and settings,
you can get millions of different
results. We can control the seed
behavior. If we set it to fixed, we
generate once and the result will never
change. To generate something new, we
need to change other settings. If we
choose increment, after each generation,
the seed number will increase by one. If
we choose decrement after each
generation the seed number will decrease
by one. So let us change it to fixed and
set the seed to 10. When I generate I
get this robot. Now let us change the
seed to 15. You can see that I get a
different robot this time in profile. If
I change the seed back to 10, I get the
previous robot again because we used the
same prompt, the same settings, and the
same seed.
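The repeatable behavior we just saw can be shown directly: the seed initializes a random number generator, and that generator produces the starting noise. This sketch uses Python's built-in generator; real samplers use their own, but the principle is the same:

```python
import random

def starting_noise(seed, size=4):
    """The seed fixes the state of a random number generator,
    so the same seed always yields the same starting noise."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

assert starting_noise(10) == starting_noise(10)   # same seed: identical noise
assert starting_noise(10) != starting_noise(15)   # different seed: new noise
```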
In prompts, the order of the words
matters.
With this prompt, I got this image
because house was first. So, the model
focused on the house and mostly ignored
the car. With newer models, this happens
less often. But this is an older model,
so the effect is more noticeable. Now,
look at what happens if I put car first
and then house. This time, we clearly
get both a car and a house. Words that
appear earlier in the prompt usually
have more influence than words that come
later. You can think of the prompt as a
list of priorities. The model pays more
attention to the beginning and gradually
less attention as it moves toward the
end. On top of that, some words can
carry more weight, either because of how
the model was trained or because we
explicitly give them extra emphasis.
Because of this, two prompts with the
same words but in a different order can
produce noticeably different results.
Think of the prompt like giving
directions to someone. If you say a red
cat sitting on a chair in a room with
soft lighting, the most important idea
is red cat. Everything after that adds
detail, but the core idea comes first.
We can also add more weight to a word by
using round brackets. Right now, house
has more weight. So the model pushes the
car into the background and it is no
longer the main focus. If I add even
more brackets, the influence of house
becomes even stronger and now the car
disappears completely. If I instead add
more weight to the word blue, you will
see more blue appear in the generation.
One more thing you might notice is that
there is no spell check by default.
Sometimes it can be useful to turn it
on. To do that, go to settings,
search for spell, and enable text area
widget spell check. Now, words that are
misspelled or not part of the dictionary
will be underlined.
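How much extra weight brackets add depends on the frontend, but a common convention is that each pair of round brackets multiplies a word's weight by about 1.1. A toy calculator under that assumption (the exact factor in your Comfy UI version may differ):

```python
def bracket_weight(levels, factor=1.1):
    """Common convention: each nesting level of round brackets
    multiplies the word's attention weight by ~1.1. The exact
    factor depends on how the frontend parses the prompt."""
    return factor ** levels

plain = bracket_weight(0)      # house       -> weight 1.0
single = bracket_weight(1)     # (house)     -> weight 1.1
triple = bracket_weight(3)     # (((house))) -> weight ~1.33
```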
Now, we talk about denoising steps.
Steps control how many refinement passes
the model performs during generation.
Each step removes a small amount of
noise. The image is not created in one
action. It is refined little by little,
step by step. When you increase the
number of steps, the model has more
chances to clean up noise and add
detail. When you decrease the number of
steps, the process is faster, but the
image can look rough or incomplete. More
steps means slower generation and more
refinement. Fewer steps means faster
generation and less refinement. There is
always a balance between speed and
quality. You can think of steps like
polishing an object. More polishing
gives a smoother result. Less polishing
is faster but rougher. In Comfy UI,
steps are set inside the KSampler node.
For most models, a good starting range
is between 20 and 30 steps. Going much
higher often gives diminishing returns.
Going much lower is useful for fast
previews. Steps work together with the
seed and the prompt. The seed decides
the starting noise. The prompt guides
the direction. Steps decide how far the
refinement goes. Now we are ready to
look at a real workflow in Comfy UI.
This is called text to image. Often
shortened to text to img. Text to image
means we start from pure noise and
generate an image only from text
instructions. There is no input image
involved. This is usually the first
workflow people learn and it is the best
way to explore ideas and styles from
scratch. We start by loading a model.
This model contains everything the AI
learned during training. Next, we give
the model instructions using a text
prompt. This describes what we want to
see in the image. We also define the
image size using an empty latent image.
This decides the resolution before the
image is generated. Then the K sampler
runs the diffusion process. This is
where noise is removed step by step
guided by the prompt. After that, the
VAE decodes the latent result into a
visible image. Finally, the image is
saved to disk. Use text to image when
you want to explore new ideas, you want
to test prompts and styles, or you are
starting from nothing. This workflow is
ideal for concept art and
experimentation. But we can also start
from an image, not just from pure noise. In that case, instead of beginning with random noise, we use an existing image as the starting point and apply denoise on top of it. You can think of denoise as how much freedom the model has to change the image. With low denoise, the model stays very close to the original image. With higher denoise, it moves further away and behaves more like text to image. So rather than generating everything from scratch, we are guiding the diffusion process using an image as the base and then controlling how much it changes using the denoise value. Image to
image is like starting with a rough
sketch and deciding how much you want to
redraw it. You can see that in the text
to image workflow we have empty latent
image. That node generates the noise. In
this workflow, we have an image that is
encoded to latent so it can go to the K
sampler. Let me show you how I did it. I
removed the empty latent image node.
Then I doubleclicked on the canvas and
added a load image node. From here we
can load an image and I will choose this
robot. Now you can see it does not have
a latent output. So we cannot connect it
to the K sampler yet. So we need a VAE.
If we look we have decode and encode. We
already have VAE decode when it converts
from latent to pixels. Now we want to
encode it. An easy way to find the right
node is to drag a link and release it.
And you will see a suggestion for VAE
Encode. Now we have a latent output which
means we can connect it to the K sampler
which is what we want. If we try to run
it like this, something is missing. It
says missing VAE. You can see a big
red outline around the node with the
problem and a small circle around the
input which means we need a connection
there. So let us connect it to the VAE.
In this case, the VAE is included in the
main model. So we connect it from there.
Now we encoded it and then we decode it.
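The encode and decode pair can be sketched as a round trip between pixel space and latent space. A small illustrative helper, using the 8x downscale factor common to Stable Diffusion VAEs (the function name is hypothetical, not a ComfyUI API):

```python
def latent_size(width, height):
    """A VAE encodes an image into a latent that is 8x smaller
    in width and height; decoding goes the other way."""
    return width // 8, height // 8

# Encode: 512x512 pixels -> 64x64 latent. The KSampler works
# on this smaller latent, and VAE Decode turns it back to pixels.
print(latent_size(512, 512))  # (64, 64)
```

This is why the latent is cheap to work with: it has far fewer values than the full pixel image.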
Let us run again. And now it works. But
the result is still different from my
input. We have the right prompt, but
something is influencing it. Remember
this. Every time you use an image as
input, we need to adjust the denoise
because that controls how much the image
changes. With the default value of one,
it is at the maximum. So, it changes the
image too much. Let us change it to 0.2
and see how that affects it. Now, you
can see it is very similar to the
original. It is hard to tell what parts
changed. Let us increase it to 0.5.
Now, we can see more changes in the
robot face. There is an easy way to
compare these images. Double click on
the canvas and search for Image Comparer.
This is part of the rgthree node pack. You
can see it has two inputs, image A and
image B. I want to compare the original
image. So I will connect the load image
output to image A. For the second image,
remember the save image node is only for
saving to disk. The image we want to
compare is the one coming out of VAE
decode. So we connect that to image B.
Now let us run the workflow. We get this
small preview. Let me make it larger. It
is still too small. So I will move some
nodes to make space so you can see it
better. By default, it shows image A,
the original. When we move the mouse
over to the right, it shows the second
image. Now it is much easier to compare
before and after. If I change the denoise
to 0.1, we get a very similar result
because the amount of denoising is
small. If I change it to 0.9,
we get a big variation. All of this is
also influenced by the sampler, the
scheduler, and the model itself. But
in general, this is how it works. I
prefer to start with values around 0.7.
If that is too much, I reduce to 0.5
and keep adjusting until I like the
result. Another thing you should know is
that the input image size influences the
result size. Since we do not have an
empty latent image node where we set
width and height, the loaded image
decides the size. Comfy UI will also
round the size to a multiple of 8. For
example, if your image is 511 pixels, it
will be rounded down to the nearest
multiple of 8, which is 504.
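That rounding can be written in one line. A sketch of the idea, not ComfyUI's exact internal code:

```python
def round_to_multiple_of_8(size):
    """Round a dimension down to the nearest multiple of 8,
    since latent images work in 8-pixel units."""
    return (size // 8) * 8

print(round_to_multiple_of_8(511))  # 504
print(round_to_multiple_of_8(512))  # 512
```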
You can also control the input size by
resizing or cropping it, like you saw in
the earlier chapters. For example, I can
add an upscale image node here, then
redo the connections so the image passes
through it. I can upscale to a bigger
size with the same ratio. Now when I run
it, the final image should be larger
because it follows the input image size.
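One common way samplers treat the denoise value, a simplified sketch since the exact behavior varies by sampler, is to skip the early steps: with N total steps and denoise d, roughly the last d times N steps run on the noised input image. The hypothetical helper below only illustrates that relationship:

```python
def steps_actually_run(total_steps, denoise):
    """Toy model: with denoise < 1.0 the sampler starts partway
    through the schedule, so only a fraction of the steps
    actually modify the image."""
    run = round(total_steps * denoise)
    return max(0, min(total_steps, run))

# denoise 1.0 behaves like text to image: every step runs.
print(steps_actually_run(20, 1.0))  # 20
# denoise 0.2 stays close to the input: only a few steps run.
print(steps_actually_run(20, 0.2))  # 4
# denoise 0.7, a common starting point for image to image.
print(steps_actually_run(20, 0.7))  # 14
```

This is why low denoise keeps the original structure: the model simply has fewer steps in which to change it.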
Now we are going to talk about samplers
and schedulers which you can find here
in the K sampler. This is one of the
most confusing parts at first, but the
idea is actually simple. Everything
begins with the same initial noise. The
seed defines that noise, but once the
noise exists, two different systems
control what happens next. The sampler
decides how noise is removed. It defines
the strategy the model uses to go from
noisy to clean. Different samplers use
different mathematical paths to
denoise. Some remove noise more
directly. Some refine the image
gradually. Some are more random and
creative. Some are more stable and
precise. Even with the same prompt, the
same seed, and the same number of steps,
changing the sampler can change the
final image. So the key idea is this.
Sampler controls how each denoising step
is calculated. Or in simple terms,
sampler equals how noise is removed.
The scheduler does not change how denoising
works. It changes when denoising happens
during the steps. A linear scheduler spreads
denoising evenly across all steps. Each
step removes roughly the same amount of
noise. A nonlinear scheduler removes noise
faster at the beginning and slower near
the end. This allows fast structure
early and fine detail later. Both
approaches can reach a clean image, but
they feel different in how detail is
introduced. So the key idea here is this.
The scheduler controls when noise is removed,
or simply, scheduler equals when noise
is removed. Sampler and scheduler always
work together. You never choose one
without the other. The sampler chooses
the denoising method. The scheduler
chooses the timing of that denoising.
The same noise plus a different sampler
or a different scheduler can
produce different results.
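The linear versus nonlinear idea can be sketched as two ways of spacing noise levels across the same number of steps. These are toy numbers, not a real sigma schedule from any actual scheduler:

```python
def linear_schedule(steps):
    """Noise level drops by the same amount every step."""
    return [1.0 - i / steps for i in range(steps + 1)]

def nonlinear_schedule(steps, power=2.0):
    """Noise drops quickly at first and slowly near the end,
    leaving the late steps for fine detail."""
    return [(1.0 - i / steps) ** power for i in range(steps + 1)]

lin = linear_schedule(4)     # [1.0, 0.75, 0.5, 0.25, 0.0]
non = nonlinear_schedule(4)  # [1.0, 0.5625, 0.25, 0.0625, 0.0]
# After one step the nonlinear schedule has removed much more noise.
print(lin)
print(non)
```

Both lists start at full noise and end at zero; only the timing in between differs, which is exactly the scheduler's job.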
Let us do a little experiment in comfy
UI. From workflows, I open again this
text to image workflow and I change the
seed to fixed. Then I run the workflow.
With this sampler and scheduler, we get this
robot. Here we have a lot of samplers
and schedulers. Depending on the model
we use, some work better than others.
Let us say I pick the Euler sampler. Now
when I run it, even if the seed and
prompt are the same, the result is
slightly different because the sampler
influences how the denoise is
applied. Let us say I also change
the scheduler to simple. Now the result will
again be different because the scheduler
changes when the denoising happens
during the steps. Because the model we
use is quite small, we can actually
preview multiple results at the same
time. So I hold the control key and drag
over these three nodes. Then I use
Ctrl + C to copy them and Ctrl +
Shift + V to paste them with the links
connected. Now this workflow will
generate two images and has two K
sampler nodes. Let me use Ctrl +
Shift + V again to get a third one. Now
this workflow uses the same seed and
prompt with three different k sampler
nodes. And I want to change the samplers
and schedulers for each one. You can play with
these all day and try many combinations.
I will choose something random for this
example.
Now when I run it, you can see it
generates an image for each sampler.
Some results are quite similar, but some
details are different. For example,
parts of the robot may change from one
image to another. Let me now put the
same sampler on all of them and use
different schedulers only so we can see
how the timing of denoising
influences the result.
Again, the differences are subtle but
they are there. Sometimes this can mean
one image has five fingers and another
has six. So having options is useful
especially when you want small
variations. Now let us double click on
the canvas and add a primitive node. I
want to control the steps value for all
three K samplers, but I do not want to
change it manually on each one. So I
drag a connection from the primitive
node to the steps input of the first K
sampler. Then do the same for the second
and the third one. Now from this single
node I can control all three. If I
change steps to one, you can see we get
very similar results. If I change steps
to three, you can already see
differences. Some schedulers are faster. For
example, with one, the image is still
very noisy, while with another, you can
already see a shape forming. If I change
to four steps, the differences become
more visible. At five steps, some start
to form clearer shapes. At six steps,
some images already show eyes and a main
structure. At eight steps, the middle
one is almost fully formed. At 10 steps,
almost all of them have something that
could work for certain concepts. And at
20 steps, most of them have enough
detail to be usable in a project.
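What we just watched, shapes forming as steps increase, follows the pattern of diminishing returns mentioned earlier. A toy sketch of that curve, assuming each step removes a fixed fraction of the noise that is left (not the real sampler math):

```python
def remaining_noise(steps, removal_per_step=0.3):
    """Toy model of iterative refinement: each step removes a
    fraction of the noise still present, so early steps change
    a lot and late steps change very little."""
    noise = 1.0  # start from pure noise
    for _ in range(steps):
        noise *= (1.0 - removal_per_step)
    return noise

print(remaining_noise(5))   # still fairly noisy
print(remaining_noise(20))  # almost clean
print(remaining_noise(30))  # barely different from 20 steps
```

This is why going from 5 to 10 steps changes the image dramatically, while going from 20 to 30 often does not.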
Usually, the people who create AI models
suggest specific samplers and
schedulers, or the community tests them
and shares which ones work best. This
way, you do not have to test everything
yourself for every model. But if you do
find good settings, it is always a good
idea to share them with the community so
everyone can improve their image
generation results. Let's talk a little
about subgraphs in Comfy UI. Go to
workflows and open the juggernaut text
to image workflow. Here you can see a
bunch of nodes. Just like before, hold
the control key and drag to select most
of the nodes except the export node,
which in this case is the save image
node. Now that the nodes are selected,
look at the icons at the top. One of
them says convert selection to subgraph.
You can see its name when you hover over
it. When you click it, all those selected nodes are
combined into a single node. If you
right click on this new node, you will
see an option called unpack subgraph.
When you click it, the nodes go back to
how they were before. Let's do it again.
Select two or more nodes, then use the
subgraph button to create a subgraph.
Resize it so it is easier to see. A
subgraph is a way to group multiple
nodes into a single reusable block.
Instead of showing a long chain of nodes
every time, you collapse them into one
node that represents an entire process.
You can think of a subgraph like a
function or a macro. Inside it there can
be many nodes, but from the outside it
looks simple. It is very similar to
smart objects in Photoshop which can
contain multiple layers inside a single
object. Subgraphs solve three main
problems. First, they reduce visual
clutter. Large workflows can become
messy very quickly and subgraphs help
keep things readable. Second, they help
reuse logic. If you repeat the same
setup many times, like a prompt encoding
chain or an image pre-processing step,
you can reuse it instead of rebuilding
it every time. Third, they make
workflows easier to explain and share.
People understand a few clean blocks
much faster than dozens of individual
nodes. At the time I recorded this
tutorial, subgraphs were still being
improved and may still have some bugs. A
subgraph does not make a workflow faster
by itself. It is about organization, not
performance. Performance depends on the
nodes inside the subgraph, not on the
subgraph wrapper. A subgraph is like
putting many Lego pieces into one box
and labeling the box with what it does.
All the pieces are still there. You just
do not need to see them all the time.
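Since a subgraph behaves like a function, the idea can be sketched in plain Python: several steps hidden behind one call that looks simple from the outside. The node-like helper names here are hypothetical stand-ins, not ComfyUI code:

```python
# Hypothetical helpers standing in for individual nodes.
def load_checkpoint(name):
    return {"model": name}

def encode_prompt(text):
    return {"cond": text}

def sample(model, cond, steps):
    return f"latent({model['model']}, {cond['cond']}, {steps})"

def text_to_image_subgraph(model_name, prompt, steps=20):
    """Acts like a subgraph: many nodes inside, one call outside."""
    model = load_checkpoint(model_name)
    cond = encode_prompt(prompt)
    return sample(model, cond, steps)

# From the outside you only see one block with a few inputs.
print(text_to_image_subgraph("juggernaut", "a robot"))
```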
You can see the title says new subgraph.
Let's double click on that and rename it
to something that makes sense like text
to image and maybe also include the
model name. So, juggernaut text to
image. Now, it looks like a simple
workflow with only two nodes. I do not
like the order in which things appear in
the node. So, let's right click on the
node and select edit subgraph widgets.
Here you can choose what parameters to
show in that node and what to hide. Let
me hide all of them so we have a clean
subgraph that does not show any
parameters. You can enable them one by
one later if you want only the ones you
need. But we will build those manually
so you understand them better. Let's
close this panel. Now let's go inside
the subgraph. You can see that all the
nodes are there plus some input and
output. On top you can see a new tab
next to the workflow name. If I click on
the main workflow name that is how we
exit the subgraph. From there we can go
back inside and from inside we can go
back outside. You can see the output
where the image is saved. If we go
inside the subgraph that image output
appears here as a link. From this dot we
can drag a connection to where it says
checkpoint name. Now that field becomes
gray just like when we added a primitive
node before. If we go back outside you
can see that the checkpoint name appears
here. Let's go back inside again, double
click on that name, and rename it to
model to see what happens. Now, when we
go back to the main workflow, you can
see it says model instead of checkpoint
name. So, this is very customizable.
Let's go back inside and drag another
connection, this time to the positive
prompt and rename it so we know what it
is. Do the same for the negative prompt.
Now, when we go back outside, we have
positive and negative prompt visible. Go
back inside again and drag connections
to width and height. And maybe do the
same for all the parameters from the K
sampler. Now we can see all those
parameters exposed here. And when we go
back outside, we have this single node
that acts like a mini interface that can
control everything we need. You might
say that it looks nice, but does it
actually work? Let's try it. And the
answer is yes, it works. If you right
click on it, you can see it still has
other options like node color, bypass
and so on. With that subgraph selected,
right click on the canvas this time and
you will see an option called save
selected as template. It asks for a
name. I will name it juggernaut text to
image. Then press enter or confirm. It
looks like nothing happened, but where
was that template saved? Let's open a
new workflow. Now right click on the
canvas and go to node templates. You can
see that name there now. And you also
have the option to manage templates and
remove them. When I select that
template, it is added to the canvas with
all the nodes, connections and settings
it has inside. Now we can just drag a
link from the image output and add a
save image node or connect it to other
nodes to create more complex workflows.
Over time, this simplifies workflows
because we can organize them into pieces
and group them by category or function.
Let's go back to the first workflow just
to show you that any node or combination
of nodes can be saved as templates.
There are cases where some connections
can break when some nodes are inside a
subgraph and others are outside. So,
keep that in mind. For example, I use
this Pixaroma note node a lot. I want to
save it as a template so I can access it
easily next time. This might not be
useful for everyone, but as a workflow
and tutorial creator, I use this a lot.
I will save it as a template and give it
a name. Now I can go to any other
workflow and quickly access that
template from anywhere. You can also
have subgraphs inside other subgraphs
like boxes inside boxes.
You can disconnect or remove links at
any time. I could select two nodes here
and combine them into another subgraph
or go outside and combine all these
nodes even if some are simple nodes and
one is already a subgraph and it will
still let me create a new subgraph. If
we go inside all those nodes are there.
If we go back outside we can unpack it
using the icon or rightclick and choose
unpack subgraph. These things will make
more sense as you work with them in
practice. So play with them and have
fun. When you see that icon on a node,
you know it is a subgraph. It also has
the icon that lets you go inside the
subgraph, which is another indicator
that it is not a simple node. Remember
that you can also use the interface to
edit subgraph widgets. One thing I
forgot to show is that you can use those
dots to rearrange the order of the
parameters shown in the subgraph node.
This way you do not need to go inside
it. Most of the time you can control
things directly from the outside. Now we
are going to talk about LoRAs. LoRA
stands for low rank adaptation. In
simple terms, a LoRA is a small add-on
that modifies how a base model behaves.
A LoRA does not replace the model. It
works together with the model. You can
think of the base model as the main
photographer we hired earlier. A LoRA is
like giving that photographer extra
experience in a specific style or
subject. Why do LoRAs exist? Training a
full model is very expensive. It
requires a lot of images, time, and
powerful hardware. LoRAs exist to solve
this problem. Instead of retraining a
full model, we train a small adapter
that teaches the model something new.
This could be a specific art style, a
character, a face, a pose style, or a
lighting style. LoRAs are much smaller
than full models. That is why they are
easy to download and experiment with. So
remember, a LoRA does not work by
itself. It always needs a base model and
a compatible architecture. For example,
a Stable Diffusion 1.5 LoRA needs a
Stable Diffusion 1.5 model. An SDXL
LoRA needs an SDXL model and so on. If
you mix incompatible models and LoRAs,
the results will be broken or random.
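The "low rank" in the name can be illustrated with tiny matrices: instead of storing a full weight update, a LoRA stores two small matrices A and B whose product is the update, so the effective weight is W plus strength times B matmul A. This is a pure Python sketch of that idea, not ComfyUI code:

```python
def matmul(b, a):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(b), len(a), len(a[0])
    return [[sum(b[i][k] * a[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def apply_lora(w, a, b, strength=1.0):
    """W' = W + strength * (B @ A): base weights plus a low-rank delta."""
    delta = matmul(b, a)
    return [[w[i][j] + strength * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]

# A 4x4 weight matrix stores 16 numbers; a rank-1 LoRA stores only 4 + 4.
w = [[1.0] * 4 for _ in range(4)]
a = [[0.1, 0.2, 0.3, 0.4]]        # 1 x 4
b = [[1.0], [0.0], [0.5], [0.0]]  # 4 x 1
print(apply_lora(w, a, b, strength=1.0)[0])  # first row shifts by the delta
print(apply_lora(w, a, b, strength=0.0)[0])  # strength 0 leaves W unchanged
```

For real models the saving is huge: the same trick on large layers is why a LoRA file is megabytes while the base model is gigabytes.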
Let's open comfy UI to test it because
what is theory without practice, right?
Open workflow 3, the one that has LoRA
in the name. As you can see, the
workflow is very similar to what we had
before. That is one of the reasons I am
using this older model instead of a
newer one. It is easier to learn the
basics first and then we can make things
more complex as we move forward.
Compared to the first text to image
workflow we used earlier, we now have
this LoRA loader node that loads a
LoRA model. In our photographer analogy,
this means the photographer took some
classes on how to take photos of cakes
and is now specialized in that subject.
Let's look at the note node first. We
need to download the LoRA model.
Remember the workflow comes with nodes
and settings, but since it is just a
text file, it cannot include the actual
models. We have to download those
separately and place them in the correct
folder. In this case, we are using a
LoRA called cake style. It is a small
model trained on images of cakes. So, it
understands cakes better than the base
model alone. A few years ago, when
stable diffusion 1.5 models first
appeared, they could not handle many
subjects very well, and LoRAs were often
used to fix those limitations. So, we
need to download this LoRA and place it
inside the loras folder. Click where
it says here. Then we need to place that
file in the loras folder. Go to your
Comfy UI folder. Open the models folder
and then find the loras folder. If we
place it directly here, it will work
perfectly. But this time, I want to keep
things organized. I want to create a
folder that tells me which base model
this LoRA is compatible with. So, I
will create a folder called SD15.
This way I know it works with that model
and I do not mix it with others. Save
the LoRA inside that folder. If you
look at the file now, you can see that
the LoRA is much smaller than the base
model. All LoRAs should go into this
folder and it is best to organize them
by base model name like SDXL, Flux,
Qwen, and so on just like we did with
checkpoints in an earlier chapter. Now
go back to Comfy UI. We have everything
we need to run this workflow, but
because Comfy UI was already open when
we downloaded the model, it cannot see
it yet. We need to press the R key to
refresh the node definitions.
Now it appears in the list and we can
select it. You can also see a note here
with a trigger word. I like to add these
notes so I remember them. Many LoRAs
are trained using specific trigger
words. These are words the LoRA learned
during training. If you do not include
the trigger word in the prompt, the LoRA
may have little or no effect. Some LoRAs
work without trigger words, but many
require them. Always read the LoRA
description from the place where you
downloaded it. If we look at the
positive prompt, we first added the
trigger words so we do not forget them.
It is not required to be first. It can
also be placed after a few words, but I
like to put it first. Then we have the
prompt for a robot. I did it this way so
we can clearly see how the LoRA and a
simple trigger word affect the result.
Now if we run this workflow, we get this
robot cake. You might think this model
could do that without a LoRA, but it
depends on the prompt and the model. Let
me change the seed to fixed so we can
get a consistent result. So this is how
it looks with the LoRA applied. Now
what I want to do is run the
workflow without the LoRA and without
changing anything else. Same prompt,
same settings, same seed, just disable
the LoRA. To do that, I right click on
this node and choose bypass. Now, when I
run the workflow, the LoRA is bypassed.
And you can see we get a normal robot
instead of a cake robot. If I enable the
node again and run it, you can clearly
see the effect the LoRA has on the
image. Now that you see how it works,
let's adapt a normal text to image
workflow and add the LoRA ourselves for
practice. Open workflow 1, the basic
text to image workflow. Now I want to
add the LoRA between the model and the
K sampler. Double click on the canvas,
search for LoRA, and add the node
called LoRA Loader Model Only. Let me
resize it so the text is easier to see.
I also like to color these nodes blue so
I can spot them faster in big workflows,
but that is optional. Now, we need the
model connection to go through this
node. If you look now, the model is
connected directly to the K sampler, but
we want the extra knowledge from the
LoRA. Drag a connection from the model
output to the LoRA loader and then from
the LoRA loader to the K sampler. The
workflow is now complete. Let's set the
seed to fixed so we can clearly see how
different settings affect the result. It
runs without errors, so everything is
connected correctly. Even if a LoRA
sometimes works without a trigger word,
it is best to include it when one is
provided. So let's add the trigger word
cake style to the positive prompt. Now
when I run it, we get a different result
even though the seed is fixed. That
shows the LoRA is doing its job. If I
change the seed, we get another
variation. To avoid forgetting trigger
words, I like to add a note node. I
write the trigger word there, change the
note title so it is clear what it is for,
and often change the color to match the
LoRA nodes so I know they are related.
One important thing I have not mentioned
yet is that you can use multiple LoRAs.
If I want I can clone this node by
holding alt and dragging then connect
them one after another. You can stack
several LoRAs this way. I personally do
not use more than three or four at once.
In this setup, the base model is
combined with the first LoRA, then the
second LoRA, and all that information
goes into the K sampler. In the prompt,
you add the trigger words for all the
LoRAs you use. If I run this now, some
strange things can happen. First, I use
the same LoRA twice, which makes its
effect too strong. Second, when using
multiple LoRAs, it is usually a good
idea to reduce their strength so they
blend better instead of overpowering the
image. If I lower the strength values,
the result becomes much more stable and
usable. If your result looks too weird,
one of the first things to try is
reducing the LoRA strength. Let me
delete the extra LoRA and keep only
one, then set its strength to one. Each
LoRA has a strength value. This
controls how strongly the LoRA affects
the model. Low values give subtle
influence. High values give strong
influence. If the value is too high,
images can break, faces can deform, and
styles can become unstable. A good
starting range is usually between 0.6
and 1.0. There is no universal best
value. Each LoRA behaves differently.
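Stacking can be sketched the same way as before: each LoRA contributes its own delta, scaled by its own strength. A toy illustration using single numbers instead of weight matrices, not ComfyUI's actual merging code:

```python
def stack_loras(base, deltas, strengths):
    """Effective weight = base + s1*delta1 + s2*delta2 + ...
    Single numbers stand in for whole weight matrices."""
    out = base
    for d, s in zip(deltas, strengths):
        out += s * d
    return out

# The same LoRA applied twice at full strength doubles its effect:
print(stack_loras(1.0, [0.5, 0.5], [1.0, 1.0]))  # 2.0
# Lowering the strengths blends them instead of overpowering the base:
print(stack_loras(1.0, [0.5, 0.5], [0.6, 0.6]))  # 1.6
```

This is why reducing strength values is the first fix when stacked LoRAs produce strange results: the deltas add up.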
Let's delete this LoRA loader node. And
let me show you another node you can
use, this time from a custom node.
Search for Power Lora Loader. This one
comes from the rgthree node pack. What is
different compared to the previous one
is that it has two inputs and two
outputs. Because of that, we need to
route both the model and the clip
through this node. First, connect the
model output to the Power Lora Loader
model input. Then connect the clip
output to the clip input on this node.
After that, the clip output from the
Power Lora Loader goes to both the
positive and negative prompt nodes. Let
me select all these nodes and move them
a bit so you can see more clearly how
both the model and the clip go through
the LoRA loader. Now we can add the
LoRA we want directly inside this node.
We can also add multiple LoRAs here.
You can see that I can add a second one
and even a third one. If I right click
on a LoRA entry, I can remove it. I can
do the same for any of them. If we right
click on a LoRA and choose show info,
we get more details. There is also a
button called fetch from Civitai.
Civitai is a website that hosts models and
you will see it later. If the LoRA is
public and available on Civitai, this will
fetch useful information about it,
including examples and trigger words. We
also have toggle buttons here. We can
toggle all LoRAs on or off or toggle
them individually. Let me add another
LoRA so you can see how that works.
This way we can load multiple LoRAs but
enable only the ones we want at any
moment. After playing a bit with this
LoRA, I found that a strength value
around 0.55 to 0.6 works better for this
specific one. I tried it with the
trigger word, then added an orange
golden fish, cute and adorable, and got
this result. It is not bad for such a
small model. For a second example, I
tried a marzipan cake shaped like a
woman and got this result.
For a third one, I tested a marzipan
castle. Again, this is just for
practice. Later, you will see better
models that can produce much higher
quality images with fewer errors. LoRAs
are lightweight. They do not increase
VRAM usage very much. Common beginner
mistakes are using the wrong base model,
using strength values that are too high,
forgetting trigger words, and expecting
a LoRA to fully replace a model. LoRAs
enhance models. They do not replace
them. Stacking many LoRAs can slow
things down slightly, but not
dramatically. For beginners, it is best
to start with one LoRA at a time. The
base model is the photographer. The
LoRA is a specialty training course
that photographer took. The photographer
still uses the same camera. They just
learned a new style.
Now that you understand diffusion,
prompts, image to image, LoRAs, and
workflows, we are ready to talk about
ControlNet. ControlNet is one of the
most powerful features you can use in
Comfy UI. In simple terms, ControlNet
lets you guide image generation using an
extra image, not just text. Instead of
saying what you want only with words,
you can also show the model what you
want. What is ControlNet? ControlNet
is an additional neural network that
works alongside the main diffusion
model. It does not replace the model. It
does not replace the prompt. It adds
extra control. The base model still does
the image generation. The prompt still
guides the style and subject. ControlNet
adds structure and constraints. You
can think of it like this. The prompt
says what the image should look like.
The seed decides the starting noise. The
sampler and scheduler decide how noise
is removed. ControlNet tells the model
where things should go. Why does
ControlNet exist? Text prompts are
powerful, but they are also vague. If
you say a person standing, the model
decides the pose. If you say a city
street, the model decides the layout. If
you say a face, the model decides the
proportions. ControlNet exists for cases
where you want more control. For
example, you want a specific pose. You
want a specific composition. You want to
follow a sketch. You want to preserve
the structure of an input image.
ControlNet makes results more
predictable and repeatable. This is a
simplified explanation. In reality,
ControlNet works by injecting additional
conditioning into the diffusion process
at every denoising step. But you do not
need to understand the math for learning
and practical use. This mental model is
enough. ControlNet guides structure
while diffusion fills in details. Let's
while diffusion fills in details. Let's
open comfy UI. Go to workflows and
select workflow number four. The one
that has control net in the name. The
workflow is similar to the text to image
workflow. It is still a text to image
workflow but it is guided by an image
using ControlNet. You can quickly tell
it is text to image because of the empty
latent image node. I highlighted in
yellow the nodes that we usually use for
ControlNet. Let's go to the note node
to see what we need. The checkpoint
model was already downloaded earlier. We
also need to download some control net
models for custom nodes. We need this
specific custom node which comes with
the easy install version. But if you are
using a different Comfy UI version, you
need to install this node first. We have
a Canny model, another one called depth,
and another one called open pose. There
are more types available, but these are
the most popular and commonly used ones.
Let's download all three so we can test
them. First, download the Canny model.
Then, go to your Comfy UI folder. Open
the models folder and think about where
this model should go. If you guessed the
control net folder, you are right. We
place it there because different base
models can have different control net
models. Just like with Loris, a control
net is only compatible with the base
model it was trained for. So, let's
organize them properly and create a
folder so we know which base model these
control nets are compatible with. Save
the model in that folder. Next, download
the depth model and save it in the same
folder.
Then download the open pose model and
save it in the same folder as well. Wait
for all downloads to finish. If Comfy UI
was open while downloading, we need to
refresh it. Press the R key to refresh
the node definitions. Now we have
everything we need to run this workflow.
In the workflow, we have an apply
control net node with some settings. We
also have a node that loads the control
net model we downloaded and a
pre-processor node that converts the
input image into a format that control
net understands and was trained on.
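The earlier mental model, extra conditioning added at every denoising step, can be sketched as a toy loop. Everything here is a hypothetical stand-in with made-up numbers; in reality the injection happens inside the model, not by simple addition:

```python
def denoise_step(latent, guidance):
    """Toy step: nudge the latent a little toward the guidance."""
    return latent + 0.25 * (guidance - latent)

def generate(steps, prompt_cond, control_hint=None, control_strength=1.0):
    """ControlNet does not replace the prompt; it adds extra
    conditioning at every denoising step."""
    latent = 0.0  # stands in for the initial noise
    for _ in range(steps):
        guidance = prompt_cond
        if control_hint is not None:
            guidance += control_strength * control_hint
        latent = denoise_step(latent, guidance)
    return latent

print(generate(10, prompt_cond=1.0))                    # prompt only
print(generate(10, prompt_cond=1.0, control_hint=0.5))  # prompt + structure
```

Setting control_strength to zero makes the hint vanish, which mirrors what lowering the strength setting does in the apply control net node.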
Let's run the workflow. In this example,
we are using the canny model. We loaded
a bunny sketch and with the help of the
pre-processor, it generates a canny map
which is an image that detects the edges
of the input image. With the prompt, we
influence what we want to generate. And
with apply control net, the model
interprets that canny map and uses it to
guide the generation to get this image.
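A toy version of what an edge preprocessor produces, white where neighboring pixel values differ sharply and black elsewhere, can be sketched like this. Real Canny uses smoothing, gradients, and hysteresis thresholds; this only shows the idea:

```python
def edge_map(img, threshold=50):
    """Mark a pixel white (255) when it differs strongly from its
    right or bottom neighbor, black (0) otherwise."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = abs(img[y][x] - img[y][x + 1]) if x + 1 < w else 0
            down = abs(img[y][x] - img[y + 1][x]) if y + 1 < h else 0
            if max(right, down) > threshold:
                out[y][x] = 255
    return out

# A dark square on a light background: edges appear at the boundary.
img = [[200, 200, 200, 200],
       [200,  30,  30, 200],
       [200,  30,  30, 200],
       [200, 200, 200, 200]]
for row in edge_map(img):
    print(row)
```

The result is the same kind of picture the Canny map shows: only outlines survive, which is exactly the structure ControlNet follows.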
You can imagine that without control
net, it would be very hard to get
something this complex using only a
prompt, especially with the small model
we are using today. Now, let's build
this workflow ourselves so you
understand it better. Open again the
first workflow that you already know how
to build and we will adapt it to use
ControlNet.
We know control net comes before the K
sampler. So let's move some nodes to
make room for it. Double click on the
canvas and search for apply control net.
Add the node and change its color to
yellow so it is easy to recognize. Now
let's connect the parts that are obvious
first. Positive goes to positive,
negative goes to negative. For the
outputs, there is only one place where
they make sense. So we connect those as
well. At this point, the node still has
missing inputs. One of them is the VAE.
We already know where the VAE comes
from. In our case, it is included in the
checkpoint model. The same VAE we
already used for encode and decode. The
next missing input is the control net
model itself. Double click on the
canvas, search for load control net,
add the node, and color it yellow as
well.
Now connect it to the apply control net
node.
The last missing input is the image. Add
a load image node.
Then we can select an image. In this
case, I will use a bunny sketch. You
might be tempted to connect this image
directly to control net. But that
usually does not work. Control net
expects a very specific type of image
because it was trained on that type of
data. Our sketch is just a normal image.
So, we need a pre-processor to convert
it into something ControlNet
understands. Double click on the canvas and search for AIO, which stands for all-in-one. Add the pre-processor node and color it yellow. Connect the load image node to the pre-processor. Then connect the pre-processor to the Apply ControlNet node. Right now, the pre-processor is set to none, so we need to choose one. Since we plan to use a Canny ControlNet model, select a Canny edge pre-processor.
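To demystify what the pre-processor does: a Canny-style pre-processor is classic edge detection, not AI. Here is a rough sketch of the idea in Python, a simplified gradient threshold rather than the full Canny algorithm, just to show the kind of output being produced:

```python
import numpy as np

def simple_edge_map(gray: np.ndarray, threshold: float = 50.0) -> np.ndarray:
    """Very simplified stand-in for a Canny pre-processor:
    find strong intensity gradients and keep them as white edges."""
    gray = gray.astype(np.float32)
    # Horizontal and vertical intensity differences (rough gradient).
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = np.abs(np.diff(gray, axis=1))
    gy[1:, :] = np.abs(np.diff(gray, axis=0))
    magnitude = np.hypot(gx, gy)
    # White edges (255) on a black background (0), like a Canny map.
    return np.where(magnitude > threshold, 255, 0).astype(np.uint8)

# Toy image: a black square on a white background.
img = np.full((64, 64), 255.0)
img[16:48, 16:48] = 0.0
edges = simple_edge_map(img)
```

The real Canny algorithm adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding, but the output format is the same: white edges on black, which is exactly what the Canny ControlNet model was trained on.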
To better understand what is happening,
add a preview image node after the
pre-processor. This allows us to see the
control image that is actually being
sent to ControlNet. When we run the workflow, we get a Canny map: white edges on a black background. This shows
exactly what ControlNet will use as
guidance. You can also adjust the
resolution here if you want more detail
in the map. Now, let's look at the
result. It does not look very good yet.
This happens often when working with ControlNet, and there are a few things
to check. First, look at the prompt. We
are still using a robot prompt, but the
image is a bunny. So, let's change the
prompt to something like a watercolor
painting of a bunny. Next, reduce the
ControlNet strength and the end percent
slightly. Run again. The result is a bit
better, but it still does not follow the
sketch very well. If changing the seed
does not help, the next thing to check
is the control net model itself. If you
look at the load control net node, you
may notice that the selected model is
not a canny model, but a depth model.
That is the problem. The pre-processor
and the control net model must match.
Select the correct Canny ControlNet
model. Now run the workflow again. The
result is much better and follows the
sketch closely. Let's try another
example. Load a 3D text image. Since
this image has depth information, we can
try a depth control net instead. Change
the control net model to depth and
update the prompt to something like
golden text in snow. When you run it,
you may notice the preview still looks like a Canny map. That means we forgot to change the pre-processor. Switch the pre-processor to a depth pre-processor. The first time you run a new pre-processor, Comfy UI may take longer because it downloads a small model automatically. This only happens once.
If you get a long path error on Windows,
close Comfy UI and run the long path
enabler from the tools folder. Now we
see a depth map. Dark areas represent
parts that are farther away. Lighter
areas represent parts that are closer.
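This convention can be sketched in a few lines. Assuming we start from raw distance values (larger means farther away), a depth pre-processor normalizes them into a grayscale image where closer pixels are lighter:

```python
import numpy as np

def depth_to_map(distances: np.ndarray) -> np.ndarray:
    """Convert raw distances into a ControlNet-style depth map:
    near = light (255), far = dark (0)."""
    d = distances.astype(np.float32)
    # Normalize to 0..1, then invert so closer objects are brighter.
    norm = (d - d.min()) / (d.max() - d.min() + 1e-8)
    return ((1.0 - norm) * 255).astype(np.uint8)

# Toy scene: a near object (distance 1) in front of a far wall (distance 10).
scene = np.full((8, 8), 10.0)
scene[2:6, 2:6] = 1.0
depth_map = depth_to_map(scene)
```

Real depth pre-processors estimate those distances from a single photo with a small neural network, but the resulting map follows this same near-is-light convention.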
ControlNet uses this information to
understand spatial structure. The
generated result now follows the depth
and composition of the original image
very closely. If for some reason you get
an error saying the model is incomplete
or something similar, you can close
Comfy UI, go to the tools folder and run
the batch file called long path enabler.
This should fix the long path issue and
allow Comfy UI to download the model it
needs even when the file path is longer.
You can also try the same image with a Canny ControlNet. Switch both the model and the pre-processor back to Canny and
run again. Even if some edges are
missing, it can still guide the
generation. Well, try switching back to
depth and compare results. Often, one
will work better than the other
depending on the image. Now, let's talk
about the key control net parameters.
Control net does not replace diffusion.
It only guides it during certain parts
of denoising. Strength controls how
strongly control net influences the
image. Low values make control net a
soft suggestion and the model can drift
away. High values strongly enforce
structure and make the output closely
follow the control image. Typical values
are between 0.5 and 0.7 for natural
results and 0.8 to 1 for strict
structure matching. Start percent
controls when control net begins
influencing the denoising process. A
value of zero means control net starts
from the very first step, locking
structure early. Higher values allow the
model to form rough shapes first before
ControlNet takes over. End percent controls when ControlNet stops influencing denoising. A value of one means control
net stays active until the end, locking
structure even in fine details. Lower
values allow control net to stop
earlier, letting the model finish on its
own and add more style. In simple terms,
strength is how hard control net pulls.
Start percent is when it starts pulling
and end percent is when it lets go. That
is why control net is so powerful. You
can guide structure without killing
creativity.
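Those three parameters can be summarized in one small function. This is only an illustration of the gating logic, not ComfyUI's actual implementation:

```python
def controlnet_weight(step: int, total_steps: int,
                      strength: float = 0.7,
                      start_percent: float = 0.0,
                      end_percent: float = 1.0) -> float:
    """Return how strongly ControlNet influences a given denoising step.
    Outside the [start_percent, end_percent] window it contributes nothing."""
    progress = step / total_steps  # 0.0 at the first step, near 1.0 at the end
    if start_percent <= progress <= end_percent:
        return strength  # ControlNet is pulling
    return 0.0           # ControlNet has let go

# With end_percent = 0.8, ControlNet guides the first 80% of the steps,
# then the model finishes the fine details on its own.
weights = [controlnet_weight(s, 20, strength=0.7, end_percent=0.8)
           for s in range(20)]
```

Reading the list of weights, you can see the pull is constant at 0.7 early on and drops to zero for the last steps, which is exactly the "starts pulling, then lets go" behavior described above.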
It is time to test the pose control net
as well. But first, let us add a pose
image as reference. Let us say I add
this woman. Again, it is kind of hard to
put into words the exact same pose. So
this is a good use case for control net.
Let us change the prompt to something
else. Maybe woman in a sumo yoga pose.
Not sure what to call it. Then do not
forget to change the model to open pose.
I know it might seem complex at first,
but newer models have union control net
models that include everything in one.
So you only load one model, which makes
things easier. That said, this is how we
used to do it. And there are still cases
where we need specific models. Then we
can try either DW pose or open pose.
Type pose and select this open pose. And
then let us run it again. It will take a
bit since this is the first time I use
it. In fact, if you look at the command
window, you can see it is downloading
that model from hugging face. That is
why it takes so long. After it finishes,
it gives this pose image that looks like
a skeleton with each color representing
a bone. That is how it knows which side
is right and left and so on. So it
captured the pose and now let us see the
result. Holy sumo, what is this? Okay,
let us adjust the prompt. Maybe a fit
woman will help. That did not help much.
I bet the word sumo has too much weight.
Like we talked about before, some words
have more power than others. So if I try
without that word, I get a better
result. Even the face is not so great.
And that can happen with people in the
distance. Usually with portraits, we get
better faces. Newer models fixed most of
that. Let me try to change the
resolution to see if something changes.
Now I get a better resolution for the
skeleton. And the pose is okay. Just the
face. That face does not invite me to do
yoga. Okay. Let us try a different pose.
Something for a portrait. Let us say I
use this portrait photo. Let us change
the prompt to a businesswoman and run it
to see what we get. The pose looks okay.
Even if it is missing an arm, it should
still work. The results are much better
now that the face is closer. Let us try
a warrior woman as well.
That works well too. So with control
net, you have to continuously search for
balance. Make sure you select the right
control net model for the job. Then
choose a pre-processor that matches the
model. As you saw, it is easy to forget
to change something. I usually play with
strength and end step. Also, do not
forget control net models made for SD1.5
only work with SD 1.5 base models. If
you use SDXL, you need SDXL control net
models. In a later episode, we will
check some advanced models that do not
even need ControlNet and can do
everything from prompts. Beginner mistakes to avoid: using the wrong ControlNet model for the base model, forgetting to install or download ControlNet models, using very high strength values, expecting ControlNet to fix bad prompts, and using ControlNet when it is not needed. ControlNet is a tool, not a magic fix. When you start
using comfy UI you will notice there are
many different model types: AIO models, FP16, FP8, GGUF, and others. This can be
confusing at first, but the reason is
actually simple. At their core, all
diffusion models are just very large
collections of numbers. Those numbers
represent what the model has learned.
The knowledge itself does not change,
but the way those numbers are stored can
change. Different model formats exist to
balance memory usage, speed, and
hardware compatibility. Some formats are
larger but more precise. Others are
smaller and faster, but slightly less
accurate. FP32 is the highest precision
and is mostly used for training. It uses
a lot of memory and is rarely used for
image generation. FP16 is the most
common format for stable diffusion. It
offers a very good balance between image
quality and VRAM usage. This is the
safest and most recommended choice for
most users. FP8 uses even less memory
and can be faster on newer GPUs that
support it. The trade-off is that it can
sometimes reduce image stability or
detail slightly. AIO models stand for
all-in-one. They bundle the main model, the VAE, and sometimes the CLIP into a single
file. They are designed to be easy to
use and reduce setup mistakes. The
downside is that they give you less
flexibility if you want to swap
components later. GGUF models come from
the language model world. GGUF stands
for GPT generated unified format. They
are optimized for very low memory usage
and can run on CPU or low VRAM
systems. It is important to understand
that these formats do not make the model
smarter or more creative. They do not
change what the model knows. They only
change how efficiently that knowledge is
stored and processed. You can think of
it like the same video saved in
different resolutions. The content is
the same, but the file size and playback
requirements are different. For most
users, FP16 models are the best starting
point. AIO models are great for
beginners. FP8 is useful if your GPU
supports it. GGUF is best when memory is
very limited. Once you understand this,
choosing models becomes much easier. You
saw in the workflows that I include
links to models, but you might wonder
where I find those models, right? One of
the sites is hugging face, but it is not
the most beginner-friendly one. At the
top, we have a models tab. And here you
can find a lot of models, but not all of
them are diffusion models or used for
generating images. Some are for video,
some for audio, some for large language
models, and many are not compatible with
Comfy UI. Some require different
interfaces to run or they are so large
that you cannot even run them on your
computer. For example, I can sort them
by text to image. And here you can see
some popular ones like Qwen, Z Image, or
even the Flux model. If I click on one
of these, you will see that some models
require you to sign in and accept
certain terms. Each model has a license.
Some are open-source, some are free
with conditions, and others are
available only in certain countries. By
default, you are on the model card. This
is basically an info page about the
model. On another tab, you have files
and versions where the files are usually
available in different formats like in
this example. And there are a few more
files inside those folders. Let us go
back to the homepage. Here you can also
search for a model if you know the name
or browse popular ones like Z image.
Always check the tabs at the top to find
more information about the models since
they can be quite large and you want to
make sure you can actually run them on
your system. I know this is a lot of
information, but as I said before, I
usually include the model link directly
in the workflow, so you do not have to
stress about it. Still, it is good to
understand how models work and where
they come from. Another site that is more beginner-friendly and better organized is the Civitai website. The
downside is that recently they removed
access for some countries like the UK.
If you are from one of those countries,
you will need a VPN to access it and
download models. If you click on the
models tab, you can find all kinds of
models for different interfaces like
Comfy UI, Forge UI, and others. Most of
them are compatible with Comfy UI. On
the right side, you have filters. These
let you sort models by when they were
added. You can also filter by model
type. For example, checkpoints are the
main AI models. You can also filter by
LoRA or ControlNet since we talked
about those model types in previous
chapters. Of course, you can also filter
by base model so you know what is
compatible with your workflow. The first
workflow we used was based on an SD 1.5
model, but I can also sort by other ones
like the Flux Dev model or an older one
like SDXL. By the way, SDXL is newer
than SD 1.5 and Flux is even newer than
SDXL. So, use these buttons to sort
models. If you already know the name,
you can just search for it. For example,
I can search for Juggernaut. Here you
can see multiple versions of that model
based on different base models like SDXL
or SD 1.5. If I click on SD 1.5,
I will only see those versions. If I
click on the one that says Juggernaut,
it opens the model info page. At the
top, you can see different versions. We
used the Reborn version, but you can try
other versions as well. Below that, you
have details about the model. It clearly
says what type it is. It can be a
checkpoint, a LoRA, or a checkpoint merge.
In this case, it is a checkpoint merge,
which means the main SD 1.5 model was
mixed with other SD 1.5 models to
combine the best parts of each one. It
also clearly states that this is a base
SD 1.5 model. You can see the publish
date as well, which shows that it is
quite old. At the top you have the
download button and the file will go
into the correct folder. In this case,
it goes into the checkpoints folder. As
I mentioned before, the author sometimes
includes recommended settings. You can
see them here. This is how I knew what
settings to use in the K sampler for the
workflow. At the top, you also have a
gallery with images generated using that
model. This helps you understand what
the model is capable of. Some images
also have an info button that shows the
prompt and settings used to generate
that image. So, explore Civitai if you
have access to it and see what models
and LoRAs are available. Once you are
signed in, you also get more options to
control what type of models are visible
since some are disabled by default. So,
now that we played a little with that
old SD 1.5 juggernaut model, it is time
to try a better, newer model to see how
far AI has come in just 2 years. Let us
go to workflows again and this time open
the workflow named 5A, the one for Z
image turbo with the all-in-one model.
The workflow is quite similar to the
others we tried. We just have two extra
nodes this time. One of them is this
conditioning node that we use instead of
the one for the negative prompt. And the
other one is this model sampling node.
Since we are using a new model, we need
to download it because we do not have it
yet. The model is called Z Image Turbo. Juggernaut and Z Image Turbo are very
different types of models built with
different goals in mind. Juggernaut is
based on stable diffusion 1.5.
It uses the classic diffusion
architecture that has been used for
years. The model file itself is
relatively small, usually around 2 GB.
Juggernaut was created by the community
by fine-tuning and merging stable
diffusion models. Z image Turbo is a
newer type of model created by the Tongyi team from Alibaba. It uses a more modern
architecture designed to generate images
more efficiently. Even though Z Image Turbo is much larger in file size, it is
optimized to produce good results in
very few steps. One important difference
is how the models understand prompts.
Juggernaut relies on the classic CLIP text encoder. It understands prompts if they are short, but it often requires careful wording, sometimes keyword-like prompts. Z Image Turbo uses a more advanced text understanding system inspired by large language models. This allows it to understand prompts in a more semantic and natural way. Because of this, Z Image Turbo can often follow instructions better, even with shorter or more loosely written prompts. So, in
simple terms, Juggernaut is smaller,
very flexible, and highly compatible.
Z Image Turbo is a larger, newer model,
and smarter at understanding what you
ask for. So, we have here an all-in-one
model. And there are two types, a
smaller one, FP8, and a bigger one, BF16. It depends on your graphics card. If you can run the big one, use that one. For this first episode, I want to run it on a low VRAM card, so I will use the FP8 small version.
means it has everything it needs
included. The clip and VAE model in this
case, so we do not need to download
those models separately. That is why it
is easy to use for beginners. The models
go into the checkpoints folder and there
we can create a special folder for the
Z Image model. Also, if you want to learn
more about the model, I included an info
link here. So click on it. Now we are on
the hugging face page and you can learn
more about this specific version from
workflows to different model versions.
If we go to files, we can see different
model versions that you can try
depending on how good your graphics card
is. So let us test the small version.
Click here. Then go to comfy UI. Go to
models
then checkpoints and create a folder
called Z image. So everything is more
organized inside this folder. Place the
model. Since this is a big model, you
need to wait for it to finish
downloading. Because Comfy UI was open,
you can see that it does not appear in
the list yet, only Juggernaut. So I
press the R key to refresh node
definitions. And now we can see both
models nicely organized in folders.
First is Juggernaut and second is
Z Image. So let us select the Z Image
model. That is all for the model
download. And now we can run the
workflow. The first time you run a
workflow, it is slower because it needs
to load the model. The second time you
run it, it should be faster. For me, it
took about 10 seconds because I have a
lot of RAM and VRAM. The result looks
pretty good compared to the robots we
used to get with the SD 1.5 model. We
have much nicer details. For the image
size, I used a smaller size so it runs
faster. This model was trained with
bigger images, not like SD 1.5. So, we
can even use larger sizes like 1,600
pixels if we want. Even if you go bigger
than the size it was trained on, it does
not produce many errors like SD 1.5 did.
It just becomes a bit more diffused.
Usually, for most newer models, a good
place to start is around 1,024 pixels.
So let us say I try a landscape image
this time using these sizes. The result
looks pretty good. I like it.
Let us go back to workflows again and
open the first workflow to see what is
different and how we can recreate the Z
image turbo workflow. So we already have
the right node to load the model in this
case. So I just select the model from
the list for empty latent. This one is
used more for older models with a
different architecture. Many newer
models use a different empty latent
node. If we look at the nodes and search
for empty, we have one empty latent and
one empty SD3 latent. In this case, we
want the one with SD3. On the surface,
they look identical. It is just a
different latent representation
internally. If we make it purple, it
looks like the previous one. If you do
not have enough VRAM to run this, you
can use sizes like 768 for width and
height. I will use 1,024 pixels since it
is the most popular size and my system
can handle it. So let us delete the old
empty latent. And now reconnect the new
node. This model does not use a negative
prompt, only a positive one. So I will
remove the negative text. You can also
collapse it if you want. That way you
know not to add a negative prompt. Then
we have the settings which as you
remember are different from model to
model. If we look here, we only have
five steps. So, it can generate with
fewer steps and the CFG is one. Let us
change the steps to five and the CFG to
one. When the CFG is one, it ignores the
negative prompt. We also need a sampler
and a scheduler.
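As a quick aside, the reason a CFG of one ignores the negative prompt falls out of the classifier-free guidance formula. A minimal sketch with toy numbers (real predictions are large tensors, not three-element arrays):

```python
import numpy as np

def apply_cfg(pos_pred: np.ndarray, neg_pred: np.ndarray, cfg: float) -> np.ndarray:
    """Classifier-free guidance: push the prediction away from the
    negative conditioning and toward the positive one."""
    return neg_pred + cfg * (pos_pred - neg_pred)

pos = np.array([1.0, 2.0, 3.0])   # toy "positive prompt" prediction
neg = np.array([0.5, 0.5, 0.5])   # toy "negative prompt" prediction

# With cfg = 1 the negative term cancels out completely,
# and the result is just the positive prediction.
guided_cfg1 = apply_cfg(pos, neg, 1.0)

# With cfg > 1 the negative prediction starts to matter.
guided_cfg75 = apply_cfg(pos, neg, 7.5)
```

That is why, at CFG one, whatever we type in the negative prompt has no effect on the result.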
So, let us add the DPM++ SDE sampler.
And for the scheduler, we use beta. Let us see
what else is missing. We have this extra node called ModelSamplingAuraFlow. It has a long name. Not sure why it cannot be something simpler, but anyway, let us search for that node.
We change the shift to three and we make
the connection go through that node just
like we did with the LoRA. The ModelSamplingAuraFlow node is a special
node that modifies the model sampling
behavior before it goes into the K
sampler. It is designed to work with
models that use the AuraFlow sampling
method, which is an advanced sampling
technique used by some modern models for
better stability and quality. What this
node does is apply a sampling adjustment
or patch to the model itself. So, the
sampler works in the best way for that
model. The node takes the current model
and a shift value as inputs and outputs
a modified version of the model with the
AuraFlow sampling logic applied. The
shift parameter controls how strong that
sampling adjustment is. Changing the
shift value can subtly affect contrast,
sharpness, and how the generation
behaves internally. So, we changed the
empty latent to the SD3 version. We
added a node to shift the model values
and we adjusted the settings to work
better with the Z image model that we
loaded in our workflow. Let us run it
and see if it works. As you can see, it
works just fine and we get a nice robot.
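If you are curious what the shift value actually does: for flow-based models like this one, a common implementation remaps the sampling noise level so more of the schedule is spent in the high-noise region where composition is decided. Treat the exact formula below as an assumption; it is the shift used in the Stable Diffusion 3 family of flow models, sketched here for illustration:

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Time-shift for flow models: remaps a noise level in [0, 1].
    shift = 1 leaves the schedule unchanged; larger values push
    sampling toward the high-noise end of the schedule."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# shift = 1 is the identity; shift = 3 lifts mid-schedule noise levels,
# while the endpoints 0 and 1 stay fixed.
unchanged = shift_sigma(0.5, 1.0)   # 0.5
lifted = shift_sigma(0.5, 3.0)      # 0.75
```

This matches the description above: changing the shift does not change what the model knows, it only changes where along the denoising schedule the sampler spends its effort.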
If we look at the previous Z image
workflow, you can see that it does not
use a negative prompt, but instead it
has a conditioning zero out node. So let
us go back to our workflow and search
for that conditioning node. As I
mentioned before, this model does not
use a negative prompt. So you might
wonder why we do not just delete the
node. We could do that, but then we
would have a missing input and the
workflow would throw an error. To fix
this, we use the conditioning zero out
node. You can make space for it and
place it between nodes if you want. This
conditioning does not come from clip
like the negative prompt did before. We
connect it directly to the negative
input on the K sampler. You can place it
wherever you want to make the
connections clearer, but I like to put
it under the positive conditioning to
save space. The conditioning zero out
node does exactly what its name
suggests. It removes the influence of a
conditioning input without breaking the
workflow. In simple terms, it takes a
conditioning signal, usually text
conditioning, and replaces it with a
neutral zeroed version. So the model
still runs normally, but that
conditioning contributes nothing to the
generation. Why does this exist, and when is it used? In diffusion models, the
sampler always expects both positive and
negative conditioning inputs. If you
want to remove or disable one side, you
cannot just unplug it. That would break
the workflow. Conditioning zero out is a
safe way to say use conditioning but
make it have no effect. So if we run the
workflow, everything works fine without
any errors. Now the good part about Z
image turbo is that it is very good at
realistic images but it is also very
good at understanding prompts. For
example, if I want to create a portrait
of a cat with a hat, I can easily get an
image like this. But you can also create
more complex prompts by using a large
language model. Maybe you use ChatGPT, Gemini, or even a local LLM. I will use ChatGPT for this example. I ask it for
a detailed photo prompt and give it the
details of what I want. ChatGPT then gives me a long detailed prompt that I
can copy and paste directly into Comfy
UI. So, let us test it again. Now, we
get a different cat, but it is still a
bit too simple. Let us make it more
complex. I go back to ChatGPT and ask
for the cat to hold a rose in her mouth
and wear a t-shirt that says Pixa.
Again, we get a long detailed prompt.
And from that prompt, we get this image.
Sometimes the model can take things very
literally. So, you need to explain
details clearly if you want more
control. For example, you might need to
mention that you want a full rose held
horizontally in the mouth and not
something else. Let us create something
different now. This time a cartoon bunny
since this series is full of bunnies.
Anyway again we get a nice prompt and
the result looks like this. It is pretty
cute. Maybe now I want the bunny to be a
ninja. Let us see what this prompt
generates. And we get our ninja bunny.
If we generate again we get another one.
As you can see compared to older models
the results with different seeds are
quite similar. You do not get a huge
variation from seed to seed. That is why
I recommend using longer prompts and
adjusting each prompt carefully. This
model is very good at following
instructions. So the more precise you
are, the more control you get over the
result. Let us open the first workflow
again so we can compare it with workflow
5A. Now let us say I use the same long
prompt and the same fixed seed for both
workflows. If I generate with Z image, I
get a robot like this one which looks
nice and detailed. Now if we try the old
juggernaut model using the same prompt
and the same fixed seed, the result
looks like this. It is smaller and much
less detailed. Let us copy this image
and paste it into this workflow. So you
can clearly see the difference in
quality and also how well the image
follows the prompt. But maybe this
single test is not enough to fully see
the difference. So let us try something
else. Let us test text generation. Newer
models can generate readable text, but
older models usually cannot. We normally
put the text we want inside quotes. So,
let us test that. Look at this result.
It looks very good. And it understood
the assignment.
Now, let us go back to the juggernaut
model and use the same prompt. We get
something like this. What is this? What
does it even say? Gold gola or something
like that. It clearly cannot do text.
Let us go back to Z image and try
another test. A red sphere on top of a
green cube placed on a black car.
We get this realistic result. Z image is
more specialized in realism, but it can
also do 3D paintings and other styles.
Now let us see what Juggernaut does with
the same prompt. It gets the red sphere
since that was mentioned first and then
it gets lost and forgets what it needs
to do next. So clearly Z image is a very
good model to have and you will probably
spend more time playing with this model.
Still keep an eye on new models because
they keep getting smarter and better as
they get more training. You have now
seen how an all-in-one model works and
how we load checkpoints. In the next
chapter, we will use models that are
split where clip and VAE are loaded
separately so we can have more control.
Let us talk a little bit more about
diffusion models. Open Comfy UI and then
open workflow 5A and also workflow 5B so
we can compare them.
In the first workflow, Z image is loaded
as an AIO model. AIO means all-in-one.
You can see that we used a load
checkpoint node to load that model. The
checkpoint already contains the
diffusion model, the text encoder, and the VAE. Everything is bundled into a single
file. Advantages:
Very easy to use, fewer nodes, less
setup required. Good for quick testing
and simple workflows. Disadvantages:
Less flexible. You cannot swap the text
encoder. You cannot change the VAE.
Harder to customize or optimize. This
format is designed for simplicity and
convenience. Now, let us check the
second workflow, the 5B version. You can
see that we have three nodes now instead
of one. We have the load diffusion model
node that loads the main model. Then we
have the clip load node that loads the
text encoder. And then we have the load
VAE node that loads the VAE. So it is
like we split the previous checkpoint
into separate pieces. And now we have
more flexibility. Even though the final
result is still Z image turbo, the
pipeline is modular. Advantages: more control. You can change the text encoder and experiment with different VAEs.
Better for advanced workflows and
optimization and easier to update
individual components. Disadvantages:
more complex setup, more nodes, higher
chance of misconfiguration if you do not
fully understand what each part does.
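Conceptually, an all-in-one checkpoint is one big collection of named weights, and a modular setup just keeps those groups in separate files. A toy sketch of the idea (the key prefixes here are made up for illustration, not the real names inside a checkpoint):

```python
# Toy "checkpoint": weight names grouped by component prefix.
checkpoint = {
    "diffusion.block1.weight": [0.1, 0.2],
    "diffusion.block2.weight": [0.3],
    "clip.embed.weight": [0.4],
    "vae.decoder.weight": [0.5],
}

def split_checkpoint(state: dict) -> dict:
    """Split a bundled checkpoint into its components by key prefix."""
    parts = {"diffusion": {}, "clip": {}, "vae": {}}
    for name, tensor in state.items():
        prefix, rest = name.split(".", 1)
        parts[prefix][rest] = tensor
    return parts

parts = split_checkpoint(checkpoint)
# Each part could now be saved and updated independently,
# which is exactly the advantage of the modular workflow.
```

In the modular workflow, ComfyUI simply loads those three groups from three separate files instead of one bundled file; the weights themselves are the same either way.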
However, this is actually one of my
favorite workflows. The reason is
flexibility and efficiency. With a
modular setup like this, you save disk
space. For example, this VAE is the same
VAE used by the Flux model. So, if I
already use Flux, I do not need to
download the VAE again. With an
all-in-one model, every new version
means downloading the entire model
again, even if only one part changed. In
a modular setup, I can update or swap
individual components. I can test
different text encoders without
downloading the main diffusion model
again. So while modular workflows
require more understanding, they are
more efficient, more flexible, and
better for experimentation.
That is why I personally prefer this
approach. But we did not download these
models yet. I suggest that when you
follow this tutorial, you test
everything to see what is better or
faster on your computer and then keep
only the ones you like. There is no
point in keeping all types of models if
they do the same thing unless you have a
lot of space on your hard disk. So let
us start with the main diffusion model.
This long name is actually describing
how the model is built and optimized. Z
image turbo. This is the model family
and architecture. FP8. This means the
model uses 8-bit floating-point
precision. FP8 models use much less
memory than FP16.
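The memory difference is simple arithmetic: every weight is one number, and the format decides how many bytes that number takes. Using a hypothetical six-billion-parameter model as an example (the parameter count here is an assumption for illustration only):

```python
def model_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate file size: parameters x bytes per parameter."""
    return num_params * bytes_per_param / 1e9

params = 6e9  # hypothetical 6B-parameter model

fp32 = model_size_gb(params, 4)  # 32 bits = 4 bytes per weight
fp16 = model_size_gb(params, 2)  # 16 bits = 2 bytes per weight
fp8 = model_size_gb(params, 1)   #  8 bits = 1 byte per weight

# FP8 needs roughly half the memory of FP16
# and a quarter of FP32, for the same weights.
```

Real files are a bit larger than this estimate because of metadata, but the ratio between formats is what matters when choosing a version for your VRAM.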
I did not include a link in this
tutorial for the FP16 version, but you
can find those online if you have more
VRAM and want to try them. Scaled refers
to the FP8 format being calibrated for
better precision. This improves quality
and stability compared to a raw unscaled
FP8 format. You can think of it as FP8
with tuning for better accuracy. E5M2.
This is the specific FP8 encoding
variant used. KJ. It is usually a
variant tag or builder ID added by the
person or team that exported or
repackaged the model. It does not change
the model itself. It just helps
distinguish between different builds.
Safetensors. This is the file format. Safetensors is a safe and efficient
format and is recommended over older
formats like CKPT for better stability
and speed. We can download this model
from here. And I also added more info
about the model so you can check
different versions. You can also see the
author. So now you know what that KJ in
the model name stands for. So let us
click here and see where we place it.
Navigate to the comfy UI models folder.
You should already know this by now.
This time we do not use the checkpoints
folder because that is usually for
complete models that already include
most of what they need. Instead we place
this one in the diffusion models folder.
To keep things organized, we create a
folder called Z Image and place the model
inside. Next, we have the text encoder.
I used one recommended by ASD from the
Discord server, but there are other text
encoders you can try made by different
people. For this one, we again go to the
models folder and this time we place it
in the text encoders folder. Here I do
not create a Z Image subfolder because
many text encoders work with multiple
models. I usually create subfolders only
for main models, LoRAs, and ControlNets
when it is important for the workflow
that they match the same base model.
Then we have the VAE. This is the same
VAE that we might also use later for the
flux model. So again we go to the models
folder and this time we place it in the
VAE folder. Some of these models are
large so wait for them to finish
downloading. Once everything is done,
press the R key to refresh the node
definitions. Now let us check that all
models are visible and selected
correctly. The Z image diffusion model
is here. The clip text encoder is here
and the VAE is also here. That means we
have everything we need to run this
workflow. So let us click run and see if
we get any errors. Everything works fine
and we get this image. What I usually do
next is compare the results with the
first workflow. When I have multiple
models available, I download all of
them, test them, keep the ones I like
the most, and delete the rest. When I do
testing, it can get confusing which
model generated which image. So, here's
a small trick. Double click on the
canvas, search for iTools add, and
select the node called iTools add text
overlay. This node comes with the easy
installer, but if you have a different
Comfy UI version, you can install the
iTools nodes from the Manager. We add
this node right after VAE Decode and
before the save image node. This way,
the final image goes through this node.
The text overlay is added and then the
image is saved to disk. For example, we
can add the model type in the text
overlay. You can also add more text like
the model name or other info. Let us say
I add FP8 scaled diffusion. So I know
this image comes from this workflow. Now
when I run it, text will be added on top
of the image. We can control the text,
the background color, the font size, and
whether the text overlays the image or
is placed under it. Let us disable
overlay mode and try again. Now the text
is under the image. This way we know
exactly which model generated it. Next,
I select this node and press Ctrl + C to
copy it. Then I go to the first workflow
and press Ctrl +V to paste it. Now we
connect the node the same way as before.
We need a name that represents this
model. So let us name this one FP8
all-in-one.
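A related bookkeeping trick, if you ever script your own model comparisons outside Comfy UI, is to encode the model label and seed into the output filename instead of overlaying text. This is just a plain-Python sketch of the idea, not part of any node:

```python
from datetime import datetime

def output_name(model_label: str, seed: int, ext: str = "png") -> str:
    # Put the model label and seed in the filename so you always know
    # which model and seed produced which image
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    safe = model_label.replace(" ", "_")
    return f"{stamp}_{safe}_seed{seed}.{ext}"

print(output_name("FP8 all-in-one", 50))
```

The timestamp keeps files sortable in generation order, and the label survives even if you later move the images out of the output folder.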
Now I can test it and you can see the
text under the image. To make a fair
comparison, we use the same settings for
both workflows. Let us also enable the
bottom panel to see how much time it
takes to generate. As you remember, the
first time you run a model, it is slower
because it loads the model. We can
unload the models using this button and
clear the cache using this one. This
lets us compare which model loads faster
and which one generates faster. I run it
once and you can see the first run took
around 8 seconds. The second and third
runs are faster, around 3.57 seconds. Now
let us go to the second workflow.
I unload the models and clear the cache.
Then run it again a few times.
This one loads slower, but the second
and third runs are faster. On my older
PC, the all-in-one model was faster, so
it really depends on your system. Test
it yourself and see what works best for
you. Now, let us look at quality. We use
a fixed seed with a value of 50 and run
the workflow. Then we do the same for
the first workflow, same fixed seed and
run it.
Right click on the image result and copy
the image. Then create a new workflow
where we compare the two images. I press
Ctrl +V to paste the image and you will
see it adds a load image node with that
pasted image. I do the same for the
image from the second workflow. Now we
have both results. Let us add an image
compare node so we can compare them.
Connect the first image to image A and
the second image to image B. Then run
the workflow. Now we can enlarge and
compare them. The results are quite
similar but still slightly different.
This happens because I used a text
encoder that is different from the one
included in the all-in-one workflow. If
I had used the same text encoder, the
results would have been much more
similar. Let us try again with a
different prompt. Maybe we do a portrait
photo of an old woman. We get a result
like this one. Now let us do the same
for the second workflow and we get
another woman for this one. Let us copy
both images and go to the compare
workflow. Select the load image node and
use Ctrl +V to paste the image into that
node. Now we can compare the two
results. Again, because the text encoder
is different, the comparison is a bit
harder. Still, I kind of like the FP8
scaled version more. You can see that we
use the same settings for both
workflows. One has everything included
and the other has everything separated.
If I searched for the same clip used in
the AIO workflow, I could get much
closer results. This load clip node is
something we will use in other workflows
as well. As you can see, it has a type
option that lets you select different
types of models to match the diffusion
model you loaded. Do not stress too much
if you do not understand everything yet.
It will make more sense as you practice.
If we look at the VAE, you can see where
it goes. As you remember, we use it to
connect to nodes like VAE decode and VAE
encode. If we go back to the first
workflow, that VAE is coming directly
from the one included with the main
model. Different model formats do not
change what an AI knows. They only
change how that knowledge is stored. So
choosing the right format is about
balancing quality, speed, memory, and
flexibility for your hardware and
workflow. GGUF stands for GPT generated
unified format and it is a model format
designed to run large models efficiently
on systems with limited memory. In Comfy
UI, let us go to workflows again and
this time open workflow 5B and 5C so we
can compare them. The workflow we saw in
the previous chapter had three nodes.
Load diffusion model, load clip, and
load VAE. And you can see that it was
loading safe tensors files. If we go to
the GGUF workflow, you can see that some
nodes are different. We now have a UNET
loader that has GGUF in the name and the
file format is GGUF instead of safe
tensors. We use this node to load GGUF
type diffusion models. For the clip, we
could have used the previous node to
load an existing text encoder, but I
wanted to show that you can also use a
clip loader GGUF node to load text
encoders in GGUF format. The last node
is the same load VAE as before. So,
compared to the previous workflow, we
only changed two nodes so we can load
GGUF models, but we do not have those
models yet. So let us go to the notes
and check which node loads which model
and also look at the download links. If
we go here you can see there are many
GGUF model versions. Most of the time
you will see something with a Q version
in the name like Q2, Q4, Q6 or Q8. Most
of the time I use Q8 models. If that is
too big, I switch to Q6. And if that is
still too big, I use Q4. The lower the Q
number, the lower the quality of the
generation, but the models are smaller
and can be faster on limited hardware.
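As a rough rule of thumb, a quantized model's file size is about the parameter count times the bits per weight. Here is a small sketch; the 6-billion-parameter figure is only an illustrative assumption, not the official Z image size:

```python
def approx_size_gb(n_params: float, bits_per_weight: int) -> float:
    # bits -> bytes (divide by 8), bytes -> GB (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical 6B-parameter model at common quantization levels
for bits in (4, 6, 8, 16):
    print(f"Q{bits}: ~{approx_size_gb(6e9, bits):.1f} GB")
```

Real GGUF files are a bit larger than this estimate because some layers are kept at higher precision, but it explains why Q4 is roughly half the size of Q8.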
Let us look at what this model name
means. Z image Turbo is the core model
family and variant name. Q4 means the
model is quantized to four-bit
precision; lower-bit quantization
reduces file size and VRAM usage. The K
indicates a specific quantization
method, usually a block-based K-quant
method, which helps preserve model
accuracy even at low bit precision. The
S usually means a small or standard
variant within that quantization type;
it trades a bit more quality for a
smaller footprint compared to M
versions, which stand for medium. GGUF
is the file format. Let us download this
model and give it a try. We go to the
Comfy UI folder, then to the models
folder. This main model, just like in
the previous workflow, goes into the
diffusion models folder. Since we
already have the Z image folder from the
previous chapter, I will place it in the
same folder because it is the same base
model just a different quantization. So,
we save it there. Now, let us do the
same for the text encoder. We click here
to download it. Then again go to the
models folder, find the text encoders
folder and place the model there. For
the VAE, if you followed the previous
chapters, you should already have it. If
not, download it and place it in the VAE
folder. These models are big, so wait
for them to finish downloading. After
the download is finished, press the R
key to refresh node definitions so Comfy
UI can see the new models. Now we go
back to the nodes and make sure we can
select the models. The Z image model is
there. The text encoder is also there. I
am using the one with GGUF in the name
because if you use a safe tensors
version here, even if it does not give
an error, the results will not be what
you expect. For the VAE, we already have
it. So now we have everything we need.
Let us run the workflow. The result
looks pretty good for a Q4 version. Let
us open the bottom panel and run it
again. You can see that the first time
it loads the model, it is slower, but
after that it takes around 5 seconds to
generate. In my case, this was slower
than the all-in-one model or the FP8
scaled version. That does not mean it
will be the same on your system. On some
systems, it might be faster. That is why
I keep saying you should test everything
and then keep the best model for your
setup. What is best for me will not
necessarily be best for you because we
have different video cards, different
VRAM amounts and probably different
drivers. Now I am curious how a larger Q
version will perform. So let us go back
to the model list. This time I want to
test a bigger one. The biggest available
here is Q8 which is around 7 GB in size.
I have 24 GB of VRAM so I can easily fit
this model in memory and even larger
ones. Sometimes if a model is larger
than your available VRAM, it will be
slower because it tries to load the
model in parts. You lose time during
that process and generation can be slow
or it can even crash Comfy UI and force
you to restart it. So let us download
this one. We place it in the diffusion
models folder inside the Z image folder.
right next to the Q4 version. Again,
wait for it to finish downloading.
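The "does it fit in VRAM" question from a moment ago can be sketched as a simple size comparison. The 2 GB headroom here is an arbitrary assumption to leave room for the text encoder, VAE, and latents:

```python
import os

def fits_in_vram(model_path: str, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    # A model larger than available VRAM (minus working headroom) gets
    # loaded in parts, which slows generation down or can crash Comfy UI
    size_gb = os.path.getsize(model_path) / 1e9
    return size_gb + headroom_gb <= vram_gb
```

For example, a 7 GB Q8 file fits easily in 24 GB of VRAM, while on an 8 GB card it would be a tight squeeze once everything else is loaded.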
Luckily, I have a fast internet
connection. After that, press R to
refresh. So now in the UNET loader, we
can see both models. By the way, UNET is
the main neural network inside a
diffusion model that predicts what noise
should be removed at each step to turn
random noise into an image. First let us
change the seed to fixed so we can
compare the models properly. I get this
image for Q4. I copy the image, create a
new workflow and paste it there. I will
rename the node to Q4 so I know which
model was used. Now let us go back to
the workflow and select the Q8 model.
Everything stays identical. Only the
model changes. Let us see what we get.
It looks similar at first glance. I copy
this image, go back to the new workflow,
and paste it there as well. I rename
this one to Q8.
At first glance, the Q8 version seems to
have fewer mistakes and looks clearer.
Let us add an image compare node to
compare them properly.
Connect the two load image nodes to the
image compare node.
The first image shown is image A, which
is the Q4 version. As we move the cursor
to the right, we see the Q8 version. In
my opinion, Q8 has better details and
fewer errors. For example, some bolts
seem to be missing in the Q4 version,
while the Q8 version looks more
complete. In most cases, Q8 will be
better than Q6 and better than Q4 in
terms of quality. But now, let us check
the speed.
The first time Q8 took longer to load
because it is a 7 GB model. Let us
change the seed and try again. Now the
second run takes under 4 seconds. Let us
try once more. We change the seed again
and once more it takes under 4 seconds.
Now let us switch back to the Q4 model.
This one is lower quality but also
smaller. You can see that the first time
it loads faster. Let us change the seed
and try again. The second run takes more
than 5 seconds. Let us try one more
time. And again, it takes more than 5
seconds. This is why I keep saying you
should test all of them and then decide.
For me, Q8 is faster and gives better
quality than Q4, but that is because my
video card probably works better with
that quantization. On your system,
especially if you have an older card, it
might be the opposite and Q4 could be
faster. So please test them yourself and
then keep the one that gives you the
best quality and the best speed on your
system.
So let's go to workflows again. And now
it might make sense why I named all
three of these workflows workflow 5:
they are workflows for the same model,
just different model types. So let's open
workflow 5a and you will see how we can
adapt the workflow. Let's move this to
the side. So, this has an all-in-one
model with everything included. We want
to change it into a workflow where the
models are split. So, let's start with
the model. Instead of load checkpoint,
we search for load diffusion model. This
one only loads the model without clip
and VAE. And we select the Z image model
from the list. Then, we need a node that
has that clip output that loads the text
encoder. So, we search now for a node
called load clip. Let's make it bigger
so we can see the parameters.
We first select the text encoder. Then
we select the type. Z image uses
Lumina 2. Lumina means light. You can
think of it like reaching the end of a
tunnel. Z is the last letter of the
alphabet and at the end you see the
light. Lumina 2 represents a newer,
clearer way for the model to understand
prompts and guide image generation. It
simply means more advanced guidance
compared to older models. Then what is
left is the VAE. So, we use the load VAE
node and we select that VAE model. So,
now all that's left to do is to redo the
connections. We drag a link from model
to model. Pretty easy, right? Now, we
need the clip. So, let's drag another
link. And all that's left is the VAE.
This one will connect to the VAE decode
node. And if the workflow is image to
image, it will go to VAE Encode also.
Now, we can get rid of the load
checkpoint node. So now we successfully
replaced all the models and basically we
have the workflow version 5B that we
used before. So let's run it and it all
works okay as it should. Let's say the
model we use now is too big and our
video card doesn't have enough VRAM.
Then we can try a GGUF model to see if
it works faster or better. So let's
search for a node again and this time
search for the UNET loader, the one that
has GGUF in the name. So in this node we
can select a GGUF model. You can see I
downloaded two versions before. So let's
say Q4 is smaller in size than the FP8
version in this case. So it has better
chances to run faster than a bigger
model. But as you saw before on my
computer Q8 was faster. So maybe I will
use that to get better quality instead.
Let's connect the model. And now we can
remove that node. So we replaced an FP8
safe tensors model with a GGUF version
and if we run this workflow you can see
it works just fine and we got a nice
result. If for some reason you are not
happy with the text encoder maybe it is
not so accurate or it is too big we can
try a GGUF version of the text encoder
also. So let's delete that node and
let's search for clip loader, the GGUF
version. You can see it has clip loader
as one word. So now we can select the
GGUF model. And of course we need to
adapt the type since it is not stable
diffusion. It is Lumina 2 instead.
Remember that light at the end of the
tunnel and then link the clip to text
encode prompt. And basically now we have
the workflow 5C. So you saw that having
a modular version allows you to change
models and have more freedom just like
on your computer. If you're not happy
with a mouse or your printer, you can
change it with a smarter or faster
version. Now, if you do have enough
VRAM, you can try to increase the size
for width and height to get more
details. For example, at this size, I
got this image and now we can see more
details on those cables and overall. But
usually for Z image, I use values
between 1024 and 1280 pixels. So, at
the moment of this recording, Zimage is
a pretty good model to have. It is free
and you can generate all kinds of stuff
with it. Let's compare a few of these
models to see what the difference is.
So, for this one, I compared the FP8
all-in-one version with the FP16
all-in-one version, which is double the
size. The results are quite similar.
Maybe the FP8 is a little more
desaturated compared to FP16 and FP16
might be a little bit clearer, but it is
not a huge difference. Both are good
quality. For the Viking image, the FP8
version has fewer details in some areas.
In FP16, it added some extra things like
more ornaments. Again, FP8 looks a
little more desaturated. For the bunny,
both look good. So, I would say if FP8
is faster, has half the size, and the
results are very close, you can get away
with FP8 and keep that. Now, let's
compare the FP8 version with the FP8
scaled version. Keep in mind that the
text encoder is also different in this
case compared to the one included in the
first model, but the results are still
quite similar. Sometimes FP8 does it
better, sometimes the scaled version
does it better. So if you do more tests,
you can decide which one is better for
you. Since the results are very close,
again, it makes sense to keep the one
that is faster on your system. Now,
let's compare the FP8 version with the
GGUF version. Instead of the Q4 version,
I will use the Q8 version downloaded
from here so we can see the difference.
For the portrait, it looks a little
clearer on the Q8 version. For the
Viking, some details are more defined on
the Q8 version. For the Bunny, it is
pretty similar for me. For other models
we will test in the future, the
difference might be bigger and more
obvious, but in this case, the
difference is quite subtle. So, which
one will I keep for my video card? Maybe
the FP8 scaled or the Q8 version, mainly
because they are modular and I can save
space and time when I use the same
models for other workflows. In this
chapter, we explore batch generation and
styles. So, let's open another workflow,
this time workflow 5A, since it has fewer
nodes and you can see things better. But
the methods I show work with any
workflow. Right now, each time you press
the run button, the workflow runs once
and you get a single image. But what if
you want more images and you do not want
to click run every time? In this node,
we have an option for batch. By default,
it is set to one. You can change that,
but keep in mind it will use more VRAM
because it is like running multiple
workflows at the same time. If your
video card can handle it, it will be
faster than generating one image at a
time. So now we get two images. If I
change it to four and run again, we get
four images. If we toggle the bottom
panel, we can see the time it took for
one image, for two images, and for four
images. If we multiply 3.77,
which is the time for one generation, by
four, we get over 15 seconds. But
because we used batch, it only took 13
seconds. So, you need to see what batch
size works best for your video card. I
might be able to use a bigger batch, but
you might need a smaller one. Now, from
these four images, we can click on any
of them to open it bigger. These images
are saved in the output folder as well.
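Writing out the batch timing arithmetic from above makes the trade-off concrete. These are my numbers; yours will differ:

```python
per_image = 3.77              # seconds for one generation on my system
sequential = per_image * 4    # four separate runs, one after another
batched = 13.0                # observed time for a single batch of four

print(f"sequential: {sequential:.2f} s")  # 15.08 s
print(f"batched:    {batched:.2f} s, saving {sequential - batched:.2f} s")
```

The batch wins because the model only has to be set up once for all four images, at the cost of holding four latents in VRAM at the same time.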
To close the big preview, you can use
the X in the top right. Let's open
another one. You can also navigate using
the buttons in the bottom right corner,
so you can check all generations. The
bigger the image, the more VRAM it will
need. Let's say I set the batch to
eight. Since the image size is quite
small, the result is eight images. You
can check the results and pick your
favorite. You can right-click on an image
and save it in any folder you want. Now
let's change the batch back to one. So
we only get one image. Next to run, we
have an arrow that shows multiple
options. Here we also have batch count.
This is not the same as the batch we
used before. Think of this like a
counter where you tell it how many times
to run the workflow. So if I set the
value to four and hit run, it will run
once, then again, and again until it has
run four times. This is a bit slower
than the previous batch method, but it
uses less VRAM. If we add these values,
we get over 14 seconds. With the batch
and empty latent, we got 13 seconds. You
might say 1 second is not much, but if
you use bigger images and longer
workflows, seconds can quickly turn into
minutes. Let's change the batch back to
one. And let's explore more run options.
Run on change will run the workflow when
we change a value. So if I change the
seed, it will start running. It should
stop after the run. But I am not sure if
this is a bug or if this is how it is
supposed to work. Because the seed is
random, it keeps generating continuously
after the first change. But if the seed
is fixed, it only runs the workflow when
I make a change and then it stops. So I
will stop it manually by switching back
to run. If I change it to run instant,
it will generate forever until you stop
it. So do not forget to stop it by going
to the arrow and selecting run. After it
finishes that workflow, it will stop.
But what if we want to run multiple
prompts? Until now, we only had one
prompt and the seed was different. But
for the Zimage model, for example, the
seed variation is not that big compared
to other models like flux or stable
diffusion. So let's search for a node
called iTools line loader. This node
loads each line as a prompt. If we drag
a link from this line loader output and
connect it to the text encoder, a small
dot will appear in the top left corner
of that text input. By default, you have
three prompts here, cat, dog, and bunny.
Let's say the first prompt is a cat
photo. The second prompt is a bunny with
a flower. And the third prompt is a lion
logo. We have a seed here that decides
which prompt will generate. And we also
have control after generation. Randomize
means that after each generation the
seed will change to a different random
value. So let's run it. We got a cat and
now we have a different seed.
For this seed we got a bunny. Now let's
change the seed to fixed so we can
understand better how this works. For
the seed we put zero. In computer
programming lists usually start with
zero, not with one. So instead of 1, 2, 3,
it is 0, 1, and 2. So 0 corresponds to the
cat prompt. If I run the workflow, I get
a photo of a cat. If I change the seed
to one, it corresponds to the second
prompt, which is the bunny. So the
result is a bunny. And for seed 2, we
get a lion logo. Now we know the order
in which this node uses the prompts. Can
you guess what we will get for seed 3?
It will start over with the first
prompt. So the result will be a cat
photo. Let's add another prompt like a
rose and maybe a house with a car in
front. I will start with zero so it
starts with the first prompt. Then for
control after generate I will use
increment so it starts with the first
prompt and continues with the next one
and so on. This way it is more
controlled and not random. Now that we
have five prompts I can change the batch
to five so it runs the workflow five
times. You can see it will generate all
those images one prompt at a time in
order and it will stop after five
generations.
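The way the line loader turns the seed into a prompt choice behaves like zero-based list indexing with wrap-around. A minimal sketch, assuming simple modulo behavior, which matches what we just observed:

```python
prompts = ["a cat photo", "a bunny with a flower", "a lion logo",
           "a rose", "a house with a car in front"]

def pick(seed: int) -> str:
    # Zero-based index; seeds past the end wrap back to the first prompt
    return prompts[seed % len(prompts)]

print(pick(0))  # a cat photo
print(pick(2))  # a lion logo
print(pick(5))  # wraps around: a cat photo
```

This is also why setting control after generate to increment walks through the list in order: each run bumps the seed by one, which bumps the index by one.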
You can also put 10 if you want to get
two generations for each prompt or you
can let it run continuously and stop it
when you get something you like. We saw
how we can load prompts line by line,
but we can also load prompts from a text
file. Let's search for iTools prompt
loader. As the name says, this node
loads prompts from a file. Let's drag a
link from it to the positive prompt. You
can see here it says file path. We
already have an example with a
prompts.txt file. So let's run it. It
will pick a random prompt from that
file. And the result is this cat. Now
let's find that prompts text file. Let's
go to the Comfy UI folder. Then go to
custom nodes. And here look for the
Comfy UI iTools folder. These are all the
files used by that node. Basically, the
node itself looks like this. If we go to
the examples folder, we have a text file
with prompts. Let's open it. You can see
we have a few prompts here. And the
image that was generated corresponds to
one of these prompts. You can delete
everything and add your own prompts here
one by one. Or you can ask ChatGPT to
generate a bunch of prompts. So maybe I
will add one for a dog, maybe one for a
cat, and one for a rose. Now I can save
that file and close it.
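Loading one random prompt per run from a text file, the way the prompt loader node does, can be sketched in plain Python. This is an illustration of the behavior, not the node's actual code:

```python
import random
from pathlib import Path

def random_prompt(path: str) -> str:
    # One prompt per line; blank lines are ignored
    lines = [line.strip() for line in
             Path(path).read_text(encoding="utf-8").splitlines()]
    lines = [line for line in lines if line]
    return random.choice(lines)
```

Because each run picks a line at random, running the workflow repeatedly samples your whole prompt file without you having to edit the prompt node each time.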
Let's go back to Comfy UI and generate
again. It should pick prompts from the
same text file. Now we got that cat
playing with a mouse. Let me remove this
node and try again to see if we get
another prompt. And now we got a rose.
If the text file is in a different
location, you just add the path to that
file here so it knows where to load it
from. And of course, you can change to
run instant and let it run, then stop it
when you have enough images generated.
Let's delete this node and I will show
you more things you can do. Let's search
for a node called iTools prompt styler
and select this one. This node picks
prompts or art styles from a file. We
have positive and negative prompts. But
since our workflow only uses the
positive prompt, we drag a link from
there and connect it to the positive
prompt input. Here we have an area where
we can type our prompt. Let's say I type
a white bunny holding a rose. Then we
can select the style file. These files
that contain different prompts are
stored locally. Let's go to the custom
nodes folder again, then to iTools,
and this time go to styles. You can see
here a few example style files, which
are actually YAML files. If we go to
more examples, we have even more. Now,
if we look back here at the file list,
we can see exactly those files. Among
them, there is one called Pixaroma. I
asked the creator to add my file there
so you can access it easily. Thanks,
Mikotti. Once you have the file
selected, you can choose a template from
that file. You can see here different
templates. For example, I can select a
3D icon or something else. What is
important to remember is that you select
the file first and then the template
inside it. For example, let's open one
of these files with Notepad so we can
see what is inside. Each template looks
like this.
You have the template title, the
negative prompt, and the positive
prompt. As you saw in our workflow, we
only use the positive prompt this time.
So, it will only pick that part. In the
positive prompt, you can see the word
prompt inside brackets. That is where it
takes your prompt and combines it with
the rest of the template prompt. So, if
I do not have anything selected here for
the template, it will use something like
landscape photography of the prompt, and
instead of the word prompt, it will
insert a white bunny holding a rose.
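What the styler does with that placeholder can be sketched in a few lines. The template wording here is made up for illustration; the real YAML files have their own phrasing:

```python
template = "landscape photography of {prompt}, golden hour, highly detailed"

def apply_style(user_prompt: str, template: str) -> str:
    # Swap the placeholder for the short prompt you typed in the node
    return template.format(prompt=user_prompt)

print(apply_style("a white bunny holding a rose", template))
# landscape photography of a white bunny holding a rose, golden hour, highly detailed
```

The template carries the style language, and your short prompt only has to describe the subject.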
Basically, we recreated what these
styles do. This system saves you time by
letting you write a short prompt and
combine it with a ready-made prompt from
a template. This was created back in the
day for stable diffusion models when we
did not have access to AI prompt
generators. It still works today with
most models that recognize these prompts
even though you have much more freedom
using a custom prompt made with
ChatGPT. So if the prompt is a white bunny
holding a rose and for the file I select
the Pixaroma file, then for the template
I can filter by landscape and select
photography landscape. Now when I run
it, it should combine my bunny prompt
with the landscape photography prompt.
And the result is this one. So
everything works quite nicely. Let's
change the template. Let's say I select
3D icon and run the workflow. And we get
this 3D icon of a bunny holding a rose.
Now let's try an ancient Egyptian mural.
We run it again and we get this mural.
And you can clearly see our bunny in the
image. Let's say I select the Rococo art
style. The result is this decorative
style illustration of the bunny. Now
let's open the Pixaroma styles file
with Notepad. You can see all the
templates and prompts for each style and
you can edit them if you want. Just keep
the same format. Otherwise, it will not
work. Let's say I want to use the
template for surreal toy. If I use it in
the workflow, it will take this prompt
and replace the word prompt with my
bunny holding a rose. So, let's test it.
From the templates, I search for surreal
and select that toy style. Now, let's
run the workflow. And we get this 3D
surreal bunny with a rose. Pretty cool.
Let's scroll down and see what else we
can use. Let's say Afrofuturism art.
That means it will use that specific
prompt. Let's change the style and test
it. And the result is this one. Keep in
mind each model will interpret these
prompts differently depending on how it
was trained. There are over 300 styles
or prompts saved in this file from 3D to
art styles, painting, photography,
design, all kinds that I use most often.
Let's say I select the vector coloring
book page style. Now, when I run it, I
get this clean coloring page design. Of
course, if you want it to be more
unique, give more information in the
prompt, like how the bunny looks, how it
is dressed, how the environment looks,
and maybe make it fit your story. Let's
say I want to do a cartoon illustration.
Let's search the list to see if we have
something like that. For example, I can
select a soft 3D cartoon environment and
see what we get. And the result is this
one. Let's search for cute and test this
cute cyberpunk style. And we get this
illustration. These are good for
discovering art styles you might not
have thought to try yet. Now, let's
remove this node and search again. This
time, we look for iTools prompt styler
extra. It is called extra because it has
slots for multiple files and templates.
Let's connect it to the positive prompt.
For the base file, let's select the
Pixaroma file since that one has the
most styles. For the second file, I will
use the same one. Let's set both to
random. So we get a random combination
of two styles. If I run it now, I get
something like this. There is no bunny
because we added a new node and we did
not add a prompt yet. Let's drag a link
from the output called used templates.
This outputs the actual styles that were
used. Then search for preview and add a
preview as text node. Now when I run it,
you can see what styles it combined.
Reflection with fantasy. Now, let's add
the prompt, "A white bunny holding a
rose," and generate again. This time, it
combined propaganda art style with
knitting art, something you probably
would not think to combine. Let's select
a third file, again, the Pixaroma file. For
the third style, let's select random or
any other style you want. Now, if we
look again, it combined Japanese
traditional sticker and fine art. Let's
run it once more and we get this gilded
fantasy bunny. Pretty cool. Let's try
again. And this time we get a cute
minimal line art style. Of course, you
can also manually select which styles to
combine. For example, let's choose a
game asset style combined with low poly.
And for the third one, select Atompunk.
And the result is this one. You can also
run it multiple times to get different
seeds. By now, you should start to get
an idea of how styles work. Let's try
one last combination. Change low poly to
a steampunk style. We get this image
because we used a game asset style. If I
change the game asset to cute cartoon, I
get this cute bunny in a steampunk
environment. So, create your own styles
for the things you use most often. Or
use ChatGPT or other large language
models to generate longer prompts that
describe exactly what you need. In the
previous chapter, we saw how we can use
different prompts to change the style of
the image. But if we want a style that
the model did not learn, we cannot
generate that style. For that, we have
the LoRA files, which add extra
information to the main model. We talked
more about this in episode 13. Let's go
to workflows, and this time, let's
select workflow number six, the one with
LoRA in the name. This is a simple Z
image text to image workflow. In fact,
if we remove these nodes, we get exactly
workflow 5A that we used before. So,
let's undo that. What is different here
is this LoRA Loader Model Only node
which allows us to load a LoRA from our
computer. I just changed the color to
blue. That is all. The node with trigger
words is just a simple note. Again,
revisit chapter 13 for more details. So,
let's go and download a LoRA. I created
a LoRA for a girl with white hair, and
you can download it from here. After
that, navigate to Comfy UI, go to
models, and then open the loras
folder. Here, we already have one from
chapter 13, the SD1.5 LoRA. Now, we
create a new folder called Zimage. Since
this LoRA only works with the Zimage
model, we save the LoRA inside that
folder. After the LoRA is downloaded,
press the R key to refresh the node
definitions.
Now, if we go to the LoRA loader, we
can select that LoRA. You can see the
folder and the LoRA name there. Just
like for all other LoRAs, this is the
trigger word that I used when I trained
that LoRA. I use that in the prompt
together with more words to describe
what I want to generate. Now, when I
generate, I get this girl with white
hair. The LoRA I am using here is a
character LoRA. There are also LoRAs
for styles, objects, or functional ones
that speed things up. This also allows
you to keep a character consistent. So
even if you change the prompt and keep
the trigger words, you get the same
character, which is very useful. There
are many LoRAs trained by people online
on sites like Hugging Face or Civitai.
Over time, you can also learn how to
train them yourself, either online or
locally, if you have enough VRAM. Let's
search for a LoRA on the Civitai
website. Again, if you are from the UK,
you will need a VPN to bypass the
restrictions they set for your country.
Let's go to models. Then we can filter
them. Set time period to all. For model
type, select LoRA. For base model,
select Z Image Turbo. Now we should see
only LoRAs compatible with our base
model. We can sort by highest rated. We
have quite a few here. Let's pick one at
random. Maybe this one that lets us
create character design sheets. Now we
are on the LoRA page. At the top you
can see this LoRA is available for
different models, but we want the Z
Image version. We check the type to make
sure it is a LoRA and that the base
model is correct. We also check if it
has trigger words or other settings.
Then we can download it from here. You
must be logged in to download models. We
save the LoRA in the same loras
folder, inside the zimage folder. After
the download finishes, press the R key
to refresh. Now we should be able to see
that LoRA in the list and select it.
Let's see what else it says about this
LoRA so we can learn more. Here it
shows the trigger words. I can copy
those and paste them in a note so I have
them for later. They also give an
example prompt showing how to use it.
Let's copy that and paste it into the
positive prompt. I will remove the
beginning and ending quotes. Now, let's
copy the trigger words and place them
here instead of the previous trigger
words. We also have a subject. So, let's
say a white bunny warrior. For art
style, maybe I add a 3D render style.
The rest looks fine. For the model
strength, I will use one. If it is too
strong, I can reduce the weight. For the
size, let's make it bigger so we get
more details. Now let's run the
workflow. We get this character sheet
which is not bad. This could be useful
for concept artists to see different
angles. Let's run it again with a
different seed. And we get this result.
Now let's change some things in the
prompt. Maybe it is a medieval bunny.
For art style, I try a vector art style.
Let's run the workflow again. Now we get
a different image. This seed does not
look that good. So maybe I try another
seed to see if I get something better.
Again, not perfect, but at least it
gives some ideas. Let's go back to
Civitai and look again at models. You can see
there are LoRAs for all kinds of things.
That does not mean all of them are
great. They are trained by people like
you and me and shared for free.
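Whatever their quality, all of these LoRAs work the same way under the hood: a LoRA is a small low-rank update that gets merged into the model's weights, scaled by the strength value you set in the loader. Here is a toy numpy sketch of that idea; the matrix shapes and names are purely illustrative, not the actual Z Image internals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base weight matrix of one layer (illustrative size only).
W = rng.standard_normal((512, 512))

# A LoRA ships two small matrices A and B with a tiny rank, here rank 8,
# which is why the LoRA file is so much smaller than the full model.
rank = 8
A = rng.standard_normal((rank, 512)) * 0.01
B = rng.standard_normal((512, rank)) * 0.01

def apply_lora(W, A, B, strength=1.0):
    # The low-rank product B @ A has the same shape as W and is added on top,
    # scaled by the strength slider from the LoRA loader node.
    return W + strength * (B @ A)

W_full = apply_lora(W, A, B, strength=1.0)  # full effect
W_soft = apply_lora(W, A, B, strength=0.5)  # reduce the weight if it is too strong
W_off = apply_lora(W, A, B, strength=0.0)   # strength 0 leaves the model untouched
```

With strength 0 the update disappears entirely, which is roughly what bypassing the loader node does; lowering the strength just scales the extra training down.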
Depending on the training, some are very
good and some are not so good. Training
is never perfect. If you want to see how
much the LoRA influences the result, we
can test that too. Change the seed to
fixed and generate once to see the
result. In my case, I got this image.
Now, let's go to the LoRA loader node.
Right click on it and select bypass.
Then, we run the workflow again with the
same settings and prompt. You can see
that without the LoRA, we do not get a
character sheet anymore. So, this LoRA
clearly helps with creating multiple
characters on a sheet. Remember, a LoRA
is like an add-on to the main model. It
adds extra training to that model, like
the model took a new course and learned
how to do character sheets. Hope that
helps. I explained ControlNet basics in
chapter 14, but there are models like
Z Image that need different nodes to run
ControlNet. Let's go to workflows. And
now I want to open workflow 4 and also
workflow 7 since both are using
ControlNet. And you can see the
difference. So let's go to the
Juggernaut workflow, which is a Stable
Diffusion 1.5 model. Here we use a Load
ControlNet Model node, and it is the
same node used for SDXL models or Flux
models. Then for ControlNet we have
different models like depth, canny, pose,
or other types that control the image
generation. Now if we go to the Z Image
Turbo workflow, we have a different node
here called Model Patch Loader. Here we
load a ControlNet model, and it is
called union because it has depth, canny
and pose integrated into one single
model. So we do not have to keep
changing the model. It is one model that
does everything it needs. Back in the
Juggernaut workflow, we had a
pre-processor node that converted our
image into a format the ControlNet
model understands.
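A pre-processor is just an image-to-image transform: it reduces your picture to the single signal, such as edges, that the ControlNet model was trained on. As a rough illustration, here is a toy gradient-based edge map in numpy; it is a simplified stand-in, not the real canny pre-processor node:

```python
import numpy as np

def toy_edge_map(gray, threshold=0.2):
    """Reduce a grayscale image (H x W floats in 0..1) to a binary edge map.

    This mimics what a canny-style pre-processor does for ControlNet:
    the model never sees your photo, only this simplified control signal.
    """
    # Horizontal and vertical differences approximate the image gradient.
    dx = np.zeros_like(gray)
    dy = np.zeros_like(gray)
    dx[:, :-1] = gray[:, 1:] - gray[:, :-1]
    dy[:-1, :] = gray[1:, :] - gray[:-1, :]
    magnitude = np.sqrt(dx**2 + dy**2)
    # Keep only strong gradients as white edge pixels.
    return (magnitude > threshold).astype(np.float32)

# A tiny test image: a dark square on a bright background.
img = np.ones((8, 8), dtype=np.float32)
img[2:6, 2:6] = 0.0
edges = toy_edge_map(img)
```

Only the outline of the square survives; the flat interior and background become black, which is exactly why a higher pre-processor resolution gives the ControlNet a more detailed map to follow.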
For the Z Image workflow, that part
remains the same. We can try different
pre-processors like canny, depth, or DW
pose, and they will work with this model.
For the last part in the Juggernaut
workflow, we had an Apply ControlNet
node between the prompts and the K
sampler, with different parameters. For Z
Image Turbo, the node is different. It
is called Qwen Image DiffSynth ControlNet.
Here we only control the strength, which
I set to 0.8 so it is not too strong.
Now let's download the required models.
By now you should already have the main
models downloaded, either FP8, FP16, or
BF16. The principle is the same even if
you use a GGUF version or other types.
We also need to download the ControlNet
model because we are not using the Load
ControlNet Model node but the Model
Patch Loader. We need to place this
model in the model patches folder. So
let's click here to download it. Go to
the Comfy UI folder, then to models, and
here you will find the model patches
folder. Let's save the model here. Wait
for it to download since it is around 3
GB. Also, keep in mind that over time
more versions can appear like version
two or three. So, always check if there
is a newer version available. After the
download is finished, press the R key to
refresh the node definitions so the
model appears in the list. Then, select
that model from the drop down. Now we
should have everything we need to run
the workflow. I have here in the load
image node a robot image loaded. For the
pre-processor I use depth or canny, but
let's start with canny. For the
resolution I will make it bigger so the
canny map has better details. Then for
the prompt we describe what we want to
get, and then we run the workflow. Now we
can see that we got a canny map that
ControlNet understands.
Look at the result. It looks much better
and it follows the edges of the original
robot. Let's try with a different image.
As you remember in the input folder, I
added some images you can use. So, let's
say I load this sphere and cube image.
For the pre-processor, let's use depth
this time. Then, let's adjust the prompt
to fit. Maybe a green sphere on top of a
golden cube in the desert, golden hour,
alien. Now, when I run the workflow, I
get a depth map for that image. For the
most part, it got it right, except the
ground. Let's help it understand what I
want. So, I will add to the prompt, the
sphere and cube levitate in air. Let's
run it again and see if it understands
it better. Now, we got exactly what we
asked for. You can also give the image
to chat GPT and ask for a prompt
together with instructions on how you
want it to look. Let's try something
else.
This time, let's upload that woman in a
yoga pose that the Juggernaut model
struggled with to see how much the model
advanced in the last 2 years. For the
pre-processor, I use the DW pose
pre-processor. For the prompt, I will
add a photo of a woman dressed in white
doing yoga on top of a mountain. Maybe I
add photo taken with a DSLR camera. Not
sure if it will take that too literally.
So now we got our pose skeleton which
looks correct. We also got an Asian
woman, which Z Image tends to generate
when you do not specify what kind of
woman it is. It also added a DSLR camera
on the ground which I do not want. So
let's go back to the prompt. I remove
the DSLR part and for the woman I add
that she is European. Now let's test
again. The result is actually great.
Same pose, the clothes I asked for and
on the mountains. A perfect result. What
do you think? Let's try to recreate this
ControlNet workflow so you can
practice. Go to workflows and let's open
workflow 5A, since this is a simple
text to image workflow for the Z Image
model. Search for Qwen image, written as
one word. Then select the DiffSynth
ControlNet node. Now we need to connect
this between the model and the K
sampler. So let's add the links so
everything goes through this node. Let's
see what other inputs we have here. It
says model patch. So let's search for
that node. We add the Model Patch
Loader.
And here we select the union ControlNet
model. Now we drag a connection from
this node to the ControlNet node. We
also need the VAE and we already know
where it is in this workflow. So we
connect that as well. All that is left
now is an image. To load an image, we
use the load image node. So search for
that node and add it. If we try to
connect the image directly, it will not
work correctly because this model is
trained with canny, depth, and pose. So we
need something to convert the image into
those formats. Search for AIO and add
the AIO Aux pre-processor node. Now our image
goes through this pre-processor. From
the list, we can select one, for
example, the Depth Anything
pre-processor. For the resolution, we
can increase it a bit to get more
detail. Now, we connect the output of
this node to the ControlNet node, since
this is the correct format that
ControlNet understands. We can also add a
preview node to see how the processed
image looks. All that is left now is to
adjust the prompt. Let's say the prompt
is a modern house in winter. We can also
increase the width and height to get
more details. Now, we are ready to test
the workflow. We can see the depth map
of the building. We can enlarge it to
see it better. The result looks like
this. It is similar, but not exactly the
same building shape. You could try a
more detailed prompt or a different
pre-processor. Let's add an image compare
node to see the differences. I want the
original image before processing. So, I
connect it to image A. Then, just after
the VAE decode, I connect that output to
image B. Now let's run the workflow
again and make the image compare node
larger. We can see that it shares some
building edges with the original image
but not all of them. If we want more
accuracy, we can change the
pre-processor.
Let's select a canny pre-processor
instead. Now when we run it, you can see
it captures all the edges in the canny
map. The result should be more accurate.
And this is the result we get. Now we
can see many things in common with the
original image. Keep in mind this is
controlled mainly by edges. So it will
not be exactly the same building. We can
get more control later when we cover
edit models like Flux 2, Qwen Edit, or
Nano Banana Pro. Up to now everything we
did in Comfy UI happened locally inside
the interface. We loaded models,
connected nodes, ran workflows, and
generated images on our own machine. API
nodes are different. They allow Comfy UI
to communicate with external services.
An API is simply a way for one program
to talk to another program over the
internet. Instead of doing everything
locally, we can send data out, let
another service process it, and then
receive a result back. Think of it like
this. Local nodes are tools on your
desk. API nodes are tools you rent
remotely. You send instructions and you
get results back. In Comfy UI, you can
click on the plus to add a new blank
workflow. Then double click on the
canvas and search, for example, for
ChatGPT. You can see that it says API node
under the node name. Let's select this
node. Now, this node looks different
compared to others. It comes already
colored in gold like a VIP version. On
top, it tells you how many credits this
node will consume depending on the
settings. Those credits change based on
what you use. Here we have a list of
models from OpenAI that are accessible
through the API. The API letters stand
for application programming interface.
For example, if we select a big model,
it can cost between 2 to 8 credits
depending on what you ask from it and
how long the answer is. If I change to
ChatGPT mini, it is almost zero
credits. It is not zero, but it is
0-point-something. So, it is quite cheap. This
node has a string output. So, like
ChatGPT, you ask something and you get a
text reply back. Let's drag a link and
search for a node that displays text.
Search for preview. And we have this
preview as text node that we can add.
Here we will get our reply from the
ChatGPT model. Let's say I ask it to
generate a prompt for a cute cartoon
bunny, something 3D. Now when I try to
run that, it asks me to sign in if I
want to use the API. We could use this
login button or we can cancel and go to
the menu then settings. Here we have the
user section in the settings and again
we have the sign-in option. Let's click
sign in. If you have a Comfy UI account,
you can use that or you can simply log
in with Google which is a faster option
for me. Then you select your Gmail email
from the list and you will be signed in.
Now you also have the option to log out.
So now we are connected but we need
credits to run API nodes. Let's go to
credits. Credits are like money. You
basically use real money to buy credits
that you can spend on a lot of models
that are available through the API in
Comfy UI. I have here some credits I
bought a while back. I can click on
purchase credits and then it asks me how
much I want to spend. For example, I
have $10 here, but that might be too
much for a beginner to spend on a first
try. Let's click on minus to see if we
can go lower. The minimum you can buy is
1,55 credits for $5. Then you can
click continue to payment. Depending on
your country, you have different options
to purchase. You can use Link, but you
can also choose without Link if you do
not have one set up. Here you have
options to pay with a card or you can
use Google Pay if you want, and you also
have the option to purchase as a
business. Back in Comfy UI, I have
enough credits to test a few nodes in
today's tutorial. Now, when I run the
workflow again, this node sends
information to the servers, wherever those
are located in the cloud, on OpenAI or
somewhere else. Depending on the
situation, sometimes it is faster,
sometimes it is slower. From the
workflow point of view, nothing special
is happening. Nodes still connect left
to right. Data still flows through
cables. The only difference is where the
computation happens. Local nodes use
your GPU or CPU. API nodes use someone
else's hardware. This has advantages and
disadvantages.
Advantages are that you can use very
powerful models that you cannot run
locally. You save local VRAM and system
resources. Some APIs are faster for
specific tasks. Disadvantages are that
you depend on an internet connection.
There may be usage limits and of course
it costs credits. It is not free. You
have less control over model internals.
So we got the response from ChatGPT and
it gave us multiple prompts and
suggestions instead of a single prompt.
So let's refine what we asked and tell
it to generate a single prompt. Maybe
repeat it once again to reinforce that.
Let's run it again. This time we got a
single prompt just like I asked. Now we
can copy the prompt and paste it into
another workflow. If we want, with this
node selected, I will use Ctrl+C to
copy the node. Then let's go to workflows
and open a workflow like this 5A
workflow that uses Z Image Turbo, which
we know likes long prompts. I will move
this node to the side, then Ctrl+V
to paste that node. To connect this node,
we just drag a link to the positive
prompt. Now we have a mix of local models
that take the prompt from an API node. We
can also drag a preview here if we want.
Let me search for preview as text. Now
we can see what prompt it gave us. I can
rename it prompt so I know this is the
prompt. Let's run the workflow. You can
see it generated a prompt for me. Then
it continues to the next part of the
workflow and generates the image. This
can be quite useful. There are free
models that can also do this, but we
will talk about that in another episode.
Let's change the prompt to be a ninja
bunny, maybe in an action pose. Generate
again. We get a new prompt describing
that bunny and the result is this image.
There are many API nodes and many
options to connect them. Let's go back
to the previous workflow where it was
just those two nodes. Now let's add a
concatenate node, a node that lets you
combine two strings or prompts. Let me
remove this prompt since we want to get
the prompt from the concatenate node. I
will use it for string B for now. And
then connect this concatenate node here.
I will add a green color so it looks
like a positive prompt node. For the
first part, I write a cute cartoon bunny
ninja. For the second part, I write
something like, "Use the prompt to
generate a single detailed prompt
creative. Adapt the prompt to match the
prompt style and mood." You can use all
kinds of chat GPT formulas here to get
exactly what you want. Let's drag a link
to a new preview as text node so we can
see the result of the concatenate node.
Maybe I name it prompt, but I might
change that later. Still exploring what
we can do. When we run it, you can see I
forgot to add a separator. So, it just
combined the ninja prompt with use the
prompt to generate. In the end, it still
understood and generated the prompt.
But let's fix that delimiter and add a
comma and a space. Of course, you can
split this into multiple nodes and make
workflows more complex, one going into
another workflow, and so on. This is the
prompt that goes into ChatGPT, and this
is the prompt that comes out of
ChatGPT, the one we want to use in other
workflows. Now we can run it again. You
can see the input prompt to ChatGPT is
this combined text. I like to use
concatenate because I can easily change
the first prompt without changing the
formula below. So it is easier to edit.
The result is this long prompt for the
Ninja Bunny.
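What the concatenate node does is plain string joining: string A, then a delimiter, then string B. A quick sketch of the same logic; the function and variable names are mine, not the node's internals:

```python
def concatenate(string_a, string_b, delimiter=", "):
    # Joins the editable subject prompt with the fixed instruction formula,
    # just like changing string A without touching string B in the node.
    return string_a + delimiter + string_b

subject = "a cute cartoon bunny ninja"
formula = "use the prompt to generate a single detailed prompt, creative"

combined = concatenate(subject, formula)

# Without a delimiter the two strings run together, which is the
# mistake we fixed earlier by adding a comma and a space.
glued = concatenate(subject, formula, delimiter="")
```

Keeping the subject and the formula in separate inputs is why editing the first prompt never risks breaking the instruction text below it.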
These two nodes are the same, so I only
need one and I remove the other. What we
did here is split a workflow into
multiple pieces so we can easily edit
the prompt without worrying about the
formula. I can quickly change the first
prompt,
run it again, and get a new prompt. It
is quite easy to use at the cost of a
few cents, or 0-point-something credits. I
probably do not need that preview
anymore since I know how they are
combined. So I will leave just one
concatenate node, the ChatGPT node, and
the preview of the final prompt. Now
that we have this, we can save it. Hold
control and drag a selection over all
nodes. Right click on the canvas, then
use save selected as template. Give it a
name, maybe ChatGPT prompt, so we know
it generates prompts. Now we can paste
it into any workflow. Let me open that
5A workflow again.
Since it was already open, I will close
it because I do not want the extra
nodes. Then open it again fresh with
default values. I move this to the side.
Right click on the canvas. Go to node
templates. And now we have that template
there. We can move it wherever we want.
Remember that the ChatGPT prompt comes
from this string output here. And we
connect it to any workflow we have to
the positive prompt. Now when I run the
workflow, ChatGPT generates a prompt.
That prompt is used in the Z Image
workflow and the result is this cute
monk cat.
But the ChatGPT model is also a vision
model which means I can give it an image
and it can see what is in that image.
Let me remove this concatenate node.
Let's add a load image node.
I upload this image of a helmet. Now we
can connect this node to where it says
images. It says images because you can
add multiple images if you use a batch
images node, but maybe we explore that
in a future episode. For the prompt,
let's say something like, give me a
single prompt description for this
image. Descriptive prompt. There are
more complex formulas, but I am just
trying something on the spot. Now, let's
run the workflow. ChatGPT looks at my
image, and after a few seconds, it
should give me a prompt based on that
image. We got this nice long prompt and
the result is this one. It is not
perfectly identical, but with a better
formula, we can probably get something
even closer. It is still pretty close to
what we asked. You can also run the
workflow multiple times to get different
seeds. Let's see what else we can do.
Let's create a new blank workflow.
Double click on the canvas and search
for Nano Banana. We have this first
version of Nano Banana that is cheaper.
It is eight credits, so you can probably
get something similar for free from
Google Gemini. We also have Nano Banana
Pro, the more powerful model that can do
big images. This one costs 28 credits.
If we change to 4K size, it will cost 51
credits. Depending on the model, some
can cost over 100 credits, so be careful
what nodes you use because you can run
out of credits pretty fast. Both accept
images, but Nano Banana Pro understands
prompts and images better. You can see
what model is used for the first Nano
Banana. And for Nano Banana Pro, it is
actually called Gemini 3 Pro image.
Let's remove the first node and use this
one to generate an image. I add a load
image node so we can load an image from
disk. Then connect the nodes. Now I
upload an image. For example, this
portrait of a man. Then I add the
prompt. We did not talk yet about
editing models like Flux 2, Qwen Edit, or
Nano Banana Pro, but these can be used
to edit or modify an image. I could say
change the t-shirt or replace the
background or hair color. Let's try
something simple like telling it to
change what he wears to a steampunk
suit. For resolution for this test, I go
with 2K since it uses about half the
credits. We also have aspect ratio.
Instead of auto, I set it to 9:16, but
you should use whatever ratio you need.
When I run it, I get a prompt failed
message. Can you guess why? It says it
has no output. That is because we did
not save the image. So, let's drag a
link and add a save image node. I also
see it has a string output. So, let's
add a text preview node to see what it
outputs there.
Now, we can run the workflow again. This
one takes longer, over a minute to
generate. You can also check your
profile here to see how many credits you
have or sign out, manage subscription,
and so on. You can also check partner
node pricing. This opens the Comfy UI
website where you can see how much it
costs to use any of the models that are
not free, the so-called partner nodes.
You have models for images, text, and
also a few nodes for video. These
usually need a lot of VRAM and you can
generate video even if you do not have
that VRAM locally but at a cost. Back in
Comfy UI we got our generation. If we
look at it the result is quite good in
2K size and it is quite similar to the
original man. So it is a good model but
expensive. For the text output we also
got something like a peek into what the
model was thinking, basically a prompt
it used to generate that image based on
the small prompt I gave it and the
image. You can explore more API node
workflows created by the Comfy UI team.
If you go to templates here you have all
kinds of workflows but if you want to
see the API ones select partner nodes
then you can filter them by model if you
know what model you are looking for or
just explore random workflows. It does
not cost you anything to open and check
a workflow. It only costs credits when
you run it. Let's say I like the preview
of this workflow. I click on it and I
get the workflow. Let's see what it
uses. We have a load image node. So, it
expects an image from our computer. We
have a Nano Banana prompt, and it says
color this image. So, if you upload a
sketch, it will color that sketch using
the nano banana pro model. By default,
it is set to 1K, but you can change the
settings to fit your needs. Let's go
again to templates and check another
workflow. Maybe this one with the shoe.
This one is more complex. It expects an
image of a product like a shoe. Then it
uses a ByteDance model, which is similar
to nano banana but a cheaper version.
Once the image is saved, it goes to
different video models. These models
cost around 103 credits each. It looks
like it generates multiple videos from
that shoe, depending on the prompt, and
then combines all those videos into one
final video. A workflow this big takes
some time to run and can cost you maybe
around 300 credits, so roughly a couple
of dollars. I did not do the exact math,
but you can spend credits very fast with
video models. One important thing to
understand is that API nodes do not make
Comfy UI cloud-based. Comfy UI is still
running locally. You are simply adding
external steps into your pipeline. From
a mental model point of view, treat API
nodes exactly like normal nodes. The
cables do not care where the data comes
from. If the output type matches, it
works. This also means you can mix local
models, gguf models, diffusion models,
and API nodes all in the same workflow.
Comfy UI is a complex application with a
lot of nodes created by different
people. And we combine all these things
together like Lego pieces. At some point
you will get an error either because you
forgot to connect a link, you connected
the wrong nodes or you used the wrong
models. In this chapter I will try to
explain what you can do when that
happens because it will happen. If we
look at the workflows we have this
workflow with the number zero. The first
one, this one is for help and resources.
And I tried to gather here some
information that might help you. Let's
start with resources. The best way to
learn Comfy UI is to watch tutorials and
to practice. You have here a link to the
Pixaroma YouTube channel, but there are
many other YouTubers who do tutorials
for Comfy UI. You can search on YouTube
for different tutorials. Try to look for
more recent ones because if a tutorial
is two years old, most things are
probably different. Now, that is one of
the reasons I made this new series. On
the top of my YouTube header, I added a
link to my Discord. Click on it, then
click go to site, and you will get an
invitation to join the Pixaroma Discord
server. If for some reason it says
invalid, try a different browser or the
mobile application. You click accept
invite and you will land in the welcome
channel. I will show you more about how
to navigate Discord in a minute. In this
note, I also included a link to Discord
and some useful info like where the
Pixaroma workflows are and so on. Let's
click on this link and we get to the
same invite. It goes to the same server
and the same welcome channel. This is my
server called Pixaroma, but there are
many other servers for Comfy UI. For
example, if you go to the comfy.org
website and then to resources, they also
have a Discord link. It is the same
process. You accept the invite and you
land on their welcome channel. On the
left side, you have the servers you
joined like my Pixaroma server or the
Comfy UI server. Let's go to Pixaroma
and explore a bit more. On the left, we
have different channel names so we stay
organized. Each channel has its purpose.
There are also categories that contain
multiple channels which you can collapse
or expand. For example, if I collapse
this category, you might think you
cannot find the Pixaroma workflows
channel. But if we click on this arrow,
we expand the category and now we can
see the Pixaroma Workflows channel
there. For every server you join, check
the rules so you know what you are
allowed to do and you do not break the
rules and get banned. If your Discord
account gets hacked and posts spam in
your name, you might get banned as well.
You can send me a message to remove the
ban if you fixed your account and it is
not hacked anymore. Here we also have a
help channel where you can find what
each channel is for and where to post.
Some channels are public, some are
private and only for members, and some
are public, but only moderators or
admins can post like news and updates.
We also have a daily challenge for
people who use AI. You can find more
info in this channel and you can
participate in the challenge in the
daily challenge channel. When you see a
number with a red circle, that means
someone mentioned you or everyone on the
server. For example, when you see that
the news and updates channel has a
notification, go check it because I
probably posted a new tutorial or shared
an update. You can see this post used
everyone to mention everyone on the
server. In off topic, you can discuss
things that do not fit in Comfy UI or
other channels, but try to avoid spam
and make sure it still respects the
rules. For Comfy UI here, you usually
have the most active chat. People talk
about Comfy UI, so if you post here for
quick help, and if members know the
answer and have time, you might get
help. If not, it might get ignored.
Another channel where you can post is
the forum. There you usually post things
for longer term discussion. You might
post today and get replies in hours,
days, or sometimes not at all if people
do not know the answer. You can see that
I can post in this channel because it
lets me type. You can ask for help here
and include screenshots and all the
details. This is also the channel where
EVO posts updates about the Easy
Installer, which he continuously
improves and adds more scripts to make
things easier. Thanks to EVO for all the
help. Make sure you check this area for
updates related to the easy installer.
The most visited channel is probably the
Pixaroma workflows channel where people
come to get my workflows from tutorials.
Here I have a list of older episodes
from 2024 to 2025 and also this new
series I am doing now. Starting with the
first episode, you can see links that
lead to specific episodes. For example,
if I click on the first episode, I land
on this page where I will also add a
link to the YouTube video once it is
ready. You will find all the chapters of
that video plus links to Comfy UI and
the workflows. You can download the
workflows either as a zip archive that
you extract or as individual JSON files.
You can also comment on this forum post.
Since I post this series as forum posts,
you can comment if something does not
work so we can try to fix it if
possible. Keep the conversation limited
to that specific episode. For the next
episodes, comment on their respective
posts. If it is not related, use the
forum or the Comfy UI channel. For
off-topic discussions, use the off-topic
channel. You can also use Discord to
navigate quickly. You can create links
to different channels. For example, if I
type the hash sign, I can select
different channels like Pixaroma
Workflows. Or I can type hash and then
help. And you can see what happens. It
adds a link to that channel. When I
press enter, I get a clickable link to
the help channel. If I click it, I land
in the help channel. Let's go back to
the off-topic channel. If you hover over
a message, you have different options
like edit or add reactions. Some servers
also allow you to use emojis from other
servers. For example, I can select this
Pixaroma bunny emoji. To remove a message,
hover again, click the three dots, and
you have different options, including
delete. If I type hash and then
Pixaroma, and select Pixaroma
workflows, press enter, then click it, I
land in the Pixaroma workflows channel.
I keep getting messages that people
cannot find the Pixaroma Workflows
channel. So, I hope this tutorial helps
you find it more easily. In channels
where you cannot comment, you will see a
message saying that posting is not
allowed. Usually only admins or
moderators can post there. Use the Comfy
UI channel for discussion and help
related to Comfy UI. Use Off-topic if you
cannot find the right channel. Let's go
to the forum. Here you can find all
kinds of forum posts. For example, we
have this pinned forum post that you
should read before you post anything.
From here you can create a new post. You
can close this if you want to see more
of the forum. When you create a post,
you can add a title, a message, and
screenshots with your workflow that has
problems. You can also add tags to your
post depending on what the post is
about. Add a clear title and a
descriptive message, not something
vague. When you are done, you can post
it using this button. You can also check
other posts to see how they are written.
For example, the workflows from the
first episode are posted in a forum
post.
Besides that, you have more channels for AI video and AI music, for ChatGPT and other AI topics, and a few more channels
that I will let you explore in your free
time. Keep discussions civilized and
help when you can. I visit the Discord
every day, but I cannot respond to all
messages. Mention Pixaroma if something
is important. In the top right, you also
have an inbox. On the left side, if you
click on the logo, you have direct
messages where you can talk with your
friends. If you click on unread, you can
see notifications, including mentions,
so you can quickly see when someone
mentioned you or everyone on the server.
You can jump directly to that message
using the jump button. Always check
mentions, especially when you see your
username. You also have a search bar
which many people forget exists. Here
you can type a model name or a few words
that people might have used in
discussions. For example, if I search
for LTX2, I can see quite a few posts
with that search query and I can jump to
any of those discussions. You can also
search posts from a specific user. For
example, I can search for posts from
Pixaroma. Make sure the username is Pixaroma and not something else, because some people try to mimic the name. Both the username and display name should be Pixaroma. Now you can see all the posts from Pixaroma. You also have more options
and filters that you can use for
different channels and searches.
You can use these arrows to reply to a
message or forward it to someone else.
Okay, enough with Discord. Let's go back
to Comfy UI. I included here more
resources for Comfy UI like the official
ones and also some unofficial ones like
Reddit or Facebook groups that you can
try. Let's open the Reddit group, for example. We have this Comfy UI Reddit group where you can see discussions, news, tutorials, and so on, and where you can post your questions. There is also one called Stable Diffusion, which includes discussions about Stable Diffusion, free models, and Comfy UI, but also other interfaces, not only Comfy UI.
You can also search for a word on Reddit
like Comfy UI and sort the results by
communities. Then you can check which
ones have more members. The two I use
the most are these ones. Make sure you
also check the other notes I added here
like definitions for beginners, what a
model is, what a text encoder is, and so
on. There's also more information about
performance, common errors and fixes,
model locations, custom nodes, and how
to update Comfy UI. I also included a
link to the easy installer in case you
want to go back to it and find more info
or check what is new in the releases. If you want, I also created an experimental custom GPT that you can try, especially for this easy install Comfy UI version. Like any ChatGPT, it can hallucinate sometimes, but it is still better than a simple chat because it is more specialized for Comfy UI. For example,
if I ask where the images are saved, it
will think and also search the knowledge
database where I added some files. Then
it will answer. You can see the answer
is pretty good. So I think it will help
a lot of beginners. Sometimes if you
think it made a mistake, maybe because
something is new and the model was
trained months ago, you can ask, "Are you sure? Look online," and it will search the web. This way you can double
check and improve your chances of
getting a more accurate response. In
this case, it knew that images are saved
in the output folder. Let's ask
something else, like where are the Pixaroma workflows? Where can I find
them? It will tell you they are on
Discord and give you the channel name.
Let me try something else. Let's open
workflow number one, the Juggernaut text-to-image workflow, and disconnect this node to cause an error. When I run it, I
get this error. Now I take a screenshot
of this error, go to that custom chat,
paste the screenshot, and ask how to fix
this error. You can see that in this
case, since it was a simple error, it
knew how to answer and told me to drag a wire from the Load Checkpoint VAE output to the VAE Decode node. This can save you a lot of
time in many cases. So, I hope you find
it useful. You can give it more
screenshots and more info, even ones
without the error, so it can understand
the workflow better. Sometimes the error asks you to post a report on GitHub, and you can find here a report that gives more info about the error. You also have a Find This Issue option, which opens the issues page on the Comfy UI GitHub page. This is the official Comfy UI GitHub page for the portable version, not the easy installer, even though the easy installer installs the same version plus extra scripts. There is an issues tab where people post problems. You can
search issues that are open or include
closed ones as well. You can also post a
new issue if it is something new and you
did not find any information about it.
Make sure it is an issue with a Comfy UI node, not a custom node. For custom nodes, you need to go to the custom node's page instead. To fix this error, we just
connect the VAE back to VAE Decode. But let's say you have your VAE as a separate file for some workflows. So you use Load VAE to load a VAE and connect that to the VAE input. Let's see what happens when
I run the workflow. It gives this error
which usually means we used models with
different architectures that are not
meant to work together. The error is
shown in VAE Decode. But VAE Decode is not really the problem. The problem is the input that goes into that node. In this case, the VAE loaded with Load VAE was the issue. Let's go back to the help
workflow. I want to remind you that when
you ask for help, include screenshots of
your workflow. Tell us what video card
you have, how much VRAM and system RAM
you have, and which operating system you
are using. Also, explain what you
already tried and what did not work.
This helps the community assist you
faster. Okay, one more chapter to go.
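Before that final chapter, one quick aside on the earlier VAE mismatch error. You can think of it as a shape check: a decoder trained for one architecture expects latents with a specific number of channels, and a latent from a different model family simply does not fit. A plain-Python sketch of the idea, not real Comfy UI code, with illustrative channel counts:

```python
# Toy model of what VAE Decode complains about: the decoder
# expects latents with a fixed channel count (4 is typical for
# Stable Diffusion models; the numbers here are illustrative).
def vae_decode(latent_channels, expected_channels=4):
    if latent_channels != expected_channels:
        # Mirrors the mismatch error: the node that fails is the
        # decoder, but the real problem is the input it was given.
        raise ValueError(
            f"expected {expected_channels} latent channels, "
            f"got {latent_channels}"
        )
    return "decoded image"

print(vae_decode(4))  # matching architectures: decodes fine

try:
    vae_decode(16)  # a latent from a different model family
except ValueError as err:
    print("VAE Decode error:", err)
```

The takeaway is the same as in the workflow: the error surfaces at VAE Decode, but the fix is upstream, at whatever loaded the mismatched VAE.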
Are you ready? You have now reached the
end of this course. At this point, you
understand how Comfy UI works, how
workflows are built, how models differ,
and how to use tools like LoRA,
ControlNet, and advanced diffusion
models. But learning Comfy UI does not
really end here. This is just the
foundation. The most important thing to
understand is that Comfy UI is not a
fixed tool. It is constantly evolving.
New models appear, new nodes are
created, new workflows solve problems in
better ways. So the best way to continue
learning is by experimenting. Open
workflows, break them, rebuild them,
change one thing at a time and see what
happens. That is how real understanding
happens. Another important habit is
reading workflows, not just using them.
When you download a workflow, do not
just press run. Look at the nodes.
Follow the connections. Ask yourself why
something is there. If a workflow looks
confusing, that usually means it is
teaching you something new. Next, stay
connected to the community. Use Discord
to ask questions, share results, and
help others when you can. Very often,
answering someone else's question will
make you understand things better
yourself. Follow model releases, but do
not chase everything. You do not need
every new model. Find a few that work
well for your style and hardware and
learn them deeply. As you get more
comfortable, start building your own
workflows from scratch, even simple
ones, especially simple ones. That is
how you move from copying to creating.
Also, remember that AI tools change
fast. What matters most is not
memorizing settings, but understanding
concepts. Noise, conditioning, sampling,
structure versus style. Those ideas will
stay useful even when models change.
Finally, do not rush. There is no finish
line here. Learning Comfy UI is a
process, not a goal. Take your time,
have fun, and keep experimenting. This
is just the beginning. So, what comes
next? Obviously, I will continue this
series and do episode 2, 3, and so on.
But I cannot make them as big as this
first episode. They will be shorter
videos focused on things we did not
learn yet like other models such as Quen
or Flux video models and so on. We still
have a lot to cover and every week or
month we see new models and new nodes
appearing. My plan for the new series is
to show you these new models and
workflows in an easier-to-understand
way so everything makes sense as much as
possible. Some of these workflows and
models are so new that nobody really
knows much about them yet. I will try to
post a new episode every week if my
health allows it. If not, then at least
one episode every 2 weeks. This new
series will have bunnies on the
thumbnails so you do not confuse it with
the old series.
For the new series, as you saw, the
workflows on Discord are posted in the
forum. This makes it easier for me to
see when you find a bug or when
something does not work anymore so I can
try to fix it. That is why the old
series, even though it still has good
tutorials and you can still watch it,
especially the last episodes, will not
receive updated workflows. I will not go
back and try to fix those old workflows.
Instead, I will focus on the new series.
When needed, I can revisit those older
workflows, adapt them to new models, and
present that in a new episode in the new
series. I wanted you to have the basics
in this long episode 1, this course that
you probably cannot find anywhere else.
I worked one month on this episode, and
I wanted everyone to have access to it
for free. I could have put it behind a
paid course, but I feel better when I
can help people. That being said, I do
appreciate your support. There are many
ways you can help me and this channel so
I can create more videos. The easiest
way is to press the like button,
subscribe to the channel, and leave a
comment, even if it is just a simple
thank you. This shows activity to the
YouTube algorithm and helps the video
reach more people. So now, if someone
asks where they can start learning Comfy
UI, you can share the link to this
course. I will also create a new
playlist that will host all the new
episodes from this series. For those who
can afford to buy me a cup of tea, since
I do not drink coffee, being a bunny, I
already have too much energy. You can
use the join button. Here you have four
different options from really cheap,
like half a cup of tea per month, to
more expensive, like a premium cup of
tea. Depending on the option you choose,
you get different perks. For example,
Legends have a private channel on
Discord where they get to know me
better. If you do not want to help
monthly, you can also help one time. On
each video, you can find this heart with
a dollar sign called super thanks. Super
thanks allows you to select an amount of
money that you want to donate and send
it. You can use this for videos that
really helped you like this course or
any other episode where you learned
something useful. Speaking about legends
and those who subscribed to the
membership, I want to thank all of you
who made this course possible with your
support. Together, we can help other
people learn new tools, understand this
crazy AI world we live in, and maybe
even make it a better place. Thank you
all. You are the best. Have a great day,
and I will see you on Discord.