Workflow

Tools Used

Stable Diffusion Web UI: a user-friendly interface to generate and refine images

Various models to use with the Web UI. I've tried a wide variety, but mostly use the two listed below

Upscaler models used with the "SD upscale" script

Occasionally Photopea to touch up images during or after generating

1) Bulk Generating Low Resolution Images

The goal of this step is to mass-produce low resolution images, often overnight, then sift through them and select the best ones to upscale and refine. I've found that adding random elements to the prompt via the Dynamic Prompts extension tends to add interesting creative twists, at the cost of more "duds". Feel free to omit the wildcards if you're aiming for something specific.

Q) Why not bulk generate high resolution images and skip the refining steps?
A) Most models were trained on datasets containing exclusively 512x512 images. Generating at higher resolutions often results in unintended outcomes, usually repeated subject matter, limbs, backgrounds, etc. Also, by upscaling with the method below, we have a higher degree of control over the final image.

"Black Cat on the Moon" 1024 x 1280

Below is an example of my settings on a typical overnight batch.

Prompt: High detail {0-2$$__modifiers__} Black Cat digital art by {0-2$$__artist_fav__} {0-2$$__3d_terms__} {0-2$$__style__}
The wildcards in curly brackets will be filled in dynamically for each generation. The format X-Y$$__name__ will select between X and Y lines from the file name.txt. For example, style.txt has a few hundred lines with styles such as cartoon, sketch, logo, minimal, abstract, etc. Get creative with your wildcard files!
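To make the substitution concrete, here's a minimal Python sketch of that wildcard expansion, under a few assumptions: wildcard files live in a wildcards/ folder, and picks are joined with commas. The real Dynamic Prompts extension supports far more syntax (nesting, weights, combinatorial generation).

```python
import random
import re
from pathlib import Path

def expand(prompt: str, wildcard_dir: str = "wildcards") -> str:
    """Replace each {X-Y$$__name__} token with X to Y random lines
    drawn from wildcard_dir/name.txt, joined with commas."""
    def substitute(match: re.Match) -> str:
        low, high, name = int(match.group(1)), int(match.group(2)), match.group(3)
        lines = Path(wildcard_dir, f"{name}.txt").read_text().splitlines()
        count = min(random.randint(low, high), len(lines))
        return ", ".join(random.sample(lines, count))
    return re.sub(r"\{(\d+)-(\d+)\$\$__(\w+)__\}", substitute, prompt)

# Hypothetical output, assuming style.txt contains lines like those above:
# expand("Black Cat digital art {0-2$$__style__}")
# -> "Black Cat digital art minimal, sketch"
```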

Negative Prompt: This should describe what you don't want the image to look like, or features you don't want present. I use wildcards again in the format {10-25$$__negative__}, with negative.txt containing lines describing what I don't want generated: blurry, extra limbs, horrifying, black and white, poor quality, text, etc.

Sampling Method / Steps: Feel free to experiment, but I've found DPM++ SDE Karras to produce consistently good images with a low step count (20-30).

Resolution: 512 x 768. Make sure at least one dimension is 512, but you can adjust this to portrait or landscape. Going much higher than 768 in a dimension will start to introduce artifacts and duplicate features.

Batch Count: How many images to generate. 500 takes a few hours on my PC, feel free to adjust this as needed.

Seed: -1 produces a random seed with each generation. This is generally what you should be using, unless trying to replicate a specific generation.

CFG Scale: This essentially determines how close the generation is to the prompt. A lower value (5-10) tends to be more "creative" as it loosely follows the prompt, and a higher value (11-20) will steer the result to closely match the prompt. Below 5 tends to be incoherent, while above 20 tends to introduce noise. The optimal CFG Scale is subjective and depends on the image, so we'll take advantage of the Randomize extension to let luck decide. Clicking Enable and filling 7,14,1 will choose a CFG Scale for each generation between 7 and 14, with a step value of 1. The lower and upper bounds can be adjusted, but I've settled on this range after trial and error.
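If you'd rather drive an overnight batch from a script instead of the UI, the loop below sketches the same settings with the diffusers library. The model id and output folder are placeholders, the scheduler configuration only approximates the Web UI's DPM++ SDE Karras sampler, and expand() is the wildcard helper sketched earlier.

```python
import random
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder: use your preferred model
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True
)  # roughly the Web UI's DPM++ SDE Karras

Path("out").mkdir(exist_ok=True)
for i in range(500):                        # Batch Count
    seed = random.randrange(2**32)          # Seed: -1 (random) in the UI
    cfg = random.randrange(7, 15)           # Randomize extension: 7,14,1
    image = pipe(
        expand("High detail {0-2$$__modifiers__} Black Cat digital art "
               "by {0-2$$__artist_fav__} {0-2$$__3d_terms__} {0-2$$__style__}"),
        negative_prompt=expand("{10-25$$__negative__}"),
        width=512, height=768,              # keep one dimension at 512
        num_inference_steps=25,
        guidance_scale=cfg,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    image.save(f"out/{i:04d}_seed{seed}_cfg{cfg}.png")
```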

After clicking Generate and waiting, we now have hundreds of images to sift through and choose the best ones to upscale and refine.

Black Cat in Winter Wonderland - Bulk Output (Wildcards Not Used)

2) Image to Image Upscaling

The goal of this step is to take the chosen 512x768 image and scale it to high resolution, while adding details and fixing any imperfections. Rather than generating with txt2img from random noise, img2img generates an image using the supplied image as the base. This method can be exploited to generate higher resolution images with more detail, without the side effect of introducing duplicate limbs and other imperfections.
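Under the hood this is just an img2img call. The diffusers sketch below shows the idea, with a placeholder model id and filename; the SD upscale tiling that makes this work at high resolution comes later in this step.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder: match the original model
    torch_dtype=torch.float16,
).to("cuda")

base = Image.open("chosen.png").convert("RGB")  # the selected 512x768 image
refined = pipe(
    prompt="<the original prompt, wildcards filled in>",
    image=base,                                 # start from this image, not noise
    strength=0.3,                               # low denoising: keep composition
    guidance_scale=7,
).images[0]
refined.save("refined.png")
```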

I like the composition and lighting in this image, so we'll see what we can do to improve it.

512x768 Image Pre-Photopea

Before working with the Stable Diffusion Web UI, manual edits can be made using Photopea to edit out or add in elements. These edits don't have to be perfect; the AI will do the heavy lifting. For the sake of an example, I'll use the Spot Healing Brush to edit out falling snow that I think looks out of place in the cat's ear and on its face.

512x768 Image Post-Photopea

In the Stable Diffusion Web UI, drag the AI-generated image to the "PNG Info" tab. This will load all details from the generation, including the prompt with every wildcard filled in with the exact terms used. Click "Send to img2img".

Original AI-generated image info loaded

After sending to img2img, all settings used will be copied from the PNG Info tab. Unlike before, the specific seed of the original image will be used to maintain consistency. Once all settings are loaded, if you chose to make any Photopea edits, drag and drop the edited image over the original.

img2img Settings

Most settings will be left as-is, but there are two more settings to cover before we generate the upscaled image.

"Denoising strength" adjusts how similar the generated image will be to the supplied base image. 0.0 will be idential to the base image, while 1.0 will be a completely new image. A range of 0.2 - 0.4 is ideal to add details without deviating from the original composition. Sometimes it'll take a few attempts to determine which value in the 0.2 - 0.4 range looks best.

From the "Script" dropdown, choose "SD upscale". This script works by breaking the image down into tiles, generating each tile individually, then stiching them back together scaled up by the Scale Factor. I leave Tile overlap at the default of 64, and Scale Factor set to 2. Upscalers can be found from the Upscaler Database link at the top of this page. "Remacri" and "UniversalUpscalerV2" are the two I use. Remacri works best for anything with more of a digital art or drawing aesthetic, while UniversalUpscalerV2 tends to work best with photorealistic images.

Once everything is set, click "Generate". After examining the result, you can re-generate with a higher or lower Denoising strength to find what works best.

1024 x 1535 Upscaled

Now our image is twice the resolution, with noticeably more detail. The snow accumulated on top of the head looks real now, and the nose has realistic lighting reflections. However, the eyes don't look great, which brings us to the final step.

3) Inpainting Refinement

The goal of this step is to re-generate specific areas of the image, punching them up with an extreme level of detail.

Similar to before, we can use Photopea to manually edit undesired elements of the image. There are some orange "sparks" that look out of place, and it appears the AI attempted to draw an eye on the cat's leg. These can be quickly fixed by drawing over them with the Spot Healing Brush.

Photopea Edits

From the img2img tab, click "Send to inpaint" on the upscaled image we generated above. This will copy the image, and all settings, to the Inpaint tab. Use the brush tool to mask the area we want to inpaint, which in this case is the cat's eyes.

Most settings will remain the same, but there are a few important changes to make before generating.

"Inpaint area" should be set to "Only masked". This will genereate only the mask and a small area around it, at full resolution, which allows for a much higher level of detail. The result will then be downscaled and stitched into the original image.

Denoising strength should be higher than in the previous step, since we no longer want to closely match the original. This can be optimized with trial and error, but I've found 0.65 to work well in most cases while inpainting.

Seed is set back to -1 (random), since the original seed didn't produce a desirable outcome in the masked area.

Finally, since there are elements of luck involved, particularly with the seed, a higher Batch count can be used to bulk generate multiple images. I'll set it to 12 in this example, but you can go as high as you're willing to wait for and sort through.
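To make the "Only masked" mechanics concrete, here's a diffusers sketch that crops a region around the mask, inpaints it at the model's full 512x512 resolution, and pastes the result back. The checkpoint, file names, and crop box are all placeholder assumptions; the Web UI handles the crop padding and seam blending for you.

```python
import random

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # placeholder inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("upscaled.png").convert("RGB")
mask = Image.open("eye_mask.png").convert("L")      # white = regenerate

box = (384, 320, 640, 576)                          # 256x256 region around the eyes
crop = image.crop(box).resize((512, 512))           # "Only masked": work at full res
crop_mask = mask.crop(box).resize((512, 512))

for i in range(12):                                 # Batch count: 12 attempts
    seed = random.randrange(2**32)                  # Seed: -1, new luck each try
    out = pipe(
        prompt="<the original prompt>",
        image=crop, mask_image=crop_mask,
        strength=0.65,                              # deviate further inside the mask
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    patched = image.copy()
    patched.paste(out.resize((box[2] - box[0], box[3] - box[1])), box[:2])
    patched.save(f"inpaint_{i:02d}.png")
```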

Bulk-Generated Inpainted Eyes

4) Optional Finishing Touches

Any combination of steps 2 and 3 can be repeated to continue upscaling and refining the image to your heart's content.

As you can see below, even without further refinement, we've significantly improved the image from the original 512 x 768 generation.

Original Image
Final Result