Tinker Tuesday for September 9, 2025

This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.

Post your project, your progress from last week, and what you hope to accomplish this week.

If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service.

Now that I have a 5070 Ti I am ready to dabble with some local AI. What's a good guide for tech-literate AI n00bs? I am interested in generating images and maybe using it to classify my photo archive.

For image generation, I recommend starting with KritaAI, an open-source add-on for the image editing software Krita, available on Github. It adds a panel to Krita where you can enter prompts, plus a settings menu where you can create saved presets for generation inputs. It has automated ways to download default models, and it installs an instance of ComfyUI that handles the generations on the back-end. Adding new models manually is easy and intuitive: just copy them into the proper folders and then select them in Krita's settings after a refresh. Krita's interface also makes inpainting extremely easy.
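If you'd rather script the model copying than drag files around, here's a minimal sketch. The paths assume a standalone ComfyUI install under your home directory; the Krita plugin's managed ComfyUI instance may keep its models somewhere else, so check the plugin's settings for the actual folder.

```python
# Minimal sketch: drop a downloaded checkpoint or LoRA into a ComfyUI models
# folder. Paths assume a standalone install at ~/ComfyUI; the Krita plugin's
# managed install may use a different location, so verify before copying.
import shutil
from pathlib import Path

COMFY_MODELS = Path.home() / "ComfyUI" / "models"

def install_model(src, kind="checkpoints"):
    """Copy a .safetensors file into models/<kind> (e.g. checkpoints, loras, vae)."""
    dest_dir = COMFY_MODELS / kind
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir))

print(install_model(Path.home() / "Downloads" / "someModel.safetensors"))
```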

For a more powerful/capable but less intuitive tool that takes longer to set up, you can install ComfyUI directly. I never got too much into it, since I enjoy working on and polishing up individual images, which Krita is set up very well for, but the subreddit for ComfyUI had good resources. It's much better than Krita if you want to generate a large batch of images, a batch of images with random variables or systematic changes in the prompts, a batch of images that each have to follow some sequence of steps requiring multiple generations, or any combination of these.
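If batch generation is what you're after, ComfyUI exposes an HTTP API you can drive from a script. Here's a minimal sketch, assuming a local server on the default port 8188 and a workflow exported from the UI via "Save (API Format)"; the node id holding the positive prompt ("6" below) will differ in your export.

```python
# Queue a batch of prompt variations against a local ComfyUI server.
# Assumes ComfyUI is running at the default http://127.0.0.1:8188 and that
# "workflow_api.json" was exported via "Save (API Format)". The node id "6"
# holding the positive prompt text is a placeholder - check your own export.
import copy
import json
import urllib.request

with open("workflow_api.json") as f:
    base_workflow = json.load(f)

styles = ["watercolor", "oil painting", "pixel art"]

for style in styles:
    wf = copy.deepcopy(base_workflow)
    wf["6"]["inputs"]["text"] = f"a lighthouse at dusk, {style}"  # positive prompt node
    payload = json.dumps({"prompt": wf}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(style, "->", json.loads(resp.read())["prompt_id"])
```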

What about a general guide on the process? How do I structure my prompt, what kind of parameters do I tune, how do I plug in a style I want? A basic conversational-style prompt gives me residents of Innsmouth.

Honestly the easiest way to get started is to go on CivitAI, find something similar to what you want, and then start playing with the knobs and dials from there. Many images have attached metadata including ComfyUI workflows. This gives you at least a known-good configuration to fall back on when something goes wrong.
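As a concrete example of pulling that metadata out of a downloaded image, here's a short Pillow sketch. Which keys are present depends on the tool that generated the image (A1111-style images usually embed a "parameters" text chunk, ComfyUI images embed "prompt" and "workflow"), and some uploads have the metadata stripped entirely.

```python
# Peek at the generation metadata embedded in a PNG downloaded from CivitAI.
# A1111-style images usually store a "parameters" text chunk; ComfyUI images
# store "prompt" and "workflow" chunks instead. Availability depends on how
# the image was made and whether the uploader stripped metadata.
from PIL import Image

img = Image.open("example.png")  # path is a placeholder
for key, value in img.info.items():
    if isinstance(value, str):
        print(f"--- {key} ---")
        print(value[:500])  # the first 500 chars are usually enough to see the settings
```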

The /g/ board on 4chan also has a long running Local Diffusion General which has a bunch of guides and resources for getting started.

One thing to note though is that ComfyUI is a bit of a security nightmare. Custom nodes are basically just python scripts downloaded and run from the internet with little to no security screening by the maintainers. If you're experimenting with this on a computer with sensitive information, I'd recommend not installing random custom nodes with two stars on GitHub.
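If you do end up with a pile of custom nodes, a crude first pass is just grepping them for code that phones home or executes strings; here's a rough sketch with an assumed install path. Plenty of legitimate nodes will trip it, so treat hits as "go read this file," not as proof of anything.

```python
# Crude first-pass screen of ComfyUI custom nodes: list every Python file that
# touches the network, spawns processes, or eval/execs strings. Many legitimate
# nodes do these things, so this only tells you where to look, not what's malicious.
import re
from pathlib import Path

CUSTOM_NODES = Path.home() / "ComfyUI" / "custom_nodes"  # adjust to your install
SUSPICIOUS = re.compile(r"\b(requests|urllib|socket|subprocess|os\.system|eval|exec)\b")

for py_file in CUSTOM_NODES.rglob("*.py"):
    text = py_file.read_text(errors="ignore")
    hits = sorted({m.group(1) for m in SUSPICIOUS.finditer(text)})
    if hits:
        print(f"{py_file.relative_to(CUSTOM_NODES)}: {', '.join(hits)}")
```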

How models respond to prompts is very model-specific, so the easiest way to write effective prompts is to start by copying an existing example and going from there. I personally use anime-based models, which are almost all (all?) trained on Danbooru data sets with the underscores removed from the Danbooru tags. That means Danbooru-specific tags (e.g. 1girl, looking at viewer, 3d, cowboy shot) and Danbooru-common artist names or organizations (e.g. takeuchi takashi, cle masahiro, a1 pictures, kyoto animation) give very predictable and reliable outputs, at least by the standards of diffusion models. You can also mix & match and assign different weights, so e.g. "(takeuchi takashi:0.3), (kyoto animation:0.6)" will produce a style roughly 2x as inspired by Kyoto Animation's house style as by the VN illustrator Takeuchi Takashi.

I've heard this feature of anime-based models is such a strength that some people use them for the initial generation, then use IMG2IMG with a different model and/or ControlNet to turn the result into a photorealistic-ish or other-style image. Dunno how common that is, though. Also, if you want specific poses or framing, using ControlNet with a separate input image that you create or pick out yourself is much more effective.
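To make the weighting syntax concrete, here's a tiny sketch that builds a prompt string in the "(tag:weight)" convention used by A1111 and ComfyUI prompt parsers; the tags and weights are just the illustrative ones from above.

```python
# Compose a Danbooru-style prompt with per-tag weights using the common
# "(tag:weight)" syntax understood by A1111 and ComfyUI prompt parsers.
# Tags and weights here are purely illustrative.
def weighted(tag, weight=None):
    return f"({tag}:{weight})" if weight is not None else tag

tags = [
    weighted("1girl"),
    weighted("looking at viewer"),
    weighted("cowboy shot"),
    weighted("takeuchi takashi", 0.3),  # light touch of this artist's style
    weighted("kyoto animation", 0.6),   # roughly twice the influence of the above
]
positive_prompt = ", ".join(tags)
print(positive_prompt)
# 1girl, looking at viewer, cowboy shot, (takeuchi takashi:0.3), (kyoto animation:0.6)
```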

Parameters can be a real crapshoot and luck of the draw IME, or maybe I'm just poorly informed and lazy. Most anime-based models use Clip Skip = 2 (it skips the final layer of the CLIP text encoder, which is how the very first useful anime-based models were trained), and you also have to choose the sampler, the CFG, and the number of steps.

As a rule of thumb, 20 steps is a good starting point; run time scales linearly with step count, and above 20 the returns usually diminish to almost nothing (though not quite nothing - it can be worth inpainting over parts with 30+ steps sometimes).

CFG (classifier-free guidance) controls how strongly the prompt steers each denoising step. Lower values tend to make images look blobby and ill-formed, while higher values make them look embossed and harsh, with a look people describe as "deep fried" or "overcooked." I find values from 4-20 tend to be good, but it's highly model-dependent and also depends on what you want to get.

I don't think the sampler matters much, but this, too, is model-dependent. Samplers with "A" at the end of the name are "ancestral," which means they add noise with each step, such that the pictures will never converge on a single image no matter how many steps you take. There are also some samplers that require half as many steps as the others (but each step takes 2x as long), but I forget their naming scheme.
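To show where all of these knobs live outside a GUI, here's a minimal sketch using Hugging Face diffusers (recent versions accept clip_skip directly in the pipeline call); the checkpoint name is a placeholder, and Krita/ComfyUI expose the same parameters through their own settings.

```python
# Minimal diffusers sketch tying the knobs above together: sampler, CFG,
# step count, and clip skip. The checkpoint name is a placeholder - swap in
# whatever model you actually downloaded. clip_skip needs a reasonably recent
# diffusers release.
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "some/model-checkpoint",  # placeholder
    torch_dtype=torch.float16,
).to("cuda")

# "Euler a" in most UIs: an ancestral sampler that re-injects noise each step.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "1girl, looking at viewer, cowboy shot",
    negative_prompt="lowres, bad anatomy",
    guidance_scale=7.0,       # CFG
    num_inference_steps=20,   # steps
    clip_skip=2,              # skip the last CLIP text-encoder layer
).images[0]
image.save("out.png")
```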