This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.
Post your project, your progress from last week, and what you hope to accomplish this week.
If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service.

Man, this week is happening too fast.
Well, I spent some time trying to figure out how to handle mapping from Substack. It turns out that my previous approach was a bit goofy: for Twitter I ended up introducing an intermediate layer that matches neither my data structure nor Twitter's. I could have a data layer that matches each API, but the level of nesting in Twitter's data is something I do not want to deal with at all, so direct translation into my structure it is, and the same will apply to Substack. This means I've been slowly refactoring the import code, and I will be doing so for a while yet.
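In case it helps to see the shape of it, here's a rough sketch of the "direct translation" idea; the Post type, the field names, and the layout of the Twitter payload are all made up for illustration, since the real structures are whatever the project and the APIs actually define:

```python
from dataclasses import dataclass

# Hypothetical internal structure; the real one is project-specific.
@dataclass
class Post:
    id: str
    author: str
    text: str
    published_at: str

def post_from_twitter(payload: dict) -> Post:
    """Map a (simplified, made-up) nested Twitter API response straight into
    the internal Post type, with no intermediate layer in between."""
    data = payload["data"]
    user = payload["includes"]["users"][0]
    return Post(
        id=data["id"],
        author=user["username"],
        text=data["text"],
        published_at=data["created_at"],
    )

# A Substack importer would follow the same pattern: one mapper per source,
# each returning the same internal Post type.
```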
How have you been doing @Southkraut?
Been putting in extra work at work, which completely ate up my free time during the weekdays, and on the weekend I had social calls. No progress this time.
Ideally I'd like to have the third draft of my NaNoWriMo project finished by the end of this week, or failing that, next Thursday.
Now that I have a 5070 Ti I am ready to dabble with some local AI. What's a good guide for tech-literate AI n00bs? I am interested in generating images and maybe using it to classify my photo archive.
I highly recommend Conrad's "Making Images with Stable Diffusion" guide; it's what really got me into the hobby. Some parts are a little outdated (for example, he recommends SD 1.5 models like AbyssOrangeMix3 and RealBiter, but the meta has long since moved on to SDXL models for photorealism and Pony models for anime) but his description of the basic workflow is both excellent and still relevant.
As for software, I strongly recommend Fooocus over the standard choices of Automatic1111 and ComfyUI. It's incredibly simple to set up and use; after you download, unzip, and run, Fooocus will automatically download several useful models and LoRAs into the right folders, as well as any necessary VAEs, and set them up as helpful presets. The presets include lists of positive and negative prompts to improve your images; in less than three clicks from the time you download the zip file, you can already type "forest elf" and get four stunning results. But despite this user-friendliness, Fooocus is not just a toy; it is a powerful and flexible tool with a lot of options hidden under the hood. As you get more comfortable generating images, you can open the various menus and customize your settings, download other models, create your own presets, etc. See this guide for more.
For imagegen, you've got two generalist options: Automatic1111 and ComfyUI.
There are some specialty cases (e.g., Wan2GP is like Automatic1111 but only for running video models on mere-mortal-level GPUs, and there's a big stack of options for 3D model generation), but those are the big ones.
For classification and categorization, there are a lot of options, but most of them are intended to run passively on servers with less powerful graphics cards, rather than on demand from a desktop client. The three I've tried are PhotoPrism, Immich, and NextCloud Memories. All worked well enough for my purposes, but the user experience and setup difficulty are wildly different from one to the other -- I'd probably point to Immich if you're okay with Docker and NCM if you absolutely won't touch it, but there are a bunch of tradeoffs to each.
I'd assume there are some desktop tools for this, but I haven't found any that were good and turnkey. You do have the VRAM necessary to train your own AI classifier (I'd recommend YOLOv4 using WandB) pretty quickly if you've got the training data, but it takes a lot of pre-classified photos to train (>200 per category minimum, imo), and you'll need to write some (high-school-level, simple CSV munging) code to actually do the sorting or tagging.
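As a rough illustration of the CSV-munging step, here's a minimal sketch assuming you've already dumped the classifier's predictions to a CSV with hypothetical "filename" and "label" columns (adjust to whatever your classifier actually emits):

```python
import csv
import shutil
from pathlib import Path

PHOTOS = Path("photos")                # your unsorted archive
SORTED = Path("sorted")                # one subfolder per predicted category
PREDICTIONS = Path("predictions.csv")  # hypothetical classifier output

with PREDICTIONS.open(newline="") as f:
    for row in csv.DictReader(f):
        src = PHOTOS / row["filename"]
        dest_dir = SORTED / row["label"]
        dest_dir.mkdir(parents=True, exist_ok=True)
        if src.exists():
            # Copy rather than move so the original archive stays untouched.
            shutil.copy2(src, dest_dir / src.name)
```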
For image generation, I recommend starting with KritaAI, an open-source add-on to the open-source image-editing software Krita, available on GitHub. It adds a panel to Krita where you can enter prompts, plus a settings menu where you can save presets for generation inputs. It can download default models automatically, and it installs an instance of ComfyUI that it uses on the back end for generations. Adding new models manually is easy and intuitive: just copy them into the proper folders and select them in Krita's settings after a refresh. Krita's interface also makes inpainting extremely easy.
For a more powerful/capable but less intuitive tool that takes longer to set up, you can install ComfyUI directly. I never got too much into it, since I enjoy working on and polishing up individual images, which Krita is set up very well for, but the subreddit for ComfyUI had good resources. It's much better than Krita if you want to generate a large batch of images, a batch of images with random variables or systematic changes in the prompts, a batch of images that each have to follow some sequence of steps requiring multiple generations, or any combination of these.
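If you do go the batch route, here's a minimal sketch of driving ComfyUI from a script, assuming it's running locally on its default port (8188) and that you've exported your workflow via "Save (API Format)"; the workflow_api.json filename and the "6" node ID for the positive prompt are placeholders you'd swap for your own:

```python
import copy
import json
from urllib import request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # ComfyUI's default local API endpoint
PROMPT_NODE_ID = "6"  # placeholder: the ID of your positive-prompt text node

# Workflow previously exported from ComfyUI via "Save (API Format)".
with open("workflow_api.json") as f:
    base_workflow = json.load(f)

subjects = ["forest elf", "desert nomad", "arctic explorer"]

for subject in subjects:
    wf = copy.deepcopy(base_workflow)
    # Systematic prompt variation: swap in each subject.
    wf[PROMPT_NODE_ID]["inputs"]["text"] = f"{subject}, detailed, best quality"
    payload = json.dumps({"prompt": wf}).encode("utf-8")
    req = request.Request(COMFY_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)  # queues the job; images land in ComfyUI's output folder
```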
What about a general guide on the process? How do I structure my prompt, what kind of parameters do I tune, how do I plug in a style I want? A basic conversational-style prompt gives me residents of Innsmouth.
Honestly the easiest way to get started is to go on CivitAI, find something similar to what you want, and then start playing with the knobs and dials from there. Many images have attached metadata including ComfyUI workflows. This gives you at least a known-good configuration to fall back on when something goes wrong.
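If you want to dig that metadata out of a downloaded image yourself, here's a minimal sketch using Pillow; it assumes the generator embedded its settings as PNG text chunks (ComfyUI usually writes "prompt" and "workflow" keys, Automatic1111 a "parameters" key), which not every image on CivitAI will have:

```python
from PIL import Image  # pip install pillow

img = Image.open("downloaded_image.png")

# PNG text chunks show up in img.info as strings.
for key, value in img.info.items():
    if isinstance(value, str):
        print(f"--- {key} ---")
        print(value[:500])  # truncate so a huge workflow JSON doesn't flood the terminal
```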
The /g/ board on 4chan also has a long-running Local Diffusion General, which has a bunch of guides and resources for getting started.
One thing to note, though, is that ComfyUI is a bit of a security nightmare. Custom nodes are basically just Python scripts downloaded and run from the internet with little to no security screening by the maintainers. If you're experimenting with this on a computer with sensitive information, I'd recommend not installing random custom nodes with two stars on GitHub.
The way models respond to prompts is very model-specific, so the easiest way to write effective prompts is to start by copying an existing example and going from there. I personally use anime-based models, which are almost all (all?) trained on Danbooru data sets with the underscores removed from the Danbooru tags. That means Danbooru-specific tags (e.g. 1girl, looking at viewer, 3d, cowboy shot) and Danbooru-common artist names or organizations (e.g. takeuchi takashi, cle masahiro, a1 pictures, kyoto animation) give very predictable and reliable outputs, at least by the standards of diffusion models. You can mix & match and apply different weights, so e.g. "(takeuchi takashi:0.3), (kyoto animation:0.6)" will produce a style that looks roughly 2x as inspired by Kyoto Animation's house style as by the VN illustrator Takeuchi Takashi. I've heard this feature of anime-based models is such a strength that some people use them for the initial generation, then use img2img with different models and/or ControlNet to turn it into a photorealistic-ish or other-style image. Dunno how common that is, though. Also, if you want specific poses or framing or whatever, using ControlNet with a separate input image that you manually create or pick out from somewhere is much more effective.
Parameters can be a real crapshoot and luck of the draw IME, or maybe I'm just poorly informed and lazy. Most anime-based models use Clip Skip = 2 (which means using the second-to-last layer of the CLIP text encoder instead of the last one - it's how the very first useful anime-based models were trained), and you also have to choose the sampler, the CFG scale, and the number of steps.
As a rule of thumb, 20 steps is a good starting point; steps scale run time linearly, and above 20 the returns usually diminish to almost nothing (though usually not totally nothing - it can be worth inpainting over parts with 30+ steps sometimes).
CFG (classifier-free guidance) controls how strongly each denoising step is pushed toward the prompt. Lower values tend to make images look blobby and ill-formed, while higher values tend to make them look embossed and harsh, with a look that people have described as "deep fried" or "overcooked." I find values from 4-20 tend to be good, but it's also highly model-dependent and dependent on what you want to get.
I don't think the sampler matters much, but this, too, is model-dependent. Samplers with "A" at the end of the name are "ancestral," which means they add noise with each step, such that the pictures will never converge on a single image no matter how many steps you take. There are also some samplers that require half as many steps as the others (but each step takes 2x as long), but I forget their naming scheme.
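To make those knobs concrete, here's a minimal sketch using the Hugging Face diffusers library; the checkpoint, prompts, and exact values are just placeholders, and the UIs discussed above expose the same steps/CFG/sampler settings without any code:

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Placeholder checkpoint; any SDXL model loads the same way.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Sampler choice: "Euler a" is an ancestral sampler, so images never fully converge.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="forest elf, detailed background, best quality",
    negative_prompt="lowres, blurry, bad anatomy",
    num_inference_steps=20,  # steps: ~20 is the usual starting point
    guidance_scale=7.0,      # CFG: too low looks blobby, too high looks "deep fried"
).images[0]
image.save("out.png")
```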