This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.
Post your project, your progress from last week, and what you hope to accomplish this week.
If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service.

Jump in the discussion.
No email address required.
Notes -
Do we have many local LLM users here? I'm curious what people are doing: what models are people using? For what jobs? For what reason? On what hardware? With what runner?
As I mentioned to @WhiningCoil a few days ago, I mostly run a
Qwen3.6-A3B-Q4_K_XLonllama.cpp'sllama-serverand connect to it from https://pi.dev/, using a Radeon 780M in my laptop. It's been decent for grinding through smaller coding jobs under close observation, though like any Chinese model it'll just give you the party line if you start asking it about Taiwan or Tiannemen Square. I've also been using agemma4-26B-A4Bfor general questions about the world when I'm at session quotas. The other big reason I'm getting into this stuff is that I never want to be locked out by a subscription. Haven't looked at image or video generation at all.I have a small ML server that I initially set up for some work stuff, and have since retrofitted for LLM and diffuser use. nVidia 3090, i5-14400, running between 128 GB RAM to 192 GB RAM depending on what else I've shut down. Squeaked in just before RAM prices spiked (and am kicking myself for not grabbing three or four more of the 64x2 kits), if you want to know the why on the weird RAM numbers.
I'll caveat that you just shouldn't expect Claude or even Grok-level outputs from local models on their own.
For LLMs runners, I've mostly stuck to
llama-server(and forks) as well, after an initial and short-lived love-hate relationship withLMStudio. I have a few custom bits of code for sequencing larger grouped requests, but they're worse-than-vibe-code level stuff and basically just a UI and for loop. Toyed with SillyTavern, just in the hopes of getting better organization, but it's really heavily built for roleplay and I'm not that interested in it. I've looked at and played with some agentic-ish stuff in heavily sandboxed and airgapped environments, but when the best options arenanoclaw,hermesandodysseus, but when the least obnoxious one is powered by pewdiepie, there be dragons here.Writing:
gemma4-26-A4Bis a great editor, beta reader, and brainstorm sounding board. It's the closest to okay prose from a local model in its class, although beating the obvious AI tells out of it takes some effort and it's seldom very interesting. Also seems to have the best MTP assist (though I had the to build theatomic-turboquantvariant llama variant to get MTP to work when it first came out; don't know if the situation has changed there).[Cydonia](https://huggingface.co/TheDrummer/Cydonia-24B-v4) (24B, mistral-based)and ```Strawberry Limeade (70B, llama-based) are older models that were pretty useful and I'll still pull up on occasional to sanity-check stuff against.Coding:
Qwen 3.6is hard to beat for simple and fast work, especially things like bringing in an image sketch and converting it into XAML or webdev, or beating some simple file munging into shape. In addition to35B-A3B, I'd also point to the27B-MTPdense variant. It's not as fast as A3B, but the gap's smaller than you'd expect, and in some use cases it comes across as much smarter in my experience. I'd also recommend it any time you have a sketch or powerpoint art-level design you want converted into a GUI representation, and can't use a cloud model -- far from perfect, but easily saves hours of work.GLM(ranging from 4.5-Air at 106B up) can be good for complex work, refactoring, and troubleshooting -- but even at moderate quants, it can be fifteen minutes per turn. Great where I've got a ton that needs to go into the hatch and can work on something different; terrible for anything where a fast OODA loop is important.Standard vs uncensored/abliterated models is a hard question. Qwen is very refusal-prone, and not just on political topics. While Gemma4 is surprisingly willing to play along for a variety of topics, it still has some hard refusal points, some of which can come up surprisingly rapidly. And not just for weird smut, either. I've had .
For Image Generation, your two power user options are Automatic1111/Forge WebUI and ComfyUI. WebUI is the easier option to get started with, and still has a good level of support for things like img2img, swapping models out, or using various plugins or controlnets. ComfyUI's much more capable and eventually lets you do things like switch between models for different stages of a pipeline, but there's very much a 'who wants to drink from the firehose' moment every time you get started, and managing workflows sucks. On the other hand, if you want to run something like TRELLIS2 or Wan3d, ComfyUI's a lot easier (though not easy!) to set up.
In terms of models:
IllustriousandNoobAIfamily are good for producing general 'vibe'-ish scenes with one or two actors, so long as you don't need precision, and they're pretty fast.[Chroma](https://huggingface.co/lodestones/Chroma)(9B, FLUX.1-schnell-based) and[Anima](https://huggingface.co/circlestone-labs/Anima)(2B), which favor natural language over the SD-style "throw a bunch of words at it" approach. Much slower, though.Qwen Image 2512variants also fall here, although their workflows can be a lot more annoying.Qwen Image EditandFlux2-Kleinare the best edit models, especially for keeping consistency in a scene while tweaking it, or moving a character from one setting to another.2D->3D Models:
TRELLIS(2)gives the nicest-looking outputs for a given input image, and supports(ish) transparency.Wan3Dgives more 'whole' models that require less post-processing to ship to a 3d printer, but tends to be a little fuzzy.Animation:
WAN2has the most support and has been out the longest.LTX-2.3is much faster and comparable or better quality.SCAIL2just came out, but I haven't even tried to set it up yet. Initial reports look good.I had a hell of a time getting any non-trivial animation model working in Forge WebUI, and the ComfyUI workflows get nutty pretty fast. If you want to experiment with them and not go leaping into the deep end, WAN2GP gives a lot of workflow options, at the cost of sometimes serious performance costs and a bad tendency to automatically download a model without warning.
More options
Context Copy link
More options
Context Copy link