KingOfTheBailey

1 follower follows 0 users joined 2022 September 10 01:37:00 UTC

No bio...

User ID: 1089

‎

Tinker Tuesday for June 9th, 2026

KingOfTheBailey 7d ago

Skills are essentially a pointer to a loadable chunk of context, a fuzzy program written as a .md file. "If you want to know how to do X, read this file first" sort of thing. The models get more confused as their context (message history) grows, and especially when the context window fills and the initial stuff gets cut off. Then they have no idea what they're doing. So a "skill" is detected by the harness (e.g. opencode) which injects the short summary into the prompt that goes to the model, which can then ask to read the full file if necessary.

Context

Tinker Tuesday for June 9th, 2026

KingOfTheBailey 8d ago · Edited 7d ago

Do we have many local LLM users here? I'm curious what people are doing: what models are people using? For what jobs? For what reason? On what hardware? With what runner?

As I mentioned to @WhiningCoil a few days ago, I mostly run a Qwen3.6-A3B-Q4_K_XL on llama.cpp's llama-server and connect to it from https://pi.dev/, using a Radeon 780M in my laptop. It's been decent for grinding through smaller coding jobs under close observation, though like any Chinese model it'll just give you the party line if you start asking it about Taiwan or Tiannemen Square. I've also been using a gemma4-26B-A4B for general questions about the world when I'm at session quotas. The other big reason I'm getting into this stuff is that I never want to be locked out by a subscription. Haven't looked at image or video generation at all.

Context

Culture War Roundup for the week of June 8, 2026

KingOfTheBailey 8d ago

I've been lucky enough to stumble into a relationship with a nerdy girlfriend, but if I hadn't, I'd be asking if that company was hiring. I think you're right.

Context

Culture War Roundup for the week of June 8, 2026

KingOfTheBailey 9d ago

decently-smart, decently-attractive, actually kind of geeky ladies

I don't think there are, and this unfortunate fact has been one of the driving engines of the sex-relations theater of the culture war going all the way back to Elevatorgate. It drives the differences in how many women get into STEM, it drives the "I'm sorry, but I'm just flat-out exhausted by the constant string of even decent suitors thinking I'm their last romantic shot" that women in STEM/atheism/whatever have been writing about for 25 years, and generally it's pretty hard to find natal women who are into any given nerdy thing.

Context

Friday Fun Thread for June 5, 2026

KingOfTheBailey 10d ago

Don't get me wrong, the people who say these local models are "almost-Claude-tier" are still too starry-eyed. The model I recommended you - Qwen3.6-A3B-Q4_K_XL - told me earlier that both the Linux kernel and Busybox used the autotools. It will confabulate as badly as a previous-gen frontier model but if you drop it in an established project where it can read stuff written by people with a clue, it can often be guided into doing useful things like push through refactors and updates where the compiler and tests can keep it on track.

Context

Friday Fun Thread for June 5, 2026

KingOfTheBailey 10d ago

Your work is doing things in the most retarded way possible if they're forcing you to use local CPU inference only. I'm not one of the AI boosters around here but I do use models from the big US labs a fair bit at work. I can see the local models becoming more capable and even quite useful for local tasks, but there's no way I'm getting one to vomit up a project from nothing and expecting miracles. I have had asked for audits of old code and had it found bugs that I missed (as well as a lot of noise).

As far as I can tell, llama.cpp is a lot better than ollama. While it's much less of a turn-key experience, many models never seem to make it across to ollama, and its inference is slower in my experience. It also seems to be slower at picking up newer developments, like multi-token prediction:

By pairing a heavy target model (e.g., Gemma 4 31B) with a lightweight drafter (the MTP model), we can utilize idle compute to “predict” several future tokens at once with the drafter in less time than it takes for the target model to process just one token. The target model then verifies all of these suggested tokens in parallel.

For running local models, there are quite a few things to think about:

Which model to use: at the moment, the strongest locally-runnable coding model seems to be Qwen3.6.
How big of a model to use: smaller models are stupider. If it doesn't fit in VRAM, inference speed generally tanks too low to be useful. Qwen3.6 comes in two sizes: 27B (27 billion parameters) and 35B-A3B (35 billion parameters, but only 3B parameters are active for each token). The 27B model will be smarter because it activates more parameters per token but 35B-A3B model will be a lot faster. For your GPU, I'd try Qwen3.6-35B-A3B, not least because as a "mixture of experts" model, some of those experts can be kept on CPU.
What quantization of that model to use: this is where someone crunches the model down to make it take up less space at the cost of making it dumber. More accurate quantizations will also have slower inference. It seems like Q4_K_XL is usually a "sweet spot" most people go for, and here's someone claiming 80 tokens/sec with Qwen3.6-35B-A3B-Q4_K_XL on a 12GB card: https://old.reddit.com/r/LocalLLaMA/comments/1t82zxv/80_toksec_and_128k_context_on_12gb_vram_with/ . Ignore the stuff about the MTP PR; that's since been merged to master.

Then there's actually getting it running and doing something useful. For that I'll defer to the above guide on how to launch llama-server for this model on a 12GB card. You'll then have to point OpenCode at your local server and see if it goes any better for you. No promises, my sense is that local stuff is on the edge of being "quite decent" and it's worth having a finger in the local model pie so when it does get genuinely good. I don't ever want to be locked into paying for subscription compute.

Context

Culture War Roundup for the week of June 1, 2026

KingOfTheBailey 11d ago

Don't forget "herd immunity".

Context

Culture War Roundup for the week of June 1, 2026

KingOfTheBailey 12d ago

Is this not just an instance of Sailer’s Law of Female Journalism?

The most heartfelt articles by female journalists tend to be demands that social values be overturned in order that, Come the Revolution, the journalist herself will be considered hotter-looking.

In this case, by problematizing the appreciation of the women currently considered hotter than her as "objectification".

Context

Transnational Thursday for June 4, 2026

KingOfTheBailey 12d ago

The same thing, Aboriginal land rights edition: https://youtube.com/watch?v=u5OlBT2OcGg

Context

Small-Scale Question Sunday for May 31, 2026

KingOfTheBailey 17d ago

I can't find any good community things about this. Could be the 13th? That's got things like Houdini writing about "Conjuring", Marconi writing about wireless telegraphy, etc:

The contributors included Niels Bohr (“Atom”), Marie and Irène Curie (“Radium”), Albert Einstein (“Space-Time”), Henry Ford (“Mass Production”), Sigmund Freud (“Psycho-analysis”), George Bernard Shaw (“Socialism: Principles and Outlook”), and Leon Trotsky (“Lenin”).

https://www.britannica.com/topic/Encyclopaedia-Britannica-English-language-reference-work/Thirteenth-edition

Context

Small-Scale Question Sunday for May 31, 2026

KingOfTheBailey 17d ago

And even if it's not the one that answers your question, there's apparently one edition of the Britannica that's considered peak. Does anyone remember which it is?

Context

Friday Fun Thread for May 22, 2026

KingOfTheBailey 25d ago

Yeah I agree with you and the two siblings that it's still a good show and very enjoyable to watch. The franchise had its spinoffs and its run and then died, but now Amazon MGM is attempting to revive it. What I'm asking is more: at what point does it become a referent in the nostalgia cycle whose 'memberberries can be harvested? It doesn't seem as ready to go as classic 90's conspiracy theory stuff (which would absolutely be part of 90s!Stranger Things), X-Files, etc.

Context

Friday Fun Thread for May 22, 2026

KingOfTheBailey 25d ago

At what point does Stargate (particularly SG-1) rotate back into consciousness?

Context

Culture War Roundup for the week of May 18, 2026

KingOfTheBailey 26d ago · Edited 26d ago

Over the past few weeks we've had several serious vulnerabilities found in the Linux kernel (CopyFail, DirtyFrag, PinTheft), and LLM assistance has reduced the gap between "suspicious bugfix smells like it might patch a vulnerability", "someone other than the reporter/reportee has PoC and/or a working exploit", and "attackers are deploying it live in the wild" to nearly zero time.

Curl is an unusually disciplined project, and I think it is hard to generalize lessons from it.

Context

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats

KingOfTheBailey

KingOfTheBailey