
Tinker Tuesday for January 6, 2026

This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.

Post your project, your progress from last week, and what you hope to accomplish this week.

If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service.


Physical-work side, I got to do some siding repairs. That's been a !!fun!! way to spend the holiday break.

Software-side, trying to look into the state of modern sorting-assist tools. You'd think, with all of the advances in AI tech, that classifying files and sorting them would be a solved or near-solved problem. Microsoft's "agentic AI" concept drives me up the wall for a wide variety of reasons, but this seems like one of the main killer use cases. If you've ever worked tech support for either Gen Y/Z or Boomers, seeing a Downloads or Documents folder with so many loose files that it causes an SSD to slow to a crawl is a pretty common experience, and they can't find shit (or, worse, can find ImportantDocument_final_last_(1)autorecover\current.docx, for now).

So I've been trying to come up with and evaluate possible solutions to this sorta thing.

  • Zero-shot classifiers like CLIP are well-established and everyone's favorite option (e.g. Immich, the Bellingcat sorter if you trust them, which I don't). Trivially easy to implement in python... badly. You're effectively trying to compare one caption against another in embedding space, and there's no guarantee (and a lot of anti-guarantees) that two words close to each other in meaning will have any proximity there. And they're pretty limited in domain; many can't compare a pptx against a png period, and those that can do so with basically zero accuracy.
  • Few- and many-shot CNN classifiers are also well-established and trivially easy to code for a single domain... but only if you know the categories you want beforehand, and have somewhere between five and five hundred examples. Even setting aside the training cost, that's another great tool for a tiny number of uses. Vision transformers have the same problem, except more so; I needed close to fifteen hundred pre-classified input files. And it's really messy to go from one domain to another.
  • Text-only LLMs operating based on file names already have fully-developed solutions. If anyone has a use case where people name their files but don't put them in folders, I guess that'd work for all three of them on the planet?
  • Multimodal LLMs are kinda there. You can just load up GLM4.6V (-flash), throw a supported file on the input, and ask it to put the file into one of several categories. It'll burn a stupid amount of tokens if the interaction between those categories is particularly weird, or if none of the categories make sense for a given input, but it's not too hard to handle a decently wide subset of image-or-document files on the input side, and natural human language for categorization. Problem's that you get LLM text as the output. You can try to force the LLM to keep the answer as concise and as limited to a category as you want, but there's nothing stopping it from spitting out parts or all of your categories when describing the matching one, even in well-constrained environments.
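The zero-shot pitfall in the first bullet is easy to see in miniature. A sketch, with made-up toy vectors standing in for what a real pipeline would get from CLIP's encoders: the mechanism is just "pick the category whose embedding is closest", and everything rides on whether the embedding model actually puts related captions near each other.

```python
# Toy illustration of zero-shot classification by embedding proximity.
# The vectors here are invented for illustration; a real pipeline would
# get them from something like CLIP's text/image encoders.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def classify(file_embedding, category_embeddings):
    """Return the category whose embedding is closest to the file's."""
    return max(
        category_embeddings,
        key=lambda cat: cosine_similarity(file_embedding, category_embeddings[cat]),
    )

# Hypothetical category embeddings -- in practice these come from the model,
# and nothing guarantees that "tax documents" and a scanned W-2 land close.
categories = {
    "tax documents": [0.9, 0.1, 0.0],
    "vacation photos": [0.1, 0.9, 0.1],
    "memes": [0.0, 0.2, 0.9],
}
scan_of_w2 = [0.8, 0.2, 0.1]
print(classify(scan_of_w2, categories))  # -> tax documents
```

The max-over-cosine-similarity part is the whole algorithm; the "badly" part is that the quality of the made-up vectors above is doing all the work, and with a real encoder you don't get to pick them.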

That's on top of other issues specific to implementations: a lot of ViTs and multimodal LLMs depend heavily on how inputs get broken into patches or tiles, a lot of classifiers get really stupid if you have wildly different resolution inputs, multimodal LLMs can't distinguish between prompt and content, yada yada.
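One partial mitigation for the "you get LLM text as the output" problem is to never trust the raw output at all, and instead snap whatever the model says to the closest allowed label. A stdlib-only sketch (function and category names are mine, not from any particular tool):

```python
# Post-process free-form LLM output into one of a fixed set of categories.
# This is a mitigation, not a fix: if the model lists several categories
# in one answer, the substring pass below just picks the first hit.
import difflib

def snap_to_category(llm_output, categories, cutoff=0.3):
    """Map free-form model output to one allowed category, or None.

    Tries an exact (case-insensitive) match first, then looks for any
    category mentioned inside the text, then falls back to fuzzy matching.
    """
    text = llm_output.strip().lower()
    lowered = {c.lower(): c for c in categories}
    if text in lowered:
        return lowered[text]
    # Models often wrap the answer in prose ("I'd file this under
    # Invoices."), so check for any category inside the text.
    for low, original in lowered.items():
        if low in text:
            return original
    matches = difflib.get_close_matches(text, lowered, n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

categories = ["Invoices", "Photos", "Code", "Misc"]
print(snap_to_category("I'd file this under Invoices.", categories))  # -> Invoices
print(snap_to_category("banana", categories))  # -> None (nothing close)
```

Returning None rather than guessing matters here: an unsortable file left in place is annoying, but a confidently mis-filed one is how you end up with the Downloads-folder archaeology problem all over again.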

On the flip side, closely related topics are >98% solved off the shelf, even ones that I'd consider a lot harder.

  • GLM4.6V can convert a short comic from visual medium into prose, with all the complexity that implies. Doesn't matter if a character's name is introduced several pages after they first show up, or they have aliases, or if two characters look similar but only show up in different contexts, or if you've got stuff that's clearly outside of the likely training data. I can give examples that confuse it, but they're problems like 'did it read where speech bubbles were pointing correctly' or 'can it read these abstract shapes'? And that's something with so useless a business case that I can't believe they made much synthetic data for it.
  • Give me code corresponding to a given layout (or pencil sketch of a layout)? That's trivial to build synthetic data for and has a clear business case, but XAML sucks, and GLM4.6V-flash is just fine with basic use cases.
  • It's hard to overstate exactly how far categorization has gotten in single clear domains; if you only need to handle images, or only text, you're in great shape. I built a YOLO-based model from scratch for ten categories; it took less than six hours to train on a graphics card from 2018, 1k input images, >85% accuracy. Twelve years ago, that would have been a pipe dream for a doctoral program; today, it's a sign that I fucked up somewhere, and I need to decide between more epochs and switching some parameters around.

Windows 11 has gradually become unusable, so I am migrating to Linux. So far it is on the same level of shittiness as Windows, but at least you know the pain will end at some point. I was hoping it would be better behaved after so many years; still, there is progress. Does anyone know how gaming with legally owned backups of games works?

For current-gen games, you're looking at ProtonDB. Yes, it's officially meant for Steam, but if you don't want to run Steam and add them as non-Steam games, you can easily access the same underlying tools through Lutris or Heroic Games Launcher. Lutris tends to be better, in my experience, for legally-owned-backups. Compatibility is good-but-not-perfect -- almost anything mainstream enough to sell through Steam in the last ten years is getting looked at, but marginal games under that bar might not, and a lot of the very popular multiplayer games with anti-cheat have trouble or just won't work.

Older stuff and more marginal games can be rougher. DOSBox works and exists, and there are Linux-friendly ports (or native builds) of emulators for almost every past-gen console, though quality and performance vary in the PS3-and-later era. Go really far into the indies and it can be a mess, with some games having Linux-native builds despite being built around an ecosystem that absolutely loathes it, and others only becoming functional after a decade of attempted ports, when some random fix in Proton-GE finally solved it.
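The DOSBox route really is low-effort once you have the game files. A sketch of a per-game config (the game directory and executable name here are hypothetical -- substitute your own), launched with `dosbox -conf keen.conf`:

```ini
# Hypothetical per-game DOSBox config; adjust paths to your setup.
[cpu]
cycles=3000          # a fixed cycle count; speed-sensitive DOS games
                     # misbehave on "auto", so pin it and tune by feel

[autoexec]
# Commands here run automatically at startup.
mount c ~/games/keen
c:
KEEN1.EXE
exit
```

Keeping one small conf per game beats editing the global config, since old DOS titles want wildly different cycle counts and sound settings.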

As mentioned in last week's Wellness Wednesday thread, one of my new year's resolutions was to write and record an album this year. To that end, I committed to spending roughly an hour practising guitar every day in January.

For the album I released this time last year, I recorded the guitars using a detuned standard electric guitar, for which I had a special nut cut so that I could put heavy gauge strings on it. On the advice of the luthier who cut the nut for me, for this album I've instead bought myself a baritone guitar: an electric guitar with an unusually long neck and scale length, allowing the player to tune down to lower ranges while maintaining string tension, essentially the missing link between a standard electric guitar and a bass guitar.

Much of my guitar practice has been spent just getting used to the longer scale length of this guitar when compared to a typical electric guitar. I thought a good way of getting the hang of it would be to learn some songs written for a baritone guitar, and it popped into my head that Carcass, a band I listened to a lot as a teenager, used baritone guitars. For the last few days I've been trying to master this song. Hoo boy can these lads play fast: the "playback speed" feature in YouTube is a godsend. I figure once I can play the rhythm guitar for this song cleanly, I can say I've mastered the baritone guitar.