Friday Fun Thread for December 5, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


Google's been trying something similar with SIMA 1 and the more relevant SIMA 2, though I'm not seeing anywhere near as much information about the model parameters and configuration for that one. Qwen-2VL-7B seems, intuitively, way too small for this sort of deep analysis and decision-making, and it's kind of weird that a lab environment didn't go to something like Qwen-2.5VL-32B. But 7B was also obscenely good at captioning videos and doing problem-solving analysis from them, and people had gotten some results, if not great ones, before.

Unfortunately, a lot of the value in the study is going to depend on exactly what and how they tested the model, and there's really not enough detail here. An hour-long autonomous play session of 'finish this mission' is the big selling point, but I don't know Genshin well enough to say whether a) that mission was nontrivially different from training data or b) it involved more than 'follow quest marker, spam A at enemies whenever the lock-on button does anything'.

It'd be interesting to see more information about how well these models handle completely out-of-training problems, though. I've talked about using a Minecraft mod to see how well a model can create a 'new' solution, but these sorts of games make it trivially easy to present completely out-of-training problems, ranging from stuff as trivial as an enemy or attack that's changed color, all the way up to completely novel gameplay mechanics (eg, FFXIV threw in a "change color to reflect attacks" mechanic several years after initial release). I wouldn't expect an LLM to one-shot every version of this, and some probably aren't possible for architectural reasons (eg, even if a model could go from vanilla Minecraft to GTNH, no plausible memory-constrained implementation would have the context window for even some mid-game recipes), but I think it'd say some interesting things regardless.
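To be concrete about the cheapest end of that spectrum: the 'enemy that's changed color' test doesn't require touching game logic at all, just perturbing the frames the vision model sees. A rough sketch, with the frame format (nested lists of RGB tuples) purely a stand-in rather than any real game or benchmark API:

```python
# Hedged sketch of the simplest out-of-training perturbation mentioned above:
# recolor an asset by swapping channels in the raw frame, so a 'red' enemy
# the model saw in training now renders as 'blue'. The frame representation
# here is a toy stand-in, not how any actual eval harness stores frames.

def recolor_frame(frame):
    """Swap red and blue channels in an RGB frame (list of rows of tuples)."""
    return [[(b, g, r) for (r, g, b) in row] for row in frame]

# Tiny 1x2 synthetic frame: one pure-red pixel, one pure-green pixel.
frame = [[(255, 0, 0), (0, 255, 0)]]
print(recolor_frame(frame))  # red pixel becomes blue; green is unchanged
```

The point of something this crude is that it cleanly separates 'the model's visual features generalize' from 'the model memorized this asset', without needing a mod or a novel mechanic.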