
Friday Fun Thread for July 3, 2026

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


https://www.hyperstitionai.com/unslop-results

I came across the results of this contest to generate LLM-written stories, with one of our participants being our very own @self_made_human. I'd be interested in any thoughts you had on participating in the contest as well, if you had any.

My main takeaway after skimming the finalists was that while some of the concepts were interesting, even with a lot of prompting and harness effort, there's just not much difference from the sloppy prose and incoherent writing you get when one-shotting text out of an LLM.

It's hard to tell how much of this is just because writing's not a focus at the LLM labs right now, but the fact that every LLM converges onto similar writing attractor states definitely makes me bearish on LLMs' ability to continue to generalize.

Some of the weirdness is downstream of the contest parameters; I'd put some effort toward a short story using a recursive writing approach, but by the time it was remotely good prose, it wasn't clear it complied with the rules. Had similar issues with Phailyoor's challenge.

Some of it is definitely a toolset problem. The models themselves overwhelmingly aim toward <1k word 'chunks', and community efforts to bypass that, like WriteLonger, are very much band-aids. It's relatively easy to throw together a scaffold of scenes that interlace together, but you still get mini-climaxes at the end of each prompt, and that gets grating fast. Claude tries to do something under the hood to hide the problem, as do some agentic setups, but they seem to do so by just gluing segments together. Whether you do that automatically or manually, it still results in unpleasant crescendos, because the model thinks the scene or segment ends in places where the tension is supposed to be rising. Some prompt scaffolds meant to give better control over pacing only squeeze those crescendos closer together.

As with conventional art, there's a big problem where even most writers don't know the technical terms for what they're trying to do, and unlike conventional art with diffusers, writing in the Style of X doesn't work very well, and anything else takes a ton of input tokens. I've had some limited success by giving a handwritten example as input and then transferring the characters and background with them, but it's still not good, and it doesn't scale to longer works.

Context remains an issue. It's amazing that models can get into the 1-million-token range, but few do well there. That's more of an issue with long-form writing (this story isn't very good, sometimes in painful ways because it's close to being good, but the prose quality goes from merely purple to outright blah by 30k tokens), but it does still mean you can't just throw a hundred thousand words of setting and style bible into most models and get anything useful out.

Weirdly, the models are great at editing, both broad strokes and catching narrow typos, including smaller models and sometimes even for genres that should be really hard for a token predictor to understand. The outcomes aren't always in the form I like, but they're usually nothing awful, nor even as bad as the original prose. But you can't just cycle the same text through an LLM with a prompt of "do it better": on top of the token costs, it tends to get stuck cycling around a small subset of problems.

I keep hoping it'd be possible to get something better out of a more complicated script-based approach rather than a solely agentic one, but results have been mixed, and you end up with a thirty-page-deep list of checkboxes for the models to review over and over.