This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Notes:
Your Bull and moderate options seem to miss an important middle. We jump from 'ASI imminent' straight to 'useful tool'. I want to see a middle position: AI will likely disrupt the economy, culture, and society, regardless of whether AGI or ASI is coming.
Anyway:
This seems extremely, self-servingly narrow and contradictory: 'We want to show you how much an AI can do, in order to change your mind about its limits. But please, only pick something that it can do.' This isn't question-begging, but it's something like it.
Anyway, anyway.
How about an 8-bit side-scrolling video game with roughly the complexity of Super Mario Brothers 3? If it can write a full 'feature-length' NES game, I'll be quite impressed. (Though I'm playing more skeptical than I actually am.)
Or more real world related:
A data replication tool that can move data from a SQL Server database to a Postgres database. It has to support both timestamp-based incremental replication and log-based Change Data Capture, selectable per run. You should be able to configure batch size, hard deletes, timeouts, and behavior on failure. I want a GUI that lets me select tables and their ordering, schedule replication intervals, and choose columns per table. Bonus points if it allows row-filtering conditions or other in-flight transformations.
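To make the timestamp-based incremental mode concrete, here is a minimal sketch of just the core watermark loop, assuming hypothetical table and column names (`id`, `payload`, `updated_at`) and using SQLite as a stand-in for both endpoints; a real tool would use SQL Server and Postgres drivers and layer deletes, timeouts, and failure policy on top of this:

```python
import sqlite3

def replicate_incremental(src, dst, table, ts_col, last_ts, batch_size=500):
    """Timestamp-based incremental replication: copy rows whose ts_col is
    newer than last_ts from src to dst in batches, upserting by primary key.
    Returns the new high-water mark to persist for the next run."""
    cur = src.execute(
        f"SELECT id, payload, {ts_col} FROM {table} "
        f"WHERE {ts_col} > ? ORDER BY {ts_col}",
        (last_ts,),
    )
    max_ts = last_ts
    while True:
        rows = cur.fetchmany(batch_size)  # batch size is configurable
        if not rows:
            break
        dst.executemany(
            f"INSERT INTO {table} (id, payload, {ts_col}) VALUES (?, ?, ?) "
            f"ON CONFLICT(id) DO UPDATE SET payload=excluded.payload, "
            f"{ts_col}=excluded.{ts_col}",
            rows,
        )
        max_ts = max(max_ts, max(r[2] for r in rows))
    dst.commit()
    return max_ts
```

Note this mode, unlike log-based CDC, cannot see hard deletes on its own, which is part of why the spec asks for both.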
If it does this latter one, I will believe that most of IT infrastructure employment is over within 18 months.
I do not see how you can interpret us in that manner.
If the problem is deemed too hard by everyone (the person proposing it clearly believes the model can't do it), then what exactly does failure demonstrate? Nobody ever expected it to succeed within the given constraints. You can't evaluate automobiles in terms of their ability to reach Alpha Centauri. You can't adjudicate a debate between a Ferrari fanatic and a Lambo lover based on which car is more effective at deep sea exploration.
It takes disagreement on model capabilities and (expected) outcomes for all of this to be surprising or useful.
As we've clearly stated later, if we agree to the challenge, then we expect that the model can do something (that our counterparty thinks it can't), so the failure of the model goes against us, and will force us to update.
I'll forward the proposals to @strappingfrequent, assuming he doesn't show up in the thread. They seem reasonable enough to me, but I am clearly not the real expert here, and I'll be deferring to his judgment. That might take a little while to organize; I'll edit the outcome into the main post for the sake of clarity.
OK, I’ll try in good faith to explain one final time.
You are asking the would-be contestants to pick a challenge they think the AI is incapable of, but they have to guess within the bounds of what you think it is capable of. Yes, I get why you set it up this way, but it creates an extreme cherry-picking filter, which will naturally limit the amount of “updating” that is going to occur.
There are other ways this “experiment” could be designed to avoid the cherry picking.
Joey Sportsdoer claims to be a great athlete, better than people give him credit for. And one of the ways he’s constantly underestimated is in how “broadly” athletic he is. So he lines up the doubters and says, start naming athletic feats you think I can’t succeed at, and then I’ll choose one I think I can do and do it.
This is not the best way to go about convincing folks of his general athletic prowess.
Of course, neither is attempting feats he knows he can’t accomplish, nor ones everyone agrees he can; but luckily those aren’t the only three ways to design his demonstration.
Well, what are the specific ways you think the experiment could be improved, ones that minimize cherry-picking (without adding an unreasonable amount of extra effort on our part)? Keep in mind we're two dudes in a shed, not Anthropic itself.
What I’m saying is that you are asking users to come up with examples that, by the very definition of their skepticism, they don’t believe it can accomplish.
But regardless, either of my two examples would greatly impress me. For the former (the NES video game), the ability to write 80s console code within the limits of the NES’s performance specs would impress me, but I wouldn’t update on that alone. Specifically, I want to see it plan and execute a full, coherent game AND code it. It doesn’t need to one-shot it, but it shouldn’t take creative input beyond the general concept and considerations.
The second is about writing enterprise-reliable IT infrastructure software, which would make a lot of software companies obsolete immediately.
Duh? What on earth could you expect us to do differently? If the skeptic already believes the model to be capable of the task, why ask for a test?
There is non-zero value in discovering a task that both of us and the skeptic expect a model to achieve, and then witnessing it fail (unexpectedly, at least), but that is clearly not the primary purpose here. Someone else is welcome to try that, once they're no longer swamped with a quadrillion entries. The set of tasks that the skeptics and I both expect models to accomplish is much larger than the set where we disagree.
Hence I think your claim:
Is clearly nonsensical.
I doubt an AI agent's ability to generate a feature-length anything that's coherent. Ask an AI to write a novel and it'll fizzle out around 10,000 words in. I'm convinced that the AI-assisted smut romance novels that are popular recently are mostly driven by a human gooning while proompting the AI for the next chapter. I doubt it can be done fully autonomously, not counting, of course, those outright fake books that are just words on a page.
I would expect that one of the biggest limitations on long-run narrative coherence is time horizon. The doubling time for model time horizons is anywhere from 2 to 7 months.
A typical novel is about 80,000 words, so going from the ~10,000-word fizzle point to a full novel is three doublings in length (6-21 months). To be conservative, I'll assume novel complexity/task time scales with the square of word count, on the basis that each additional word has to mesh with all previous words. That would give six doublings, or 12-42 months.
I suspect this is an overestimate, because complexity probably increases until the climax and then begins to drop off.
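For concreteness, the estimate above can be sketched as a quick calculation; the word counts, the 2-7 month doubling range, and the quadratic scaling are the assumptions stated in the comment, not established facts:

```python
import math

# Assumptions from the estimate above:
fizzle_words = 10_000     # where models currently lose coherence
novel_words = 80_000      # typical novel length
doubling_months = (2, 7)  # assumed range for time-horizon doubling time

# 10k -> 80k words is three doublings in length.
length_doublings = math.log2(novel_words / fizzle_words)

# If task time scales linearly with word count: one time-doubling per length-doubling.
linear_months = tuple(m * length_doublings for m in doubling_months)

# If task time scales with word count squared, the doublings in time double.
quadratic_months = tuple(m * 2 * length_doublings for m in doubling_months)

print(linear_months, quadratic_months)  # (6.0, 21.0) (12.0, 42.0)
```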
To be fair to the AI, I've fizzled out on a dozen or so stories after writing about 10k words.
I think there might be a hump at that point, where a story idea turns into a story, and I'm not sure it's easy for most people to get past.
This is very unlikely to be accepted:
Too subjective to be useful, and far too ambiguous. Who's doing the grading here? How are they assessing "coherence"? Are we blinding things? If not, how do we account for bias?
We strongly prefer actual programming tasks, not creative writing. We could easily ask Claude to write a novel, and it would do it, but then we're back at the issue of grading it properly.
If you want to propose something like this, you need to be as rigorous as @faul_sname up in the thread. At the very least, propose evaluators that aren't you or the two of us, and we can see if it's possible to make this work.
This wasn't meant as a suggestion, just an observation. My suggestion is below.