This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Fair enough. Thanks for clarifying.
Do you have any thoughts you'd be willing to share on what I wrote concerning the amount of knowledge-work input currently required to do things like the task I was thinking about? I suppose I wasn't entirely clear, but I think it would likely fail to do the analysis task on its own. For clarity, this is a task that I thought, "It might be weird enough that no one's done it yet, but it's close enough to the standard stuff that I could almost certainly give it to a student who did well enough in their flight mechanics course, and they could almost certainly just do it." That seems to have been partly justified in that I found a publication in which a student did just do it (and skimming the paper, the analysis seems about on par with what I had expected; I guess my flaw was thinking the idea was sufficiently 'weird', and it says something about the state of aerospace that someone out there has done almost every basic variant, regardless of whether it makes sense to do). I'm probably <50% on whether it would make the "right" engineering implementation choices on its own. I don't have a precise number. I think it might get lucky, because there's a pretty large set of choices available, and I hadn't yet tailored the problem so that it requires really thinking conceptually about what's going on and picking from only a small subset; there's a good enough chance that it could guess somewhat randomly or pick a popular choice that happens to work (though I'm not sure it would put the right context around it even if it did).
Perhaps, given your comment below, this is just something that you mostly don't care about. Does this sort of thing just bucket into, "No, it can't do this sort of knowledge work now, but with sufficient recursive self-improvement, it will be able to do it later"? (I guess, in line with your stated AGI timelines?)
I am really the wrong person to ask. I don't regularly use LLMs for programming; when I do, it's usually for didactic purposes or small bespoke utilities.
The most ambitious project I tried was a mod for Rimworld, which didn't work. To be fair to the models, I was asking for something very niche, and I was using the chat interface rather than an IDE. I ended up borrowing open-source code and editing it, and just using AI image generation for art assets (which worked very well, to the point it pissed off the more puritan modders in the Discord). I can mention that the issues I ran into were the models being unfamiliar with the code for the mod I intended to support (Combat Extended, a massive overhaul of core systems), and that what knowledge they had innately was outdated. I was too unfamiliar with Rimworld modding to be confident that editing their efforts was worth my time. Other people have succeeded in writing bigger mods that work well (as far as I can tell) using AI, so there's definitely an element of skill issue on my part.
SF might have actually useful observations, but he's a lurker to the core, and I'm the forward-facing entity for the moment. He says he's generally busy with work right now, so I wouldn't wait on him to respond, though I'd be happy if he did.
If you insist:
I don't know if it can do this kind of knowledge work, but I do expect that it will be able to in short order. I make no firm commitments on whether this will be the direct consequence of RSI (since labs are opaque about methodology), or a simple consequence of further scaling and increasingly intensive RLVR.
(Why not both?)
Either way, I think it's more likely than not that the kind of problem you describe will be trivial within a year or two. My impression is that the models can just about do what you want them to do, but with significant frustration and wasted time on your part. That is already a very strong starting point; can you imagine asking GPT-4 to even attempt any of this and getting working results?
Thanks again for the kind and thorough response.
I would quibble with this. What I want them to do is to be able to help me with analysis that I don't already know how to do. I wrote it this way a couple days ago:
The reason I was thinking about the particular flight mechanics problem for this thread was that I wanted to further drive in the wedge that I think exists between the folks who think that most knowledge work is already automatable and those who think it can be useful if you already know what you're doing. Thus, even for a problem where I'm quite confident that I could do the analysis, I predicted that the LLM would fail on its own without significant knowledge-work-educated input. To me, this means that there are two significant steps the models must overcome before we're thinking about a possible world where basically all knowledge work is automatable.
Maybe as an aside, I'm able to leverage collaborators at multiple levels, from profs to post-docs to PhD students to MS students to undergrads. My experience has been that coming up with the right problem to solve is actually a huge part of the battle. During that process, I'm always considering whether I can spin out sub-problems or related problems that may be useful to consider on the way to what we really want (or that would be sufficient contributions in their own right). When considering them, I mentally bin them into a hierarchy. If it's a problem that I'm near 100% sure I could just sit down and do (perhaps I've already done all of the pieces but never quite that variant, and now the variant seems to be of interest), it's a plausible candidate to go to an undergrad. On the other end, the vaguest, most conceptually-dense questions I may reserve for conversations just with profs. There is sometimes something to be said for not "distracting the students" by letting them spin their wheels on something they're not likely to really contribute to anyway. I have something of a sliding scale for the in-between students/post-docs; I've put words to the basic contours of that scale before, but I won't bother here, because it's not the most important part. There is a possible slight correction factor available if I've been working with a student long enough to know that they're substantially better/worse than the average student in their category.
In any event, perhaps if I had listed out all of the steps of this scale, I'd have even more than two significant steps that models must overcome, but for my purposes in this thread, I was trying to pick a problem that was pretty directly in the realm of, "I could just give this to an undergrad."
Yes, could I bang on an LLM long enough (the amount of effort required depending on the particular problem) that it eventually finds its way to the answer I already knew all along? Yeah, probably. Is this a huge upgrade from GPT-4? Honestly, I don't know; back in those days I gave up rather than ever really trying to beat it into submission.
...but this still is just not really useful, at least not if the goal is to actually automate the knowledge work piece. Sure, it's potentially useful once I've already done all the knowledge work, and I'm sitting down to actually code the thing that I definitely know how to code. But more likely, at this point, it's going to be useful to the student who I've asked to code the thing, because I'm probably not coding it myself, anyway.
I don't really have a good timeline or prediction for if/when some sort of AI system will cross these various thresholds. I'm still hopeful on the straight math side, as I said in my comment a couple of days ago. But if the purpose of this exercise is to find problems that cause someone to update, I was hoping that, "Here's a problem that I'm comfortable I could give to an undergrad and pretty confident the LLM will fail," could pull you at least epsilon away from thinking that quite so much of knowledge work is currently automatable, or make you perhaps epsilon more cautious about believing that it's quite so imminent.