Here. I picked a random easyish task I could test. It's the only prompt I tried, and ChatGPT succeeded zero-shot. (Amusingly, though I used o3, you can see from the thought process that it considered this task too simple to even need to execute the script itself, and it was right.) The code's clear, well-commented, avoids duplication, and handles multiple error cases I didn't mention. A lot of interviewees I've encountered wouldn't do nearly this well - alas, a lot of coders are not good at coding.
Ball's in your court. Tell me what is wrong with my example, or that you can do this yourself in 8 seconds. When you say "like a few lines", is that some nonstandard usage of "few" that goes up to 100?
Even better, show us a (non-proprietary, ofc) example of a task where it just "makes shit up" and provides "syntactically invalid code". With LLMs, you can show your receipts! I'm actually genuinely curious, since I haven't caught ChatGPT hallucinating code since before 4o. I'd love to know under what circumstances it still does.
The thing is, the people producing the novel can use the detectors too, and iterate until the signal goes away. I have a friend who is taking some college courses that require essays, and they're explicitly told that Grammarly must not flag the essay as AI-written. Unfortunately (and somewhat amusingly), the detector sucks, and her normal writing is flagged as AI-written all the time - she has to rewrite it in a more awkward manner to get the metric below the threshold. Similarly, I imagine any given GPT detector could be defeated by just hooking it up to the GPT in a negative-feedback loop.
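To make that "negative-feedback loop" concrete, here's a minimal sketch of what I mean; generate() and detect_ai_score() are hypothetical stand-ins for whatever LLM and detector you'd actually wire together, so this is a shape-of-the-idea sketch rather than working tooling.

```python
# Minimal sketch of the detector-defeating loop. Both helper functions are
# hypothetical placeholders for a real LLM call and a real AI detector.

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. via some API client)."""
    raise NotImplementedError

def detect_ai_score(text: str) -> float:
    """Placeholder for a detector returning P(text is AI-written)."""
    raise NotImplementedError

def launder_text(task: str, threshold: float = 0.5, max_rounds: int = 10) -> str:
    text = generate(task)
    for _ in range(max_rounds):
        score = detect_ai_score(text)
        if score < threshold:
            return text  # detector no longer flags it
        # feed the detector's verdict back in and ask for a rewrite
        text = generate(
            f"Rewrite the following so it reads less like AI output "
            f"(current detector score {score:.2f}), keeping the meaning:\n\n{text}"
        )
    return text  # give up after max_rounds; still flagged
```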
Definitely an important point. I agree that there is a real possibility of societal breakdown under those kinds of conditions. Hopefully, even if UBI efforts never go anywhere, people will still somehow scrounge up enough to eat and immerse themselves in videogames. (We're kind of halfway there today, to be honest, judging from most of my friends.) Somehow society and the economy survived the insane COVID shutdowns (surprising me). I have hope they'll be resilient enough to survive this too. But there's no historical precedent we can point to...
Eh, I guess I'm incoherent then. I generally do use people's preferred pronouns in person; it's polite, and not every moment of your life needs to be spent fighting political battles. Caitlyn Jenner's put a lot of effort into living as a woman, and isn't a bad actor, and has passed some poorly-defined tipping point where I'm ok with calling her a her. I just don't want it to be mandatory. I want it to be ok to disagree on who's a "valid" trans person. I absolutely don't want Stalinist revision of history/Wikipedia to pretend that Bruce Jenner never existed. And in the appropriate discussions I want to be free to point out that it's all just paying lip service to a fantasy. "XXX isn't a real woman" is a true statement that I should be allowed to say; but I generally wouldn't, any more than I'd point out that "YYY is ugly".
Great post. But I'm pessimistic; Scott's posted about how EA is positively addicted to criticizing itself, but the trans movement is definitely not like that. You Shall Not Question the orthodox heterodoxy. People like Ziz may look ridiculous and act mentally deluded (dangerously so, in retrospect), but it wouldn't be "kind" to point that out!
When I go to rationalist meetups, I actually think declaring myself to be a creationist would be met more warmly than declaring that biology is real and there's no such thing as a "female brain in a male body". (Hell, I bet people would be enthused at getting to argue with a creationist.) Because of this, I have no way to know whether 10% or 90% of the people around me are reasonable and won't declare me an Enemy of the People for saying unfashionable true things. If it really is 90% ... well, maybe there's hope. We'd just need a phase change where it becomes common knowledge that most people are anti-communist gender-realist.
It seems like the big AI companies are deathly terrified of releasing anything new at all, and are happy to just sit around for months or even years on shiny new tech, waiting for someone else to do it first.
Surprised you didn't mention Sora here. The Sora demo reel blew everyone's minds ... but then OpenAI sat on it for months, and by the time they actually released a small user version of it, there were viable video generation alternatives out there. As much as it annoys me, though, I don't entirely blame them. Releasing an insufficiently-safecrippled video generator might be a company-ending mistake in today's culture, and that part isn't their fault.
As a member of the grubby gross masses who Cannot Be Trusted with AI tech, I've been pretty heartened that, thus far, all you need to do to finally get access to these tools has been to wait a year for them to hit open source. Then you'll just need to ignore the NEW shiny thing that you Can't Be Trusted with. (It's like with videogames - playing everything a year behind, when it's on sale or free - and patched - is so much cheaper than buying every new game at release...)
You keep banging this drum. It's so divorced from the world I observe, I honestly don't know what to make of it. I know you've already declared that the Google engineers checking in LLM code are bad at their job. But you are at least aware that there are a lot of objective coding benchmarks out there, which have seen monumental progress since 3 years ago, right? You can't be completely insulated from being swayed by real-world data, or why are you even on this forum? And, just for your own sake, why not try to figure out why so many of us are having great success using LLMs, while you aren't? Maybe you're not using the right model, or are asking it to do too much (like write a large project from scratch).
Aren't we supposed to be convincing the upcoming ASI that we're worth keeping alive?
And interestingly, one of Douglas Adams' last projects was Starship Titanic, which was arguably the first game to have "chatbot" NPCs - in 1998! Obviously they didn't work very well, but the game's ambition was very ahead of its time.
Any particular reason why you're optimistic? What are your priors in regards to AI?
Same as you, I don't pretend to be able to predict an unknowable technological future and am just relying on vibes, so I'm not sure I can satisfy you with a precise answer...? I outlined why I'm so impressed with even current-day AI here. I've been working in AI-adjacent fields for 25 years, and LLMs are truly incomparable in generality to anything that has come before. AGI feels close (depending on your definition), and ASI doesn't, because most of our gains are just coming from scaling compute up (with logarithmic benefits). So it's not an existential threat, it's just a brand new transformative tech, and historically that almost always leads to a richer society.
You don't tend to get negative effects on the tech tree in a 4X game, after all. :)
Yeah, it's a good question that I can't answer. I suspect if all humans somehow held to a (not perfect but decent) standard of not driving impaired or distracted, signaling properly, and driving the speed limit or even slower in dangerous conditions ... that would probably decrease accidents by at least 80% too. So maybe self-driving cars are still worse than that.
I see! I'd heard of foot-in-the-door, and thought Magusoflight was riffing off that. I guess psychologists have a sense of humour too.
Sounds like you have some practical experience here. Yeah, if just iterating doesn't help and a human has to step in to "fix" the output, then at least there'll be some effort required to bring an AI novel to market. But it does feel like detectors (even the good non-Grammarly ones) are the underdogs fighting a doomed battle.
While I am 100% on board the Google hate train, I think this particular criticism is unfair. I believe what's happening here is just a limitation of current-gen multimodal LLMs - you have to lose fidelity in order to express a detailed image as a sequence of a few hundred tokens. Imagine having, say, 10 minutes to describe a person's photograph to an artist. Would that artist then be able to take your description and perfectly recreate the person's face? Doubtful; humans are HIGHLY specialized to detect minute details in faces.
Diffusion-based image generators have a lot of detail, but no real understanding of what the prompt text means. LLMs, by contrast, perfectly understand the text, but aren't capable of "seeing" (or generating) the image at the same fidelity as your eyes. So right now I think there's an unavoidable tradeoff. I expect this to vanish as we scale LLMs up further, but faces will probably be one of the last things to fall.
I wonder if, this year, there'll be workflows like: use an LLM to turn a detailed description of a scene into a picture, and then use inpainting with a diffusion model and a reference photo to fix the details...?
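Most of that second stage is already expressible with open tooling, for what it's worth. Here's a rough sketch using Hugging Face's diffusers inpainting pipeline; the file paths and checkpoint name are just placeholders, and actually conditioning on a reference photo would need an extra adapter (something like IP-Adapter), which I'm not showing.

```python
# Rough sketch of stage two: take a base image produced by a multimodal LLM,
# then repaint a masked region (e.g. the face) with a diffusion inpainting
# model to restore detail. Paths and the checkpoint are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # any inpainting checkpoint works
    torch_dtype=torch.float16,
).to("cuda")

base = Image.open("llm_generated_scene.png").convert("RGB")  # stage 1 output
mask = Image.open("face_mask.png").convert("RGB")            # white = repaint

fixed = pipe(
    prompt="a photorealistic portrait of the subject, detailed face",
    image=base,
    mask_image=mask,
).images[0]
fixed.save("scene_with_fixed_face.png")
```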
The clueless people who made Last Wish really messed up. They were supposed to make a soulless by-the-numbers sequel to a forgettable spinoff of an overrated series. Instead they made one of the best animated films in years, better than anything Pixar's done since Coco. I sure hope somebody got fired for that.
It's easy being an AI advocate; I just have to wait a few weeks or months for the people doubting them to be proven wrong, haha.
Unfortunately, even in this board, being "proven wrong" doesn't stop them. e.g. this argument I had with someone who actually claimed that LLMs "suck at writing code", despite the existence of objective benchmarks like SWE-bench that LLMs have been doing very well on. (Not to mention o3's crazy high rating on Codeforces.) AI is moving so fast, I think some people don't understand that they need to update from that one time in 2023 they asked ChatGPT3 for help and its code didn't compile.
The problem is it's not AT ALL the Bean that we see in Ender's Game. You can tell because his personality changes drastically when he has conversations with Ender (since those were already canon). The book tries to excuse it as him "feeling nervous around Ender", but that's incredibly weak. Similarly, the only reason all his manipulations of Ender (and his backseat role) have to be so subtle and behind-the-scenes is to be compatible with the original narrative; there's no good in-universe explanation.
Orson Scott Card just thought up a neat new OC and shoehorned him into the original story, and it shows. And I hate how completely it invalidates Ender's choices. But hey, that new character does come into his own in the sequels, at least, when he's not undermining a previously-written story.
Well, based on what I know of the Canadian indigenous peoples (who the current PC treadmill calls the "First Nations"), there's a lot of crime, misery, and unrest as a result. But hey, people addicted to videogames are less destructive than people addicted to alcohol, so we'll see.
(Also, I really don't expect to see decent neuralink tech by 2030. It's just too damn hard.)
Not my fault @SubstantialFrivolity chose to set the bar this low in his claims. An existence proof is all I need. But hey, you are fully free to replace that sarcasm with your example of how deficient ChatGPT/Claude is. Evidence is trivially easy to procure here!
Consider this a warning; keep posting AI slop and I'll have to put on my mod hat and punish you.
Boo. Boo. Boo. Your mod hat should be for keeping the forum civil, not winning arguments. In a huge content-filled human-written post, he merely linked to an example of a current AI talking about how it might Kill All Humans. It was an on-topic and relevant external reference (most of us here happen to like evidence, yanno?). He did nothing wrong.
they still suck at writing code
Hoo boy. Speaking as a programmer who uses LLMs regularly to help with his work, you're very, VERY wrong about that. Maybe you should go tell Google that the 20% of their new code that is written by AI is all garbage. The code modern LLMs generate is typically well-commented, well-reasoned, and well-tested, because LLMs don't take the same lazy shortcuts that humans do. It's not perfect, of course, and not quite as elegant as an experienced programmer can manage, but that's not the standard we're measuring by. You should see the code that "junior engineers" often get away with...
They do not have a cognitive architecture that resembles human neurology. In terms of memory, they have a short-term memory and a longterm one, but the two are entirely separate, without an intermediate outside of the training phase. The closest a human would get is if they had a neurological defect that erased the consolidation of long term memory.
Insofar as any analogy is really going to help us understand how LLMs think, I still think this is a little off. I don't believe their context window really behaves in the same way as "short-term memory" does for us. When I'm thinking about a problem, I can send impressions and abstract concepts swirling around in my mind - whereas an LLM can only output more words for the next pass of the token predictor. If we somehow allowed the context window to consist of full embeddings rather than mere tokens, then I'd believe there was more of a short-term thought process going on.
I've heard LLM thinking described as "reflex", and that seems very accurate to me, since there's no intent and only a few brief layers of abstract thought (ie, embedding transformations) behind the words it produces. Because it's a simulated brain, we can read its thoughts and, quantum-magically, pick the word that it would be least surprised to see next (just like smurf how your brain kind of needle scratches at the word "smurf" there). What's unexpected, of course - what totally threw me for a loop back when GPT3 and then ChatGPT shocked us all - is that this "reflex" performs so much better than what we humans could manage with a similar handicap.
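If you want to see the "reflex" for yourself, this is roughly what reading the model's surprise looks like with an open model (GPT-2 here purely as an example; any causal LM from Hugging Face would do):

```python
# Minimal sketch of "reading the reflex": inspect the probability the model
# assigns to each possible next token, i.e. how surprised it would be by it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The quick brown fox jumps over the lazy"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

# The word it would be "least surprised" to see next:
print(tok.decode(probs.argmax().item()))    # very likely " dog"
```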
The real belief I've updated over the last couple of years is that language is easier than we thought, and we're not particularly good at it. It's too new for humans to really have evolved our brains for it; maybe it just happened that a brain that hunts really really well is also pretty good at picking up language as a side hobby. For decades we thought an AI passing the Turing test, and then understanding the world well enough to participate in human civilization, would require a similar level of complexity to our brain. In reality, it actually seems to require many orders of magnitude less. (And I strongly suspect that running the LLM next-token prediction algorithm is not a very efficient way to create a neural net that can communicate with us - it's just the only way we've discovered so far.)
Huh. I just thought it was obvious that the frontier of online smut would be male-driven, but now you've made me doubt. Curious to see what the stats actually are.
Door in the face technique
Ok, that's pretty damn funny. I'll have to steal that!
Sorry, this is just tired philosobabble, which I have no patience for. All the biological ways to define man and woman agree in >99% of cases, and agree with what humans instinctively know, too. If you want to pretend that obvious things aren't obvious for the sake of your political goals, I'm not going to play along. That's anti-intelligence.