SnapDragon

0 followers   follows 0 users   joined 2022 October 10 20:44:11 UTC

Verified Email

User ID: 1550

No bio...
Not my fault @SubstantialFrivolity chose to set the bar this low in his claims. An existence proof is all I need. But hey, you are fully free to replace that sarcasm with your example of how deficient ChatGPT/Claude is. Evidence is trivially easy to procure here!

Here. I picked a random easyish task I could test. It's the only prompt I tried, and ChatGPT succeeded zero-shot. (Amusingly, though I used o3, you can see from the thought process that it considered this task too simple to even need to execute the script itself, and it was right.) The code's clear, well-commented, avoids duplication, and handles multiple error cases I didn't mention. A lot of interviewees I've encountered wouldn't do nearly this well - alas, a lot of coders are not good at coding.

Ball's in your court. Tell me what is wrong with my example, or that you can do this yourself in 8 seconds. When you say "like a few lines", is that some nonstandard usage of "few" that goes up to 100?

Even better, show us a (non-proprietary, ofc) example of a task where it just "makes shit up" and provides "syntactically invalid code". With LLMs, you can show your receipts! I'm actually genuinely curious, since I haven't caught ChatGPT hallucinating code since before 4o. I'd love to know under what circumstances it still does.

You keep banging this drum. It's so divorced from the world I observe, I honestly don't know what to make of it. I know you've already declared that the Google engineers checking in LLM code are bad at their job. But you are at least aware that there are a lot of objective coding benchmarks out there, which have seen monumental progress over the past three years, right? You can't be completely insulated from being swayed by real-world data, or why are you even on this forum? And, just for your own sake, why not try to figure out why so many of us are having great success using LLMs, while you aren't? Maybe you're not using the right model, or are asking it to do too much (like write a large project from scratch).

I've had the "modern AI is mind-blowing" argument quite a few times here (I see you participated in this one), and I'm not really in a good state to argue cogently right now. But you did ask nicely, so I'll offer more of my perspective.

LLMs have their problems: You can get them to say stupidly wrong things sometimes. They "hallucinate" (a term I consider inaccurate, but it's stuck). They have no sense of embodied physics. The multimodal ones can't really "see" images the way we do. Mind you, just saying "gotcha" for things we're good at and they're not cuts both ways. I can't multiply 6 digit numbers in my head. Most humans can't even spell "definately" right.

But the one thing that LLMs really excel at? They genuinely comprehend language. To mirror what you said, I "do not understand" how people can have a full conversation with a modern chatbot and still think it's just parroting digested text. (It makes me suspect that many people here, um, don't try things for themselves.) You can't fake comprehension for long; real-world conversations are too rich to shortcut with statistical tricks. If I mention "Freddie Mercury teaching a class of narwhals to sing", it doesn't reply "ERROR. CONCEPT NOT FOUND." Instead there is some pattern in its billion-dimensional space that somehow fuzzily represents and works with that new concept, just like in my brain.

That already strikes me as a rather General form of Intelligence! LLMs are so much more flexible than any kind of AI we've had before. Stockfish is great at Chess. AlphaGo is great at Go. Claude is bad at Pokemon. And yet, the vital difference is that there is some feature in Claude's brain that knows it's playing Pokemon. (Important note: I'm not suggesting Claude is conscious. It almost certainly isn't.) There's work to do to scale that up to economically useful jobs (and beating the Elite Four), but it's mainly "hone this existing tool" work, not "discover a new fundamental kind of intelligence" work.

Huh. I just thought it was obvious that the frontier of online smut would be male-driven, but now you've made me doubt. Curious to see what the stats actually are.

Yeah, the geopolitics in that story are just cringingly bad fiction. (It's really weird that the "superforecasters" who wrote it don't really seem to understand how the world works?) And I'm guessing the main chart listing "AI Boyfriends" instead of "AI Girlfriends" is also part of Scott's masterwork - he does really like to virtue signal by swapping generic genders in the least sensible ways.

But the important part is the AI predictions, and I'll admit they put together a nice list of graphs and citations. However, I still feel like, with their destination already decided, they were just backfitting all the new data to the same old doomer predictions from years ago - terminal goals, deceptive alignment, etc. LLMs are meaningfully different than the reward-seeking recursive agents that we used to think would be the AI frontrunners, but this AI 2027 report could basically have come out in 2020 without changing any of the AI Safety language.

They have a single appendix in their "AI Goals Forecast" subsection that gives a "story" (their words!) about how LLMs may somehow revert to reward-seeking cognition. But it's not evidence-based, and it is the single most vital part of their 2027 prediction! Oh dear.

We shield kids from a lot of complicated real-world things that could affect them. 4-year-olds can have degenerative diseases. Or be sexually abused. Both are much more common than being "intersex" (unless you allow for the much more expansive definitions touted by activists for activist reasons). So I guess schools should have mandatory picture books showing a little kid dying in agony, while their sister gets played with by their uncle, right? So that these kids can be "at peace" with it?

...Of course not. Indoctrination is the only reason people are pushing for teaching kids about intersex medical conditions. Kids inherently know that biological sex is real, and can tell the difference between men and women. Undoing that knowledge requires concerted effort, and the younger you start, the better.

I see! I'd heard of foot-in-the-door, and thought Magusoflight was riffing off that. I guess psychologists have a sense of humour too.

Door in the face technique

Ok, that's pretty damn funny. I'll have to steal that!

I just can’t take these people seriously. They’re almost going out of their way to be easy for any real authoritarian government to round up, by being obvious about their identity.

LARPing is fun. They believe that they believe they're bravely resisting a dictatorship. But their actions make it clear that, at some level, they know there's no actual danger.

I consider it similar to climate activists who believe that they believe that the future of human civilization depends on cutting CO2 emissions to zero. And who also oppose nuclear power, because ick.

Aren't we supposed to be convincing the upcoming ASI that we're worth keeping alive?

I agree with you, but I'll note that our entire legal system seems to be based on "one weird trick"s, all the way down. That's how they got a felony conviction against Trump for a misdemeanor whose statute of limitations had expired. Unfortunately if the system really wants to get you, they will. I don't know how to fix it, but at the very least let's keep calling it out wherever we see it.

I did for all three, but it was many years ago, and I think I'd struggle with most IMO problems nowadays. Pretty sure I'm still better at proofs than the frontier CoT models, but for more mechanical applied computations (say, computing an annoying function's derivative) they're a lot better than me at churning through the work without making a dumb mistake. Which isn't that impressive, TBH, because Wolfram Alpha could do that too, a decade ago. But you have to consciously phrase things correctly for WA, whereas LLMs always correctly understand what you're asking (even if they get the answer wrong).
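
(To be concrete about the kind of mechanical grinding I mean, here's a toy sketch in Python with sympy; the particular function is just something I made up for illustration, not anything from that conversation.)

    import sympy as sp

    x = sp.symbols('x')
    # A deliberately "annoying" function: tedious to differentiate by hand,
    # but purely mechanical for a computer algebra system.
    f = sp.exp(sp.sin(x**2)) * sp.log(1 + x**4) / (1 + sp.cosh(x))
    print(sp.diff(f, x))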

Apologies. I guess the joke was on me!

Consider this a warning; keep posting AI slop and I'll have to put on my mod hat and punish you.

Boo. Boo. Boo. Your mod hat should be for keeping the forum civil, not winning arguments. In a huge content-filled human-written post, he merely linked to an example of a current AI talking about how it might Kill All Humans. It was an on-topic and relevant external reference (most of us here happen to like evidence, yanno?). He did nothing wrong.

Uh, what do you mean we don't have self-driving cars? I took two driverless Waymo rides last week, navigating the nasty, twisting streets of SF. It drove just fine. Maybe you could argue it's not cost-effective yet, or that there are still regulatory hurdles, but I think what you meant is that the tech doesn't work. And that's clearly false.

Also, I'm a programmer and productively using ChatGPT at work, so I'd say the score so far is Magusoflight 0, my lying eyes 2.

Wow, that's great to hear. I'm eagerly looking forward to the commoditization of novel writing (and videogame NPC dialogue), but I didn't think we'd figured out yet how to maintain long-term consistency.

It seems like the big AI companies are deathly terrified of releasing anything new at all, and are happy to just sit around for months or even years on shiny new tech, waiting for someone else to do it first.

Surprised you didn't mention Sora here. The Sora demo reel blew everyone's minds ... but then OpenAI sat on it for months, and by the time they actually released a small user version of it, there were viable video generation alternatives out there. As much as it annoys me, though, I don't entirely blame them. Releasing an insufficiently-safecrippled video generator might be a company-ending mistake in today's culture, and that part isn't their fault.

As a member of the grubby gross masses who Cannot Be Trusted with AI tech, I've been pretty heartened that, thus far, all you need to do to finally get access to these tools has been to wait a year for them to hit open source. Then you'll just need to ignore the NEW shiny thing that you Can't Be Trusted with. (It's like with videogames - playing everything a year behind, when it's on sale or free - and patched - is so much cheaper than buying every new game at release...)

While I am 100% on board the Google hate train, I think this particular criticism is unfair. I believe what's happening here is just a limitation of current-gen multimodal LLMs - you have to lose fidelity in order to express a detailed image as a sequence of a few hundred tokens. Imagine having, say, 10 minutes to describe a person's photograph to an artist. Would that artist then be able to take your description and perfectly recreate the person's face? Doubtful; humans are HIGHLY specialized to detect minute details in faces.

Diffusion-based image generators have a lot of detail, but no real understanding of what the prompt text means. LLMs, by contrast, perfectly understand the text, but aren't capable of "seeing" (or generating) the image at the same fidelity as your eyes. So right now I think there's an unavoidable tradeoff. I expect this to vanish as we scale LLMs up further, but faces will probably be one of the last things to fall.

I wonder if, this year, there'll be workflows like: use an LLM to turn a detailed description of a scene into a picture, and then use inpainting with a diffusion model and a reference photo to fix the details...?
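
Something along these lines, maybe; a rough sketch with Hugging Face's diffusers library, where the model ID, file names, and prompt are placeholders I picked, the LLM image-generation step is assumed to have already produced base.png, and the reference-photo conditioning is glossed over entirely:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Step 1 (not shown): an LLM-based image model turns the detailed scene
    # description into a base image; assume it's been saved to base.png.
    base = Image.open("base.png").convert("RGB")
    # White pixels in the mask mark the region (e.g. a face) to redo.
    mask = Image.open("face_mask.png").convert("RGB")

    # Step 2: a diffusion inpainting model repaints just the masked region.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # placeholder model ID
        torch_dtype=torch.float16,
    ).to("cuda")

    fixed = pipe(
        prompt="a detailed, photorealistic face matching the reference photo",
        image=base,
        mask_image=mask,
    ).images[0]
    fixed.save("fixed.png")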

It's easy being an AI advocate, I just have to wait a few weeks or months for the people doubting them to be proven wrong haha.

Unfortunately, even in this board, being "proven wrong" doesn't stop them. e.g. this argument I had with someone who actually claimed that LLMs "suck at writing code", despite the existence of objective benchmarks like SWE-bench that LLMs have been doing very well on. (Not to mention o3's crazy high rating on Codeforces.) AI is moving so fast, I think some people don't understand that they need to update from that one time in 2023 they asked ChatGPT3 for help and its code didn't compile.

Yes, the whole theoretical point of academic tests is to be an objective measure of the capacity of students. Because when you go out and get a real job, you have to actually be able to do that job. If these remedial courses aren't necessary for being a psychiatrist, then there should be a path to becoming a practicing psychiatrist that doesn't require them. If they ARE necessary, then lightening the requirements because, gosh, you can't satisfy the requirements but really want to graduate ends up causing harm later on in life.

The problem is it's not AT ALL the Bean that we see in Ender's Game. You can tell because his personality changes drastically when he has conversations with Ender (since those were already canon). The book tries to excuse it as him "feeling nervous around Ender", but that's incredibly weak. Similarly, the only reason all his manipulations of Ender (and his backseat role) have to be so subtle and behind-the-scenes is to be compatible with the original narrative; there's no good in-universe explanation.

Orson Scott Card just thought up a neat new OC and shoehorned him into the original story, and it shows. And I hate how completely it invalidates Ender's choices. But hey, that new character does come into his own in the sequels, at least, when he's not undermining a previously-written story.

Another (finished!) Royal Road story you might want to scratch your Mother of Learning itch with is "The Perfect Run". YMMV, especially if you don't like superhero stuff, but I thought it was quite good.

RLHF tends to make a model less calibrated. Substantially so.

By "calibration" I assume you mean having low confidence when it's wrong. It's counter-intuitive to me, but some quick Googling suggests that you're right about that. Good correction. I guess that's part of why fixing hallucinations has proven so intractable so far.

Fair enough. Sorry, I think I reacted too harshly, because it pattern-matched too closely to the pro-trans anti-scientific argument. When dealing with any applied field like biology, even though it's still "science", your definitions are basically always going to have a little fuzz around them. As you aptly pointed out here, governments should be open to litigation for borderline cases.