SnapDragon

0 followers   follows 0 users   joined 2022 October 10 20:44:11 UTC
User ID: 1550
Verified Email

No bio...

Consider this a warning; keep posting AI slop and I'll have to put on my mod hat and punish you.

Boo. Boo. Boo. Your mod hat should be for keeping the forum civil, not winning arguments. In a huge content-filled human-written post, he merely linked to an example of a current AI talking about how it might Kill All Humans. It was an on-topic and relevant external reference (most of us here happen to like evidence, yanno?). He did nothing wrong.

As far as I can tell, the vast vast VAST majority of it is slop full of repurposed music and lyrics that get by on being offensive rather than clever. Rap artists aren't known for being intelligent, after all. I suspect most "celebrated" rap music would fail a double-blind test against some rando writing parodic lyrics and banging on an audio synthesizer for a few hours. Much like postmodern art, where the janitor can't tell it apart from trash.

There probably are some examples of the genre that I could learn to appreciate (Epic Rap Battles of History comes to mind), but it's hard to find them because of the pomo effect.

What the hell? You most definitely did NOT give any evidence then. Nor in our first argument. I'm not asking so I can nitpick. I would genuinely like to see a somewhat-compact example of a modern LLM failing at code in a way that we both, as programmers, can agree "sucks".

Right, and I asked you for evidence last time too. Is that an unreasonable request? This isn't some ephemeral value judgement we're debating; your factual claims are in direct contradiction to my experience.

Here. I picked a random easyish task I could test. It's the only prompt I tried, and ChatGPT succeeded zero-shot. (Amusingly, though I used o3, you can see from the thought process that it considered this task too simple to even need to execute the script itself, and it was right.) The code's clear, well-commented, avoids duplication, and handles multiple error cases I didn't mention. A lot of interviewees I've encountered wouldn't do nearly this well - alas, a lot of coders are not good at coding.
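(The actual prompt and output are at the link; for anyone who won't click through, here's a rough stand-in I wrote myself - a hypothetical task of about the same difficulty, in roughly the style the model produced, with comments and error handling for cases the prompt never asked about. It is not the linked output, just a sketch of the kind of thing I mean.)

```python
# Hypothetical stand-in for the linked task: "read a CSV of name,score pairs
# and print the average score per name."
import csv
import sys
from collections import defaultdict

def average_scores(path):
    """Return a dict of name -> average score, skipping malformed rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    try:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if len(row) != 2:
                    continue  # ignore malformed rows rather than crashing
                name, score = row
                try:
                    totals[name] += float(score)
                    counts[name] += 1
                except ValueError:
                    continue  # non-numeric score; skip it
    except FileNotFoundError:
        sys.exit(f"error: no such file: {path}")
    return {name: totals[name] / counts[name] for name in totals}

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: average_scores.py <csv-file>")
    for name, avg in sorted(average_scores(sys.argv[1]).items()):
        print(f"{name}: {avg:.2f}")
```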

Ball's in your court. Tell me what is wrong with my example, or that you can do this yourself in 8 seconds. When you say "like a few lines", is that some nonstandard usage of "few" that goes up to 100?

Even better, show us a (non-proprietary, ofc) example of a task where it just "makes shit up" and provides "syntactically invalid code". With LLMs, you can show your receipts! I'm actually genuinely curious, since I haven't caught ChatGPT hallucinating code since before 4o. I'd love to know under what circumstances it still does.

You keep banging this drum. It's so divorced from the world I observe, I honestly don't know what to make of it. I know you've already declared that the Google engineers checking in LLM code are bad at their job. But you are at least aware that there are a lot of objective coding benchmarks out there, which have seen monumental progress since 3 years ago, right? You can't be completely insulated from being swayed by real-world data, or why are you even on this forum? And, just for your own sake, why not try to figure out why so many of us are having great success using LLMs, while you aren't? Maybe you're not using the right model, or are asking it to do too much (like write a large project from scratch).

Door in the face technique

Ok, that's pretty damn funny. I'll have to steal that!

they still suck at writing code

Hoo boy. Speaking as a programmer who uses LLMs regularly to help with his work, you're very, VERY wrong about that. Maybe you should go tell Google that the 20% of their new code that is written by AI is all garbage. The code modern LLMs generate is typically well-commented, well-reasoned, and well-tested, because LLMs don't take the same lazy shortcuts that humans do. It's not perfect, of course, and not quite as elegant as an experienced programmer can manage, but that's not the standard we're measuring by. You should see the code that "junior engineers" often get away with...

Please post an example of what you claim is a "routine" failure by a modern model (2.5 Pro, o3, Claude 3.7 Sonnet). This should be easy! I want to understand how you could possibly know how to program and still believe what you're writing (unless you're just a troll, sigh).

Not my fault @SubstantialFrivolity chose to set the bar this low in his claims. An existence proof is all I need. But hey, you are fully free to replace that sarcasm with your example of how deficient ChatGPT/Claude is. Evidence is trivially easy to procure here!

Thanks, it's clear that (unlike the previous poster, who seems stuck in 2023) you have actual experience. I agree with most of this. I think there are people working on giving LLMs some sort of short-term memory for abstract thought, and also on making them more agentic so they can work on a long-form task without going off the rails. But the tools I have access to definitely aren't there yet.

So, yeah, I admit it's a bit of an exaggeration to say that you can swap a junior employee's role out with an LLM. o3 (or Claude-3.5 Sonnet, which I haven't tried, but which does quite well on the objective SWE-bench metric) is almost certainly better at writing small bits of good working code - people just don't understand how horrifically bad most humans are at programming, even CS graduates - but is lacking the introspection of a human to prevent it from doing dangerously stupid things sometimes. And neither is going to be able to manage a decently-sized project on their own.

Yeah, it's a good question that I can't answer. I suspect if all humans somehow held to a (not perfect but decent) standard of not driving impaired or distracted, signaling properly, and driving the speed limit or even slower in dangerous conditions ... that would probably decrease accidents by at least 80% too. So maybe self-driving cars are still worse than that.

Wow, that's great to hear. I'm eagerly looking forward to the commoditization of novel writing (and videogame NPC dialogue), but I didn't think we'd figured out yet how to maintain long-term consistency.

Another (finished!) Royal Road story you might want to scratch your Mother of Learning itch with is "The Perfect Run". YMMV, especially if you don't like superhero stuff, but I thought it was quite good.

Fair enough. Sorry, I think I reacted too harshly, because it pattern-matched too closely to the pro-trans anti-scientific argument. When dealing with any field of applied biology, even though it's still "science", your definitions are basically always going to have a little fuzz around them. As you aptly pointed out here, governments should be open to litigation for borderline cases.

Uh, what do you mean we don't have self-driving cars? I took two driverless Waymo rides last week, navigating the nasty, twisting streets of SF. It drove just fine. Maybe you could argue it's not cost-effective yet, or that there are still regulatory hurdles, but I think what you meant is that the tech doesn't work. And that's clearly false.

Also, I'm a programmer and am productively using ChatGPT at work, so I'd say the score so far is Magusoflight 0, my lying eyes 2.

Fantastic post, thanks! Lots of stuff in there that I can agree with, though I'm a lot more optimistic than you. Those 3 questions are well stated and help to clarify points of disagreement, but (as always) reality probably doesn't factor so cleanly.

I really think almost all the meat lies in Question 1. You're joking a little with the "line goes to infinity" argument, and I think almost everyone reasonable agrees that near-future AI will plateau somehow, but there's a world of difference in where it plateaus. If it goes to ASI (say, 10x smarter than a human or better), then fine, we can argue about questions 2 and 3 (though I know this is where doomers love spending their time). Admittedly, it IS kind of wild that this is a tech where we can seriously talk about singularity and extinction as potential outcomes with actual percentage probabilities. That certainly didn't happen with the cotton gin.

There's just so much space between "as important as the smartphone" -> "as important as the internet" (which I am pretty convinced is the baseline, given current AI capabilities) -> "as important as the industrial revolution" -> "transcending physical needs". I think there's a real motte/bailey in effect, where skeptics will say "current AIs suck and will never get good enough to replace even 10% of human intellectual labour" (bailey), but when challenged with data and benchmarks, will retreat to "AIs becoming gods is sci-fi nonsense" (motte). And I think you're mixing the two somewhat, talking about AIs just becoming Very Good in the same paragraph as superintelligences consuming galaxies.

I'm not even certain assigning percentages to predictions like this really makes much sense, but just based on my interactions with LLMs, my good understanding of the tech behind them, and my experience using them at work, here are my thoughts on what the world looks like in 2030:

  • 2%: LLMs really turn out to be overhyped, attempts at getting useful work out of them have sputtered out, I have egg all over my face.
  • 18%: ChatGPT o3 turns out to be roughly at the plateau of LLM intelligence. Open-Source has caught up, the models are all 1000x cheaper to use due to chip improvements, but hallucinations and lack of common sense are still a fundamental flaw in how the LLM algorithms work. LLMs are the next Google - humans can't imagine doing stuff without a better-than-Star-Trek encyclopedic assistant available to them at all times.
  • 30%: LLMs plateau at roughly human-level reasoning and superhuman knowledge. A huge amount of work at companies is being done by LLMs (or whatever their descendant is called), but humans remain employed. The work the humans do is even more bullshit than the current status quo, but society is still structured around humans "pretending to work" and is slow to change. This is the result of "Nothing Ever Happens" colliding with a transformative technology. It really sucks for people who don't get the useless college credentials to get in the door to the useless jobs, though.
  • 40%: LLMs are just better than humans. We're in the middle of a massive realignment of almost all industries; most companies have catastrophically downsized their white-collar jobs, and embodied robots/self-driving cars are doing a great deal of blue-collar work too. A historically unprecedented number of humans are unemployable, economically useless. UBI is the biggest political issue in the world. But at least entertainment will be insanely abundant, with Hollywood-level movies and AAA-level videogames being as easy to make as Royal Road novels are now.
  • 9.5%: AI recursively accelerates AI research without hitting engineering bottlenecks (a la "AI 2027"), ASI is the new reality for us. The singularity is either here or visibly coming. Might be utopian, might be dystopian, but it's inescapable.
  • 0.5%: Yudkowsky turns out to be right (mostly by accident, because LLMs resemble the AI in his writings about as closely as they resemble Asimov's robots). We're all dead.

It's easy being an AI advocate, I just have to wait a few weeks or months for the people doubting them to be proven wrong haha.

Unfortunately, even on this board, being "proven wrong" doesn't stop them. e.g. this argument I had with someone who actually claimed that LLMs "suck at writing code", despite the existence of objective benchmarks like SWE-bench that LLMs have been doing very well on. (Not to mention o3's crazy high rating on Codeforces.) AI is moving so fast, I think some people don't understand that they need to update from that one time in 2023 they asked ChatGPT3 for help and its code didn't compile.

I see! I'd heard of foot-in-the-door, and thought Magusoflight was riffing off that. I guess psychologists have a sense of humour too.

Remember, in the game of chess you can never let your adversary see your pieces.

FWIW, I appreciate this reply, and I'm sorry for persistently dogpiling you. We disagree (and I wrongly thought you weren't arguing in good faith), but I definitely could have done a better job of keeping it friendly. Thank you for your perspective.

Most frustratingly, the things that I actually need help on, the ones where I don't know really anything about the topic and a workable AI assistant would actually save me a ton of time, are precisely the cases where it fails hard (as in my examples where stuff doesn't even work at all).

That does sound like a real Catch-22. My queries are typically in C++/Rust/Python, which the models know backwards, forwards, and sideways. I can believe that there's still a real limit to how much an LLM can "learn" a new language/schema/API just by dumping docs into the prompt. (And I don't know anything about OpenAI's custom models, but I suspect they're just manipulating the prompt, not using RL.) And when an LLM doesn't know how to do something, there's a risk it will fake it (hallucinate). We're agreed there.
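(To be concrete about what I mean by "dumping docs into the prompt": my guess - and it's only a guess, not how OpenAI actually builds custom models - is that it amounts to something no fancier than this sketch, where the docs just get prepended as context. The model name, doc file, and endpoint are placeholders I made up.)

```python
# Sketch of the "dump the docs into the prompt" approach - my assumption about
# what a custom model does under the hood, not OpenAI's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("internal_api_docs.md") as f:  # placeholder: whatever docs you have
    docs = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are a coding assistant. Answer using ONLY the API "
                    "documented below; say so if the docs don't cover it.\n\n" + docs},
        {"role": "user",
         "content": "Write a function that paginates through the /widgets endpoint."},
    ],
)
print(response.choices[0].message.content)
```

If that's all that's happening, it explains the failure mode: when the answer isn't really in the pasted docs, the model has nothing to fall back on except faking it.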

Maybe using the best models would help. Or maybe, given the speed things are improving, just try again next year. :)

Well, based on what I know of the Canadian indigenous peoples (who the current PC treadmill calls the "First Nations"), there's a lot of crime, misery, and unrest as a result. But hey, people addicted to videogames are less destructive than people addicted to alcohol, so we'll see.

(Also, I really don't expect to see decent neuralink tech by 2030. It's just too damn hard.)

Huh. I just thought it was obvious that the frontier of online smut would be male-driven, but now you've made me doubt. Curious to see what the stats actually are.

RLHF tends to make a model less calibrated. Substantially so.

By "calibration" I assume you mean having low confidence when it's wrong. It's counter-intuitive to me, but some quick Googling suggests that you're right about that. Good correction. I guess that's part of why fixing hallucinations has proven so intractable so far.

Haven't read BoC, but a LitRPG parody I quite enjoyed was "This Quest is Bullshit!".