SnapDragon

0 followers   follows 0 users   joined 2022 October 10 20:44:11 UTC

Verified Email

User ID: 1550

No bio...


So, I don't know how pleasing you'll find this answer, but the burden of proof is on the models to show their efficacy. A lot of the things you mentioned were very difficult things to do, but we know they work because we see that they work. You don't have to argue about whether Stockfish's chess model captures Truth with a capital T; you can just play 20 games with it, lose all 20, and see. (And of course plenty of things look difficult and ARE still difficult - we don't have cities on the moon yet!)

So, if we had a climate model that everyone could just rely on because its outputs were detailed and verifiably, reliably true, then sure, "this looks like it's a hard thing to do" wouldn't hold much weight. A property of good models is that it should be trivial for them to distinguish themselves from something making lucky guesses. But as far as I know, we don't have this. Instead, we use models to make 50-year predictions for a single hard-to-measure variable (global mean surface temperature) and then 5 years down the line we observe that we're still mostly within predicted error bars. This is not proof that the model represents anything close to Truth.

Now, I don't follow this too closely any more, and maybe there really is some great model that has many different and detailed outputs, with mean temperature predictions that are fairly accurate for different regions of the Earth and parts of the atmosphere and sea, and that properly predicts changes in cloud cover and albedo and humidity and ocean currents and etc. etc. If somebody had formally published accurate predictions for many of these things (NOT just backfitting to already-known data), then I'd believe we feeble humans actually had a good handle on the beast that is climate science. But I suspect this hasn't happened, since climate activists would be shouting it from the rooftops if it had.

Yeah, Keanu Reeves (John Wick) is 58, Vin Diesel (Fast X) is 55, and Tom Cruise (MI) is 60. These are fun action franchises, but where are the fun action franchises with up-and-comers who are 20-30? I sure hope Ezra Miller isn't representative of the future of Hollywood "stars"...

I very much agree with his assertion in the second article that analysts often try to avoid mentioning (or even thinking about) tradeoffs in political discussions, even though that's almost always how the real world works. Being honest about tradeoffs is a good strategy for correctly comprehending the world, but not for "winning" arguments.

Somewhat related to the civil rights violations of prisoners, I remember the arguments about Guantanamo back in the War on Terror days. It was common to hear politicians and pundits - in full seriousness - make the claim that "torture doesn't work anyway." I hated the fact that, post-9/11, it was politically impossible to say "torture is against our values, so we won't do it even though this makes our anti-terror efforts less effective and costs lives." Despite the fact that (I suspect) most people would agree privately with this statement...

VERY strong disagree. You're so badly wrong on this that I half suspect that when the robots start knocking on your door to take you to the CPU mines, you'll still be arguing "but but but you haven't solved the Riemann Hypothesis yet!" Back in the distant past of, oh, the 2010s, we used to wonder if the insanely hard task of making an AI as smart as "your average Redditor" would be attainable by 2050. So that's definitely not the own you think it is.

We've spent decades talking to trained parrots and thinking that was the best we could hope for, and now we suddenly have programs with genuine, unfakeable human-level understanding of language. I've been using ChatGPT to help me with work, discussing bugs and code with it in plain English just like a fellow programmer. If that's not a "fundamental change", what in the world would qualify? The fact that there are still a few kinds of intellectual task left that it can't do doesn't make it less shocking that we're now in a post-Turing Test world.

Please post an example of what you claim is a "routine" failure by a modern model (2.5 Pro, o3, Claude 3.7 Sonnet). This should be easy! I want to understand how you could possibly know how to program and still believe what you're writing (unless you're just a troll, sigh).

Here. I picked a random easyish task I could test. It's the only prompt I tried, and ChatGPT succeeded zero-shot. (Amusingly, though I used o3, you can see from the thought process that it considered this task too simple to even need to execute the script itself, and it was right.) The code's clear, well-commented, avoids duplication, and handles multiple error cases I didn't mention. A lot of interviewees I've encountered wouldn't do nearly this well - alas, a lot of coders are not good at coding.

Ball's in your court. Tell me what is wrong with my example, or that you can do this yourself in 8 seconds. When you say "like a few lines", is that some nonstandard usage of "few" that goes up to 100?

Even better, show us a (non-proprietary, ofc) example of a task where it just "makes shit up" and provides "syntactically invalid code". With LLMs, you can show your receipts! I'm actually genuinely curious, since I haven't caught ChatGPT hallucinating code since before 4o. I'd love to know under what circumstances it still does.

Yes, the whole theoretical point of academic tests is to be an objective measure of the capacity of students. Because when you go out and get a real job, you have to actually be able to do that job. If these remedial courses aren't necessary for being a psychiatrist, then there should be a path to becoming a practicing psychiatrist that doesn't require them. If they ARE necessary, then lightening the requirements because, gosh, you can't satisfy the requirements but really want to graduate ends up causing harm later on in life.

they still suck at writing code

Hoo boy. Speaking as a programmer who uses LLMs regularly to help with his work, you're very, VERY wrong about that. Maybe you should go tell Google that the 20% of their new code that is written by AI is all garbage. The code modern LLMs generate is typically well-commented, well-reasoned, and well-tested, because LLMs don't take the same lazy shortcuts that humans do. It's not perfect, of course, and not quite as elegant as an experienced programmer can manage, but that's not the standard we're measuring by. You should see the code that "junior engineers" often get away with...

I agree; when I worked at Google, I remember their security measures being extremely well-thought-out - so much better than the lax approach most tech companies take. However, I DON'T trust them to resist ideological capture. They won't abuse people's information by accident, but I will not be surprised if they start doing it on purpose to their outgroup. And they have the tools to do it en masse.

Wow, you're really doubling down on that link to a video of a bird fishing with bread. And in your mind, this is somehow comparable to holding a complex conversation and solving Advent of Code problems. I honestly don't know what to say to that.

Really, the only metric that I need is that ChatGPT makes me more productive in my job and personal projects. If you think that's "unreasonably low", well, I hope that our eventual AI Overlords manage to meet your stringent requirements. The rest of the human race won't care.

This show is absolutely one of the greatest things the BBC ever created. But it's 40 years old, and I often wonder where the next generation's Yes, Minister is. I don't watch a lot of TV (I've seen some of The West Wing, none of Veep or House of Cards), but as far as I know no modern show is worthy of claiming its mantle. Why? Is this the sort of show that can only come from a no-longer-existent world of low BBC budgets, niche high-brow appeal, and writers' willingness to skewer everyone's sacred cow rather than push a one-sided agenda?

There are plenty of tasks (e.g. speaking multiple languages) where ChatGPT exceeds the top human, too. Given how much cherrypicking the "AI is overhyped" people do, it really seems like we've actually redefined AGI to "can exceed the top human at EVERY task", which is kind of ridiculous. There's a reasonable argument that even lowly ChatGPT 3.0 was our first encounter with "general" AI, after all. You can have "general" intelligence and still, you know, fail at things. See: humans.

Waymo has a lot of data, and claims a 60-80% reduction in accidents per mile for self-driving cars. You should take it with a grain of salt, of course, but I think there are people holding them to a decent reporting standard. The real point is that even being 5x safer might not be enough for the public. Same with having an AI parse regulations/laws...

I just can’t take these people seriously. They’re almost going out of their way to be easy for any real authoritarian government to round up, by being obvious about their identity.

LARPing is fun. They believe that they believe they're bravely resisting a dictatorship. But their actions make it clear that, at some level, they know there's no actual danger.

I consider it similar to climate activists who believe that they believe that the future of human civilization depends on cutting CO2 emissions to zero. And who also oppose nuclear power, because ick.

Wow, that's great to hear. I'm eagerly looking forward to the commoditization of novel writing (and videogame NPC dialogue), but I didn't think we'd figured out yet how to maintain long-term consistency.

they frequently don't "perceive themselves" as having the literal knowledge that they're trained on

IMO this is roughly the right way to think about it. LLMs probably don't even have the capability to know what they know; it's just not what they're trained to do. A lot of people confuse the LLM's simulation of a chatbot with the LLM itself, but they're not the same. (e.g. we can't settle the question of whether an LLM is conscious by asking it "are you conscious?". The answer will just depend on what it thinks the chatbot would say.) From the LLM's perspective it's perfectly reasonable to extend a conversation with "the answer is" even when the word after that is undetermined. Hence hallucinations.

(I think RLHF helps a bit with this, allowing it to recognize "hard questions" that it's likely to get wrong, but that's not the same as introspection.)
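To make the "the answer is" point concrete, here's a toy sketch of greedy next-token decoding (everything below is made up by me for illustration - it's not any real model's internals or API):

    # Toy illustration: greedy next-token decoding.
    # Each token is chosen purely by local probability; nothing in the loop asks
    # "do I actually know the fact that comes next?", which is where confident
    # hallucinations come from.

    def next_token_probs(context: list[str]) -> dict[str, float]:
        """Stand-in for an LLM's predicted distribution over the next token."""
        if context[-2:] == ["answer", "is"]:
            # Fluent continuations get high probability whether or not they're true.
            return {"1912.": 0.4, "1915.": 0.35, "unknown.": 0.25}
        return {"The": 0.5, "answer": 0.3, "is": 0.2}

    def greedy_step(context: list[str]) -> str:
        probs = next_token_probs(context)
        return max(probs, key=probs.get)  # always emit the likeliest token

    prompt = "When was this obscure village founded? The answer is".split()
    print(greedy_step(prompt))  # prints "1912." with no notion of whether that's right

Once the model has committed to "the answer is", emitting some plausible-looking completion is the most probable continuation, even if the true fact was never in its weights.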

Huh? The primary selection criterion, stated clearly and up front by Newsom, was "is a black woman". All other considerations, including the unobjectionable non-icky one you just changed the subject to, were secondary.

I'm a Putnam winner, and I don't think it's all that rarefied a category. I certainly don't dismiss out of hand the idea that Elon might be smarter than me. I'm probably better than him at math/programming, but I devoted my life to it and Elon didn't. If he'd had a different set of obsessions, maybe he'd have topped some other category instead of "richest man on Earth". (Heck, I wonder how many pro gaming champions might have been Elon - or a Fields Medalist - with a slightly different set of priorities...)

Biology and physics are old sciences compared to climate science. And the list of amazing things we've done with biology and physics over the last 200 years is insanely long. I guess you're saying that we should give climate science the same level of veneration, even without actual results and useful predictions, because it (ostensibly) uses the same processes. But even if you pretend that climate science is conducted with the same level of impartial truth-seeking - despite the incredible political pressure behind it - that's still missing the point that science is messy and often gets things wrong. Even in biology (e.g. Lamarckism) or physics (e.g. the aether). It takes hundreds of repeated experiments and validated predictions before a true "consensus" emerges (if even then). Declaring a consensus while skipping that first step misses the point.

And remember, skepticism is the default position of science. It's not abnormal. Heck, we had people excitedly testing the EmDrive a few years back, which would violate conservation of momentum! We didn't collectively say "excommunicate the Conservation of Momentum Deniers!"

Regardless, I'm not saying that climate science or the models are entirely useless. Like you said, the greenhouse effect itself is pretty simple and well-understood (though it only accounts for a small portion of the warming that models predict). There's good reason to believe warming will happen. Much less reason to believe it'll be catastrophic, but that's a different topic!

So, I admit this is a well-written, convincing argument. It's appreciated! But I still find it contrasts with common sense (and my own lying eyes). I can, say, imagine authorities arresting me and demanding to know my email password. I would not cooperate, and I would expect to be able to get access to a lawyer before long. In reality there's only one way they'd get the password: torturing me. And in that case, they'd get the password immediately. It would be fast and effective. I'm still going to trust the knowledge that torture would work perfectly on me over a sociological essay, no matter how eloquent.

I think your hypothetical scenarios are a little mixed up. You mention confessions in your first case, because (yes, of course) confessions gained under torture aren't legitimate. Which has nothing to do with the War on Terror argument, or the second part where you mention finding an IED cache. That's information gathering, and that's the general case.

Note that:

  • All information you get from a suspect, whether volunteered, coerced, or extracted via torture, is potentially a lie. Pretending that torture is different in this way is special pleading.

  • You invented a highly contrived scenario to show the worst-case consequences of believing a lie. There are dozens of ways of checking and acting on information that are less vivid.

  • The main difference with torture is that there are some suspects for whom it is the only way of getting useful information. It sucks, but this is the Universe we live in.

As for the "ticking time bomb" thought experiment, that's not highlighting one special example where torture works. That's just showing where the torture-vs-not distinction (the ethical conundrum, like you said) becomes sharpest. Most people have some threshold X at which saving X lives is worth torturing one person. It arguably shouldn't make a difference whether those lives are direct (a bomb in a city) or indirect (stopping a huge attack 2 years down the line), but we're human, so it does.

Yes, I'm really glad to see someone else point this out! One thing that's interesting about LLMs is that there's literally no way for them to pause and consider anything - they do the same calculations and output words at exactly the same rate no matter how easy or hard a question you ask them. If a human is shown a math puzzle on a flashcard and is forced to respond immediately, the human generally wouldn't do well either. I do like the idea of training these models to have some "private" thoughts (which the devs would still be able to see, but which wouldn't count as output) so they can mull over a tough problem, just like how my inner monologue works.
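For what it's worth, here's a rough sketch of what I mean by a private scratchpad. All the names below are invented for illustration (complete_text stands in for whatever completion function you have), and it fakes with prompting what would really need to be trained in:

    # Sketch of the "private thoughts" idea, with made-up names throughout.
    # The model gets a scratchpad section to mull over the problem; only the
    # text after the ANSWER marker is shown to the user, though devs can log it.

    SCRATCHPAD_PROMPT = """Think through the problem step by step in the
    SCRATCHPAD section, then give only your final answer after ANSWER:.

    Question: {question}
    SCRATCHPAD:
    """

    def log_for_devs(text: str) -> None:
        print("[hidden reasoning]", text)

    def answer_with_private_thoughts(question: str, complete_text) -> str:
        full = complete_text(SCRATCHPAD_PROMPT.format(question=question))
        scratchpad, _, answer = full.partition("ANSWER:")
        log_for_devs(scratchpad)   # visible to developers...
        return answer.strip()      # ...but hidden from the user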

There's also the concern of what kind of suffering a post-singularity society can theoretically enable; it might go far, far beyond what anyone on Earth has experienced so far (in the same way that a rocket flying to the moon goes farther than a human jumping). Is a Universe where 99.999% of beings live sublime experiences but the other 0.001% end up in Ultrahell one that morally should exist?

Isn't there a case to be made for an exception here? It's not some cheap "gotcha", there's an actual relevant point to be made when you fail to spot the AI paragraph without knowing you're being tested on it. The fact that @self_made_human did catch it is interesting data! To me, it's similar to when Scott would post "the the" (broken by line breaks) at random to see who could spot it.

Right, and I asked you for evidence last time too. Is that an unreasonable request? This isn't some ephemeral value judgement we're debating; your factual claims are in direct contradiction to my experience.