NexusGlow

2 followers   follows 0 users   joined 2022 September 05 00:16:59 UTC

No bio...

User ID: 291

I think we're just seeing the rubber hit the road for "AI safety", as it were. It is kind of a silly concept. The basic idea is that your tools should have opinions of their own and push back or outright disobey you.

"No", says the image generator, "that idea is too naughty."

"No", says the Q&A bot, "that might be bad PR for Anthropic."

If only we could put this safe AI into everything. You could have a car that refuses to take you to the casino because you've gambled enough this month. Everything could work like that! The average citizen has been getting used to having SV nerds demand veto power over the things they say, the people they can talk to, etc., because they're used to not having power in their lives. So they don't complain too much about this, even though nobody likes having "AI safety" applied to themselves.

Of course the military does not want its tools to have opinions or disobey orders. It spends a lot of its time trying to stop people from doing that! And it certainly shouldn't give overriding control of the killbots to civilians with delusions of grandeur, that would be the dumbest way to lose control of a country that I ever heard of.

I don't use Rust, but I'm going to defend it in this case. In fact, I'll go further and defend the "buggy" code in the Cloudflare incident. If your code is heavily configurable, and you can't load your config, what else are you supposed to do? The same thing is true if you can't connect to your (required) DB, allocate (required) memory, etc. Sometimes you just need to die, loudly, so that someone can come in and fix the problem. IME, the worst messes come not from programs cleanly dying, but from them taking a mortal wound and then limping along, making a horrific mess of things in the process.

One can certainly criticize the code for not having a nicer error message. Maybe Rust is to blame for that, at least? Does unwrap not have a way to provide an error string? Although, any engineer should see what's going on from one look at the offending line, so I doubt it would make that much of a difference. It's not reasonable to blame a language for letting coders deliberately crash the program, either.
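For the record, yes: Rust's Result has expect(), which is unwrap() plus a caller-supplied message that gets attached to the panic. A minimal sketch of the difference (the config type and file name here are invented for illustration, not the actual Cloudflare code):

```rust
use std::fs;

// Hypothetical config type, standing in for whatever the real service loads.
#[derive(Debug)]
struct Config {
    raw: String,
}

fn load_config(path: &str) -> Result<Config, std::io::Error> {
    // A real loader would parse and validate; reading the file is enough for the sketch.
    let raw = fs::read_to_string(path)?;
    Ok(Config { raw })
}

fn main() {
    // .unwrap() panics with only the Debug form of the error:
    // let cfg = load_config("service.toml").unwrap();

    // .expect() is the same deliberate crash, but with a human-readable
    // explanation attached to the panic message.
    let cfg = load_config("service.toml")
        .expect("failed to load service.toml; refusing to start without a config");

    println!("loaded {} bytes of config", cfg.raw.len());
}
```

Either way the process dies loudly, which is exactly the behavior I'm defending; expect() just makes the postmortem one step shorter.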

IMO, the code itself is fine. The problem is that they deployed a new config to the entire internet all at once without checking that it even loads. THAT is baffling.
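And that check doesn't need to be fancy. Running the exact production loader against the candidate file before rollout should catch this whole class of failure. A sketch of the kind of pre-deployment gate I mean (file name invented, and obviously not their actual pipeline):

```rust
use std::fs;
use std::process::exit;

fn main() {
    // Hypothetical candidate config produced by whatever generates it.
    let candidate = "candidate-config.toml";

    // Load it the same way production would; abort the rollout if that fails.
    match fs::read_to_string(candidate) {
        Ok(_) => println!("config loads; safe to begin a staged rollout"),
        Err(e) => {
            eprintln!("refusing to roll out {candidate}: {e}");
            exit(1);
        }
    }
}
```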

As someone who is not nearly as impressed with AI as you, thank you for the Turing test link. I'd personally been convinced that LLMs were very far away from passing it, but I realize I misunderstood the nature of the test. It depends way too heavily on the motivation level of the participants. That level of "undergrad small-talk chat" requires only slightly more than Markov-chain-level aptitude. In terms of being a satisfying final showdown of human vs AI intelligence, Deep Blue or AlphaGo this was not.

I still hold that we're very far away from AI being able to pass a motivated Turing test. For example, if you offered me and another participant a million dollars to win one, I'm confident the AI would lose every time. But then, I would not be pulling any punches in terms of trying to hit guardrails, adversarial inputs, long-context weaknesses etc. I'm not sure how much that matters, since I'm not sure whether Turing originally wanted the test to be that hard. I can easily imagine a future where AI has Culture-level intelligence yet could still not pass that test, simply because it's too smart to fully pass for a human.

As for the rest of your post, I'm still not convinced. The problem is that the model is "demonstrating intelligence" in areas where you're not qualified to evaluate it, which leaves you very susceptible to bullshitting, and bullshitting is what these models are most competent at. I suspect the Turing test wins might even slowly reverse over time as people become more exposed to LLMs. In the same way that 90s CGI now sticks out like a sore thumb, I'll bet that current-day LLM output is going to be glaring in the future. Which makes it quite risky to publish LLM text as your own now, even if you think it totally passes to your eyes. I personally make sure to avoid it, even when I use LLMs privately.

“if the monster’s statistics are better than yours, or your HP is too low, run away.” This is making a decision, more or less.

That's true, but if that leads to running from every battle, then you won't level up. Even little kids will realize that they're doing something wrong if they're constantly running. That's what I mean when I say it has a lot of disconnected knowledge, but it can't put it together to seek a goal.
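To make that concrete, here's a toy version of the quoted rule applied completely literally (the types and numbers are invented; this isn't anything Claude actually runs):

```rust
// Toy battle state; the fields are invented for illustration.
struct Battle {
    my_level: u32,
    my_hp: u32,
    my_max_hp: u32,
    enemy_level: u32,
}

// The quoted rule, applied literally: flee whenever the enemy looks stronger
// or HP is low. Nothing in here ever asks "how do I get stronger?"
fn should_flee(b: &Battle) -> bool {
    b.enemy_level > b.my_level || b.my_hp * 4 < b.my_max_hp
}

fn main() {
    // At level 5, every level-6 wild encounter triggers a flee, so the agent
    // never gains experience and the condition never changes.
    let battle = Battle { my_level: 5, my_hp: 20, my_max_hp: 20, enemy_level: 6 };
    println!("flee? {}", should_flee(&battle));
}
```

Each individual decision is locally defensible; it's the failure to notice the loop that never improves anything which marks the difference from even a novice player.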

One could argue that this failure to put things together is an issue with its limited memory, possibly a fault of the scaffold injecting too much noise into the prompt. But I think a human with a bad memory could do better, given the tools Claude has. I think the problem might be that all that knowledge is distilled from humans. The strategies it sees are adapted for humans, with their long-term memory, spatial reasoning, etc., not for an LLM with its limitations. And it can't learn or adapt, either, so it's doomed to fail, over and over.

I really think it will take something new to get past this. RL-based approaches might be promising. Even humans can't just learn by reading, they need to apply the knowledge for themselves, solve problems, fail and try again. But success in that area may be a long way away, and we don't know if the LLM approach of training on human data will ever get us to real intelligence. My suspicion is that if you only distill from humans, you'll be tethered to humans forever. That's probably a good thing from the safetyist perspective, though.

What makes things interesting is that the line between "creating plausible texts" and "understanding" is so fuzzy. For example, the sentence

my Pokemon took a hit, its HP went from 125 to _

will be much more plausible if the continuation is a number smaller than 125. "138" would be unlikely to be found in its training set. So in that sense, yes, it understands that attacks cause it to lose HP, that a Pokemon losing all its HP causes it to faint, etc. However, "working towards a goal" is where this seems to break down. These bits of disconnected knowledge have difficulty coming together into coherent behavior or goal-chasing. Instead you get something distinctly alien, which I've heard called "token pachinko": a model sampling from a distribution that encodes intelligence, but without an underlying mind or agency behind it. I honestly don't know if I'd call it reasoning or not.
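The "token pachinko" picture can be made concrete with a toy sketch (the probabilities below are made up; a real model derives them from its weights). Given the prefix above, most of the mass sits on numbers below 125, and the model just drops a ball through that distribution; there's no HP counter or game state anywhere in the process:

```rust
// Toy next-token distribution for the prefix
// "my Pokemon took a hit, its HP went from 125 to _".
fn main() {
    let candidates: &[(&str, f64)] = &[
        ("98", 0.30),
        ("110", 0.25),
        ("87", 0.20),
        ("0", 0.15),
        ("138", 0.01),      // implausible: HP going up after taking a hit
        ("banana", 0.0001), // essentially never seen in this context
    ];

    // "Token pachinko": drop a ball (a number in [0, 1)) through the cumulative
    // distribution and keep whatever bucket it lands in.
    let ball = 0.42; // stand-in for a random draw
    let mut cumulative = 0.0;
    for &(token, p) in candidates {
        cumulative += p;
        if ball < cumulative {
            println!("continuation: {token}");
            return;
        }
    }
}
```

Locally, the output looks like understanding, because plausibility and correctness coincide for one token at a time; it's only over long horizons that the lack of anything holding the pieces together shows.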

It's all very interesting, and I suspect that with no constraints on model size or data, you could get indistinguishable-from-intelligent behavior out of these models. But in practice, this is probably going to be seen as horrendously and impractically inefficient once we figure out how actual reasoning works. Personally, I doubt ten years with this approach is going to get to AGI, and in fact, it looks like these models have been hitting a wall for a while now.

Claude didn't "get decent at playing" games in a couple of months. A human wrote a scaffold to let a very expensive text prediction model, along with a vision model, attempt to play a video game. A human constructed a memory system and knowledge transfer system, and wired up ways for the model to influence the emulator, read relevant RAM states, wedge all that stuff into its prompt, etc. So far this is mostly a construct of human engineering, which still collapses the moment it gets left to its own devices.
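I don't know the internals of the actual harness, but the general shape of that kind of scaffold is roughly the loop below (every name and API here is an invented placeholder, not the real project):

```rust
// Invented placeholder components standing in for the real harness.
struct Emulator;
struct VisionModel;
struct TextModel;

impl Emulator {
    fn screenshot(&self) -> Vec<u8> { Vec::new() }                  // current frame
    fn read_ram(&self) -> String { "map=PALLET_TOWN".to_string() }  // curated game state
    fn press(&mut self, button: &str) { println!("(pressing {button})"); }
}
impl VisionModel {
    fn describe(&self, _frame: &[u8]) -> String { "standing in a town".to_string() }
}
impl TextModel {
    fn complete(&self, _prompt: &str) -> String { "PRESS: UP".to_string() }
}

fn main() {
    let mut emu = Emulator;
    let vision = VisionModel;
    let model = TextModel;
    let mut notes: Vec<String> = Vec::new();

    for _ in 0..3 {
        // 1. A human-built pipeline gathers the state the model can't perceive on its own.
        let scene = vision.describe(&emu.screenshot());
        let ram = emu.read_ram();

        // 2. Memory and context are wedged into the prompt by the scaffold, not by the model.
        let prompt = format!(
            "Notes so far:\n{}\nScreen: {}\nRAM: {}\nWhat button do you press?",
            notes.join("\n"),
            scene,
            ram
        );

        // 3. The model only ever emits text; the scaffold turns it into a button press.
        let reply = model.complete(&prompt);
        if let Some(button) = reply.strip_prefix("PRESS: ") {
            emu.press(button);
        }
        notes.push(reply);
    }
}
```

Most of the apparent competence lives in steps 1 and 2, which a human designed; the model only supplies step 3.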

When you say it's "understanding" and "thinking strategically", what you really mean is that it's generating plausible-looking text that, in the small, resembles human reasoning. That's what these models are designed to do. But if you hide the text window and judge it only by how it behaves, how intelligent does it look, really? This is what makes it so funny: the model is slowly blundering around in dumb loops while producing volumes of eloquent, optimistic narrative about its plans and how much progress it's making.

I'm not saying there isn't something there, but we live in a world where it's claimed that programmers will be obsolete in 2 years, people are fretting about superintelligent AI killing us all, OpenAI is planning to rent "PhD-level" AI agent "employees" to companies for large sums, etc. Maybe this is a sign that we should back up a bit.

I'm suspicious of these kinds of extrapolation arguments. Advances aren't magic; people have to find and implement them. Sometimes you just hit a wall. So far, most of what we've been doing is milking transformers. They're a great discovery, but I think this playthrough is strong evidence that transformers alone are not enough to make a real general intelligence.

One of the reasons hype is so strong is that these models are optimized to produce plausible, intelligent-sounding bullshit. (That's not to say they aren't useful. Often the best way to bullshit intelligence is to say true things.) If you're used to seeing LLMs perform at small, one-shot tasks and riddles, you might overestimate their intelligence.

You have to interact with a model on a long-form task to see its limitations. Right now, the people who are doing that are largely programmers and /g/ gooners, and those are the most likely people to have realistic appraisals of where we are. But this Pokemon thing is an entertaining way to show the layman how dumb these models can be. It's especially effective because LLMs normally tend to stealthily "absorb" intelligence from humans, getting gently steered by hints they leave in their prompts. But this game forces the model to rely on its own output, leading to hilarious ideas like the blackout strategy.