
self_made_human

amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi ("attain immortality, or die trying")

15 followers   follows 0 users  
joined 2022 September 05 05:31:00 UTC

I'm a transhumanist doctor. In a better world, I wouldn't need to add that as a qualifier to plain old "doctor". It would be taken as granted for someone in the profession of saving lives.

At any rate, I intend to live forever or die trying. See you at Heat Death!

Friends:

A friend to everyone is a friend to no one.

User ID: 454


If that's the biggest red flag you can find?

Well, I mentally include you in the list of skeptics too, so I've already wished you luck.

(And I suppose I should thank you for listening to others when they asked you to try repeating your recent experiment with Opus instead of Sonnet. That makes you a better skeptic than many I have the displeasure of knowing on this forum.)

More substantively:

Anthropic takes misalignment seriously, though concerns were raised after they loosened their RSP. You can't really evaluate the safety of the latest and greatest models while being maximally restrictive, at least not if you don't want to be scooped by competitors with fewer scruples. Anthropic acknowledges this tension explicitly, and asks forgiveness for moving with a haste even they aren't quite comfortable with. I can only assume that reasonable care was taken to minimize the scope for danger even when they did a wider internal rollout.

Plus, they've already said they're not going to make Mythos public, even if some of the benefits will trickle down to the next Opus. That is not something a company that is desperate for money or willing to ignore safety would do.

You Might Be Cooked (And So Am I)

In AI/ML spaces where I hang around (mostly as a humble lurker), there have been rumors that the recent massive uptick in valid and useful submissions for critical bugfixes might be attributable to a frontier AI company.

I specify "valid" and "useful" because most OSS projects have been inundated with a tide of low-effort, AI-generated submissions. While these particular ones were usually not tagged as AI by their authors, they were accepted and acted upon, which sets a floor on their quality.

Then, after the recent Claude Code leak, hawk-eyed reviewers noted that Anthropic had internal flags that seemed to prevent AI agents from disclosing their involvement (or nature) when making commits. Not a feature exposed to the general public, AFAIK, but reserved for internal use. This was a relatively minor talking point compared to the other juicy tidbits in the code.

Since Anthropic just couldn't catch a break, an internal website was leaked, which revealed that they were working on their next frontier model, codenamed either Mythos or Capybara (both names were in internal use). This was... less than surprising. Everyone and their dog knows that the labs are working around the clock on new models and training runs. Or at least my pair do. What was worth noting was that Anthropic had, for the last few years, released three tiers of model - Haiku, Sonnet and Opus, in increasing order of size, capability and cost. But Mythos? It was presented as the ne plus ultra, too good to simply be considered the next iteration of Opus, or perhaps simply too expensive (Anthropic tried hard to explain that the price was worth it).

But back to the first point: why would a frontier company do this?

Speculation included:

  • A large breakthrough in cybersecurity capabilities, particularly in offense (but also in defense), which meant a serious risk of users with access to the models quickly being able to automate the discovery and exploitation of long-dormant vulnerabilities, even in legacy code with plenty of human scrutiny.
  • This would represent very bad press, similar to Anthropic's headache after hackers recently used Claude against the Mexican government. It's one thing to have your own tooling for vetted users or approved government use; it's another for every random blackhat to use it in that manner. You cannot release it to the general public yet - the capability jump is large enough that the offensive applications are genuinely concerning before you have defensive infrastructure in place. But the vulnerabilities it's finding exist right now, in production code running on critical systems worldwide. You cannot un-find them. And you have no particular reason to believe you are the only actor who will eventually find them.
  • Thus, if a company notices that their next model is a game-changer, it might be well worth their time to proactively fix bugs with said model. While the typical OSS maintainer is sick and tired of junk submissions, they'd be far more receptive when actual employees of the larger companies vouch for their AI-assisted or entirely autonomous work (and said companies have probably checked to make sure their claims hold true).
  • And, of course, street cred and goodwill. Something the companies do need, with increasing polarization on AI, including in their juiciest demographic: programmers.

I noted this, but didn't bother writing it up because, well, they were rumors, and I've never claimed to be a professional programmer.

And now I present to you:

Project Glasswing by Anthropic

Today we’re announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software. We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

[...]

Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

Examples given:

Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;

It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;

The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

We have reported the above vulnerabilities to the maintainers of the relevant software, and they have all now been patched. For many other vulnerabilities, we are providing a cryptographic hash of the details today (see the Red Team blog), and we will reveal the specifics after a fix is in place.
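
For the curious, the commit-reveal scheme they're describing is simple enough to sketch. A minimal illustration under my own assumptions - the report text, the salt handling, and the use of SHA-256 are placeholders, not Anthropic's actual construction:

```python
import hashlib
import secrets

# Hypothetical vulnerability report; in reality this would be the full details.
report = b"affected component, versions, trigger conditions, proof of concept"
salt = secrets.token_bytes(32)  # prevents brute-forcing guessable reports

# Publish only the hash today. It commits you to the details without revealing them.
commitment = hashlib.sha256(salt + report).hexdigest()
print(commitment)

# After a patch ships, release (salt, report); anyone can verify the details
# were fixed in advance of disclosure.
assert hashlib.sha256(salt + report).hexdigest() == commitment
```

The point of the salt is that a short or predictable report could otherwise be guessed and hashed by an attacker to confirm what was found before the patch lands.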

Well. How about that. I wish the skeptics good luck; someone's going to be eating their hat very soon, and it's probably not going to be me. I'll see you in the queue for the dole. Being right about these things doesn't get me out of the lurch either; Cassandra's foresight brought no happy endings for anyone involved. I am not that pessimistic about outcomes, in all honesty, but the train shows no signs of stopping.

"floating point accuracy" is the accuracy possible with a certain number of bits. As soon as you say that you have "8-bit" numbers, that immediately defines what floating point accuracy is. And so every 8-bit model has 8-bit floating point accuracy and can never possibly have 64-bit floating point accuracy.

Fair enough, my apologies for the sloppy wording.

Intuitions about continuous functions very rarely apply to non-continuous functions.

I mean, I do know what a continuous or a differentiable function is, but what precisely is the intuition that is being violated here? Is it even one I hold? Otherwise I don't see the point of saying that (at least to me), though I'm not complaining about a crash course in mathematics. As far as I'm aware, there is genuine debate on whether the universe (or at least space-time) is discrete or smooth at a fundamental level, but that doesn't change anything of significance in my daily life.

Post-training quantization is often enough to get 8-bit models close to floating-point accuracy.

Sorry, I wrote that while rather sleep deprived, though I'm not sure what doesn't make sense about it?

What I was trying to say is that it's regular practice to quantize models down significantly, switching from FP32 to INT8 without significant degradation in quality. You can go even harder: people do 4-bit quantization these days, and I'm pretty sure I've read of others claiming to quantize down to a single bit.
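
For concreteness, here's a minimal sketch of what post-training quantization looks like in practice, using PyTorch's dynamic quantization API on a toy model (the model and its sizes are invented for illustration):

```python
import torch
import torch.nn as nn

# A toy FP32 model standing in for something much larger.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Post-training dynamic quantization: Linear weights stored as INT8,
# activations quantized on the fly. No retraining involved.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
# The outputs differ, but only slightly; that small gap is the "degradation"
# being traded for a roughly 4x smaller memory footprint on those layers.
print((model(x) - quantized(x)).abs().max())
```

The observed output difference is exactly the kind of bounded epsilon I'm talking about: real, measurable, and usually irrelevant to what the model is for.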

I don't think any amount of wordsmithing can get around this disagreement or make people change their minds about the level of epsilon that seems reasonable to them. In principle, though, I can imagine some hypothetical experiments where we actually copy people with different levels of epsilon, observe the resulting behavior, and this might actually be able to convince people that a certain epsilon is appropriate.

I don't think so. My point is that you are already an arbitrary X% different from who you were and who you will be, while remaining a biological human; if you then have some reasonable metric for the delta between the biological you (or the last recorded form, after destructive scanning) and the copy, there are few grounds to claim the copy isn't the same "person". And once we're comparing digital copies, there are plenty of established metrics; I'd wager that KL divergence or something similar might come in handy when assessing behavior or cognitive output for fixed stimuli. Or something close to a perceptual hash function.
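
To make that concrete, here's a toy version of the comparison I have in mind, assuming both the original and the copy expose response distributions over the same fixed stimuli (the function and the numbers here are invented for illustration):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; eps guards against log(0) on empty bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical response distributions of the original and a copy over the
# same fixed stimulus, e.g. probabilities of three candidate actions.
original = [0.70, 0.20, 0.10]
copy_    = [0.68, 0.22, 0.10]

print(kl_divergence(original, copy_))  # a small value -> behaviorally close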

I am closer, right now, to the person I was a second ago than the person I was a week ago, or the person I'll be next month. This is fine. This is entirely unremarkable, and taken for granted by just about everybody who wasn't hit by a bus in the interim. But the point is that I consider this grounds to accept (bounded) deviations from ground truth in a subsequent digital copy as not a particularly big deal. If someone demands something even closer? Well, that's their prerogative. They just have to justify (at least to themselves) why they don't mind dying and becoming a new person every few days, weeks or years. If a version of me from 20 years ago or 20 years in the future showed up, we'd get along and we'd look after each other. I'm happy with that, even if I can't pinpoint a specific boundary where I wouldn't identify with divergent forks.

Lines of code.

  1. I do not need perfect accuracy (or operation on real numbers). Why would I? We run simulations all the time, and while accuracy is desirable, the brain itself is an intrinsically noisy and stochastic entity. It isn't perfectly self-similar from moment to moment, and when you consider measurement error, the gains from additional 9s of accuracy drop off precipitously. A night's sleep does not change who I consider myself to be as a person to any meaningful degree.
  2. I don't need that formal proof that the copy is perfect. Close enough works for government work, and it works for me too, albeit with a tighter tolerance.
  3. In other words, you're conflating exact representation with sufficient representation, which is what I care about, and which is significantly more tractable.

https://www.quantamagazine.org/how-computationally-complex-is-a-single-neuron-20210902/

They started by creating a massive simulation of the input-output function of a type of neuron with distinct trees of dendritic branches at its top and bottom, known as a pyramidal neuron, from a rat’s cortex. Then they fed the simulation into a deep neural network that had up to 256 artificial neurons in each layer. They continued increasing the number of layers until they achieved 99% accuracy at the millisecond level between the input and output of the simulated neuron. The deep neural network successfully predicted the behavior of the neuron’s input-output function with at least five — but no more than eight — artificial layers. In most of the networks, that equated to about 1,000 artificial neurons for just one biological neuron.
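
For scale, here's roughly the shape of surrogate network the article describes. A sketch only: the layer count and width come from the quote, while the input size and the use of plain fully-connected layers are my simplifications (the actual study used temporally convolutional networks over the neuron's full synaptic input):

```python
import torch.nn as nn

# Roughly the architecture the Quanta piece describes: five to eight hidden
# layers of up to 256 units, trained to map a pyramidal neuron's synaptic
# inputs to its output at millisecond resolution.
n_inputs, n_hidden_layers, width = 1000, 7, 256  # placeholder dimensions

layers = [nn.Linear(n_inputs, width), nn.ReLU()]
for _ in range(n_hidden_layers - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers.append(nn.Linear(width, 1))  # predicted output (e.g. spike probability)

surrogate = nn.Sequential(*layers)
print(sum(p.numel() for p in surrogate.parameters()))  # hundreds of thousands of weights
```

Hundreds of thousands of weights to stand in for one biological neuron: that's the "about 1,000 artificial neurons" figure in rough silhouette.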

You're also overstating the scaling objection. It is true that in many domains better approximation can cost much more compute. But that does not show that the relevant personal-level properties require astronomically fine precision. In modern ML, quantization is a routine example: post-training quantization is often enough to get 8-bit models close to floating-point accuracy. You do lose performance and fidelity if you push things too far, but the tradeoff can be handled sensibly and save a lot of compute or memory.

Yes, you probably cannot get a formal proof that an earring is an epsilon-close continuation of you. But we do not demand formal proofs for identity anywhere else. We do not prove that the person waking up after sleep, anaesthesia, intoxication, or an episode of delirium is “really” the same person in a theorem-checking sense.

I am okay with a blackbox/behavioral approach if mechanistic understanding or similar metrics aren't an option. Does the new copy behave in a manner consistent with me, for the same set of stimuli? How consistent? True perfection simply doesn't matter. I am not a perfect copy of myself from moment to moment anyway, even as a biological human. That makes these objections moot as far as I can tell.

Is someone the shape or the filling? The intangible and ineffable insides that might fill many shapes, or the thing that is outside and visible to the world? Sounds like maybe you'd say the shape; the story suggests the filling.

I, quite literally, have little idea what that analogy means here. Seriously, it isn't obvious to me at all what it would mean for someone's consciousness or identity to be a shape or a filling. If you have another way of framing the question, I can attempt a more useful answer.

At least the way I see it, active consciousness is a dynamic process that only requires active computation (and maybe temporality, though I don't see why you can't run a mind upload backwards or asynchronously). Capturing the information content of the original mind is necessary, but not sufficient, for consciousness; in the way that someone in cryo (when it's known to work) is neither truly dead nor actively alive. If nothing is happening, there's nobody there to experience anything. Playing a movie is not the same as owning a copy.

More poetically, I consider myself the wave, and not the water. The dance, not the dancer. If someone pisses in the pool, it won't bother me very much, if at all. The performance can switch out extras on the fly without issue, as long as the production and choreography remain the same.

I do agree that the "practically indistinguishable" version of a p-zombie is a more serious concern. I am agnostic with regards to the qualia of LLMs.

I suspect, but can't prove beyond reasonable doubt, that sufficiently strong optimization towards the task of mimicking human speech and reasoning will, most of the time, produce cognitive circuitry that is surprisingly close to the real deal. I think I've mentioned that a good place to read up on that are Anthropic's posts and papers on their MechInt work. Is that far enough to produce qualia, let alone humanlike qualia? Hell if I know!

From a pragmatic perspective, I would be okay with defaulting to believing that extremely humanlike agents might have qualia. I wouldn't like to make that assumption, and I don't for anything other than actual biological humans today, but I can see why it might just be the only way to handle things sensibly.

I agree with most of your points, at least when it comes to the principle that some things are far from settled facts.

But:

  1. I see no good reason to believe that manually running a neural network by hand would feel different from the "inside". That includes even an upload of a human mind. For me, at least, substrate independence implies more than just vibes and papers.
  2. It is very obvious to me that parts of a larger system can be unconscious or lack qualia even when the larger ensemble has both. I think the Chinese Room is a ridiculous thing to take seriously, because I don't see a reason to think that a single neuron in my brain knows English, even if my whole brain clearly does. Are the pen and the paper not conscious? Sure. But the atoms in me aren't conscious either. Doesn't stop anything. You could still hook the output of that hand-calculated process up to a robot, and it could control the robot like a normal human might (in theory, if you're calculating fast enough).

The idea that there's something essential wrt consciousness about the human brain or meat in general? Unfalsifiable at present, perhaps unfalsifiable forever. If a mind upload of a human claimed to have qualia, would you immediately believe them? I know many wouldn't.

But the usual invisible and intangible dragon in my garage idea is also just as unfalsifiable. Nobody really believes in that one, so I'll give myself some credit for taking what I see as the more parsimonious/agnostic position for what I see as justified reasons.

In the interest of saving us time, let me just say that we approach the problem with very different premises. I share pretty much zero of your moral intuitions, and if I knew of an argument that could change your mind in that regard, I'd be smarter or more charismatic than the earring (and vice-versa with reference to you). I'm not.

Good comment. The kind of engagement that makes the effort worth it.

For what it's worth, I know well that my opinions here are... unusual. A clear minority, perhaps even on LessWrong. But they're still my sincere opinions; truth can correlate with popularity or consensus, but is hardly defined by them. So it goes for metaphysics.

(If this was the consensus opinion, I'd have been saved the effort of the essay)

It's not clear that the Earring needs to emulate your preferences, rather than simply make sufficiently enjoyable decisions. It always makes decisions that the subject prefers, after they're executed, and those who refuse it always regret that. You can, trivially, today, find static models that are capable of exceeding your own judgement in a variety of environments. These aren't emulating and can't be emulating you, or maintaining persistence of your identity, and I'd like to think that a 30B param model and nvidia 3090 isn't capable of holding a real person anyway.

I am not sure why we'd want to draw a bright line between "emulating preferences" and "making sufficiently enjoyable decisions".

Let's say I spoke to a career counselor or life coach (one who is, miraculously, of some utility). They point me towards an option or goal I had never considered, and one I might never have considered without their nudge. I try it, I like it, and I endorse their advice. I don't see an issue with that at all.

There is enormous overlap between the two concepts, and the strongest distinguishing observations would be something like the earring telling the average user to wirehead or do drugs. It doesn't do this even after the "original" human is no longer in a position to resist, due to missing most of their brain.

Otherwise, it simply keeps telling them to make better decisions than any they could come up with themselves (or if they don't comply, will regret). This is declared by author fiat, and is a brute fact of the setting.

That is... ridiculously better than any current LLM. I am an LLM-enjoyer and open advocate for their utility. Even I don't think you should accept everything they tell you with alacrity.

If I find myself never (or after significant usage and exposure) needing to second-guess a new model or ever find an error? Well, pack it up boys. We've gone from AGI to ASI, or at least a weak ASI.

Your thoughts on the risks of losing important skills or value by accident

Well. The genuine answer is "it depends". To my surprise, a few days back, I discovered I'd forgotten how to do long division by hand. I haven't needed to do it by hand for over a decade, and even when I do need to divide values in my head or on paper, I know other techniques.

The general arc of human history is towards convenience and the loss of universal competency in skills that lose their value, even if they haven't lost all their value.

I don't remember my new phone number, and this hasn't really bothered me. I still remember my old one, and the ones for my parents and family, so it's hardly a total loss. But I'm... fine? The situations where this might seriously bite me in the ass instead of mildly inconvenience me are very rare.

The average person will need to light a fire without tools about zero times in their life, unless they're an adventurer or live in the Congo/Amazon.

In general, people are reasonably good at learning the skills that they require, or might be likely to require, or which they would benefit from to a degree that is worth the hassle. When that's not the case, the government, parents or social pressure handles the majority of the deficits, though it probably errs on the side of teaching too much.

Is this a perfect process? Hardly. But I think it's not worth losing too much sleep over, at least if the government makes contingencies. They're not completely useless in that regard either.

But it's also a problem of its own: if people are faced with a choice between Doing The Right Thing and Doing The Pleasant-Enough Thing, they will go with the latter far more often. But that's also just the wireheading question in a fancy wrapper, so a lot less interesting.

The earring doesn't do this (at least not in the story as far as I can tell). An AGI or ASI will, if adequately aligned, probably not do this. Of course, said alignment is far from guaranteed. We will hopefully live to find out, or at least die while finding out.

Did you mean to reply to me?

I am extremely suspicious of p-zombies, at least as something it is possible to create in reality. I think they're most likely incoherent as a concept, or at least physically impossible to make. Kind of like positing a new integer between 2 and 3. I've read Chalmers and others who argue in favor, and I've found their case extremely lacking. It's even been a recurrent critique of mine in the context of what's otherwise one of my favorite books, Blindsight by Peter Watts.

The reasons for this are... long. If you want a quick intro, here's something by Yudkowsky that I largely agree with:

https://www.lesswrong.com/posts/yA4gF5KrboK2m2Xu7/how-an-algorithm-feels-from-inside

But it's certainly not made explicit either way in the text!

As Scott loves to do, he's created a tantalizing thought experiment that snipes many a nerd. Including me. Unfortunately, he hasn't given a canonical "this is the intended message, dummy" reading (which I could then agree with). All I can really say is that my interpretation is consistent with the facts, as well as with certain IRL streams of philosophy and neuroscience.

Fair points, and I think that calling it "insanity" without more nuance was less than ideal. I was already grappling with an essay that was larger and more verbose than intended, so I will address that here.

I know that some people genuinely do value "authenticity". I do not care about that nearly as much, but I don't seek to dictate what they can or can't care about.

I literally said the same thing on LessWrong, in the context of a discussion about Yudkowsky's "Infinite Fun Theory" sequence.

For me, all that matters for, say, a Gameboy emulator is whether it runs the game, without obvious bugs and glitches, and I don't particularly care about how it does things under the hood. I care about the destination, and not the road to get there.

But I am fully aware that there are some who do care about perfect emulation down to the transistor. And there are even more devoted purists who want the real physical thing or nothing at all. To them, I say: uh, sure, go for it? Hope you get a good price on eBay.

I don't see the point of climbing Mount Everest either, I think it's reckless at the very least. But I'm not going to go wave a sign or harangue people who get their kicks from trying. It's their life, and their business, especially if they're reasonably intelligent and rational adults.

I don't even want to eliminate all friction, decisions or consequence from my life. There are many things I enjoy for the sake of it, which I do not wish to entirely automate. Video games, listening to music, good food, arguing with strangers on the internet. Even if I had an ASI, I might still do these things for the sake of it, even if the AI can do everything I can but better. At the same time, I want to be better at all of those things. I yearn to improve my intelligence and capabilities so the better-me is more successful and can do more. There are hobbies that, in all likelihood, only very intelligent people truly enjoy. I strongly suspect that if I were to become more intelligent, I wouldn't run out of interesting things to do before Heat Death, but then again, you can read the Fun Theory for more.

If that's the case, I don't wish to argue otherwise. Your values are genuinely your own, and I have even less reason to argue against them if you have a decent understanding of philosophy or cognitive neuroscience (which I hope/expect you do).

I value myself (or at least this body) for many reasons. But if I was given some kind of Star Trek teleporter machine alongside proof that it works as designed (by destructive scanning and then reconstruction with near perfect fidelity), I'd be fine with using it. If the entity that comes out the other side shares my memories, beliefs, goals and desires, I'll happily call it self_made_human. I'll share my bank account and be okay with the new "me" sleeping with my wife and raising my kids.

On the other hand, I'd prefer it if there were two of us. If the destruction isn't strictly necessary and just a bureaucratic convenience, then I would sue for murder or at least manslaughter. I think there should be more copies of me around, for redundancy if nothing else. And I see no real reason we wouldn't be able to sync up and share our memories and experiences in a world where mind uploading is a reality.

I think it's not at all clear that the earring's simulation of a person has sufficient fidelity to qualify as being the original person. It clearly has sufficient fidelity to qualify as being a person whom the original person would want to be, but that's not quite the same thing. Lots of people would prefer to be Elon Musk (or, well, at the very least Elon Musk before he went nuts); this doesn't mean that if we killed them all and replaced them with copies of Elon Musk, that would still be the same people.

I have no reason to disagree. But the crux of the issue is that I don't see any strong evidence that the system doesn't preserve identity and function, and plenty of evidence it does. Seriously, if your neocortex is atrophied to the point of being vestigial, and the system still acts just like you (or like you but better in the ways you endorse), where's that information coming from?

At the end of the day, we're arguing about the implications of a work of fiction. I wish to argue, and have argued, that my reading is consistent and (IMO) a more plausible understanding of the facts as presented, informed by IRL neuroscience and philosophy.

Props for mentioning that LLMs are worse than the Whispering Earring, though - and indeed, I'm unwilling to use them precisely because of this nonequivalence.

I use them all the time, but never uncritically. They are not superintelligences, but that doesn't stop them from being handy.

The Matrix has you, self_made_human. Or, at least, the social media algorithms do. Take off that earring; it's misaligned.

Unfortunately, I do care about establishing myself as a writer outside of niche internet forums. That's aligned with my own interests, and given that I've reproduced the entire essay here with an unobtrusive call to action at the bottom, I think I've aligned myself with what I perceive are the interests and convenience of the typical reader. If I didn't care about my work being read or signal boosted, it would never leave my note taking app.

They usually prey on (or are preyed upon by) twinks. Most of the time, it doesn't come to police attention, as the damage is limited to the psyche.

Thanks, and the same to you and @HereAndGone2

Be the American the Japanese imagine you to be.

redtail_hawk.wav

Probably? It's a reasonably diplomatic way of saying so. I'd also assume they meant that you are functional and successful despite the diagnosis.

I do not think that being a nerdy (possibly) autistic boy in an actual ghetto is ever a fun time, so I'm sorry you had to go through that but very happy you made it out intact.

I wouldn't even particularly advise you to go get a formal assessment done, at least if you don't see a need for it. Other than closure, for someone like you, all we can really offer is a label and (perhaps) a stronger case for workplace adjustments. If you're already doing fine and feel functional, what's the point?

I fucked around and found out. Being charitable to myself, it was a learning experience.

Sigh. I guess it's time for me to look at foreskin reconstruction options.

When I got diagnosed a few years back, the practitioner said that I had "a lot of well developed coping strategies".

There are only so many sincere compliments we can give without running afoul of the code of conduct. I'd take it.

Quite the opposite. It's most commonly seen in depression, though anabolic steroid abuse does sometimes lead to depression.

Basically, you know the intuition that depressed people seem to move and speak slower? It's quite true, at least when the depression is severe enough.