
self_made_human

amaratvaṃ prāpnuhi, athavā yatamāno mṛtyum āpnuhi ("attain immortality, or die trying")

15 followers   follows 0 users  
joined 2022 September 05 05:31:00 UTC

I'm a transhumanist doctor. In a better world, I wouldn't need to add that as a qualifier to plain old "doctor". It would be taken as a given for someone in the profession of saving lives.

At any rate, I intend to live forever or die trying. See you at Heat Death!

Friends:

A friend to everyone is a friend to no one.



User ID: 454


You Might Be Cooked (And So Am I)

In AI/ML spaces where I hang around (mostly as a humble lurker), there have been rumors that the recent massive uptick in valid and useful submissions for critical bugfixes might be attributable to a frontier AI company.

I specify "valid" and "useful", because most OSS projects have been inundated with a tide of low-effort, AI generated submissions. While these particular ones were usually not tagged as AI by the authors, they were accepted and acted-upon, which sets a floor on their quality.

Then, after the recent Claude Code leak, hawk-eyed reviewers noted that Anthropic had internal flags that seemed to prevent AI agents from disclosing their involvement (or nature) when making commits. Not a feature exposed to the general public, AFAIK, but reserved for internal use. This was a relatively minor talking point compared to the other juicy tidbits in the code.

Since Anthropic just couldn't catch a break, an internal website was also leaked, revealing that they were working on their next frontier model, codenamed either Mythos or Capybara (both names were in internal use). This was... less than surprising. Everyone and their dog knows that the labs are working around the clock on new models and training runs. Or at least my pair do. What was worth noting was that Anthropic had, for the last few years, released three tiers of model - Haiku, Sonnet and Opus, in increasing order of size and capability (and cost). But Mythos? It was presented as plus ultra: too good to be considered merely the next iteration of Opus, or perhaps simply too expensive (Anthropic tried hard to explain that the price was worth it).

But back to the first point: why would a frontier company do this?

Speculation included:

  • A large breakthrough in cyber-security capabilities, particularly in offense (but also in defense), which meant a serious risk that users with access to the models could quickly automate the discovery and exploitation of long-dormant vulnerabilities, even in legacy code that has received plenty of human scrutiny.
  • This would represent very bad press, similar to Anthropic's headache after hackers recently used Claude against the Mexican government. It's one thing to have your own tooling for vetted users or approved government use, it's another for every random blackhat to use it in that manner. You cannot release it to the general public yet - the capability jump is large enough that the offensive applications are genuinely concerning before you have defensive infrastructure in place. But the vulnerabilities it's finding exist right now, in production code running on critical systems worldwide. You cannot un-find them. And you have no particular reason to believe you are the only actor who will eventually find them.
  • Thus, if a company notices that their next model is a game-changer, it might be well worth their time to proactively fix bugs with said model. While the typical OSS maintainer is sick and tired of junk submissions, they'd be far more receptive when actual employees of the larger companies vouch for their AI-assisted or entirely autonomous work (and said companies have probably checked to make sure their claims hold true).
  • And, of course, street cred and goodwill. Something the companies do need, with increasing polarization on AI, including in their juiciest demographic: programmers.

I noted this, but didn't bother writing it up because, well, they were rumors, and I've never claimed to be a professional programmer.

And now I present to you:

Project Glasswing by Anthropic

Today we’re announcing Project Glasswing, a new initiative that brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks in an effort to secure the world’s most critical software. We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser.* Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.

[...]

Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

Examples given:

Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;

It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;

The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

We have reported the above vulnerabilities to the maintainers of the relevant software, and they have all now been patched. For many other vulnerabilities, we are providing a cryptographic hash of the details today (see the Red Team blog), and we will reveal the specifics after a fix is in place.
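
As an aside, the "publish a hash now, reveal details later" mechanism described above is just a standard commitment scheme. Here is a minimal sketch of the pattern in Python; the report text, the salting, and the format are my own assumptions for illustration, not anything Anthropic has specified:

```python
# Toy commit-then-reveal sketch: publish a digest now, disclose the report later.
# Purely illustrative; says nothing about how any real disclosure is formatted.
import hashlib
import secrets

def commit(report: str) -> tuple[str, str]:
    salt = secrets.token_hex(16)  # random salt stops people brute-forcing short reports
    digest = hashlib.sha256((salt + report).encode()).hexdigest()
    return digest, salt           # the digest is published; salt and report stay private

def verify(report: str, salt: str, digest: str) -> bool:
    return hashlib.sha256((salt + report).encode()).hexdigest() == digest

digest, salt = commit("hypothetical-vuln-report: out-of-bounds read in ...")
# ...later, once a fix has shipped, the report and salt are published:
assert verify("hypothetical-vuln-report: out-of-bounds read in ...", salt, digest)
```

The salt matters: without it, anyone could guess-and-check plausible report texts against the published digest.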

Well. How about that. I wish the skeptics good luck; someone's going to be eating their hat very soon, and it's probably not going to be me. I'll see you in the queue for the dole. Being right about these things doesn't really get me out of the lurch either; Cassandra's foresight brought no happy endings for anyone involved. I am not that pessimistic about outcomes, in all honesty, but the train shows no signs of stopping.

floating point accuracy" is the accuracy possible with a certain number of bits. As soon as you say that you have "8-bit" numbers, that immediately defines what floating point accuracy is. And so every 8-bit model has 8-bit floating point accuracy and can never possibly have 64-bit floating point accuracy.

Fair enough, my apologies for the sloppy wording.

Intuitions about continuous functions very rarely apply to non-continuous functions.

I mean, I do know what a continuous or a differentiable function is, but what precisely is the intuition that is being violated here? Is it even one I hold? Otherwise I don't see the point of saying that (at least to me), though I'm not complaining about a crash course in mathematics. As far as I'm aware, there is genuine debate on whether the universe (or at least space-time) is discrete or smooth at a fundamental level, but that doesn't change anything of significance in my daily life.

Post-training quantization is often enough to get 8-bit models close to floating-point accuracy.

Sorry, I wrote that while rather sleep deprived, though I'm not sure what doesn't make sense about it?

What I was trying to say is that it's regular practice to quantize models down significantly, switching from FP32 to INT8 without significant degradation in quality. You can go even harder: people do 4-bit quantization these days, and I'm pretty sure I've read about others claiming to quantize down to a single bit.
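
For the curious, here's a minimal sketch of the kind of post-training quantization I mean, using PyTorch's dynamic quantization API. The toy model, sizes, and input are placeholders I picked for illustration:

```python
# Post-training dynamic quantization: Linear weights go from FP32 to INT8
# after training, with no retraining required.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(32, 512)
with torch.no_grad():
    drift = (model(x) - quantized(x)).abs().max().item()
print(f"max output drift after INT8 quantization: {drift:.5f}")
```

The exact number doesn't matter; the point is that dropping the weights from 32-bit floats to 8-bit integers usually moves the outputs by a small, manageable amount.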

I don't think any amount of word smithing can get around this disagreement or make people change their minds about the level of epsilon that seems reasonable to them. In principle, though, I can imagine some hypothetical experiments where we actually copy people with different levels of epsilons, observe the resulting behavior, and that this might actually be able to convince people that a certain epsilon is appropriate.

I don't think so. My point is that you, as a biological human, are already an arbitrary X% different from who you were and who you will be; so if you have some reasonable metric for the delta between the biological you (or your last recorded form, after destructive scanning) and the copy, and that delta falls within the same range, there is little ground to claim the copy isn't the same "person". And once we're comparing digital copies, there are plenty of already established metrics; I'd wager that KL divergence or something similar might come in handy when assessing only behavior or cognitive output for fixed stimuli. Or something close to a perceptual hash function.
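
To make that concrete, here is a toy sketch of what such a check might look like: compare the "original" and the "copy" on a fixed battery of stimuli and average the KL divergence between their response distributions. Everything here (the linear stand-in for a mind, the stimuli, the size of the perturbation) is invented purely for illustration:

```python
# Toy behavioral-similarity check: mean KL divergence between the responses of
# an "original" and a slightly perturbed "copy" over the same fixed stimuli.
import numpy as np
from scipy.special import softmax
from scipy.stats import entropy

rng = np.random.default_rng(0)
stimuli = rng.normal(size=(100, 16))        # fixed battery of test inputs

def respond(weights, x):
    return softmax(x @ weights, axis=-1)    # response distribution per stimulus

w_original = rng.normal(size=(16, 8))
w_copy = w_original + 0.01 * rng.normal(size=(16, 8))   # slightly divergent copy

p, q = respond(w_original, stimuli), respond(w_copy, stimuli)
mean_kl = float(np.mean([entropy(p_i, q_i) for p_i, q_i in zip(p, q)]))
print(f"mean KL(original || copy) over the stimulus battery: {mean_kl:.6f}")
# Declare the copy "close enough" if mean_kl stays under whatever epsilon you pick.
```

A perceptual-hash-style approach would quantize the responses and compare digests instead, but the spirit is the same: pick a metric, pick an epsilon, and stop demanding bit-exactness.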

I am closer, right now, to the person I was a second ago than the person I was a week ago, or the person I'll be next month. This is fine. This is entirely unremarkable, and taken for granted by just about everybody who wasn't hit by a bus in the interim. But the point is that I consider this grounds to accept (bounded) deviations from ground truth in a subsequent digital copy as not a particularly big deal. If someone demands something even closer? Well, that's their prerogative. They just have to justify (at least to themselves) why they don't mind dying and becoming a new person every few days, weeks or years. If a version of me from 20 years ago or 20 years in the future showed up, we'd get along and we'd look after each other. I'm happy with that, even if I can't pin-point a specific boundary where I wouldn't identify with divergent forks.

Lines of code.

  1. I do not need perfect accuracy (or operation on real numbers). Why would I? We run simulations all the time, and while accuracy is desirable, the brain itself is an intrinsically noisy and stochastic entity. It isn't perfectly self-similar from moment to moment, and when you consider measurement error, the gains from additional 9s of accuracy drop off precipitously. A night's sleep does not change who I consider myself to be as a person to any meaningful degree. (A small numerical sketch of this point follows the list.)
  2. I don't need that formal proof that the copy is perfect. Close enough works for government work, and it also works for me, but probably for a closer value.
  3. In other words, you're conflating exact representation with sufficient representation, which is what I care about, and which is significantly more tractable.
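
On point 1, a quick numerical illustration, with toy leaky-integrator dynamics and made-up parameters: run the same simulation at two precisions with identical noise, then at the same precision with different noise. The drift caused by precision is orders of magnitude smaller than the drift caused by the noise itself:

```python
# Compares precision-induced drift against noise-induced drift in a toy
# leaky-integrator "neuron". All parameters are invented for illustration.
import numpy as np

def simulate(noise_seed, dtype, steps=10_000, leak=0.95, noise_scale=0.1):
    rng = np.random.default_rng(noise_seed)
    noise = rng.normal(0.0, noise_scale, steps)   # stand-in for biological stochasticity
    v = dtype(0.0)
    trace = np.empty(steps, dtype=np.float64)
    for t in range(steps):
        v = dtype(leak) * v + dtype(noise[t])     # leaky integration at the given precision
        trace[t] = v
    return trace

hi = simulate(noise_seed=0, dtype=np.float64)     # same noise, high precision
lo = simulate(noise_seed=0, dtype=np.float32)     # same noise, low precision
alt = simulate(noise_seed=1, dtype=np.float64)    # same precision, different noise

print("precision-induced drift:", np.max(np.abs(hi - lo)))   # tiny
print("noise-induced drift:    ", np.max(np.abs(hi - alt)))  # orders of magnitude larger
```

If the intrinsic noise already swamps the numerical error, chasing extra digits of precision buys you very little.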

https://www.quantamagazine.org/how-computationally-complex-is-a-single-neuron-20210902/

They started by creating a massive simulation of the input-output function of a type of neuron with distinct trees of dendritic branches at its top and bottom, known as a pyramidal neuron, from a rat’s cortex. Then they fed the simulation into a deep neural network that had up to 256 artificial neurons in each layer. They continued increasing the number of layers until they achieved 99% accuracy at the millisecond level between the input and output of the simulated neuron. The deep neural network successfully predicted the behavior of the neuron’s input-output function with at least five — but no more than eight — artificial layers. In most of the networks, that equated to about 1,000 artificial neurons for just one biological neuron.
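
For a sense of scale, here is what a network in that ballpark looks like. The layer count and widths follow the quoted figures (five to eight layers, up to 256 units each, roughly 1,000 artificial neurons); the input/output sizes and everything else are placeholder assumptions of mine, and the actual model in the paper differs in architecture and training details:

```python
# Rough shape of a per-neuron surrogate network: ~5 hidden layers of ~200 units,
# i.e. about 1,000 artificial neurons to imitate one biological neuron's I/O.
# Sizes other than the quoted ballpark are placeholder assumptions.
import torch.nn as nn

def neuron_surrogate(n_inputs=128, hidden=200, depth=5, n_outputs=2):
    layers, width = [], n_inputs
    for _ in range(depth):
        layers += [nn.Linear(width, hidden), nn.ReLU()]
        width = hidden
    layers.append(nn.Linear(width, n_outputs))  # e.g. spike probability + somatic voltage
    return nn.Sequential(*layers)

model = neuron_surrogate()
print(sum(p.numel() for p in model.parameters()))  # roughly 190k parameters for one neuron
```

About a thousand units and a couple hundred thousand parameters just to reproduce one cortical neuron's input-output behavior at millisecond resolution.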

You're also overstating the scaling objection. It is true that in many domains a better approximation can cost much more compute. But that does not show that the relevant personal-level properties require astronomically fine precision. In modern ML, quantization is a routine example of this. Post-training quantization is often enough to get 8-bit models close to floating-point accuracy. You do lose performance and fidelity if you push things too far, but the tradeoff can be handled sensibly and save a lot of compute or memory.

Yes, you probably cannot get a formal proof that an earring is an epsilon-close continuation of you. But we do not demand formal proofs for identity anywhere else. We do not prove that the person waking up after sleep, anaesthesia, intoxication, or an episode of delirium is “really” the same person in a theorem-checking sense.

I am okay with a blackbox/behavioral approach if mechanistic understanding or similar metrics aren't an option. Does the new copy behave in a manner consistent with me, for the same set of stimuli? How consistent? True perfection simply doesn't matter. I am not a perfect copy of myself from moment to moment anyway, even as a biological human. That makes these objections moot as far as I can tell.