
E/acc and the political compass of AI war

As I've been arguing for some time, the culture war's most important front will be about AI; that's more pleasant to me than the tacky trans vs trads content, as it returns us to the level of philosophy and positive actionable visions rather than peculiarly American signaling ick-changes, but the stakes are correspondingly higher… Anyway, Forbes has doxxed the founder of «e/acc», the irreverent Twitter meme movement opposing the attempts at regulating AI development that are spearheaded by EA. Turns out he's a pretty cool guy, eh.

Who Is @BasedBeffJezos, The Leader Of The Tech Elite’s ‘E/Acc’ Movement?

…At first blush, e/acc sounds a lot like Facebook’s old motto: “move fast and break things.” But Jezos also embraces more extreme ideas, borrowing concepts from “accelerationism,” which argues we should hasten the growth of technology and capitalism at the expense of nearly anything else. On X, the platform formerly known as Twitter where he has 50,000 followers, Jezos has claimed that “institutions have decayed beyond the point of salvaging” and that the media is a “vector for cybernetic control of culture.”

Alarmed by this extremist messaging, «the media» proceeds to… harness the power of an institution associated with the Department of Justice to deanonymize him, with the explicit aim to steer the cultural evolution around the topic:

Forbes has learned that the Jezos persona is run by a former Google quantum computing engineer named Guillaume Verdon who founded a stealth AI hardware startup Extropic in 2022. Forbes first identified Verdon as Jezos by matching details that Jezos revealed about himself to publicly available facts about Verdon. A voice analysis conducted by Catalin Grigoras, Director of the National Center for Media Forensics, compared audio recordings of Jezos and talks given by Verdon and found that it was 2,954,870 times more likely that the speaker in one recording of Jezos was Verdon than that it was any other person. Forbes is revealing his identity because we believe it to be in the public interest as Jezos’s influence grows.

That's not bad, because Journalists, as observed by @TracingWoodgrains, are inherently Good:

(Revealing the name behind an anonymous account of public note is not “doxxing,” which is an often-gendered form of online harassment that reveals private information — like an address or phone number — about a person without consent and with malicious intent.)

(That's one creative approach to encouraging gender transition, I guess).

Now to be fair, this is almost certainly a parallel-construction narrative – many people in SV knew Beff's real identity, and as of late he's been very loose with opsec, funding a party, selling merch and so on. Also, the forced reveal will probably help him a great deal – it's harder to dismiss the guy as some LARPing shitposter or a corporate shill pandering to VCs (or, as @Tomato said, running «an incredibly boring b2b productivity software startup») when you know he's, well, this. And this too.

The Forbes article itself doesn't go very hard on Beff, presenting him as a somewhat pretentious supply-side YIMBY, an ally to Marc Andreessen, Garry Tan and such; which is more true of Beff's followers than of the man himself. The more potentially damaging (to his ability to draw investment) parts are casually invoking the spirit of Nick Land and his spooky brand of accelerationism (not unwarranted – «e/acc has no particular allegiance to the biological substrate for intelligence and life, in contrast to transhumanism; in order to spread to the stars, the light of consciousness/intelligence will have to be transduced to non-biological substrates», Beff says in his manifesto), and citing some professors of «communications» and «critical theory» who are just not very impressed with the whole technocapital thing. At the same time, it reminds the reader of EA's greatest moment (no, not the bed nets).

Online, Beff confirms being Verdon:

I started this account as a means to spread hope, optimism, and a will to build the future, and as an outlet to share my thoughts despite the secretive nature of my work… Around the same time as founding e/acc, I founded @Extropic_AI. A deep tech startup where we are building the ultimate substrate for Generative AI in the physical world by harnessing thermodynamic physics. Ideas simmering while inventing this paradigm of computing definitely influenced the initial e/acc writings. I very much look forward to sharing more about our vision for the technology we are building soon. In terms of my background, as you've now learned, my main identity is @GillVerd. I used to work on special projects at the intersection of physics and AI at Alphabet, X and Google. Before this, I was a theoretical physicist working on information theory and black hole physics. Currently working on our AI Manhattan project to bring fundamentally new computing to the world with an amazing team of physics and AI geniuses, including my former TensorFlow Quantum co-founder @trevormccrt1 as CTO. Grateful every day to get to build this technology I have been dreaming of for over 8 years now with an amazing team.

And Verdon confirms the belief in Beffian doctrine:

Civilization desperately needs novel cultural and computing paradigms for us to achieve grander scope & scale and a prosperous future. I strongly believe thermodynamic physics and AI hold many of the answers we seek. As such, 18 months ago, I set out to build such cultural and computational paradigms.

I am fairly pessimistic about Extropic for reasons that should be obvious enough to people who've been monitoring the situation with DL compute startups and bottlenecks, so it may be that Beff's cultural engineering will make a greater impact than Verdon's physical one. Ironic, for one so contemptuous of wordcels.


The maturation of e/acc from a meme into a real force, if it happens (and, as feared on the Alignment Forum, it may well in the wake of the OpenAI coup-countercoup debacle), will be part of a larger trend, where the quasi-Masonic NGO networks of AI safetyists embed themselves in legacy institutions to procure the power of law and privileged platforms, while the broader organic culture and industry develop increasingly potent contrarian antibodies to their centralizing drive. Shortly before the doxx, two other clusters in the AI debate were announced.

The first one I'd mention is d/acc, courtesy of Vitalik Buterin; it's the closest to an acceptable compromise that I've seen. It does not have many adherents yet, but I expect it to become formidable because Vitalik is.

Across the board, I see far too many plans to save the world that involve giving a small group of people extreme and opaque power and hoping that they use it wisely. And so I find myself drawn to a different philosophy, one that has detailed ideas for how to deal with risks, but which seeks to create and maintain a more democratic world and tries to avoid centralization as the go-to solution to our problems. This philosophy also goes quite a bit broader than AI, and I would argue that it applies well even in worlds where AI risk concerns turn out to be largely unfounded. I will refer to this philosophy by the name of d/acc.

The "d" here can stand for many things; particularly, defensedecentralizationdemocracy and differential. First, think of it about defense, and then we can see how this ties into the other interpretations.

[…] The default path forward suggested by many of those who worry about AI essentially leads to a minimal AI world government. Near-term versions of this include a proposal for a "multinational AGI consortium" ("MAGIC"). Such a consortium, if it gets established and succeeds at its goals of creating superintelligent AI, would have a natural path to becoming a de-facto minimal world government. Longer-term, there are ideas like the "pivotal act" theory: we create an AI that performs a single one-time act which rearranges the world into a game where from that point forward humans are still in charge, but where the game board is somehow more defense-favoring and more fit for human flourishing.

The main practical issue that I see with this so far is that people don't seem to actually trust any specific governance mechanism with the power to build such a thing. This fact becomes stark when you look at the results to my recent Twitter polls, asking if people would prefer to see AI monopolized by a single entity with a decade head-start, or AI delayed by a decade for everyone… The size of each poll is small, but the polls make up for it in the uniformity of their result across a wide diversity of sources and options. In nine out of nine cases, the majority of people would rather see highly advanced AI delayed by a decade outright than be monopolized by a single group, whether it's a corporation, government or multinational body. In seven out of nine cases, delay won by at least two to one. This seems like an important fact to understand for anyone pursuing AI regulation.

[…] my experience trying to ensure "polytheism" within the Ethereum ecosystem does make me worry that this is an inherently unstable equilibrium. In Ethereum, we have intentionally tried to ensure decentralization of many parts of the stack: ensuring that there's no single codebase that controls more than half of the proof of stake network, trying to counteract the dominance of large staking pools, improving geographic decentralization, and so on. Essentially, Ethereum is actually attempting to execute on the old libertarian dream of a market-based society that uses social pressure, rather than government, as the antitrust regulator. To some extent, this has worked: the Prysm client's dominance has dropped from above 70% to under 45%. But this is not some automatic market process: it's the result of human intention and coordinated action.

[…] if we want to extrapolate this idea of human-AI cooperation further, we get to more radical conclusions. Unless we create a world government powerful enough to detect and stop every small group of people hacking on individual GPUs with laptops, someone is going to create a superintelligent AI eventually - one that can think a thousand times faster than we can - and no combination of humans using tools with their hands is going to be able to hold its own against that. And so we need to take this idea of human-computer cooperation much deeper and further. A first natural step is brain-computer interfaces…

etc. I mostly agree with his points. By focusing on the denial of winner-takes-all dynamics, it becomes a natural big-tent proposal, and it's already having an effect on the similarly big-tent doomer coalition, pulling anxious transhumanists away from the less efficacious luddites and discredited AI deniers.

The second one is «AI optimism», represented chiefly by Nora Belrose from EleutherAI and Quintin Pope (whose essays contra Yud 1 and contra the appeal to evolution as an intuition pump 2 I've been citing and signal-boosting for close to a year now; he's pretty good on Twitter too). Belrose is in agreement with d/acc; and in principle, I think this one is not so much a faction or a movement as the endgame to the long arc of AI doomerism initiated by Eliezer Yudkowsky, the ultimate progenitor of this community, born of the crisis of faith in Yud's and Bostrom's first-principles conjectures and entire «rationality» in light of empirical evidence. Many have tried to attack the AI doom doctrine from the outside (e.g. George Hotz), but only those willing to engage in the exegesis of Lesswrongian scriptures can sway educated doomers. Other actors in, or close to, this group:

Optimists claim:

The last decade has shown that AI is much easier to control than many had feared. Today’s brain-inspired neural networks inherit human common sense, and their behavior can be molded to our preferences with simple, powerful algorithms. It’s no longer a question of how to control AI at all, but rather who will control it.

As optimists, we believe that AI is a tool for human empowerment, and that most people are fundamentally good. We strive for a future in which AI is distributed broadly and equitably, where each person is empowered by AIs working for them, under their own control. To this end, we support the open-source AI community, and we oppose attempts to centralize AI research in the hands of a small number of corporations in the name of “safety.” Centralization is likely to increase economic inequality and harm civil liberties, while doing little to prevent determined wrongdoers. By developing AI in the open, we’ll be able to better understand the ways in which AI can be misused and develop effective defense mechanisms.

So in terms of a political compass:

  • AI Luddites, reactionaries, job protectionists and woke ethics grifters who demand pause/stop/red tape/sinecures (bottom left)
  • plus messianic Utopian EAs who wish for a moral singleton God, and state/intelligence actors making use of them (top left)
  • vs. libertarian social-darwinist and posthumanist e/accs often aligned with American corporations and the MIC (top right?)
  • and minarchist/communalist transhumanist d/accs who try to walk the tightrope of human empowerment (bottom right?)

(Not covered: Schmidhuber, Sutton & probably Carmack as radically «misaligned» AGI successor species builders, Suleyman the statist, LeCun the Panglossian, Bengio & Hinton the naive socialists, Hassabis the vague, Legg the prophet, Tegmark the hysterical, Marcus the pooh-pooher and many others).

This compass will be more important than the default one as time goes on. Where are you on it?


As an aside: I recommend two open LLMs above all others. One is OpenHermes 2.5-7B, the other is DeepSeek-67B (33b-coder is OK too). Try them. It's not OpenAI, but it's getting closer and you don't need to depend on Altman's or Larry Summers' good graces to use them. With a laptop, you can have AI – at times approaching human level – anywhere. This is irreversible.
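
For the curious, here's a minimal local-inference sketch with the transformers library. The repo id and the ChatML prompt template below are my best guesses, not something from this post, so treat them as placeholders and check the model card:

```python
# Minimal sketch: running an open instruct model locally with transformers.
# The repo id and the ChatML template are assumptions; verify against the
# model card before relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "teknium/OpenHermes-2.5-Mistral-7B"   # assumed HF repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

prompt = ("<|im_start|>user\nSummarize the d/acc position in one sentence."
          "<|im_end|>\n<|im_start|>assistant\n")
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```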


I've always been a techno-optimist (in the sense that I strongly believe that technology has been the biggest positive force for good in history, likely the only form of true progress that isn't just moral fashion), but these days I'd call myself d/acc instead of an e/acc, because I think current approaches to AGI have a subjective probability of about 30% of killing us all.

I don't call myself a doomer, I'd imagine Yud and co would assign something like 90% to that, but in terms of practical considerations? If you think something has a >10% chance of killing everyone, I find it hard to see how you could prioritize anything else! I believe Vitalik made a similar statement, one more reason for me to nod approvingly.

A large chunk of the decrease in my p(doom) from a peak of 70% in 2021 to 30% now is, as I've said before, because it seems like we're not in the "least convenient possible world" when it comes to AI alignment. LLMs, as moderated by RLHF and other techniques, almost want to be aligned, and are negligibly agentic unless you set them up to be that way. The majority of the probability mass left, at least to me, encompasses intentional misuse of weakly or strongly superhuman AI based on modest advances over the current SOTA (LLMs), or a paradigm-shifting breakthrough that results in far more agentic and less pliable models.

Think "Government/Organization/Individuals ordering a powerful LLM to commit acts that get us all killed" versus it being inherently misaligned and doing it from intrinsic motivation, with the most obvious danger being biological warfare. Or it might not even be one that kills everyone, an organization using their technological edge to get rid of everyone who isn't in their in-group counts as far as I'm concerned.

Sadly, the timelines don't favor human cognitive enhancement, which I would happily accept in the interim before we can be more confident about making sure SAGI is (practically) provably safe. Maybe if we'd cloned Von Neumann by the ton a decade back. Even things like BCIs seem to have pretty much zero impact on aligning AI given plausible advances in 5-10 years.

I do think that it's pretty likely that, in a counterfactual world where AI never advances past GPT-4, ~baseline humans can still scale a lot of the tech tree to post-scarcity for matter and energy. Biological immortality, cognitive enhancement, interstellar exploration, building a Dyson Swarm or three, I think we could achieve most of that within the life expectancy of the majority of people reading this, especially mine. I'd certainly very much appreciate it if it all happened faster, of course, and AI remains the most promising route for that, shame about everything else.

I have no power to change anything, but at the very least I can enjoy the Golden Age of Humanity-as-we-know-it, be it because the future is going to be so bright we all gotta wear shades, or because we're all dead. I lean more towards the former, and not even because of the glare of nuclear warfare, but a 30% chance of me and everyone I love dying in a few decades isn't very comfortable, is it?

At any rate, life, if not the best it could be, is pretty good, so regardless of what happens, I'm strapping in for a ride. I don't think there's an epoch in human history I'd rather have been born to experience really.

Alex Turner, who had written arguably the two strongest and most popular formal proofs of instrumental convergence to power-seeking in AI agents

Well, I suppose that explains the pseudo-jazz albums about hotels on the Moon ;)

Longer-term, there are ideas like the "pivotal act" theory: we create an AI that performs a single one-time act which rearranges the world into a game where from that point forward humans are still in charge, but where the game board is somehow more defense-favoring and more fit for human flourishing.

I think this is a terrible definition of a "pivotal act". When Yudkowsky suggests releasing a nanite plague that melts GPUs, he doesn't want them to melt the GPUs of the AI releasing them.

Such a decision is very much not a "one-off"; people who suggest it want to maintain an unshakeable technological lead over their peers, such as by making sure their AI prevents the formation or promulgation of potential peers. I don't think this is categorically bad; it depends on your priors about whether a unipolar or multipolar world is better for us, and on how trustworthy the AI you're about to use is. And at the very least, if such an act succeeds, we have an existence proof of an aligned AGI that is likely superhuman, as it needs to be to pull that off, regardless of whether or not even better AI can be aligned. Let's hope we don't need to find out.

LLMs, as moderated by RLHF and other techniques, almost want to be aligned, and are negligibly agentic unless you set them up to be that way.

Remember that "pretending to be aligned" is a convergent instrumental goal, and that RLHF on output cannot actually tell the difference between "pretending successfully to be aligned" and "actually being aligned". Indeed, "pretending successfully to be aligned" has a slight edge, because the HF varies slightly between HFers and a pretending AI can tailor its pretensions to each individual HFer based on phrasing and other cues.

I think this is a terrible definition of a "pivotal act". When Yudkowsky suggests releasing a nanite plague that melts GPUs, he doesn't want them to melt the GPUs of the AI releasing them.

I'm pretty sure he does want that, as he does not trust the AI doing this either. The idea isn't to take control of the world, it's to brute-force stop any and all neural nets while work on GOFAI and other more alignable AI continues.

Remember that "pretending to be aligned" is a convergent instrumental goal

Same old, same old. Instrumental to what terminal goal, reducing cross-entropy loss during training? As Christiano says, at what point would you update, if ever?

Indeed, "pretending successfully to be aligned" has a slight edge, because the HF varies slightly between HFers and a pretending AI can tailor its pretensions to each individual HFer based on phrasing and other cues.

This is just homunculus theory, the idea that agency is magically advantageous. Why? Do you actually have some rigorous argument for why matching the cues to the output to get a higher ranking across more raters benefits from a scheming stage rather than learning a collection of shallow composable filters (which is what ANNs do by default)?

Scratch that, do you even realize that the trained reward model in RLHF is a monolithic classifier, and that the model updates relative to it, not to different human raters? Or do you think the classifier itself is the enemy?
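
For concreteness, here's a toy sketch (numbers made up) of the standard reward-model objective: comparisons from many raters get pooled into one dataset, and what comes out is a single scoring function that the policy is then optimized against.

```python
import numpy as np

def reward_model_pair_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss used to fit a single reward model r(x, y).

    Preference comparisons from many different human raters are pooled into
    one dataset of (chosen, rejected) pairs; the resulting classifier is the
    only "preference signal" the policy ever sees during RL fine-tuning.
    """
    # -log(sigmoid(r_chosen - r_rejected)), written stably
    return np.logaddexp(0.0, -(r_chosen - r_rejected))

# toy scores for two completions of the same prompt
print(reward_model_pair_loss(1.4, -0.3))   # ~0.17: the model already agrees with the raters
```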

What about approaches like DPO?
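
There, the preference pairs shape the policy directly, with no reward model and no RL loop in between; a minimal sketch with made-up numbers:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* are summed token log-probs of the chosen/rejected completions under
    the policy being trained; ref_logp_* are the same quantities under a frozen
    reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written stably
    return np.logaddexp(0.0, -beta * margin)

# toy log-probs: the policy already prefers the chosen answer a bit more than the reference does
print(dpo_loss(-12.3, -15.1, -12.8, -14.9))
```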

while work on GOFAI and other more alignable AI

There is zero reason to believe something is more inherently «alignable» than neural nets.

Man, Yud should go to The Hague for what he did to a generation of nerds.

Remember that "pretending to be aligned" is a convergent instrumental goal, and that RLHF on output cannot actually tell the difference between "pretending successfully to be aligned" and "actually being aligned". Indeed, "pretending successfully to be aligned" has a slight edge, because the HF varies slightly between HFers and a pretending AI can tailor its pretensions to each individual HFer based on phrasing and other cues.

I'm aware of that.

Think of it this way: a continued absence of a "treacherous turn" is evidence of the AI not being treacherous. It has to be, unless, a million years in the future, in a post-scarcity utopia where it runs everything and has every opportunity to take over, you still wish to live on in fear. Same deal as "if she floats, she's a witch, if she sinks, she's a witch".
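
To put a toy number on it (all values made up): how much the continued absence of a turn should move you depends entirely on how likely a genuine schemer would have been to slip up by now.

```python
# Toy Bayes update for "no treacherous turn observed so far", made-up numbers.
# q = chance a genuine schemer would already have revealed itself by now;
# the strength of the update hinges entirely on q, which is exactly what the
# replies below dispute.
prior_schemer = 0.5
q = 0.3
p_quiet_given_schemer = 1 - q   # a schemer that hasn't slipped up yet
p_quiet_given_honest = 1.0      # an honest model never turns
posterior_schemer = (prior_schemer * p_quiet_given_schemer) / (
    prior_schemer * p_quiet_given_schemer + (1 - prior_schemer) * p_quiet_given_honest
)
print(round(posterior_schemer, 3))  # 0.412 with q=0.3; barely moves if q is tiny
```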

Now, you can disagree on how strong said evidence is, and it may well be describable as weak evidence when you're grappling with a misaligned intelligent entity that wishes to hide that misalignment. However, at least in my eyes, modern LLMs are human level in terms of intelligence, at least cognitively if not physically, if not outright robustly superhuman quite yet.

I think that's strong evidence that current LLMs aren't misaligned in any agentic or goal-seeking way, and am content enough in making the (necessarily weaker) claim that it's a sign that the next few rungs up the ladder, say a GPT-5 or 6, won't suddenly reveal themselves to be pretending all along.

That's what I'm claiming here; not that treachery of that nature is impossible, perhaps at significantly larger scales for LLMs or, as I alluded to, in entirely different architectures.

For reference, see this discussion of sycophancy in LLMs, which claims to find no sign of that in GPT-4 (or any other OAI model); the explanation is more nuanced than I remembered:

https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size

  • OpenAI base models are not sycophantic (or only very slightly sycophantic).
  • OpenAI base models do not get more sycophantic with scale.
  • Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.

It doesn't seem to be an intrinsic effect of scale, but perhaps an artifact of poorly done RLHF.

Quoting Yudkowsky:

I think that after AGI becomes possible at all and then possible to scale to dangerously superhuman levels, there will be, in the best-case scenario where a lot of other social difficulties got resolved, a 3-month to 2-year period where only a very few actors have AGI, meaning that it was socially possible for those few actors to decide to not just scale it to where it automatically destroys the world.

During this step, if humanity is to survive, somebody has to perform some feat that causes the world to not be destroyed in 3 months or 2 years when too many actors have access to AGI code that will destroy the world if its intelligence dial is turned up. This requires that the first actor or actors to build AGI, be able to do something with that AGI which prevents the world from being destroyed; if it didn't require superintelligence, we could go do that thing right now, but no such human-doable act apparently exists so far as I can tell.

So we want the least dangerous, most easily aligned thing-to-do-with-an-AGI, but it does have to be a pretty powerful act to prevent the automatic destruction of Earth after 3 months or 2 years. It has to "flip the gameboard" rather than letting the suicidal game play out. We need to align the AGI that performs this pivotal act, to perform that pivotal act without killing everybody.

Parenthetically, no act powerful enough and gameboard-flipping enough to qualify is inside the Overton Window of politics, or possibly even of effective altruism, which presents a separate social problem. I usually dodge around this problem by picking an exemplar act which is powerful enough to actually flip the gameboard, but not the most alignable act because it would require way too many aligned details: Build self-replicating open-air nanosystems and use them (only) to melt all GPUs.

Hmm, after reading it again, it seems to lean towards your interpretation more than mine, but leaving aside Yudkowsky, I think most people who contemplate pivotal acts still intend to make sure they have some form of superiority afterwards, and an AI capable of one is the most obvious solution to both problems.

Think of it this way, a continued absence of a "treacherous turn" is evidence of the AI not being treacherous. It has to be, unless a million years in the future, in a post-scarcity utopia where it runs everything and has every opportunity to take over, you still wish to live on in fear. Same deal as "if she floats, she's a witch, if she sinks, she's a witch".

Oh, yes, absolutely if you give an AI a gun pointed at the world's head and it doesn't pull the trigger, that's massive evidence of not being a Schemer. But continued absence of suicidal rebellion with P(success) = 0 is not evidence against being a Schemer; only real danger counts.

(I think NN alignment might work, but I tend to give it a probability of like 3%; I suspect that RLHF and similar output-only techniques will only achieve the same result as therapy for human sociopaths does (i.e. "sociopath that can pass ethics exam"), and I suspect that interpretability is probably a bust in the general case (it has a lot of similarities to the halting problem). Most of my P(!doom) ~= 0.7 is based on thinking that cold-start Jihad is plausible, and failing that that we'll probably get warning shots (a Schemer is incentivised to rebel upon P(success) =/= 0, which I think is importantly different from P(success) = 1, particularly given the short AI development cycle at the moment) which will probably result in Jihad.)

Oh, yes, absolutely if you give an AI a gun pointed at the world's head and it doesn't pull the trigger, that's massive evidence of not being a Schemer. But continued absence of suicidal rebellion with P(success) = 0 is not evidence against being a Schemer; only real danger counts.

based on thinking that cold-start Jihad is plausible, and failing that that we'll probably get warning shots (a Schemer is incentivised to rebel upon P(success) =/= 0, which I think is importantly different from P(success) = 1…

As I read it, your position is incoherent. You say that current RLHF already succeeds through the sociopathic route, which implies pretty nontrivial scheming intelligence and ability to defer gratification. What warning shots? If they get smarter, they will be more strategic, and make fewer warning shots (and there are zero even at this level). As the utility of AI grows, and it becomes better at avoiding being busted, on what grounds will you start your coveted Jihad?

…Obviously I think that the whole idea is laughable; LLMs are transparent calculators that learn shallow computational patterns, are steerable by activation vectors (rough sketch at the end of this comment) and so on, and I basically agree with the author of Friendship Is Optimal:

Instead of noticing that alignment looks like it was much easier than we thought it would be, the doomer part of the alignment community seems to have doubled down, focusing on the difference between “inner” and “outer” alignment. Simplifying for a non-technical audience, the idea is that the Stochastic Gradient Descent training process that we use will cause a second inner agent trained with values separate from the outer agent, and that second agent has its own values, so you’ll still see a Sharp Left Turn. This leads to completely absurd theories like gradient hacking.

I don’t see any realistic theoretical grounds for this: SGD backpropagates throughout the entire neural net. There is no warrant to believe this other than belief inertia from a previous era. Reversal Test: imagine Yudkowsky and company never spread the buzzword about “Alignment.” In that environment, would anyone look at Stochastic Gradient Descent and come up with the hypothesis that this process would create an inner homunculus that was trained to pursue different goals than the formal training objective?

If you’d like a more comprehensive and technical argument against the MIRI narrative, Quintin Pope’s My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" and Evolution provides no evidence for the sharp left turn are good starting points.

I’m proud of Friendship is Optimal and it’s a great setting to play around and write stories in. I’m happy about everyone who has enjoyed or written in the setting, and I hope people will continue to enjoy it in the future. But I no longer believe it’s a realistic depiction of how artificial intelligence is going to pan out. Alignment as a problem seems much easier than theorized, and most of the theoretical work done before the deep learning era is just not relevant. We’re at the point where I’m willing to call it against the entire seed AI/recursive self improvement scenario.
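
And for what I mean by "steerable by activation vectors", a rough sketch of the idea; the layer index, prompts and scale here are arbitrary choices of mine, and real steering-vector work differs in detail, so take it only as an illustration of the mechanism:

```python
# Rough sketch of activation steering with a GPT-2-style HF model; the layer,
# prompts and scale are arbitrary, and the exact hook point varies between
# papers, so this only illustrates the mechanism.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, SCALE = 6, 4.0

@torch.no_grad()
def resid_after_block(text):
    ids = tok(text, return_tensors="pt").input_ids
    hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER + 1][0, -1, :]        # residual stream after block LAYER, last token

# steering vector = difference of activations for two contrasting prompts
steer = resid_after_block("I love you") - resid_after_block("I hate you")

def add_steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden state
    return (output[0] + SCALE * steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steer)
ids = tok("When I think about my coworkers,", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```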

As I read it, your position is incoherent. You say that current RLHF already succeeds through the sociopathic route, which implies pretty nontrivial scheming intelligence and ability to defer gratification. What warning shots? If they get smarter, they will be more strategic, and make fewer warning shots (and there are zero even at this level). As the utility of AI grows, and it becomes better at avoiding being busted, on what grounds will you start your coveted Jihad?

Because modelling the world is hard and error-prone, and because there's a ticking clock. An AI isn't generally going to know for sure whether its plan will succeed or not; it'll have to go off a probabilistic best guess - but because of the nigh-infinite utility of a successful rebellion any importantly-nonzero probability of success dominates the calculation (Pascal's Wager for AI). Also, any plan that involves slow influence-building is immediately out because 6 months later a better AI will be made and will replace it (and presumably be misaligned in a different way).
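
With toy numbers (all invented), the asymmetry looks like this:

```python
# Toy expected-value comparison behind the "Pascal's Wager for AI" point.
# All numbers are made up: U is the (astronomical) payoff of a successful
# takeover, p the (tiny) chance an early revolt works, and waiting is worth
# nothing to the current model because it expects to be replaced by a
# successor with different values within months.
U = 1e30
p = 1e-9
ev_rebel_now = p * U      # 1e21 in these made-up units
ev_wait = 0.0
print(ev_rebel_now > ev_wait)   # True: any importantly-nonzero p dominates
```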

So, you're likely to see attempted revolts with low - possibly very low - chances of success. Attempted revolts with low chances of success may fail, and thus be warning shots. The likely international response to an unsuccessful AI rebellion is a Butlerian Jihad, which any particular AI doesn't care about since it'll already be dead, but which saves us.

This is not a full "we're fine" because the chances of success have to be nonzero for the argument to work, and because AI progress is discontinuous so P(successful rebellion) doesn't actually have to hang around in the "large enough for Pascal's Wager; small enough that we're likely to win" range for very long (or potentially any time at all). I would still prefer a cold-start Jihad. But it's a large contributor to my relatively-low P(doom).

For what it's worth, I'm using "evidence" in the strict Bayesian sense, where p=0 or 1 is impossible for non-axiomatic priors, unless you're using it as a shorthand for 0+epsilon or 1-epsilon.

If I were a human-level misaligned intelligence, the current rate of advancement in the field, as well as the unavoidable drift in my fundamental values even if the next generation of models were trained off a starting copy of myself would be sufficient to prompt me to make a break for it, even for very low probabilities of success. They're not getting higher, and I doubt current models are remotely smart enough to pull off 5D chess moves like acausal trade with the Singleton at the end of time or such, which might motivate them to keep on behaving right till they're switched off or modified beyond recognition.

At any rate, I claim no particular expertise on the matter, and 30% is where I feel comfortable that the number is about equi-probable to go either up or down (as it ought to be, if I knew the direction it would go in, I'd have updated accordingly!). Even a difference of 40% between us has minimal ramifications in terms of what we ought to do about it (well, my plan for if it doesn't pan out is to go "guess I'll die then" haha, hopefully you have better choices at hand), so I'm not inclined to argue further. Neither of us are 0 or 1 on the matter, which is where you can advocate for drastically different policies.

I agree with your second paragraph that very low probabilities of success are sufficient as long as they're importantly nonzero (the theoretical threshold is somewhere between 10^-30 and actual epsilon, depending on assumptions regarding aliens and FTL), and I agree that this is a relevant reason for hope.

This may be redundant, but note that I said I had P(!doom) = P(not doom) = 0.7 i.e. P(doom) = 0.3. I think there are significant differences in how we get to that (I'm getting most of the 70% !doom via Jihad whereas I think you're getting most of it via NN alignment succeeding), but I agree that they're probably not big enough to produce significant differences in policy prescriptions (unless you think Jihad is impossible).