Shirayuki2
new account of Shirayuki, lost old password
User ID: 4180
I think it just depends on how adversarial the judging ends up being rather than saying anything about capabilities.
If the judges have to stay within guardrails then 3.5 could probably win the longbet, but if they're allowed to exploit jailbreaks or known LLM failure cases, then nothing short of ASI is going to pass the test.
This is much more maximalist than even the AI 2027 crowd or the actual frontier labs themselves but I respect providing specific events and an end date on the prediction.
My own expectation is that none of this happens by the end of 2027 except a tranche of models that are notably better at RLVR'd tasks and the Copilot button (shorthand for AI getting more integrated into white-collar workflows). Let's see what happens in 18 months.
their R&D spend is worth it, even if crazy
Well, burning other people's money to try and build a moat is obviously worth it for the frontier labs. It's yet to be seen whether that spending will be worth it in the sense of paying off investors or building the labs a durable lead, or whether the models will end up commoditized and value accruing elsewhere in the stack.
While inference having high margins is true, there are two things to keep in mind here:
Amodei has never said that models are actually profitable on a per-model basis, only that they hypothetically could be. While this might be true, there are trillions of dollars riding on the insinuation that it's true, and personally I wouldn't trust any rumors about financials from a private company that can massage them however it pleases.
Spending the GDP of a small country on R&D on the promise of getting a commanding lead is why OpenAI and Anthropic have trillion-dollar valuations to begin with. There's no such thing as a frontier lab that can cut its exorbitant capex and coast on the margins from inference, as that's a one-way road to getting cut-throat commoditized.
I feel like a lot of people in these replies are talking past each other.
My 2 cents:
Are LLM's useful tooling for finding vulnerabilities for security researchers?
Yes, I think this is undeniable at this point; LLM's are exceptional at uncovering software flaws, bugs, and vulnerabilities, and are going to significantly change how cybersecurity is practiced, as can already be seen by how vulnerability disclosures have recently quantitatively spiked like crazy.
Is Mythos better than the other available models at finding and exploiting vulnerabilities?
Yes, Mythos really being a stronger model for cybersecurity applications is almost certainly the case: this XBOW report is a good read on its capabilities.
Is Mythos a super-hacker that's going to break cybersecurity for good?
No, this seems unlikely and driven by good marketing from Anthropic and online hype. Mythos isn't making the Move 37 for cybersecurity or discovering vulnerabilities beyond human comprehension, it's just an iterative improvement over the current tooling combined with a lot more compute and attention suddenly being used to uncover security vulnerabilities. I suspect that the same amount of compute, security researcher attention and buy-in for Project Glasswing applied to the previous generation of frontier models would have uncovered the majority of security issues that Mythos did.
It's also worth noting that there are apparently 11 Curl CVE's in the current release cycle, where the new CVE's did not use Mythos, which seems to disprove the idea that Mythos was not all that effective on Curl because it was uniquely hardened or secure.
Should LLM's being good at vulnerability discovery and theorem proving be an update on LLM's eventually reaching AGI?
YMMV, but to me, the recent headline mathematics and cybersecurity achievements haven't really changed my view that AGI emerging from LLM's seems unlikely. From an outsider's perspective, most of the recent gains in model performance look to have come from RLVR on coding, math and cyber. While very effective at improving performance on those tasks, it seems that RL has largely failed to further generalize intelligence beyond the specific RL'd areas and, if you look at SimpleBench or the AI RP community, seems to have regressed performance in other areas of intelligence.
I think it's telling that all of the achievements of LLM's being held up over the past ~18 months (METR eval, CCC compiler, theorem proving, cybersecurity), while extremely powerful and which make me bullish on the utility of LLM's, are all tasks limited by requiring an external oracle for verification, and where there's no penalty for failing during intermediate steps. I personally think it's quite likely that LLM's eventually become superhuman at proving theorems and exploiting vulnerabilities given sufficient compute, but still cannot manage a restaurant, write an interesting book or autonomously maintain a software project.
What are some unusual items you've got on your bucket list?
Japan
Losing the war and being occupied by the allies was probably the best thing that could have happened to Japan's economy, as opposed to dragging them down.
The American occupation broke the Japanese military-industrial complex and forced them into exports and free markets, funneled them large sums of capital via Korean war procurement, American market access and technology transfers, and extended Japan the American security umbrella. The Japanese economic miracle wouldn't have been possible without losing the war and getting dragged into a modern economy by force.
India
While they never went full communist like China, there's a good argument that India could be decades ahead of where they are now without Nehru and Gandhi fucking around with poorly implemented socialism. The over-emphasis on heavy industry, licensing and central planning, failing to implement any real land reforms, and essentially being closed off to trade prior to 1991 were practically the exact opposite of the conditions that made the four Asian tigers so successful.
While I agree that India has different issues to East Asia, having a 50 year disadvantage on Japan and a 10 year disadvantage on China in liberalization did them no favors either.
Just wanted to say thanks for posting this.
I've been in a bit of a depressive spiral for the last few days and this post really helped.
My understanding of Calvinism vs middle knowledge doctrines was that the main difference is that Calvinism treats free will as compatibilist while Molinism treats free will as libertarian.
Under either doctrine God must still have foreknowledge of children being born with genetic defects that kill them a few days out of the womb and foreknowledge of those destined to die during natural disasters; otherwise how could He be omniscient?
Under theism, you can consistently make choices that go against your self-interest and sacrifice both self-interest and prudence for the sake of morality
Sure, but the point I'm trying to make here is that moral epistemology becomes incoherent if you abrogate omnibenevolence as understood by human moral intuition. If the meaningless suffering that pervades the world is all part of His plan, then there is no reason to morally privilege doing anything over, say, kicking as many puppies as possible. For all I know that kid drowning to death over there is all just part of the plan, and He wants me to kick the shit out of more puppies instead of saving the kid.
Maybe skeptical theists can bite that bullet, but it's not a bullet that I feel capable of accepting myself.
While I think Buddhism in general has useful lessons in reducing suffering even for an atheist, I don't understand at all how western, secular Buddhism can be logically sound without being a suicide cult.
If you accept dukkha, samudaya and nirodha, but you don't believe in the Right View, in the tenets of karma or rebirth, it seems immediately clear that instead of the Eightfold Path, the much more efficient method to achieving nirodha is the Singlefold Path of a bullet into your skull.
I have to admit I find theodicies appealing to skeptical theism abhorrent, and personally I've never managed to get over the problem of evil.
Why should man do anything good, why should man do anything at all, if human moral intuition is meaningless and the most profane acts can be justified under an unfalsifiable appeal to the greater good?
Rats and ants may not understand the motions of men, but neither should rats and ants have any reason to worship men except for the last argument of kings, the threat of pure brute force and violence.
If God's vision is that children should die screaming in unimaginable pain and that the Ichneumonidae should eat caterpillars from the inside out, then frankly in the footsteps of Ivan Karamazov, I don't want anything offered to me by such a god.
US-China great power competition looks a lot less like god-fearing capitalists vs godless communists and a lot more like godless capitalists with Anglo-Hispanic characteristics vs godless capitalists with Chinese characteristics.
America is not god-fearing in any meaningful sense compared to the America of fifty years ago, and China is not communist in any meaningful sense compared to the China of fifty years ago.
It isn't really arguable that modern society is failing to provide the former to a far larger portion of the population than it did in the past
I agree, but this doesn't really have anything to do with inequality. Most of South America and Africa have vastly higher Gini indexes and much more blatant, corrupt wealth inequality than any developed country yet retain much higher TFR's, while the social democratic Nordic countries living under the law of Jante have amongst the lowest TFR's worldwide. Being rich, free & educated, having the optionality to do anything in life at the expense of having children, social atomization and access to smartphones seem like much more likely causal factors in plummeting rates of family formation.
At the same time, a lot of the visible concentrations of wealth in modern society are nakedly and undeniably antisocial.
I agree that a lot of the aesthetics of the modern wealthy are off-putting, but as I mentioned earlier, "powerful people act in upsetting ways" is not a solvable problem as long as the fundamental ability to concentrate power through technology exists at all. Nobody remembers the man that Luigi killed and nothing changed whatsoever. If it's not the current crop of people seizing the reins of power to enrich themselves, it'll simply be someone else stepping up in their stead.
Furthermore, inequality is a proxy for an uneven distribution of power.
I suppose my view is that wealth isn't power, power is power. Any coalition capable of unseating the billionaire class would by definition hold more power than the current wealthy. I'm not sure it really makes much difference whether it's a Langley spook, Hague bureaucrat, tech billionaire, or CCP party member that holds the reins to ultimate power and status.
As long as technology exists you'll get centralization of power, but as long as centralized technological power doesn't exist you get Haiti or South Sudan, the Hobbesian life in a state of nature.
relative prosperity gates access to some goods that are essential for happiness, like housing and a mate
Extension du domaine de la lutte. The progressive ideal of re-distributing wealth is at least logically possible, but it's fundamentally impossible to re-distribute everyone a big house in the best locations and a high status mate. If being better than others is essential to happiness, then perhaps that is humanity's punishment for eating the forbidden fruit.
Sorry to hear that.
Unfortunately there's a lot of this going on in tech and white collar work as a whole, really, where the LLM's really can't do the work, but some executive assumes they can and so people get chopped in anticipation, or where the company is struggling due to macro-economics or just plain bad management and people get chopped using AI as an excuse.
Best of luck with your other work or with starting the new career.
I was planning to write up a larger top-level effort-post on this topic, but since you've already made the top-level I'll post the notes I was drafting.
For the last few days, I've been reading about the Sam Altman attack drama and the warehouse fire attack that happened recently, and I've been finding the reactions pretty scary. General sentiment on HN is something along the lines of "Altman deserved it" and even among my general leftish acquaintance bubble the vibe is along the lines of "they shouldn't have missed" or "we need more of this fuck the rich" which doesn't really bode well for the stability of society.
Whether or not you believe the more bombastic claims of AI CEO's, I do think it's clear that at minimum AI is going to exacerbate the trend of technology centralizing power, wealth and status, even as absolute material standards have continued to improve beyond the wildest dreams of 99.9% of humanity in the past. For better or for worse, human happiness seems to be tied only lightly to absolute material standards and heavily tied to relative status, position, and feelings of fairness, and the internet and social media are super-stimuli for the human sense of status calibrated towards the Dunbar number.
Ruling out FOOM levels of societal disruption, I can think of a few ways that this plays out.
Left-wing communist populist marxist social democratic total victory: public outcry reaches all-time highs, perhaps with some peasant revolts sprinkled in, and the AOC/Mamdani coalition gets voted in to dismantle the AI labs, big tech and the icky billionaires. Leaving aside the fact that this would annihilate the economy and living standards by proxy, I'm not really convinced that with mass internet and social media there's any Gini index or amount of redistribution that would leave the status-anxious public satisfied. First they came for the billionaires and then they came for the homeowners... Certainly comparable democratic countries with half of the Gini index of America are still constantly flooded with rhetoric about eating the rich.
Right-wing AI strongman technofeudal democratic backsliding: political violence becomes normalised as a part of day to day life and as a response, perhaps after a significant assassination or riot, a strongman or group of technocrats use the violence as an excuse to seize absolute power, abetted by AI in part or in full. The lumpenproles are kept under control via mass surveillance, drones and guns or killed off entirely. The worst ending, but one that seems depressingly realistic looking at the history of inequality and failed revolutions.
Nothing ever happens: whether mass unemployment happens or not, most people end up with sinecures or welfare to keep them relatively pacified. Social media and concentrating wealth inequality continue to make people miserable even as absolute material conditions begin to reach sci-fi levels, and competition for zero-sum goods like housing in desirable areas and prestigious educations and sinecures becomes even more red in tooth and claw in the vein of the East Asian countries. Political violence gets somewhat more normalised, perhaps to Latin American or 20th century standards, but it's limited to isolated incidents.
Generally I consider myself libertarian and think that billionaires are good, actually, but I do think that inequality and society's response to inequality are likely to be among the defining questions of the 21st century. While Sam Altman is the most visible face of AI to normies, pure game theory dictates that technological progress will continue with or without the consent of any individual person, company or nation-state; if the capability exists, someone (or something...) is going to be the one that holds those reins to wealth, status and power, and as long as those reins are held the holder will inevitably be the target of the green-eyed masses. I don't think we yet have the social technology to deal with this and it's not clear that we ever will; I've seriously been wondering lately whether this might be one way that the Fermi Paradox manifests.
I'd say I'm both simultaneously. I think it's unlikely that scaling LLM's gets to AGI, so I'm a skeptic in that sense, but it is significantly more progress in AI than I ever expected to see in the 2020's.
With that in mind it does seem likely to me that AGI is achieved in my lifetime, and I think if it does happen then humanity is doomed for all the old Bostrom/Yudkowsky reasons.... don't see what I could do about it though, so realistically it doesn't really change my life very much.
And certain professions like SEO slop writer, translator, and others are definitely disrupted forever regardless.
At least in the case of translators, I think you'd be surprised. I happen to be acquainted with a good number of professional translators and almost to a man all of them are still booked out in terms of work and make solid middle class incomes.
My understanding is that the "ChatGPT" moment for translation was around a decade ago when neural machine translation was first getting good. Already at this point, for translation tasks that didn't require professional-grade reliability or well-written prose, Google Translate or DeepL were basically already good enough; translation for things like manuals or brochures was commoditized well before transformers.
Of course LLM's write much better than DeepL, but in practice the set of translation tasks that can't be delegated to Google Translate or DeepL, but can be handled autonomously by an LLM, is actually quite small.
High-reliability translation tasks like legal, medical or diplomatic still require a human in the loop, and LLM's are still subpar at translation tasks that require a high level of interpretation, as in the case of literary translation. At a high level, a good literary translation can be thought of as a re-writing of the original work, and as of yet LLM's are still quite poor writers without significant human intervention.
Operating system and browser zero-days go for millions of dollars.
If Mythos can spit these out for a million dollars a run it's still extremely scary.
A few thoughts:
I'm sure the model will be better than Opus, but the benchmarks look quite clearly overfitted to me. SWE-bench-verified going up to 94% is in particular a clear indication that something suspicious is going on here. It's been known that that benchmark has been contaminated for some time.
Cybersecurity seems like the natural extension of the RL scaling paradigm. I would expect anything you can easily gradient descent with a well-known reward function to continue to see massive improvements over the next year, e.g. theorem proving, coding [in the pass tests for a given spec sense] and vulnerability exploits. It doesn't yet seem clear that this will scale to tasks that are less amenable to RL scaling.
I'm not sure why you think FIRE money, or really money less than "literal oligarch" tier means you're any more or less cooked if AGI really does come to pass. FIRE in the first place relies on the world looking much the same as the last 80 years of Pax Americana, which seems increasingly unlikely at this point. At the end of the day you own only what you can defend, and it seems unlikely that you would be able to defend anything against sufficiently capable AI.
It seems to me that when people say things along the lines of "LLM's do not have intelligence" their definition of intelligence is something like "everything a human can do", and thus failing at something that can be done by a human proves a lack of intelligence, but in fact human intelligence is very jagged as well!
Should a chimp consider a human unintelligent because of our woefully inferior working memory?
Should a fly consider a human unintelligent because of our woefully inferior visual processing speed?
Should a squirrel consider a human unintelligent because of our woefully inferior spatial memory?
Sure, LLM's fail very basic things that can be done by humans, but humans also fail very basic things that can be done by LLM's; no human alive can write about the same breadth of abstract, novel topics in the same number of languages as even a very weak LLM, or write code as quickly as an LLM.
I fail to see how an LLM isn't intelligent in a way orthogonal to humans, in the same ways that animals are intelligent orthogonally to humans.
While I do align with you in that I consider the current models very powerful and use them plenty myself, and that using some Sonnet + Cline workflow while claiming that AI is incapable is misleading, I do find this sort of crypto-style FOMO inducing rhetoric counter-productive and annoying.
If you believe that the models will usher in the end of history, that they really do end up as AGI, ASI, ushering in the singularity then no amount of using 2026 agents at work will do anything to save you or change the outcome.
On the other hand, in worlds where the models do plateau at some point and end up being commoditized enterprise tooling, nobody is doomed because they didn't use agents correctly in 2026; even boosters have very little consensus on what actually works right now. There will be time to adopt the tooling as capabilities are better understood, the UX will get better, and people will develop best practices and discard what doesn't work or what is no longer necessary; who's still using LangChain or fine-tuning LoRA's on hands in 2026?
Find me a real life story where an attractive woman with the option to pick between a handsome, reliable, but only moderately wealthy Blue Collar worker, and a high status millionaire minor celeb, and intentionally settled for the former
Lana Del Rey.
Realistically, I think the only solutions that work at scale are technological advancements that obsolete the need to exploit the commons.
There's literally never been an instance where people "just" decide to collectively stop defecting, rather the payoff matrix gets changed by technological advances and it becomes irrational to defect.
Limiting CFC's went so well in large part because it ended up being pretty easy to transition from CFC's to HFC's, and peak fossil fuel is approaching not because of the environmentalist or degrowth lobbies but because the costs of renewables are falling precipitously. On the other hand, trying to prevent antibiotic resistance has pretty much gone nowhere because antibiotics are so hard to replace - I have much more faith that we'll get better ways to kill bacteria before any progress is made trying to coordinate usage at a global scale.