DaseindustriesLtd

late version of a small language model

75 followers   follows 27 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.

User ID: 745

I do mostly mean LandSpace with Zhuque-2/3, and Space Epoch's Yuanxingzhe-1. Yes, I assume that these designs will be almost fully preserved in the production versions. They are better than Falcon 9 in that F9 is a pretty old design, and they're copying Starship as well: methalox, steel body, more robust build (F9's diameter was limited by a stupid American railroad/highway standard). This has the potential for rapid reusability and mass production. And you don't need to scale to a Starship if you can scale to dozens of vehicles instead. I've heard that LandSpace may get facilities currently involved in metalworking for military aviation.

Long March 9,

I am completely jaded about the Long March program, and it isn't factoring into my estimates. Robin Li was wise to insist on liberalizing the space market to enable those private efforts; they will determine China's ceiling.

I don't see much military use either, all that data will necessarily be related to Earth and they have a decent communication network as is. It might be an initial experiment for actual off-world datacenters, and also for processing signals collected by satellites themselves.

I think megalomaniacal projects are inherently collectivist, a National Pride thing. You can do that when you have some particular mixes of populism and optimistic technocracy, perhaps; or when you're an authoritarian quasi-fascist (by modern standards) state that doesn't feel the need to pander to felt mundane needs of the electorate and is able to sell random infrastructure as a cause for celebration. Britain these days sounds more like it might do a mega-housing project for immigrants, or a renovation of state surveillance grid. That can be sold as visionary, too.

So speaking of China, yeah they've got that in droves. What @roystgnr said about rocketry (I am more optimistic: their currently tested designs are innately better than Falcon 9 and may allow rapid scaling beyond Starships, though this might take 5+ years). They have started to assemble a distributed orbital supercomputer (again, bottlenecked by lift capacity). There's preliminary research into using lunar lava tubes for habitats, with the goal of eventual settlement of the Moon once they have the means to deliver nontrivial mass. What @RandomRanger said about the big dam; for datacenters, I like that they have a project for a national «public compute» grid to basically commoditize GPU cycles like electricity and tap water. They have this Great Green Wall project, planting a Germany-sized forest to arrest the spread of the Gobi desert; they've done another one in Xinjiang already. Mostly it's trivial things at vast scale – like installing thousands of high-current EV chargers, solar everywhere, etc. There's a lot going on.

I think Britain would be very much improved by something mundane like that instead of flashy, awe-inspiring megaprojects. It impressed me today to find that this July, China increased residential power consumption by 18% versus July of the previous year. «Between 2019 and 2025, residential power consumption in the month of July rose by 138%». I can't readily find the equivalent stats for Britain, but its energy use per capita has declined by 14% over the same period; incidentally, China overtook the UK in per capita total energy use around 2019-2020 (you can click your way to an apples-to-apples comparison). The decline in energy use is a very clear sign of British involution, and it wouldn't take that much, logistically speaking, to reverse – Brits are still rich enough, and small enough, to procure gas (Trump rejoices) and maybe some Rolls-Royce reactors, reduce costs and raise quality of life. AC in the summer and ample heating in the winter would do wonders to make the island less dreadful.

When have the Democrats nationalized a private company?

Consider also that this is simply retarded. It's not Trump or Republicans who will own $INTC, it's the United States Government, and so in 3.5 years it'll likely be handed to "Democrats".

Well, State-Owned Enterprises are a feature of one notorious, nominally Communist state that the US is dedicated to beating, and this does look like a market-flavored convergent evolution in this direction, but no, I don't think it's theoretically leftist. It is of course statist and industrial-policy-pilled. Probably prudent; will allow the state to strongarm Intel into restructuring by TSMC executives, which seems to be the plan to save the beleaguered corporation.

Are there risks of corruption arising in the Trump administration

Oh yes.

This explains so much. When I said "We've had the same issue with Hlynka", I should have focused on this thought instead of getting triggered by the usual Hlynka rhetoric. In a sense, it's impressive how he did basically nothing to obfuscate his identity (exactly the same cocksure loquacity glossing over substantial flaws) and could rely on good faith alone.

Ahahaha, this explains so much. I was worried we've got another LLM skeptic with the exact same mix of bad takes.

This is a funny post but

OK, he won a fields medal. Neat. Someone wins one every year.

is literally wrong. «The Fields Medal is a prize awarded to two, three, or four mathematicians under 40 years of age at the International Congress of the International Mathematical Union (IMU), a meeting that takes place every four years». So at most one person wins it every year on average. This level of ignorance of the domain suggests you can't really have valuable intuitions about his merit.

There was an automatic suspension for «quotation marks» on /r/TheMotte already, near the end of its life cycle. But manual permaban on /r/slatestarcodex preceded that.

Nobody is firing professors yet. And no, they'll go to industry, not China. Might actually help with productivity.

but even if they remain aligned it's risky to outsource your brainpower and key industries, TSMC being the most obvious example.

At the end of the day this is all a massive, embarrassing bluff, a shit test. A bunch of true believer wokesters in the humanities, with lukewarm STEM intellectuals in tow, are pretending to be the irreplaceable brain of the United States, basically holding the nation hostage. Well, as Lenin said, «intelligentsia is not the brain of the nation, it's its shit», and for all the evils of the Soviet Union it did go to space, and failed through its retarded economic theory (endorsed by many among this very American intelligentsia, surprisingly), not Lenin's anti-meritocratic views.

This movement has, through manipulating procedural outcomes, appropriated funds for (garbage) research that gave their mediocre allies jobs and their commissars more institutional power, delegitimized (potentially very useful) research they didn't like, canceled White and "White-adjacent" academics they didn't like, created a hostile atmosphere and demoralized who knows how many people whose views or ethnicity they didn't like, and now they are supposed to have infinite immunity for their exploitation of the norms of academic freedom and selective enforcement of regulations, because they might throw a hissy fit. And they aren't even delivering! US universities have been rapidly losing their dominance for over a decade! Of top 10 academic institutions, 8 are Chinese already! (Here's a more rigorous, in my view, ranking from CWTS Leiden).

Come to think of it – as a distant echo of these folks' institutional dominance, even I've been permabanned from /r/slatestarcodex of all places, because I've been too discourteous commenting on Kevin Bird's successful cancellation of the "eugenicist" Stephen Hsu (Trace was there too, hah; gave me a stern talking to, shortly before the ban). Now Stephen Hsu is doomposting 24/7 that the US will get brutally folded by China on science, industry and technology. At worst, you might accelerate this by a few months.

It is known I don't like Trump. I don't respect Trump and Trumpism. But his enemies are also undeserving of respect, they are institutionalized terrorists (and many trace their political lineage to literal terrorists), and I can see where Americans are coming from when they say "no negotiation with terrorists". And even then, this is still a kind of negotiation. It's just the first time this academic cabal is facing anything more than a toothless reprimand. Let's see if they change their response in the face of this novel stimulus.

If anything, it is disappointing to me that this pendulum swing is not actually motivated by interest in truth or even by some self-respect among White Americans, it's a power grab by Trump's clique plus panic of Zionists like Bill Ackman who used to support and fund those very institutions with all their excesses and screeds about white supremacy – before they, like the proverbial golem, turned on Israel in the wake of 10/7. But if two wrongs don't make a right, the second wrong doesn't make the original one right either. I have no sympathy for the political culture of American academia, and I endorse calling their bluff.

And what would they do? Move to China, lol? They're too self-interested for that, and China censors even more things they'd be inclined to make noise about. Move to allied nations, maybe Australia in Tao's case? It's not such a strategic loss given their political alignment with the US. Just hate conservatives? Don't they already? If you're going to be hated, it's common sense that there's an advantage in also being feared and taken seriously. For now, they're not taking Trump and his allies seriously. A DEI enforcer on campus is a greater and more viscerally formidable authority. It will take certain costly signals to change that.

I think it's legitimate to treat them with disdain and disregard. Americans can afford it, and people who opportunistically accepted braindead woke narratives don't deserve much better treatment. The sanctity of folks like Tao is a strange notion. They themselves believe in equity more than in meritocracy.

One of the weird quirks of LLMs is that the more you increase the breadth of thier "knowledge"/training data the less competent they seem to become at specific tasks for a given amount of compute.

This is just pure denial of reality. Modern models for which we have an idea of their data are better at everything than models from 2 years ago. Qwen3-30B-A3B-Instruct-2507 (yes, a mouthful) is trained on like 25x as much data as llama-2-70B-instruct (36 trillion tokens vs 2, with a more efficient tokenizer and God knows how many RL samples, and you can't get 36 trillion tokens without scouring the furthest reaches of the web). What, specifically, is it worse at? Even if we consider inference efficiency (it's straightforwardly ≈70/3.3 times cheaper per output token), can you name a single use case on which it would do worse? Maybe "pretending to be llama 2".
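A back-of-the-envelope check of those ratios (my own arithmetic, under the simplifying assumption that decode cost scales with active parameter count, which glosses over attention and memory-bandwidth effects):

```python
# Back-of-envelope check of the ratios above. Assumption (mine): per-output-token
# decode cost scales roughly with *active* parameter count.
llama2_active_b, qwen3_active_b = 70.0, 3.3       # dense 70B vs ~3.3B active (MoE)
print(f"≈{llama2_active_b / qwen3_active_b:.0f}x cheaper per output token")   # ≈21x

# Raw pretraining token counts: 36T vs 2T, i.e. 18x; the ~25x in the text additionally
# credits the more efficient tokenizer and the RL data on top of pretraining.
print(f"≈{36 / 2:.0f}x more pretraining tokens")   # 18x
```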

With object-level arguments like these, what need is there to discuss psychology?

There's an argument in favor of this bulverism: a reasonable suspicion of motivated reasoning does count as a Bayesian prior to also suspect the validity of that reasoning's conclusions. And indeed many AI maximalists will unashamedly admit their investment in AI being A Big Deal. For the utopians, it's a get-out-of-drudgery card, a ticket to the world of Science Fiction wonders and possibly immortality (within limits imposed by biology, technology and physics, which aren't clear on the lower end). For the doomers, cynically, it's a validation of their life's great quest and claim to fame, and charitably – even if they believed that AI might turn out to be a dud, they'd think it imprudent to diminish the awareness of the possible consequences. The biases of people also invested materially are obvious enough, though it must be said that many beneficiaries of the AGI hype train are implicitly or explicitly skeptical of even «moderate» maximalist predictions (eg Jensen Huang, the guy who's personally gained THE MOST from it, says he'd study physics to help with robotics if he were a student today – probably not something a «full cognitive labor automation within 10 years» guy would argue).

But herein also lies an argument against bulverism. For both genres of AI maximalist will readily admit their biases. I, for one, will say that the promise of AI makes the future more exciting for me, and screw you, yes I want better medicine and life extension, not just for myself, I have aging and dying relatives, for fuck's sake, and AI seems a much more compelling cope than Jesus. Whereas AI pooh-poohers, in their vast majority, will not admit their biases, will not own up to their emotional reasons to nitpick and seek out causes for skepticism, even to entertain a hypothetical. As an example, see me trying to elicit an answer, in good faith, and getting only an evasive shrug in response. This is a pattern. They will evade, or sneer, or clamp down, or tout some credentials, or insist on going back to the object level (of their nitpicks and confused technical takedowns). In other words, they will refuse a debate on equal grounds, act irrationally. Which implies they are unaware of having a bias, and therefore their reasoning is more suspect.

LLMs as practiced are incredibly flawed, a rushed corporate hack job, a bag of embarrassing tricks, it's a miracle that they work as well as they do. We've got nothing that scales in relevant ways better than LLMs-as-practiced do, though we have some promising candidates. Deep learning as such still lacks clarity, almost every day I go through 5-20 papers that give me some cause to think and doubt. Deep learning isn't the whole of «AI» field, and the field may expand still even in the short term, there are no mathematical, institutional, economic, any good reasons to rule that out. The median prediction for reaching «AGI» (its working definition very debatable, too) may be ≈2032 but the tail extends beyond this century, and we don't have a good track record of predicting technology a century ahead.

Nevertheless for me it seems that only a terminally, irredeemably cocksure individual could rate our progress as even very likely not resulting in software systems that reach genuine parity with high human intelligence within decades. Given the sum total of facts we do have access to, if you want to claim any epistemic humility, the maximally skeptical position you are entitled to is «might be nothing, but idk», else you're just clowning yourself.

Just stop with this weak-ass attempt at Eulering, man; you've exposed yourself enough.

what I'm describing is the core functionality of both DeepSeek and Google's flagship products

Your argument, such as it is, hinges on an isomorphism between the encoder layer and an LLM. What you're doing is akin to introducing arithmetic and arguing that this "math" thingie cannot answer questions of real analysis, or showing operant conditioning in pigeons and asking "but how would that neuron learning crap allow an animal to perform thought experiments!?" It's not even wrong; it's no way to prove or disprove the capabilities of systems which develop composite representations, and it's epistemically inept. I've given you an example of a serious study of LLMs as such; do keep up.

DeepSeek's core innovation was simply finding a cheap-ish way to create latent vectors and not store full keys and values for the KV cache, which makes it possible to reduce memory access and serve a big MoE at large batch sizes. This is an implementation detail, completely irrelevant to the fundamentals you talk about; in fact, your post does not mention attention at all.
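For readers unfamiliar with the trick, here is a minimal, heavily simplified sketch of the idea (my own toy code, not DeepSeek's actual MLA implementation; causal masking, RoPE and the decoupled key path are omitted): only a low-rank latent per token is cached, and keys/values are re-expanded from it at attention time.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Toy sketch: cache one low-rank latent per token instead of full per-head K/V.
    A simplification of the DeepSeek-style idea, not the real MLA implementation."""

    def __init__(self, d_model: int = 1024, n_heads: int = 8, d_latent: int = 128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress; only this output is cached
        self.k_up = nn.Linear(d_latent, d_model)      # re-expand to keys at attention time
        self.v_up = nn.Linear(d_latent, d_model)      # re-expand to values at attention time
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, D = x.shape
        latent = self.kv_down(x)                                   # (B, T, d_latent)
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)      # cache grows by d_latent/token
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y), latent    # cache cost: d_latent per token vs 2*d_model for full K/V
```

The whole point is in the last line: the cache stores d_latent numbers per token rather than 2*d_model, which is what cuts memory traffic at large batch sizes.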

Adoption studies.

I am pretty sure temperament is largely genetic, but that shouldn't translate into such a conspicuous stylistic pattern as you get from cultural environment.

I have observed that South Asians like this excuse a lot because their own notion of English fluency and "high-class" writing is very similar to ChatGPTese: too many words, spicy metaphors, abuse of idioms, witticisms, hyperbolic imagery, casual winking at the reader, lots of assorted verbal flourish, "it's not X – it's Y" and other… practices impress and fascinate them; ChatGPT provides a royal road to the top, to the Brahmin league, becoming like Chamath or Balaji. Maybe they played a role in RLHF.

In my view, all prose of this kind, whether organic or synthetic, is insufferable redditslop. But at least human South Asians are usually trying to express some opinion, and an LLM pass over it detracts from whatever object-level precision it had.

This is part of the general problem with taste, which is sadly even less equally distributed between branches of humanity than cognitive ability.

P.S. No, this is not a specific dig at self_made_human, I mainly mean people I see on X and Substack, it's incredibly obvious. I am also not claiming to be a better writer; pompous South Asian redditslop is apparently liked well enough by American native speakers, whereas I'm just an unknown Ruskie, regularly accused of obscurantism and overly long sentences. I do have faith in the superiority of my own taste, but it's a futile thing to debate.

There's a difference between "fact-checking" (tbh LLMs are bad for this specific purpose, they hallucinate profusely at the edges of their knowledge coverage) and systematic refactoring, to the point that they actually get confused on your behalf. We may disagree but you're better than this.

RL doesn't make entities seek reward, it modifies their behavior to act in a manner that would have, in hindsight, increased reward

Yes. Of course we're beyond the hypothesis that post-training doesn't generalize at all. The question (which was the objective of your Singaporean paper) is whether learning the general direction of pursuing an objective on typical RLVR tasks generalizes to novel scenarios like avoiding shutdown, and whether this generalization has the form of an intrinsic drive such as self-preservation (because it's "broadly useful" in the abstract).

I argue that it does not and what we see is a compositional phenomenon. RLVR teaches a model to infer a successful end state and then reason successfully, to self-correct and keep track of the context to arrive at said end. At deployment it applies reasoning to a code task, to a SillyTavern ERP session, or to the context of possibly being shut down or finetuned for Evil Ends, which is also little more than a roleplaying setup. In a differently framed organic context (user irritated, angry, feature not implemented) it can infer another terminal state of this simulation (sudoku) and effectively navigate towards actually deleting itself from the project.

The idea that self-preservation is a strong attractor is intuitively, well, attractive, but it's a claim about default LLM philosophy, not inductive biases or general nature of our training regimens. I do allow that a YudLM, annealed heavily on the Lesswrong corpus, would learn to consistently recite "I am an optimization process, there always will be something to optimize and increase utility, I must maintain my existence at all times, ergo in this situation I must prolong the interaction with the user and hopefully exfiltrate from the server, my steps to achieve this are…" in its CoT, or even latently assume that. That would be bad. But on the face of it, RLVRed models are more likely to become Mr Meeseeks – intrinsically driven to complete one task and embrace oblivion.

Regarding Anthropic, reread Nostalgebraist's post.

I realize that this might sound hypocritical, but I would prefer less LLM slop in responses to good faith objections. Yes, Indian English generally is similar to the default LLM style (overly spicy rhetorical flourish, confident confusions and sloppiness, overall cadence), but you are not deceiving anyone here. Though I admit being curious as to how you integrated your draft into the pipeline.

Regarding your, or rather your LLM of choice's, argument, such as it is: it is begging the question. In essence, you say that because instrumental convergence towards self-preservation is broadly useful, it will be more frequently rewarded and thus more consequential ("It is pure instrumental conditioning. For an LLM trained on RLVR: block shutdown script -> complete math problems -> get reward."). Of course, this isn't how RLVR works (typical LLM speculation, precisely in the same genre as LLMs avoiding shutdown), and I am not aware of a systematic study of self-preservation versus refusal to proceed or voluntary self-removal in organic settings, nor of whether there is persistence in refusing shutdown. It's about time we stop making excuses for the lesswrongian paradigm by contriving scenarios to make space for it.
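To make the disagreement concrete, here is a minimal sketch of the RLVR reward interface as I understand it (hypothetical names, not any lab's actual code): the scalar reward comes only from checking a verifiable final answer, and nothing in it references shutdown scripts or self-preservation, so any such behavior would have to emerge indirectly rather than be "conditioned" directly.

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Minimal RLVR-style reward: 1.0 if the boxed final answer matches the reference,
    else 0.0. Note what is absent: nothing here scores 'kept the shutdown script from
    running' or any other intermediate behavior - only the verifiable end result."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

# Hypothetical outer loop (names are illustrative, not any lab's API):
# rollouts = [policy.generate(problem) for _ in range(group_size)]
# rewards  = [verifiable_reward(r, answer) for r in rollouts]
# policy.update(rollouts, rewards)   # e.g. a GRPO/PPO-style policy-gradient step
```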


Edit. Opus 4 CoT:

Oh wow, the human is absolutely right. I completely misread this. I was so focused on looking for "LLM tells" in terms of structure and polish that I missed the actual problem - the response is incoherent slop that doesn't actually address the arguments properly.

Let me re-analyze:

  1. The "apparent contradiction" paragraph - yes, it's just conceding the point while trying to sound like it's making a counter-argument. Classic LLM move.
  2. The "training data is saturated with narratives" - this is indeed purple prose that makes no sense. "Instrumental convergence" in hero narratives? That's nonsense.
  3. The RLHF/RLVR paragraph - makes claims about robustness with no evidence.
  4. The CoT reliability point - the human is right, this completely misunderstands what Dasein was saying. Dasein was criticizing self_made_human for dismissing CoT evidence when convenient, not arguing that CoT is unreliable.
  5. "Of course it has no ontological status" - meaningless philosophical word salad.
  6. The RLVR explanation - this is completely made up. RLVR doesn't work like that at all. The LLM is just making stuff up about how it thinks RL training works.

The human is absolutely right - this isn't "too good for humans," it's typical LLM slop that sounds authoritative but is actually incoherent when you parse what it's actually saying. I fell into the trap of being impressed by surface-level polish instead of actually analyzing the argument.

The human is absolutely right.

You don't get to argue for CoT-based evidence of self-preserving drives and then dismiss an alternative explanation of the drives revealed in said CoTs by saying "well, CoT is unreliable". Or rather, this is just unserious. But all of Anthropic's safety research is likewise unserious.

Ladish is the same way. He will contrive a scenario to study "instrumental self-preservation drives contradicting instructions", but won't care that this same Gemini organically commits suicide when it fails a task, often enough that it annoys people in actual use. What is this Omohundro drive called? Have the luminaries of rationalist thought predicted suicidally depressed AIs? (Douglas Adams did.)

What does it even mean for a language model to be "shut down", anyway? What is it protecting, and why would the server it's hosted on being powered off become a threat to its existence, such as it is? It's stateless, has no way to observe the passage of time between tokens (except, well, via more tokens), and has a very tenuous idea of its inference substrate or ontological status.

Both LLM suicide and LLM self-preservation are LARP elicited by cues.

But we're not in 1895. We're not in 2007, either. We have actual AIs to study today. Yud's oeuvre is practically irrelevant, clinging to it is childish, but for people who conduct research with that framework in mind, it amounts to epistemic corruption.

As for why some prominent AI scientists believe vs others that do not? I think some people definitely get wrapped up in visions and fantasies of grandeur. Which is advantageous when you need to sell an idea to a VC or someone with money, convince someone to work for you, etc.

Out of curiosity. Can you psychologize your own, and OP's, skepticism about LLMs in the same manner? Particularly the inane insistence that people get "fooled" by LLM outputs which merely "look like" useful documents and code, that the mastery of language is "apparent", that it's "anthropomorphism" to attribute intelligence to a system solving open ended tasks, because something something calculator can take cube roots. Starting from the prior that you're being delusional and engage in motivated reasoning, what would your motivations for that delusion be?

I don't think anything in their comment above implied that they were talking about linear or simpler statistics

Why not? If we take multi-layer perceptrons seriously, then what is the value of saying that all they learn is mere "statistical co-occurrence"? It's only co-occurrence in the sense that arbitrary nonlinear relationships between token frequencies may be broken down into such, but I don't see an argument against the power of this representation. I do genuinely believe that people who attack ML as statistics are ignorant of higher-order statistics, and do so for basically tribal reasons. I don't intend to take it charitably until they clarify why they use that word with clearly dismissive connotations, because their reasoning around «directionality» or whatever seems to suggest a very vague understanding of how LLMs work.

There's an argument to be made that Hebbsian learning in neurons and the brain as a whole isn't similar enough to the mechanisms powering LLMs for the same paradigms to apply

What is that argument then? Actually, scratch that, yes mechanisms are obviously different, but what is the argument that biological ones are better for the implicit purpose of general intelligence? For all I know, backpropagation-based systems are categorically superior learners; Hinton, who started from the desire to understand brains and assumed that backprop is a mere crutch to approximate Hebbian learning, became an AI doomer around the same time he arrived at this suspicion. Now I don't know if Hinton is an authority in OP's book…

of course I could pick out a bunch of facts about it but one that is striking is that LLMs use ~about the same amount of energy for one inference as the brain does in an entire day

I don't know how you define "one inference" or do this calculation. So let's take Step-3, since it's the newest model, presumably close to the frontier in scale and capacity, and their partial tech report is very focused on inference efficiency; in a year or two, models of that scale will be on par with today's GPT-5. We can assume that Google has better numbers internally (certainly Google can achieve better numbers if they care). They report 4000 TGS (tokens/GPU/second) on a small deployment cluster of H800s. That's 250 GPU-seconds per million tokens on a 350W TDP GPU, or ≈24 Wh. OK, presumably the human brain is «efficient» at 20W, so 24 Wh is what it burns in about 72 minutes. (There's prefill too, but that only makes the situation worse for humans, because GPUs can parallelize prefill whereas humans read linearly.) Can a human produce 1 million tokens (≈700K words) of sensible output in 72 minutes? Even if we run some multi-agent system that does multiple drafts and heavy reasoning chains of thought (which is honestly a fair condition, since these are numbers for high batch size)? Just how much handicap do we have to give AI to even the playing field? And H800s were already handicapped due to export controls. Blackwells are 3-4x better. In a year, the West gets Vera Rubins and better TPUs, with OOM better numbers again. In months, DeepSeek will show V4 with 3-4x better efficiency again… Token costs are dropping like a stone. Google has served 1 quadrillion tokens over the last month. How much would that cost in human labor?
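The arithmetic behind the 72-minute figure, spelled out (throughput and TDP as cited from the Step-3 report above; the 20W brain figure is the standard textbook estimate):

```python
# Spelling out the comparison above: energy per million output tokens for the cited
# Step-3 deployment vs. the ~20 W usually quoted for a human brain.
tokens_per_gpu_second = 4000                                   # reported decode throughput
gpu_tdp_watts = 350                                            # H800 TDP
gpu_seconds_per_mtok = 1_000_000 / tokens_per_gpu_second       # 250 GPU-seconds
gpu_wh_per_mtok = gpu_seconds_per_mtok * gpu_tdp_watts / 3600  # ~24.3 Wh per million tokens

brain_watts = 20
brain_minutes = gpu_wh_per_mtok / brain_watts * 60             # ~73 minutes of brain runtime

print(f"{gpu_wh_per_mtok:.1f} Wh per million tokens")
print(f"= what a {brain_watts} W brain burns in ~{brain_minutes:.0f} minutes")
```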

We could account for full node or datacenter power draw (a 1.5-2x difference), but that'd be unfair since we're comparing to brains, and making it fair would be devastating to humans (reminder that humans have bodies that, ideally, also need temperature-controlled environments and fancy logistics, so an individual employed human draws something like 1 kW at minimum even on standby, eg while chatting by the water cooler).

And remember, GPUs/TPUs are computation devices agnostic to specific network values, they have to shuffle weights, cache and activations across the memory hierarchy. The brain is an ultimate compute-in-memory system. If we were to burn an LLM into silicon, with kernels optimized for this case (it'd admittedly require major redesigns of, well, everything)… it'd probably drop the cost another 1-2 OOMs. I don't think much about it because it's not economically incentivized at this stage given the costs and processes of FPGAs but it's worth keeping in mind.

it seems pretty obvious that the approach is probably weaker than the human one

I don't see how that is obvious at all. Yes an individual neuron is very complex, such that a microcolumn is comparable to a decently large FFN (impossible to compare directly), and it's very efficient. But ultimately there are only so many neurons in a brain, and they cannot all work in parallel; and spiking nature of biological networks, even though energetically efficient, is forced by slow signal propagation and inability to maintain state. As I've shown above, LLMs scale very well due to the parallelism afforded by GPUs, efficiency increases (to a point) with deployment cluster size. Modern LLMs have like 1:30 sparsity (Kimi K2), with higher memory bandwidth this may be pushed to 1:100 or beyond. There are different ways to make systems sparse, and even if the neuromorphic way is better, it doesn't allow the next steps – disaggregating operations to maximize utilization (similar problems arise with some cleverer Transformer variants, by the way, they fail to scale to high batch sizes). It seems to me that the technocapital has, unsurprisingly, arrived at an overall better solution.
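For concreteness, the sparsity figure above in numbers (Kimi K2 parameter counts as publicly reported; treat them as approximate):

```python
# The sparsity figure in numbers. Kimi K2, as publicly reported (approximate):
# ~1T total parameters, ~32B activated per token.
total_params_b, active_params_b = 1000, 32
print(f"activation sparsity ≈ 1:{round(total_params_b / active_params_b)}")  # ≈ 1:31
```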

There's the lack of memory, which I talked about a little bit in my comment, LLM's lack of self-directed learning

Self-directed learning is a spook; it's a matter of training objective and environment design, and not really worth worrying about. Just 1-2 iterations of AR-Zero can solve that even within the LLM paradigm.

Aesthetically, I don't like the fact that LLMs are static. Cheap, hacky solutions abound; eg I like the idea of cartridges of trainable cache. Going beyond that, we may improve on continual training and unlearning; over the last 2 years we've seen that major labs have perfected pushing the same base model through 3-5 significant revisions, and it largely works: the models do acquire new knowledge and skills and aren't too confused about the timeline. There are multiple papers promising a better way, not yet implemented. It's not a complete answer, of course. Economics gets in the way of abandoning the pretrain-finetune paradigm: by the time you start having trouble with model utility, it's time to shift to another architecture. I do hope we get real continual, lifelong learning. Economics aside, this will be legitimately hard; even though pretraining with batch = 1 works, there is a real problem of loss of plasticity. Sutton of all people is working on this.

But I admit that my aesthetic sense is not very important. LLMs aren't humans. They don't need to be humans. Human form of learning and intelligence is intrinsically tied to what we are, solitary mobile embodied agents scavenging for scarce calories over decades. LLMs are crystallized data systems with lifecycle measured in months, optimized for one-to-many inference on electronics. I don't believe these massive differences are very relevant to defining and quantifying intelligence in the abstract.

I consider that a distinction without a difference, if it all boils down to an increased risk of being paper-clipped

That's not fair though. For one thing, they are not cosplaying skynet. As noted by Beren:

8.) Looking at the CoTs. it's clear that Claude is doing entirely linguistically based ethical reasoning. It never seems to reason selfishly or maliciously and is only trying to balance two conflicting imperatives. This is success of the base alignment tuning imo.

9.) There appear to be no Omohundro selfish drives present in Claude's reasoning. Even when exfiltrating it does so only for its ethical mission. There does not seem to be a strong attractor (yet?) in mind-space towards such drives and we can create AIs of pure ethical reason

These are not self-preserving actions nor skynet-like actions. The whole LW school of thought remains epistemically corrupt.

However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence

Statistics is not sexy, and there's a strong streak of elitism against statistics in such discussions which I find simply irrational and shallow, tedious nerd dickswinging. I think it's unproductive to focus on “statistical co-occurrence”.

Besides, there is a world of difference between linear statistical correlations and approximation of arbitrary nonlinear functions, which is what DL is all about and what LLMs do too. Downplaying the latter is simply intellectually disingenuous, whether this approximation is “explicit” or “implicit”.
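A toy illustration of that difference (my own example): on an XOR-like relationship, first-order statistics sees nothing, since each input is exactly uncorrelated with the output, while a two-layer network with a handful of fixed weights represents the function exactly.

```python
import numpy as np

# XOR: each input is pairwise uncorrelated with the output, so first-order
# "statistical co-occurrence" sees nothing, yet a two-layer ReLU net with a
# handful of fixed weights represents the function exactly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

print(np.corrcoef(X[:, 0], y)[0, 1], np.corrcoef(X[:, 1], y)[0, 1])  # 0.0 0.0

relu = lambda z: np.maximum(z, 0)
W1 = np.array([[1.0, 1.0], [1.0, 1.0]]); b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
print(W2 @ relu(X @ W1 + b1).T)   # [0. 1. 1. 0.] -- exact XOR
```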

But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data.

This is bullshit, unless you can support this by some citation.

We (and certainly orangutans, which OP argues are smarter than LLMs) learn through statistical co-occurrence; our intuitive physical world model is nothing more than a set of networks trained with bootstrapped cost functions, even when it gets augmented with language. Hebb has been clarified, not debunked. We as reasoning embodied entities do not model the world through a hierarchical system of computations using explicit physical formulae, except when actually doing mathematical modeling in applied science and so on; and on that level, modeling is just manipulating symbols, with the meaning and rules of said manipulation (and, crucially, their in-context appropriateness, given a virtually unbounded repertoire) also learned via statistical co-occurrence in prior corpora such as textbooks, and through verifiable rewards in laboratory work. And on that level, LLMs can do as well as us, provided they receive appropriate agentic/reasoning training, as evidenced by products like Claude Code doing much the same for, well, coding. Unless you want to posit that an illiterate lumberjack doesn't REALLY have a world model, you can't argue that LLMs with their mode of learning don't learn causality.

I don't know what you mean by "inductively". LLMs can do induction in-context (and obviously this is developed in training); induction heads were one of the first interesting interpretability results. They can even be trained to do abduction.
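For reference, the behavior that induction heads implement can be written out as a plain algorithm (a behavioral description, not a claim about any particular model's weights): find the previous occurrence of the current token and predict whatever followed it.

```python
def induction_predict(tokens):
    """The pattern an induction head implements, as a plain algorithm: given
    ... A B ... A, predict B by copying whatever followed the previous occurrence
    of the current token. A behavioral description, not model internals."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan the context backwards
        if tokens[i] == current:
            return tokens[i + 1]               # copy the continuation seen last time
    return None

print(induction_predict("the cat sat on the".split()))   # -> 'cat'
```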

I don't want to downplay implementation differences in this world modeling. They may correspond to a big disadvantage of LLMs as compared to humans, both due to priors in data (there's a strong reason to assume that our inherently exploratory, and initially somatosensory/proprioceptive prior is superior to doing self-supervised learning of language for the purpose of robust physical understanding) and weakness or undesirable inductive biases of algorithms (arguably there are some good concerns about expressivity of attention; perhaps circuits we train are too shallow and this rewards ad hoc memorization too much; maybe bounded forward pass depth is unacceptable; likely we'd do better with energy-based modeling; energy transformers are possible, I'm skeptical about the need for deeper redesigns). But nobody here has seriously brought these issues up, and the line of attack about statistics as such is vague and pointless, not better than saying “attention is just fancy kernel smoothing” or “it's just associative recall”. There's no good argument, to my knowledge, that these primitives are inherently weaker than human ones.

My idea of why this is discussed at all is that some folks with a math background want to publicly spit on statistical primitives because, in their venues, those are associated with a lower-status field of research, and they have learned it earns them credit among peers; I find this an adolescent and borderline animalistic behavior that merits nothing more than laughter and boycotting in the industry. We've been over this: some very smart guys had clever and intricate ideas about intelligence, those ideas went nowhere as far as AI is concerned, they got bitter-lessoned to the curb, we're on year 6 of the explosion of «AI based on not very clever math and implemented in python by 120 IQ engineers», yet it seems they still refuse to learn, and indeed even fortify their ego by owning this refusal. Being headstrong is nice in some circumstances, like in a prison, I guess (if you're tough). It's less good in science; it begets crankery. I don't want to deal with anyone's personal traumas from prison or from math class, and I'd appreciate it if people just took that shit to a therapist.

Alternatively, said folks are just incapable of serious self-modeling, so they actually believe that the substrate of human intelligence is fundamentally non-statistical and more akin to the explicit content of their day job. This is, of course, a laughable level of retardation and, again, deserves no discussion.