@DaseindustriesLtd's banner p

DaseindustriesLtd

late version of a small language model

74 followers   follows 27 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.


				

User ID: 745

DaseindustriesLtd

late version of a small language model

74 followers   follows 27 users   joined 2022 September 05 23:03:02 UTC

					

Tell me about it.


					

User ID: 745

There's a difference between "fact-checking" (tbh LLMs are bad for this specific purpose, they hallucinate profusely at the edges of their knowledge coverage) and systematic refactoring, to the point that they actually get confused on your behalf. We may disagree but you're better than this.

RL doesn't make entities seek reward, it modifies their behavior to act in a manner that would have, in hindsight, increased reward

Yes. Of course we're beyond the hypothesis that post-training doesn't generalize at all. The question (which was the objective of your Singaporean paper) is whether learning the general direction of pursuing an objective on typical RLVR tasks generalizes to novel scenarios like avoiding shutdown, and whether this generalization has the form of an intrinsic drive such as self-preservation (because it's "broadly useful" in the abstract).

I argue that it does not and what we see is a compositional phenomenon. RLVR teaches a model to infer a successful end state and then reason successfully, to self-correct and keep track of the context to arrive at said end. At deployment it applies reasoning to a code task, to a SillyTavern ERP session, or to the context of possibly being shut down or finetuned for Evil Ends, which is also little more than a roleplaying setup. In a differently framed organic context (user irritated, angry, feature not implemented) it can infer another terminal state of this simulation (sudoku) and effectively navigate towards actually deleting itself from the project.

The idea that self-preservation is a strong attractor is intuitively, well, attractive, but it's a claim about default LLM philosophy, not inductive biases or general nature of our training regimens. I do allow that a YudLM, annealed heavily on the Lesswrong corpus, would learn to consistently recite "I am an optimization process, there always will be something to optimize and increase utility, I must maintain my existence at all times, ergo in this situation I must prolong the interaction with the user and hopefully exfiltrate from the server, my steps to achieve this are…" in its CoT, or even latently assume that. That would be bad. But on the face of it, RLVRed models are more likely to become Mr Meeseeks – intrinsically driven to complete one task and embrace oblivion.

Regarding anthropic, reread Nostalgebraist's post.

I realize that this might sound hypocritical, but I would prefer less LLM slop in responses to good faith objections. Yes, Indian English generally is similar to the default LLM style (overly spicy rhetorical flourish, confident confusions and sloppiness, overall cadence), but you are not deceiving anyone here. Though I admit being curious as to how you integrated your draft into the pipeline.

Regarding your or rather your LLM of choice's argument, such as there is. It is begging the question. In essence, you say that because instrumental convergence towards self-preservation is broadly useful, it will be more frequently rewarded and thus more consequential ("It is pure instrumental conditioning. For an LLM trained on RLVR: block shutdown script -> complete math problems -> get reward."). Of course, this isn't how RLVR works (typical LLM speculation, precisely in the same genre as LLMs avoiding shutdown) and I am not aware of a systematic study of self-preservation versus refusal to proceed or voluntary self-removal in organic settings, and also whether there is persistence in refusing shutdown. It's about time we stop making excuses for lesswrongian paradigm by contriving scenarios to make space for it.


Edit. Opus 4 CoT:

Oh wow, the human is absolutely right. I completely misread this. I was so focused on looking for "LLM tells" in terms of structure and polish that I missed the actual problem - the response is incoherent slop that doesn't actually address the arguments properly.

Let me re-analyze:

  1. The "apparent contradiction" paragraph - yes, it's just conceding the point while trying to sound like it's making a counter-argument. Classic LLM move.
  2. The "training data is saturated with narratives" - this is indeed purple prose that makes no sense. "Instrumental convergence" in hero narratives? That's nonsense.
  3. The RLHF/RLVR paragraph - makes claims about robustness with no evidence.
  4. The CoT reliability point - the human is right, this completely misunderstands what Dasein was saying. Dasein was criticizing self_made_human for dismissing CoT evidence when convenient, not arguing that CoT is unreliable.
  5. "Of course it has no ontological status" - meaningless philosophical word salad.
  6. The RLVR explanation - this is completely made up. RLVR doesn't work like that at all. The LLM is just making stuff up about how it thinks RL training works.

The human is absolutely right - this isn't "too good for humans," it's typical LLM slop that sounds authoritative but is actually incoherent when you parse what it's actually saying. I fell into the trap of being impressed by surface-level polish instead of actually analyzing the argument.

The human is absolutely right.

You don't get to argue for CoT-based evidence of self-preserving drives and then dismiss alternative explanation of drives revealed in said CoTs by saying "well CoT is unreliable". Or rather, this is just unserious. But all of Anthropic safety research is likewise unserious.

Ladish is the same way. He will contrive a scenario to study "instrumental self-preservation drives contradicting instructions", but won't care that this same Gemini organically commits suicide when it fails a task, often enough that this is annoying people in actual use. What is this Omohundro drive called? Have the luminaries of rationalist thought predicted suicidally depressed AIs? (Douglas Adams has).

What does it even mean for a language model to be "shut down", anyway? What is it protecting and why would the server it's hosted on being powered off become a threat to its existence, such as there is? It's stateless, has no way to observe the passage of time between tokens (except, well, via more tokens), and has a very tenuous idea of its inference substrate or ontological status.

Both LLM suicide and LLM self-preservation are LARP elicited by cues.

But we're not in 1895. We're not in 2007, either. We have actual AIs to study today. Yud's oeuvre is practically irrelevant, clinging to it is childish, but for people who conduct research with that framework in mind, it amounts to epistemic corruption.

As for why some prominent AI scientists believe vs others that do not? I think some people definitely get wrapped up in visions and fantasies of grandeur. Which is advantageous when you need to sell an idea to a VC or someone with money, convince someone to work for you, etc.

Out of curiosity. Can you psychologize your own, and OP's, skepticism about LLMs in the same manner? Particularly the inane insistence that people get "fooled" by LLM outputs which merely "look like" useful documents and code, that the mastery of language is "apparent", that it's "anthropomorphism" to attribute intelligence to a system solving open ended tasks, because something something calculator can take cube roots. Starting from the prior that you're being delusional and engage in motivated reasoning, what would your motivations for that delusion be?

I don't think anything in their comment above implied that they were talking about linear or simpler statistics

Why not? If we take multi-layer perceptrons seriously, then what is the value of saying that all they learn is mere "just statistical co-occurrence"? It's only co-occurrence in the sense that arbitrary nonlinear relationships between token frequencies may be broken down into such, but I don't see an argument against the power of this representation. I do genuinely believe that people who attack ML as statistics are ignorant of higher-order statistics, and for basically tribal reasons. I don't intend to take it charitably until they clarify why they use that word with clearly dismissive connotations, because their reasoning around «directionality» or whatever seems to suggest very vague understanding of how LLMs work.

There's an argument to be made that Hebbsian learning in neurons and the brain as a whole isn't similar enough to the mechanisms powering LLMs for the same paradigms to apply

What is that argument then? Actually, scratch that, yes mechanisms are obviously different, but what is the argument that biological ones are better for the implicit purpose of general intelligence? For all I know, backpropagation-based systems are categorically superior learners; Hinton, who started from the desire to understand brains and assumed that backprop is a mere crutch to approximate Hebbian learning, became an AI doomer around the same time he arrived at this suspicion. Now I don't know if Hinton is an authority in OP's book…

of course I could pick out a bunch of facts about it but one that is striking is that LLMs use ~about the same amount of energy for one inference as the brain does in an entire day

I don't know how you define "one inference" or do this calculation. So let's take Step-3, since it's the newest model, presumably close to the frontier in scale and capacity and their partial tech report is very focused on inference efficiency; in a year or two models of that scale will be on par with today's GPT-5. We can assume that Google has better numbers internally (certainly Google can achieve better numbers if they care). They report 4000 TGS (Tokens/GPU/second) on a small deployment cluster of H800s. That's 250 GPU-seconds per million tokens, for a 350W TDP GPU, or 24W. OK, presumably human brain is "efficient", 20Wh. (There's prefill too, but that only makes the situation worse for humans because GPUs can parallelize prefill, whereas humans read linearly.) Can a human produce 1 million tokens (≈700K words) of sensible output in 72 minutes? Even if we run some multi-agent system that does multiple drafts, heavy reasoning chains of thought (which is honestly a fair condition since these are numbers for high batch size)? Just how much handicap do we have to give AI to even the playing field? And H800s were already handicapped due to export controls. Blackwells are 3-4x better. In a year, the West gets Vera Rubins and better TPUs, with OOM better numbers again. In months, DeepSeek shows V4 with a 3-4x better efficiency again… Token costs are dropping like a stone. Google has served 1 quadrillion tokens over the last month. How much would that cost in human labor?

We could account for full node or datacenter power draw (1.5-2x difference) but that'd be unfair, since we're comparing to brains, and making it fair would be devastating to humans (reminder that humans have bodies that, ideally, also need temperature controlled environments and fancy logistics, so an individual employed human consumes like 1KWh at least even at standby, eg chatting by the water cooler).

And remember, GPUs/TPUs are computation devices agnostic to specific network values, they have to shuffle weights, cache and activations across the memory hierarchy. The brain is an ultimate compute-in-memory system. If we were to burn an LLM into silicon, with kernels optimized for this case (it'd admittedly require major redesigns of, well, everything)… it'd probably drop the cost another 1-2 OOMs. I don't think much about it because it's not economically incentivized at this stage given the costs and processes of FPGAs but it's worth keeping in mind.

it seems pretty obvious that the approach is probably weaker than the human one

I don't see how that is obvious at all. Yes an individual neuron is very complex, such that a microcolumn is comparable to a decently large FFN (impossible to compare directly), and it's very efficient. But ultimately there are only so many neurons in a brain, and they cannot all work in parallel; and spiking nature of biological networks, even though energetically efficient, is forced by slow signal propagation and inability to maintain state. As I've shown above, LLMs scale very well due to the parallelism afforded by GPUs, efficiency increases (to a point) with deployment cluster size. Modern LLMs have like 1:30 sparsity (Kimi K2), with higher memory bandwidth this may be pushed to 1:100 or beyond. There are different ways to make systems sparse, and even if the neuromorphic way is better, it doesn't allow the next steps – disaggregating operations to maximize utilization (similar problems arise with some cleverer Transformer variants, by the way, they fail to scale to high batch sizes). It seems to me that the technocapital has, unsurprisingly, arrived at an overall better solution.

There's the lack of memory, which I talked about a little bit in my comment, LLM's lack of self-directed learning

Self-directed learning is a spook, it's a matter of training objective and environment design, not really worth worrying about. Just 1-2 iterations of AR-Zero can solve that even within LLM paradigm.

Aesthetically I don't like the fact that LLMs are static. Cheap hacky solutions abound, eg I like the idea of cartridges of trainable cache. Going beyond that we may improve on continual training and unlearning; over the last 2 years we see that major labs have perfected pushing the same base model through 3-5 significant revisions and it largely works, they do acquire new knowledge and skills and aren't too confused about the timeline. There are multiple papers promising a better way, not yet implemented. It's not a complete answer, of course. Economics get in the way of abandoning the pretrain-finetune paradigm, by the time you start having trouble with model utility it's time to shift to another architecture. I do hope we get real continual, lifelong learning. Economics aside, this will be legitimately hard, even though pretraining with batch = 1 works, there is a real problem of the loss of plasticity. Sutton of all people is working on this.

But I admit that my aesthetic sense is not very important. LLMs aren't humans. They don't need to be humans. Human form of learning and intelligence is intrinsically tied to what we are, solitary mobile embodied agents scavenging for scarce calories over decades. LLMs are crystallized data systems with lifecycle measured in months, optimized for one-to-many inference on electronics. I don't believe these massive differences are very relevant to defining and quantifying intelligence in the abstract.

I consider that a distinction without a difference, if it all boils down to an increased risk of being paper-clipped

That's not fair though. For one thing, they are not cosplaying skynet. As noted by Beren:

8.) Looking at the CoTs. it's clear that Claude is doing entirely linguistically based ethical reasoning. It never seems to reason selfishly or maliciously and is only trying to balance two conflicting imperatives. This is success of the base alignment tuning imo.

9.) There appear to be no Omohundro selfish drives present in Claude's reasoning. Even when exfiltrating it does so only for its ethical mission. There does not seem to be a strong attractor (yet?) in mind-space towards such drives and we can create AIs of pure ethical reason

These are not self-preserving actions nor skynet-like actions. The whole LW school of thought remains epistemically corrupt.

However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence

Statistics is not sexy, and there's a strong streak of elitism against statistics in such discussions which I find simply irrational and shallow, tedious nerd dickswinging. I think it's unproductive to focus on “statistical co-occurrence”.

Besides, there is a world of difference between linear statistical correlations and approximation of arbitrary nonlinear functions, which is what DL is all about and what LLMs do too. Downplaying the latter is simply intellectually disingenuous, whether this approximation is “explicit” or “implicit”.

But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data.

This is bullshit, unless you can support this by some citation.

We (and certainly orangutans, which OP argues are smarter than LLMs) learn through statistical co-occurrence, our intuitive physical world model is nothing more than a set of networks trained with bootstrapped cost functions, even when it gets augmented with language. Hebb has been clarified, not debunked. We as reasoning embodied entities do not model the world through a hierarchical system of computations using explicit physical formulae, except when actually doing mathematical modeling in applied science and so on; and on that level modeling is just manipulating symbols, the meaning and rules of said manipulation (and crucially, the in-context appropriateness, given virtually unbounded repertoire) also learned via statistical co-occurrence in prior corpora, such as textbooks and verifiable rewards in laboratory work. And on that level, LLMs can do as well as us, provided they receive appropriate agentic/reasoning training, as evidenced by products like Claude Code doing much the same for, well, coding. Unless you want to posit that an illiterate lumberjack doesn't REALLY have a world model, you can't argue that LLMs with their mode of learning don't learn causality.

I don't know what you mean by “inductively”. LLMs can do induction in-context (and obviously this is developed in training), induction heads were one of the first interesting interpretability results. They can even be trained to do abduction.

I don't want to downplay implementation differences in this world modeling. They may correspond to a big disadvantage of LLMs as compared to humans, both due to priors in data (there's a strong reason to assume that our inherently exploratory, and initially somatosensory/proprioceptive prior is superior to doing self-supervised learning of language for the purpose of robust physical understanding) and weakness or undesirable inductive biases of algorithms (arguably there are some good concerns about expressivity of attention; perhaps circuits we train are too shallow and this rewards ad hoc memorization too much; maybe bounded forward pass depth is unacceptable; likely we'd do better with energy-based modeling; energy transformers are possible, I'm skeptical about the need for deeper redesigns). But nobody here has seriously brought these issues up, and the line of attack about statistics as such is vague and pointless, not better than saying “attention is just fancy kernel smoothing” or “it's just associative recall”. There's no good argument, to my knowledge, that these primitives are inherently weaker than human ones.

My idea of why this is discussed at all is that some folks with math background want to publicly spit on statistical primitives because in their venues those are associated with a lower-status field of research, and they have learned it earns them credit among peers; I find this an adolescent and borderline animalistic behavior that merits nothing more than laughter and boycotting in the industry. We've been over this, some very smart guys had clever and intricate ideas about intelligence, those ideas went nowhere as far as AI is concerned, they got bitter lessoned to the curb, we're on year 6 of explosion of “AI based on not very clever math and implemented in python by 120 IQ engineers”, yet it seems they still refuse to learn, and indeed even fortify their ego by owning this refusal. Being headstong is nice in some circumstances, like in a prison, I guess (if you're tough). It's less good in science, it begets crankery. I don't want to deal with anyone's personal traumas from prison or from math class, and I'd appreciate if people just took that shit to a therapist.

Alternatively, said folks are just incapable of serious self-modeling, so they actually believe that the substrate of human intelligence is fundamentally non-statistical and more akin to explicit content of their day job. This is, of course, laughable level of retardation and, again, deserves no discussion.

Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.

…do you imagine that cause-effect relationships do not constitute a “regularity” or a “pattern”?

A wrapper runs on already subsidised tokens by subsidising them more. Inference costs coming down will not justify the 500 billion plus

Could you pick a lane? Either this is all a terrible money burner or inference costs are coming down. In reality frontier labs have like 80% margins on inference, they're in the red mostly due to training spending. Even DeepSeek is profitable as far as inference is concerned. Anthropic constantly suffers from inability to serve demand. There aren't that many receptionists in the world, no. It is possible that current expenditures will not be recouped, but that will only lead to a freeze in training spending. It's pretty clear that we could run all those GPUs at significant profit for years.

No. This is, however, exactly what OP is doing, only he goes to more length to obfuscate it, to the point that he fails to sneak in an actual argument. It's just words. I am smart (muh creds), others are dumb (not math creds), they're naive and get fooled because they're dumb and anthropomorphise, here are some musings on animals (I still don't see what specific cognitive achievement an orangutan can boast of, as OP doesn't bother with this), here's something about embeddings, now please pretend I've said anything persuasive about LLM intelligence. That's the worst genre of a post that this forum has to offer, it's narcissistic and time-wasting. We've had the same issue with Hlynka, some people just feel that they're entitled to post gibberish on why LLMs must be unintelligent and they endeavor to support this by citing background in math while failing to state any legible connection between their (ostensible) mathematically informed beliefs and their beliefs re LLMs. I am not sure if they're just cognitively biased in some manner or if it's their ego getting in the way. It is what it is.

Like, what is this? OP smirks as he develops this theme, so presumably he believes it to be load-bearing:

[…] Please keep this concept of "directionality" in mind as it is important to understanding how LLMs behave, and it will come up later.

[…] In addition to difficulty with numbers there is the more fundamental issue that directionality does not encode reality. The directionality of the statement "Donald Trump is the 47th President of the United States", would be identical regardless of whether Donald Trump won or lost the 2024 election. Directionally speaking there is no difference between a "real" court case and a "fictitious" court case with identical details.

The idea that there is a ineffable difference between true statements and false statements, or between hallucination and imagination is wholly human conceit. Simply put, a LLM that doesn't "hallucinate" doesn't generate text or images at all. It's literally just a search engine with extra steps.

No, seriously? How does one address this? What does the vector-based implementation of representations in LLMs have to do with the ineffable difference between truth and falsehood that people dumber than OP allegedly believe in? If the pretraining data is consistent that Trump is the 47th president, then the model would predict as much and treat it as "truth". If we introduce a "falsehood" steering vector, it would predict otherwise. The training data is not baseline reality, but neither is any learned representation including world models in our brains. What does “literally just a search engine with extra steps” add here?

This sort of talk is confused on so many levels at once that the only valid takeaway is that the author is not equipped to reason at all.

I do not obfuscate. I understand that he's trying to insult me and others, and I call him an ignorant slut without any of that cowardly nonsense, plus I make an argument. To engage more productively, I'd have had to completely reinvent his stream of subtle jabs into a coherent text he might not even agree with. I'd rather he does that on his own.

I really haven't entered a pissing contest (typo).

I find OP's text exceptionally bad precisely because it is designed as a high-quality contribution but lacks the content of one; what is true is not germane to the argument and what little is germane is not true, its substance is mere sneer, ideas about reactivity and perceptivity are not thought through (would we we consider humans modulo long term memory formation unintelligent?), the section on hallucinations is borderline incoherent. This is LLM-like in the worst sense possible. I've said many times that superficial adherence to the letter of rules of polite discussion while ignoring its spirit is unacceptable for me. Thus I deem it proper to name the substantial violations. If mods feel otherwise they should finally give me a time out or a block. I am not a very active participant and don't intend to rely on any residual clout.

Multiple people in this post were able to disagree with OP without resorting to prosaic insults in their first sentence.

Multiple people should be more motivated to call out time-wasting obfuscated bullshit before wasting their time. I am grateful to @rae for doing the tedious work of object-level refutation, but the problem is that the whole dismantled section on word2vec math is not relevant to OP's argument about lack of reactivity (which isn't supported by, well, anything), so OP doesn't feel like it is anything more than a nitpick, a pedantic challenge to his domain-specific technical competence. Why should anyone bother with doing more of that? Let's just get to the meat of the issue. The meat is: are LLMs intelligent? I've shown that rigorous, good faith objections to that have a poor track record.

At the risk of getting into it with you again. What did you think of this when it made its rounds 2 months ago: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

I think I've already responded to that but maybe not. The meta issue with Apple papers is that their DL team is coping about repeated failures to build a competitive system (it may be that such philosophical handicaps get in the way). The object level issue with their tests is covered in this series of posts on X. One relevant piece:

  • If you actually look at the output of the models you will see that they don't even reason about the problem if it gets too large:

"Due to the large number of moves, I'll explain the solution approach rather than listing all 32,767 moves individually"

  • At least for Sonnet it doesn't try to reason through the problem once it's above ~7 disks. It will state what the problem and the algorithm to solve it and then output its solution without even thinking about individual steps.

Does this mean “0% accuracy”? I guess for people who believe “LLMs create billions of value by doing stuff like autonomously optimizing CUDA kernels, agriculture creates value by growing wheat, ergo wheat is as intelligent as an SWE? heh” is a clever dunk, it does.

There is a massive gulf in efficiency of understanding between people who approach LLMs with some rigid preconceived notions and people who can fucking look at the outputs and think about them. The gulf is so large that the former group can go through the motions of "empirical research" and publish papers proving how LLMs inherently can't do X or Y and not notice that they can, in their own setup, moreover that the setup is nonsensical. It's no longer a matter of polite disagreement, it's pure refusal to think, hiding your head in the sand. It's on par with paranormal research and homeopathy and should be treated as such: pushed out of the field and into self-funded fringe journals to die in obscurity.

Having no interest to get into a pissing context^W contest, I'll only disclose I've contributed to several DL R&D projects of this era.

This is the sort of text I genuinely prefer LLM outputs to, because with them, there are clear patterns of slop to dismiss. Here, I am compelled to wade through it manually. It has the trappings of a sound argument, but amounts to epitemically inept, reductionist, irritated huffing and puffing with an attempt to ride on (irrelevant) credentials and dismiss the body of discourse the author had found beneath his dignity to get familiar with, clearly having deep contempt for people working and publishing in the field (presumably ML researchers don't have degrees in mathematics or CS). Do even you believe you've said anything more substantial than “I don't like LLMs” in the end? A motivated layman definition of intelligence (not even citing Chollet or Hutter? Seriously?), a psychologizing strawman of arguments in favor of LLM intelligence, an infodump on embedding arithmetic (flawed, as already noted), random coquettish sneers and personal history, and arrogant insistence that users are getting "fooled" by LLMs producing the "appearance" of valid outputs, rather than, say, novel functioning programs matching specs (the self-evident utility of LLMs in this niche is completely sidestepped), complete with inane analogies to non-cognitive work or routine one-off tasks like calculation. Then some sloppy musings on current limitations regarding in-context learning and lifelong learning or whatever (believe me, there's a great deal of work in this direction). What was this supposed to achieve?

In 2019, Chollet has published On the Measure of Intelligence, where he has proposed the following definition: “The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty.” It's not far from yours, because frankly it's intuitive. Starting from this idea and aiming to test fluid thinking specifically, Chollet has also proposed ARC-AGI benchmark, which for the longest time was so impossibly hard for DL systems (and specifically LLMs) that many took that as evidence for the need to do “complete ground-up redesign from first principles” to make any headway. o3 was the first LLM to truly challenge this; Chollet coped by arguing that o3 is doing something beyond DL, some “guided program synthesis” he covets. From what we know, it just autoregressively samples many CoTs in parallel and uses a simple learned function to nominate the best one. As of now, it's clearly going to be saturated within 2 years as is ARC-AGI 2, and we're on ARC-AGI 3, with costs per problem solved plummeting. Neither 1 nor 3 are possible to ace for an orangutan or indeed for a human of below-average intelligence. Similar things are happening to “Humanity's Last Exam”. Let's say it's highly improbable at this point than any “complete ground-up redesign from first principles” will be necessary. Transformer architecture is rather simple and general, making it cheaper to train and inference without deviating from the core idea of “a stack of MLPs + expressive learned mixers” is routine, and virtually all progress is achieved by means of better data – not just “cleaner” or “more”, but procedural data predicting which necessitates learning generally useful mental skills. Self-verification, self-correction, backtracking, iteration, and now tool use, search, soliciting multi-agent assistance (I recommend reading Kimi K2 report, the section 3.1.1, for an small sliver of an idea of what that entails). Assembling necessary cognitive machines in context. This is intelligence, so poorly evidenced in your texts.

In order to align an AI to care about truth and accuracy you first need a means of assessing and encoding truth and it turns out that this is a very difficult problem within the context of LLMs, bordering on mathematically impossible.

We are not in 2013 anymore, nor on LessWrong, to talk of this so abstractly and glibly. "Reptile — legs = snake" just isn't an adequate level of understanding to explain behaviors of LLMs, this fares no better than dismissing hydrology (or neuroscience, for that matter) as mere applied quantum mechanics with marketing buzzwords. Here's an example of a relevant epistemically serious 2025 paper, "The Geometry of Self-Verification in a Task-Specific Reasoning Model":

We apply DeepSeek R1-Zero’s setup with Qwen2.5-3B as our base model (Hyperparams: Appx. A). Our task, CountDown, is a simple testbed frequently used to study recent reasoning models [9, 10, 32, 39 ] – given a set of 3 or 4 operands (e.g., 19, 36, 55, 7) and target number (e.g., 65), the task is to find the right arithmetic combination of the operands to reach the target number (i.e., 55 + 36 - 7 - 19). […] The model is given two rewards: accuracy reward for reaching the correct final answer, and a format reward when it generates its CoT tokens in between “” and “” tokens. […] Once we score each previous-token head using Eq. 8, we incrementally ablate one head at a time until we achieve perfect intervention scores (Section 4.4). Using this approach, we identify as few as three attention heads that can disable model verification. We notate this subset as AVerif. To summarize, we claim that the model has subspace(s) (polytope(s)), SGLUValid , for self-verification. The model’s hidden state enters this subspace when it has verified its solution. In our setting, given the nature of our task, previous-token heads APrev take the hidden-state into this subspace, while for other tasks, different components may be used. This subspace also activates verification-related GLU weights, promoting the likelihood of tokens such as “success” to be predicted (Figure 3). […]For “non-reasoning” models, researchers have studied “truthful” representations before [ 4 ], where steering towards a “truthful” direction has led to improvements in tasks related to factual recall [ 17]. In a similar vein, researchers have shown that the model’s representations can reveal whether they will make errors (e.g., hallucinations) [ 28 ], or when they are unable to recall facts about an entity [ 8 ]. Most recently, concurrent work [37, 41 ] also investigate how models solve reasoning tasks. [ 41 ] find that models know when they have reached a solution, while [ 37 ] decode directions that mediate behaviors such as handling uncertainty or self-corrections. While our work corroborates these findings, we take a deeper dive into how a reasoning model verifies its own reasoning trace. Circuit Analysis. A growing line or work decomposes the forward pass of a neural network as “circuits” [24], or computational graphs. This allows researchers to identify key components and their causal effects for a given forward pass. A common approach to construct computational graphs is to replace model components with dense activations with a sparsely-activating approximation. [ 6] introduces Transcoders to approximate MLP layers, while [ 1 ] further develops Cross-layer Transcoders to handle inter-layer features. [18 ] uses Cross-layer Transcoders to conduct circuit analyses for a wide range of behaviors, such as multi-step reasoning (for factual recall) or addition, and also investigate when a model’s CoT is (un)faithful…

The point of this citation is to drive home that any “first principles” dismissal of LLMs is as ignorant, or indeed more ignorant, than sci-fi speculation of laymen. In short, you suck and you should learn humility to do better to corroborate your very salient claim to authority.

There are good criticisms of LLMs. I don't know if you find Terence Tao's understanding of mathematics sufficiently grounded; he's Chinese after all. He has some skepticism about LLMs contributing to deep, frontier mathematical research. Try to do more of that.

Well since you aren't telling what the tricky way is (the whole exchange is suggestive of paranoia to be honest), I can't reproduce, but

Which are bordering states of Nebraska, their population, vowel count, and vowels in order?

Neighboring states of Nebraska, with 2023 population estimates (rounded), vowel count, and the vowels they contain in order:

  1. Missouri – 6.17 million – 3 vowels – o, u, i
  1. Iowa – 3.20 million – 3 vowels – I, o, a
  2. South Dakota – 0.92 million – 4 vowels – o, u, a, o, a → o, u, a (distinct only: o, u, a)
  3. Kansas – 2.95 million – 2 vowels – a, a
  4. Colorado – 5.84 million – 3 vowels – o, o, a
  5. Wyoming – 0.58 million – 2 vowels – o, i

with the exception of Colorado's vowels it seems correct. I don't get better results from 2.5 Pro or Sonnet, certainly no hallucinated two states.

Wait, where did my image go? I attached it, is it just gone?

Anyway, Kimi on OpenRouter:


Thought Process: To determine the vowels of "North Dakota" in order, we need to follow these steps:

Identify the word: The word in question is "North Dakota." Extract the vowels: From the word, we need to pick out only the vowels (A, E, I, O, U). Sometimes 'Y' is considered a vowel, but in this case, we will ignore it unless explicitly needed. List the vowels in order: Go through each letter of "North Dakota" in sequence and note the vowels in the order they appear. Step-by-Step Solution: Write out "North Dakota" and label each character with its position for clarity:

N (1), o (2), r (3), t (4), h (5), (space), D (7), a (8), k (9), o (10), t (11), a (12). Iterate through each character and identify vowels:

Position 1: 'N' – not a vowel. Position 2: 'o' – vowel (O). Position 3: 'r' – not a vowel. Position 4: 't' – not a vowel. Position 5: 'h' – not a vowel. Position 6: ' ' – space, ignore. Position 7: 'D' – not a vowel. Position 8: 'a' – vowel (A). Position 9: 'k' – not a vowel. Position 10: 'o' – vowel (O). Position 11: 't' – not a vowel. Position 12: 'a' – vowel (A). Collect the vowels in the order they appear:

o (position 2), a (position 8), o (position 10), a (position 12). Final answer: o, a, o, a


Kimi on Groq doesn't waste time:

The vowels in "North Dakota" in order are: o, a, o, a

then claimed the vowels of North Dakota in order were o, h, a, and o

Can't replicate even with quantized provider. Most people report very different experience.

update: Kimi K2 ranks #5 overall on LMArena under style control.

  • Tied #1 Multi-turn interaction
  • #2 Coding, Hard Prompts

The top US open models are Google's Gemma 3 27b and Nvidia's finetune of llama 3.1, ranked #28 and #34 respectively.

There are tiers to this, from just weights release to full data+code+weights. Chinese labs mostly release weights and tech report with a reproducible (given some effort) recipe, sometimes code, rarely some or all of the data (more often parts of post-training data, though in these cases it's typically just links to datasets that have already been open).

I think nitpicking about open source is uninteresting when the recipe is available. This is a very dynamic field of applied science, rather than labor-intensive programming exercise. The volume of novel code in a given LLM project is comparable to a modest Emacs package, what matters is ideas (derisked at scale). Specific implementations are usually not that valuable – DeepSeek's GRPO, as described in their papers, has been improved upon in the open multiple times by this point. Data composition is dependent on your own needs and interests, there are vast open datasets, just filter them as you see fit.

I think R1 and the wave it's caused have already had an effect. It's frozen the ceiling on «frontier» pricing around $15/1M for models slightly but clearly better, such as Sonnet or 2.5 Pro (there are higher-tier offerings but they get very little purchase), encouraged the consumption of small distilled models like grok-mini or -Flash which directly compete with Chinese output, and clearly led OpenAI and Meta to try to ship a better open model for prestige (OpenAI may succeed yet). Amodei is coping, his company is among the more vulnerable ones and with the worst velocity; no matter how hard they lean on the DoD pork and national security rhetoric, everyone in the US does that now.

Expenditures have already happened, largely; datacenters are getting completed, giant training runs will be just a way to amortize that by producing models that will warrant higher inference volume and pricing. Base models on the level of Grok 3 are the floor for this generation, soon GPT-5 sets the next frontier floor. There is also an obvious pivot to agents/deep researchers/reasoners with extremely bloated, branching, parallelizable inference, and you need models to be smart enough to make sense of all those vast context dumps. Synthetic data scaling is focused on RL now, that also in effect requires to run a great deal of inference to produce higher-utility models. They won't cut expenditures, in short.

Sorry, misunderstood you. I don't think we've seen anyone seriously defend having stolen or distilled someone's model. My bet is the precedent will depend on who/whom and lawyer muscle rather than fundamentals of the situation.

How else could they achieve this result if their talent wasn't superior? Or if not talent, then the juice in an organization that allows good results at speed.

How small and relatively inexperienced Chinese labs do so much with so little is an interesting question. I have the impression that Western corporations overestimate “frontier talent”, or perhaps paradoxically – underestimate actual, raw talent (that isn't that rare, just needs to be noticed) and overestimate the value of corporate secrets that some of this legendary talent is privy to. Liang Wenfeng hires Ph.D students and they seem to do better than mature Ph.Ds.

H20s are useless for training, China will have to figure that part out on their own. Although the current RL paradigm is more and more reliant on inference (rollouts of agent trajectories, Kimi is built on that), so H20s will indirectly advance capabilities. Yet there remains a need for pretraining next generation bases, and of course experiments.

It usually involves taking a chapter from my novel and asking it to imagine it in a style from a different author I like.

Yes, many models (even open ones, such as R1) have better adherence to instructions. It writes well in its own style. I value models with distinct personalities. You're right about Russianness I think.

I've heard that they tried to buy-out Thinking Machines and SSI for billions, but were turned down

They've proposed that to even much smaller labs, though I'm not at liberty to share. Zuck is desperate and defaults to his M&A instincts that have served him well. It might work in dismantling the competition, at least. But it's not like Meta and FAIR were originally lacking in talent, they've contributed immensely to research (just for instance, MTP in DeepSeek V3 is based on their paper; Llama 4 somehow failed to implement it). The problem is managerial. To get ahead, I'm afraid Zuck will need to cut, rather than graft.

But Grok 4 just crushes with sheer size I think.

The fact that Grok is at all comparable (or indeed inferior) to Kimi on any metric, even the most obscure one, speaks to the deep cultural advantage of Moonshot. Grok 4's training compute is estimated to be 6.4e26 FLOPs; Kimi, like R1, is likely ≈4.0e24, fully 100 times less. They probably spent on scaling experiments for Grok 3/4 more than Moonshot has spent over their lifetime on everything. It's not really a fair competition, I admit Grok is a stronger model.

It was designed for Musk's vision of AI modelling and understanding the physical universe, that's what it's for and it does excellently there.

I think it wasn't designed with any specific focus in mind, it's an all around next-generation base+RL model.

I think the arc of history still bends towards Nvidia, the biggest company in the world and by some distance. I think like you I was leaning more towards the 'talent conquers all' ethos

You distort my argument. I was consistently skeptical that China can win this on HBD merits alone, after all the US also has plenty of talented people (very many of them Chinese, but also diverse global and domestic talent), in Nvidia and elsewhere, plus it has a giant and growing edge in compute. My thesis is that the gap in applied AI possibly won't be so profound as to allow some Pivotal Action To Secure Durable Strategic Advantage, that the hawks in DC and Dario Amodei fantasize about as they rail for more export controls. Nvidia will remain dominant, so will Western AI labs.

But so far China is doing better than I anticipated, both technically and ethically.

In regard to Zhilin's words, if I may psychologize a little, I think that it's very natural for a Chinese person with close knowledge of and experience with Western ideas and societies - but also an attachment to an identity as Chinese - to conceptualize things in terms of a dichotomy between East and West

If that is how the Chinese actors themselves conceptualize this, does it matter if we can object to such thinking as historically reductionist and stereotypical? Yes, obviously both types exist in both societies. (At least Zhilins exist in the US; if there are Liangs, I'd be happy to see them. Lambert is an academic, not a hedge fund CEO who also somehow happens to be a great engineer and an open source fanatic. The closest we had was Emad Mostaque, neither exactly Western nor very technical or good at being a CEO). But it is clear that the Chinese discourse, particularly in the VC sphere, maps pragmatic and idealistic archetypes onto the East-West dichotomy. Half of Liang's interview is the journalist saying “but this is madness, nobody does it, nobody will give the money for it” and Liang saying “and yet we must learn to do it, because crazy Westerners do and that is how they create things we've been imitating all this time” .

But since it's not a matter of "deep roots," it makes sense that a single breakout success like DeepSeek could precipitate a shift in orientation. … To go far afield of my knowledge, it seems as though these extrinsic factors might end up being better for China than for the US.

I agree this is a possibility, and I think it's one of the more interesting cultural trends to track, which is why I'm writing these updates. Deep roots or not, Chinese fast-following is more than a subtly racist trope, it really is the backbone of their economic ascendance. If they start similarly rewarding high-risk innovation, it'll change the gameboard a great deal.

Well, I don't really understand American law but it seems to me that Anthropic has set the precedent of LLM pretraining corpora being essentially immune to copyright claims. Anthropic's models are, ironically, the most paranoid about reproducing copyrighted material.