
Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

Transnational Thursday is a thread for people to discuss international news, foreign policy or international relations history. Feel free as well to drop in with coverage of countries you’re interested in, talk about ongoing dynamics like the wars in Israel or Ukraine, or even just whatever you’re reading.

The Wednesday Wellness threads are meant to encourage users to ask for and provide advice and motivation to improve their lives. It isn't intended as a 'containment thread' and any content which could go here could instead be posted in its own thread. You could post:

  • Requests for advice and / or encouragement. On basically any topic and for any scale of problem.

  • Updates to let us know how you are doing. This provides valuable feedback on past advice / encouragement and will hopefully make people feel a little more motivated to follow through. If you want to be reminded to post your update, see the post titled 'update reminders', below.

  • Advice. This can be in response to a request for advice or just something that you think could be generally useful for many people here.

  • Encouragement. Probably best directed at specific users, but if you feel like just encouraging people in general I don't think anyone is going to object. I don't think I really need to say this, but just to be clear; encouragement should have a generally positive tone and not shame people (if people feel that shame might be an effective tool for motivating people, please discuss this so we can form a group consensus on how to use it rather than just trying it).


At the risk of doxxing myself, I have an advanced degree in Applied Mathematics. I have authored and contributed to multiple published papers, and hold a US patent, all related to the use of machine learning in robotics and digital signal processing. I am currently employed as a supervising engineer at a prominent tech company. For pseudonymity's sake I am not going to say which, but it is a name that you would recognize. I say this not to brag, but to establish some context for the following.

Imagine that you are someone who is deeply interested in space flight. You spend hours of your day thinking seriously about Orbital Mechanics and the implications of Relativity. One day you hear about a community devoted to discussing space travel and are excited at the prospect of participating. But when you get there, what you find is a Star Trek fan-forum that is far more interested in talking about the Heisenberg compensators on fictional warp-drives than in Hohmann transfers, thrust-to-Isp curves, or the effects of low gravity on human physiology. That has essentially been my experience trying to discuss "Artificial Intelligence" with the rationalist community.

However, at the behest of users such as @ArjinFerman and @07mk, and because X/Grok is once again in the news, I am going to take another stab at this.

Are "AI assistants" like Grok, Claude, Gemini, and DeepSeek intelligent?

I would say no, and in this post I am going to try to explain why, but to do so requires a discussion of what I think "intelligence" is and how LLMs work.

What is Intelligence?
People have been philosophizing on the nature of intelligence for millennia, but for the purposes of our exercise (and my work) "intelligence" is a combination of perceptivity and reactivity. That is to say, the ability to perceive or take in new and/or changing information combined with the ability to change state based on that information. Both are necessary, and neither is sufficient on its own. This is why Mathematicians and Computer Scientists often emphasize the use of terms like "Machine Learning" over "Artificial Intelligence", as an algorithm's behavior almost never exhibits both.

If this definition feels unintuitive, consider it in the context of the following example. What I am saying is that an orangutan who waits until the Zookeeper is absent to use a tool to force the lock on its enclosure is more "intelligent" than the insect that repeatedly throws itself against your kitchen window in an attempt to get outside. They share an identical goal (to get outside), but the orangutan has demonstrated the ability to both perceive obstacles (IE the lock and the Zookeeper) and react dynamically to them in a way that the insect has not. Now obviously these qualities exist on a spectrum (try to swat a fly and it will react), but the combination of these two parameters defines an axis along which we can evaluate both animals and algorithms, and as any good PM will tell you, the first step to solving any practical engineering problem is to identify your parameters.

Now the most common arguments for AI assistants like Grok being intelligent tend to be some variation on "Grok answered my question, ergo Grok is intelligent" or "Look at this paragraph Claude wrote, do you think you could do better?", but when evaluated against the above parameters, the ability to form grammatically correct sentences and the ability to answer questions are both orthogonal to those parameters. An orangutan and a moth may be equally incapable of writing a Substack, but I don't expect anyone here to seriously argue that they are equally intelligent. By the same token a pocket calculator can answer questions, "what is the square root of 529?" being one example of such, but we don't typically think of pocket calculators as being "intelligent", do we?

To me, these sorts of arguments betray a significant anthropomorphic bias. That bias being the assumption that anything a human finds complex or difficult must be computationally complex, and vice versa. The truth is often the inverse. This bias leads people who do not have a background in math or computer science to have completely unrealistic impressions of what sorts of things are easy or difficult for a machine to do. For example, vector and matrix operations are reasonably simple for a computer, yet many human students struggle with them. Meanwhile, bipedal locomotion is something most humans do without even thinking, despite it being more computationally complex and prone to error than computing a cross product.

Speaking of vector operations, let's talk about how LLMs work...

What are LLMs?
LLM stands for "Large Language Model". These models are a subset of artificial neural networks that use "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation) to encode a semantic token such as the word "cat" as an n-dimensional vector representing that token's relationship to the rest of the tokens in the training data. Now in actual practice these tokens can be anything, an image, an audio-clip, or a snippet of computer code, but for the purposes of this discussion I am going to assume that we are working with words/text. This process is referred to as "embedding" and what it does in effect is turn the word "cat" into something that a computer (or grad-student) can perform mathematical operations on. Any operation you might perform on a vector (addition, subtraction, transformation, matrix multiplication, etc...) can now be done on "cat".
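To make "turning a word into math" concrete, here is a minimal sketch with toy 4-dimensional vectors. The numbers are invented purely for illustration; a real model learns its own values and uses hundreds or thousands of dimensions:

```python
import numpy as np

# A toy embedding table: each token maps to an n-dimensional vector.
# The numbers are made up for illustration; a real model learns them.
embeddings = {
    "cat":    np.array([ 0.8, 0.1,  0.9, 0.2]),
    "feline": np.array([ 0.7, 0.2,  0.8, 0.1]),
    "car":    np.array([-0.5, 0.9, -0.3, 0.4]),
}

cat = embeddings["cat"]

# Once "cat" is a vector, ordinary linear algebra applies to it.
print(cat + embeddings["feline"])       # vector addition
print(cat * 2.0)                        # scaling
print(np.dot(cat, embeddings["car"]))   # dot product with another token
```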

Now because these vectors represent the relationship of the tokens to each other, words (and combinations of words) that have similar meanings will have vectors that are directionally aligned with each other. This has all sorts of interesting implications. For instance you can compute the dot product of two embedded vectors to determine whether their words are synonyms, antonyms, or unrelated. This also allows you to do fun things like approximate the vector "cat" using the sum of the vectors "carnivorous", "quadruped", "mammal", and "feline", or subtract the vector "legs" from the vector "reptile" to find an approximation for the vector "snake". Please keep this concept of "directionality" in mind as it is important to understanding how LLMs behave, and it will come up later.
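Continuing the toy example, the directional comparison is usually done with cosine similarity (a normalized dot product), and the word-arithmetic tricks look roughly like this (again with invented numbers rather than a real trained model):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way", 0 means unrelated, -1 means opposite.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, invented for illustration.
vec = {
    "cat":         np.array([0.8, 0.1, 0.9, 0.2]),
    "carnivorous": np.array([0.3, 0.0, 0.2, 0.1]),
    "quadruped":   np.array([0.2, 0.1, 0.3, 0.0]),
    "mammal":      np.array([0.2, 0.0, 0.2, 0.1]),
    "feline":      np.array([0.1, 0.0, 0.2, 0.0]),
}

# Approximate "cat" as the sum of its properties and check the alignment.
approx_cat = vec["carnivorous"] + vec["quadruped"] + vec["mammal"] + vec["feline"]
print(cosine_similarity(approx_cat, vec["cat"]))  # ~1.0 (the toy numbers were built to add up)
```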

It should come as no surprise that some of the pioneers of this methodology were also the brains behind Google Translate. You can basically take the embedded vector for "cat" from your English language model and pass it to your Spanish language model to find the vector "gato". Furthermore, because all you are really doing is summing and comparing vectors, you can do things like sum the vector "gato" in the Spanish model with the vector for the diminutive "-ito" and then pass it back to the English model to find the vector "kitten".
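The standard trick for "passing" a vector between two separately trained embedding spaces is to learn a linear map between them from a small dictionary of known word pairs. This is a sketch of that classic idea (a least-squares translation matrix), not a claim about the actual Google Translate pipeline, and the data here is random placeholder rather than real embeddings:

```python
import numpy as np

# Pretend we have embeddings for the same words in two separately trained
# spaces: English (en) and Spanish (es). Shapes: (num_word_pairs, dim).
rng = np.random.default_rng(0)
en_vectors = rng.normal(size=(1000, 50))   # e.g. "cat", "dog", ...
true_map   = rng.normal(size=(50, 50))     # stand-in for the real relationship
es_vectors = en_vectors @ true_map         # e.g. "gato", "perro", ...

# Learn a linear map W that sends English vectors to Spanish vectors,
# by least squares over the known translation pairs.
W, *_ = np.linalg.lstsq(en_vectors, es_vectors, rcond=None)

# To "translate" a new English vector, apply the map, then find the nearest
# Spanish vector (nearest-neighbour lookup omitted here).
new_en = rng.normal(size=(50,))
predicted_es = new_en @ W
```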

Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
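As a rough illustration of that interface-layer loop, greatly simplified and with a hypothetical stand-in function where the real model's forward pass would be:

```python
import numpy as np

def next_token_distribution(tokens):
    # Stand-in for the real model: given the tokens so far, return a
    # probability for every word in a (tiny) vocabulary. In a real system
    # this is the LLM forward pass over the embedded context.
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    logits = np.random.default_rng(len(tokens)).normal(size=len(vocab))
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax
    return vocab, probs

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        vocab, probs = next_token_distribution(tokens)
        # Append the most probable next word (real assistants usually sample).
        tokens.append(vocab[int(np.argmax(probs))])
    return tokens

print(generate(["the", "cat"]))
```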

Our Villain Lorem Epsom, and the Hallucination Problem
I've linked the YouTube video Badness = 0 a few times in prior discussions of AI as I find it to be both a solid introduction to LLMs for the lay-person, and an entertaining illustration of how anthropomorphic bias can cripple the discussion of "alignment". In it the author (who is a professor of Computer Science at Carnegie Mellon) posits a semi-demonic figure (akin to Scott Alexander's Moloch) named Lorem Epsom. The name is a play on the term Lorem Ipsum and represents the prioritization of appearance over all else. When it comes to writing, Lorem Epsom doesn't care about anything except filling the page with text that looks correct. Lorem Epsom is the kind of guy who, if you tell him that he made a mistake in the math, is liable to interpret that as a personal attack. The ideas of "accuracy", "logic", "rigor", and "objective reality" are things that Lorem Epsom has heard of but that do not concern him. It is very possible that you have had to deal with someone like Lorem Epsom in your life (I know I have); now think back and ask yourself: how did that go?

I bring up Lorem Epsom because I think that understanding him provides some insight into why certain sorts of people are so easily fooled/taken in by AI Assistants like Claude and Grok. As discussed in the section above on "What is Intelligence", the assumption that the ability to fill a page with text indicates the ability to perceive and react to a changing situation is an example of anthropomorphic bias. Because they are posing their question to a computer, a lot of people expect the answer they get to be something analogous to what they would get from a pocket calculator, rather than from Lorem Epsom.

Sometime circa 2014 I kicked off a heated dispute in the comment section of a LessWrong post by asking EY why a paperclip-maximizing AI that was capable of self-modification wouldn't just modify the number of paperclips in its memory. I was accused by him and a number of others of missing the point, but I think they missed mine. The assumption that an Artificial Intelligence would not only have a notion of "truth", but assign value to it, is another example of anthropomorphic bias. If you asked Lorem Epsom to maximize the number of paperclips, and he could theoretically "make" a billion-trillion paperclips simply by manipulating a few bits, why wouldn't he? It's so much easier than cutting and bending wire.

In order to align an AI to care about truth and accuracy, you first need a means of assessing and encoding truth, and it turns out that this is a very difficult problem within the context of LLMs, bordering on mathematically impossible. Do you recall how LLMs encode meaning as a direction in n-dimensional space? I told you it was going to come up again.

Directionally speaking we may be able to determine that "true" is an antonym of "false" by computing their dot product. But this is not the same thing as being able to evaluate whether a statement is true or false. As an example, "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply cannot matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion, because the "directionality" of one positive integer isn't all that different from any other. (This is why LLMs have such difficulty counting, if you were wondering.)
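You can see this for yourself with any off-the-shelf sentence-embedding model. A sketch using the sentence-transformers library and the small all-MiniLM-L6-v2 model (neither is mentioned above; they're just a convenient stand-in, assuming you have them installed):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# A small off-the-shelf embedding model; any similar model illustrates the point.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Mary has 2 children.",
    "Mary has 4 children.",
    "Mary has 1024 children.",
]
vectors = model.encode(sentences)

# The pairwise cosine similarities typically come out very close to 1.0:
# the embeddings mostly capture "Mary has some number of children", not the number.
print(cos_sim(vectors, vectors))
```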

In addition to difficulty with numbers there is the more fundamental issue that directionality does not encode reality. The directionality of the statement "Donald Trump is the 47th President of the United States", would be identical regardless of whether Donald Trump won or lost the 2024 election. Directionally speaking there is no difference between a "real" court case and a "fictitious" court case with identical details.

The idea that there is an ineffable difference between true statements and false statements, or between hallucination and imagination, is a wholly human conceit. Simply put, an LLM that doesn't "hallucinate" doesn't generate text or images at all. It's literally just a search engine with extra steps.

What does this have to do with intelligence?
Recall that I characterized intelligence as a combination of perceptivity and the ability to react/adapt. "AI assistants" as currently implemented struggle with both. This is partially because LLMs as currently implemented are largely static objects. They are neither able to take in new information, nor discard old. The information they have at time of embedding is the information they have. This imposes substantial loads on the context window of the interface layer, as any ability to "perceive" and subsequently "react" must happen within its boundaries. Increasing the size of the window is non-trivial, as the memory and FLOPS required grow roughly quadratically with the window size (every token in the window must attend to every other token). This is why we saw a sudden flurry of development following the release of Nvidia's multimodal framework, and it's mostly been marginal improvements since. The last significant development was in June of last year, when the folks at DeepSeek came up with some clever math to substantially reduce the size of the key-value cache, but constant-factor reductions are no match for quadratic growth.
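A back-of-the-envelope illustration of that scaling (plain self-attention, with made-up layer and head counts, and ignoring all the real-world optimizations):

```python
# Rough cost of vanilla self-attention as the context window grows.
# Each of the n tokens attends to every other token, so the attention-score
# matrix alone has n * n entries per layer per head.
def attention_score_entries(context_length: int, layers: int = 32, heads: int = 32) -> int:
    return context_length * context_length * layers * heads

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_score_entries(n):,} score entries")

# Growing the window 10x grows this term 100x; "clever math" like compressing
# the key-value cache shaves off a constant factor, not the quadratic itself.
```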

This limited context window, coupled with the human tendency to anthropomorphize things, is why AI Assistants sometimes appear "oblivious" or "naive" to the uninitiated, and why they seem to "double down" on mistakes. They cannot perceive something that they have not been explicitly prompted with, even if it is present in their training data. This limited context window is also why, if you actually try to play a game of chess with ChatGPT, it will forget the board-state and how pieces move after a few turns and promptly lose to a computer program written in 1976. Unlike a human player (or an Atari 2600 for that matter) your AI assistant can't just look at the board (or a representation of the board) and pick a move. This IMO places them solidly on the "insect" side of the perceptivity + reactivity spectrum.

Now there are some who have suggested that the context window problem can be solved by making the whole model less static, continuously updating and re-embedding tokens as the model runs, but I am skeptical that this would result in the sort of gains that AI boosters like Sam Altman claim. Not only would it be computationally prohibitive to do at scale, but what experiments there have been with self-updating language models (or at least those I am aware of) have quickly spun away into nonsense, for reasons described in the section on Lorem Epsom: barring some novel breakthrough in the embedding/tokenization process, there is no real way to keep hallucinations and spurious inputs from rapidly overtaking everything else.

It is already widely acknowledged amongst AI researchers and developers that the LLM-based architecture being pushed by OpenAI and DeepSeek is particularly ill-suited for any application where accuracy and/or autonomy are core concerns, and it seems to me that this is unlikely to change without a complete ground-up redesign from first principles.

In conclusion, it is for the reasons above and many others that I do not believe that "AI Assistants" like Grok, Claude, and Gemini represent a viable path towards a "True AGI" along the lines of Skynet or Mr. Data, and if asked "which is smarter, Grok, Claude, Gemini, or an orangutan?" I am going to pick the orangutan every time.


This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.

Post your project, your progress from last week, and what you hope to accomplish this week.

If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service

Hi folks,

Recorded this interview with Trace at Manifest last month. We talked about evolving cultural dynamics online, reforming the Democratic Party, and how small groups of people can have disproportionate influence on public policy. Also discussed is the impact of places like TheMotte, both as a crucible for ideas and as a training ground for future writers and leaders.

Given Trace's prominence and contentiousness here, I hope it might be of interest. Look forward to hearing what people think, and perhaps sparking some discussion. I've highlighted one point of disagreement I have with his ideas [thusly] in the transcript.

The video, Spotify/Apple Podcast links, and a full 'Patio11-style' transcript are all available here: https://alethios.substack.com/p/with-tracingwoodgrains-journalism

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.



This is another periodic update on the state of open-source AI, a series which started here a year and a day ago, when I said of DeepSeek, then a relatively obscure group:

I would like to know who's charting their course, because they're single-handedly redeeming my opinion of the Chinese AI ecosystem and frankly Chinese culture… This might not change much. Western closed AI compute moat continues to deepen, DeepSeek/High-Flyer don't have any apparent privileged access to domestic chips, and other Chinese groups have friends in the Standing Committee and in the industry, so realistically this will be a blip on the radar of history.

The chip situation is roughly stable. But Chinese culture, with regard to AI, has changed a bit since then.

On July 11, Moonshot AI (mostly synonymous with the Kimi research group, Kimi being the founder's nickname) released base and instruct weights of Kimi K2, the first Chinese LLM to unambiguously surpass DeepSeek's best. Right now it's going toe to toe with Grok 4 in tokens served via OpenRouter, with providers jumping at the chance to host it; it has just been added to Groq, getting near 300 t/s. It is promoted singularly as an “agentic backbone”, a drop-in replacement for Claude Sonnet 4 in software engineering pipelines, and seems to have been trained primarily for that, but it challenges the strongest Western models, including reasoners, on some unexpected soft metrics, such as topping EQ-bench and creative writing evals (corroborated here). Performance scores aside, people concur that it has a genuinely different “feel” from every other LLM, especially from the other Chinese runners-up, who all try to outdo DeepSeek on math/code proficiency for bragging rights. Its writing is terse, dense, and virtually devoid of sycophancy and recognizable LLM slop. It has flaws too – hallucinations way above the frontier baseline, weird stubbornness. Obviously, try it yourself. As Nathan Lambert from Allen AI remarks,

The gap between the leading open models from the Western research labs versus their Chinese counterparts is only increasing in magnitude. The best open model from an American company is, maybe, Llama-4-Maverick? Three Chinese organizations have released obviously more useful models with more permissive licenses: DeepSeek, Moonshot AI, and Qwen. A few others such as Tencent, Minimax, Z.ai/THUDM may have Llama-4 beat too

(As an aside: in the comments to my first post, people were challenging my skepticism about the significance of Chinese open models by pointing to Llama-405B, but I've been vindicated beyond my worst expectations – the whole LLaMA project has ended in a fiasco, with deep leadership ineptitude and sophomoric training mistakes, and is now apparently being curtailed, as Zuck tries to humiliatingly pay his way to relevance with $300M offers to talent at other labs and several multigigawatt-scale clusters. Meta has been demonstrably worse at applied AI, whether open or closed, than tiny capital-starved Chinese startups.)

But I want to talk a bit about the cultural and human dimension.

Moonshot AI is of a similar scale (≈200 people) and was founded at the same time, but in many ways it is an antipode to DeepSeek, and much more in line with a typical Chinese success story. Their CEO is Yang Zhilin, a young serial entrepreneur and well-credentialed researcher who returned from the US (graduated from Tsinghua, where he later became an Assistant Professor; Computer Science Ph.D. from Carnegie Mellon; worked at Google Brain and Meta). DeepSeek's Liang Wenfeng is dramatically lower-class, the son of primary school teachers in a fifth-tier town, never went beyond a Master's in Engineering from Zhejiang University, and for the longest time was accumulating capital with the hedge fund he built with friends. In 2023-2024, soon after founding their startups, both gave interviews. Yang's was mostly technical, but it included bits like these:

Of course, I want to do AGI. This is the only meaningful thing to do in the next 10 years. But it's not like we aren't doing applications. Or rather, we shouldn't define it as an "application". "Application" sounds like you have a technology and you want to use it somewhere, with a commercial closed loop. But "application" is inaccurate. It's complementary to AGI. It's a means to achieve AGI and also the purpose of achieving AGI. "Application" sounds more like a goal: I want to make it useful. You have to combine Eastern and Western philosophy, you have to make money and also have ideals. […] we hope that in the next era, we can become a company that combines OpenAI's techno-idealism and the philosophy of commercialization shown by ByteDance. The Oriental utilitarianism has some merits. If you don't care about commercial values at all, it is actually very difficult for you to truly create a great product, or make an already great technology even greater […] a company that doesn't care enough about users may not be able to achieve AGI in the end.

Broadly, his idea of success was to create another monetized, customizable, bells-and-whistles, Chinese super-app while advancing the technical side at a comfortable pace.

Liang's, in contrast, was almost aggressively non-pragmatic and dismissive of the application layer:

We're going to do AGI. […] We won't prematurely focus on building applications on top of models. We will focus on large models. […] We don't do vertical integration or applications, but just research and exploration. […] It's driven by curiosity. From a distance, we want to test some conjectures. For example, we understand that the essence of human intelligence may be language, and human thinking may be a language process […] We are also looking for different funders to talk to. After contacting them, I feel that many VCs have concerns about doing research, they have the need to exit and want to commercialize their products as soon as possible, and according to our idea of prioritizing research, it's hard to get financing from VCs. […] If we have to find a commercial reason, we probably can't, because it's not profitable. […] Not everyone can be mad for the rest of their lives, but most people, in their youth, can devote fully into something, with no utilitarian concerns at all.

After the release of V2, he seems to have also developed some Messianic ideas of “showing the way” to his fellow utilitarian Orientals:

It is the kind of innovation that just happens every day in the US. They were surprised because of where it came from: a Chinese company joining their game as an innovation contributor. After all, most Chinese companies are used to following, not innovating. […] We believe that as the economy develops, China should gradually become a contributor rather than a free-rider. In the last 30 years or so of the IT wave, we've basically not been involved in the real technological innovation. […] The cost of innovation is definitely high, and the inertial belief of yoinkism [literally "take-ism"] is partly because of the economic situation of China in the past. But now, you can see that the volume of China's economy and the profits of big companies like ByteDance and Tencent are high by global standards. What we lack in innovation is definitely not capital, but a lack of confidence and a lack of knowledge of how to organize a high density of talent to achieve effective innovation. […] For technologists, being followed is a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one. To give is to receive glory. And if a company does this, it would create a cultural attraction [to technologists]. […] There will be more and more hardcore innovation in the future. It may not yet be easily understood, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.

They've been rewarded according to their credentials and vision. Moonshot was one of the nationally recognized “Six AI tigers” and received funding from Alibaba, Sequoia Capital China, Tencent and others. By Sep-Nov 2024 they were spending on the order of ¥200 million per month on ads and traffic acquisition (to the point of developing a bad rep with tech-savvy Chinese users), and served the Kimi Assistant, kinda decent at the time, whose selling point was long-context support for processing documents and such. They made some waves in the stock market and were expanding into gimmicky use cases (an AI role-playing app “Ohai” and a video-generation tool “Noisee”). By June 2024 Kimi was the most-used AI app in China (≈22.8 million monthly visits). Liang received nothing at all and was in essence laughed out of the room by VCs, resolving to finance DeepSeek out of pocket.

Then, all of a sudden, R1 happened. Nvidia stock tumbled, non-tech people up to the level of Trump started talking about DeepSeek in public, Liang even got a handshake from the Supreme Leader, and DeepSeek's daily active users (despite the half-baked app that still hasn't implemented a breaking space on the keyboard) surged to 17x Moonshot's.

Now that Kimi K2 is out, we have a post-mortem from one of those 200 “cogs” on what happened next.

[…] 3. Why Open Source #1: Reputation. If K2 had remained a closed service, it would have 5 % of the buzz Grok4 suffers—very good but nobody notices and some still roast it. #2: Community velocity. Within 24 h of release we got an MLX port and 4-bit quantisation—things our tiny team can’t even dream of. #3: It sets a higher technical bar. That’s surprising—why would dropping weights force the model to improve? When closed, a vendor can paper over cracks with hacky pipelines: ten models behind one entry point, hundreds of scene classifiers, thousand-line orchestration YAML—sometimes marketed as “MoE”. Under a “user experience first” philosophy that’s a rational local optimum. But it’s not AGI. Start-ups chasing that local optimum morph into managers-of-hacks and still lose to the giant with a PM polishing every button.
Kimi the start-up cannot win that game. Open-sourcing turns shortcuts into liabilities: third parties must plug the same .safetensors into run_py() and get the paper numbers. You’re forced to make the model itself solid; the gimmicks die. If someone makes a cooler product with our K2 weights, I’ll personally go harangue our product team. […] Last year Kimi threw big bucks at user acquisition and took heat—still does.
I’m just a code-monkey; insider intent is above my pay grade. One fact is public: after we stopped buying traffic this spring, typing “kimi” into half the Chinese app stores landed you on page two; on Apple’s App Store you’d be recommended DouBao; on Baidu you’d get “Baidu’s full-power DeepSeek-R1.” Net environment, already hostile, got worse. Kimi never turned ads back on. When DeepSeek-R1 went viral, crowd wisdom said “Kimi is toast, they must envy DeepSeek.” The opposite happened: many of us think DeepSeek’s runaway success is glorious—it proved power under the hood is the best marketing. The path we bet on works, and works grandly. Only regret: we weren’t the ones who walked it. At an internal retrospective meeting I proposed some drastic moves. Zhilin ended up taking more drastic ones: no more K1.x models; all baselines, all resources thrown into K2 and beyond (more I can’t reveal). Some say “Kimi should drop pre-training and pivot to Agent products.” Most Agent products die the minute Claude cuts them off. Windsurf just proved that. 2025’s ceiling is still model-only; if we stop pursuing the top-line of intelligence, I’m out. AGI is a razor-thin wire—hesitation means failure. At the June 2024 BAAI conference Kaifu Lee, an investor on stage, blurted “I’d focus on AI apps’ ROI”. My gut: that company’s doomed. I can list countless flaws in Kimi K2; never have I craved K3 as much as now.

…Technologically it's just a wider DS-V3, down to model type in the configs. They have humbly adopted the architecture:

Before we spun up training for K2, we ran a pile of scaling experiments on architectural variants. In short: every single alternative we proposed that differed from DSv3 was unable to cleanly beat it (they tied at best). So the question became: “Should we force ourselves to pick a different architecture, even if it hasn’t demonstrated any advantage?” Eventually the answer was no.

Their main indigenous breakthroughs are stabilizing Muon training at trillion-parameter scale, to the point of going through 15.5 trillion tokens with zero loss spikes (prior successes that we know of were limited to scales OOMs smaller), and some artisanal data-generation loop. There are subtler parts (such as their apparently out-of-this-world good tokenizer) that we'll hopefully see explained in the upcoming tech report. They also have more explicitly innovative architectural solutions that they decided against using this time.
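For readers who haven't followed it: Muon is an optimizer that applies momentum to the gradient of each weight matrix and then orthogonalizes the resulting update direction with a Newton-Schulz iteration. Here is a minimal sketch of the publicly described base Muon update, not Moonshot's stabilized trillion-parameter-scale variant, and simplified to a single 2-D weight matrix:

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately replace G with the nearest semi-orthogonal matrix
    # (the U V^T factor of its SVD) via a quintic Newton-Schulz iteration.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    # One simplified Muon update: momentum on the raw gradient,
    # then orthogonalize the update direction before applying it.
    momentum_buf.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```

The interesting part of K2, per the team, is keeping this scheme stable over 15.5T tokens at trillion-parameter scale; the sketch above only shows the basic mechanism being stabilized.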

A number of other labs have been similarly inspired by Liang's vision: Minimax's CEO committed to open-sourcing in the same style, releasing two potent models, while Qwen, Tencent, Baidu, Zhipu, Huawei, and ByteDance have also shifted to DeepSeek's architecture and methods, with all but ByteDance sharing their best or at least second-best LLMs. Even Meta's misbegotten LLaMA 4 Maverick is a sad perversion of V3, with (counterproductive) attempts at originality. But so far only Kimi has clearly surpassed the inspiration.

One more note on culture. Despite Zhilin's defenses of the “Oriental” mentality that Liang challenges, he has built a very hip lab, almost comically Anglo-American in its aesthetics. “We're a team of scientists who love rock (Radiohead, Pink Floyd) and film (Tarantino, Kubrick).” Their name is a nod to Dark Side of the Moon, their meeting rooms are all labeled with albums of iconic Western rock groups, and their app version annotations are quotes from Western thinkers.

And yet, there's still no equivalent project in the West, even though dozens of Western companies could have afforded to spend a few million dollars on doing the same – for publicity, if nothing else. With Meta out, the strongest (and still mostly useless) Western open models come from literally Google, for whom it's negligible effort, and once in a blue moon from Mistral. Sam Altman has been hyping his own open model forever as the American frontier labs' definitive answer to those pesky Chinese; people in the know say it's already gone to providers… but on the day of K2's release Sam announced that it's getting postponed again, apparently because they've somehow fumbled it (“due to some (frankly absurd) reason I can’t say … it's worse than mechahitler”) and they intend a retrain. It's highly likely that no matter what OpenAI releases in the end, wary of filling in their own moat or getting liability exposure, DeepSeek's V4/R2 or Kimi K3 or GLM-whatever will have more consequence, and this will continue happening.

This is a fairly strange situation. But I am pretty satisfied with my prediction, made in a now-deleted (due to “AI slop!” outrage) post coauthored with R1, that DeepSeek may become not a transient outlier but a herald and a catalyst of a cultural change in the Chinese AI ecosystem and, probably, beyond it. When Liang said “After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process” – he knew what he was going to unleash by establishing new facts. Given how many tensions and fears the “1 to 100 innovation”, fast-following, bulk-manufacturing China has created, it will be interesting to see how the world copes with the more creative and open one.

P.S. The first paragraphs of K2's and Sonnet's reviews of this post, to give a taste of the difference in presentation.

Sonnet:

This is a fascinating analysis that captures what feels like a genuine inflection point in global AI development. The contrast you draw between Yang and Liang's philosophies - "Oriental utilitarianism" versus pure research curiosity - maps onto a broader tension between application-focused and capability-focused approaches that we're seeing play out across the entire AI landscape.

K2:

This is the cleanest, most honest field-report I’ve read in a while. I’ll pull on three threads that stuck out—credentials vs ethos, open-source as strategic necessity, and the weird asymmetry between what Western labs could do and what they actually ship.
