
Culture War Roundup for the week of July 14, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Periodic Open-Source AI Update: Kimi K2 and China's Cultural Shift

(yes yes another post about AI, sorry about that). Link above is to the standalone thread, to not clutter this one.

Two days ago, the small Chinese startup Moonshot AI released the weights of the base and instruct versions of Kimi K2, the first open (and probably closed too) Chinese LLM to clearly surpass DeepSeek's efforts. It's roughly comparable to Claude Sonnet 4 without thinking (pay no mind to the horde of reasoners at the top of the leaderboard; reasoning is a cheap-ish capability extension and doesn't convey the experience, though it is relevant to utility). It's a primarily agentic non-reasoner, somehow exceptionally good at creative writing, and offers a distinct "slop-free", disagreeable but pretty fun conversation, with the downside of hallucinations. It adopts DeepSeek-V3’s architecture wholesale (literally "modeling_deepseek.DeepseekV3ForCausalLM") and, with a number of tricks, gets maybe 2-3 times as much effective compute out of the same allowance of GPU-hours; the rest we don't know yet, because they've just finished a six-month marathon and don't have a tech report.
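
The architecture-reuse claim is easy to check from the released config alone. A minimal sketch with Hugging Face transformers, assuming the instruct weights live under the moonshotai/Kimi-K2-Instruct repo (adjust the id if they don't):

```python
# Hedged sketch: inspect the released config rather than downloading the full weights.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("moonshotai/Kimi-K2-Instruct", trust_remote_code=True)
# If the claim above holds, the architectures field should name DeepSeek-V3's causal LM class.
print(cfg.architectures)
```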

I posit that this follows a cultural shift in China’s AI ecosystem that I've been chronicling for a while, and provides a nice illustration by contrast. Moonshot and DeepSeek were founded at the same time and have near-identical scale and resources, but were built on different visions. DeepSeek’s Liang Wenfeng (a hedge fund CEO with a Master's in engineering, idealist, open-source advocate) couldn't procure funding in the Chinese VC world with his inane pitch of “long-termist AGI research driven by curiosity” or whatever. Moonshot’s Yang Zhilin (Carnegie Mellon Ph.D., serial entrepreneur, pragmatist) succeeded at that task, reached a peak $3.3 billion valuation with the help of Alibaba and Sequoia, and spent heavily on ads and traffic acquisition throughout 2024, building the nucleus of another super-app with chatbot companions, assistants and such trivialities at a comfortable pace. However, DeepSeek R1, on the merit of a vastly stronger model, was a breakout success that redefined the Chinese AI scene, making people question the point of startups like Kimi. Post-R1, Zhilin pivoted hard to prioritize R&D spending and core model quality over apps, adopting open weights as a forcing function for basic progress. This seems to have inspired the technical staff: "Only regret: we weren’t the ones who walked [DeepSeek’s] path."

Other Chinese labs (Qwen, Minimax, Tencent, etc.) now also emulate this open, capability-focused strategy. Meanwhile, Western open-source efforts are even more disappointing than last year – Meta’s LLaMA 4 failed, OpenAI’s model is delayed again, and only Google/Mistral release sporadically, with no promises of competitive results.

This validates my [deleted] prediction: DeepSeek wasn’t an outlier but the first swallow and catalyst of China’s transition from fast-following to open innovation. I think Liang’s vision – "After hardcore innovators make a name, groupthink will change" – is unfolding, and this is a nice point to take stock of the situation.

Minor technical nitpick: I think the term "open source" carries a bunch of connotations which do not apply to LLMs. It is a bit like if I called a CC-BY-SA photograph "open source".

To the degree that LLMs are like traditional software, the source code -- the human-readable inputs which decide what a program does -- would be a neural network framework plus the training data (most of which is crawled/pirated rather than open source licensed).

Compiling would be the process of training.

In normal open source software, almost all of the effort goes into creating the codebase. Compilation is basically free, and you compile your code a zillion times in the process of building your codebase. With LLMs, training is really expensive. Nobody downloads your sources; everybody just takes your binary, the weights.

With a normal open source project, you can easily git clone the sources and compile. If you run into a problem or need the program to do something differently, you just edit the sources and compile again, and if you think your changes might be generally useful, you make a pull request upstream to start the process of getting them into the official version.

With LLMs what you git clone are giant inscrutable matrices. If you are really good, you might be able to tweak the weights a bit so that the LLM will talk about the Golden Gate Bridge all the time. But this is a gimmick, not a general improvement. If you want to actually make the model more useful for purposes you have in common with others, you need RLHF, which is computationally expensive again.
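
To make that concrete, here is a minimal sketch, entirely my own illustration rather than anything from the labs discussed here, of what "modifying" an open-weights model usually means in practice: you don't edit source, you attach small trainable adapters (LoRA, via the Hugging Face peft library) and then still need data and GPU time to train them. The small Qwen checkpoint is just a stand-in for whatever open-weights model you care about.

```python
# Hedged sketch: parameter-efficient fine-tuning as the practical analogue of "editing"
# an open-weights model. The model id is a stand-in; the actual training loop is omitted.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "Qwen/Qwen2.5-0.5B"  # any small open-weights model works for illustration
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Freeze the giant inscrutable matrices; train only low-rank adapters on top of them.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the total weights

# From here you still need a dataset and an ordinary training loop (e.g. the transformers
# Trainer) -- which is exactly the "computationally expensive again" part.
```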

This is an important difference between how traditional open source software interacts with the users and how "open source" LLMs interact with the users. I would thus propose to use the name "open weights" for LLMs, which carries none of the connotations of "users will contribute bug fixes".

I mean it's certainly possible to release your training code as well as the resulting weights for an LLM -- now I'm curious as to whether this company is actually doing that or not?

If not, agreed that "OS" is a big misnomer here -- there are certainly lots of individuals floating around who might like to train their own version of this and could afford to do so (FIRE startup retirees spring to mind) and "you can use our weights" is quite different from "you can try to make improvements on our process". More like free beer than free speech.

There are tiers to this, from just weights release to full data+code+weights. Chinese labs mostly release weights and tech report with a reproducible (given some effort) recipe, sometimes code, rarely some or all of the data (more often parts of post-training data, though in these cases it's typically just links to datasets that have already been open).

I think nitpicking about open source is uninteresting when the recipe is available. This is a very dynamic field of applied science rather than a labor-intensive programming exercise. The volume of novel code in a given LLM project is comparable to a modest Emacs package; what matters is ideas (derisked at scale). Specific implementations are usually not that valuable: DeepSeek's GRPO, as described in their papers, has been improved upon in the open multiple times by this point. Data composition depends on your own needs and interests; there are vast open datasets, just filter them as you see fit.
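
For the "just filter them" part, a minimal sketch with the Hugging Face datasets library; the corpus id and the toy length filter are my own illustrative choices, not anything from the Kimi or DeepSeek recipes:

```python
# Hedged sketch: stream an open web corpus and apply a crude quality filter.
from datasets import load_dataset

# Stream rather than download; FineWeb is just one example of a large open corpus.
ds = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

# Stand-in for a real quality filter: keep only longer documents.
filtered = ds.filter(lambda ex: len(ex["text"]) > 2000)

for i, example in enumerate(filtered.take(3)):
    print(i, example["text"][:120].replace("\n", " "))
```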

Can someone explain to me why these companies are open-sourcing their models? Developing/training this stuff seems enormously costly; what’s the business case for just giving it away?

I literally cite Kimi's own arguments for open source:

[…] 3. Why Open Source

#1: Reputation. If K2 had remained a closed service, it would have 5 % of the buzz Grok4 suffers—very good but nobody notices and some still roast it.

#2: Community velocity. Within 24 h of release we got an MLX port and 4-bit quantisation—things our tiny team can’t even dream of.

#3: It sets a higher technical bar. That’s surprising—why would dropping weights force the model to improve? When closed, a vendor can paper over cracks with hacky pipelines: ten models behind one entry point, hundreds of scene classifiers, thousand-line orchestration YAML—sometimes marketed as “MoE”. Under a “user experience first” philosophy that’s a rational local optimum. But it’s not AGI. Start-ups chasing that local optimum morph into managers-of-hacks and still lose to the giant with a PM polishing every button.

Kimi the start-up cannot win that game. Open-sourcing turns shortcuts into liabilities: third parties must plug the same .safetensors into run_py() and get the paper numbers. You’re forced to make the model itself solid; the gimmicks die. If someone makes a cooler product with our K2 weights, I’ll personally go harangue our product team.

DeepSeek's arguments are more ideological and cultural:

For technologists, being followed is a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one. To give is to receive glory. And if a company does this, it would create a cultural attraction [to technologists]. […]

plus stuff about accelerating the development of the Chinese ecosystem.

High-level researchers are not slaves or menial workers, they have massive pride, they want to publish and gain clout. You can pay them hundreds of millions to get over that, or you can let them publish. Open sourcing is the ultimate form of publishing your work.

There's a big breakdown here, but my summary take --

Business case:

  • Models aren't useful products on their own; there's too much competition, too shallow a moat, and few buyers have the skillset and equipment to use a model themselves. Runtime with effective models is where these businesses expect to make their money, made more convenient by their familiarity with optimal operation and tuning of their own models, and by the giant sack of GPUs that they happen to have sitting available.
  • This is especially true where (as now) model creators don't have a good understanding of all or even a large portion of use cases for a model. Where exposing an API, as increasingly many Western LLM-makers are doing, limits you to prompt engineering, an open-source model can be rapidly tuned or modified in a pretty wide variety of ways. You can't necessarily learn from everything someone else has done with an open-source model, or even what they've done without breaking the license, but you can learn a lot.
  • Businesses producing open-source models can attract specialized workers, not just in being skilled, but in having a very specific type of ideology, similar to how Linux (or Rust) devs tend to be weird in useful ways.
  • 'Sticky' open-source licenses have the additional benefit of allowing most innovations by other smart people to filter back in. (In more legally-minded jurisdictions, they also lay beartraps for other developers who would love to borrow a great implementation without complying with the license.)
  • (Cynically, they can only succeed with government backing, and open sourcing a model makes them politically indispensable.)

Philosophical arguments:

  • Open-sourcing a model is Better for what it allows: interaction with academic communities, rapid iteration, and so on. A business that emphasizes these topics might not be the most remunerative, but it'll be better at its actual goal.
  • (Optimistically, some devs want to get to the endgame of AGI/ASI as soon as possible, and see the API business model as distracting from that even if it does work.)

Pragmatic argument:

  • The final models fit on a single thumb drive. It's not clear any company running this sort of thing can seriously prevent leaks over a long enough time for it to be relevant. There's an argument that China is more vulnerable to this sort of unofficial espionage, but we've also had significant leaks from Llama, Midjourney, etc.

It's not clear any company running this sort of thing can seriously prevent leaks over a long enough time for it to be relevant.

With sufficient will, they could do just this. This is a choice they actively make one way or another.

Not if you want to keep highly skilled researchers and programmers working for you: it would mean locking down the systems so hard that daily work becomes a chore, and the sort of people you need for that level of work hate working under such restrictions.

Yeah, I have a friend who works in a very sensitive area of banking and it’s a nightmare:

  • Four layers of security before he can get to his desk
  • Everything on the computer is absolutely locked down, and the software is rubbish, as is the authentication system
  • Constant surveillance from cameras absolutely everywhere

I think there's other stuff too, but I forget the details.

Let’s just say that I used to work as a developer for a large German company. Note past tense.

For the first year I could only install new software or drivers by making an official request and having the Indian IT support do it via remote access. You can imagine how well this worked for embedded systems development where you regularly need to use some new piece of external hardware or random IC manufacturer’s legacy software tool.

The best part? All of the daily work could have been done on a random burner laptop, as email, team chat, source repositories and the build system were all in external cloud services. Literally the only thing I actually needed intranet access for was putting my hours into the SAP system once per month.

Even in finance, the logic is that it’s always impossible to prevent a willing employee from committing a crime and leaking sensitive information; monitoring systems are just set up so that, if and when it happens, (1) they can trace it to the source and (2) convince the regulator they did everything they could and reported it as soon as possible.

And also to minimise the scale of the breach, right? It's bad if an employee tells me that BigCorp and BiggerCorp are expected to finalise their merger by May, but it's worse if they give me 2000 pages of detail on the subject including all the due diligence on both parties.

  1. Commoditizing their complements. This is particularly true of Meta, which wanted to use Llama to undercut competitors like OpenAI. Meta doesn't need their models to be profitable; that's not the core of their company. But OAI? Without people willing to pay for access to their models (or if people were able to clone them cheaply and run them elsewhere), they'd be utterly screwed.

  2. Gaining market and mind share, and either finding other ways to monetize (consulting or fine-tuning services), going paid and closed-source, or using the hype and investor confidence to raise money.

  3. Attracting talented researchers, who often want recognition and the right to publish their research instead of having it all be locked down internal IP.

  4. Actual ideological commitment. Or at least lip-service to help with points 2 and 3.

This touches on something I've been wondering about for a while: do all of these qualitative updates to LLMs actually translate to new use cases? In my case, the only two updates that have had any significant impact on my LLM use were the jump from ChatGPT 3.5 to 4, and the increase of the context window from small to essentially limitless (yes, it still has limits, but in day-to-day use I rarely hit them). Both of those happened in 2023. Since then, LLM tooling has become vastly better. But I struggle to think of anything that I can do now with an LLM that I couldn't have done in 2023, based purely on the quality of the LLM output.

Coding has greatly improved. Vibe-coding in 2023 was a bleak experience; one could hardly get anything done. In 2025 it's easy.

Error rates have fallen drastically, and I'm someone who has regularly benefited from context windows becoming OOMs larger than the best 2023 had to offer.

I know of specific questions, in programming and maths most obviously, but also in medicine, where I wouldn't trust the output of a 2023 model but would be rather confident in a 2025 one being correct.

Reasoning models are also far better at task adherence and thinking logically. Agents are still less than ideal today, but they were borderline useless in 2023.

Other very nice QOL features include image and file input and generation, artifacts, voice conversations etc. If I had to go back to a 2023 GPT-4, I'd be pissed.

The quality of legal advice current LLMs give is miles better than what you could get in 2023. It's still not perfect, but it's now at the point where you need a decent grasp of the field to see where it goes wrong, compared to back then, when an intelligent layman with Google could have pointed out the errors.

"Wrong but you can't tell how wrong it is" seems to be worse than "wrong and you can immediately tell that it is".

If it's the AI thread, what do you think about diffusion models for text?

I am skeptical about diffusion even for images and video; the whole subfield is a giant nerd snipe for mathcels. Autoregression is strictly more expressive and more suitable for continual generation; sadly, we pay the price of worse parallelization. If anything, I'd be more enthusiastic about honest-to-God energy-based LLMs. There have been a series of breakthroughs in making DLLMs that don't totally suck and offer extreme speeds in the low-batch-size regime, but eh. I think sparse-attention Transformers will crush them.
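
To make the parallelization tradeoff concrete, a toy sketch of my own (nothing from any actual DLLM implementation): autoregressive decoding needs n strictly sequential model calls, while a diffusion-style denoiser re-predicts every position within each of a few refinement passes, and it's those per-pass updates that parallelize.

```python
# Toy contrast: sequential autoregressive decoding vs parallel diffusion-style refinement.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def toy_predict(context):
    """Stand-in for one language-model call: deterministic given its context."""
    random.seed(hash(tuple(context)) % (2**32))
    return random.choice(VOCAB)

def autoregressive_generate(n):
    seq = []
    for _ in range(n):                # n strictly sequential model calls:
        seq.append(toy_predict(seq))  # step t cannot start before step t-1 finishes
    return seq

def diffusion_generate(n, steps=4):
    seq = ["<mask>"] * n
    for _ in range(steps):            # a few refinement passes; within each pass,
        seq = [toy_predict(seq[:i] + seq[i + 1:])  # every position is re-predicted
               for i in range(n)]                  # and could be computed in parallel
    return seq

print("autoregressive:", autoregressive_generate(8))
print("diffusion-ish: ", diffusion_generate(8))
```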

I'm not Dase, alas, but I want to say that I was profoundly surprised that Diffusion as a technique even works at all for text generation, at least text that maintains long-term coherence. I'm utterly bamboozled.

I'm just waiting for DeepSeek R2. Not happy with the delay, and while R1-0528 is pretty damn good, it isn't at the very top. K2 is non-thinking, which means that, while an excellent base model, it isn't the best of the best when quality rather than speed matters.

There probably isn't a delay; as far as I can tell, plans to ship it in May to capitalize on the hype were entirely hallucinated by jo*rnalists. It might take many months yet.

yes yes another post about AI, sorry about that

Feel that AGI baby!

It's obvious what the trends are. I predict that, on the midnight before ASI, the Motte's going to be 124% AI commentary. It might even be AI doing the commentary.

It's a primarily agentic non-reasoner

I have read claims that it's a pseudo-reasoner, and that it was trained on CoT traces and had RL done even if it doesn't use explicit reasoning tokens itself. I've also heard that it's 3x as verbose as most non-reasoning LLMs, almost on par with reasoning ones, making the distinction academic. This was on Twitter, and I don't have links handy. I'm not sure how strongly to index on that.

It's not really verbose in normal use, rather the opposite. It is verbose in agentic mode, writing docs and commentary rather than thinking traces. RL has been used for all serious LLMs since GPT-3 Instruct; this is independent of the current long-CoT paradigm. It is dubious that Kimi has been trained on CoTs, because it doesn't do them. More likely, its training data is largely final outputs of a reasoner (like Kimi's own 1.5/1.6). They have a section in the paper on 1.5 about penalizing verbosity.

Thank you. I will clarify that by RL, I don't mean bog-standard RLHF, but more recent techniques like RLVR that have been around since o1.

I was worried when I saw that it had a custom license, but it turns out it's just a slightly modified MIT license, requiring credit if your project gets big enough. This truly is open source.

I remain of the opinion that it is likely (but not guaranteed) that courts will find "training models" to not be a sufficiently creative endeavour to merit copyright protection. "Throwing a bunch of data into the GPU blender and doing massive least squares" isn't IMO more creative than scanning a painting, compressing the works of Shakespeare with gzip, or having a monkey press the camera shutter.

Well, I don't really understand American law but it seems to me that Anthropic has set the precedent of LLM pretraining corpora being essentially immune to copyright claims. Anthropic's models are, ironically, the most paranoid about reproducing copyrighted material.

The Anthropic case there is focused on "Is it a copyright violation to train models on copyrighted data without licensed distribution?", which is an interesting question, but my comment is more on the separate "Is the resulting model I've trained something I can claim copyright over?" question.

Sorry, misunderstood you. I don't think we've seen anyone seriously defend having stolen or distilled someone's model. My bet is the precedent will depend on who/whom and lawyer muscle rather than fundamentals of the situation.

The closest I'm aware of is the nominal academic license of Facebook's llama models that seems to have been largely ignored once they were out in the wild. At the time, Meta was trailing a bit, and it probably helped their mindshare overall, but they didn't bring any court cases that I'm aware of either.

OTOH, it’s established that collections of data (such as phonebooks) can be copyrighted. None of the individual data items are under copyright, but the collection itself is.

A lot of the grognards over on HN don't think it counts, but they're the type who wouldn't accept blowjobs in heaven if the angels weren't Apache licensed.

As one such grognard, I think the idea was sound, but it's poorly worded for the long term and opens one up to lawsuits over the vaguer components (what does "prominent" mean?).

Likely inconsequential in the long run, but still bad practice. Even as I perfectly understand why they did it.