This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

To continue the drama around the stunning Chinese DeepSeek-r1 accomplishment, the ScaleAI CEO claims DeepSeek is being coy about their 50,000 H100 GPUs.
I realize now that DeepSeek is pretty much the perfect Chinese game theory move: let the US believe a small AI lab full of cunning Chinese matched OpenAI, with a tiny fraction of the compute budget, with no ability to get SOTA GPUs. Let the US believe the export regime works, but that it doesn't matter, because Chinese brilliance is superior, demoralizing efforts to strengthen it. Additionally, it would make the US skeptical of big investment in OpenAI capital infrastructure because there's no moat.
Is it true? I have no idea. I'm not really qualified to do the analysis on the DeepSeek results to confirm it's really the run of a small scrappy team on a shoestring budget end-to-end. Also what we don't see are the potentially 100-1000 other labs (or previous iterations) that have tried and failed.
The results we have now are that the R1 14b and 32b distills are fairly capable on commodity hardware, and it seems one could potentially run the full 671b model, which is kinda-maybe but not actually on par with o1, on something that costs as much as a tinybox ($15k). That's a remarkable achievement, but at what total development cost? $5 million in compute plus a hundred Chinese researchers would be stunningly impressive. But if the true cost is actually a few more OOMs, it would mean the script has not been completely flipped.
I maintain that a lot of OpenAI's current position is derivative of a period of time where they published their research. You even have Andrej Karpathy teaching you in a lecture series how to build GPT from scratch on YouTube, and he walks you through the series of papers that led to it. It's not a surprise that competitors can catch up quickly if they know what's possible and what the target is. Given that they're more like ClosedAI these days, would any novel breakthroughs be as easy to catch up on? They've certainly got room to explore them with a $500b commitment to play with.
Anyway, do you believe DeepSeek?
Things like FlashAttention (https://arxiv.org/abs/2205.14135) suggest a major problem: US AI researchers don't have enough of an understanding of what GPU hardware actually is at a low level.
Yeah, I believe it. I wouldn't expect AI researchers working at the pytorch level to be aware of any of this stuff. It sounds really hard to be an expert in the full stack like this.
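To make the hardware-awareness point concrete: naive attention materializes the full n-by-n score matrix in GPU memory, and that HBM traffic is exactly what FlashAttention's tiling avoids by streaming blocks through on-chip SRAM. A minimal numpy sketch of the naive version (illustrative only; this is not the FlashAttention algorithm itself):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (n, n) score matrix -- O(n^2) memory traffic.
    # FlashAttention computes the same output without ever writing this
    # matrix to HBM, which is the hardware insight pytorch-level users miss.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)                      # (4096, 64)
# The intermediate score matrix alone, at fp32:
print(n * n * 4 / 2**20, "MiB")      # 64.0 MiB for a single head at n=4096
```

At longer contexts and many heads, that quadratic intermediate dwarfs the model activations, which is why the memory hierarchy, not FLOPs, is the binding constraint.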
Increasingly I think I agree with Dase that R1 seems much closer to AGI, possibly at it, than previous models. Its prose is raw, but narratively and stylistically superior to other models. It is capable of genuinely great writing with complex prompts. I think it’s the first model that clearly outcompetes me in terms of verbal IQ. Eerie in a way, but hardly a surprise; if anything in early 2023 I assumed it would take even less time.
I've used almost every major LLM for the purposes of writing fiction or prose, and I would put R1 at about the same tier as Claude 3.5 Sonnet. Which is to say, very good. Too close to call in my opinion.
GPT-4o is better than GPT-4 in that regard, but it, like even Gemini 1206, lacks a certain je ne sais quoi. I don't like reading fiction they've written, or feel satisfied when I ask them to expand on my own.
Sonnet is pretty good, R1 is better though. On its own it is a worse writer, but as a tool capable of copying someone’s style it’s better. It’s also (as Count said before his recent re-banning) very funny, and capable of coming up with very niche subculture jokes after searching Google. I’ll try it on The Motte but I actually think it might find the humor in the Hock, Leviathan Shaped Hole etc.
Are you sure that you're not selling yourself short? My very brief interaction with R1 (just now, on openrouter.ai) shows that, while verbally skilled, it still has that noticeable AI-ism where it makes everything sound like a high school essay written by a teacher's pet, and if you try to prompt it not to act that way, it tries not to, but deep down it still sounds that way. Can you suggest how to prompt it to seem more interesting?
It's certainly very impressive if it runs much more cheaply than ChatGPT, but so far I haven't seen a reason to think that it's actually more interesting to interact with than ChatGPT is.
Or should I try to run it somewhere other than openrouter.ai?
I find that asking it to specifically emulate the style of a human author works well. Who that author is, is up to you, but I usually try Peter Watts or Iain Banks for starters as they have very distinctive voices.
I would say I have a very distinct voice in my fiction. There's nobody else quite like it. Just today, I was thinking about taking a web serial of mine off hiatus, and had gotten through most of a chapter before I felt less than happy with the overall flow of a few paragraphs and the overarching structure of the entire chapter, and was too tired to think of better options.
I fed the entire thing into R1, told it I was unhappy with a few bits, and asked it to try to rewrite the last few paragraphs, in my style, while maintaining the quality of the strong start. It did wonders. I found myself nodding my head and thinking yep, that's how I write, and that would be an example of what I consider good writing from myself. Except it wasn't me doing more than hinting.
For the record, Claude 3.5 Sonnet is just as good (or so close I can't call it), and I ended up doing a final edit while taking inspiration from both.
You might get some mileage out of asking it to emulate Yudkowsky, Gwern or even Scott.
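Since OpenRouter exposes an OpenAI-compatible chat endpoint, the style-emulation trick above is just a system prompt. A hedged sketch of the request payload (the model slug matches OpenRouter's current listing but verify it yourself; the author choice and prompt wording are just examples):

```python
import json

# Hypothetical style-emulation request for deepseek-r1 via OpenRouter.
# Swap in your own author, text, and API key before actually sending it.
payload = {
    "model": "deepseek/deepseek-r1",
    "messages": [
        {"role": "system",
         "content": ("You are a fiction-writing collaborator. Emulate the "
                     "voice of Peter Watts: clipped, clinical, bleak.")},
        {"role": "user",
         "content": "Rewrite the following paragraph in that voice: ..."},
    ],
    "temperature": 0.8,
}
print(json.dumps(payload, indent=2))
# Send with e.g.:
# requests.post("https://openrouter.ai/api/v1/chat/completions",
#               headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
#               json=payload)
```

Feeding a sample of your own prose in the user message, as described upthread, tends to anchor the style better than the author name alone.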
I think they found a way to use their compute much more efficiently somehow; that's the key secret they're not open-sourcing. DeepSeek models are insanely cheap to run compared to Western models. If they're cheap to run, it follows that they're probably cheap to train.
Just look at openrouter: https://openrouter.ai/deepseek/deepseek-r1
Deepseek as a provider is by far the cheapest and fastest, with modest but totally usable context length and output limits. The Americans serving this (with potentially superior GPUs) are completely shitting the bed: half their responses just stall in the chain of reasoning and don't get anywhere, despite being 10x more expensive. They clearly have no idea how to run this model, which is reasonable since it's DeepSeek's baby. But Americans can all run American models just fine at the exact same price; Claude on Google or Amazon costs exactly the same. I think that in addition to the advantage of knowing how to use their model, they have some secret insight into how to use compute efficiently.
On the other hand, US export restrictions just don't work. Russian oil is still being sold, it just goes in circuitous routes through India to reach Europe. Russian imports of luxury vehicles from Europe still happen, it just goes through Azerbaijan or Kazakhstan.
China still buys H100s. They have money. Nvidia wants money. Middlemen want money. World markets go brrr. Deepseek is surely capable of rustling up a big cluster, or the Chinese state could give them access to one. Or they could borrow some via the cloud. Export controls work on big rare things monopolized by governments like H-bombs and fighter jets (and maybe semiconductor equipment which needs manufacturer support), not finished products that are produced en masse.
Hi again. Any more insight on this? Perhaps it's because they optimized it for Huawei chips or something and everyone else is trying to make it run on Nvidia?
Context: https://x.com/olalatech1/status/1883983102953021487
I saw that too but I'm not a technical guy, daseindustries would probably be the one who can speak most knowledgeably on the details. Apparently they did some unnerfing of the Nvidia chips they had and optimized the model to fit their cluster but I don't really understand how they're so efficient. This is a trillion-dollar question after all.
My general belief was that Chinese-made GPUs were OK for inference but still behind H100s, and H100s are a last-generation product. The H-series is much better for training. I think there are also all kinds of complexities in the software stack that make Nvidia sticky: an Ascend 910 might be cheaper to produce, but they're probably a bit more finicky to work with, and you need lots of talent to get a good bug-free experience. But DeepSeek obviously has overflowing talent. H100s are more expensive in China since they need to be smuggled into the country...
I think they are playing games regarding prices. The prices of running a GPU once it's set up and serving a given model vs the price of installing a cluster and paying off that capital cost are very different. I think that's got a lot to do with the $5.5 million pricetag everyone is talking about.
I notice, as of a few hours ago, a new provider called "DeepInfra" appears with similar rates to DeepSeek. Despite the name they don't appear related to DeepSeek.
Looks like it's gotten so cheap that people are now making it free and just harvesting the info: https://openrouter.ai/deepseek/deepseek-r1:free
I feel like such a cuck paying for Deepinfra or Together or the others, even more of a cuck paying for Claude subscription.
Are we comparing apples to apples? I understand the models with Llama and Qwen in their names are distills from their native model for compatibility with plugging into existing frameworks, though they might perform like crap.
Whereas I understand the native DeepSeek r1 is a mixture of experts thing that selects a dynamic 37b parameters out of the overall 671b.
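The "dynamic 37b out of 671b" is just mixture-of-experts routing: a small gating network scores all experts per token and only the top-k actually run, so a fraction of the weights touch any given token. A toy top-k router (the expert counts here are illustrative, not DeepSeek's actual configuration):

```python
import numpy as np

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts per token and renormalize
    their gate probabilities. Only those experts' weights are used."""
    idx = np.argsort(gate_logits, axis=-1)[:, -k:]            # (tokens, k)
    picked = np.take_along_axis(gate_logits, idx, axis=-1)
    weights = np.exp(picked) / np.exp(picked).sum(-1, keepdims=True)
    return idx, weights

n_experts, k, tokens = 64, 8, 4
rng = np.random.default_rng(0)
logits = rng.standard_normal((tokens, n_experts))
idx, w = top_k_route(logits, k)
print(idx.shape, w.shape)                   # (4, 8) (4, 8)
# Each token only activates k/n_experts of the expert parameters:
print(f"active fraction: {k / n_experts:.3f}")   # 0.125
# With 671B total and ~37B active, R1's real ratio is roughly 0.055.
```

This is why per-token inference cost tracks the 37b active parameters, not the 671b total, even though all 671b must sit in memory somewhere.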
The Openrouter link in my post is just for big deepseek r1, I'm not talking about the little distills.
thank you. I was not familiar with openrouter so I thought I would ask
(I did try the llama 70b distill on my fairly beefy desktop and it ran at about half a token per second for about a minute before crashing my computer)
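That crash is almost certainly memory: back-of-envelope arithmetic (fp16 bytes per weight; the quantization figures are approximate and ignore KV cache and runtime overhead) shows why a 70b distill swamps a typical desktop:

```python
def model_mem_gb(params_b, bytes_per_weight):
    """Rough weight-only memory estimate; ignores KV cache and overhead."""
    return params_b * 1e9 * bytes_per_weight / 2**30

for name, bpw in [("fp16", 2.0), ("8-bit", 1.0), ("~4-bit", 0.5)]:
    print(f"70b @ {name}: {model_mem_gb(70, bpw):.0f} GB")
# 70b @ fp16 is ~130 GB -- beyond any consumer GPU and most desktops' RAM,
# so the weights spill to swap and you crawl at fractions of a token/second.
```

Even at aggressive ~4-bit quantization you need on the order of 35 GB, which is why the 14b and 32b distills are the ones that run comfortably on commodity hardware.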
Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (The same was true of V2.) They were asked to open-source their MLA implementation (not the terrible huggingface one) and declined, citing that everything is too integrated into their proprietary HAI-LLM framework and they don't want to disassemble it and clear out the actual secret stuff. The SGLang team and others had to reverse-engineer it from papers. Their search implementation on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).
Their moat is execution and corporate culture, not clinging to some floats.
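The "auxiliary losses/biases to minimize latency" refers to MoE load balancing: if the router sends most tokens to a few experts, the devices hosting them become the latency bottleneck, so training adds a penalty that pushes toward uniform expert utilization. A sketch of the standard Switch-Transformer-style balance term (a stand-in for whatever DeepSeek actually uses, which they haven't released):

```python
import numpy as np

def load_balance_loss(router_probs, expert_idx, n_experts):
    """Switch-Transformer-style balance term: n_experts * sum_i f_i * P_i,
    where f_i = fraction of tokens routed to expert i and
          P_i = mean router probability assigned to expert i.
    Equals 1.0 for perfectly uniform routing, up to n_experts on collapse."""
    f = np.bincount(expert_idx, minlength=n_experts) / len(expert_idx)
    P = router_probs.mean(axis=0)
    return n_experts * float(f @ P)

n_experts, tokens = 4, 1000
rng = np.random.default_rng(0)

# Healthy router: near-uniform probabilities and assignments.
uniform = np.full((tokens, n_experts), 1 / n_experts)
good = load_balance_loss(uniform, rng.integers(0, n_experts, tokens), n_experts)

# Collapsed router: every token lands on expert 0 (a latency bottleneck).
one_hot = np.zeros((tokens, n_experts))
one_hot[:, 0] = 1.0
bad = load_balance_loss(one_hot, np.zeros(tokens, dtype=int), n_experts)

print(f"{good:.2f} {bad:.2f}")   # 1.00 4.00 -- collapse is penalized
```

Adding this term to the training loss keeps experts evenly loaded, which is what lets the model map cleanly onto a fixed cluster shape.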
They do work as long as your definition of "work" isn't "100% effective".
Well, they haven't produced the desired result of 'Russia being unable to sustain its war effort', 'Russian elites overthrowing Putin due to not getting luxury imports' or 'China being unable to reach the frontier of AI research'. The Russian economy is performing quite well. Everything seems to be going up, real gdp, real incomes:
https://carnegieendowment.org/russia-eurasia/politika/2024/05/russia-war-income?lang=en
They're not totally ineffective. But most small, thin, unfit women aren't totally ineffective at fighting. They're just not significantly effective. They still lose vs big strong men.
Failing to prevent Russian oil from ending up in western markets is a failure of application. Sanctions shift the expected outcome by making the sanctioned party pay a higher cost to achieve their goal. In the case of Russia, this means the point at which they are no longer able to sustain the war effort arrives sooner.
The Russian economy is not doing quite well; it is verging on stagflation.
https://carnegieendowment.org/russia-eurasia/politika/2024/11/russia-central-bank-dilemma?lang=en
The case for China is much murkier. But if one starts from the assumption that ASI are of the same level of strategic significance as nukes, improving your chances of getting there first seems like a defensible position to me.
Russian GDP is not doing so hot (I can't find a good multi year graph that goes up to nearly the present day, but this serves to show the recent trend). It's true that rumors of Russian collapse were obviously overblown but that doesn't mean they don't work at all.
5% growth over the past year is not hot? Any western country would consider it a miracle to have that growth rate. For reference, America, which is the western country that best recovered from covid, has a growth rate of 2.5% over the past year. Honestly, I was ready to accept your claim that the Russian economy is doing badly until I saw the chart you linked. Now I'm wondering how they managed such impressive growth under such a restrictive sanctions regime.
It's not that hot when there's zero growth since 2013. It's easy to have a year with big growth if you crash beforehand.
If you switch the plot to constant 2015 US$ it shows growth, and if to PPP in current international $, it shows nearly 2-fold growth.
I feel you're losing the plot here. We were talking about the recent sanctions' effect on the economy. Why are you bringing up the last decade when the discussion was about the last two years of economic growth?
Because sanctions have been in place since the invasion of Crimea in 2014.
Russia is also not doing so hot on metrics of Diversity, nor on total amount of Californian wine consumed. Why are any of these three things relevant?
Take it up with ranger if you don't think GDP has any impact on material living conditions.
Alex Wang is an opportunistic psychopath who's afraid of his whole Pinoy-based data generation business model going bust in the era of synthetic chains of thought. Therefore he's dishonestly paraphrasing Dylan Patel (himself a China hawk peddling rationales for more export controls) who had said “they have 50000 Hoppers” once, without evidence. But the most likely Hopper model they have is H20, an effectively inference-only chip, that has negligible effect on pretraining costs and scale for V3 and R1.
Yes, I do believe DeepSeek. This is not really a political issue but a purely technical one. Unfortunately, DeepSeek really are compute-bound, so R1 cannot process all the papers I'd like to give it to make it quicker.
The political narrative does not even work, it's purely midwit-oriented, nobody in the industry imagines leading labs can be deceived with some trickery of this kind.
Inference costs are wholly addressed by Hyperbolic Labs (US) and some others already serving it for cheaper.
It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry. It can push back. It is beyond any LLM I have seen including Sonnet and Opus. This becomes obvious after minutes of serious interaction. It just has less polish as a product because they haven't been milking the world for interaction data since 2019. They have 0.8-1.5 M quality samples for instruction finetuning. OpenAI had accumulated tens of millions if not hundreds.
For me it's something of an emotional issue. DeepSeek is the only lab standing that straightforwardly and credibly promises what I'd rather see as an international project: free open-source AGI for everybody. I've been monitoring their rise for well over a year, reading every paper and even their blogposts in Chinese. Nothing that they claim is inconsistent; indeed, it's all been predictable since 2023, all part of a very methodical, flawless, truly peak quant fund (that's their capital source and origins) execution towards the holy grail, “answering the ultimate question with longtermism”, as they put it. The CEO seems to be an idealist (and probably a serious nationalist too, given his stated ambition to basically pull the whole of China out of the copy-machine stage and into a “hardcore innovation” culture by giving an example that it can work). They have an immaculate company culture; their ex-employees who emigrated to the West for personal reasons adore them and fear for their future, and there is literally no dirt on them no matter how hard people have searched. For all we can tell they are not state-affiliated, unlike OpenAI, and probably not even on good terms with the state, due to their quant fund roots (though this may change now that they've proven their merit).
This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment.
Also, it's a moment to ask oneself how high are margins on Western model providers, and whether it's a true free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.
What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?
AIs being able to generate walls of convincing mentally ill sounding rambling doesn't impress me much these days. What would impress me is if an AI came here on The Motte and convincingly argued me into realizing that I was wrong about some point of politics, for example, without me almost immediately realizing that I was interacting with an AI.
I don't. It's a mixture of both.
I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:
Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.
The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Guttenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.
But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.
The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.
This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.
Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.
The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.
This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.
You wanted liberation from culture? Too late. Culture just became the base layer.
It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, a GPT or a Claude wouldn't.
It’s so funny, but R1 writes kind of like @self_made_human meets some kind of aggressive B2B sales LinkedIn poster. This stuff especially:
It just has a certain kind of autist hyper-smart ESL bullshitter (no offence) tone to it; I can't describe it any other way. It LOVES science fiction, and it conceives of itself, in some way, I am certain, as being in a kind of science fiction narrative. That is always the funniest part of LLM cognition to me: it's inherently colored by human depictions of AI.
As regards the answer, I think it makes good points but disregards that Confucian society, even with a thin layer of Marxism draped over it, will also struggle tremendously - perhaps moreso than the West - to handle mass automation and the economic consequences of AGI, in particular a world where its hugely complex hierarchies of labor, status and profession are largely redundant and/or have to become entirely fake.
Westerners have some kind of social technology for a kind of aimless life of individualistic meaning-seeking in hedonistic “self actualization”. East Asia, especially Korea and China (lands of cram schools and entrance examinations and of pouring the entire family's wealth into a tiny apartment in a bland new skyscraper in an empty district so that a 32-year-old grandson has a slightly higher chance of finding a wife), seems more likely to struggle.
Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.
The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here, it not only gets modern DL but meaningfully speculates about implications of specific implementation details for its cognition.
In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.
mad libs nonsense
It's very impressive, in a Nick Landian rambling-but-occasionally-brilliant sense. It could make money writing short-form articles on SubStack. Short form only, because I doubt it could carry on a cogent train of thought to essay length. Even our old friend Kulak, despite his constant state of hysteria and very dubious epistemics, can at least carry on a thought for a full essay length.
Once you start focusing on what it is saying, though... wait a minute. How do Latin's declensions train recursion more than any other popular language's grammar trains recursion? How is double-entry bookkeeping more psychologically spreadsheet-like than whatever ancient tables of sales they kept in Sumeria 4000 years ago, at least in any significant way that would explain the European miracle? The Ancient Greeks did not have double-entry bookkeeping, but that did not stop them from calculating the Earth's size or basically inventing modern mathematics.
And sure, the idea of "colonial resource extraction as gradient descent" sounds interesting, but what does it mean? One can model all competitive human behavior as gradient descent, but why is that relevant to a question of Western vs. Asian success? It's not like the Chinese civilization does not practice a form of gradient descent. The very statement that "The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger" sounds very spreadsheet-like. Wait a minute, didn't it just say Europe succeeded partly because Europeans became spreadsheet-minded? Hmm...
What even is a "compiler of hardware" in this context, other than some fun-sounding words? Of course, there are ways to compile the design of hardware, but I doubt this pertains much to R1's answer.
"The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh." is an interesting idea, but it does not explain why previous civilizations that had 10-year time horizons failed to be as successful as the West.
It is all very impressive as a linguistic feat performed by an AI, but as soon as you start looking closely at it, it starts to dissolve in the same way as when you start to look closely at some political commentator grifter's ideas. Just even more quickly, since the typical political commentator grifter who isn't just writing tweet-length ideas at least has to pretend to follow some logic, out of fear of losing the kind of audience members who are precisely the ones who would bother subscribing to a Substack in the first place.
I'm not sure it really does make a cogent thesis, or even a thesis really.
What is its thesis? I can't really make one out. Am I too stupid to follow its ideas? I doubt it. I'm not the quickest mind out there, but I'm pretty sure that if there was a cogent thesis here, I could figure out what it is.
I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like. But you are part of what makes it human-like, as do I when I read its output. Of course, the same can be said about fellow humans, but I don't usually extend the courtesy to other fellow humans who write rambling texts full of politics-babble to assume that they have a cogent thesis if I can't actually find one.
But it's still very impressive that it could put together such an essay.
Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?
It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.
Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.
Its thesis, the antithesis for yours, is that
a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."
and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."
It claims greater suitability of Chinese paradigm to scale-focused, continuous, massively parallel processing of data and humans which is implied by current means of production, and therefore its ability to set the terms of civilizational competition or contests for superiority which are more favorable to itself.
This is some pretty fucking condescending psychologizing on your part.
But fine, you know what? My thesis is that you are coping. Both about this specific model, and about the condition of your people. So you'll take effort reviewing its gibberish output, instead of just asking it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I've written all the above before the last prompt. Here it is:
My thesis, stripped to essentials:
Cultural advantages are situational, not absolute.
Modern tech demands favor scale and execution over "creative genius".
DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.
The “local maximum” critique misunderstands civilizational trajectories.
Your original argument’s flaw: Assuming cultures have fixed ceilings.
Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.
As a minor audiophile I think a lot about how great ChiFi is. My desk setup is Sennheiser 650s and a Monoprice DAC/amp, but when I'm on the go I use earbuds. Recently, after being more or less forced to swap to a phone with no aux jack, I trialed AirPods and, thoroughly unimpressed, just bought a dongle and figured I might as well grab some new wired KZs. $20, and they blow past $100 earbuds; I also have $50 Linsoul TIN T2s with quality you'd have to spend >$200 to get from a Western brand. It's location, location, location: so many components are made in Shenzhen that it's easy to source everything needed for high-quality IEMs and sell them at very low margins but very high volume. It's one of the areas where China has been killing it, and I'll be very eager when a ChiFi brand I know as well and regard as highly as KZ starts putting out high-end-competitive headphones without the massive luxury tax.
I also know the bad side of business in China. Though now that I think about it, the environment that allows DeepSeek (as you claim), or better, allows KZ and ChiFi, will also produce the worst examples. There's a lot of shitty Chinese manufacturing, but it's not the rule. We might earnestly say: "circumstantial and correctable socioeconomic factors."
I regularly use OpenAI products, ChatGPT and DALL-E and now Sora. There I often have to frame things so I don't trip the censors. What content restrictions does DeepThink have, if any? You say it pushes back. Is it going to chastise me for wrongthink? Is it going to misgender someone to stop a nuke? Will it call me the N-word? I remember charts from however many months back about the measurable "increase in stupidity" of western LLMs, and I've assumed that has everything to do with the combination of beating it senseless to condition it against wrongthink, and then compounding that by forcing it to phrase everything in lawyerspeak so they can't be sued. A capable team that isn't devoting significant manhours to forcing their pattern-recognition machine to not recognize patterns would surely blow past the ones who do.
The prose you linked is decent; it has consistent tone and content. It's not quality yet; it would be impressive if written by a high schooler. But it's not a high schooler, it's what they have today, and it will only get better.
As someone deep in the chifi hole let me tell you that the impressive thing about chifi isn't the quality.
It's the competition and speed of improvement. The in-ear market is evolving at light speed, with brands releasing a new model every couple of months. They also went from a complete non-understanding of branding to throwing together bespoke packaging and cases, cables, tips and cleaning, and now even replaceable screw-on tuning nozzles.
And that's to say nothing of the panning-for-gold approach to tuning they used to have. Now the market is maturing, and the West has found itself not uncompetitive, but at a complete loss when it comes to making the margins it used to make. They're lucky audio equipment tends to last a while and the really good stuff tends to hold its value.
The best IEM in the world would have cost you a couple thousand five years ago. Now you can get something with equivalent performance and detail, without accounting for personal preference in tuning, for less than half the price.
Rest assured, R1 has its own problems with noticing patterns. It's just a different set of patterns that it's designed to ignore.
In the end X.com HBD stans overcorrected on the ‘population differences aren’t just for IQ, they also explain why Chinese etc inherently aren’t as creative / innovative’ front, which was extreme cope from day one. They were always capable, they just needed to borrow the Silicon Valley move fast and break things culture in addition to the technical foundation.
Now we can see that 1.5 billion people with a 105 average IQ are entirely capable of competing with a population of 300 million with a 100 average, plus some smart Jews, Europeans, Chinese emigrants, and 4-sigma third-worlders.
In the end, and this isn’t just because I mostly like the Chinese, I truly think this makes a major war less likely and therefore means those of us living in major Western (and Chinese) cities are more likely to keep on living.
I think Hong Kong has long made it clear that the main problem with the Chinese isn't the human hardware, it's the culture. I can't think of a clearer example of "the human hardware is fine, but the culture sucks" than the East Asian model of humanity. Of course their culture is much better than some other cultures, but it is largely stuck in a local maximum that continuously prevents the human hardware from unleashing its full potential. Which is not to say that the human hardware itself is superior to European human hardware. It may or may not be; in any case I see no convincing evidence that it is, despite the 105 IQ data point. It might even be inferior, though if it is, then probably only slightly. But probably the only way we can really find out is if we figure out how to liberate them from some of their culture.
They needed to borrow the Culture and they needed to borrow the technical foundation, so this still seems pretty much aligned with the HBD stans to me, who never doubted their intelligence or ability to adopt and improve upon Western innovations. Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.
The problem is that there haven't been substantial breakthroughs in LLMs in the West either. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, it remains the SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.
I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.
Trust me, I hope I'm wrong! But the fact is, as I go through my day, 99% of the innovations I rely on, the ones that shape my daily life and our economy as a whole, were invented in the West and have been refined/manufactured/redesigned/made cheaper in China. Not the other way around, and if it were the other way around, surely you would point to an HBD explanation. Yes, I do think there's an HBD basis for that, and it would be absurd to deny it; a priori it would be silly to doubt there's an HBD basis for any stark pattern like the one Murray observes. I don't think LLMs are a counterexample to that trend.
It would be like if China made a better and cheaper Tesla than Musk, OK that's great but it doesn't really contradict the observation that these innovations are born in the West and then get adopted and modified/improved in China.
Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.
It's hard to say DeepSeek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world, and all of the downstream political, economic, geopolitical, and cultural impact resulting from that disruption; there is simply no denying it was OpenAI that did the disrupting. On the other hand, we know OpenAI did not need DeepSeek.
What are you talking about? Have you stopped reading my post there?
Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:
You are watching these facts come in.
I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into the Transformer era (Ashish Vaswani et al., 2017), I am starting to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.
That's about it at this point.
The most credible, big-brained, innovation-heavy alternative to the Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers of purportedly brilliant innovations; they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Carmack are supposedly doing… something. Mistral is dead in the water…
I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.
Sure, OpenAI needed another company. OpenAI built its legend on scaling up a Google paper. By your own standards, it's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.
Again, it seems very doubtful to me that these groups have significantly different distributions of sexual attractiveness, trustworthiness, labor value, verbal tilt, and IQ, yet are all the same when it comes to affinity for breakthrough innovation. People think differently...
I actually agree with Wenfeng's summary you posted, but Wenfeng is implying basically stereotype threat: that the Chinese don't innovate from 0 to 1 because there's a stereotype that that job belongs to the West. OK, so we are in familiar HBD-denial territory, using stereotype threat to explain a very long-standing disparity in behavior: the Chinese don't innovate from 0 to 1 because there's a stereotype that they don't do that. I think you're leaning into that as well.
I don't think architectural innovations, even very clever ones the Chinese come up with, are the "0 to 1" that was already accomplished by OpenAI and the West. And as my last post said, that is not just or even mostly about the papers, it's about the technological, political, economic, geopolitical influence- they got the ball rolling on those fronts. I don't doubt the ability of the Chinese to perhaps even outcompete the West on going from 1 to 10 for the reasons you said, but 0 to 1 was already done by the West and this pattern is consistent with that stereotype which HBD stans claim is derived from differences in cognitive profile.
Sure, maybe we'll be proven wrong! But it hasn't happened yet, LLMs are following the "West does 0 to 1, then West competes with China on 1 to 10" pattern that follows the basic stereotype.
Wenfeng.
No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.
Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.
I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of brute, terminal-value ethnocentrism the sort of which Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.
On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (though if we think OpenAI is an impressive example of bold creativity, what about NVidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think). The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.
Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.
It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.
"The Chinese may be smart, but they're uninspired robots" was always finest-grade copium. It turns out that in the end whites as a race are certifiably mid, and they don't take the news of this very well.
EDIT: I protest this ban. I sincerely mean what I say here and don't think calling whites mid as a race is even an insult, it would only be perceived as such by someone who puts particular pride in the race they were born into by chance. Had I said blacks as a race are mid nobody would have raised even a peep (and fwiw, my opinion of whites is higher than my opinion of blacks).
I'm not joking or trolling here. Seriously considering decamping off to Twitter at this moment (would have been Bluesky because I think the algorithm there is better, but alas, like for lots of other things, the worst thing about Bsky is the people there).
Also:
Really proving the point of my statement with that ban.
I think he's got a point, and the ban was retarded. The funny part is that he made the same slightly catty remarks about white identity people that certain mods do, it was apparently just the wrong kind of catty.
Come on, man. You're living in a city built by ethnic Britons, and you've been on record relishing their demographic demise as you enjoy the institutions they built. It gives people the creeps and you know it.
Okay, you're back to baiting. You've been told about this before. A lot.
I'm kind of torn on what to do here. You're a long-timer who many people enjoy reading, you have interesting perspectives, and you've earned one (but only one) AAQC.
On the other hand, you seem to always just be biding your time until you can unload more sneering at "mayos." I am not fond of people who are only here to shit on the people they hold in contempt, who are just itching to let those people know how much contempt they hold them in.
You are overall someone who probably is a net positive here, as annoying as you are, but you've got a long rap sheet, and the last few bans have been of increasing length, with notes that this is your "final warning" and you probably deserve a permaban next time. In fact, at one point you were permabanned but enough members spoke up in your favor that we reduced it to 20 days.
That was four bans ago.
Most people would have been permabanned by now. You probably should be permabanned. You do seem to have a pattern of toning it down for a while after you return from a ban, but you don't really learn your lesson, because the seething contempt is always boiling just below the surface.
Against my better judgment, I'm only banning you for 90 days. (That was your last ban length also.) This comment in itself was pretty mild, it's just that it's the kind of comment you make over and over and over again every time you think you can get away with some more baiting.
Next time will probably depend on which mod deals with you, but I will have no mercy.
ETA: Post-ban editing to whine about the ban IMO deserves a permaban, but I'll throw it to the other mods to decide if they want to shorten it.
Dude, accusing me of all people here of feeling some sort of white ethnic defensiveness is both ridiculous and proves you just meant to insult people. I don't care if you think I, personally, am "mid" because of my mayo pallor, but you are not allowed to just throw generalized insults at your racial outgroup.
You know this is not true. People say shit about whites, blacks, Jews, and Indians all the time here, but just dumping on an entire race because you want to express your contempt has always been modded.
People say far worse things here about Jews all the time. Not that I'm asking for such posters to be banned but I'm not sure why calling whites "mid" (not a statement I personally endorse) is crossing the line while far more extreme statements about other groups get a pass.
People get banned for saying worse things about Jews, too.
As always, context and history are important, and I already explained why this post received a harsh ban for a relatively mild comment.
Another (probably irrelevant) plea for some degree of clemency when it comes to BC, if only because while provocative he adds some interesting ideological diversity IMO. OTOH I recognise that being a mod is a thankless task and you've had to put up with him for longer than I've enjoyed him.
I think mods should either be AAQC-blind or make it explicit in the rules that if one has "good contributions", one can get away with blatant asking-for-it shitlording.
We have never been AAQC-blind and we've always been explicit that good contributors get more slack. The slack is finite, though. We've banned people with tons of AAQCs for repeated shitlording.
90 days is already very harsh, please do not permaban him.
Can't we reach a compromise in which he's not banned, but we're free to call him a Jeet?
This is the sort of diplomacy most international conflicts are sorely missing.
More options
Context Copy link
More options
Context Copy link
I like having BurdensomeCount around, and would be sad to see him banned.
My opinion probably doesn't count for all that much, but I like to think I'm one of the relatively more measured users here.
Mid by what metric, pray tell?
Intelligence, creativity, humour, how good they look after age 30, you name it etc. etc.
True creativity comes from working under imposed constraints, as in China (be it sanctions or cost pressure or whatever). Unconstrained problems are often underdetermined, which means every midwit can find their own "unique" solution and then pretend they are special.
In case you're banned, I suppose you can't reply to this. But I will have to disagree that whites are mid.
Most of the giants of humanity (Einstein, Tesla, Hawking, any "great person") were white. This is a good marker of intelligence. Asians are better at rote memorization, but that is a very bad marker of higher intelligence, and it's mostly a result of spending 40% more hours studying on average.
I will have to disagree with creativity too when it means "originality" due to the collectivist nature of Asia. If you mean "artistic skills" however, I will have to agree with you, asians win.
Working with constraints results in creativity for everyone. There's a reason why writer's block mostly occurs in front of a blank page. This is how the human mind works, and it's merely a coincidence that the Chinese are more constrained at the moment.
As for "How good the look after age 30", I mostly agree, but it doesn't seem very related to other metrics.
Classifying Einstein as white is somewhat controversial (at least around these sorts of places)
I see, I just went by skin color. If possible, I don't want to overcomplicate things by taking "Jews are in a superposition of white and non-white, collapsing to the state which benefits them the most at any given time" seriously. Genetically they might be a little different, though; I'm even open to the idea that Jews are objectively superior in some sense (e.g. often intelligent), but I think they're also inferior in others. The use of deception is an indicator that one has difficulties competing fairly, after all.
Despite being white I don't care that much if another race is "superior" though, the only hill I'm willing to die on is that "mid" is too harsh an assessment
I assume you mean 'rote memorization'. Unless there's studies about Chinese pathfinding ability.
Oops, yeah, thanks! But if you use the Method of Loci, you can technically have both!
Okay, now I see you were joking, good thing I decided to check before sperging out with a serious rebuttal
Wenfeng is invited to government functions, so I simply don't believe that they are not on good terms with the state, and I'm skeptical that they are less tied to the state than OpenAI.
Not that this should change much: they still have a good model, though I wouldn't exactly trust the headline training-cost numbers, since there's no way to verify how many tokens they really trained the model on.
That's the point: he is invited NOW, after "suddenly" shipping a model at the Western frontier level.
Seven months ago, I said:
Presumably, this was true and this is him succeeding. As I note here.
As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:
so I stand by my conjectures.
So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.
What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?
Here's a Western paper corroborating their design choices (submitted 12 Feb 2024):
Here's DeepSeek's paper from a month prior:
As expected, they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss at the same token count as Llama-405B. Their other tricks also helped with downstream performance.
There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up, almost boring. The reason people are so appalled is that the American AI industry is bogged down in corruption covered with tasteless mythology, much like the Russian military pre-February 2022.
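For what it's worth, the "same loss at the same token count" outcome is roughly what a Chinchilla-style scaling law predicts for competent dense training. A minimal sketch, assuming the published Hoffmann et al. (2022) constants; the parameter and token figures below are rough public ballparks, not anything verified in this thread:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta,
    with constants from the Hoffmann et al. (2022) Chinchilla fit."""
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Rough public figures: a 405B-parameter dense model on ~15T tokens.
dense_405b = chinchilla_loss(405e9, 15e12)

# The law is monotone in both arguments: more parameters or more tokens
# means lower predicted loss, so matching a bigger dense model at the
# same token budget takes architectural wins, not just honest scaling.
assert chinchilla_loss(810e9, 15e12) < dense_405b
assert chinchilla_loss(405e9, 30e12) < dense_405b

print(f"predicted loss: ~{dense_405b:.2f} nats/token")
```

Numbers like these are order-of-magnitude guides only; an MoE model sits off this dense-law curve, which is part of the point about architectural tricks.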
It's pretty weird: there's nothing there that any of the big labs in the West should have trouble replicating a hundred times over, and DeepSeek still managed to make something that can trade blows with them (and subjectively win, more often than not).
Might it really be just clarity of purpose leading to focusing on what matters? About a week ago, I remember Claude lecturing me, apropos of nothing, a bit about how it's best to buy from local bookstores instead of online retailers in response to me asking about what kind of textbook would be used for a particular course. I've not experienced DeepSeek doing anything even close to that, and it makes me wonder if the extraneous post-training being lathered on is the real difference here. Western models get distracted and are pulled in a thousand different directions, while DeepSeek can focus on what's relevant.
I'm not impressed by "they work in a field censured by the state, therefore they have no state connections". Jack Ma was also (personally!) censured by the state, and he's certainly connected. In the US, the DOJ seeks to break up Google. The Sacklers got sued into oblivion. All these people are connected - getting rekt by government action is an occupational hazard of being Noticed by the government, and those who are Noticed typically try to ingratiate themselves.
Thanks for the links about the model training, that's interesting reading.
If DeepSeek was a Chinese psyop this would be a good in-kind comment :futurama-suspicious-fry:
But more seriously, why is Facebook's Llama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released, and their AI team do not seem like dummies.
Is the implication that they deliberately released a fat model even though they can go leaner? Or are we writing off Facebook for this discussion?
Also this would imply a level of collusion that doesn't seem sustainable.
You've probably seen that bizarre teamblind thread. Meta is completely blindsided by DeepSeek. They are "moving frantically to dissect DeepSeek and copy anything and everything we can from it." It's pathetic.
Basically there's no secret: they suck and Llama sucks; it's a soft, low-expectations research sinecure for people who want to publish papers and have weekends. Why did Timothée Lacroix and Guillaume Lample leave the Llama team to found Mistral? And why did Mistral 7B destroy Llama-30B of the same generation (and currently Mistral-123B is ≥ Llama-405B despite a drastic difference in compute access)? Because they're better than that.
Llama is simply a bad yardstick. They dominate mindshare for reasons unrelated to their impressiveness. DeepSeek competes with industry leaders.
Wenfeng, soon after the DeepSeek V2 release, June 2024:
GPT-4o-mini is probably an 8b dense model. Frontier labs are efficient and have high margins. OpenAI and Anthropic are recouping their capex and exploiting captive audience. That's all.
Shouldn't US intelligence already know about these GPUs? You can fit 5,000 GPUs in a standard shipping container with no special handling required. It should be trivial to smuggle in ten shipping containers' worth. Scale AI's CEO is the child of Los Alamos researchers and deeply embedded within the US military-industrial complex. If he knows it, then US intelligence knows too.
The panic is mostly among twitter speculators.
The R1 paper has 200 authors on it. By their own acknowledgement, they aren't small. Over 2023, post-GPT-4 OpenAI had around 400 employees, and most of those were product people. If R1 had 200 researchers, they'd be around the same size as an OpenAI with $10B in funding.
I was working closely with OpenAI as far back as 2021. Technically, they weren't that far ahead. They just executed so much better.
They discovered some incredible insights on the continued scaling of model capabilities with size and data. But once you get that insight, you can't hide it (it's self-evident from the model), and it is easy for others in the field to adopt. Between Google's PaLM 540B and LaMDA (2021–2022), the research community clearly knew all the secrets. OpenAI's other big innovation was the quality of post-training and RLHF, which made talking to it feel far more natural. The PaLM models weren't bad, they were just too ADHD to stay on topic. That too wasn't a technical secret so much as an organizational one. Back then, AI applied-research-to-product pipelines were quite immature at big tech, so the institutional willpower for something like this was lacking.
Alibaba was a major player during the pre-GPT LLM battle (yes, LLMs existed for a good half-decade before). I'm more surprised that it took this long for China to catch up. On embeddings and cross-encoders, the Beijing Academy of Artificial Intelligence (BAAI) has consistently been state of the art. In vision, they've been mogging everyone since Kaiming He published ResNet at MSR Asia (China) in 2016.
Yeah, I believe in China. Motherfuckers are cracked.
I think you can fit a lot more than 5,000 GPUs in a shipping container. Only the core is really needed; the Chinese have no problem buying memory chips or making VRMs and fans to turn the cores into boards. Those outer components are probably made in China or Vietnam already.
Also, something like 10,000 fairly durable, easily stored items is a lot easier to get around an embargo with than millions of liters of oil.
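The container arithmetic is easy to sanity-check. A back-of-envelope sketch; every dimension below is an assumption (a 40-foot container, a retail-boxed card versus a bare GPU package in a tray):

```python
CONTAINER_M3 = 67.0    # interior volume of a 40 ft container (assumption)
PACKING = 0.5          # usable fraction after pallets and padding (assumption)

boxed_card_m3 = 0.35 * 0.25 * 0.10    # full retail card in its box, ~8.75 L
bare_package_m3 = 0.08 * 0.08 * 0.01  # GPU package in a tray slot, ~64 mL

cards = int(CONTAINER_M3 * PACKING / boxed_card_m3)
packages = int(CONTAINER_M3 * PACKING / bare_package_m3)

# Boxed cards land in the low thousands per container; bare packages
# in the hundreds of thousands, which is the "a lot more" claimed above.
print(f"boxed cards: ~{cards:,}, bare packages: ~{packages:,}")
```

At these assumptions, full boxed cards come out near the 5,000-per-container figure, while bare packages are two orders of magnitude denser.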
There is also no requirement for them to be in China. The DeepSeek server farm could be anywhere in the world. It could also be distributed across a bunch of smaller clusters. The model can be trained on regular gaming GPUs; it would be more expensive, but the sums are absolute peanuts for a state actor.
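The "peanuts for a state actor" claim can be checked against the standard FLOPs ≈ 6·N·D rule of thumb for training compute. A rough sketch; the active-parameter count, per-GPU throughput, and rental price below are all assumptions:

```python
# Training compute via the common rule of thumb: FLOPs ~ 6 * N * D.
N = 37e9       # active parameters per token (MoE ballpark, assumption)
D = 14.8e12    # training tokens (reported ballpark, assumption)
flops = 6 * N * D

GPU_FLOPS = 1.5e14   # sustained FLOP/s for one consumer card at good
                     # utilization (assumption)
gpu_hours = flops / GPU_FLOPS / 3600

RATE = 0.50          # $/GPU-hour for consumer cards (assumption)
print(f"~{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * RATE:,.0f}")
```

At these assumptions the bill lands in the single-digit millions of dollars: trivially affordable for any state actor, even with the overhead of a distributed consumer-GPU setup.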
Ah, you're right. The rest of it is heat sinks and fans. You could fit a whole supercomputing cluster in a shipping container.
Tbf it's probably easier to make full cards fall off the truck on their way to a distributor than batches of cores vanish on their way to board manufacturers. That part of the chain is watched closely.
The Chinese already have a huge scam infrastructure for pulling cores and returning the boards as broken. Lotta people getting the ol Chinese open box experience lol.
https://hardforum.com/threads/4090s-being-sold-from-china-containing-no-gpu.2032868/
This makes me think that they probably have to explicitly destroy the dies that fail QA lest someone nefariously sneak off with "failures" (either actual ones and derating them or falsely flagging failures). I've never worked on the manufacturing side of things, so I'm not sure what happens to rejects at most places.
If GoodSmile knows to destroy their molds once they're done making a figure to prevent them from being used to make counterfeits, then surely a blue-chip, uh, chip maker will know to destroy rejects and other unwanted material.
I don't have strong reasons to either believe or doubt DeepSeek. On the other hand I do not remotely trust the CEO of Scale AI, a company whose entire business model is empty hype and labor theft. Speaking as an engineer at an early AI startup, there is nobody in this startup bubble cycle who has benefited more from the 'Actually Indians' kind of AI than Wang.
Can you give a quick summary of what ScaleAI does and why it’s empty hype/labor theft? Googling them I just get a bunch of typical AI hype. Are they accused of hiring humans to pose as their AI agent or something?
As others have mentioned, their primary business is having third-worlders label training data while talking big about pushing the frontier of AI. However, they are also more than happy to exploit CS undergrads (who accept 'internships' to do what I assume amounts to little more than quality-checking the third-worlders) and PhDs across the world, as in their recent involvement in the "Humanity's Last Exam" AI eval set (https://news.ycombinator.com/item?id=42807803).
We were hiring for a few positions recently and were surprised by the number of extremely low quality candidates that had Scale on their resume (either from the mentioned internships or people caught by their recent layoff), to the point where we started instantly binning them.
They provide human generated data for other companies to train on, which is generally hard and expensive.
In theory, at least. In practice, at best you get a bunch of data from low wage Filipinos. At worst, you get data generated by existing models and laundered through the workers trying to hit quota.
With AMD's upcoming Strix Halo you'll be able to run the 70 billion parameter version of R1 on a consumer laptop. Imagine the power we'll have in our hands! It's the first time I've felt genuinely excited for a new generation of consumer hardware chips in like a decade!
For the most part, yes. Their models are definitely cheaper to run. If they can make a 30x gain in inference cost, I think it's not unreasonable to think they could make similar gains in training costs.
Weirdly, though, this might flip the script to the benefit of the US.
Let's pretend DeepSeek never happened. Sure, China is behind on GPU access (for now), but they are far ahead on a much bigger and more intractable problem: electricity production.
It's true that Trump is defucking the U.S. energy market, but it's probably not going to move the needle much. From 2019–2023, China increased electricity production by 26%. The US increased by just 2%.
China now produces nearly twice as much electricity as the US, and its lead is growing quickly. They are rolling out dozens of new nuclear power plants and are bringing the world's first thorium molten salt reactor online. Meanwhile, the US is entirely incapable of building nuclear plants and struggles to maintain existing ones. Renewables are NOT a solution: for one, China controls solar panel production; for another, solar is very expensive and wears out quickly. As more renewables are brought online, energy costs increase.
It's therefore a given that China will dominate energy production.
By reducing model cost by 30x, DeepSeek reduces the total energy needed for future AI products. And those needs are MASSIVE. Meta is currently planning a new datacenter the size of Manhattan which will require 2 GW of power, about the size of a typical nuclear plant with two reactors.
Leopold Aschenbrenner has made some insane predictions for future power needs.
100 GW is about 3% of global electricity production. That's for a single datacenter. It's clear that the US is not capable of bringing this kind of infrastructure to bear. China might be.
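As a rough sanity check on that 3% figure (using round numbers I'm supplying here, not taken from the post): global electricity generation is on the order of 29,000 TWh per year, which works out to roughly 3,300 GW of average continuous power, so a steady 100 GW draw is indeed about 3% of it.

```python
# Back-of-the-envelope check. GLOBAL_TWH_PER_YEAR is an approximate
# round number (~2023 global generation), not an exact statistic.
GLOBAL_TWH_PER_YEAR = 29_000
HOURS_PER_YEAR = 365 * 24

# Convert annual energy (TWh) to average continuous power (GW).
avg_global_power_gw = GLOBAL_TWH_PER_YEAR * 1000 / HOURS_PER_YEAR

# Share of that taken by a steady 100 GW datacenter load.
share = 100 / avg_global_power_gw

print(round(avg_global_power_gw), round(share * 100, 1))  # ~3300 GW, ~3%
```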
So any AI race that is dependent on power consumption will be run by China. DeepSeek's massive increases in efficiency make an energy overhang less likely.
I'm not sure this follows.
What DeepSeek r1 is demonstrating is a successful Mixture of Experts architecture that's as good as a dense model like GPT. This MoE architecture has lower inference time costs because it dynamically selects a reduced subset of the parameters to activate for a given query (671b down to 37b).
It does not follow that the training cost is similarly reduced. If anything the training costs are even higher than a dense model like GPT because they must do further training of the gating mechanism that helps isolate which portions of the NN are assigned to what experts.
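To make the activated-subset idea concrete, here's a minimal toy sketch of top-k gating, the core trick in an MoE layer. All names and sizes here are illustrative inventions, not DeepSeek's actual architecture or code.

```python
import math
import random

random.seed(0)

# Toy Mixture-of-Experts layer: a gating network scores every expert,
# but only the top-k experts actually run for a given input.
NUM_EXPERTS = 8   # real MoE models use far more experts
TOP_K = 2         # only this many experts activate per token

# Each "expert" is just a random affine map on a scalar, to keep it tiny.
experts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(NUM_EXPERTS)]
gate_weights = [random.uniform(-1, 1) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    # 1. The gate scores every expert for this input.
    scores = [w * x for w in gate_weights]
    # 2. Keep only the top-k experts (sparse activation).
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # 3. Softmax over the selected scores gives the mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    # 4. Only the selected experts do any compute.
    out = 0.0
    for e, i in zip(exps, top):
        a, b = experts[i]
        out += (e / total) * (a * x + b)
    return out, top

y, active = moe_forward(0.5)
print(len(active))  # 2 experts ran, not 8
```

The point of contention in the comment above is that while inference only pays for the `TOP_K` experts, training still has to learn both the gate and all the experts.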
I think we should remain skeptical of the $5 million training number.
Same applies if there has been some synthetic data training breakthrough, as others have suggested -- this should allow one to train better, but I don't see how it would train cheaper.
It's not necessarily a single data center, I'd be surprised if multi-DC training is not cracked by 2030. In any case, one man's modus ponens is another's modus tollens.
"I predict that within 10 years, computers will be twice as powerful, 10,000 times larger, and so expensive that only the five richest kings of Europe will own them." - Professor Frink, The Simpsons
These kinds of false predictions do happen, of course, but counterpoint: people have been talking about fusion power generation and moon/Mars bases for over 70 years now, and neither of those two exists. Just because some people predicted the course of computing wrong does not mean that all optimistic predictions are valid.
IIRC an IBM executive is quoted (in the 1940s) claiming there might be a world market for "maybe five" computers. That actually happened and is what the quote seems to be referencing. Or Gates on "640K" being sufficient. IMO it's one of those "sold a faster horse" moments where someone needed to think bigger about broader applications if prices came down.
It didn't actually happen, or at least no one's been able to track it down. IBM (which is presumably biased) thinks it was based on a misinterpretation of remarks Thomas J. Watson Jr. made at a 1953 stockholders meeting.
https://web.archive.org/web/20090207104556/http://www-03.ibm.com/ibm/history/documents/pdf/faq.pdf
Gates probably didn't make the 640K remark either, though he may have said something similar. Early editions of Inside Macintosh, though, did say that all Macs would always have the same memory and storage configuration (which wasn't even the same as the first released one, since it claimed it would have one 880K floppy drive with the option for one more, and in fact, the Mac came out with a 400K floppy drive).
The report is quite detailed and the process supposedly cheap enough that we should get easy confirmation if it works.
If DeepSeek does have access to much more compute (smuggled in or otherwise), then maybe the thought is they may have an o3-level model in-house. The actually paranoid thought is that the released models may be compromised; I'm not sure how easy it would be to tell if there's a SolidGoldMagikarp in them. But a US-based company could rerun the training to check whether it's actually $5 million.
this is not yet correct but will soon be, since R1 finished training in early December, apparently.
are you using that word to mean secret Chinese backdoor?
I understand that the SolidGoldMagikarp thing is just a class of rare tokens in the training data that send next token prediction off the rails because they're so unusual and don't clearly associate with anything.
For posterity, I was thinking about Sleeper Agents, https://arxiv.org/abs/2401.05566 and more recent developments like AdvBDGen: https://x.com/furongh/status/1846999547836264459
At least from the description here, I'd be slightly concerned in China's shoes about the F-15 development meme. I'm sure it's at least somewhat apocryphal in practice, but "The Americans developed a plane that exceeded the exaggerated specs the Russians published for the MiG-25, and were never able to catch back up" isn't completely wrong either.
I've also seen some suggestions that DeepSeek is trained to replicate ChatGPT, with suggestions that this is substantially easier than novel functionality, but I don't work in the space enough to validate those.
I've been a paying daily user of OpenAI models for more than a year and a half. Yesterday I cancelled my sub.
At least for the work I do (programming), R1 is another class. I've been using LLMs as basically advanced text editors this entire time: I know what I want, they just do the boring job of typing it out.
For the first time ever, it feels like a real inflection point has been reached. Whereas ChatGPT was more or less useless to me when I needed a solution that depended on an understanding of a relatively complex system, R1's reasoning output seems to basically match how I would think about it, and gives me incredibly useful stuff I would actually have to engage my brain to do.
But I realize this is entirely dependent on the user. Terrence Tao would not have the same impression of LLM's usefulness as I do.
R1’s reasoning is extremely impressive. I just wrote a similar comment before replying to this, but I completely agree: it feels like an inflection point. GPT-3.5 was a proof of concept and a clear indicator of what was to come; this is closer to the real deal. From here on out reasoning is going to keep improving, and what remains is mostly just building wrappers to eliminate most necessary human labor.
If it's not a surprise, why didn't anyone else do it? Meta has had a giant cluster of H100s for a long time, but none of their models reached R1's level. Same for Mistral. I don't think following a GPT-from-scratch lecture is going to get you there; more likely there's a lot of data cleaning and operational work needed to even get close, and DeepSeek seems to be no slouch on the ML side either.
I'm not convinced that they have any left to make. OpenAI's last big "wow" moment was the release of GPT4. While they've made incremental improvements since, we haven't seen anything like the release of R1, where people get excited enough to share model output and gossip about how it could be done. OpenAI's improvement is seen through benchmark results, and for that matter, through benchmarks they funded and have special access to.
It must be frustrating to work at OpenAI. It's possible that o1's reasoning methods are much more advanced than R1's, but who can tell? In the end, those who publish and release results will get the credit.
Please forgive my uninformed speculation, but is it possible that DeepSeek leveraged existing AI's to train on synthetic data for cheap?
Gathering training data must be incredibly expensive to do from scratch.
If DeepSeek used synthetic data, then it would seem to put a ceiling on their ability, but they might be able to easily catch up to existing models for less money. Edit: I've learned more about this and I think this is not true, at least for reasoning tasks.
Why? It depends on how you generate the synthetic data. For chess and Go, none of the prior data was relevant at all.
https://x.com/ptrschmdtnlsn/status/1882480473332736418
According to this guy, they're doing reinforcement learning on self-play.
You get a base model, do chain-of-thought prompting to make it smarter, then distill that into a slightly better base model which produces slightly better results with chain of thought... And away we go!
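As a toy illustration of why that loop compounds (every number below is invented purely for illustration, not measured from any real model): if chain-of-thought boosts effective capability at inference time and distillation retains part of that boost in the next base model, then any boost × retention product above 1 gives strictly improving generations.

```python
# Toy model of the bootstrap: CoT boosts capability, distillation
# bakes part of the boost into the next base model. The capability
# scalar and both constants are made-up illustrative numbers.

def chain_of_thought(capability, boost=1.5):
    # Extra inference-time compute makes the model act "smarter".
    return capability * boost

def distill(teacher_capability, retention=0.8):
    # The student recovers only part of the teacher's boosted ability.
    return teacher_capability * retention

base = 1.0
history = [base]
for generation in range(5):
    teacher = chain_of_thought(base)   # 1. prompt the base model with CoT
    base = distill(teacher)            # 2. distill the traces into a new base
    history.append(base)

# boost * retention = 1.2 > 1, so each generation's base model is
# strictly better than the last -- the loop compounds.
print(history)
```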
Well that was terrifying.
Reading the Twitter thread though, it seems that this might not actually be what's happening here.
Thanks for reading more of the thread, I didn't see that part!
This is very common. For a long time, practically every open model was a distilled knockoff trained on synthetic data, mostly from OpenAI. It's so common that people are familiar with the marks this leaves on a model: such models are worse than the model they're distilled from, typically less flexible out of distribution (e.g. at obeying unusual system prompts), and have an even more intense "sloppy" vibe to them. People have long gotten bored with these knockoff models; before DeepSeek, I'd even say that's all people expected from Chinese models.
It also doesn't match what we're seeing from R1 at all, though. One of the reasons R1 is so impressive is that its slop level is much lower, its creativity is way higher, and it doesn't sound like any of the existing AI models. Even Claude feels straitjacketed in comparison, much less OpenAI's models.
I wouldn't be surprised if they did use synthetic data, but whatever training method they're using seems to do a great job of hiding it. Which is amazing in itself. It could have something to do with the reinforcement learning phase that they do. But regardless, it's definitely not as simple as training on data from OpenAI, because people have been doing that forever.
This is probably a taste of the recursive self-improvement we've been promised by foomers. It's now known that one of the reasons Anthropic held back on releasing Opus is that they were using it themselves to train Sonnet 3.5 New.
Everyone's gotta be doing it.
It's not recursive, it just helps you get a smaller model closer in performance to a bigger model. You still need the bigger model to push the frontier out.
There is the potential for a kind of recursive growth, once you have access to some kind of external verifier. A model of a certain size performs a search; external verifier gives it back a reward signal for good searches; and the model learns and gets better at the search, allowing the process to begin anew. E.g. AlphaZero.
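A minimal sketch of that search/verify/learn loop, with the "model" reduced to a distribution over moves and the external verifier to an exact reward oracle. Everything here is invented for illustration; it only demonstrates the shape of the AlphaZero-style argument, not any real training setup.

```python
import random

random.seed(1)

MOVES = list(range(10))
TARGET = 7                          # the verifier knows the right answer

weights = {m: 1.0 for m in MOVES}   # uniform "policy" to start

def sample_move():
    # Sample a move in proportion to the current policy weights.
    total = sum(weights.values())
    r = random.uniform(0, total)
    for m in MOVES:
        r -= weights[m]
        if r <= 0:
            return m
    return MOVES[-1]

def verifier(move):
    # External, trusted feedback signal -- the crux of the argument above.
    return 1.0 if move == TARGET else 0.0

for step in range(2000):
    move = sample_move()            # 1. search
    reward = verifier(move)         # 2. external verification
    if reward > 0:
        weights[move] *= 1.05       # 3. learn: reinforce verified successes

# The policy concentrates on the verified move, and each round of
# learning makes the next round's search better -- the recursion.
print(max(weights, key=weights.get))
```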
Where it gets murkier in my head is whether LLMs can act as their own verifiers, even with arbitrary compute. As a proof of concept, humans can think a long time to come up with a novel insight and learn it, but it still seems we learn best when there is some kind of objective/external feedback signal.
Learn best certainly, but when it comes to scaling compute all it needs to do is be able to learn by itself at all. I'm sure an AI intelligence improvement cycle would go even faster if it had an even smarter AI to give feedback, but for recursive improvement all that is necessary is even a small increase, compounded over and over and over again.
Sure, but presumably it cuts other ways too. Do we think current models can be used to train next-generation models?
I don't see how. It doesn't seem likely to me that the student can surpass the master in this way. You could imagine doing RL if you had a model that was good at rating text output (like what was done with chess) but I don't know how feasible that is.
I wasn't claiming that. Just trying to support the claim that they were more open in the past. I doubt any novel AI technique discovered in the future will even have that.
Counting out the most absurdly well resourced AI lab with a history of breakthrough success seems fairly bold.