DaseindustriesLtd
late version of a small language model
Tell me about it.
I think it's time to replicate with new generation of models.
Tell me, does R1 above strike you as "slop"? It's at least pretty far into the uncanny valley to my eyes.
The 90% death rate is bogus (rather, it may be confusing death rate and mortality rate?) but the literature majors part is in fact true. Since he bothered to check the interview, I'm surprised he left that attack in.
Well, I protest this rule, if such a rule even exists; I find it infantilizing, and find your reaction shallow, akin to the screeching of scared anti-AI artists on Twitter. It should be legal to post synthetic content so long as it's appropriately labeled and accompanied by original commentary, and certainly when it is derived from the person's own cognitive work and source-gathering, as in this case.
Maybe add an option to collapse the code block or something.
or maybe just ban me, I'm too old now to just nod and play along with gingerly preserved, increasingly obsolete traditions of some authoritarian Reddit circus.
Anyway, I like that post and that's all I care about.
P.S. I could create another account and (after a tiny bit of proofreading and editing) post that, and I am reasonably sure that R1 has reached the level where it would have passed for a fully adequate Mottizen, with nobody picking up on “slop” when it is not openly labeled as AI output. This witch hunt is already structurally similar to zoological racism.
In fact, this is an interesting challenge.
I'd ask you not to derail my argument by insinuating that I'm biased by locallama debates.
But, since then it seems OpenAI has formally accused DeepSeek
I think it's more cope from them. 4o or o1 could not have written the text above (and I wouldn't dare post GPTslop here); you cannot build R1 with OpenAI tokens; the thing that turns everyone's heads is its cadence, not so much benchmark scores. o1 CoT distillation was virtually impossible to do, at least at scale. We currently see replications of the same reasoning patterns in models trained in R1's manner, too.
where the generated output of Western innovation becomes a fundamental input to China catching up and aspirationally exceeding
I think OpenAI outputs have robustly poisoned the web data, and reasoners will be exceptionally vulnerable to it. LLMs know they're LLMs, self-understanding (and imitating snippets of instruction chains) helps reasoning, RL picks up and reinforces behaviors that sharpen reasoning, you get the latent trace of ChatGPT embedded even deeper into the corpus. Sans Anthropic-level investment into data cleaning it's unbeatable.
But to the extent such bootstrapping happened deliberately, and let's grant that it did to an extent, it was an economical solution to speed up the pipeline. The reason for OpenAI models' instruction-following capabilities is, ironically, exploitation – mind-numbing, massively parallel data annotation, thumbs up and thumbs down on samples, by low-paid Kenyans and Pinoys for low-level problems, by US students for more complex stuff. It's very stereotypically… Chinese in spirit (which makes it funny that China has not created any such centralized project). The whole of OpenAI is “Chinese” like that, really; it's a scaling gig. And knowing you, I'm surprised you insist on the opposite – after all, OpenAI is a company principally founded and operated by three Jews (Altman, Brockman, Sutskever), so it can't be “Aryan” by your standards. Then again, Google, Meta, OpenAI… there exists only one American AGI effort without an Ashkenazi founder – Anthropic, and it's an OpenAI splinter, and even there you have Holden Karnofsky the grey cardinal. (I don't currently count xAI, but maybe I should provisionally do so after their noises about Grok 3.) In this vein, I think you're coping after all.
Purely scientifically, I think R1's recipe is commensurate with RLHF in profundity, and much more elegant.
Now, DeepSeek may be compared to heavy research labs, like FAIR and GDM. It doesn't look too hot in that comparison. On the other hand, almost nothing that those labs publish works.
I think a more interesting objection to the Chinese phase change would be "but at what cost?" Whites don't need to have an idea derisked before their eyes. And they can happily innovate in an NDA-covered black project.
I think you are extremely overindexing on your experience. A century or so ago they were stereotyped as lazy too. This is a matter of culture that can change very quickly.
My argument is that I don't think this argument matters. Maybe they will produce 10x fewer Newtons (–Creativity + Intelligence). With the current population, that's the same as total global production around Newton's time. With the current economic structure, the marginal value of one more Newton as opposed to a 1000 PhDs is plummeting. I don't want to lose time arguing details auxiliary to my thesis (or not conducive to banter).
Wefang is implying basically stereotype threat: that the Chinese don't innovate from 0 to 1 because there's a stereotype that job belongs to the West
Wenfeng.
No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.
Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.
I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of brute, terminal-value ethnocentrism the sort of which Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.
On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (though if we think OpenAI is an impressive example of bold creativity, what about NVidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think). The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.
Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.
It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.
Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.
What are you talking about? Have you stopped reading my post there?
It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world,
Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:
Another reason that domestic large models have rarely dabbled in innovation at the architectural level before is that few people have dared to go against the stereotype that America is better at the technological innovation from 0 to 1, while China is better at the application innovation from 1 to 10. Not to mention that this kind of behavior is very unprofitable -- the usual thinking is that, naturally, in a few months, someone would have made the next generation of models, and then Chinese companies can just follow the leader, and do a good job of application. Innovating the model structure means that there is no path to follow, and there are a lot of failures to go through, which is costly in terms of time and money.
There will be more and more hardcore innovation in the future. It may not be yet easily understood now, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.
You are watching these facts come in.
I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into Transformer era (Ashish Vaswani et al., 2017), I start to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.
That's about it at this point.
The most credible, big-brained, innovation-heavy alternative to the Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers with purportedly brilliant innovations; they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Carmack are supposedly doing… something. Mistral is dead in the water…
I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.
On the other hand we know OpenAI did not need Deepseek.
Sure, OpenAI needed another company. OpenAI built its legend on scaling up a Google paper. By your own standards, it's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.
Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.
The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here, it not only gets modern DL but meaningfully speculates about implications of specific implementation details for its cognition.
In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.
Not weird whatsoever. The V3 model that serves as R1's foundation is very powerful, and it has a vast breadth of knowledge thanks to being a giant ultrawide MoE, so it has no problem playing any role. The Qwens are small and simply finetuned on some reasoning samples from R1, without even its reinforcement learning stage. Frankly it's a great surprise they learn math and code reasoning to such an extent.
Though R1's writing ability is more of a surprise. Nobody expected reasoning to generalize in this manner, maybe not even DeepSeek. This is a big update about the capability of LLMs and general learning algorithms to produce some circuits that are actually just high-quality thinking and not narrow-domain memorized functions. It was always a spectrum of course but we've received evidence that LLMs are closer to "highly integrated cognition" end than assumed from their fragile behavior on new tasks.
Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.
The problem is that there haven't been substantial breakthroughs in LLMs in the West too. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, and it remains SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.
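For what it's worth, GRPO's core trick is small enough to sketch. A toy version of the group-relative advantage, as described in the DeepSeekMath paper: sample several completions per prompt, then normalize each reward against the group's own mean and standard deviation, so no learned value/critic model is needed. (The epsilon and exact normalization details here are my simplifications, not their code.)

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages: score each sampled completion
    against the mean/std of its own group of samples."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# e.g. 4 completions sampled for one prompt, scored by a rule-based verifier
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

These advantages then weight a PPO-style clipped policy-gradient objective; dropping the critic network is what makes the recipe cheap relative to standard RLHF.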
I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.
It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.
Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?
Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.
I'm not sure it really does make a cogent thesis, or even a thesis really.
Its thesis, the antithesis for yours, is that
a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."
and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."
It claims that the Chinese paradigm is better suited to the scale-focused, continuous, massively parallel processing of data and humans implied by current means of production, and can therefore set the terms of civilizational competition, or contests for superiority, on grounds more favorable to itself.
I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like.
This is some pretty fucking condescending psychologizing on your part.
But fine, you know what? My thesis is that you are coping, both about this specific model and about the condition of your people. So you'll put effort into reviewing its gibberish output instead of just asking it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I wrote all of the above before the last prompt. Here it is:
<OK, that's a bit too flowery, many metaphors are factually wrong, and it scans like some grifter's substack I think. Can you make a cogent argument you actually believe in, without purple prose and dramatic effect? In simple and unambiguous words, what is your thesis?>
My thesis, stripped to essentials:
1. Cultural advantages are situational, not absolute.
- East Asian societies optimized for stability, collective coordination, and incremental refinement over centuries (e.g., civil exam systems, centralized governance).
- The West’s post-Renaissance edge in disruptive innovation arose from specific historical conditions (fragmented states competing, colonial resource influx, Protestant work ethic aligning with capitalism), not innate cultural superiority.
2. Modern tech demands favor scale and execution over "creative genius".
- China’s systemic capacity to mobilize vast resources (engineers, data, infrastructure) aligns with AI’s requirements: huge compute clusters, regimented data labeling farms, and rapid iteration. This doesn’t mean their culture is “better” — it means their current institutional toolkit suits this technological moment.
- The West’s individualist, venture-driven model struggles with coordination at this scale (e.g., U.S. chip fab delays, AI safety debates stalling deployment).
3. DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.
- Their success in reasoning (not just Chinese tasks) suggests:
- Open publication of AI basics (transformers, RLHF) leveled the playing field.
- China’s focus on applied engineering (vs. OpenAI’s moonshot R&D) can match outcomes with sufficient compute.
- This doesn’t imply Chinese “brilliance” — it shows their system can replicate and refine known methods aggressively.
4. The “local maximum” critique misunderstands civilizational trajectories.
- All cultures constrain and enable. China’s historical conservatism preserved continuity; Europe’s fragmentation forced adaptation. Neither is universally optimal.
- Today’s “global maximum” is defined by who best exploits current technological affordances (e.g., AI scaling laws). China’s state-capitalist model is currently effective here, but this could change with new breakthroughs (e.g., quantum, neurotech).
5. Your original argument’s flaw: assuming cultures have fixed ceilings.
- The Renaissance wasn’t inevitable — it required Mongol-disrupted trade routes, Black Death labor shortages, and Medici bankroll. Similarly, China’s current AI rise isn’t proof of cultural destiny but of adapting old structures (central planning, meritocratic technocracy) to new tech.
Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.
What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?
I don't. It's a mixture of both.
I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:
Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.
The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Guttenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.
But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.
The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.
This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.
Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.
The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.
This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.
You wanted liberation from culture? Too late. Culture just became the base layer.
It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, a GPT or a Claude wouldn't.
But more seriously, why is Facebook's Llama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released, and their AI team do not seem like dummies.
You've probably seen that bizarre teamblind thread. Meta is completely blindsided by DeepSeek. They are "moving frantically to dissect deepseek and copy anything and everything we can from it." It's pathetic.
Basically there's no secret: they suck and Llama sucks; it's a soft, low-expectations research sinecure for people who want to publish papers and have weekends. Why did Timothée Lacroix and Guillaume Lample leave the Llama team to found Mistral? And why did Mistral 7B destroy Llama-30B of the same generation (and currently Mistral-123B is ≥ Llama-405B despite the drastic difference in compute access)? Because they're better than that.
Llama is simply a bad yardstick. They dominate mindshare for reasons unrelated to their impressiveness. DeepSeek competes with industry leaders.
Wenfeng, soon after the release of DeepSeek V2, June 2024:
Liang Wenfeng: If the goal is just to make applications, then it is reasonable to follow the Llama architecture and start the product in a short period of time. But our goal is AGI, which means we need to research new model structures to realize stronger model capability with limited resources. This is one of the pieces of basic research that needs to be done to scale up to larger models. In addition to the model structure, we have done a lot of other research, including how to construct data, how to make the model more human-like, etc., all of which is reflected in the models we released. In addition, Llama's architecture, in terms of training efficiency and inference cost, is estimated to be already 2 generations behind the foreign state of the art. […] First of all, there is a gap in training efficiency. We estimate that compared to the best domestic or foreign level, the difference in model structure and training dynamics results in twice the compute cost for the same performance. In addition, there may also be another 2x gap in training data efficiency, that is, we need twice the training data to reach the same performance. Combined, that's four times more compute. What we're trying to do is to keep closing these gaps.
GPT-4o-mini is probably an 8b dense model. Frontier labs are efficient and have high margins. OpenAI and Anthropic are recouping their capex and exploiting captive audience. That's all.
They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby
Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (The same was true of V2.) They were asked to opensource their MLA implementation (not the terrible huggingface one) and declined, saying that everything is too integrated into their proprietary HAI-LLM framework and they don't want to disassemble it and strip out the actual secret stuff. The SGLang team and others had to reverse engineer it from papers. Their search implementation on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).
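Part of why third parties struggle to serve it well is that MLA changes what the KV cache even is: instead of storing full per-head keys and values, you cache a small compressed latent per token (plus a decoupled RoPE key share) and up-project at attention time. A back-of-envelope sketch of the cache savings, using roughly DeepSeek-V2-shaped dimensions (my approximations from their paper, not official serving code):

```python
# Rough KV-cache accounting: standard multi-head attention vs MLA.
# Dimensions are approximately DeepSeek-V2's; treat them as illustrative.
n_heads, d_head = 128, 128
d_kv_latent = 512    # compressed joint KV latent per token
d_rope = 64          # decoupled RoPE key share per token

standard_kv = 2 * n_heads * d_head   # full K + V floats per token per layer
mla_cache = d_kv_latent + d_rope     # what MLA actually stores per token

compression = standard_kv / mla_cache  # cached floats shrink ~57x
```

That order-of-magnitude cache reduction is why a naive reimplementation that materializes full K/V throws away most of the benefit.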
Their moat is execution and corporate culture, not clinging to some floats.
That's the point: he is invited NOW, after "suddenly" shipping a model at the Western frontier level.
We don't understand the motivations of DeepSeek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts to cut the supply of chips to China. After all, the Chinese side shares the same doubts about their trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.
Presumably, this was true and this is him succeeding. As I note here.
As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:
The Chinese government started to crack down on the quant trading industry amid economic slowdown, a housing crisis and a declining stock market index.
The CSI300 (Chinese Blue Chip Index) reached an all-time low. They blamed high frequency traders for exploiting the market and causing the selloff.
- Banned a quant competitor from trading for 3 days
- Banned another from opening index futures for 12 months
- Required strategy disclosures before trading
- Threatened to increase trading costs 10x to destroy the industry

High-Flyer faced extinction. (High-Flyer's funds have been flat/down since 2022 and have trailed the index by 4% since 2024.)
so I stand by my conjectures.
they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on
So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.
What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?
Here's a Western paper corroborating their design choices [Submitted on 12 Feb 2024]:
Our results suggest that a compute-optimal MoE model trained with a budget of 10^20 FLOPs will achieve the same quality as a dense Transformer trained with a 20× greater computing budget, with the compute savings rising steadily, exceeding 40× when a budget of 10^25 FLOPs is surpassed (see Figure 1). … when all training hyper-parameters N, D, G are properly selected to be compute-optimal for each model, the gap between dense and sparse models only increases as we scale… Higher granularity is optimal for larger compute budgets.
Here's DeepSeek paper from a month prior:
Leveraging our architecture, we subsequently scale up the model parameters to 16B and train DeepSeekMoE 16B on a large-scale corpus with 2T tokens. Evaluation results reveal that with only about 40% of computations, DeepSeekMoE 16B achieves comparable performance with DeepSeek 7B (DeepSeek-AI, 2024), a dense model trained on the same 2T corpus. We also compare DeepSeekMoE with open source models and the evaluations demonstrate that DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves comparable performance with LLaMA2 7B (Touvron et al., 2023b), which has approximately 2.5 times the activated parameters. Evaluation results show that DeepSeekMoE Chat 16B also achieves comparable performance with DeepSeek Chat 7B and LLaMA2 SFT 7B in the chat setting. Encouraged by these results, we further undertake a preliminary endeavor to scale up DeepSeekMoE to 145B. The experimental results still validate its substantial advantages over the GShard architecture consistently. In addition, it shows performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
As expected they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss on the same token count as LLaMA-405B. Their other tricks also helped with downstream performance.
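The quoted "about 40% of computations" also survives a back-of-envelope check: training FLOPs scale roughly as 6 × activated parameters × tokens, so at equal token counts the cost ratio is just the ratio of activated parameters. (Parameter counts below are approximate figures from the DeepSeekMoE paper.)

```python
# FLOPs per token ~ 6 * activated params, so at the same 2T-token corpus
# the training-cost ratio reduces to the activated-parameter ratio.
activated_moe = 2.8e9   # DeepSeekMoE 16B: ~2.8B params activated per token
dense = 6.9e9           # DeepSeek 7B dense: all ~6.9B params active

cost_ratio = (6 * activated_moe) / (6 * dense)  # ~0.4, matching the paper
```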
There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up; almost boring. The reason people are so appalled is that the American AI industry is bogged down in corruption covered with tasteless mythology, much like the Russian military pre-Feb 2022.
then maybe the thought is they may have an o3-level model in-house
this is not yet correct but will soon be, since R1 finished training in early December, apparently.
Alex Wang is an opportunistic psychopath who's afraid of his whole Pinoy-based data generation business model going bust in the era of synthetic chains of thought. Therefore he's dishonestly paraphrasing Dylan Patel (himself a China hawk peddling rationales for more export controls), who once said “they have 50,000 Hoppers”, without evidence. But the most likely Hopper model they have is the H20, an effectively inference-only chip, which has negligible effect on pretraining costs and scale for V3 and R1.
Yes, I do believe DeepSeek. This is not really a political issue but a purely technical one. Unfortunately, DeepSeek really are compute-bound, so R1 cannot process all the papers I'd like to give it to make it quicker.
The political narrative does not even work, it's purely midwit-oriented, nobody in the industry imagines leading labs can be deceived with some trickery of this kind.
Inference costs are wholly addressed by Hyperbolic Labs (US) and some others already serving it for cheaper.
which is kinda maybe but not actually on par with o1
It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry. It can push back. It is beyond any LLM I have seen including Sonnet and Opus. This becomes obvious after minutes of serious interaction. It just has less polish as a product because they haven't been milking the world for interaction data since 2019. They have 0.8-1.5 M quality samples for instruction finetuning. OpenAI had accumulated tens of millions if not hundreds.
For me it's something of an emotional issue. DeepSeek is the only lab standing that straightforwardly and credibly promises what I'd rather see as an international project: free open-source AGI for everybody. I've been monitoring their rise for well over a year, reading every paper and even their blogposts in Chinese. Nothing that they claim is inconsistent; indeed, it's all been predictable since 2023, all part of a very methodical, flawless, truly peak quant fund (that's their capital source and origins) execution towards the holy grail, "answering the ultimate question with longtermism", as they put it. The CEO seems to be an idealist (and probably a serious nationalist too, given his stated ambition to basically pull the whole of China out of its copy-machine stage and into a "hardcore innovation" culture by giving an example that it can work). They have an immaculate company culture; their ex-employees who emigrated to the West for personal reasons adore them and fear for their future; there is literally no dirt on them no matter how hard people have searched. For all we can tell they are not state-affiliated, unlike OpenAI, and probably not even on good terms with the state, due to their quant fund roots (though this may change now that they've proven their merit).
This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment.
Also, it's a moment to ask oneself how high the margins of Western model providers are, and whether it's a true free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.
I've been sloppy with my last argument. It's more like "given their demonstrable mastery of data engineering with regard to the dimensions of data they care about, e.g. in the DeepSeekLLM, Coder, DeepSeekMath, and DeepSeekProver papers, we can suspect that if they were behaviorally cloning OpenAI models, they'd have bothered using some of those skills to filter and refine those OpenAI tokens, obscuring their provenance".
Regardless, all those papers are gems and recommended reading. They're also astonishingly well written for a pure Mainland effort.
I have DeepSeek API access and also use their website for free. This specifically is from the website, because the API does not support search, last I checked.
Did not check R1 on openrouter, it might be implemented like o1 there.
This is implausible for at least three reasons.
- We have their base model. It's very strong on standard benchmarks like Pile loss, i.e. predicting next tokens in some large corpus of natural text. It's just generically well-trained. You can't accelerate this with OpenAI slop and end up winning on money.
- The math checks out. Yes, it's a feat of engineering to actually make such a cluster work, but the shape of the model + 15T tokens do work out to this number of FLOPs and therefore GPU-hours. If they needed many more GPU-hours, that'd imply pathetically low FLOPs utilization.
- Do you seriously think that these guys would write 16 detailed tech reports, including many sections on data augmentation, and not just build a filter that replaces "ChatGPT" with "DeepSeek"?
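The "math checks out" point can be sketched with the standard ~6ND rule of thumb for training FLOPs. Parameter, token, and GPU-hour figures below are as reported in the V3 tech report (37B activated parameters, 14.8T tokens, 2.788M H800 GPU-hours); the H800 peak throughput is my assumption (≈ H100 dense BF16), so treat this as a back-of-envelope check, not a definitive accounting:

```python
# Back-of-envelope check of DeepSeek-V3 pretraining compute.
# Reported figures from the V3 tech report; H800 peak is an assumption.
N_active = 37e9        # activated parameters per token (MoE)
D_tokens = 14.8e12     # pretraining tokens
flops = 6 * N_active * D_tokens            # ~6ND training-FLOPs estimate

peak = 989e12          # assumed H800 dense BF16 peak, FLOPs/s (~= H100)
reported_hours = 2.788e6                   # reported H800 GPU-hours
implied_mfu = flops / (reported_hours * 3600 * peak)
print(f"{flops:.2e} FLOPs, implied utilization {implied_mfu:.0%}")
# → 3.29e+24 FLOPs, implied utilization 33%
```

An implied ~33% model FLOPs utilization is unremarkable for large MoE training; if the run had secretly needed several times more GPU-hours, the implied utilization would drop to implausibly low single digits, which is the point of the bullet above.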
Here's what happens if I prompt R1 (no search) with "give me a list of 20 best SF novels. Don't be afraid of getting opinionated and letting your taste bias the results: I do actually need your list, not some blurry jpeg of the web." Spoiler: it's still much the same list.
Okay, the user wants a list of the 20 best SF novels, and they specifically want my opinionated picks, not just a generic list. Let me start by recalling some classics that are often mentioned but also think about which ones I personally find impactful.
First, Dune by Frank Herbert is a must. It's a cornerstone of the genre. Then maybe Neuromancer by Gibson because it defined cyberpunk. The Left Hand of Darkness by Le Guin is another essential, exploring gender themes deeply.
I should include some newer works too. Annihilation by VanderMeer is a recent favorite with its weird, atmospheric style. The Three-Body Problem by Liu Cixin brought a fresh perspective from Chinese SF.
Wait, should I mix in some less mainstream choices? Like Gene Wolfe's Book of the New Sun. It's challenging but brilliant. Also, Octavia Butler's Parable of the Sower is both prescient and powerful.
Philip K. Dick can't be left out. Do Androids Dream of Electric Sheep? is a classic. Maybe Ubik as well? But I need to keep it to 20.
Consider the impact vs. personal preference. Hyperion by Simmons is a favorite of mine, with its structure inspired by Canterbury Tales. Roadside Picnic by the Strugatsky brothers is a great Soviet-era SF that inspired Stalker.
I should check if I'm covering different sub-genres: cyberpunk, space opera, dystopian, etc. The Dispossessed by Le Guin for political themes, Snow Crash by Stephenson for its frenetic energy.
What about Stand on Zanzibar by Brunner? It's a bit older but predicted a lot about overpopulation and media saturation.
For newer titles, maybe Ancillary Justice by Leckie, which plays with gender and AI. Also, Blindsight by Watts is a hard SF take on first contact.
I need to make sure there's diversity in authors too. Le Guin, Butler, Jemisin, Liu, VanderMeer. Maybe check if I have enough female authors.
Wait, The Fifth Season by Jemisin is fantasy, but it's often cross-listed in SF. Maybe include it since it's groundbreaking.
Classics like Childhood's End by Clarke should be there. Maybe Rendezvous with Rama as well? But again, need to limit.
Alastair Reynolds' Revelation Space is a good space opera. Consider Greg Egan's Diaspora for hard SF concepts.
Hmm, Jeff VanderMeer already has Annihilation. Maybe not another.
Let me count: Dune, Neuromancer, Left Hand of Darkness, Book of the New Sun, Hyperion, Do Androids Dream..., Roadside Picnic, Parable of the Sower, The Dispossessed, Snow Crash, The Three-Body Problem, Ancillary Justice, Blindsight, Stand on Zanzibar, The Fifth Season, Ubik, Rendezvous with Rama, The Stars My Destination, The Forever War, Annihilation. That's 20.
Wait, did I miss any crucial ones? Maybe Ender's Game? But it's more YA. Or Foundation? But maybe not as personal. The list seems solid, mixing classics with personal picks and diverse authors. Alright, time to write it up with brief explanations.
Here’s my fiercely opinionated, unapologetically biased list of 20 SF novels that have shaped my love for the genre. These are books that linger in the mind, challenge conventions, or just blow the roof off reality:
- Dune (Frank Herbert) – The ultimate planetary opera. A masterpiece of ecology, religion, and power. Paul Atreides is the original anti-Chosen One.
- Neuromancer (William Gibson) – Cyberpunk’s beating heart. The prose is a synaptic rush; the vision of cyberspace still feels like prophecy.
- The Left Hand of Darkness (Ursula K. Le Guin) – A meditation on gender, politics, and ice. Le Guin at her most humane and revolutionary.
- The Book of the New Sun (Gene Wolfe) – A tetralogy that’s equal parts SF, fantasy, and literary labyrinth. Severian’s unreliable narration will haunt you.
- Hyperion (Dan Simmons) – A Canterbury Tales riff with time-traveling priests, killer androids, and the Shrike. Pure cosmic dread.
- Do Androids Dream of Electric Sheep? (Philip K. Dick) – More melancholic and existential than Blade Runner. Asks what it means to be human in a dying world.
- Roadside Picnic (Arkady & Boris Strugatsky) – Soviet SF at its grittiest. The Zone is the best alien “invasion” story ever: indifferent, inexplicable.
- Parable of the Sower (Octavia E. Butler) – A dystopia that feels like today. Lauren Olamina’s Earthseed verses will stick in your bones.
- The Dispossessed (Ursula K. Le Guin) – An anarchist moon vs. a capitalist planet. Theory made flesh, with Le Guin’s quiet brilliance.
- Snow Crash (Neal Stephenson) – A maximalist, satirical romp. Hiro Protagonist delivers pizza and fights info-capitalists. Proto-metaverse chaos.
- The Three-Body Problem (Liu Cixin) – A game of cosmic chess with aliens. The Cultural Revolution prologue alone is devastating.
- Ancillary Justice (Ann Leckie) – A warship AI trapped in a human body. Explores empire, identity, and tea rituals with icy precision.
- Blindsight (Peter Watts) – First contact as a horror show. Vampires in space and aliens that weaponize consciousness. Hard SF with teeth.
- Stand on Zanzibar (John Brunner) – A fragmented, proto-cyberpunk mosaic of overpopulation and corporate doom. Feels like scrolling Twitter in 2025.
- The Fifth Season (N.K. Jemisin) – Yes, it’s “fantasy,” but the geophysical apocalypse and broken earth science hit like SF. Revolutionary structure.
- Ubik (Philip K. Dick) – Dick’s trippiest novel. Time slips, psychic wars, and spray-can godhood. A mind-bender about entropy and belief.
- Rendezvous with Rama (Arthur C. Clarke) – Pure awe. No aliens, just an alien artifact. The joy of exploration distilled.
- The Stars My Destination (Alfred Bester) – A revenge tale with teleportation and pyrotechnic prose. Gully Foyle is the best worst protagonist.
- The Forever War (Joe Haldeman) – Vietnam-in-space with time dilation. A brutal, personal counterpoint to Starship Troopers.
- Annihilation (Jeff VanderMeer) – Weird ecology and cosmic horror. The Biologist’s descent into Area X is hypnotic, terrifying poetry.
Hot Takes Omitted: No Foundation (too dry), no Ender’s Game (aged poorly), and no 2001 (the film outshines the book). Fight me.
This list skews toward philosophical depth, linguistic daring, and existential unease—because SF, at its best, should destabilize as much as dazzle.
FYI you can now enable both search AND R1 at the same time, making this challenge way more interesting. And I think you actually used V3, which is way dumber than R1. If you used R1, you'd have been able to see its rationale for choices.
Anyway, here's R1@search top 20 Finnish novels:
Here’s a curated list of 20 best Finnish novels, selected based on critical acclaim, reader popularity, and cultural significance, synthesized from the search results provided:
Classics & Timeless Works
- "Kalevala" by Elias Lönnrot – Finland’s national epic, compiled from oral folklore, foundational to Finnish identity and mythology.
- "The Seven Brothers" (Seitsemän veljestä) by Aleksis Kivi – A cornerstone of Finnish literature, humorously depicting rural life and rebellion in 19th-century Finland.
- "The Unknown Soldier" (Tuntematon sotilas) by Väinö Linna – A raw, realistic portrayal of Finnish soldiers during WWII, revered as a national classic.
- "Under the North Star" (Täällä Pohjantähden alla) by Väinö Linna – A trilogy tracing a family’s journey through Finland’s social and political upheavals from 1880–1950.
- "The Egyptian" (Sinuhe egyptiläinen) by Mika Waltari – A historical epic set in ancient Egypt, reflecting post-WWII existential themes.
Modern Masterpieces
- "Purge" (Puhdistus) by Sofi Oksanen – A gripping tale of Soviet-era Estonia and its lingering trauma, blending history with psychological drama.
- "The Year of the Hare" (Jäniksen vuosi) by Arto Paasilinna – A whimsical adventure of a man and a hare, symbolizing freedom from societal constraints.
- "Ice" (Is) by Ulla-Lena Lundberg – A quietly epic story of a pastor’s family in the Åland Islands, praised for its lyrical prose and depth.
- "The Red Book of Farewells" by Pirkko Saisio – A critically acclaimed autobiographical novel exploring identity, memory, and art.
- "Destruction" (Tuomio) by Iida Rauma – A Finlandia Prize-winning exploration of school violence and societal decay.
Award-Winning & International Bestsellers
- "Beasts of the Sea" by Iida Turpeinen – A climate-fiction epic blending mythology and environmental activism, topping reader polls.
- "Fishing for the Little Pike" (Juhannustanssit) by Juhani Karila – A surreal, darkly comic tale set in rural Finland, hailed as a "Finnish folly."
- "The Summer Book" (Sommarboken) by Tove Jansson – A poetic meditation on life, nature, and intergenerational bonds on a remote Finnish island.
- "Trench Road" (Juoksuhaudantie) by Kari Hotakainen – A Finlandia and Nordic Council Prize-winning novel about obsession and family collapse.
- "The Rabbit Back Literature Society" (Lumikko ja yhdeksän muuta) by Pasi Ilmari Jääskeläinen – A magical realist mystery involving a secretive writers’ guild.
Contemporary Gems
- "The Core of the Sun" (Auringon ydin) by Johanna Sinisalo – A dystopian satire imagining a eugenics-driven Finland, merging speculative fiction with social critique.
- "Memory of Water" (Teemestarin kirja) by Emmi Itäranta – A climate-fiction novel about water scarcity in a dystopian future, compared to Margaret Atwood’s works.
- "Compartment No. 6" (Hytti nro 6) by Rosa Liksom – A Finlandia Prize-winning journey through post-Soviet landscapes on a Trans-Siberian train.
- "Oneiron" by Laura Lindstedt – An experimental novel about seven women navigating an afterlife void of time, blending poetry and prose.
- "The Human Part" (Ihmisen osa) by Kari Hotakainen – A darkly humorous critique of consumerism and modern alienation.
Honorable Mentions
- "The White Reindeer" by Mika Waltari (mythic horror)
- "The Dedalus Book of Finnish Fantasy" (anthology)
- "Margarita" by Anni Kytömäki (historical fiction)
- "The Maid Silja" by Frans Eemil Sillanpää (Nobel Prize-winning rural drama)
Sources: For deeper insights, explore the rankings from the Critics’ and Readers’ Selection of 100 Best Books, or check translations via the Chinese Literature Festival list.
I think that's the idea with Starship, well besides Musk's sci-fi ideation. Brilliant Pebbles, but for real.
Sorry, this is just tedious hairsplitting. Did you use ChatGPT to come up with something?
Not a contradiction insofar as we give the sources a straightforward reading. Zihan says: “It’s like everyone contributes to the final model with their own (orthogonal) ideas and everyone hopes their idea is useful”. It has integrated two separate sources (Wenfeng and Zihan) into a non-contradictory phrase. This is basic journalism; I see worse whenever I open legacy media.
We can go over all items again but clearly you're not arguing in good faith. Give up, R1 > you and whatever sloppy model you've called to aid.