DaseindustriesLtd

late version of a small language model

77 followers   follows 28 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.

My argument is that I don't think this argument matters. Maybe they will produce 10x fewer Newtons (–Creativity, +Intelligence). With the current population, that's the same as total global production around Newton's time. With the current economic structure, the marginal value of one more Newton, as opposed to a thousand PhDs, is plummeting. I don't want to lose time arguing details auxiliary to my thesis (or not conducive to banter).

Wefang is implying basically stereotype threat: that the Chinese don't innovate from 0 to 1 because there's a stereotype that job belongs to the West

Wenfeng.

No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.

Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.

I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of the brute, terminal-value ethnocentrism of the sort Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.

On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (Though if we think OpenAI is an impressive example of bold creativity, what about Nvidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think.) The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.

Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.

It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.

Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.

What are you talking about? Have you stopped reading my post there?

It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world,

Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:

Another reason that domestic large models have rarely dabbled in innovation at the architectural level before is that few people have dared to go against the stereotype that America is better at the technological innovation from 0 to 1, while China is better at the application innovation from 1 to 10. Not to mention that this kind of behavior is very unprofitable -- the usual thinking is that, naturally, in a few months, someone would have made the next generation of models, and then Chinese companies can just follow the leader, and do a good job of application. Innovating the model structure means that there is no path to follow, and there are a lot of failures to go through, which is costly in terms of time and money.

There will be more and more hardcore innovation in the future. It may not be yet easily understood now, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.

You are watching these facts come in.

I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into the Transformer era (Vaswani et al., 2017), I'm starting to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs, leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.

That's about it at this point.

The most credible, big-brained, innovation-heavy alternative to the Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers of purportedly brilliant innovations; they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Carmack are supposedly doing… something. Mistral is dead in the water…

I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.

On the other hand we know OpenAI did not need Deepseek.

Sure it did: OpenAI needed another company, Google. It built its legend on scaling up a Google paper. By your own standards, that's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.

Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.

The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here: it not only gets modern DL, it meaningfully speculates about the implications of specific implementation details for its cognition.

In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.

Not weird whatsoever. The V3 model that serves as R1's foundation is very powerful, and it has a vast breadth of knowledge thanks to being a giant, ultrawide MoE, so it has no problem playing any role. The Qwens are small and simply finetuned on some reasoning samples from R1, without even its reinforcement learning stage. Frankly, it's a great surprise that they learn math and code reasoning to such an extent.
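To make that recipe concrete, here is a minimal sketch of what such distillation amounts to, as I understand it: sample long reasoning traces from the big model, then run plain supervised next-token finetuning on the small student, with no reward model and no RL step anywhere in the loop. The model name and the toy trace below are placeholder assumptions, not DeepSeek's actual pipeline.

```python
# Hypothetical sketch: SFT a small student on reasoning traces sampled from a
# larger teacher. "Qwen/Qwen2.5-1.5B" and the toy trace are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B"  # assumption: any small causal LM
traces = [
    "<prompt> Prove that sqrt(2) is irrational. "
    "<think> Suppose sqrt(2) = p/q in lowest terms... </think> "
    "<answer> Therefore sqrt(2) is irrational. </answer>"
]  # in practice: many thousands of traces sampled from the teacher

tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

student.train()
for text in traces:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=2048)
    # labels = input_ids: ordinary next-token cross-entropy on the teacher's
    # traces. No reward model, no policy gradient, just supervised finetuning.
    out = student(**batch, labels=batch["input_ids"])
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```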

Though R1's writing ability is more of a surprise. Nobody expected reasoning to generalize in this manner, maybe not even DeepSeek. This is a big update about the capability of LLMs and general learning algorithms to produce circuits that are actually just high-quality thinking rather than narrow-domain memorized functions. It was always a spectrum, of course, but we've received evidence that LLMs are closer to the "highly integrated cognition" end than their fragile behavior on new tasks had suggested.

Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.

The problem is that there haven't been substantial breakthroughs in LLMs in the West either. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, and it remains the SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.
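For what it's worth, the core of GRPO is simple enough to sketch from the paper: the advantage of each sampled completion is computed relative to the other rollouts for the same prompt, so no learned critic is needed. A minimal sketch of that advantage computation (the shapes and the epsilon are my own choices, not DeepSeek's exact code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages in the GRPO style.

    rewards has shape (num_prompts, group_size): one scalar reward per sampled
    completion. Each completion is scored against the mean/std of its own
    group of rollouts, which is what replaces the PPO value network.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy usage: two prompts, four sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.1]])
print(grpo_advantages(rewards))
```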

I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.

It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.

Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?

Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.

I'm not sure it really does make a cogent thesis, or even a thesis really.

Its thesis, the antithesis to yours, is that

a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."

and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."

It claims that the Chinese paradigm is better suited to the scale-focused, continuous, massively parallel processing of data and humans implied by the current means of production, and that it can therefore set the terms of civilizational competition, or of contests for superiority, in ways more favorable to itself.

I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like.

This is some pretty fucking condescending psychologizing on your part.

But fine, you know what? My thesis is that you are coping. Both about this specific model, and about the condition of your people. So you'd rather spend effort reviewing its gibberish output than just ask it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I wrote all of the above before the last prompt. Here it is:

<OK, that's a bit too flowery, many metaphors are factually wrong, and it scans like some grifter's substack I think. Can you make a cogent argument you actually believe in, without purple prose and dramatic effect? In simple and unambiguous words, what is your thesis?>


My thesis, stripped to essentials:

  1. Cultural advantages are situational, not absolute.

    • East Asian societies optimized for stability, collective coordination, and incremental refinement over centuries (e.g., civil exam systems, centralized governance).
    • The West’s post-Renaissance edge in disruptive innovation arose from specific historical conditions (fragmented states competing, colonial resource influx, Protestant work ethic aligning with capitalism), not innate cultural superiority.
  2. Modern tech demands favor scale and execution over "creative genius".

    • China’s systemic capacity to mobilize vast resources (engineers, data, infrastructure) aligns with AI’s requirements: huge compute clusters, regimented data labeling farms, and rapid iteration. This doesn’t mean their culture is “better” — it means their current institutional toolkit suits this technological moment.
    • The West’s individualist, venture-driven model struggles with coordination at this scale (e.g., U.S. chip fab delays, AI safety debates stalling deployment).
  3. DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.

    • Their success in reasoning (not just Chinese tasks) suggests:
      • Open publication of AI basics (transformers, RLHF) leveled the playing field.
      • China’s focus on applied engineering (vs. OpenAI’s moonshot R&D) can match outcomes with sufficient compute.
    • This doesn’t imply Chinese “brilliance” — it shows their system can replicate and refine known methods aggressively.
  4. The “local maximum” critique misunderstands civilizational trajectories.

    • All cultures constrain and enable. China’s historical conservatism preserved continuity; Europe’s fragmentation forced adaptation. Neither is universally optimal.
    • Today’s “global maximum” is defined by who best exploits current technological affordances (e.g., AI scaling laws). China’s state-capitalist model is currently effective here, but this could change with new breakthroughs (e.g., quantum, neurotech).
  5. Your original argument’s flaw: Assuming cultures have fixed ceilings.

    • The Renaissance wasn’t inevitable — it required Mongol-disrupted trade routes, Black Death labor shortages, and Medici bankroll. Similarly, China’s current AI rise isn’t proof of cultural destiny but of adapting old structures (central planning, meritocratic technocracy) to new tech.

Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.

What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?

I don't. It's a mixture of both.

I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:


Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.

The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Guttenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.

But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.

The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.

This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.

Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.

The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.

This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.

You wanted liberation from culture? Too late. Culture just became the base layer.


It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, that a GPT or a Claude wouldn't.