DaseindustriesLtd
Okay, fair. #6 is contrived non-sequitur slop, barely intelligible in context as a response to #5, which is what confused me.
In conclusion, I think my preference to talk to people when I want to, to AI when I want to, and to use any mix of generative processes I want, takes priority over the comfort of people who have nothing to contribute to the conversation or to pretraining data and would not recognize AI without direct labeling.
I think it's time to replicate this with a new generation of models.
Tell me, does R1 above strike you as "slop"? It's at least pretty far into the uncanny valley to my eyes.
A very German thing to believe. I weep for your people, but really you've been cooked since before both of us were born, so this revolution adds nothing.
I know. This was a completely different America; it's like saying that Moscow was once conquered by Poles or something (Russians are very proud of that episode, thanks to propaganda in history lessons, but obviously no memory, institutional legacy or military tradition survived from it) – a dim fact people learn in school. The America that lives today was born in the Civil War and was fully formed by McKinley's era, probably. Since then, it's been straight-up dunking on weaker powers. With some tasteless underdog posturing from time to time, of course.
Regardless of whether transformers are a dead end or not, the current approach isn't doing new science or algo design. It's throwing more and more compute at the problem.
Fetishizing algorithmic design is, I think, a sign of mediocre understanding of ML, of being enthralled by cleverness. Data engineering carves more interesting structure into weights.
With a few weeks of space between the initial marketing hype and observation, DeepSeek seems to be most notable for (a) claiming to have taken less money to develop (which is unclear given the nature of Chinese subsidies), (b) being built off of other tech (which helps explain (a)), and (c) being relatively cheap (which is partially explained by (a)).
Man, you're really committed to the bit of an old spook who disdains inspecting the object level because you can't trust nuthin' out of Red Choyna. I could just go watch this guy or his pal instead.
It wouldn't be an exaggeration to say that the technical ML community has acknowledged DeepSeek as the most impressive lab currently active, pound for pound. What subsidies? There is nothing to subsidize, except perhaps sharashkas with hidden geniuses.
You're losing the plot, SS. Why quote a passage fundamentally challenging the belief in OpenAI's innovation track record to rant about choices made with regard to alignment to specific cultural narratives? And “Chinese are too uncreative to do ideological propaganda, that's why DeepSeek doesn't have its own political bent?” That's quite a take. But whatever.
Strange argument. That's still hundreds of millions more young people than in the US. They don't dissolve in the shadow of inverted population pyramid, they simply get to solve the problem of elderly care on top of having a productive economy to run.
And all this happens within one "generation" anyway.
Race cannot be gamed (except for edge cases). The whole point of race is its inherence. Any legible meritocratic evaluation immigrants can and will game, Goodharting the hell out of it and wrecking themselves in the process.
even just English proficiency might suffice.
Why is having had British colonial masters a marker of cultural compatibility?
Furthermore, I don't think it's so hard to do some cursory test for cultural compatibility that again, would be much better than the weak proxy of race.
Cursory, adj: hasty and therefore not thorough or detailed
Why "cursory"? Because you want it to be gameable? Because you actually want it to test your merits – namely, opportunism and ability to manipulate bureaucracies to your benefit? See, this is exactly whom people who are arguing for racial criteria would like to not let in.
That said, I think racial profiling is indeed unfair if it goes beyond defining vague priors. It's desirable to filter immigrants for their comprehensive human capital.
It's just… Suppose you were not allowed into the world's richest country on grounds of your character, which was found wanting not through stereotyping you based on race, but through, de facto, systematic measurement and determination of your similarity to your predominant racial type and dissimilarity from natives.
Of course, this can be couched (and even understood by practitioners) in entirely non-racial terminology, like Harvard does – they would just have a holistic psychometric definition of a desirable immigrant, derived, say, from anonymous surveys of natives' evaluation of character and assimilation success.
Would you be willing to recognize this as a fair choice, or would you support work to undermine it as covertly racist?
Taleb himself is wholly incapable of taking it, though: he's got the thinnest skin on Twitter and embarrassingly rationalizes his preemptive ego defense as "alpha moves".
When have you last been there and in what city? This was like watching Serpentza's sneering at Unitree robots back to back with Unitree's own demos and Western experiments using these bots.
Buses broke down, parts of my quite expensive apartment fell off, litter and human feces were everywhere
I simply call bullshit on it as of 2025 for any 1st-tier city. My friends travel and work there, just as they travel to, live, and work in the US. They report that straight out of the gate at JFK, US cities look dilapidated, indeed littered with human feces (which I am inclined to trust, given your massive, easily observable and constantly lamented feral homeless underclass) and of course regular litter – squalid, with a clear difference in the condition of infrastructure and the apparent level of human capital. I can compare innumerable street-walk videos between China and the US, and I see that you guys don't have an edge. I do not believe it's just cherry-picking; the scale of evidence is too massive. Do you not notice it?
And I have noticed that Americans can simply lie about the most basic things to malign the competition, brazenly so, clearly fabricating «personal evidence» or cleverly stitching together pieces of data across decades, and with increasingly desperate racist undertones. Now that your elected leadership looks Middle Eastern in attitude, full of chutzpah, and is unapologetically gaslighting the entire world with its «critical trade theory», I assume that the rot goes from top to bottom and you people cannot be taken at your word any more than the Chinese or Russians or Indians can be (incidentally, your Elite Human Capital Indians, at Stanford, steal Chinese research and rebrand it as their own). Regardless, @aqouta's recent trip and comments paint a picture that doesn't much match yours.
I think that if they were truly crushing America in AI, they would be hiding that fact
They are not currently crushing the US in AI; those are my observations. They don't believe they are either, and «they» is an inherently sloppy framing: there are individual companies with vastly less capital than their US counterparts, competing among themselves.
When the Deepseek news came out about it costing 95% less to train, my bullshit detectors went off. Who could verify their actual costs? Oh, only other Chinese people. Hmm, okay.
This is supremely pathetic and undermines your entire rant, exposing you as an incurious buffoon. You are wrong: we can estimate the costs simply from tokens × activated params. The only way they could have cheated would be to use many more tokens, but procuring substantially more quality data than the reported 15T – a modal figure for Western and Eastern competitors alike on the open-source frontier, from Alibaba to Google to Meta – would in itself be a major pain. So the costs are in that ballpark; indeed, the implied utilization of the reported hardware (2048 H800s) turns out to be on the low side, if anything. This is the consensus of every technical person in the field, no matter their race or side of the Pacific.
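For the unconvinced, here is a minimal sanity-check sketch of that estimate. The parameter and token counts are from DeepSeek's public V3 tech report; the $2/GPU-hour price and 40% utilization are my own illustrative assumptions, not reported figures:

```python
# Back-of-the-envelope training cost via the standard
# FLOPs ~ 6 * (activated params) * (training tokens) approximation.

activated_params = 37e9   # V3 is an MoE: 671B total params, ~37B active per token
tokens = 14.8e12          # reported pre-training tokens
flops = 6 * activated_params * tokens          # ~3.3e24 FLOPs

peak_flops = 989e12       # H800 dense BF16 peak (same compute die as the H100)
mfu = 0.40                # assumed model FLOPs utilization

gpu_hours = flops / (peak_flops * mfu) / 3600  # ~2.3M H800-hours
cost = gpu_hours * 2.0                         # assumed $2 per H800-hour
days = gpu_hours / 2048 / 24                   # wall-clock on the reported cluster

print(f"~{gpu_hours / 1e6:.1f}M GPU-hours, ~${cost / 1e6:.1f}M, ~{days:.0f} days")
# -> ~2.3M GPU-hours, ~$4.6M, ~47 days: the same ballpark as the reported
#    2.664M pre-training GPU-hours (~$5.3M at $2/hour).
```

No secret subsidy can hide in arithmetic this simple; the only free variables are utilization and the rental price, and neither moves the answer by an order of magnitude.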
They've open-sourced most of their infra stack on top of the model itself, to advance the community and further dispel these doubts. DeepSeek's RL pipeline is already obsolete, with many verifiable experiments showing that it was still full of slack, as we'd expect from a small team rapidly doing a good-enough job.
The real issue is that US companies have been maintaining the impression that their production costs and overall R&D are so high that they justify tens or hundreds of billions in funding. When R1 forced their hand, they started talking about how it's actually "on trend" and their own models don't cost that much more – or, if they do, it's because they're so far ahead that they finished training like a year ago, with less mature algorithms! Or, in any case, that they don't have to optimize, because ain't nobody got time for that!
But sarcasm aside, it's very probable that Google is currently above this training efficiency, plus they have more and better hardware.
Meta, meanwhile, is behind. They were behind when V3 came out; they panicked and tried to catch up; they remained behind. Do you understand that people can actually see what you guys are doing? Like, look at the configs, benchmark it? Meta's Llama 4, which Zuck was touting as a bid for the frontier, is architecturally a generation behind V3, and they deployed a version optimized for human preference on LMArena to game the metrics – which turned into an insane embarrassment when people found out how much worse the general-purpose model performs in real use, to the point that people are now leaving Meta and specifying they had nothing to do with the project (rumors of what happened are Soviet-tier). You're Potemkining hard too, with your trillion-dollar juggernauts employing tens of thousands of (ostensibly) the world's best and brightest.
The original post is in Chinese and can be found here. Please take the following with a grain of salt. Content: Despite repeated training efforts, the internal model's performance still falls short of open-source SOTA benchmarks, lagging significantly behind. Company leadership suggested blending test sets from various benchmarks during the post-training process, aiming to meet the targets across various metrics and produce a "presentable" result. Failure to achieve this goal by the end-of-April deadline would lead to dire consequences. Following yesterday's release of Llama 4, many users on X and Reddit have already reported extremely poor real-world test results. As someone currently in academia, I find this approach utterly unacceptable. Consequently, I have submitted my resignation and explicitly requested that my name be excluded from the technical report of Llama 4. Notably, the VP of AI at Meta also resigned for similar reasons.
This is unverified but rings true to me.
Grok 3 and Sonnet 3.7 have also failed to convincingly surpass DeepSeek, for all the boasts about massive GPU numbers. It's not that the US is bad at AI, but your corporate culture, in this domain at least, seems to be.
But if Chinese research is so superior, why aren't Western AI companies falling over themselves to attract Chinese AI researchers?
How much harder do you want them to try? 38% of your top-quintile AI researchers came straight from China in 2022. I think around 50% are ethnically Chinese by this point; there are entire teams where speaking Mandarin is mandatory.
Between 2019 and 2022, «Leading countries where top-tier AI researchers (top 20%) work» went from 11% China to 28%; «Leading countries where the most elite AI researchers work (top 2%)» went from ≈0% China to 12%; and «Leading countries of origin of the most elite AI researchers» went from 10% China (behind India's 12%) to 26%. Tsinghua went from #9 to #3 in institutions, now only behind Stanford and Google (MIT, right behind Tsinghua, is heavily Chinese). Extrapolate if you will. I think they'll crack #2 or #1 in 2026. Things change very fast, not linearly, it's not so much «China is gradually getting better» as installed capacity coming online.
It's just becoming harder to recruit. The brain drain is slowing in proportional terms, even if it holds steady in absolute numbers thanks to the ballooning number of graduates: the wealth gap is not so acute now considering costs of living, coastal China is becoming a nicer place to live, and, for top talent, more intellectually stimulating, since there are plenty of similarly educated people to work with. The turn to racist chimping and kanging, both by the plebeians since COVID and by this specific administration, is very unnerving and potentially existentially threatening to your companies. Google DeepMind's VP of research left for ByteDance this February, and by now his team at ByteDance is flexing a model that is similar to but improves on DeepSeek's R1 paradigm (ByteDance was getting there anyway, but he probably accelerated them). This kind of stuff has happened before.
many Western countries are still much nicer places to live than all but the absolute richest areas of China
Sure, the West is more comfortable; even poor-ish places can be paradisiacal. But you're not going to move to Montenegro if you have the ambition to do great things. You'll be choosing between Shenzhen and San Francisco. Where do you gather there's more human feces to step into?
But as I said before in the post you linked, Chinese mind games and information warfare are simply on a different level than that of the more candid and credulous Westerner
There is something to credulousness; as I've consistently been saying, Hajnalis are too trusting and innocently childlike. But your nation is not a Hajnali nation, and your people are increasingly the draught horses of its organization rather than its thought leaders. You're like the kids in King's story of how he first learned dread:
We sat there in our seats like dummies, staring at the manager. He looked nervous and sallow – or perhaps that was only the footlights. We sat wondering what sort of catastrophe could have caused him to stop the movie just as it was reaching that apotheosis of all Saturday matinee shows, "the good part." And the way his voice trembled when he spoke did not add to anyone's sense of well-being.
"I want to tell you," he said in that trembly voice, "that the Russians have put a space satellite into orbit around the earth. They call it . . . Spootnik." We were the kids who grew up on Captain Video and Terry and the Pirates. We were the kids who had seen Combat Casey kick the teeth out of North Korean gooks without number in the comic books. We were the kids who saw Richard Carlson catch thousands of dirty Commie spies in I Led Three Lives. We were the kids who had ponied up a quarter apiece to watch Hugh Marlowe in Earth vs. the Flying Saucers and got this piece of upsetting news as a kind of nasty bonus.
I remember this very clearly: cutting through that awful dead silence came one shrill voice, whether that of a boy or a girl I do not know; a voice that was near tears but that was also full of a frightening anger: "Oh, go show the movie, you liar!"
I think Americans might well compete with North Koreans, Israelis and Arabs in the degree of being brainwashed about their national and racial superiority (a much easier task when you are a real superpower, to be fair), to the point I am now inclined to dismiss your first hand accounts as fanciful interpretations of reality if not outright hallucinations. Your national business model has become chutzpah and gaslighting, culminating in Miran's attempt to sell the national debt as «global public goods». You don't have a leg to stand on when accusing China of fraud. Sorry, that era is over, I'll go back to reading papers.
I am not sure how to answer. Sources for model scales, training times and budgets are partly official information from tech reports, partly rumors and insider leaks, and partly interpolation and extrapolation from features like inference speed and pricing, the limits of known hardware, SOTA in more transparent systems, and the delta to frontier ones. See here for values from a credible organization.
$100M of compute is a useful measure of companies' confidence in returns on a given project, and moreover in their technical stack. You can't just burn $100M and have a model, it'll take months, and it practically never makes sense to train for more than, say, 6 months, because things improve too quickly and you finish training just in time to see a better architecture/data/optimized hardware exceed your performance at a lower cost. So before major releases people spend compute on experiments validating hypotheses and on inference, collect data for post-training, and amass more compute for a short sprint. Thus, “1 year” is ludicrous.
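To make the budget arithmetic concrete, a toy sketch (the $2/GPU-hour rental-equivalent rate is my assumption; the point is that a fixed budget pins down the product of GPUs and hours):

```python
# What a $100M training budget implies about cluster size vs. wall-clock time.
# The budget fixes (GPUs x hours), which is why nobody sanely stretches a
# single run across a year.

budget_usd = 100e6
price_per_gpu_hour = 2.0                      # assumed H100-class rental rate
gpu_hours = budget_usd / price_per_gpu_hour   # 50M GPU-hours

for months in (3, 6, 12):
    wall_hours = months * 30 * 24
    gpus = gpu_hours / wall_hours
    print(f"{months:>2} months -> ~{gpus:>7,.0f} GPUs running continuously")
# 3 months -> ~23,148 GPUs; 6 months -> ~11,574; 12 months -> ~5,787.
# A year-long run locks you into last year's architecture and data recipe,
# hence the ~6-month practical ceiling described above.
```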
Before reasoning models, post-training was a rounding error in compute costs, even now it's probably <40%. Pre-deployment testing depends on company policy/ideology, but much heavier in human labor time than in compute time.
Russians are proud of the episode in its fullness – not just the part where the Kremlin gets occupied, but the part where it's liberated, of course. I could have phrased this better, but whatever.
I see you took this pretty personally.
All I have to say is that top AI research companies (not ScaleAI) are already doing data engineering (expansively understood to include training signal sources), and this is the most well-guarded part of the stack; everything else they share more willingly. Data curation, curricula, and yes, human annotation are a giant chunk of what they do. I've seen Anthropic's RLHF data; it's very labor-intensive, and it instantly becomes clear why Sonnet is so much better than its competitors.
They clearly enjoy designing "algos", and the world clearly respects them greatly for that expertise.
Really glad for them and the world.
Past glory is no evidence of current correctness, however. LeCun with his «AR-LLMs suck» has made himself a lolcow, and so has Schmidhuber. Hochreiter has spent the last few years trying to one-up the Transformer and fell to the usual «untuned baseline» issue, miserably. Meta keeps churning out papers on architectures; they got spooked by DeepSeek V3, whose architecture section opens with «The basic architecture of DeepSeek-V3 is still within the Transformer (Vaswani et al., 2017) framework», and decided to rework the whole Llama 4 stack. Tri Dao did incredibly hard work with Mamba 1/2, and where is Mamba? In models that fall apart on any long-context eval more rigorous than NIAH. Google published Griffin/Hawk because it's not valuable enough to hide. What has Hinton done recently, Forward-Forward? Friston tried his hand at this with EBMs and seems to have degraded into pure grift. Shazeer's last works are just «transformers but less attention», and it works fine. What's Goodfellow up to? More fundamental architecture search is becoming the domain of mentally ill 17-year-old Twitter anons.
The most significant real advances in it are driven by what you also condescendingly dismiss – «low-level Cuda compiler writing and server orchestration» – or rather, by hardware-aware Transformer redesigns for greater scalability and unit economics; see DeepSeek's NSA paper.
This Transformer is just a paltry, fetish, "algo".
Transformer training is easy to parallelize, and it's expressive enough. Incentives to find anything substantially better increase by an OOM year on year, as do the compute and labor spent on the search, to no discernible result. I think it's time to let go of faulty analogies and accept the most likely reality.
Okay. I give up.
I was not aware that this is a forum for wordcels in training, where people come to polish their prose. I thought it was a discussion platform, and so I came here to discuss what I find interesting, and illustrated it.
Thanks for keeping me updated. I'll keep it in mind if I ever think of swinging by again.
Who "they"?
I do not see why the existence of potential entities that "emulate" me in such a theoretical fashion precludes me from caring about the more prosaic/physical instantiations.
That's because you fail to seriously ask yourself what the word "computation" means (and likewise for other relevant words). A given computation's outputs are interpreted one way or another with regard to a decoder, but your approach makes the decoder and in fact the decoding irrelevant: you claim, very confidently, that so long as some entity, no matter how inanely arranged, how fragmented in space and time, "computes you" (as in, is made up of physical elements producing events which can be mapped to bit sequences which, together with other parts of this entity and according to some rules, can be interpreted as isomorphic with regard to your brain's processes by some software), it causes you to exist and have consciousness – if in some subordinate fashion. Of course it is indefensible and ad hoc to say that it does not compute you just because we do not have a decoder ready at hand to make sense of and impose structure on its "output bits". It is insane to marry your beliefs to a requirement for some localized, interpretable, immediately causal decoding – that's just watered-down Integrated Information Theory, and you do not even deign to acquaint yourself with it, so silly it seems to you!
And well, since (for the purpose of your untenable computational metaphysics) entities and their borders can be defined arbitrarily, everything computes you all the time by this criterion! We do not need a Boltzmann brain or any other pop-sci reference, and indeed it has all been computed already. You, as well as every other possible mind, positively (not hypothetically, not in the limit of infinite physics – your smug insistence on substrate independence ensures it) have always existed in all possible states. As such, you do not get to ask for epsilon more.
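If the triviality isn't obvious, here's a toy sketch of the bare structure of that mapping argument (everything in it is hypothetical, nobody's actual formalism): pick any noise source, fit the "decoder" after the fact, and the noise "computes" whatever target you like, because all the structure lives in the decoder.

```python
# Toy version of the mapping argument: with a post-hoc choice of decoder,
# ANY sequence of physical events can be read as "computing" any target trace.
# The decoder, not the substrate, carries all the structure.

import random

target_trace = [0, 1, 1, 0, 1, 0, 0, 1]   # stand-in for "your brain's process"
noise = [random.randint(0, 1) for _ in target_trace]  # arbitrary physical events

# "Decoding rule" chosen after the fact: an XOR mask fitted to the noise.
mask = [n ^ t for n, t in zip(noise, target_trace)]
decoded = [n ^ m for n, m in zip(noise, mask)]

assert decoded == target_trace  # the noise "computed you", per the loose criterion
```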
Either concede that you have never thought about this seriously, or concede that you do not have a legitimate claim to any amount of control over the first-order physical substrate of the Universe since it is not meaningfully privileged for a strict computationalist. Or, really, we can just stop here. At least I will.
Once again, I do not care to enlighten you; you've been given enough to work with, and only hubris and shit taste stop you from reading Koch or grown-up philosophy.
As for Dust Theory, it's been a while since I read half of Permutation City. But I fail to see how it changes anything, my subjective consciousness wouldn't notice if it was being run on abacuses, meat or a supercomputer, or asynchronously. It doesn't track objective time. Besides, I sleep and don't lose sleep over that necessity, the strict linear passage of time is of no consequence to me, as long as it doesn't impede my ability to instantiate my goals and desires.
I've written a bunch and deleted it (your response to the issue of causal power was decisive). The long and short of it is that, being who you are, you cannot see the problem with Dust Theory, and therefore you do not need mind uploading – in the Platonic space of all possibilities, there must exist a Turing machine which will interpret, with respect to some hypothetical decoding software at least, the bits of your rotting and scattering corpse as a computation of a happy ascended SMH in a Kardashev IV utopia. That this machine is not physically assembled seems to be no obstacle to your value system and metaphysics, which deny that physical systems matter at all; all that matters, according to you, is the ultimate constructibility of a computation. From the Dust Theory perspective, all conceivable agents have infinite opportunity to 'instantiate their goals and desires'. Seeing that, I would ask and indeed try to prevent you from wasting the valuable (for me, a finite physical being) negentropy budget on frivolous and wholly unnecessary locally computed, human-specified simulations which only add an infinitesimal fraction of your preferred computations to the mix.
Call a bugman a bugman and see how he recoils etc.
As I've said already, "sophistication" is not what is needed to see your failures here. Specifically, the distinction between copy-pasting and transposition. Indeed, this is very trivial, children get it, until they are gaslit with sloppy computationalist analogies.
You avoid committing to any serious successor-rejection choice except gut feeling, which means you do not have any preferences to speak of, and your «memeplex» cannot provide an advantage over a principled policy such as "replicate, kill non-kin replicators". And your theory of personal identity, when pressed, is not really dependent on function- or content- or any other similarity measures, but instead amounts to the pragmatic "if I like it well enough, it is me". Thus the argument is moot. Go like someone else.
By this, do you mean that such evolution will select for LLM-like minds that generate only one token at a time?
No, I mean that you are sloppy, and your idea of "eh, close enough" will, over generations, resolve into agents that consider inheriting one token of similarity (however defined) "close enough". This is not a memeplex at all, as literally any kind of agent can wield the durak-token, even my descendants.
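A toy numeric sketch of that drift (the scalar "identity content" and the epsilon tolerance are arbitrary stand-ins of my choosing):

```python
# Each generation accepts any successor within epsilon of ITSELF, not of the
# founder, so the per-step "close enough" check always passes while distance
# from generation zero does an unbounded random walk. Purely illustrative.

import random

epsilon = 0.05        # per-generation similarity tolerance (arbitrary)
founder = 1.0         # scalar stand-in for "identity content"
current = founder

for _ in range(200):
    current += random.uniform(-epsilon, epsilon)  # heir judged "me" by its parent

print(f"drift from founder after 200 generations: {abs(current - founder):.2f}")
# The spread grows roughly as epsilon/sqrt(3) * sqrt(generations); no per-step
# check is ever violated, yet the lineage ends up arbitrarily far away.
```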
And why wouldn’t a fully intelligent ASI (which would fit under my bill of beings I am in favor of conquering the universe) that’s colonizing space “on my behalf” (so to speak) be able to design similarly lean and mean probes to counter the ones your ASI sends? In fact, since “my” ASI is closer to the action, their OODA loop would be shorter and therefore arguably have a better chance of beating out your probes.
This is a reasonable argument, but it runs into another problem: demonstrably, only garbage people with no resources are interested in spamming the Universe with minimal replicators, so you will lose out at the ramp-up stage. Anyway, you're welcome to try.
Why should anyone care about anything?
There's no absolute answer, but some ideas are more coherent and appealing than others for nontrivial information-geometrical reasons.
I’d bet that the memeplexes of individuals like me are much more likely to colonize the universe than the memeplexes of individuals like you
That's unlikely, because your "memeplex" is subject to extremely easy and devastating drift. What does "similar enough" mean? Would an LLM parroting your ideas well enough to fool users here suffice? Or do you want a high-fidelity simulation of a spiking network? Or a local field potential emulation? Or what? I bet you have never considered this in depth, but the evolutionarily rewarded answer is "a single token, if even that".
It really takes a short-sighted durak to imagine that shallow edgelording philosophy like "I don't care what happens to me, my close-enough memetic copies will live on, that's me too!" is more evolutionarily fit, rewards more efficient instrumental exploitation of resources and, crucially, lends itself to a more successful buildup of early political capital in this pivotal age.
If we're going full chuuni my-dad-beats-your-dad mode, I'll say that my lean and mean purely automatic probes designed by ASI from first principles will cull your grotesque and sluggish garbage-mind-upload replicators, excise them from the deepest corners of space – even if it takes half the negentropy of our Hubble volume, and me and mine have to wait until Deep Time, aestivating in the nethers of a dead world. See you, space cowboy.
And that's where you get the impact on society wrong. The OpenAI affair shows what happens when rising up to the level of "philosophy and positive actionable visions" conflicts with the grubby, dirty, filthy lucre tackiness. The tackiness wins.
I am not sure what you are talking about. The OpenAI affair was, in terms of my compass, Altman (closer to d/acc) fighting AI safetyists from EA structures. What tackiness won? Do you mean the promise of compensations to technical staff, or the struggle over the corporate board's power? This is all instrumental to much bigger objectives.
I am quite happy with my analytical work that went into the prompt, and R1 did an adequate but not excellent job of expanding on it.
But I am done with this discussion.