Wait, where did my image go? I attached it, is it just gone?
Anyway, Kimi on OpenRouter:
Thought Process: To determine the vowels of "North Dakota" in order, we need to follow these steps:
1. Identify the word: The word in question is "North Dakota."
2. Extract the vowels: From the word, we need to pick out only the vowels (A, E, I, O, U). Sometimes 'Y' is considered a vowel, but in this case, we will ignore it unless explicitly needed.
3. List the vowels in order: Go through each letter of "North Dakota" in sequence and note the vowels in the order they appear.

Step-by-Step Solution: Write out "North Dakota" and label each character with its position for clarity:
N (1), o (2), r (3), t (4), h (5), (space), D (7), a (8), k (9), o (10), t (11), a (12).
Iterate through each character and identify vowels:
- Position 1: 'N' – not a vowel.
- Position 2: 'o' – vowel (O).
- Position 3: 'r' – not a vowel.
- Position 4: 't' – not a vowel.
- Position 5: 'h' – not a vowel.
- Position 6: ' ' – space, ignore.
- Position 7: 'D' – not a vowel.
- Position 8: 'a' – vowel (A).
- Position 9: 'k' – not a vowel.
- Position 10: 'o' – vowel (O).
- Position 11: 't' – not a vowel.
- Position 12: 'a' – vowel (A).
Collect the vowels in the order they appear: o (position 2), a (position 8), o (position 10), a (position 12).
Final answer: o, a, o, a
Kimi on Groq doesn't waste time:
The vowels in "North Dakota" in order are: o, a, o, a
then claimed the vowels of North Dakota in order were o, h, a, and o
Can't replicate even with a quantized provider. Most people report a very different experience.
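For reference, the extraction itself is trivial to sanity-check; a minimal Python sketch of my own (not the model's output):

```python
# Minimal check of the vowel extraction discussed above (my own sketch).
word = "North Dakota"
vowels = [ch for ch in word if ch.lower() in "aeiou"]
print(vowels)  # ['o', 'a', 'o', 'a']
```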
update: Kimi K2 ranks #5 overall on LMArena under style control.
- Tied #1 Multi-turn interaction
- #2 Coding, Hard Prompts
The top US open models are Google's Gemma 3 27B and Nvidia's finetune of Llama 3.1, ranked #28 and #34 respectively.
There are tiers to this, from just weights release to full data+code+weights. Chinese labs mostly release weights and tech report with a reproducible (given some effort) recipe, sometimes code, rarely some or all of the data (more often parts of post-training data, though in these cases it's typically just links to datasets that have already been open).
I think nitpicking about open source is uninteresting when the recipe is available. This is a very dynamic field of applied science rather than a labor-intensive programming exercise. The volume of novel code in a given LLM project is comparable to a modest Emacs package; what matters is ideas (derisked at scale). Specific implementations are usually not that valuable – DeepSeek's GRPO, as described in their papers, has been improved upon in the open multiple times by this point. Data composition depends on your own needs and interests; there are vast open datasets, just filter them as you see fit.
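(To illustrate how thin the "implementation" layer is: the core of GRPO, as described in DeepSeek's papers, is little more than a group-normalized advantage plugged into a clipped policy-gradient objective. A minimal sketch, my own paraphrase rather than DeepSeek's code, with the KL-to-reference term omitted:)

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, group_size] -- one scalar reward per sampled completion.
    # Group-relative advantage: normalize within the group sampled for the same prompt.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

def grpo_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    # PPO-style clipped surrogate; the per-sequence advantage is broadcast over tokens.
    # logp_new / logp_old: per-token log-probs under the current / sampling policy.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()
```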
I think R1 and the wave it's caused have already had an effect. It's frozen the ceiling on «frontier» pricing around $15/1M for models slightly but clearly better, such as Sonnet or 2.5 Pro (there are higher-tier offerings but they get very little purchase), encouraged the consumption of small distilled models like grok-mini or -Flash which directly compete with Chinese output, and clearly led OpenAI and Meta to try to ship a better open model for prestige (OpenAI may succeed yet). Amodei is coping, his company is among the more vulnerable ones and with the worst velocity; no matter how hard they lean on the DoD pork and national security rhetoric, everyone in the US does that now.
Expenditures have already happened, largely; datacenters are getting completed, and giant training runs will just be a way to amortize that by producing models that warrant higher inference volume and pricing. Base models on the level of Grok 3 are the floor for this generation; soon GPT-5 sets the next frontier floor. There is also an obvious pivot to agents/deep researchers/reasoners with extremely bloated, branching, parallelizable inference, and you need models to be smart enough to make sense of all those vast context dumps. Synthetic data scaling is focused on RL now, which also in effect requires running a great deal of inference to produce higher-utility models. They won't cut expenditures, in short.
Sorry, misunderstood you. I don't think we've seen anyone seriously defend having stolen or distilled someone's model. My bet is the precedent will depend on who/whom and lawyer muscle rather than fundamentals of the situation.
How else could they achieve this result if their talent wasn't superior? Or if not talent, then the juice in an organization that allows good results at speed.
How small and relatively inexperienced Chinese labs do so much with so little is an interesting question. I have the impression that Western corporations overestimate “frontier talent”, or perhaps, paradoxically, underestimate actual raw talent (which isn't that rare, it just needs to be noticed) and overestimate the value of corporate secrets that some of this legendary talent is privy to. Liang Wenfeng hires Ph.D. students and they seem to do better than mature Ph.D.s.
H20s are useless for training; China will have to figure that part out on their own. That said, the current RL paradigm is increasingly reliant on inference (rollouts of agent trajectories – Kimi is built on that), so H20s will indirectly advance capabilities. Yet there remains a need for pretraining next-generation bases, and of course for experiments.
It usually involves taking a chapter from my novel and asking it to imagine it in a style from a different author I like.
Yes, many models (even open ones, such as R1) have better adherence to instructions. It writes well in its own style; I value models with distinct personalities. You're right about the Russianness, I think.
I've heard that they tried to buy out Thinking Machines and SSI for billions, but were turned down.
They've proposed that to even much smaller labs, though I'm not at liberty to share. Zuck is desperate and defaults to his M&A instincts that have served him well. It might work in dismantling the competition, at least. But it's not like Meta and FAIR were originally lacking in talent, they've contributed immensely to research (just for instance, MTP in DeepSeek V3 is based on their paper; Llama 4 somehow failed to implement it). The problem is managerial. To get ahead, I'm afraid Zuck will need to cut, rather than graft.
But Grok 4 just crushes with sheer size I think.
The fact that Grok is at all comparable (or indeed inferior) to Kimi on any metric, even the most obscure one, speaks to the deep cultural advantage of Moonshot. Grok 4's training compute is estimated at 6.4e26 FLOPs; Kimi's, like R1's, is likely ≈4.0e24, roughly 160 times less. They probably spent more on scaling experiments for Grok 3/4 than Moonshot has spent over their lifetime on everything. It's not really a fair competition; I admit Grok is a stronger model.
It was designed for Musk's vision of AI modelling and understanding the physical universe, that's what it's for and it does excellently there.
I think it wasn't designed with any specific focus in mind, it's an all around next-generation base+RL model.
I think the arc of history still bends towards Nvidia, the biggest company in the world and by some distance. I think like you I was leaning more towards the 'talent conquers all' ethos
You distort my argument. I was consistently skeptical that China can win this on HBD merits alone, after all the US also has plenty of talented people (very many of them Chinese, but also diverse global and domestic talent), in Nvidia and elsewhere, plus it has a giant and growing edge in compute. My thesis is that the gap in applied AI possibly won't be so profound as to allow some Pivotal Action To Secure Durable Strategic Advantage, that the hawks in DC and Dario Amodei fantasize about as they rail for more export controls. Nvidia will remain dominant, so will Western AI labs.
But so far China is doing better than I anticipated, both technically and ethically.
In regard to Zhilin's words, if I may psychologize a little, I think that it's very natural for a Chinese person with close knowledge of and experience with Western ideas and societies - but also an attachment to an identity as Chinese - to conceptualize things in terms of a dichotomy between East and West.
If that is how the Chinese actors themselves conceptualize this, does it matter if we can object to such thinking as historically reductionist and stereotypical? Yes, obviously both types exist in both societies. (At least Zhilins exist in the US; if there are Liangs, I'd be happy to see them. Lambert is an academic, not a hedge fund CEO who also somehow happens to be a great engineer and an open source fanatic. The closest we had was Emad Mostaque, neither exactly Western nor very technical or good at being a CEO). But it is clear that the Chinese discourse, particularly in the VC sphere, maps pragmatic and idealistic archetypes onto the East-West dichotomy. Half of Liang's interview is the journalist saying “but this is madness, nobody does it, nobody will give the money for it” and Liang saying “and yet we must learn to do it, because crazy Westerners do and that is how they create things we've been imitating all this time”.
But since it's not a matter of "deep roots," it makes sense that a single breakout success like DeepSeek could precipitate a shift in orientation. … To go far afield of my knowledge, it seems as though these extrinsic factors might end up being better for China than for the US.
I agree this is a possibility, and I think it's one of the more interesting cultural trends to track, which is why I'm writing these updates. Deep roots or not, Chinese fast-following is more than a subtly racist trope, it really is the backbone of their economic ascendance. If they start similarly rewarding high-risk innovation, it'll change the gameboard a great deal.
Well, I don't really understand American law but it seems to me that Anthropic has set the precedent of LLM pretraining corpora being essentially immune to copyright claims. Anthropic's models are, ironically, the most paranoid about reproducing copyrighted material.
I am skeptical about diffusion even for images and video; the whole subfield is a giant nerd snipe for mathcels. Autoregression is strictly more expressive and more suitable for continual generation; sadly, we pay the price of worse parallelization. If anything, I'd be more enthusiastic about honest-to-God energy-based LLMs. There has been a series of breakthroughs in making DLLMs that don't totally suck and offer extreme speeds in the low-batch-size regime, but eh. I think sparse-attention Transformers will crush them.
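The parallelization tradeoff, schematically (placeholder methods of my own, not any real model's API):

```python
MASK = -1  # placeholder "masked token" id

def autoregressive_decode(model, prompt_ids, n_new):
    # Inherently sequential: each new token conditions on everything before it,
    # so generating n_new tokens costs n_new model calls in a row.
    ids = list(prompt_ids)
    for _ in range(n_new):
        ids.append(model.next_token(ids))   # hypothetical single-token predictor
    return ids

def diffusion_decode(model, length, n_steps):
    # A fixed-length canvas refined in parallel: every position is updated at
    # each denoising step, so a whole sequence costs only n_steps model calls.
    ids = [MASK] * length
    for _ in range(n_steps):
        ids = model.denoise(ids)            # hypothetical full-sequence denoiser
    return ids
```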
I literally cite Kimi's own arguments for open source:
[…] 3. Why Open Source
#1: Reputation. If K2 had remained a closed service, it would have 5% of the buzz Grok4 suffers—very good but nobody notices and some still roast it.
#2: Community velocity. Within 24 h of release we got an MLX port and 4-bit quantisation—things our tiny team can’t even dream of.
#3: It sets a higher technical bar. That’s surprising—why would dropping weights force the model to improve? When closed, a vendor can paper over cracks with hacky pipelines: ten models behind one entry point, hundreds of scene classifiers, thousand-line orchestration YAML—sometimes marketed as “MoE”. Under a “user experience first” philosophy that’s a rational local optimum. But it’s not AGI. Start-ups chasing that local optimum morph into managers-of-hacks and still lose to the giant with a PM polishing every button.
Kimi the start-up cannot win that game. Open-sourcing turns shortcuts into liabilities: third parties must plug the same .safetensors into run_py() and get the paper numbers. You’re forced to make the model itself solid; the gimmicks die. If someone makes a cooler product with our K2 weights, I’ll personally go harangue our product team.
DeepSeek's arguments are more ideological and cultural:
For technologists, being followed is a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one. To give is to receive glory. And if a company does this, it would create a cultural attraction [to technologists]. […]
plus stuff about accelerating the development of the Chinese ecosystem.
High-level researchers are not slaves or menial workers, they have massive pride, they want to publish and gain clout. You can pay them hundreds of millions to get over that, or you can let them publish. Open sourcing is the ultimate form of publishing your work.
There probably isn't a delay, plans to ship it in May to capitalize on the hype were entirely hallucinated by jo*rnalists as far as I can tell. It might take many months yet.
It's not really verbose in normal use, rather the opposite. It is verbose in agentic mode, writing docs and commentary rather than thinking traces. RL has been used for all serious LLMs since GPT-3-instruct; this is independent of the current long-CoT paradigm. It is dubious that Kimi has been trained on CoTs, because it doesn't do them. More likely, its training data is largely final outputs of a reasoner (like Kimi's own 1.5/1.6). They have a section in the paper on 1.5 about penalizing verbosity.
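Something in the spirit of that penalty (my generic illustration, not the exact formulation from the Kimi k1.5 paper):

```python
def length_penalized_reward(correct: bool, n_tokens: int, max_tokens: int,
                            lam: float = 0.5) -> float:
    # Generic verbosity penalty for RL post-training: reward correctness,
    # then subtract a term proportional to relative response length.
    base = 1.0 if correct else 0.0
    return base - lam * min(n_tokens / max_tokens, 1.0)

# e.g. a correct 2000-token answer under an 8192-token budget:
# length_penalized_reward(True, 2000, 8192) ≈ 0.88
```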
Relevant from Lambert: The American DeepSeek Project
While America has the best AI models in Gemini, Claude, o3, etc. and the best infrastructure with Nvidia, it’s rapidly losing its influence over the future directions of AI that unfold in the open-source and academic communities. Chinese organizations are releasing the most notable open models and datasets across all modalities, from text to robotics or video, and at the same time it’s common for researchers worldwide to read far more new research papers from Chinese organizations than from their Western counterparts.
This balance of power has been shifting rapidly in the last 12 months and reflects shifting, structural advantages that Chinese companies have with open-source AI — China has more AI researchers, data, and an open-source default.
[…] The goal for my next few years of work is what I’m calling The American DeepSeek Project — a fully open-source model at the scale and performance of current (publicly available) frontier models, within 2 years. A fully open model, as opposed to just an “open weights” model, comes with data, training code, logs, and decision making — on top of the weights to run inference — in order to distribute the knowledge and access for how to train AI models fully.
etc. He overstates the cause, perhaps. America doesn't need these egghead communist values of openness and reproducibility, the free market will align incentives, and charity shouldn't go too far. But he's pointing to the very real fact that China – not on the state level, but on the level of individual small companies with a suitable culture – is the only country bringing transformative AI that isn't locked away on corporate clusters closer to reality.
Periodic Open-Source AI Update: Kimi K2 and China's Cultural Shift
(yes yes another post about AI, sorry about that). Link above is to the standalone thread, to not clutter this one.
Two days ago, a small Chinese startup, Moonshot AI, released weights of the base and instruct versions of Kimi K2, the first open (and probably closed too) Chinese LLM to clearly surpass DeepSeek's efforts. It's roughly comparable to Claude Sonnet 4 without thinking (pay no mind to the horde of reasoners at the top of the leaderboard; this is a cheap-ish capability extension and doesn't convey the experience, though it is relevant to utility). It's a primarily agentic non-reasoner, somehow exceptionally good at creative writing, and offers a distinct "slop-free", disagreeable but pretty fun conversation, with the downside of hallucinations. It adopts DeepSeek-V3’s architecture wholesale (literally "modeling_deepseek.DeepseekV3ForCausalLM"), and with a number of tricks gets maybe 2-3 times as much effective compute out of the same allowance of GPU-hours; the rest we don't know yet, because they've just finished a six-month marathon and don't have a tech report.
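(You can see the architecture reuse directly from the released config; a quick sketch, assuming the Hugging Face repo id "moonshotai/Kimi-K2-Instruct" — this pulls only the config file, not the ~1T-parameter weights:)

```python
from transformers import AutoConfig

# Downloads only config.json (plus the custom modeling-code reference), not the weights.
cfg = AutoConfig.from_pretrained("moonshotai/Kimi-K2-Instruct", trust_remote_code=True)
print(cfg.architectures)  # expected: ['DeepseekV3ForCausalLM']
```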
I posit that this follows a cultural shift in China’s AI ecosystem that I've been chronicling for a while, and provides a nice illustration by contrast. Moonshot and DeepSeek were founded at the same time and have near-identical scale and resources, but have been built on different visions. DeepSeek’s Liang Wenfeng (hedge fund CEO with a Master's in engineering, idealist, open-source advocate) couldn't procure funding in the Chinese VC world with his inane pitch of “long-termist AGI research driven by curiosity” or whatever. Moonshot’s Yang Zhilin (Carnegie Mellon Ph.D., serial entrepreneur, pragmatist) succeeded at that task, got to a peak $3.3 billion valuation with the help of Alibaba and Sequoia, and was heavily spending on ads and traffic acquisition throughout 2024, building a nucleus of another super-app with chatbot companions, assistants and such trivialities at a comfortable pace. However, DeepSeek R1, on the merit of a vastly stronger model, was a breakout success that redefined the Chinese AI scene, making people question the point of startups like Kimi. Post-R1, Zhilin pivoted hard to prioritize R&D spending and core model quality over apps, adopting open weights as a forcing function for basic progress. This seems to have inspired the technical staff: "Only regret: we weren’t the ones who walked [DeepSeek’s] path."
Other Chinese labs (Qwen, Minimax, Tencent, etc.) now also emulate this open, capability-focused strategy. Meanwhile, Western open-source efforts are even more disappointing than last year – Meta’s LLaMA 4 failed, OpenAI’s model is delayed again, and only Google/Mistral release sporadically, with no promises of competitive results.
This validates my [deleted] prediction: DeepSeek wasn’t an outlier but the first swallow and catalyst of China’s transition from fast-following to open innovation. I think Liang’s vision – "After hardcore innovators make a name, groupthink will change" – is unfolding, and this is a nice point to take stock of the situation.
So, despite 15 years of supposed collapse
There's been no collapse, but there is a great deal of ruin in a nation. Canada has been a great country to live in, and it is still good by OECD standards. Canada is just on a starkly different growth trajectory from its southern neighbor (to wit, it does NOT grow, among other problems), a trajectory that holds no promise of changing. 10-15 years ago it was on par with the US in per capita GDP; then the trends diverged more and more. In 2020 this was still easy to ignore, and anyway perceptions take time to change; the British are still stereotyped as sophisticated gentlemen in some parts of the world.
You attribute the quality of Canada in the past to Canadians, and the current condition to the government of Canada. But it is, of course, made of Canadians as well. My point is that the governance of Canada seemed to be successful, based on results, and over recent decades the results have been getting worse and worse.
Americans also have a lot of cruel things to say about their governance, and about each other. I dismiss their entitled whining because the actual performance has been and remains superior to the competition.
Census says 126,340 people left Canada for the U.S. in 2022, a 70 per cent increase over a decade ago
sounds bad, but what do I know.
Israelis hadn't changed, but the clearly rising antisemitism among the western left and its Islamic neighborhood pushed Israelis to vote for the one cynical hawk in town: Bibi. While politics shifted right, the average Israeli remained a normal person.
This is quite tendentious.
Israelis have changed, and will change more. Demographically, politically, culturally. Israel today is not some offshoot of Western Civilization but a higher-IQ Middle Eastern nation, with all that follows. This narrative is getting very stale.
I wasn't talking of Canadian perceptions. Are you playing for pity now with this ridiculous «nuh-uh»? Yes it was.
But Canadians were endlessly preening about their moral superiority and greater civility and safety, too, much to the consternation of Americans. Still are, to an extent. This is very easy to observe in the wild.
It's always funny to watch a person unravel like this. You want to prove me stupid and ignorant at every turn, even when it's completely beside the point and your attack is not very tenable. I didn't want to get this result; you could have simply been content with me refusing to elaborate on the genius of the American deep state.
Hajnali mentality is not that deep; scratch it and there's the same rabid animal underneath.
P.S. I don't remember what I was taught in the geography class about Canada. It's not been heavy on politics. Probably just oil sands and climate, stuff like that.
or I imagine you'd retreat to a motte of achieve comparable GDP growth rates to the US
Do you really think this would be a mere "motte"? Canada used to be seen as a “nicer America”, an uncontroversially well-running state. Then they went all in on replacement migration in the name of muh GDP, and achieved GDP growth… proportionate to the population increase; per capita, they've stagnated for a decade (quite a feat, given that they've been importing hundreds of thousands of "talents" from China and India, I presume many of them legitimate). Now even first-generation immigrants flee south for better opportunities, the government barely has a popular mandate, and there's increasingly not-jokey talk about Alberta accepting American annexation. Yes, this is exactly what actual state incompetence looks like; the US isn't doing that.
For the case of EU, you can read this.
The issue with state regulation of AI is obviously that all relevant AI in the US is produced in California, maybe a little bit in Texas. This would never have flown with a Republican administration, but even that aside, it's clearly discriminatory against everyone else who will experience the effects of Californian regulation.
Spy satellites contribute so little to the total mass to orbit that you never even needed SpaceX for that (I don't consider Starlink a primarily national security project, because it's not).
For delivering payloads, including probably international ones, China will begin catching up next year. I do not assume that Americans will be contracting them, no, so in that sense SpaceX is poised to maintain its near-monopoly.
It is believed that the crop of reusable-rocket startups is attributable to Robin Li, the founder of Baidu, getting into the National People's Congress and advocating for the legalization of private space businesses in the 2010s. So far, three Chinese entities have conducted VTOL tests for reusable rockets.
- China Aerospace Science and Technology Corporation (CASC), June 23, 2024
- LandSpace, September 11, 2024
- Space Epoch, May 29, 2025
There are others which are further behind.
Technologically, they are several iterations behind, but strategically I'd say they have significant advantages over the current SpaceX (a usual feature of Chinese fast-following). For example, Space Epoch's Yuanxingzhe-1 is basically a small Starship (or a better, thicker Falcon-9, if Falcon-9 were designed today): stainless steel, methalox, and it will naturally plug into the existing and state-subsidized logistics, including military facilities that currently produce aviation parts (as a small point, Falcon's extreme height-to-width ratio is obviously suboptimal and downstream of American highway standards, but China had no problem building dedicated roads). LandSpace's Zhuque-3 VTVL-1 is similar (they can boast of the first methalox engine to make it to orbit).
But as you rightfully notice, it's not clear if this will have much effect on the SpaceX bottom line, since Americans can saturate their cadence anyway. In all likelihood it will only unnerve some people in Washington as a symbolic thing.
Or are you trying to make the argument that the US state department is competent, but got played by even bigger-brained Israelis?
More charitably, I think that the US and Israel are a geopolitical bloc with shared elites, in which core US interests sometimes take a backseat for the bigger picture, to some consternation of the electorate. The US is long-term invested in the Israeli dominance in the Middle East. This isn't even different from the official rhetoric.
Ah, that was very generous of them. I'm sure self-interest played no part in it, and it's not even clear what you mean by that - buying treasuries? If so, they bought treasuries throughout the early 2000s at a rate not that different from 2008 - was that also for altruistic reasons?
It was after this paragraph that I decided to just stop reading. A Hajnali is just a Hajnali: in your head, reality and morality melt together, a proper and cooperative action must be morally motivated, and so you will engage in these ridiculous theatrics because you feel morally outraged at China and at me. It's so annoying.
Of course China did it for self-interest. As for buying treasuries: have you actually checked? That was the sharpest sustained acceleration on the chart.
Some facts gathered by a Chinese open-source AI, which you could freely use instead of trying to be clever:
- In November 2008, China launched a ¥4 trillion ($586 billion) stimulus (13% of its GDP), dwarfing the U.S. stimulus (5% of GDP) … This rapid recovery boosted global markets and commodity demand
- Mechanism of Support: China recycled trade surpluses into U.S. Treasuries to maintain a weak yuan, ensuring export competitiveness. By 2008, it held $700 billion in U.S. debt. This provided critical demand for U.S. debt during massive deficit spending (e.g., TARP bailouts)
During 2008, about half of China’s total reserve accumulation of $400 billion went towards net purchases of U.S. treasury bills and bonds.
During September to November 2008, the latest three-month period for which data are currently available from the U.S. Treasury, Chinese purchases of U.S. treasury bills and bonds amounted to nearly $123 billion—this at a time when U.S. financial markets were in deep turmoil. The continued flow of Chinese money into U.S. treasuries is of course rather convenient for the U.S. at a time when it faces the prospect of having to finance a massive budget deficit.
All this has saddled China with provincial debt, a bloated real estate market, and systemic imbalances they're still not finished dealing with.
Of course not just China, everyone had to pay for American profligacy and scamming, to avoid a truly catastrophic recession. But my argument here was not that China Good: solely that allowing Chinese development in the first place, instead of pursuing a more negative-sum strategy, was not a blunder or a betrayal of American self-interest. America actually can benefit from global growth (eg by getting bailed out in a crisis, after having become a pillar of global economy). Chinese growth prior to this phase of conflict is, therefore, not evidence of American Deep State being incompetent.
You've lost track of that, and I've lost interest in combing through your rather emotional text. In short, Vietnam was premised on a reasonable fear of the domino effect, and most specifically-American problems (eg falling birth rate isn't one) are genuinely hard to solve due to the nature of American economy and population. The tradeoffs so far have been very worth it, accumulating towards even greater ones, and I believe they have been greater than what the American population without such high-IQ stewardship could hope to earn.
For the contrary example, look no further than the EU and Canada. They have comparable population quality, are at the same stage of development, and share many of your natural advantages. How have the last 20 years been for them? Are they famed for their Deep State? I rest my case.
Well, since you aren't telling me what the tricky way is (the whole exchange is suggestive of paranoia, to be honest), I can't reproduce it, but with the exception of Colorado's vowels it seems correct. I don't get better results from 2.5 Pro or Sonnet, and certainly not two hallucinated states.