
DaseindustriesLtd

late version of a small language model

74 followers   follows 27 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.


				

User ID: 745


or I imagine you'd retreat to a motte of achieve comparable GDP growth rates to the US

Do you really think this would be a mere "motte"? Canada used to be seen as a “nicer America”, an uncontroversially well-running state. Then they went all in on replacement migration in the name of muh GDP, and achieved GDP growth… proportionate to the population increase. Per capita they've stagnated for a decade (quite a feat given that they've been importing hundreds of thousands of "talents" from China and India, I presume many of them legitimate). Now even first generation immigrants flee south for better opportunities, the government barely has a popular mandate, and there's increasingly not-jokey talk about Alberta accepting American annexation. Yes, this is exactly what actual state incompetence looks like, and the US isn't doing that.

For the case of EU, you can read this.

but I don't understand people who aren't willing to choose the lesser of two evils

What is the argument for the need to make a choice? Does the US pay much attention to the war between Congo and Rwanda (despite clearly laying blame on one side)? Actually have you even heard of it?

Any reasonable country in Israel's position would react similarly.

No, not at all. Or only on the crudest level of analysis. There is no way to argue that Israeli policy is the only reasonable response, not even Israelis would say that. There are many possible options. Eg China has shown its take on the situation, in Xinjiang.

Manus is a generic thin wrapper over a strong Western model (Sonnet 3.7), if a bit better executed than most, and I am quite unhappy about this squandering of DeepSeek's cultural victory. The developers are not deeply technical and have instead invested a lot into hype, with invites to influencers and creating a secondary invite market, cherrypicked demos aimed at low value add SEO-style content creation (eg “write a course on how to grow your audience on X”) and pretty UX. Its performance on GAIA is already almost replicated by this opensource repo. This is the China we know and don't very much love, the non-DeepSeek baseline: tacky self-promotion, jumping on trends, rent-seeking, mystification. In my tests it hallucinates a lot – even in tasks where naked Sonnet can catch those same hallucinations.

The real analogy to DeepSeek is that, like R1 was the first time laymen used to 4o-mini or 3.5-turbo level slop got a glimpse of a SoTA reasoner in a free app, this is the first time laymen have been exposed to a strong-ish agent system, integrating all features that make sense at this stage – web browsing, pdf parsing, code sandbox, multi-document editing… but ultimately it's just a wrapper bringing out some lower bound of the underlying LLM's latent capability. Accordingly it has no moat, and does not benefit China particularly.

Ah well. R2 will wash all that away.

It is interesting, however, that people seemed to have no clue just how good and useful LLMs already are, probably due to lack of imagination. They are not really chatbot machines, they can execute sophisticated operations on any token sequences, if you just give them the chance to do so.

I can only say that engaging with the Chinese, and with people like you, has gradually convinced me that White People (Hajnali European stock specifically) are basically jumped-up serfs, the confused lower caste of prawns from District 9, with little more to offer to the world sans stale kanging and hollow, corporate-coded pretense of “soul” that, if it ever existed, resided in your currently extinct owners. You don't even notice my point about simple economics and logistics, so lost you are in your racial superiority masturbation. But of course those issues are related.

if Japan were in China's position instead

But it isn't, and you are largely responsible for that, because your previous generation had the exact same attitude towards the Japanese. Deaths from overwork, rigid hierarchy, soulless collectivist automatons cheating and copying to flood the markets and dispossess our Christian Germanic workers – this can't be allowed, can it? Oh, what a pity that now that we know them better, Japan is a geriatric country of no ambition, that mainly produces anime to give you some respite from the toxic antihuman sludge of your own media. (Presumably this is the fault of Joos. Somehow for all your natural nobility of spirit you are not capable of resisting a tiny tribe of natural wordcels. At least the Chinese managed to overthrow the Manchu).

Regrettably, China is 10 times larger and the same tricks won't work.

A change in American economic policy sent global markets into a tailspin, so objectively speaking, America is in fact a big deal.

Yes, you can do a great deal of damage to humanity. This is akin to the bafflingly swinish line of argument that “China needs us more than we need them, because they need to sell their valuable manufactured goods to someone; our consumption is more valuable than production”. We shall see how well this philosophy works.


I think this is still too self-serving and frankly racist a spin: “Chinese are robots, sure, but they can train robots on Western genius and follow their lead, still copying the West by proxy”.

I try to approach this technically. Technically you say that Asians are incapable of thought in full generality, that – speaking broadly – they can only “execute” but not come up with effective out-of-the-box plans; that their very robust IQ edge (Zhejiang, where DeepSeek is headquartered and primarily recruits, and where Wenfeng comes from, has an average around 110 - that's on par with Ashkenazim!) is achieved with some kind of interpolation and memorization, but not universal reasoning faculties. To me this looks like a very shaky proposition. From my/r1's deleted post:

The West’s mythos of creative genius – from Archimedes to Musk – emerged from unnaturally prolonged frontiers. When Europe lost 30-60% of its population during the Black Death, it reset Malthusian traps and created vacant niches for exploration. The American frontier, declared "closed" by the 1890 Census, institutionalized risk-taking as cultural capital. By contrast, China’s Yangtze Delta approached carrying capacity by the Song Dynasty (960-1279 CE). Innovation became incremental: water mills optimized, tax registers refined, but no steam engines emerged.

This wasn’t a failure of intelligence, but a rational allocation of cognitive resources. High population density selects for "intensive" IQ – pattern-matching within constraints – rather than "extensive" creativity. The same rice paddies that demanded obsessive irrigation schedules cultivated the hyper-adaptive minds now dominating international math Olympiads. China’s historical lack of Nobel laureates in science (prior to 1950) reflects not a missing "genius gene," but a Nash equilibrium where radical exploration offered negative expected value.

R1 might understate the case for deep roots of Western exploratory mindset, but where we agree is that its expression is contingent. Consider: how innovative is Europe today? It sure innovates in ways of shooting itself in the foot with bureaucracy, I suppose. Very stereotypically Chinese, I might say.

What I argue is that whereas IQ is a fundamental trait we can trace to neural architecture, and so are risk-avoidance or conformism, which we can observe even in murine models, “innovativeness” is not. It's an application of IQ to problem-solving in new domains. There's not enough space in the genome to specify problem-solving skills only for domains known in the bearer's lifetime, because domains change; Asians are as good in CTF challenges as their grandfathers were in carving on wood. What can be specified is lower tolerance to deviating from the groupthink, for example as cortisol release once you notice that your idea has not been validated by a higher-status peer; or higher expectation of being backstabbed in a vulnerable situation if you expend resources on exploration; or greater subjective sense of reward for minimizing predictive error, incentivizing optimization at the expense of learning the edges of the domain, thinking how it extends, testing hypotheses and hoping to benefit from finding a new path. Modulo well-applied and tempered IQ, this eagerness to explore OOD is just a result of different hyperparameter values that can also produce maladaptive forms like useless hobbies, the plethora of Western sexual kinks (furries?) and the – no, no, it's not just Jewish influence, own up to it – self-destructive leftist ideologies.

One anecdote is illustrative of the conundrum, I think. Some time ago, a ByteDance intern came up with a very interesting image generation technique, VAR. It eventually won the NeurIPS best paper award! Yandex trained a model based on it already, by the way, and Yandex has good taste (I may be biased of course). But what else did that intern do? Driven by ambition to scale his invention and make an even bigger name for himself, he sabotaged training runs of his colleagues to appropriate the idle compute, fully applying his creative intelligence to derail an entire giant corporation's R&D program! Look at this cyberpunk shit:

  • Modifying PyTorch Source Code: Keyu Tian modified the PyTorch source code in the cluster environment used by his colleagues, including changes to random seeds, optimizer's direction, and data loading procedures. These modifications were made within Docker containers, which is not tracked by Git.
  • Disrupting Training Processes: Keyu Tian deliberately hacked the clusters to terminate multi-machine experiment processes, causing large-scale experiments (e.g., experiments on over thousands of GPUs) to stall or fail.
  • Security Attack: Tian gained unauthorized access to the system by creating login backdoors through checkpoints, allowing him to launch automated attacks that interrupted processes of colleagues' training jobs.
  • Interference with Debugging: Tian participated in the cluster debugging meeting and continuously refined the attack code based on colleagues' diagnostic approaches, exacerbating the issue.
  • Corrupting the Experiments: Tian modified colleagues' well-trained model weights, making their experimental results impossible to reproduce.

Upon uncovering clear evidence, ByteDance terminated Tian's internship. Instead of taking responsibility, he retaliated by publicly accusing other employees of framing him and manipulating public opinion in a malicious manner.

This, I think, is peak of non-conformist genius, the stuff of the Romance of Three Kingdoms and warlord era. This is the essence of what the Confucian paradigm is trying to suppress, crushing benign self-expression at the same time.

But what if your peers cannot backstab you? What if resources are abundant? What if all your peers are rewarded for exploration and it clearly has positive ROI for them? It might not transmogrify the Chinese into archetypal Hajnalis, who engage in these behaviors without stilts, but the result will be much the same.

Only on greater scale.

R1:

Liang’s meta-derisking – making exploration legible, replicable, and prestigious – could trigger a phase shift. But true transformation requires more than outlier firms. It demands ecosystems that reward speculative genius as reliably as rice farmers once rewarded meticulousness. The question isn’t whether Chinese minds can innovate, but whether China’s institutional lattice will let a thousand DeepSeeks bloom – or if this lone swallow merely heralds a cultural spring that never comes.

I think the US Deep State was capable of winning this, just like Russia was capable of winning in Ukraine, in theory, if we were to ignore the actual level of Russian governance and corruption and ability to prosecute the war rationally. I knew of that one and so didn't expect Russia to win, and overestimated the US mainly because I did not account for the immense capacity for self-sabotage.

The US State department isn't staffed by geniuses who can shape the world to their liking.

I think they have enough talented people to do this, it's just those people have lost in internal politics.

simply because the world is too hideously complex a system for someone of any intellect or means to meaningfully manipulate

Manipulating the world is made much easier when you own major causal factors of that world. It doesn't take 200 IQ, though intelligence helps not to manipulate yourself into the ditch. All of great power politics is such manipulation. Suppressing competitors, strengthening allies, capturing international institutions, and yes, it's done by networks of high-agency people, not by vague sentiment of the electorate. Sorry, that's just what we can observe happening.

Nothing is set in stone; despite triumphalist propaganda directed at the public, I think the USG is aware of the problems by now and still has major cards like monopoly in crucial technology (ASML is a de facto American company), global reserve currency and, most of all, global goodwill, with everyone anxious to go back to normal. Trump has improved his standing in the Middle East with a single speech. Americans are losing time but they can undo the self-inflicted damage with a few more such pivots, apologize for tone-deaf Greenland-posting, revitalize their alliance networks, actually reindustrialize, implement very liberal issuance of citizenship to all Chinese talent and brain-drain the nation – and that's not all. Maybe the AGI God plan will work out too – after all, the attack on Huawei and broader semiconductor supply chain was a resounding success of the sort I expected, it did delay China by years. Maybe Starship makes Brilliant Pebbles a reality and forces China to disarm and sign unequal treaties… The US Hegemony is very much a viable project, except some Americans are in the way.

I recognize that my median prognosis has changed in a way that seems discrediting, but it's basically down to high-noise human factors on the US side.

All of these criticisms can be leveled at the Chinese as well - you've never heard them rant about 5,000 years of civilization?

They do have a strong belief in their civilizational superiority, and this chauvinism and smugness is another reason I was bearish on them. But in assessment of their current relative position they tend to be humble. “Building a world-class navy by 2035” is a typical Chinese goal. “Becoming a moderately prosperous society by 2020”. In 2018, Xi said:

When I met with Chinese and foreign journalists after the First Plenary Session of the 19th CPC Central Committee, I said that the Chinese Communist Party was determined to make a thousand years of greatness for the Chinese nation, and that a hundred years was just the right time to be in its prime. At the same time, I said this with a deep sense of worry. From our history, dynasties existed for more than 400 years in the Xia Dynasty, 600 years in the Shang Dynasty, 300 years in the Western Zhou Dynasty, 500 years in the Eastern Zhou Dynasty, 215 years in the Western Han Dynasty, 195 years in the Eastern Han Dynasty, 290 years in the Tang Dynasty, 277 years in the Ming Dynasty, 268 years in the Qing Dynasty, 15 years in the Qin Dynasty, 61 years in the Three Kingdoms, 167 years in the Northern Song Dynasty, 153 years in the Southern Song Dynasty, 90 years in the Yuan Dynasty, 38 years in the Republic of China, and other small dynasties There are countless blips and dynasties. The Qin Dynasty, Northern Song Dynasty, and Yuan Dynasty were all once unbeatable powers, but soon fell into disrepair. Those longer dynasties were also corrupt, socially unstable, discontented and rebellious, and many of them were left to languish and die. This shows that after a regime is established, it is not easy to maintain prosperity and long-lasting peace. Without self-reflection, vigilance, and effort, even the most powerful regimes can come to the end of the road.

It is now 97 years since the founding of our Party and 69 years since the founding of New China. The Soviet Communist Party has existed for 86 years, and the Soviet Union for 74 years. Our Party’s history exceeds that of the Soviet Communist Party, and our Party has not held national power for as long as the Soviet Union. By the middle of this century, the history of our Party will be close to 130 years, and the history of New China will reach 100 years. Comrade Deng Xiaoping said, “The consolidation and development of the socialist system will require a long historical stage, and it will take several generations, a dozen generations, or even dozens of generations of our people to struggle persistently and diligently.” How many years is that? It has to be calculated in terms of millenniums. This means that it will take a long historical period for us to build socialism with Chinese characteristics well and into. In this long historical process, it is an extremely difficult and risky challenge to ensure that the Chinese Communist Party does not collapse and the Chinese socialist system does not fall. Once upon a time, the Soviet Communist Party was so strong, the Soviet Union was so powerful, but now it has long been “the old country can not look back at the bright moon”. A generation does the work of a generation, but without historical perspective, without a long-term vision, also can not do the things of the moment.

This does not look as hubristic as American Main Character Syndrome to me.

The century of humiliation making them temporarily embarrassed hegemons

China has never held more than tenuous regional hegemony, I think this framing is not reflective of their ambitions and self-perception.

And you think that a world where China is hegemon won't see shit like Trump's exploitative trade war on the regular?

Yes. It's a stupid trade war and it's highly likely that no Tsinghua graduate will be so stupid. That aside, China has an official policy of not pursuing global hegemony. This certainly has no teeth, but Americans don't even have an equivalent toothless commitment.

Not to mention I'm fairly confident I've seen you mock Americans hyping the 'Chinese threat' and making them out to be more competent than they actually are as a motivation for more defense spending.

I've been right about that, Americans do hype up the Chinese military threat excessively, and they don't even build military that'd be useful in countering that threat, it's nearly entirely a grift. $1 trillion will go to more nebulous next-generation prototypes and battling the tyranny of distance in distant bases, not to a buildup of autonomous platforms that can compete in the SCS. Again, assuming Americans keep self-sabotaging.

Putin would have caved because his nation is barely hanging on while fighting against a 3rd rate local power

As a matter of fact I think this is not how we should be perceiving Ukraine, and in the present condition it would likely have been able to overwhelm any European military except perhaps France and Poland one on one. Consider that Europeans are not actually Aryan superhumans, their pretty exercises would amount to meme material in a week of fighting a real large scale war, and they have very little in the way of materiel too. They are concerned about Russia for a good reason: they are in fact weak.

Or are you trying to make the argument that the US state department is competent, but got played by even bigger-brained Israelis?

More charitably, I think that the US and Israel are a geopolitical bloc with shared elites, in which core US interests sometimes take a backseat for the bigger picture, to some consternation of the electorate. The US is long-term invested in the Israeli dominance in the Middle East. This isn't even different from the official rhetoric.

Ah, that was very generous of them. I'm sure self-interest played no part in it, and it's not even clear what you mean by that - buying treasuries? If so, they bought treasuries throughout the early 2000s at a rate not that different from 2008 - was that also for altruistic reasons?

It was after this paragraph that I decided to just stop reading. A Hajnali is just a Hajnali, in your head reality and morality melt together, a proper and cooperative action must be morally motivated, so you will engage in these ridiculous theatrics because you feel morally outraged at China and at me. It's so annoying.

Of course China did it for self-interest. As for buying treasuries. Have you actually checked? That was the sharpest sustained acceleration on the chart.

Some facts gathered by a Chinese open-source AI, which you could freely use instead of trying to be clever:

  • In November 2008, China launched a ¥4 trillion ($586 billion) stimulus (13% of its GDP), dwarfing the U.S. stimulus (5% of GDP) 1 … This rapid recovery boosted global markets and commodity demand
  • Mechanism of Support: China recycled trade surpluses into U.S. Treasuries to maintain a weak yuan, ensuring export competitiveness. By 2008, it held $700 billion in U.S. debt. This provided critical demand for U.S. debt during massive deficit spending (e.g., TARP bailouts)

During 2008, about half of China’s total reserve accumulation of $400 billion went towards net purchases of U.S. treasury bills and bonds.

During September to November 2008, the latest three-month period for which data are currently available from the U.S. Treasury, Chinese purchases of U.S. treasury bills and bonds amounted to nearly $123 billion—this at a time when U.S. financial markets were in deep turmoil. The continued flow of Chinese money into U.S. treasuries is of course rather convenient for the U.S. at a time when it faces the prospect of having to finance a massive budget deficit.

All this has saddled China with provincial debt, bloated real estate market, and systemic imbalances they're still not finished dealing with.

Of course not just China, everyone had to pay for American profligacy and scamming, to avoid a truly catastrophic recession. But my argument here was not that China Good: solely that allowing Chinese development in the first place, instead of pursuing a more negative-sum strategy, was not a blunder or a betrayal of American self-interest. America actually can benefit from global growth (eg by getting bailed out in a crisis, after having become a pillar of global economy). Chinese growth prior to this phase of conflict is, therefore, not evidence of American Deep State being incompetent.

You've lost track of that, and I've lost interest in combing through your rather emotional text. In short, Vietnam was premised on a reasonable fear of the domino effect, and most specifically-American problems (eg falling birth rate isn't one) are genuinely hard to solve due to the nature of American economy and population. The tradeoffs so far have been very worth it, accumulating towards even greater ones, and I believe they have been greater than what the American population without such high-IQ stewardship could hope to earn.

For the contrary example, look no further than the EU and Canada. They have comparable population quality, are at the same stage of development, and share many of your natural advantages. How have the last 20 years been for them? Are they famed for their Deep State? I rest my case.

I'll wager that if we're still here in 3-5 years, you'll be saying the same thing about underestimating the Chinese capacity for self-sabotage.

I have never underestimated their capacity for self-sabotage.

Your complaints about GWOT are motivated reasoning, GWOT was quite successful for Israel at least.

The US has been able to grow its economy extremely rapidly through Chinese industrialization, without that your, as marxists say, Internal Contradictions would have likely brought about a protracted recession already. Don't forget that in 2008, it was China that bailed you out. Those aren't so much major errors as conflicts of priority between sectors of American elite.

where's the golden era in American foreign and domestic policy mediated by these people?

1970s-2023, I'd say. Your safe and prosperous world is a product of an overall competent policy. Just continuing and improving on Biden's program could have been enough. See the success of CHIPS act, for example.

Like what, the financial system that proved utterly incapable of regime change in Iran or hindering Russia's ability to wage war?

Like owning the biggest consumer market in the world, most of the world's most prized IP, having military presence in all corners of the world. It's not the UN, it's the ability to spit at UN decisions and opinion of all UN members individually when needed, and not suffer economic consequences like Russia.

You bring up Russia and Ukraine - in March 2022, was there anyone (including what we can guess the US state department thought at the time!) who confidently predicted the outcome would be >= 3 year grinding war with little movement on the front, dominated by drone warfare?

I recall I did predict a long grinding war after like a week of it. Failure of the brazen paratrooper operation at Hostomel suggested that no quick resolution is likely; Ukrainians recognize it was a pivotal point, and if better executed (and less competently opposed), would have likely allowed Russia to settle the war on preferred terms. There have been a few others who thought likewise. I did miss drones, and predicted more WWII style mass mobilization with heavy artillery and aviation use and millions dead. We got some WWII features but not that. What did you say at the time?

If Americans were truly hegemonic and held that as their goal, the world would look very different.

Sorry, this sounds very much like Russian “we haven't even started yet” narrative to me.

That's not terrible prose but how do you square the idea that Trump isn't stupid with the fact that he apparently doesn't know how his beloved tariffs work?

Find me a single instance in history where a nation was able to smoothly transition through a period of declining population as the old begin to outnumber the young.

Can you give me a list of failure cases?

This is uncharted territory. All developed countries are aging, and all of them are losing out in overall population productivity through some combination of aging, dysgenics and demographic replacement. It's not even clear that China is declining faster than the US – at the very least, they are consistently graduating more and more highly educated workers, while Americans are struggling to hire literate people for menial jobs. Quantitatively, Chinese workforce size will continue to exceed the entire Western world's one for decades. Dependency ratio will reach Japanese levels in, what, 2045? This is not serious.

utterly dependent on continued imports of agricultural products and energy and most raw materials

What does this have to do with anything? They'll keep importing soybeans from Brazil and iron from Australia. They have $1T trade surplus and, for some years, have been annually installing as much or more industrial automation than the entire rest of the world combined. Their problem right now is not workforce, but that the world is too poor to absorb their exports.

Do you just operate on the assumption that China is a land of mobilized peasants gluing sneakers by hand, and when peasants get old, the gig is over?

This would actually make more sense yes. Allegedly, Trump’s Love for Tariffs Began in Japan’s ’80s Boom:

WASHINGTON — Donald J. Trump lost an auction in 1988 for a 58-key piano used in the classic film “Casablanca” to a Japanese trading company representing a collector. While he brushed off being outbid, it was a firsthand reminder of Japan’s growing wealth, and the following year, Mr. Trump went on television to call for a 15 percent to 20 percent tax on imports from Japan.

“I believe very strongly in tariffs,” Mr. Trump, at the time a Manhattan real estate developer with fledgling political instincts, told the journalist Diane Sawyer, before criticizing Japan, West Germany, Saudi Arabia and South Korea for their trade practices. “America is being ripped off,” he said. “We’re a debtor nation, and we have to tax, we have to tariff, we have to protect this country.”

Thirty years later, few issues have defined Mr. Trump’s presidency more than his love for tariffs — and on few issues has he been more unswerving. Allies and historians say that love is rooted in Mr. Trump’s experience as a businessman in the 1980s with the people and money of Japan, then perceived as a mortal threat to America’s economic pre-eminence.

That's from 2019. If China is just a stand-in for Japan (which he also tariffed anyway), it is no wonder that he acts so brazenly. You can bully Japan with no repercussions. He can finally get back at them for that piano humiliation.

I'd say it's another bit of evidence for Google upgrading their product strategy, but nothing unexpected capabilities-wise. Shame they did not release the weights, instead shipping only Gemma 3 with image-in text-out. «Safety» reasoning is obvious enough.

Contra @SkoomaDentist I think it's not fair to describe this as «The LLM is still talking to the image generator», ie that the main LLM is basically just the encoder for some diffusion model or another separate module. The semantic fidelity and surgical precision of successive edits suggest nothing like that, and point instead to a unified architecture with a single context where each token, be it textual or visual, is embedded in its network of relationships with all others (well, that's what these models are – literally, hypotheses about the shape of the training data manifold). Back when OpenAI announced their image-out capabilities with 4o, the teaser generation said «suppose we directly model P(text, image, sounds) with one big autoregressive transformer». Shortly after, Meta (or really Armen Aghajanyan and his team; he has since departed, largely in protest over Chameleon's safety-informed nerfing) published their Chameleon, a parallel work in identical spirit:

This early-fusion approach, where all modalities are projected into a shared representational space from the start, allows for seamless reasoning and generation across modalities. … Chameleon represents images, in addition to text, as a series of discrete tokens and takes advantage of the scaling properties of auto-regressive Transformers … We train a new BPE tokenizer (Sennrich et al., 2016) over a subset of the training data outlined below with a vocabulary size of 65,536, which includes the 8192 image codebook tokens …
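To make the "one vocabulary for everything" point concrete, here is a minimal sketch of how image codebook entries can share an id space with text tokens. Only the two sizes (65,536 and 8192) come from the quote above; placing the image tokens at the top of the id range, and the helper names, are my assumptions for illustration, not Chameleon's actual layout.

```python
# Sizes taken from the quoted passage.
VOCAB_SIZE = 65_536           # total BPE vocabulary
IMAGE_CODEBOOK_SIZE = 8_192   # image codebook tokens included in that vocabulary

# Assumption: image tokens occupy the last 8192 ids. The real layout may differ.
IMAGE_OFFSET = VOCAB_SIZE - IMAGE_CODEBOOK_SIZE


def image_code_to_token(code: int) -> int:
    """Map a VQ codebook index to its id in the shared vocabulary."""
    if not 0 <= code < IMAGE_CODEBOOK_SIZE:
        raise ValueError(f"codebook index out of range: {code}")
    return IMAGE_OFFSET + code


def is_image_token(token: int) -> bool:
    """True if a shared-vocabulary id refers to an image codebook entry."""
    return IMAGE_OFFSET <= token < VOCAB_SIZE


def build_sequence(text_tokens, image_codes):
    """Return one flat sequence: text ids followed by image ids.

    A single autoregressive transformer then predicts the next id
    without caring which modality it belongs to.
    """
    for token in text_tokens:
        if not 0 <= token < IMAGE_OFFSET:
            raise ValueError(f"not a text token id: {token}")
    return list(text_tokens) + [image_code_to_token(c) for c in image_codes]
```

The point of the toy is that after this mapping nothing downstream is modality-specific: an edit to an image is just a different continuation of the same token stream.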

Later, DeepSeek, who are probably the best team in the business (if not for resource limits), have been working on Janus, which is also a unified model of a potentially superior design:

Specifically, we introduce two independent visual encoding pathways: one for multimodal understanding and one for multimodal generation, unified by the same transformer architecture … Autoregressive models, influenced by the success in language processing, leverage transformers to predict sequences of discrete visual tokens (codebook IDs) [24, 65, 75]. These models tokenize visual data and employ a prediction approach similar to GPT-style [64] techniques. … Chameleon [77] adopts a VQ Tokenizer to encode images for both multimodal understanding and generation. However, this practice may lead to suboptimal outcomes, as the vision encoder might face a trade-off between the demands of understanding and generation. In contrast, our Janus can explicitly decouple the visual representations for understanding and generation, recognizing that different tasks may require varying levels of information. … for text understanding, we use the built-in tokenizer of the LLM to convert the text into discrete IDs and obtain the feature representations corresponding to each ID. For multimodal understanding, we use the SigLIP [92] encoder to extract high-dimensional semantic features from images. These features are flattened from a 2-D grid into a 1-D sequence, and an understanding adaptor is used to map these image features into the input space of the LLM. For visual generation tasks, we use the VQ tokenizer from [73] to convert images into discrete IDs. After the ID sequence is flattened into 1-D, we use a generation adaptor to map the codebook embeddings corresponding to each ID into the input space of the LLM. We then concatenate these feature sequences to form a multimodal feature sequence, which is subsequently fed into the LLM for processing. 
The built-in prediction head of the LLM is utilized for text predictions in both the pure text understanding and multimodal understanding tasks, while a randomly initialized prediction head is used for image predictions in the visual generation task.
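To make the decoupling concrete, here is a minimal toy sketch of the routing the quote describes, in plain Python. It is not DeepSeek's code: every size, name and random matrix below is a made-up stand-in (the SigLIP features and VQ codebook IDs are just lists of numbers), and the order of concatenation is my assumption. It only shows that the two image pathways end up in the same LLM input space and that the task picks the prediction head.

```python
import random

rng = random.Random(0)
D_MODEL, SEM_DIM, CODE_DIM = 8, 4, 3   # toy sizes, all assumptions
TEXT_VOCAB, CODEBOOK = 16, 32          # toy vocabulary / codebook sizes

def rand_matrix(rows, cols):
    """Random matrix standing in for learned weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

text_embedding = rand_matrix(TEXT_VOCAB, D_MODEL)      # the LLM's own token embeddings
codebook_embedding = rand_matrix(CODEBOOK, CODE_DIM)   # VQ codebook vectors
understanding_adaptor = rand_matrix(D_MODEL, SEM_DIM)  # semantic features -> LLM space
generation_adaptor = rand_matrix(D_MODEL, CODE_DIM)    # codebook vectors -> LLM space

def flatten(grid):
    """Flatten a 2-D grid into a 1-D sequence, row by row."""
    return [cell for row in grid for cell in row]

def embed_text(token_ids):
    return [text_embedding[i] for i in token_ids]

def embed_image_for_understanding(feature_grid):
    # feature_grid: 2-D grid of continuous semantic feature vectors
    return [matvec(understanding_adaptor, f) for f in flatten(feature_grid)]

def embed_image_for_generation(id_grid):
    # id_grid: 2-D grid of discrete codebook IDs
    return [matvec(generation_adaptor, codebook_embedding[i]) for i in flatten(id_grid)]

def build_input(task, token_ids, image):
    """Return the multimodal feature sequence and the name of the head to use."""
    if task == "understanding":
        image_part, head = embed_image_for_understanding(image), "text_head"
    elif task == "generation":
        image_part, head = embed_image_for_generation(image), "image_head"
    else:
        raise ValueError(f"unknown task: {task}")
    return embed_text(token_ids) + image_part, head

# Toy 2x2 "image" in both representations
features = [[[rng.uniform(-1, 1) for _ in range(SEM_DIM)] for _ in range(2)] for _ in range(2)]
ids = [[3, 7], [1, 30]]
seq_u, head_u = build_input("understanding", [1, 2, 3], features)
seq_g, head_g = build_input("generation", [1, 2, 3], ids)
print(len(seq_u), head_u, len(seq_g), head_g)
```

Both calls produce a sequence of vectors of the same width, which is the whole point: one transformer, two ways of getting an image into it.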

I expect DeepSeek's next generation large model to be based on some mature form of Janus.

I think Gemini is similar. This may be the first time we get to evaluate the power of modality transfer in a well-trained model – usually you run into the bottleneck of the projection layer, as @self_made_human describes. But here, it can clearly copy an image (up to the effective "resolution" of its codebook and tokenizer) and make isolated transformations, precisely the way transformers can with a text string. Hopefully this means its pure verbalized understanding of the visual modality (eg spatial relations, say… anatomy…) is upgraded. Gooners from 4chan ought to be reaching the conclusion as I type this.

In the next iteration video and probably 3d meshes are getting similar treatment.

P.S. SkoomaDentist being bizarrely aggressive and insistent that this is in any way like inpainting is very funny. Inpaint this. No, no, these are not vulgar tricks, and I don't see why one would be invested in bitterly arguing against that.

I think it has a non-negligible chance of happening. Trump is the new face of America that does not pretend to play by normal countries' rules. The United States is a super-hegemon, a nation not facing even a plausible threat from a competent adversary. They can take what they want, the way China/Russia/Iran/etc would very much like to be able to do but can't on account of the United States existing. In front of this face, the sovereignty of almost every other country is a bluff that's easy to call. Nobody can militarily oppose the US, and most people on the globe buy into American culture and vision more than into their own regimes and bureaucracies. Certainly that's true of Egypt.

The actual shape of the deal will be about cleansing Gazans and providing security to settlers, though. Securing Israeli interests is one of the foundational, terminal values of the US.

What matters is not whether I go full Moldbug, but whether Trump will go full retard. He does suggest that the EU buy an impossible volume of oil, no? How is my plan much worse?

This is of course a projection of your own tribalism and your own deluded moral framework.

Your problem is that your only guiding light, the only salvation you see for your people, is Nazism, and Nazism is still quite degenerate and NGMI. I won't talk of its moral merits, it's just strategically bad because it's aestheticized desperation and refuge from hopelessness in animalistic impulses. A stronk chieftain (high agency!), will to power (rock the boat!), blood-based tribal identity, vibes over facts… in effect, reject modernity, retvrn by rolling back the evolutionary clock 9000 years, to where an average European was a fat bipolar slob with 65 IQ. Nazism was swiftly crushed by Capitalism and Communism. 80 years later, they remain the dominant forces on the planet and continue their dialectic and coevolution. You like to think that Judaism is still more important, the root of all evil. Well, it's underrated for obvious reasons, I'll give you that, but Earth is a big place, and your struggle with Joos is ultimately quite parochial.

I have observed many sincere Nazis over the years and most are suicidal. It doesn't have to be this way. Accept that the dream of Aryan greatness is dead, but you can live if you accept this world on its own terms, where your people have some advantages and disadvantages entirely irrespective of “jewish manipulation” or “suicidal empathy” or what have you, and need to manage them soberly. In particular this requires a good understanding of where you stand relative to that huge chunk of humanity in East Asia. One approach is to cope with 4chan gifs of tortured dogs and industrial accidents, or the book of Ralph Townsend. Another is to grow the fuck up.

If you're really an SWE, I must presume that you're not speaking in good faith here.

Asking it for a gear setup for a specific boss results in horrible results, despite the fact that it could just have copied the literal wiki (which has some faults like overdoing min-maxing, but it's generally coherent). The net utility of this answer was negative given the incorrect answer, the time it took for me to read it, and the cost of generating it (which is quite high, I wonder what happens when these companies want to make money).

You must know that GPT 4.5 is pretty mid as far as instruction models of this generation go. DeepSeek's latest is close in performance and literally 100-200x cheaper. More importantly, what do you think would be a random college-educated human's score on Runescape questions? It is so trivial to grant these systems access to tools for web browsing as to not be worth talking about.

The rest of your comment is the same style. What is amazing and terrifying about LLMs is not their knowledge retrieval but generality and in-context learning. At sufficient context length and trained to appropriately leverage existing tools, there is nothing in the realm of pure cognitive work they cannot do on human level. This is not hard to understand. So tell me: what are you going for? Just trying to assuage your own worries?

The likelihood of winning a conflict has little relevance to whether that conflict should be waged in the first place.

It actually has a lot of relevance. The real reason you act like it doesn't is that you do not seriously engage with the possibility of losing, and losing badly (losing what? To what degree? How many cards do you have left at the point of losing, and what terms can be negotiated?). People make unreasonable maximalist demands when they are assured of their invulnerability. You treat a great power conflict like another Middle Eastern adventure, “oh we found WMDs in this shithole, our Democracy will perish if we do not conquer it hue hue!”. It's an instinct that's hard to overcome after a century of uninterrupted wins and cost-free losses. The same Main Character Syndrome, coupled with low human capital in Trump's team, explains the decidedly suboptimal and cost-insensitive means that were chosen for prosecuting the conflict. Americans think they can afford anything, because that's recorded in their institutional DNA. But they have never fought a superior power, due to it never having existed prior to this day. So they have developed an auxiliary belief that the very fact of them antagonizing any power confirms it is inferior. It's hard to feel pity for such a narcissistic people.

it is the serf who acts in accordance with prudence and rationality. The serf is a serf precisely because he correctly calculates that servitude is what gives him the best odds of continued survival. The nobleman, in contrast, acts in accordance with virtue, even when the outcome is certain destruction.

In Imperial Russia, there was a trend when mujiks, LARPing as nobles, initiated duels over petty spats, murdering each other with axes; eventually the state had to put its boot down. Due to extremely low literacy rates they couldn't have plausibly cited Nietzsche when doing so, but I believe that they'd have appreciated your quote.

Self-serving, petulant, handwavy, shallowly aesthetic notions of virtue are cheap and easy to brandish in defense of one's animalistic impulses; any kind of impulsive retardation can be dressed up as a calling of aristocratic, virile masculine nature, there's a whole genre of extremely popular Western music about it, authored by the impromptu warrior aristocracy of the streets. Your own elite has been wiped out to such a degree that this whole discourse is vacuous, we can't consult with a living bearer of a tradition, only speculate. It is plausible that I am wrong and there's just never been any substance to the whole fraud.

No, you're correct. In fact, had the US continued the course it had just half a year ago, I'd still be largely holding this opinion. But the election of an illiterate boomer strongman does change matters. Xi also has managed to not do anything too self-defeating long enough. I admit I was wrong: the US does not have a functional elite to make appropriate use of its genuine (if transient) political, economic and technological advantages and keep the Chinaman down.

ASI may restore my faith in the previous model, but this is looking like a remote possibility.

After all, if the American GDP is in some way fake how come the median American can buy so much Chinese production with his or her dollars?

Largely because China (like everyone else) is buying your assets and the USD is the global reserve currency.

Trump is doing what he can to fix this pathological situation, by being laser focused on goods.

It's a harder brand of Russian sarcasm, applied in inherently absurd circumstances.

I think some win-win can be had, especially considering that Trump's platform is incoherent. He said he wanted Europe to spend more on defense and be more independent, and he'll get it. Did he want it to happen like this? And strengthened EU-China trade too? Probably not. But he'll definitely have something to report as a win to his electorate.

The problem is that you consume too much neocon/Zionist propaganda from trash like Zenz. The reporting bias may actually run in the other direction. Xinjiang today is peaceful and Uighurs are beneficiaries of strong labor laws and affirmative action. Western tourists can visit it, Americans marry Uighur people, economy is booming, infrastructure is being built… Uighurs are still the majority and will likely remain the majority because there's a finite and dwindling supply of Han people in China. Whatever has happened there during the heavy enforcement and «reeducation» period, has ended with a state of affairs both parties can at least survive without bloodshed. This is not an endorsement of what has been done. This is a point of comparison.

Meanwhile Gaza is a smoldering ruin with casualties on par with Russia-Ukraine war, and Israel is negotiating for a thorough ethnic cleansing, while the fighting goes on.

No matter how you look at it, Israelis have been extraordinarily brutal and inefficient at that. It's like saying Russia has shown exemplary discipline in Chechnya, any nation would do the same in its position. No we haven't, it was a shitshow (and ended in humiliation of handing it over to Kadyrov).

There is something to the French case, but modern fertility collapse is uncharted territory in that it happens globally, for new reasons, in conditions of rapidly rising productivity via technological progress. I do not believe that “this country has higher TFR”, alone, is now predictive of much of anything, except the population age structure itself.

And yet it is unable to employ all of those workers

Fair enough, and yes, this goes to show that they're not on the verge of economic decline through labor shortages.

They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby

Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (Same was true of V2). They were asked to opensource their MLA implementation (not the terrible huggingface one) and declined, saying that everything of theirs is too integrated into the proprietary HAI-LLM framework and they don't want to disassemble it and clear out actual secret stuff. The SGLang team and others had to reverse engineer it from papers. Their search impl on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).

Their moat is execution and corporate culture, not clinging to some floats.

That's the point: He is invited NOW, after "suddenly" shipping a model on Western Frontier level.

7 months ago I said:

We don't understand the motivations of Deepseek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts in cutting supply of chips to China. After all, the Chinese also share the same ideas of their trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.

Presumably, this was true and this is him succeeding. As I note here.

As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:

The Chinese government started to crack down on the quant trading industry amid economic slowdown, a housing crisis and a declining stock market index.

The CSI300 (Chinese Blue Chip Index) reached an all-time low. They blamed high frequency traders for exploiting the market and causing the selloff.

  • Banned a quant competitor from trading for 3 days
  • Banned another from opening index futures for 12 months
  • Required strategy disclosures before trading
  • Threatened to increase trading costs 10x to destroy the industry

High-Flyer faced extinction. (High-Flyer’s funds have been flat/down since 2022 and has trailed the index by 4% since 2024)

so I stand by my conjectures.

they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on

So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.

What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?
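For anyone who wants to put a number on it, here is a back-of-the-envelope sketch, not an audit. It uses the common 6 × N × D rule of thumb for training compute (N is activated parameters, D is tokens), which ignores attention and router overhead. The 37B activated parameters and 2.788M H800 GPU-hours are figures I am recalling from the DeepSeek-V3 report, not something established in this thread, so treat them as assumptions; the function names are my own.

```python
def training_flops(active_params: float, tokens: float) -> float:
    """Rough training compute via the 6 * N * D rule of thumb (forward + backward)."""
    return 6.0 * active_params * tokens

def implied_flops_per_gpu(total_flops: float, gpu_hours: float) -> float:
    """Sustained FLOP/s each GPU would need to finish within the given GPU-hours."""
    return total_flops / (gpu_hours * 3600.0)

# Assumed figures, recalled from the DeepSeek-V3 report (not from this thread):
ACTIVE_PARAMS = 37e9   # activated parameters per token
GPU_HOURS = 2.788e6    # reported H800 GPU-hours for the full run

for tokens in (15e12, 20e12, 30e12):
    flops = training_flops(ACTIVE_PARAMS, tokens)
    per_gpu = implied_flops_per_gpu(flops, GPU_HOURS)
    print(f"{tokens / 1e12:.0f}T tokens: {flops:.2e} FLOPs, "
          f"{per_gpu / 1e12:.0f} TFLOP/s per GPU at the stated GPU-hours")
```

Compute is linear in token count under this approximation, so any claim of "more tokens than stated" has to come with either proportionally more GPU-hours or proportionally higher sustained throughput per GPU.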

Here's a Western paper corroborating their design choices [Submitted on 12 Feb 2024]:

Our results suggest that a compute-optimal MoE model trained with a budget of 10^20 FLOPs will achieve the same quality as a dense Transformer trained with a 20× greater computing budget, with the compute savings rising steadily, exceeding 40× when budget of 10^25 FLOPs is surpassed (see Figure 1). … when all training hyper-parameters N, D, G are properly selected to be compute-optimal for each model, the gap between dense and sparse models only increases as we scale… Higher granularity is optimal for larger compute budgets.

Here's DeepSeek paper from a month prior:

Leveraging our architecture, we subsequently scale up the model parameters to 16B and train DeepSeekMoE 16B on a large-scale corpus with 2T tokens. Evaluation results reveal that with only about 40% of computations, DeepSeekMoE 16B achieves comparable performance with DeepSeek 7B (DeepSeek-AI, 2024), a dense model trained on the same 2T corpus. We also compare DeepSeekMoE with open source models and the evaluations demonstrate that DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves comparable performance with LLaMA2 7B (Touvron et al., 2023b), which has approximately 2.5 times the activated parameters. Evaluation results show that DeepSeekMoE Chat 16B also achieves comparable performance with DeepSeek Chat 7B and LLaMA2 SFT 7B in the chat setting. Encouraged by these results, we further undertake a preliminary endeavor to scale up DeepSeekMoE to 145B. The experimental results still validate its substantial advantages over the GShard architecture consistently. In addition, it shows performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.

As expected they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss on the same token count as LLaMA-405B. Their other tricks also helped with downstream performance.
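A toy count shows what increasing granularity buys, as I understand the fine-grained expert segmentation idea from the DeepSeekMoE paper: split every expert into several smaller ones and activate proportionally more of them, so the active share of parameters stays the same while the number of expert combinations a token can be routed to explodes. The 16-expert, top-2 starting point is just an illustrative configuration, and the helper name is mine.

```python
from math import comb

def routing_options(num_experts, top_k, granularity):
    """Split each expert into `granularity` smaller ones and scale top-k to match.

    Returns (total experts, active experts, possible expert combinations per token).
    The active fraction top_k / num_experts is unchanged by the split.
    """
    n = num_experts * granularity
    k = top_k * granularity
    return n, k, comb(n, k)

for g in (1, 2, 4):
    n, k, combos = routing_options(16, 2, g)
    print(f"granularity {g}: {k} of {n} experts active, {combos:,} combinations")
```

Same compute per token in each row of the output, vastly more ways to specialize.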

There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up, almost boring. The reason people are so appalled is that the American AI industry is bogged down in corruption covered with tasteless mythology, much like the Russian military before Feb 2022.