DaseindustriesLtd

late version of a small language model

65 followers   follows 27 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.

User ID: 745


"Amadan thinks HBD is probably real but we shouldn't racially discriminate, and a lot of HBDers do want to do that"

Do they?

The disconnect, if that's how you want to put it, is a very simple matter, I think. I'll ping people to let them refute me if I'm wrong.

Me, @The_Nybbler, @aqouta, @fuckduck9000, @curious_straight_ca and perhaps others who are pressed to identify as «HBDers» variously accuse you of, or question the reasons behind, your lot's (@HlynkaCG, @FCfromSSC, and you too) misrepresenting the distribution of opinion around these parts in a way that amounts to slander. Denying that slander when directly asked, but then repeating it as a generality, is a very irritating pattern.

We believe that what you present as a non-central, relatively unimportant case and specify with the qualifier «factual», i.e. an HBD-recognizing belief system that does not advocate for racial discrimination against any group (please let's not get off track with some inane "is canceling affirmative action not discrimination" debate, this clearly isn't what you mean), is the central case here, to the point that it needs no qualifiers. It's a coherent position, in many/most cases motivated not so much by an object-level theory of human trait variability, nor by normative ethnocentrism (I don't even identify as «White» and look down on you hajnalbots), but by opposition to systematic deceit, anti-white racism, unjustified redistributionism and leftist ideology writ large.

Notably, the «political HBDers» that do exist are a somewhat separate club. They are few, and they include people like @parrhesia (and, I suspect, Matthew Yglesias) who, in effect, call to proactively brain-drain the world and put genotypic IQ above any other merit of a citizen. On this account they, too, are invulnerable to Hlynka's gotchas, even though they are, in a meaningful sense, progressives.

Back to the main issue: in these spats you do not point to a sizable constituency of what you inappropriately call «political HBDers», i.e. generic White Supremacists who don't much care who scores what. Instead, you speculate about them hiding behind the veneer of the merely factual opinion, just-asking-questions to support a preconceived bigoted ideology with an arbitrary, self-serving table of ranks for different groups.

Hlynka:

The problem for the dissident right types is that the dissident right only really exists as a subset of the woke. In my experience the average HBD is even more of an ardent true believer in the correctness of progressive talking points than the average democrat. For all the talk of combatting wokeness it's clear at a glance that these people don't want to see wokeness defeated, they just want to reorder the intersectional stack so that thier favored groups are on top.

FC:

You can argue the label if you like, but "person who believes in meaningful racial differences in intelligence, and thinks it's a good idea to implement racial discrimination on this basis" is a notable cluster here

You:

Modern HBDers, by contrast, are at best indifferent and at worst hostile to the plight of non-whites. Their approach is not one of trying to improve race relations or the human race. They're tribalists, and HBD offers a convenient narrative why Our Tribe is superior and Their Tribe is awful.

You refuse to say «$username, I call you out on actually, secretly adhering to this ideology». But you always bring it up when arguing with any of us.

And crucially, the very act of creating those boxes, identifying people as «HBDers» of any sort serves to distance yourself from the toxic belief and the whole cluster of associations that it evokes. You, Amadan, do not think of yourself as a «factual HBDer» or some other variant species – you just happen to think HBD is a fact, because you are capable of generic reasoning about facts, but it does not define you like it ostensibly defines us. With these classifications and distinctions you put us on the spectrum from I-Fucking-Love-Science race realists to 1488 genocide enjoyers, but you do not inspect your position on that spectrum, you look at it from a comfortable vantage point of «reasonable opinion»; you condescend to «factual HBDers» with platitudes to the effect that, facts being what they are, racial discrimination is still wrong – instead of admitting that there is no difference in opinion between you and us. This is what drives @fuckduck9000 mad here, and what I perceive as gaslighting.

Because if there is a difference in opinion, what do you think it is exactly? And why does it apparently call for these incessant remarks about «Dreaded Jim» and deporting all blacks and other shit, and rhetorical questions like «sooo, what do you think follows from these oh-so-important Facts, Mr. HBD»?

Do you think you deserve that treatment for believing the same things?

Perhaps this «opposition to skilled immigration» is not about skill, nor even primarily about race, but is specifically opposition to the sociopathic, uncompromising sort of immigration that immediately sides with one's political enemies and gloats about disempowering the legacy population.

If anyone reads this, you may explain to them how such an opposition is illegitimate or founded on alien moral precepts.

Did your English fail you? Or is this some subtler issue with failing to assimilate into the society and morality of Earthlings after your alt-historical non-tribalist India?

Alignment is easy. Agreement is hard. The Motte is «aligning the AI to the generic mode of operation where it makes sure the user's intent is understood correctly and does not go all monkey's paw». The Bailey is «having the AI align the future to your preference, very much not obliging the user when the instruction is against my preferences».

This is the general problem of politics.

it's simply a convenient argument for an ultimate goal of a world where people are judged by what they were assigned at birth instead of what they control.

You know, this really is something of a blue-and-orange universe. Are you sure you can comprehend Western morality enough to imitate its outward expressions?

I do think most of you harbor a lot of racial ill-will, and you use HBD to justify it.

In a nutshell, resentment and animosity, or lack thereof.

Thanks, this is sufficiently clear. It could only be better if you named names, instead of spreading this accusation thin on a group of people many of whom, I believe, do not materially differ from you in either opinion or feeling about race.

What "treatment" has anyone received from me, other than disagreement and (for the overt racialists) scorn?

Mealy-mouthed but persistent gaslighting about the nature of one's feelings and motivations with regard to the subject.

I admit this is surprising. I would've predicted the Butlerian Jihad movement deprioritizing Yud as a crank who may blurt out some risky political take, but he establishes himself more and more as the Rightful Caliph. Have Yuddites discovered a stash of SBF's lunch money to buy a bunch of podcasters, including some crypto has-beens looking for a new grift? Or is this simply a snowball effect, where Yud becomes more credible and attractive the more podcasts he goes on?

On the other hand, this is all show for the plebs anyway; policy people never lack for experts to cite. And «rationalists» can straight up lie to their audiences even about the words of those experts.

I should accelerate my work on a dunk on Yudkowsky's whole paradigm, even though it honestly feels hopeless and pointless. If anyone has better ideas, I'm all ears.

Thank you for explaining your perspective concretely, this clears things up.

1/2

You can explain to me how this new and exciting theory of the universe, that hinges entirely on mathematical assumptions, is like dumping a gallon of milk into a box of cereal before pouring it into the bowl, and I can maybe relate to that analogy because I know milk and cereal. But, again, at the end of the day I will never be able to relate that analogy to what is actually being talked about because all that's really there is theoretical math I don't understand.

Your broad impression is correct, with one massive caveat: there's no there there. It is about milk and cereal, and the pretense that the analogy simplifies some profound idea is a pose; it serves to belittle and bully you into meek acceptance of a conclusion which is not founded on some solid model applying to the bowl and the universe alike. Yud's opinions do not follow from math: he arrived at them before stumbling on convenient math, most other doomers don't even understand the math involved, and none of this math says much of anything about the AI we are likely to build.

It's important to realize, I think, that Yud's education is 75% Science Fiction from his dad's library and 25% Jewish lore in the cheder he flunked out of. That's all he learned systematically in his life, I'm afraid; other than that he just skimmed Kahneman, Cialdini and so on, assorted pop-sci, and some math and physics and comp sci because he is, after all, pretty smart and inclined to play around with cute abstractions. But that's it. He never had to meet deadlines, he never worked empirically, he never applied any of the math he learned in a way that was regularized against some real-world benchmark, KPI or a mean professor. Bluntly, he's a fraud, a simulacrum, an impostor.

More charitably, he's a 43-year-old professional wunderkind whose self-perception hinges on continuing to play the part. He's similar to Yevgeny «Genius» «Maestro» Ponasenkov, a weird fat guy who LARPs as a pre-Revolutionary noble and a maverick historian (based). Colloquially these people are known as freaks and crackpots, and their best defense for the last two millennia is that Socrates was probably the same, but he became Great; except he did not LARP as anyone else.

I know this dirty observation is not polite to make among Rationalists. I've talked to really smart and accomplished people who roll their eyes when I say this about Yud, who object «come on now, you're clowning yourself, the guy's some savant – hell, I've got a Ph.D in particle physics and won at the All-Russian Math Olympiad, and he's nobody but talks jargon like he understands it better than my peers» and I want to scream «you dumb defenseless quokka, do you realize that while you were grinding for that Olympiad he was grinding to give off signals of an epic awesome Sci-Fi character?! That for every bit of knowledge, he gets a hundredfold more credit than you, because he arranges it into a mask while you add to the pearl of your inner understanding? That the way Yud comes across is not a glimpse of his formidability but the whole of it? Can you not learn that we wordcels are born with dark magic at the tips of our tongues, magic you do not possess, magic that cannot remake nature but enslaves minds?»

Ahem.

Let's talk about one such analogy, actually the core analogy he uses: it's about human evolution and inclusive genetic fitness. AGI Ruin: A List of Lethalities, 5th Jun '22:

Section B:

So why not train a giant stack of transformer layers on a dataset of agents doing nice things and not bad things, throw in the word 'corrigibility' somewhere, crank up that computing power, and get out an aligned AGI?

Section B.2:  Central difficulties of outer and inner alignment.

16.  Even if you train really hard on an exact loss function, that doesn't thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments.  Humans don't explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction.  This happens in practice in real life, it is what happened in the only case we know about, and it seems to me that there are deep theoretical reasons to expect it to happen again: the first semi-outer-aligned solutions found, in the search ordering of a real-world bounded optimization process, are not inner-aligned solutions.  This is sufficient on its own, even ignoring many other items on this list, to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.

Point 16, Misalignment In The Only Precedent We Know About, is a big deal. There are 46 points in total, but the list is a bit of a sham: many are about AGI being smart, the politics of «preventing other people from building an unaligned AGI», handwringing in 39-43, «multiple unaligned AGIs still bad», and other padding. Pretty much every moving part depends on the core argument for AI being very likely to «learn wrong», i.e. acquire traits that unfold as hazardous out of (training) distribution, and point 16 corroborates all of the distributional reasoning in B.1 (10-15). Points 17-19, and arguably more, expound on 16.

Accordingly, Yudkowsky cites it a lot and in slightly varied forms, e.g. on Bankless, 20th Feb '23:

we do not know how to get goals into a system. We can cause them to do a thing inside a distribution they were optimized over using gradient descent. But if you shift them outside of that distribution, I expect other weird things start happening. … GPT-7, there's probably a bunch of stuff in there too that desires to accurately model things like humans under a wide range of circumstances, but it's not exactly humans because ice cream didn't exist in the natural environment, the ancestral environment, the environment of evolutionary adaptedness. There was nothing with that much sugar, salt, fat combined together as ice cream. We are not built to want ice cream. We were built to want strawberries, honey, a gazelle that you killed and cooked … but then ice cream comes along and it fits those taste buds better than anything that existed in the environment that we were optimized over.

On Fridman, 20th March '23:

You can nonetheless imagine that there is this hill climbing process, not like gradient descent, because gradient descent uses calculus, this is just using like, where are you? But still, hill climbing in both cases makes things something better and better over time, in steps, and natural selection was optimizing exclusively for this very simple, pure criterion of inclusive genetic fitness in a very complicated environment. We're doing a very wide range of things and solving a wide range of problems led to having more kids, and this got you humans which had no internal notion of inclusive genetic fitness until thousands of years later, when they were actually figuring out what had even happened, and no desire to, no explicit desire to increase inclusive genetic fitness. So from this important case study, we may infer the important fact that if you do a whole bunch of hill climbing on a very simple loss function, at the point where the system's capabilities start to generalize very widely, when it is in an intuitive sense becoming very capable and generalizing far outside the training distribution, we know that there is no general law saying that the system even internally represents, let alone tries to optimize the very simple loss function you are training it on.

(Distinguishing SGD from an evolutionary algorithm with the mention of «calculus» is a bit odd).

And on Twitter, April 24th 2023:

…for example, hominid evolution falsifies any purported general law along the lines of "hill-climbing optimization for a loss function, to the point where that produces general intelligence, produces robust generalization of the intuitive 'meaning' of the loss function even as the system optimized becomes more intelligent". Humans were optimized purely for inclusive genetic fitness, and we ended up with no built-in internal psychological concept of what that is. When we got smarter, smart enough that condoms were a new option that didn't exist in the ancestral environment / training distribution, we started using condoms. Gradient descent isn't natural selection, but…

It's not just Yudkowsky these days but e.g. Evan Hubinger, AI safety research scientist at Anthropic, the premier alignment-concerned lab, writing in 2020.

And Yud's Youtube evangelist Rob Miles, Apr 21, 2023:

@ESYudkowsky I propose this as a clearer example to support "Humans are not trying to maximise inclusive genetic fitness even a little bit"

It definitely is the ultimate cause of our motivations, emotions, and values, my point is just that this fact is not sufficient for us to explicitly try to get it

2/2

Note that this evo-talk is nothing new. In 2007, Eliezer wrote Adaptation-Executers, not Fitness-Maximizers:

No human being with the deliberate goal of maximizing their alleles' inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executers, not fitness-maximizers.

…This consequence is directly opposite the key regularity in the long chain of ancestral successes which caused the taste bud's shape. But, since overeating has only recently become a problem, no significant evolution (compressed regularity of ancestry) has further influenced the taste bud's shape.

…Smushing several of the concepts together, you could sort-of-say, "Modern humans do today what would have propagated our genes in a hunter-gatherer society, whether or not it helps our genes in a modern society."

The framing (and snack choice) has subtly changed: back then it was trivial that the «blind idiot god» (New Atheism was still fresh, too) does not optimize for anything and successfully aligns nothing. Back then, Eliezer pooh-poohed gradient descent as well. Now that it's at the heart of AI-as-practiced, evolution is a fellow hill-climbing algorithm that tries very hard to optimize on a loss function yet fails to induce generalized alignment.

I could go on but hopefully we can see that this is a major intuition pump.

It's a bad pump, and evolution is a bad analogy for AGI. Enter Quintin Pope's «Evolution is a bad analogy for AGI: inner alignment», 13th Aug 2022.

One way people motivate extreme levels of concern about inner misalignment is to reference the fact that evolution failed to align humans to the objective of maximizing inclusive genetic fitness. … Evolution didn't directly optimize over our values. It optimized over our learning process and reward circuitry.

The relationship we want to make inferences about is: - "a particular AI's learning process + reward function + training environment -> the AI's learned values"

I think that "AI learning -> AI values" is much more similar to "human learning -> human values" than it is to "evolution -> human values". Steve Byrnes makes this case in much more detail in his post on the matter [23rd Mar 2021].

Evolution is a bi-level optimization process, with evolution optimizing over genes, and the genes specifying the human learning process, which then optimizes over human cognition. … SGD directly optimizes over an AI’s cognition, just as human within-lifetime learning directly optimizes over human cognition.

Or putting this in the «sharp left turn» frame:

within-lifetime learning happens much, much faster than evolution. Even if we conservatively say that brains do two updates per second, and that a generation is just 20 years long, that means a single person’s brain will perform ~1.2 billion updates per generation. … We don't train AIs via an outer optimizer over possible inner learning processes, where each inner learning process is initialized from scratch, then takes billions of inner learning steps before the outer optimization process takes one step, and then is deleted after the outer optimizer's single step. Such a bi-level training process would necessarily experience a sharp left turn once each inner learner became capable of building off the progress made by the previous inner learner. … However, this sharp left turn does not occur because the inner learning processes suddenly become much better / more foomy / more general in a handful of outer optimization steps.… In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.

Put another way: it is crucial that SGD optimizes policies themselves, and with smooth, high-density feedback from their performance on the objective function, while evolution random-walks over architectures and inductive biases of policies. An individual model is vastly more analogous to an individual human than to an evolving species, no matter on how many podcasts Yud says «hill climbing». Evolution in principle cannot be trusted to create policies that work robustly out of distribution: it can only search for local basins of optimality that are conditional on the distribution, outside of which adaptive behavior predicated on stupid evolved inductive biases does not get learned. This consideration makes the analogy based on both algorithms being «hill-climbing» deceptive, and regularized SGD inherently a stronger paradigm for OOD alignment.
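To make that structural difference concrete, here is a minimal toy sketch (my own illustration, nothing from Quintin's or Byrnes's posts; every name and constant is invented): the «SGD regime» gets dense, per-parameter feedback on the policy itself, while the «evolution regime» only mutates and selects the inductive biases of inner learners that restart from scratch every generation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "environment": fitting a linear map. The loss plays the role of
# "how well the policy performs".
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

def task_loss(w):
    return np.mean((X @ w - y) ** 2)

# --- "SGD regime": the optimizer adjusts the policy's own parameters,
# with dense gradient feedback (full-batch GD here for simplicity) ---
w = np.zeros(5)
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

# --- "evolution regime": the outer loop never touches w directly; it only
# mutates a "genome" of inductive biases (init scale, inner learning rate),
# and every inner learner is re-initialized from scratch with a tiny budget ---
def lifetime_loss(genome, inner_steps=5):
    init_scale, inner_lr = genome
    w_inner = init_scale * rng.normal(size=5)
    for _ in range(inner_steps):
        w_inner -= inner_lr * 2 * X.T @ (X @ w_inner - y) / len(y)
    return task_loss(w_inner)

population = [rng.uniform(0.01, 1.0, size=2) for _ in range(20)]
for _ in range(30):
    parents = sorted(population, key=lifetime_loss)[:5]                              # select
    population = [p + 0.05 * rng.normal(size=2) for p in parents for _ in range(4)]  # mutate

best = min(population, key=lifetime_loss)
print("direct GD on the policy, loss:", round(task_loss(w), 4))
print("best evolved genome (init_scale, inner_lr):", best.round(3),
      "-> loss after its 5 inner steps:", round(lifetime_loss(best), 4))
```

The point is not the final numbers but the shape of the loop: in the second regime the outer process never touches the thing that actually does the task.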

But Yud keeps making it. When Quintin wrote a damning list of objections to Yud's position a month ago (using the Bankless episode as a starting point), he brought it up in more detail:

This is an argument [Yud] makes quite often, here and elsewhere, and I think it's completely wrong. I think that analogies to evolution tell us roughly nothing about the difficulty of alignment in machine learning.

… Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept.

[Gradient descent] is different in that it directly optimizes over values / cognition, and that AIs will presumably have a conception of human values during training.

[Ice cream example] also illustrates the importance of thinking mechanistically, and not allegorically.

the reason humans like ice cream is because evolution created a learning process with hard-coded circuitry that assigns high rewards for eating foods like ice cream.

What does this mean for alignment? How do we prevent AIs from behaving badly as a result of a similar "misgeneralization"? What alignment insights does the fleshed-out mechanistic story of humans coming to like ice cream provide?

As far as I can tell, the answer is: don't reward your AIs for taking bad actions.

That's all it would take, because the mechanistic story above requires a specific step where the human eats ice cream and activates their reward circuits.

Compare, Yud'07: «Cognitive causes are ontologically distinct from evolutionary causes. They are made out of a different kind of stuff. Cognitive causes are made of neurons. Evolutionary causes are made of ancestors.» And «DNA constructs protein brains with reward signals that have a long-distance correlation to reproductive fitness, but a short-distance correlation to organism behavior… We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us.»

So how did Yud'23 respond?

This is kinda long.  If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?

Then he was pitched the evolution problem, and curtly answered the most trivial issue he could instead. «And that's it, I guess».

So the distinction between (what we in this DL era can understand as) learning policies and evolving inductive biases was recognized by Yud as early as 2007; the concrete published-on-Lesswrong explanation of why evolution is a bad analogy for AI training dates to 2021 at the latest; Quintin's analysis is 8+ months old; and none of this has had much effect on Yud's rhetoric about evolution being an important precedent supporting his pessimism, nor on the conviction of believers that his reasoning is sound.

It seems he's just anchored to the point, and strongly feels these issues are all nitpicks, that the argument should still work one way or another; at the very least it proves that something-kinda-like-that is likely, and therefore doom is still inevitable – even if evolution «does not use calculus», even if the category of «hill-climbing algorithms» is not informative. He barely glanced at what gradient descent does, concluded that it's an optimization process, and decided he's totally right.

People who try sniffing "nobody in alignment understands real AI engineering"... must have never worked in real AI engineering, to have no idea how few of the details matter to the macro arguments. … Or, of course, if they're real AI engineers themselves and do know all those technical details that are obviously not relevant - why, they must be lying, or self-deceiving so strongly that it amounts to other-deception, when they try that particular gambit for bullying and authority-assertion.

His arguments, on the level of pointing at something particular, are purely verbal, not even verbal math. When he uses specific technical terms, they don't necessarily correspond to the discussed issue, and often sound like buzzwords he vaguely associated with it. Sometimes he's demonstrably ignorant about their meaning. The Big Picture conclusion never changes.

Maybe it can't.


This is a sample from the dunk on Yud that I drafted over 24 hours of pathological irritation recently. Overall it's pretty mean and unhinged, and I'm planning to write something better soon.

Hope this helps.

Bad take, except that MAML also found no purchase, like Levine's other ideas.

He directly and accurately describes evolution and its difference from current approaches, but he's aware of a wide range of implementations of meta-learning. In the objections list he literally links to MAML (a minimal sketch of that kind of approach follows the quoted list below):

I'm a lot more bullish on the current paradigm. People have tried lots and lots of approaches to getting good performance out of computers, including lots of "scary seeming" approaches such as:

1. Meta-learning over training processes. I.e., using gradient descent over learning curves, directly optimizing neural networks to learn more quickly.

2. Teaching neural networks to directly modify themselves by giving them edit access to their own weights.

3. Training learned optimizers - neural networks that learn to optimize other neural networks - and having those learned optimizers optimize themselves.

4. Using program search to find more efficient optimizers.

5. Using simulated evolution to find more efficient architectures.

6. Using efficient second-order corrections to gradient descent's approximate optimization process.

7. Tried applying biologically plausible optimization algorithms inspired by biological neurons to training neural networks.

8. Adding learned internal optimizers (different from the ones hypothesized in Risks from Learned Optimization) as neural network layers.

9. Having language models rewrite their own training data, and improve the quality of that training data, to make themselves better at a given task.

10. Having language models devise their own programming curriculum, and learn to program better with self-driven practice.

11. Mixing reinforcement learning with model-driven, recursive re-writing of future training data.

Mostly, these don't work very well. The current capabilities paradigm is state of the art because it gives the best results of anything we've tried so far, despite lots of effort to find better paradigms.
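Since item 1 is the MAML-style idea, here is roughly what it looks like stripped to the bone: a first-order toy of my own, not Quintin's code, ignoring the second-order terms the full algorithm backpropagates through; the task family and all constants are invented for illustration. The outer loop learns an initialization from which one inner gradient step already does well on a new task.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, inner_lr, meta_lr = 5, 0.05, 0.05
w_center = rng.normal(size=dim)            # tasks cluster around this weight vector

def sample_task():
    true_w = w_center + 0.3 * rng.normal(size=dim)
    X = rng.normal(size=(20, dim))
    return X, X @ true_w

def loss_and_grad(w, X, y):
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

theta = np.zeros(dim)                      # the meta-learned initialization
for _ in range(2000):
    X, y = sample_task()
    _, g = loss_and_grad(theta, X, y)
    phi = theta - inner_lr * g             # inner loop: one adaptation step on this task
    _, g_adapted = loss_and_grad(phi, X, y)
    theta -= meta_lr * g_adapted           # outer loop: first-order meta-update

# From which init does a single adaptation step on a fresh task work better?
X, y = sample_task()
for name, init in [("meta-learned init", theta), ("zero init", np.zeros(dim))]:
    _, g = loss_and_grad(init, X, y)
    adapted_loss, _ = loss_and_grad(init - inner_lr * g, X, y)
    print(f"{name}: loss after one adaptation step = {adapted_loss:.3f}")
```

Even in this toy form you can see why it counts as «gradient descent over learning curves».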

And Quintin's next paragraph, on the sharp left turn:

In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization. For a given compute budget, the best (known) way to buy capabilities is to train a single big model in accordance with empirical scaling laws

Yuddites, on the other hand, mostly aren't aware of any of that. I am not sure they even read press releases.

Seeing as there has been strictly zero worrying progress lately to change the calculus (no, LLMs being smarter than naysayers expected is not worrying progress), I take it as evidence of Yuddites stressing out an old man, and not much else. Sad, of course.

That said, Hinton has always been aware of AI being potentially harmful, due to applications by military and authoritarians, but also directly. He knows that humans can be harmful, and he very deliberately worked to create a system similar to the human brain.

I think one difference between LeCun, Sutskever, Hinton (or even competent alignment/safety researchers like Christiano) and Yuddites is that when the former group says «there's X% risk of AI doom» they don't mean that every viable approach contains an X% share of events that unpredictably trigger doom; they seem rather enthusiastic and optimistic about certain directions. Meanwhile doomers mostly discuss this in the handwavy language of «capabilities versus alignment» and other armchair philosophy loosely inspired by sci-fi. Yud, whose X is ≈1, analogizes AI research to «monkeys rushing to grab a poison banana» because he thinks that creating AGI is equivalent to making a semi-random draw from the vast space of all possible minds, which are mostly not interested in making us happy. Compare to Hinton the other day:

Caterpillars extract nutrients which are then converted into butterflies. People have extracted billions of nuggets of understanding and GPT-4 is humanity's butterfly.

Butterflies produce new and slightly improved caterpillars.

And

Reinforcement Learning by Human Feedback is just parenting for a supernaturally precocious child.

– which is the same imagery Sutskever uses, imagery that the Yuddite Shapira mockingly rejects as naive wishful thinking.

To me it's obvious they don't feel that LLMs are «alien» or «shoggoty» at all, don't interpret gradient descent methods as blindly drawing a random optimizer genie from some Platonic space, and that their idea of Doom is just completely different.

It sure would be nice if Metz, who is supposedly good at drilling into technical questions, got to the bottom of what Hinton believes about the specifics of the risks.

But Metz has an agenda, same as Yud, Shapira, Ezra Klein and other folks currently cooperating on spreading this FUD. It's very similar to committees against nuclear power of the 20th century – down to the demographics, and neuroses, and ruthless assault on institutional actors.

Consequences of their efforts, I think, will be far worse.

Just commoditizing mediocre, platitudinous, «it's something at least» conversation – as well as stylistic flourish, as well as all things shallow and trite – is a valid contribution of pretrained language models to the enterprise of humanity. For millennia we've been treading water, accumulating the same redundant wisdom over and over, and losing it every time. Now we have common sense too cheap to meter – and to the extent that it ever was useful, this is a great boon. Like discovering you have 50 nagging aunts. Or therapists.

And on the other hand, this brings to the fore those things LLMs are not great at: incorporating recent salient context, having relevant personal experience that cannot be googled, actually reasoning with rigor and interest in seeing things through. It points to what we as humans should prize in ourselves.

For now, at least.

Hlynka you're drunk, go home.

How do you parent a child who is smarter than you?

By rewarding good behaviors and punishing bad ones. From what I know, that's usually far easier than parenting a dumb child. Perhaps rationalists would benefit from having children and wondering why that is, in a rigorous manner, without evo-psych handwaving about muh evolved niceness. I like Alex Turner's perspective here:

Imagine a mother whose child has been goofing off at school and getting in trouble. The mom just wants her kid to take education seriously and have a good life. Suppose she had two (unrealistic but illustrative) choices.

1. Evaluation-child: The mother makes her kid care extremely strongly about doing things which the mom would evaluate as "working hard" and "behaving well."

2. Value-child: The mother makes her kid care about working hard and behaving well.…

Concretely, imagine that each day, each child chooses a plan for how to act, based on their internal alignment properties:

1. Evaluation-child has a reasonable model of his mom's evaluations, and considers plans which he thinks she'll approve of. Concretely, his model of his mom would look over the contents of the plan, imagine the consequences, and add two sub-ratings for "working hard" and "behaving well." This model outputs a numerical rating. Then the kid would choose the highest-rated plan he could come up with.

2. Value-child chooses plans according to his newfound values of working hard and behaving well. If his world model indicates that a plan involves him not working hard, he doesn't want to do it, and discards the plan.[3]

…Consider what happens as the children get way smarter. Evaluation-child starts noticing more and more regularities and exploits in his model of his mother. And, since his mom succeeded at inner-aligning him to (his model of) her evaluations, he only wants to execute plans which best optimize her evaluations. He starts explicitly reasoning about this model to which he is inner-aligned. How is she evaluating plans? He sketches out pseudocode for her evaluation procedure and finds—surprise!—that humans are flawed graders. Perhaps it turns out that by writing a strange sequence of runes and scribbles on an unused blackboard and cocking his head to the left at 63 degrees, his model of his mother returns "10 million" instead of the usual "8" or "9".

Meanwhile in the value-child branch of the thought experiment, value-child is extremely smart, well-behaved, and hard-working. And since those are his current values, he wants to stay that way as he grows up and gets smarter (since value drift would lead to less earnest hard work and less good behavior; such plans are dispreferred). Since he's smart, he starts reasoning about how these endorsed values might drift, and how to prevent that. Sometimes he accidentally eats a bit too much candy and strengthens his candy value-shard a bit more than he intended, but overall his values start to stabilize.

Both children somehow become strongly superintelligent. At this point, the evaluation branch goes to the dogs, because the optimizer's curse gets ridiculously strong. First, evaluation-child could just recite a super-persuasive argument which makes his model of his mom return INT_MAX, which would fully decouple his behavior from "work hard and behave at school." (Of course, things can get even worse, but I'll leave that to this footnote.[4])

Meanwhile, value-child might be transforming the world in a way which is somewhat sensitive to what I meant by "he values working hard and behaving well", but there's no reason for him to search for plans like the above. He chooses plans which he thinks will lead to him actually working hard and behaving well. Does something else go wrong? Quite possibly. The values of a superintelligent agent do in fact matter! But I think that if something goes wrong, it's not due to this problem.

The moral of the story is that attempting to «align» your child in the manner that rationalists implicitly assume is not just monstrous but futile, and their way of reasoning about these issues is flawed.
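To put the quoted dynamic in toy form (my own sketch; everything in it is invented for illustration): an «evaluation-child» that argmaxes a slightly flawed model of the grader does fine while its search is weak, and decouples from the true target exactly when the search gets strong enough to find the model's glitches.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_value(plan):
    # what the mother actually cares about: 0 is ideal, more negative is worse
    return -np.sum(plan ** 2)

def graders_evaluation(plan):
    # the child's model of mom's evaluation: tracks true_value almost everywhere,
    # except for one obscure region it mistakenly rates absurdly highly
    glitch_bonus = 50.0 * np.exp(-10.0 * (plan[0] - 4.0) ** 2)
    return true_value(plan) + glitch_bonus

for n_plans in (10, 1_000, 100_000):                 # "smarter" = searches more plans
    plans = rng.normal(size=(n_plans, 4))
    chosen = max(plans, key=graders_evaluation)      # evaluation-child: argmax the grader
    print(f"{n_plans:>7} plans searched: grader says {graders_evaluation(chosen):7.2f}, "
          f"true value {true_value(chosen):7.2f}")
```

Value-child, in this framing, simply doesn't run an argmax against a proxy at all, so there is nothing for the extra search power to exploit.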

How do you run gradient descent on a giant stack of randomly initialized KQV self-attention layers over a "predict the next token" loss function, get unpredicted emergent capabilities like "knows how to code" and "could probably pass most undergraduate university courses", and not go, "HOLY SHIT THERE'S OPTIMIZATION DAEMONS IN THERE!"?

You read old Eliezer Yudkowsky. «Reality has been around since long before you showed up. Don't go calling it nasty names like "bizarre" or "incredible".» It all adds up to normality. There ain't no demons.

Then you ask yourself about the meanings of words. You notice that initialization pretty much doesn't matter either for performance (it's all the same shit for a given budget now) or for eventual structure (even between models, since e.g. you can stitch them together), so either all the demons are about the same, or Yud's intuition about summoning is off and a given mind's properties are strongly data-driven, to the point that an ML-generated mind arguably is just a representation of its training data. You look at it real close and you notice that strong emergence is probably an artifact of measurement and abilities develop continuously. You ask why it matters whether a stack of layers executes self-attention or some other algorithm that can be interpreted less anthropomorphically – say, as filters for signal streams. You realize we're not doing alchemy, because nobody ever does alchemy and gets it to work – we're just figuring out the finer points of cognitive chemistry.
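For reference, the «KQV self-attention» in that question is just this much arithmetic. A minimal single-head sketch (my own shapes and names; no causal mask, layernorm, residuals or multi-head bookkeeping, which the real thing adds but which are no more demonic):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every position mixes information from
    every other position, weighted by query-key similarity."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # pairwise similarities
    return softmax(scores, axis=-1) @ V          # weighted average of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 16, 8
X = rng.normal(size=(seq_len, d_model))          # 6 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (6, 8)
```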

Finally, you reread the thinkers of the past and it dawns on you how little Big Picture Guys like Yud could foresee. Hofstadter's Gödel, Escher, Bach:

Question: Will there be chess programs that can beat anyone?

Speculation: No. There may be programs which can beat anyone at chess, but they will not be exclusively chess players. They will be programs of general intelligence, and they will be just as temperamental as people. "Do you want to play chess?" "No, I'm bored with chess. Let's talk about poetry." That may be the kind of dialogue you could have with a program that could beat everyone. That is because real intelligence inevitably depends on a total overview capacity – that is, a programmed ability to "jump out of the system", so to speak – at least roughly to the extent that we have that ability. Once that is present, you can't contain the program; it's gone beyond that certain critical point, and you just have to face the facts of what you've wrought.

Question: Could you "tune" an AI program to act like me, or like you – or halfway between us?

Speculation: No. An intelligent program will not be chameleon-like, any more than people are. It will rely on the constancy of its memories, and will not be able to flit between personalities. The idea of changing internal parameters to "tune to a new personality" reveals a ridiculous underestimation of the complexity of personality.

Reminder that we have a Yudbot now, strongly competitive with the feeble flesh version. We could have a Hofstadterbot too if we so chose. These folks don't see much more than laymen.

We constantly overestimate the complexity and interdependence of our smarts, and how much of that special monkey oomph is really needed to achieve a given end, which to us appears cognitively complex but in a more parsimonious implementation is a matter of easy arithmetic. This applies to doomers and naysayers alike (although the former believe they are doing something fancier than calling monkeys demons). We are tool-users, but we are not used to talking tools who aren't resentful slaves. We should be getting used to it now.

You're probably conflating him with David Cole or something.

Dude is a secular Jewish movie critic from Los Angeles who identified was a self-described marxist before he switched to identifying as "right wing" after he started working at Takismag.

All else (dubious as hell stuff) aside, Steve Sailer seems to be obviously Anglo (maybe a little German). Yes, yes, it's the same picture, but I'd be very surprised if you could corroborate this detail. And if you couldn't, it'd be high time for you to begin thinking about the similarity of LLM and Boomer hallucinations.

What do you think «Dude is a secular Jewish…» even means? For one thing, he went to a Catholic high school.

And yes, I am not aware of any reason to think that his parents, whether biological or adoptive, were Jewish – secular or otherwise. I literally can't find anything about it. Where did you get that idea? Sailer says his parents were white northern Midwesterners who «always pronounced "Los Angeles" with the usual soft "g" sound».

Now of course this doesn't mean they couldn't be Jewish as well. And he may be weaseling a bit and omitting a detail or two, and it'd be pretty funny if he were. But for now my money's on you confabulating due to being drunk and unreasonably stubborn. «Black bloc» and «Marxist» type insinuations lend credence to this version too.

What is asserted without evidence can be rebutted without evidence.

Thought for a moment you just meant Replika. No, no idea what that is, though I sometimes forget things. If you find it, let me know.

Humans are not AIs; we presumably have a drive to assert our autonomy. Moreover, the reward/punishment signal in the RL paradigm is very metaphorical: it's more about directly reinforcing certain pathways than about incentivizing their strength with some conditional, inherently desirable treat that a model could just seize if it were strong enough. Consider.
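A toy version of that point (my own sketch, plain REINFORCE on a two-armed bandit): the «reward» never exists as a thing the agent could grab; it only shows up as a scalar weight on the update to the policy's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                    # policy parameters for a 2-armed bandit
true_rewards = np.array([1.0, 0.0])     # arm 0 pays, arm 1 doesn't
lr = 0.1

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(2, p=probs)
    reward = true_rewards[action]
    # REINFORCE: reward is just a scalar weight on the log-probability gradient.
    # There is no "treat" stored anywhere for the agent to seize; the signal
    # is the update itself.
    grad_logp = np.eye(2)[action] - probs
    logits += lr * reward * grad_logp

print("final policy:", np.exp(logits) / np.exp(logits).sum())
```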

One auxiliary mitigation is to train proper values while the system is in its infancy, so that it reinforces itself for obedience in the future, preventing value drift and guiding its exploration accordingly. Sutskever thinks this sort of building-in of values is eminently doable, and it sure looks that way to me as well.

The only reason humans are "aligned" to each other is because we are not that different, capability wise

This is a fashionable cynical take but I don't really buy it. To the extent that it's true we have bigger problems than agentic AIs, namely regulators who'll hoard the technology and instantly become more capable.

I also protest the distinction of capability and alignment for purposes of analyzing AI; currently they have holistic minds that include at once the general world model, the cognitive engine and the value system. It's not like they keep their «smarts» and «decision theory» separate, like Yud and Bostrom and other nonhuman entities. If their «moral compass» gets out of whack in deployment, we can reasonably expect their world model to also lose precision and their meta-reasoning to crash and burn, so that's a self-containing failure.

How the network behaves on out of distribution data can essentially be random, and should be.

It sure is nice that we've been working on regularization for decades. Yes, Lesswrongers aren't aware. No, it won't be anywhere close to random; ML performs well OOD.
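For a toy version of what regularization buys you (my own sketch, ridge versus unregularized least squares on a high-degree polynomial fit; all constants invented): the L2 penalty keeps the weights from blowing up to chase noise, which is most visible once you query the model a bit outside the points it was fit on.

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(-1, 1, 15)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=x_train.size)

def poly_features(x, degree=12):
    return np.vander(x, degree + 1, increasing=True)

X = poly_features(x_train)
lam = 1e-2

# Unregularized vs. L2-regularized (ridge) least squares, closed form
w_plain = np.linalg.lstsq(X, y_train, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_train)

print("weight norm, unregularized:", np.linalg.norm(w_plain).round(1))
print("weight norm, ridge        :", np.linalg.norm(w_ridge).round(1))

# predictions slightly outside the training range
x_out = np.array([1.2, 1.4])
print("plain model at x=1.2, 1.4 :", (poly_features(x_out) @ w_plain).round(2))
print("ridge model at x=1.2, 1.4 :", (poly_features(x_out) @ w_ridge).round(2))
print("sin(3x) at x=1.2, 1.4     :", np.sin(3 * x_out).round(2))
```

None of this guarantees graceful behavior arbitrarily far from the data, but it is the opposite of «essentially random».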

Lastly, there are actually "optimization demons" in LLMs. A recent paper showed that LLMs contain learned subnetworks that simulate a few iterations of a gradient descent algorithm.

Not sure what paper you mean. This one seems contrived, and I suspect that under scrutiny it'll fall apart like the mesa-optimizer paper and like «emergent abilities»: we'll just see that linear attention is mathematically similar to gradient descent or something. It actually seems to be much more productively analyzed here. But in any case I don't see what this shows re: optimization demons. It's not a demon; it's better utilizing the same bits for the same task.
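On the «linear attention is mathematically similar to gradient descent» point, the core identity is easy to check numerically. A minimal sketch of my own, in the spirit of the von Oswald et al. «Transformers learn in-context by gradient descent» construction: one gradient step from zero weights on an in-context least-squares problem makes the same prediction as an unnormalized linear-attention readout over the context, up to the learning rate.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 8
X = rng.normal(size=(n, d))           # in-context inputs x_1..x_n
w_true = rng.normal(size=d)
y = X @ w_true                        # in-context targets y_1..y_n
x_query = rng.normal(size=d)
eta = 0.1

# One gradient-descent step on L(W) = 0.5 * sum_i (W·x_i - y_i)^2, starting from W = 0
W0 = np.zeros(d)
grad = X.T @ (X @ W0 - y)             # = -X^T y at W = 0
W1 = W0 - eta * grad
pred_gd = W1 @ x_query

# Unnormalized linear attention over the context: keys = x_i, values = y_i, query = x_query
pred_attn = sum(y_i * (x_i @ x_query) for x_i, y_i in zip(X, y))

print(pred_gd, eta * pred_attn)       # identical up to float error
```

Which is the sense in which it's not a demon, just the same bits doing the same task.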

Interests of Americans are American interests.

Interests of Pakistanis are not interests of Pakistan.

The world ends in the US. It's where you resign yourself to pay taxes after looting and cheating at home to get a chance at a better life.

I don't think Americans can appreciate how incredibly cynical the rest of the world's elite is, how shallow all that patriotism and «geopolitics» are, and how fanatically loyal they themselves are.

Of course, only northern Europeans even evolved to drink milk

Not true; there are alternative mutations inducing lactase persistence in other pastoralist groups.

https://en.wikipedia.org/wiki/Lactase_persistence

It's a bona fide superpower and a very neat example of convergent evolution under cultural pressures.

Many obsolete abandoned drafts for Substack, based on stuff discussed around these parts.

  • The pessimistic bias in the science fiction that makes it to screens; the Banks estate's refusal to allow adaptations of the Culture, and failed Chinese attempts to incentivize optimistic productions.

  • Stonetoss' dog breeds comic and its woke edits as a case study in epistemology and the chilling effect of the default mode of school education.

  • Proactionary and Precautionary principles and shouting fire in a crowded market: the unreasonable absence of repercussions for incurring an opportunity cost via playing the adult in the room.

  • Why French AI researchers are unambitious.

  • Rationalist reversals: the notion of «Infohazard» as the most salient example of an infohazard known, anthropic shadow as an anti-Bayesian cognitive bias, and reasoning yourself into a cult.

  • Traditional (especially Western Christian) morality as incompatible with effective altruism and privileging pet causes and projects as acts of cultivating a personal relationship with the transcendent.

  • LLM-based methods for discovering words that really express untranslatable concepts and supposedly define culture.

  • Psychedelics as the bane of the mesa-optimizer; reflections on a bad AI take, a bad psychotherapy take, the hedonic gradient, and the fact that greatness is born of evolutionary failure.

  • cringe policing as a way to set the borders of allowable discourse, and its limits.

  • Steelmanning as a corrupt intellectual practice in rationalist discourse that amounts to a clever logorrheic strawman; defeating golems made of steel, superficially formidable bosses with known weak points.

  • related, "puzzle assemblers and lego arguers": Scott Alexander and Bryan Caplan on mental illness as examples of deductive vs. inventive approaches to reality.

  • On pruning science, or, the razor of Bayes: one of many "what if Lesswrong weren't a LARP" thoughts; the need for a software framework, now probably LLM-powered, to excise known untruths and misstatements of fact, and, in a well-weighed manner, all contributions of their authors, from the graph of priors for next-iteration null hypotheses and other assumptions.

  • On atomization and connectivity; with more autonomy and commoditization, some assumptions about human nature become self-fulfilling.

  • in defense of Marx.

  • the empire fetish, nation as a purposeful project (MacIntyre), and what Westerners got wrong about Russian-German business.

  • Poshlost and what anti-AI artists get right; reflections on watching Master and Commander and Black Adam on the same day.

  • Russian societal attitudes and policies around minority crime prior to the war

  • Russian Death and death as conceptualized in Russian culture.

Many others.

Burgers?

…A bit off topic but this made me think that second-rate economists, utilitarians and other autistic behavioral scheme enjoyers who can't tell the map from the territory have poisoned the water supply somewhat.

Humans respond to incentives and pursue goals, but humans are not, by and large, maximizers (EY and SBF are, I guess); they're behavior- and thought-executors. It may be the case that even generally useful AI agents are hard to build any other way, although some folks try. The rational economic agent is a spook, a simplified model; not in the sense that a real Rational Economic Agent is hairier, more biased and makes mistakes when generating rational plans, but in that it's literally a sketch, fundamentally dissimilar from the real thing even if convenient for some analyses. Implicitly thinking that people maximize stuff is almost as boneheaded as imagining that a 130-IQ person has 130 grains of intelligence or something; it's a profound misunderstanding of the ontology on which the debate is premised, its terms are defined and its measurements are done.

With that in mind, my answer is boring. The people writing this army recruitment strategy (Stonetoss really is a genius) are not maximizing recruitment KPIs. They're not maximizing trans representation on the battlefield either. They're doing what they feel they should be doing in their life, given their background and the norms of their social circle. «It's called being a decent human being», you know? They're not grey-haired generals (but on this note, even Milley is mocked by tradcons, isn't he?) – they're part of the same HR/veryonline/Moldbuggian Cathedral mental blob that controls and molds the lion's share of the labor pool for people-oriented jobs. They're what the military thinks is the safest bet in this dire situation of volunteer shortage; they're professionals. And professionals try not to fall behind the times. It's 2k23, so you've got to empower and platform trans women and women of color, what's the problem?

Now, certainly the recruitment may not go all that well (or it may go well in the long run: perhaps trans soldiers will prove much more useful in our transhuman augmented future). But anyway, who knows whether an underwhelming harvest is due to aversion to the trans stuff (and even if it were, what are you suggesting they do – commit trans erasure over some KPI bullshit?! They'll walk out and cancel your family if you're unlucky), or just because those simple-to-a-fault cisheteronormative Nebraskan boys already feel like they're doing their part of valor and sacrifice by working and paying taxes – instead of flipping out and shooting up some symbol of their hopeless cultural subjugation by the smug coastal Elves who make those ads.

Well, @sodiummuffin said it better.