Culture War Roundup for the week of January 20, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

To continue the drama around the stunning Chinese DeepSeek-r1 accomplishment, the ScaleAI CEO claims DeepSeek is being coy about their 50,000 H100 GPUs.

I realize now that DeepSeek is pretty much the perfect Chinese game-theory move: let the US believe a small AI lab full of cunning Chinese matched OpenAI with a tiny fraction of the compute budget and no ability to get SOTA GPUs. Let the US believe the export regime works, but that it doesn't matter, because Chinese brilliance is superior, demoralizing efforts to strengthen it. Additionally, it would make the US skeptical of big investment in OpenAI capital infrastructure, because there's no moat.

Is it true? I have no idea. I'm not really qualified to do the analysis on the DeepSeek results to confirm it really was an end-to-end run by a small, scrappy team on a shoestring budget. Also, what we don't see are the potentially 100-1000 other labs (or previous iterations) that tried and failed.

The results we have now are that the -r1 14b and 32b distills are fairly capable on commodity hardware, and it seems one could potentially run the 671b model, which is kinda maybe but not actually on par with o1, on something that costs as much as a tinybox ($15k). That's a remarkable achievement, but at what total development cost? $5 million in compute plus a hundred-odd Chinese researchers would be stunningly impressive. But if the true cost is actually a few more OOMs, it would mean the script has not been completely flipped.
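
For scale, here's a back-of-envelope sketch of the two stories. Every number below (cluster size, run length, rental rate) is an assumption for illustration, not a reported figure:

```python
# Back-of-envelope training cost: GPUs x wall-clock hours x rental rate.
# All inputs are illustrative assumptions, not DeepSeek's actual numbers.
def training_cost(gpus: int, hours: float, dollars_per_gpu_hour: float) -> float:
    return gpus * hours * dollars_per_gpu_hour

two_months = 2 * 30 * 24  # assumed run length in hours

# The claimed story: a modest cluster at rental rates.
print(f"${training_cost(2_048, two_months, 2.0):,.0f}")   # ~$5.9M, near the headline figure

# The skeptical story: 50,000 H100s for the same duration.
print(f"${training_cost(50_000, two_months, 2.0):,.0f}")  # ~$144M, roughly 1.4 OOMs more
```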

I maintain that a lot of OpenAI's current position is derivative of a period of time where they published their research. You even have Andrej Karpathy teaching you in a lecture series how to build GPT from scratch on YouTube, and he walks you through the series of papers that led to it. It's not a surprise that competitors can catch up quickly if they know what's possible and what the target is. Given that they're more like ClosedAI these days, would any novel breakthroughs be as easy to catch up on? They've certainly got room to explore them with a $500b commitment to play with.

Anyway, do you believe DeepSeek?

I think they found a way to use their compute much more efficiently somehow; that's the key secret that they're not open-sourcing. Deepseek models are insanely cheap to run compared to Western models. If they're cheap to run, it follows that they're probably cheap to train.

Just look at openrouter: https://openrouter.ai/deepseek/deepseek-r1

Deepseek as a provider is by far the cheapest and fastest with modest but totally usable context length and output limits. The Americans serving this (with potentially superior GPUs) are completely shitting the bed, half their responses just stall in the chain of reasoning and don't get anywhere, despite them being 10x more expensive. They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby. But Americans can all run American models just fine at the exact same price. Claude on google or amazon costs exactly the same. I think in addition to the advantage of knowing how to use their model they have some secret insight into how to use compute efficiently.
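
If you want to check the spread yourself, OpenRouter's API is OpenAI-compatible and lets you pin a provider. A minimal sketch; the endpoint and "provider" routing field are per OpenRouter's public docs as I understand them, the provider names are examples to swap for whatever the model page lists, and OPENROUTER_API_KEY is your own key:

```python
# Sketch: route a deepseek-r1 request to a specific OpenRouter provider
# and time it. Verify field names against OpenRouter's docs before relying
# on this; provider names below are examples from the model page.
import os
import time
import requests

API_KEY = os.environ["OPENROUTER_API_KEY"]  # your own key

def ask(provider_order: list[str]) -> float:
    t0 = time.time()
    r = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "deepseek/deepseek-r1",
            "messages": [{"role": "user", "content": "What is 17 * 23?"}],
            "provider": {"order": provider_order, "allow_fallbacks": False},
        },
        timeout=300,
    )
    r.raise_for_status()
    return time.time() - t0

print("DeepSeek: ", ask(["DeepSeek"]), "s")
print("US host:  ", ask(["Together"]), "s")
```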

On the other hand, US export restrictions just don't work. Russian oil is still being sold; it just takes circuitous routes through India to reach Europe. Russian imports of luxury vehicles from Europe still happen; they just go through Azerbaijan or Kazakhstan.

China still buys H100s. They have money. Nvidia wants money. Middlemen want money. World markets go brrr. Deepseek is surely capable of rustling up a big cluster, or the Chinese state could give them access to one. Or they could borrow some via the cloud. Export controls work on big rare things monopolized by governments like H-bombs and fighter jets (and maybe semiconductor equipment which needs manufacturer support), not finished products that are produced en masse.

Deepseek as a provider is by far the cheapest and fastest with modest but totally usable context length and output limits. The Americans serving this (with potentially superior GPUs) are completely shitting the bed, half their responses just stall in the chain of reasoning and don't get anywhere, despite them being 10x more expensive. They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby. But Americans can all run American models just fine at the exact same price. Claude on google or amazon costs exactly the same. I think in addition to the advantage of knowing how to use their model they have some secret insight into how to use compute efficiently.

Hi again. Any more insight on this? Perhaps it's because they optimized it for Huawei chips or something and everyone else is trying to make it run on Nvidia?

Context: https://x.com/olalatech1/status/1883983102953021487

I saw that too, but I'm not a technical guy; daseindustries would probably be the one who can speak most knowledgeably on the details. Apparently they did some unnerfing of the Nvidia chips they had and optimized the model to fit their cluster, but I don't really understand how they're so efficient. This is a trillion-dollar question, after all.

My general belief was that Chinese-made GPUs are OK for inference but still behind H100s, and the H100 is a last-generation product. The H-series is also much better for training. I think there are all kinds of complexities in the software stack, too, that make Nvidia the path of least resistance: an Ascend 910 might be cheaper to produce, but they're probably a bit more finicky to work with, and you need lots of talent to get a good bug-free experience. But Deepseek obviously has overflowing talent. H100s are more expensive in China, since they need to be smuggled into the country...

I think they are playing games regarding prices. The price of running a GPU once it's set up and serving a given model and the price of installing a cluster and paying off that capital cost are very different. I think that's got a lot to do with the $5.5 million price tag everyone is talking about.
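
To make the distinction concrete, here's a rough sketch where every input (capex per GPU, lifespan, utilization, operating cost) is a guess for illustration:

```python
# Marginal vs. amortized cost of a GPU-hour. All inputs are illustrative guesses.
def amortized_cost_per_hour(capex: float, lifespan_years: float,
                            utilization: float, opex_per_hour: float) -> float:
    usable_hours = lifespan_years * 365 * 24 * utilization
    return capex / usable_hours + opex_per_hour

marginal = 0.50  # assumed: power + hosting once the cluster already exists
full = amortized_cost_per_hour(capex=30_000, lifespan_years=4,
                               utilization=0.7, opex_per_hour=marginal)
print(f"marginal:  ${marginal:.2f}/GPU-hour")
print(f"amortized: ${full:.2f}/GPU-hour")  # ~$1.72 with these guesses, >3x marginal
```

A training run priced at marginal rates can thus correspond to a much larger all-in capital outlay.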

I notice that, as of a few hours ago, a new provider called "DeepInfra" has appeared with rates similar to DeepSeek's. Despite the name, they don't appear to be related to DeepSeek.

Looks like it's gotten so cheap that people are now making it free and just harvesting the info: https://openrouter.ai/deepseek/deepseek-r1:free

I feel like such a cuck paying for Deepinfra or Together or the others, and even more of a cuck paying for a Claude subscription.

Deepseek as a provider is by far the cheapest and fastest with modest but totally usable context length and output limits. The Americans serving this (with potentially superior GPUs) are completely shitting the bed, half their responses just stall in the chain of reasoning and don't get anywhere, despite them being 10x more expensive. They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby.

Are we comparing apples to apples? I understand the models with Llama and Qwen in their names are distills of the native model, made for compatibility with existing frameworks, though they might perform like crap.

Whereas I understand the native DeepSeek r1 is a mixture-of-experts thing that dynamically selects 37b parameters out of the overall 671b.
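
For concreteness, here's a toy sketch of the routing idea as I understand it. The sizes and top-k here are illustrative, nothing like r1's actual 671b/37b configuration:

```python
# Toy mixture-of-experts layer: each token is routed to only its top-k experts,
# so the active parameter count per token is a small slice of the total.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])  # only k of n_experts run per token
        return out

print(ToyMoE()(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

At r1's scale, 37b of 671b means only about 5-6% of the weights are touched per token, which would go a long way toward explaining cheap serving.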

The OpenRouter link in my post is just for big DeepSeek r1; I'm not talking about the little distills.

Thank you. I was not familiar with OpenRouter, so I thought I would ask.

(I did try the Llama 70b distill on my fairly beefy desktop; it ran at about half a token per second for about a minute before crashing my computer.)
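
Back-of-envelope on why that happens, with the quantization levels assumed for illustration:

```python
# Rough weight-memory requirement for a 70b model. KV cache and activation
# overheads are ignored; this is pure back-of-envelope arithmetic.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{weights_gb(70, bits):.0f} GB")
# 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB. All past a typical desktop's
# VRAM, so layers spill to system RAM/disk and speed collapses to tokens/minute.
```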

They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby

Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (The same was true of V2.) They were asked to open-source their MLA implementation (not the terrible huggingface one) and declined, on the grounds that everything is too integrated into their proprietary HAI-LLM framework and they don't want to disassemble it and scrub out the actually secret stuff. The SGLang team and others had to reverse-engineer it from the papers. Their search implementation on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).

Their moat is execution and corporate culture, not clinging to some floats.
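
For anyone wondering what an "auxiliary loss" looks like in an MoE, here's the generic Switch-Transformer-style load-balancing term. This is the textbook version, not DeepSeek's exact formulation, which per their papers additionally targets communication latency on their specific cluster:

```python
# Generic MoE load-balancing auxiliary loss: penalize routers that send a
# disproportionate share of tokens to a few experts. Textbook version only.
import torch

def load_balancing_loss(router_probs: torch.Tensor, expert_index: torch.Tensor,
                        n_experts: int) -> torch.Tensor:
    # router_probs: (tokens, n_experts) softmax outputs
    # expert_index: (tokens,) the expert each token was actually routed to
    frac_tokens = torch.bincount(expert_index, minlength=n_experts).float()
    frac_tokens = frac_tokens / expert_index.numel()       # f_i: share of tokens per expert
    mean_probs = router_probs.mean(dim=0)                  # P_i: mean router prob per expert
    return n_experts * torch.dot(frac_tokens, mean_probs)  # minimized when both are uniform

probs = torch.rand(128, 8).softmax(dim=-1)
print(load_balancing_loss(probs, probs.argmax(dim=-1), n_experts=8))
```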

On the other hand, US export restrictions just don't work. Russian oil is still being sold; it just takes circuitous routes through India to reach Europe. Russian imports of luxury vehicles from Europe still happen; they just go through Azerbaijan or Kazakhstan.

They do work as long as your definition of "work" isn't "100% effective".

Well, they haven't produced the desired results of 'Russia being unable to sustain its war effort', 'Russian elites overthrowing Putin due to not getting luxury imports', or 'China being unable to reach the frontier of AI research'. The Russian economy is performing quite well. Everything seems to be going up: real GDP, real incomes:

https://carnegieendowment.org/russia-eurasia/politika/2024/05/russia-war-income?lang=en

However, given the growth in income tax payments (and that Rosstat assesses that 59 percent of incomes in 2023 derived from wages), it can be said with confidence that real incomes have risen faster than inflation since the full-scale invasion.

They're not totally ineffective. But most small, thin, unfit women aren't totally ineffective at fighting either. They're just not significantly effective. They still lose to big strong men.

Failing to prevent Russian oil from ending up in Western markets is a failure of application. Sanctions shift the expected outcome by making the sanctioned party pay a higher cost to achieve their goal. In Russia's case, this means the point at which they are no longer able to sustain the war effort arrives sooner.

The Russian economy is not doing all that well; it is verging on stagflation.

https://carnegieendowment.org/russia-eurasia/politika/2024/11/russia-central-bank-dilemma?lang=en

The war with Ukraine means that Russia’s economic policymaking is caught in a paradox: on the one hand, the government is increasing expenditure (over 8 percent of GDP is being spent on the war), which fuels inflation, while on the other, the central bank is trying to dampen inflation by raising interest rates. It is this paradox that drives calls for greater coordination between the government and the central bank. Of course, the looming specter of a recession and stagflation also mean that there is a preemptive hunt under way for scapegoats.

Western sanctions mean that the Russian elite has no institutional alternative to Putin and Russia’s current economic course. They can’t flee to the West, and their only option if they want to earn money is to remain in the country. But high interest rates are squeezing their margins and cutting into profits. Business lobby groups have complained that companies are scaling back investment plans.

The case for China is much murkier. But if one starts from the assumption that ASI is of the same level of strategic significance as nukes, improving your chances of getting there first seems like a defensible position to me.

Russian GDP is not doing so hot (I can't find a good multi-year graph that goes up to nearly the present day, but this serves to show the recent trend). It's true that rumors of Russian collapse were obviously overblown, but that doesn't mean sanctions don't work at all.

5% growth over the past year is not hot? Any Western country would consider it a miracle to have that growth rate. For reference, America, the Western country that best recovered from COVID, has a growth rate of 2.5% over the past year. Honestly, I was ready to accept your claim that the Russian economy is doing badly until I saw the chart you linked. Now I'm wondering how they managed such impressive growth under such a restrictive sanctions regime.

It's not that hot when there's been zero growth since 2013. It's easy to have a year of big growth if you crashed beforehand.
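
The base effect is simple arithmetic; a toy example with made-up numbers:

```python
# Base effect, with made-up numbers: a crash year followed by a rebound year
# can produce a flashy growth print while leaving the level roughly flat.
level = 100.0
level *= 0.97  # assumed crash year: -3%
level *= 1.05  # rebound year: +5%, the headline number
print(f"level after two years: {level:.1f}")  # 101.8 -> under 1% per year overall
```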

If you switch the plot to constant 2015 US$, it shows growth, and if to PPP in current international $, it shows nearly two-fold growth.

I feel you're losing the plot here. We were talking about the recent sanctions' effect on the economy. Why are you bringing up the last decade when the discussion was about the last two years of economic growth?

Because sanctions have been in place since the invasion of Crimea in 2014.

Russia is also not doing so hot on metrics of Diversity, nor on the total amount of Californian wine consumed. Why are any of these three things relevant?

Take it up with ranger if you don't think GDP has any impact on material living conditions.