dr_analog
razorboy

DeepSeek as a provider is by far the cheapest and fastest, with modest but totally usable context length and output limits. The Americans serving this (with potentially superior GPUs) are completely shitting the bed: half their responses just stall in the chain of reasoning and don't get anywhere, despite being 10x more expensive. They clearly have no idea how to run this model, which is reasonable since it's DeepSeek's baby. But Americans can all run American models just fine at the exact same price; Claude on Google or Amazon costs exactly the same. I think in addition to the advantage of knowing how to use their own model, DeepSeek has some secret insight into how to use compute efficiently.
Hi again. Any more insight on this? Perhaps it's because they optimized it for Huawei chips or something and everyone else is trying to make it run on Nvidia?
Context: https://x.com/olalatech1/status/1883983102953021487
Ah okay, I failed to account for influence ops run on not-so-smart decision makers in a city that's the political capital of the EU.
EU migration policy seems to be a Soros creation, where he ran an influence op on the baboons in Brussels.
The what?
One way of telling the crypto story is that it's one long sequence of Chesterton's Fences.
It's complicated, but the core idea is you create a payment channel with a liquidity provider (or many), and within the context of that channel you can adjust the balance without having to publish to the blockchain.
You and the counterparty adjust the balance by both signing a valid Bitcoin transfer, but it's kept off-chain. Just you and your counterparty keep track of it. Since it's off-chain, transfers happen instantly. You can be confident in it because a signed transfer already exists and could be published to the chain at any time, so settlement is effectively inevitable.
You only need to publish the final transfer to the blockchain if the channel is closed (settled or you can't reach the counterparty anymore).
The fact that you can move the slider balance around on a payment channel is a little mind bending, but the idea is you have a bustling economy where you send people Bitcoin for things and they send you Bitcoin for things and it nets out.
A network of these channels allows you to route payments semi-instantaneously to participants for minimal fees. It's actually not unlike Tor. The fees are so low they had to invent milli-sats because charging whole sats was too high.
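To make the slider idea concrete, here's a toy sketch of the accounting (illustrative only; the real Lightning protocol uses commitment transactions, HTLCs, and penalty mechanisms, none of which are modeled here):

```python
# Toy illustration of a two-party payment channel: the balance "slider" moves
# off-chain, and only opening and closing the channel ever touch the blockchain.
class PaymentChannel:
    def __init__(self, alice_deposit, bob_deposit):
        # Opening the channel is the first (and often only) on-chain transaction.
        self.capacity = alice_deposit + bob_deposit
        self.balances = {"alice": alice_deposit, "bob": bob_deposit}
        self.state_version = 0                       # each co-signed update supersedes the last
        self.onchain_txs = ["open: fund 2-of-2 multisig"]

    def pay(self, sender, receiver, amount):
        # Off-chain: both parties co-sign a new balance state; nothing is broadcast.
        assert self.balances[sender] >= amount, "insufficient channel balance"
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.state_version += 1                      # newest co-signed state is the valid one

    def close(self):
        # Only the final state is published to the blockchain.
        self.onchain_txs.append(f"close: settle state v{self.state_version} {self.balances}")
        return self.onchain_txs

chan = PaymentChannel(alice_deposit=50_000, bob_deposit=50_000)   # amounts in sats
chan.pay("alice", "bob", 1_200)
chan.pay("bob", "alice", 300)
print(chan.close())   # two on-chain txs total, no matter how many payments happened
```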
I got into a bit of an argument with a socialist feminist once. My position was that even if the President sucks, now is still the best time to be alive. She countered that no, it has always been great for men. Take Mad Men, for instance: Don Draper was rich and got to fuck around and be drunk all day.
I was so shocked by this I couldn't even retort. Like the show leaves some people with the impression that the median male existence in the 60s was a Manhattan advertising executive's hedonistic life and not closer to all of the little people he steps on. (To say nothing of the fact that he was drafted to fight in Korea and ... I'll save the spoilers)
Additional random anecdote: an old girlfriend who was actually in the ad industry had a Mad Men themed birthday party at her place in Brooklyn, while it was still airing. That definitely brought out people's frisky sides.
More reminiscing: I remember that scene with the presentation of the carousel slide projector being pretty epic. Like some of the most mesmerizing TV ever.
Anyway, uh, it's a great show. The breadth and depth of the appeal makes it a legend. Though, I never did see it through to the end. Is it worth it?
Off-chain solutions like Lightning allow for denominations in milli-satoshis (0.00000000001 BTC). Presumably if Bitcoin goes up by 10x again we're really not still publishing things down to the penny on the main blockchain.
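For reference, the arithmetic behind that figure (standard Bitcoin denominations, nothing Lightning-specific):

```python
# Unit check: 1 BTC = 1e8 satoshis, 1 satoshi = 1,000 milli-satoshis.
SATS_PER_BTC = 100_000_000
MSATS_PER_SAT = 1_000
one_msat_in_btc = 1 / (SATS_PER_BTC * MSATS_PER_SAT)
print(one_msat_in_btc)   # 1e-11, i.e. 0.00000000001 BTC
```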
thank you. I was not familiar with openrouter so I thought I would ask
(I did try the Llama 70b distill on my fairly beefy desktop and it ran at about half a token per second for about a minute before crashing my computer)
Yeah, I believe it. I wouldn't expect AI researchers working at the pytorch level to be aware of any of this stuff. It sounds really hard to be an expert in the full stack like this.
For the most part, yes. Their models are definitely cheaper to run. If they can make a 30x gain in inference cost, I think it's not unreasonable to think they could make similar gains in training costs.
I'm not sure this follows.
What DeepSeek r1 is demonstrating is a successful Mixture of Experts architecture that's as good as a dense model like GPT. This MoE architecture has lower inference time costs because it dynamically selects a reduced subset of the parameters to activate for a given query (671b down to 37b).
It does not follow that the training cost is similarly reduced. If anything, the training costs are even higher than for a dense model like GPT, because they must also train the gating mechanism that decides which portions of the network are assigned to which experts.
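For illustration, here's roughly what top-k gating looks like (a toy PyTorch sketch, not DeepSeek's actual architecture; the dimensions, expert count, and k are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: the gate scores every expert for each token,
    but only the top-k experts actually run, so the active parameter count is a
    small fraction of the total (the 671b-total / 37b-active idea)."""
    def __init__(self, d_model=256, d_ff=1024, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)      # the gating / router network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (n_tokens, d_model)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():            # run each selected expert on its tokens only
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

layer = TopKMoE()
print(layer(torch.randn(8, 256)).shape)                # torch.Size([8, 256])
```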
I think we should remain skeptical of the $5 million training number.
Are we comparing apples to apples? I understand the models with Llama and Qwen in their names are distills of the native model, made for compatibility with existing frameworks, though they might perform like crap.
Whereas I understand the native DeepSeek r1 is a mixture of experts thing that selects a dynamic 37b parameters out of the overall 671b.
This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog-standard logistics and capital advantages, as always. What this should be is an “are we the baddies?” moment. Also, it's a moment to ask oneself how high the margins are for Western model providers, and whether it's a true free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.
If DeepSeek was a Chinese psyop this would be a good in-kind comment :futurama-suspicious-fry:
But more seriously, why is Facebook's Llama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released, and their AI team do not seem like dummies.
Is the implication that they deliberately released a fat model even though they can go leaner? Or are we writing off Facebook for this discussion?
Also this would imply a level of collusion that doesn't seem sustainable.
Sure, but presumably it cuts other ways too. Do we think current models can be used to train next-generation models?
Please forgive my uninformed speculation, but is it possible that DeepSeek leveraged existing AIs to train on synthetic data for cheap?
This is probably a taste of the recursive self-improvement we've been promised by foomers. It's now known that one of the reasons Anthropic held back on releasing Opus is that they were using it themselves to train Sonnet 3.5 New.
Everyone's gotta be doing it.
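If that's the mechanism, the synthetic-data half is conceptually simple: sample completions from a strong teacher model and fine-tune a smaller model on the pairs. A hypothetical sketch (the teacher model name, prompts, and output file are placeholders, not anyone's actual pipeline):

```python
# Hypothetical sketch of generating synthetic fine-tuning data from a teacher model.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "Explain how a Bitcoin payment channel works in two paragraphs.",
    "Prove that the sum of two even integers is even.",
]

with open("synthetic_train.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",                        # placeholder teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each (prompt, answer) pair becomes one supervised fine-tuning example
        # for the smaller student model.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```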
SolidGoldMagikarp
are you using that word to mean secret Chinese backdoor?
I understand that the SolidGoldMagikarp thing is just a class of tokens that made it into the tokenizer but are so rare in the training data that they send next-token prediction off the rails: they're so unusual the model never learned to associate them with anything.
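You can see half of the story from the tokenizer alone. If I remember right, the whole string (with a leading space) is a single token in the old GPT-2 vocabulary, which is what makes it such a blind spot:

```python
# Quick check with tiktoken: a glitch string like " SolidGoldMagikarp" should encode
# to a single BPE token in the GPT-2 vocabulary, even though it almost never appears
# in training text, so the model has essentially no signal about what follows it.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode(" SolidGoldMagikarp")
print(ids, f"({len(ids)} token(s))")
```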
I don't think following a GPT-from-scratch lecture is going to get you there
I wasn't claiming that. Just trying to support the claim that they were more open in the past. I doubt any novel AI technique discovered in the future will even have that.
I'm not convinced that they have any left to make.
Counting out the most absurdly well resourced AI lab with a history of breakthrough success seems fairly bold.
I agree Facebook is not as great as it could be, and I consider myself forced to use Facebook to provide the cute stream of grandkid pics to my mom.
To continue the drama around the stunning Chinese DeepSeek-r1 accomplishment, the ScaleAI CEO claims DeepSeek is being coy about their 50,000 H100 GPUs.
I realize now that DeepSeek is pretty much the perfect Chinese game theory move: let the US believe a small AI lab full of cunning Chinese matched OpenAI, with a tiny fraction of the compute budget, with no ability to get SOTA GPUs. Let the US believe the export regime works, but that it doesn't matter, because Chinese brilliance is superior, demoralizing efforts to strengthen it. Additionally, it would make the US skeptical of big investment in OpenAI capital infrastructure because there's no moat.
Is it true? I have no idea. I'm not really qualified to do the analysis on the DeepSeek results to confirm it's really the run of a small scrappy team on a shoestring budget end-to-end. Also what we don't see are the potentially 100-1000 other labs (or previous iterations) that have tried and failed.
The results we have now are that the r1 14b and 32b distills are fairly capable on commodity hardware, and it seems one could potentially run the 671b model, which is kinda maybe but not actually on par with o1, on something that costs as much as a tinybox ($15k). That's a remarkable achievement, but at what total development cost? $5 million in compute + a hundred or so Chinese researchers would be stunningly impressive. But if the true cost is actually a few more OOMs, it would mean the script has not been completely flipped.
I maintain that a lot of OpenAI's current position is derivative of a period of time where they published their research. You even have Andrej Karpathy teaching you in a lecture series how to build GPT from scratch on YouTube, and he walks you through the series of papers that led to it. It's not a surprise that competitors can catch up quickly if they know what's possible and what the target is. Given that they're more like ClosedAI these days, would any novel breakthroughs be as easy to catch up on? They've certainly got room to explore them with a $500b commitment to play with.
Anyway, do you believe DeepSeek?
... would you say you're from the German part of Switzerland by chance?
I started it two months ago.
I highly recommend it. The effects are amazing. I actually feel full after eating meals where everyone else also reports they're full, instead of being ready to have 3 more servings. And while I still think about snacking sometimes, the idea of snacking is just too boring to motivate me to eat.
It's miraculous.
This might be your elite user opinion.
My mother, a great-grandmother in her 80s, is from another world. Despite being in a household with computers since the 1990s, she has never been able to use computers to do anything.
I tried to teach her once to use a desktop PC and she picked up the mouse and waved it around in the air, confused. She never figured out web browsers or email.
When cell phones had SMS, she ... never once sent a message. Same with smartphones really.
But what she is able to use is Facebook on an iPad. Not perfectly, she still gets into trouble and needs tech support from time to time (gets lost in a deep tree of settings menus and can't figure out how to get back to her timeline, or gets logged out and can't remember how to log back in).
And I tell you every tech company in the world can burst into flames right now and she would not give a fuck so long as Facebook kept working. Facebook connects her to a steady drip of pictures of her cute grandkids and in touch with life updates from her extended family and it is literally all that matters. The entire computer revolution has been a useless gimmick to her, except for Facebook. Facebook is the only thing SV has done that has brought her actual joy.
I suspect for at least a billion people Facebook is great.
But if you're at all tech savvy it's a cringe wasteland and you keep in touch some other way.
Nevertheless, I think it's an enduring contribution. It makes computers actually useful, even joyful, to a whole mass of humanity that had been left out.
I do think catching up to OpenAI is not that impressive when their current tech tree is derived from a period of time where they published and released pretty openly.
You have Karpathy teaching you how to build GPT on YouTube starting from knowing only a bit of calculus and linear algebra.
We also don't see all of the companies that tried and spun their wheels before running out of money.
OpenAI is probably best positioned to do novel research that can't be as easily cloned, with $500b in their bank account. Though without being under the umbrella of national security, they are still vulnerable to espionage and a conga line of ex-employees going to competitors.
If DeepSeek can make such a capable model with a relatively small team and scrappy budget, what does OpenAI need $500 billion for?
Is it possible DeepSeek is lying to discourage the US from making the capital investment?
Do they not cover this in media training?
"Very important. If you want to pantomime giving your heart to the crowd, you must use two hands. It is not a one handed gesture"
IIRC, if you're super promiscuously partyboi gay, condoms don't reduce the lifetime risk of getting HIV that much.