Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.

So, big week in models: in a matter of hours we got GPT-5.5, Deepseek V4, and Opus 4.7. What are your impressions so far? Mine: GPT-5.5 is the least underwhelming of the three, Deepseek is really capable at insane prices, and Opus 4.7, at least for me, is totally indistinguishable from 4.6.
Also, the limits are absurdly tight at the $20 tier.
Haven't really tried the other two much myself; the high costs scare me off, since I find it hard to believe the value is there for my use case. But I'm looking quite closely at Deepseek V4 Flash specifically, because its cost-to-performance seems pretty insane.
I mean, the official API price is $0.0028 per million input tokens (cache hit), $0.14 per million (cache miss), and $0.28 per million output tokens. It has a 1M context window, and since OpenRouter's stats show above a 90% cache hit rate, the effective weighted input price works out to only about $0.015 per million (quick sanity check of that math below). That's really crazy. I need to test a bit more to figure out where I'd place the intelligence exactly, but...
NONE of the current frontier 'cost-efficient' models come even remotely close to that. For comparison (input/output per 1M tokens): Gemini 3.1 Flash Lite is $0.25/$1.50, Gemini 3 Flash is $0.50/$3.00, GPT 5.4 Nano is $0.20/$1.25, GPT 5.4 Mini is $0.75/$4.50, and Claude Haiku 4.5 is $1.00/$5.00. Sure, allegedly it's not as good at coding as Gemini Flash, but also allegedly it's better at agentic workflows. Those are some pretty significant gaps, approaching an order of magnitude in some cases.
So yeah, Pro is also very cheap and that might make some waves, but contextually Flash is SUPER cheap. Like, obscenely so.
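The sanity check: the effective input price is just a weighted average of the cache-hit and cache-miss prices. A minimal sketch in Python (the 91% hit rate is my reading of "above 90%", not an exact OpenRouter number):

```python
# Blended input price per 1M tokens for Deepseek V4 Flash:
# a weighted average of the cache-hit and cache-miss prices.
CACHE_HIT_PRICE = 0.0028   # $ per 1M input tokens on a cache hit
CACHE_MISS_PRICE = 0.14    # $ per 1M input tokens on a cache miss

def blended_input_price(hit_rate: float) -> float:
    return hit_rate * CACHE_HIT_PRICE + (1 - hit_rate) * CACHE_MISS_PRICE

print(blended_input_price(0.90))  # 0.01652
print(blended_input_price(0.91))  # ~0.0151, i.e. the ~$0.015 figure
```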
This to me is a big deal, because part of what makes AI so compelling is the cost/benefit ratio. With a model like V4 Flash, especially for input-heavy workflows, there are plenty of scenarios where it's literally cheaper to throw five different approaches at the wall and pick the best than to make a single attempt with a model that's just a hair smarter (toy comparison below). We'll see how well it does against actual codebases and such, but it might enable a slightly different type and set of workflows than we're used to.
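To put numbers on the throw-five-at-the-wall point, here's a toy comparison using the blended V4 Flash input price from above and Claude Haiku 4.5 as a stand-in for the "hair smarter" model (the token counts are invented purely for illustration):

```python
# Hypothetical input-heavy job: 800k input tokens, 20k output tokens per attempt.
IN_TOKENS, OUT_TOKENS = 800_000, 20_000

def job_cost(input_price: float, output_price: float, attempts: int = 1) -> float:
    """Dollar cost, given per-1M-token prices and a number of attempts."""
    return attempts * (IN_TOKENS / 1e6 * input_price + OUT_TOKENS / 1e6 * output_price)

print(job_cost(0.015, 0.28, attempts=5))  # 5x V4 Flash: ~$0.09
print(job_cost(1.00, 5.00, attempts=1))   # 1x Haiku 4.5: ~$0.90
```

Best-of-five on the cheap model still comes in around a tenth of one pass on the pricier one, which is exactly what makes sampling-style workflows tempting.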
It's hard to say for sure these days, because especially with agentic coding the harnesses are so important (and what works for one setup often doesn't transfer well, including across generations of models). I'm curious whether someone will figure out a good way to leverage this new cost-benefit balance, because it potentially changes quite a bit how you might spin up subagents, for example. Although, as I mentioned, the model may just be too stupid to do a large enough range of useful work. We'll see; gotta figure out how much is benchmaxxing vs. inherent quality.
At the same time, Claude's tokenizer change probably helps efficiency and intelligence long-term, but short-term you're looking at a flat 10-30% increase in costs from higher token counts alone, before you even get into the token efficiency of the models themselves, at least per the numbers I was looking at initially.
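That short-term hit is purely multiplicative: same text, same per-token price, just more tokens billed. A trivial sketch (the 1.1-1.3x factors simply restate the 10-30% range; they're not measured numbers):

```python
def inflated_cost(base_cost: float, token_inflation: float) -> float:
    # Same prompt, same per-token price; the new tokenizer just emits more tokens.
    return base_cost * token_inflation

for factor in (1.1, 1.2, 1.3):
    print(f"{factor:.1f}x tokens: $100 of old usage now bills ${inflated_cost(100.0, factor):.0f}")
```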
Opus 4.7 seems to handle the stock ticker tests I do better than 4.6; I assume it's the new tokenizer. Otherwise, the only difference I notice is that it's more expensive to run at the same thinking level.
What tests are those?