
Friday Fun Thread for March 3, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


A few months ago OpenAI dropped their API price, from $0.06/1000 tokens for their best model, to $0.02/1000 tokens. This week, the company released their ChatGPT API which uses their "gpt-3.5-turbo" model, apparently the best one yet, for the price of $0.002/1000 tokens. Yes, an order of magnitude cheaper. I don't quite understand the pricing, and OpenAI themselves say: "Because gpt-3.5-turbo performs at a similar capability to text-davinci-003 but at 10% the price per token, we recommend gpt-3.5-turbo for most use cases." In less than a year, the OpenAI models have not only improved, but become 30 times cheaper. What does this mean?

A human thinks at roughly 800 words per minute. We could debate this all day, but it won't really affect the math. A word is about 1.33 tokens. This means that a human, working diligently 40 hour weeks for a year, fully engaged, could produce about: 52 * 40 * 60 * 800 * 1.33 = 132 million tokens per year of thought. This would cost $264 out of ChatGPT.

https://old.reddit.com/r/singularity/comments/11fn0td/the_implications_of_chatgpts_api_cost/

...or about $0.13 per hour. Yes, technically this overlooks the fact that OpenAI charges for both input and output tokens, but it's still cheap and the line is trending downwards.

Full-time minimum wage is ~$20k/year. GPT-3.5-turbo is roughly 75x cheaper ($264 vs. ~$20k) and vastly outperforms the average minimum wage worker at certain tasks. I dunno, this just feels crazy. And no, I won't apologize for AI posting. It is simply the most interesting thing happening right now.
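The napkin math above can be checked in a few lines. The words-per-minute and tokens-per-word figures are the post's own assumptions, not measurements:

```python
# Cost of a year of "human-equivalent" token output at gpt-3.5-turbo pricing.
# All input figures are the post's assumptions.
WORDS_PER_MINUTE = 800        # assumed human "thinking speed"
TOKENS_PER_WORD = 1.33        # rough average for OpenAI's tokenizer
PRICE_PER_1K_TOKENS = 0.002   # gpt-3.5-turbo, USD

minutes_per_year = 52 * 40 * 60                  # 52 weeks of 40-hour weeks
tokens_per_year = minutes_per_year * WORDS_PER_MINUTE * TOKENS_PER_WORD
cost_per_year = tokens_per_year / 1000 * PRICE_PER_1K_TOKENS
cost_per_hour = cost_per_year / (52 * 40)

print(f"{tokens_per_year / 1e6:.0f}M tokens/year, "
      f"${cost_per_year:.0f}/year, ${cost_per_hour:.2f}/hour")
```

The exact result is ~132.8M tokens and ~$266/year, which the post rounds down to 132M and $264; the $0.13/hour figure follows either way.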

Not Fun thread material, fit for a separate post.

This week, the company released their ChatGPT API which uses their "gpt-3.5-turbo" model, apparently the best one yet, for the price of $0.002/1000 tokens

Well, as you quote, they imply it's not the best one yet, at least in «some use cases», and some experiments show it is indeed intellectually inferior to the most recent vanilla GPT-3 version. Maybe this is inherent to the ChatGPT model we see in the demo, because it is known that RLHF can give models the computational equivalent of brain damage as far as benchmarks are concerned; maybe it's something specific to the new one. I am not sure how they've achieved the price cut (though my intuition is that this is cynical undercutting at a loss, to nip competition in the bud and keep the data flywheel accelerating) – perhaps it's smaller (trained from scratch, distilled, etc.), or aggressively quantized, or more greedily sampled, or they've somehow increased the batch size into the high thousands, whatever. In any case, Altman is a great showman and this may not be the revolution it seems like at the moment. Do we really need an endless stupidity generator with short context? Most people who are at this level (and I do think we can now meaningfully say that some people are at ChatGPT's level) aren't exactly making money with their smarts. It's nice for employers to automate even more drudgery, of course.

But in principle, yes, optimizations and next-gen models with more novel architectures than Vaswani!Transformer will definitely allow even cheaper inference (I think the GPT-4 under the hood of Bing already is that).

I found a comparison between davinci-003 and ChatGPT. Seems like ChatGPT is better at some things and worse at others; not a blanket downgrade: https://scale.com/blog/chatgpt-vs-davinci.

Also there is no more data flywheel:

Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in

How have they cut costs? I've seen answers I don't understand. It doesn't seem like they are necessarily running at a loss:

Quantizing to mixed int8/int4 - 70% hardware reduction and 3x speed increase compared to float16 with essentially no loss in quality.

A * 0.3 / 3 = 0.1 * A, i.e. 10% of the cost.

Switch from quadratic to memory efficient attention. 10x-20x increase in batch size.

So we are talking about it taking roughly 1% of the resources alongside a 10x price reduction - they should be 90% more profitable compared to when they introduced GPT-3.

https://old.reddit.com/r/MachineLearning/comments/11fbccz/d_openai_introduces_chatgpt_and_whisper_apis/jaj1kp3/
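Multiplying out the claimed savings from the quoted comment looks like this. The factors are the comment's claims about quantization and memory-efficient attention, not measurements of OpenAI's actual stack:

```python
# Claimed resource savings, relative to an fp16 baseline.
quant_hardware = 0.30   # int8/int4: "70% hardware reduction"
quant_speedup = 3       # "3x speed increase"
batch_gain = 10         # low end of the "10x-20x increase in batch size"

resource_fraction = quant_hardware / quant_speedup / batch_gain  # ~0.01
price_fraction = 0.002 / 0.02                                    # new vs. old price: 0.1

# Resources fall ~100x while the price falls only 10x, so margin per token
# improves roughly 10x under these (optimistic) assumptions.
margin_gain = price_fraction / resource_fraction

print(f"resources: {resource_fraction:.0%} of baseline, "
      f"price: {price_fraction:.0%}, margin gain: {margin_gain:.0f}x")
```

This is where the "1% of the resources" figure in the comment comes from; the "90% more profitable" phrasing is looser, since what actually improves ~10x is revenue per unit of compute.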

I did some quick calculation. We know the number of floating point operations per token for inference is approximately twice the number of parameters (175B). Assuming they use 16-bit floating point and run at 50% of peak efficiency, an A100 could do 300 trillion flop/s (peak 624 [0]). One hour of A100 gives OpenAI $0.002/ktok * (300,000/175/2/1000) ktok/sec * 3600 = $6.1 back. Public price per A100 is $2.25/hour with a one-year reservation.

https://news.ycombinator.com/item?id=34986915
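The quoted back-of-envelope in code form, using the same assumptions (2 FLOPs per parameter per token, 50% of peak throughput, gpt-3.5-turbo pricing):

```python
# Revenue per A100-hour serving a 175B model at gpt-3.5-turbo prices.
PARAMS = 175e9
FLOPS_PER_TOKEN = 2 * PARAMS          # ~2 FLOPs per parameter per generated token
EFFECTIVE_FLOPS = 300e12              # ~50% of the quoted 624 TFLOP/s A100 peak
PRICE_PER_TOKEN = 0.002 / 1000        # USD

tokens_per_sec = EFFECTIVE_FLOPS / FLOPS_PER_TOKEN        # ~857 tok/s
revenue_per_hour = tokens_per_sec * 3600 * PRICE_PER_TOKEN

print(f"{tokens_per_sec:.0f} tok/s -> ${revenue_per_hour:.2f} per A100-hour")
```

That ~$6.17/hour against a ~$2.25/hour rental price is the basis for the claim that they need not be selling at a loss, if the efficiency assumption holds.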

Essentially, I had previously thought that better language models will need to be orders of magnitude larger and orders of magnitude more expensive. Now it seems clearer that we're moving towards a future of intelligence too cheap to meter. And "endless stupidity generator"? I just plain disagree, and feel compelled to build a thin client product on top of the API to prove it. There are many jobs that are mostly low-skill word manipulation, and now that you might get that 100x or 1000x cheaper, it opens up yet unimagined opportunities IMO.

I'm not sure how much of this to believe; if it is dumber, that has to be explained somehow.

In my experience, the subjective generic text completion (and, iirc, perplexity) is worse, even if most downstream tasks end up better. And while the mean of sensible instruction-following is higher for the RLHF version, the variance is much lower, so you struggle to get ChatGPT out of its midwit persona in a way that isn't true for davinci (though this may be damage from arbitrary preferences and political bias, rather than the inherent damage of the technique). Opt-in will probably be incentivized.

Even at int4 (which is still dumber in the absolute sense, just more efficient per byte used) a 175B model needs more than two A100-40GB cards, and if they mix layers it's more. And for a low-latency interactive app, 50% utilization sounds rosy – Google barely broke 40% with PaLM in their paper.

Assuming they use 16 bit floating point

We've already assumed int8/4 though, but in any case the bottleneck for interactive use is memory bandwidth, not flops. We'd need more details.
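The bandwidth point can be made concrete with a standard rule of thumb: at batch size 1, every generated token must stream all the weights through the GPU, so tokens/s is bounded by aggregate memory bandwidth over model size. The numbers below are public A100-80GB specs plus the thread's 175B-at-int4 assumption, not anything known about OpenAI's deployment:

```python
# Memory-bandwidth upper bound on batch-1 autoregressive decoding:
# tokens/s <= aggregate_bandwidth / model_bytes.
PARAMS = 175e9
BYTES_PER_PARAM = 0.5        # int4
A100_BANDWIDTH = 2.0e12      # A100-80GB HBM2e, ~2 TB/s
A100_MEMORY = 80e9

model_bytes = PARAMS * BYTES_PER_PARAM            # 87.5 GB -> doesn't fit one card
gpus_needed = -(-model_bytes // A100_MEMORY)      # ceiling division
tokens_per_sec = gpus_needed * A100_BANDWIDTH / model_bytes

print(f"{gpus_needed:.0f} GPUs, ~{tokens_per_sec:.0f} tok/s per replica at batch 1")
```

That works out to a few dozen tokens per second per replica, which is why large batch sizes (amortizing one weight pass over many requests) matter so much more than raw flops for serving cost.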

I know there exist (at least on paper) techniques for making inference way, way cheaper, probably even cheaper than you suggest, with minor loss in quality – just not sure they have deployed them yet.

I just plain disagree, and feel compelled to build a thin client product on top of the API to prove it

Here's some free lunch: take LLaMA-65B and rent some GPUs, it'll be 2x cheaper using all the same tricks and allegedly about as good.

There are many jobs that are mostly low-skill word manipulation

I wonder if we will discover that there are many jobs which are, in fact, net negative for the economy only when people doing them get automated away and their sinecures lose lobbying power.

I wonder if we will discover that there are many jobs which are, in fact, net negative for the economy only when people doing them get automated away and their sinecures lose lobbying power.

Already had been discovered.

At best, about 20% of the adult population is keeping all of us alive and comfy; the rest are engaged in digging holes and filling them again (only in comfy offices). UBI is already here, only in the most wasteful and inefficient form imaginable.

Whoever wants a job not as a last desperate attempt to get money but "to serve mankind" should find employment in water, sewer, power, food production/distribution, etc.

I wonder if we will discover that there are many jobs which are, in fact, net negative for the economy only when people doing them get automated away and their sinecures lose lobbying power.

Interesting - which jobs come to mind? HR and environmental reg agencies jump to mind for me.

HR, environmental regulation, personal banking/financial advising, marketing and fundraising, low-risk medical diagnostics, social work, secretaries, most software operations (devops) teams, data analysis, first-pass website content moderation, (indirectly) higher education Arts departments, and some others I'm sure I'm forgetting.

I think the far more interesting thing is that most of the industries in the crosshairs of AI are majority-female. It's the first example of a technology that will primarily replace women (nearly all technology ever invented has only replaced men), and I think the political effects of that are going to be very, very interesting when you combine it with the fact that a lot of the women currently in those jobs will be single and functionally sterile when they're replaced.

Which reminds me that I should probably buy some Petco stock.

I get a bit confused by that list, tbh.

A lot of these things are either not useless or not going to be automated in the near term. I might despise advertising and people working in advertising as much as the next dude, but you're dreaming if you think AI is going to make it go away.

I shamelessly reposted this to the CW thread if you want to throw this in there. I tend to agree; given that women have social power, maybe UBI will actually happen once women are affected?

I suspect it, too, is a gimped/lossily accelerated version, with the full one to be released under the «DV» brand as part of their Foundry product. I may be wrong, however, and ultimately this PR-driven nomenclature is making less and less sense, they have trained like a dozen major models at this point, and more finetunes. Same story with Google's PaLM.

More importantly, quantity has a quality all its own. DV offers a context window of 32k tokens. If the model can be cheaply run with such enormous contexts, this will more than compensate for some intellectual deficiency in the context-sparse text-prediction mode. You have seen the effects of Stable Diffusion mega-detailed prompting, and of prompt prefixes that amount to a single page – now imagine 20 pages of precise specification, few-shot examples, chain-of-thought induction, explicit caveats for every failure mode discovered in testing; basically a full employee's manual. Writing and testing these megaprompts may become a new job for ex-middle managers who have suddenly been transformed into leaf nodes of the org chart – for a short while.