@dr_analog comments on "OpenAI seminar at work has me thinking of the 'Ai Bubble'"

OpenAI seminar at work has me thinking of the "Ai Bubble"

I recently attended a seminar at work led by OpenAI (whom my company is paying for tools), which was billed as an opportunity to learn more about using AI to do our jobs more effectively. I attended mostly because I assumed there would be some technical discussion of the technology (which was largely absent) and maybe some interesting demos showing how someone used OpenAI’s product to solve technical problems (also absent). Instead, I was treated to a bizarre presentation, which felt strangely paternalistic and maybe even a little desperate? In order of events:

The presentation opened with a discussion of the (impressive) scale of the data centers that OpenAI will be deploying, plus a little bragging about Sora 2 (I promise you none of the scientists or engineers present give a shit about Sora 2).
It proceeded to a gentle haranguing focused on how we should not resist using AI, and how in every organization AI will become more popular as a few high performers learn to use it to get ahead (OK, some demos would be great; OpenAI’s tools have been available for months, so now would be a great time to show how a co-worker has used them to solve a complex problem).
Some discussion about how scientists and engineers tend to be bad at using AI relative to managers, procurement people, executives, lawyers, and others with what I would characterize as paper-pushing roles where accuracy isn’t actually that important.
Which finally devolved into a Q&A. The most charitable questions went something like the following: Hi, I am a $type_of_physical_scientist. I love using your tool to help write Python code, but it is completely worthless for helping me solve any kind of problem that I don’t already understand very well. For example, here is a tomography technique that I am aware of people using in another industry that I am mostly unfamiliar with. Right now, my approach to using this would be to read papers about how it works, try to implement it, and maybe contact some other experts if I can’t figure it out. Wouldn’t it be great if I could just upload the papers about this technique to your bot and have it implement the new technique, saving myself weeks or months of time? But if you try this basic approach you usually end up with something that doesn’t work, and while the bot might be able to give some superficial explanation of the phenomenon, it doesn’t add much to me just doing the background research / implementation myself and comes off as feeling like a waste of time. The response to these questions was usually some variation of: the bot will get better as it scales, and you should be patient with it and make sure that you are prompting it well so that it can lead you to the correct solution.

Which brings me to my primary point: I am someone who has consistently tried to use AI at work in order to be effective, and while it helps somewhat with code creation, it isn’t a particularly useful research tool and doesn’t save me very much time. Apparently my co-workers are having much the same experience.

It really seems to me that OpenAI and their boosters believe (or would have me believe that they believe) that transformers really are all that you need, and that at some point in the near future they will achieve a scale where the system will rapidly go from being able to (actually) help me do my job to being able to comfortably replace me at my job. And the truth is that I just am not seeing it. It also seems like a lot of others aren’t either, with recent warnings from various tech leaders (Sam Altman, for instance; by the way, what possible motive is there for making AI bubble statements, unless it’s an attempt to prevent employees from leaving to found their own startups?).

I have been very inclined to think that this whole industry is in a bubble for months, and now that the mainstream press is picking up on it, it’s making me wonder if I am totally wrong. I’d be interested if others (especially anyone with more actual experience in building these things) can help me understand if I either just suck at using them or if my “vibes” about the current state of the industry are totally incorrect. Or if there is something else going on (i.e., can these things really replace enough customer service or other jobs to justify the infrastructure spending?).


dr_analog top 1% of underdog fetishists 4mo ago · Edited 4mo ago

I use ChatGPT pretty much all day every day but as a replacement for Googling mostly. It's great at pinging a dozen news sources on an issue and giving me more information than I'd get from reading a single article (and it's usually not wrong).

If I have trivial code to write in an unfamiliar framework it's good for that too.

It's also good for teaching me entry level stuff in a new topic faster than anything else.

It's generally better at telling me what's wrong if I paste an error message than anything I'd get from Googling.

And that's about it. And this is awesome, don't get me wrong.

But everything else it kind of sucks at. And not just ChatGPT, but Claude too (including Claude Code).

If I ask for help in a mature codebase it will almost certainly waste my time. Ask it for more subtle plot details of a popular sci-fi book that you just read and you will see how hard it hallucinates.

I would be quite worried about doing science or medicine with it if I can't rapidly verify its information.

It's sort of hard to see this improving very quickly? They've run out of gains from training on all of the internet. Inference costs are increasing exponentially but the gains in intelligence are only increasing logarithmically. You will note that the model that they used to win the Math Olympiad is very much not available to the public. Why? Perhaps because it cost millions in inference to do it.

It sure seems like other architectural breakthroughs are needed to keep scaling, and I don't see those as guaranteed.

Or, as Yannic Kilcher put it, "we have entered the Samsung Galaxy era of LLMs"

faul_sname Fuck around once, find out once. Do it again, now it's science. dr_analog 4mo ago

I've had luck with certain time-consuming rote tasks in medium-large codebases (1M - 10M LOC) like writing good tests for existing legacy code.

Here is some code.
Here are some examples of tests for other code which are well-structured and fit with the house style [style doc]. Note our conventions for how to invoke business logic. Note particularly that we do not mock injected dependencies in functional tests, other than the ones in [this short enumerated list].

1. Identify the parts of the code I just handed you which look sketchiest.

2. Write some functional tests for the code under test, mimicking as closely as possible the style and structure of the canonical examples of good tests.

3. Use this command to run your new test, iterating until the test passes.

4. You can use this tool to identify which lines were tested - try to have passing tests that exercise as many lines of code as practical of the ones you identified as sketchy in step 1.

5. Perform these linting and code quality assessment steps in order, redoing all previous steps on each change.

6. If at any point during this process, you identify a bug in the code you are writing a test for, describe the bug, propose a fix for the bug, and stop working.
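
For anyone who wants to see the "run the test, iterating until it passes" part as code, here is a rough sketch of what such a loop might look like if you drove it yourself. This is not the actual tooling described above: `run_until_pass`, the `revise` callback (a stand-in for whatever edits the test between runs, e.g. the model), and the `max_attempts` cap are all hypothetical names made up for illustration.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_until_pass(
    test_cmd: Sequence[str],
    revise: Callable[[str], None],
    max_attempts: int = 5,
) -> bool:
    """Run test_cmd repeatedly; after each failure, hand the output to revise().

    revise is a hypothetical hook standing in for whatever rewrites the test
    (for example, a model call). Returns True as soon as the command exits
    with status 0, or False once max_attempts runs have all failed.
    """
    for _ in range(max_attempts):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        # Pass the combined output back so the next revision can react to it.
        revise(result.stdout + result.stderr)
    return False


if __name__ == "__main__":
    # Stand-in "test command" that always succeeds, so revise is never called.
    passed = run_until_pass([sys.executable, "-c", "pass"], revise=print)
    print("passed:", passed)
```

The attempt cap is an assumption of the sketch rather than part of the prompt; without something like it, a loop of this kind has no way to give up on a test that never passes.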

It's not doing anything I couldn't have done, it's not even faster than me in terms of wall-clock time to get a good functional test, but I can kick it off in the background while I'm doing other things and come back to some tests that definitely pass and probably even test the stuff I want to test in something approximating the way I want to test it.

fmac Ask me about bike lanes dr_analog 4mo ago

Seconding all of this as incredibly true to my experience. If it wasn't for the references to your job being coding, I'd wonder if this was an alt account of mine that I post on while sleepwalking.
