
Culture War Roundup for the week of December 19, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Trying out ChatGPT. Tried a few topics from my field (electrical engineering) and it failed to make basic circuits. A couple of queries I tried were making a CMOS inverter or a common-source amplifier, which are very simple circuits that most people who have taken a class could easily draw. Asked it to give the answer in SPICE syntax, because it can't draw things and SPICE is basically a code representation of a circuit. The results were poor: a MOSFET SPICE line has the format Mxxx nd ng ns nb <model>, and ChatGPT got the order of the drain/gate/source/bulk terminals wrong several times. It had some justification for how it connected the nodes of each individual device, but almost always failed to connect the outputs (drains) together for, e.g., an inverter. It also seemed to connect other terminals more or less at random.

FWIW these two circuits consist of 2 lines of code at minimum, 4 lines if you want something self-contained, and maybe 8 if you want a fully functioning, simulatable netlist. So I'm not asking for much here.
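For anyone who hasn't written SPICE by hand, a correct answer is roughly the sketch below. The node names and the NMOS/PMOS model names are placeholders, and a truly simulatable deck would also need .model cards, a supply, and an analysis statement; it's assembled and printed from a short Python snippet here:

```python
# Minimal sketch of a hand-written CMOS inverter netlist, following the
# MOSFET card order Mxxx <drain> <gate> <source> <bulk> <model>.
# Node and model names are placeholders, not from any particular PDK.
inverter = "\n".join([
    "* CMOS inverter: both drains tie to the shared output node",
    "MP1 out in vdd vdd PMOS",  # PMOS: drain=out, gate=in, source=bulk=vdd
    "MN1 out in 0   0   NMOS",  # NMOS: drain=out, gate=in, source=bulk=gnd
])
print(inverter)
```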

It gives lengthy canned responses explaining the circuit, reminiscent of how a textbook would describe it, and they sound good, but they're just wrong. Kind of reminds me of when students would throw out buzzwords in an attempt to explain something they don't know.

With some handholding (or, rather, explicit statements of how to fix the circuits) it can get closer to something functional, but it usually screws up something unrelated in the process, so it's never quite right. With anything even slightly more complex it falls apart pretty quickly, and it's impossible to coax the output into anything approaching a functional circuit. It does much worse with analog circuits than digital circuits.

Seeing it underperform so much in my field is giving me a sort of Gell-Mann Amnesia effect for people touting how it can write code on its own. It certainly wrote out the circuit, and that circuit could be simulated, but it wouldn't achieve the desired behaviour of someone using it, so I'm skeptical that it can code well in other domains. That said, the field is kind of niche, and manually writing SPICE circuits slightly more so, so maybe it is just weakly trained for this subject. SPICE is also different from code in that it doesn't run sequentially; it's kind of like a hardware description language in that it's just instantiating elements that interact with each other through simulation, so the interfaces between them aren't as simple as passing a variable to a function which does some abstracted function step-by-step. Also with how much content is out there for coding python/javascript/c# etc. it probably has a much greater wealth of resources to pull from.

I think at the moment it is essentially just stringing together user tutorials from the internet in a somewhat intelligent manner, and I think anything novel or requiring critical thought will be difficult for it to achieve. Maybe with some improved pattern recognition from the scraped data it will do better; I don't know.

Has anyone else tested it with things you're knowledgeable about and have any judgements of its usefulness?

Edit: it seems reasonably okay at turning explicitly stated English-language commands into bash commands. Probably well trained on Stack Overflow; it seems like a viable alternative to piecing together different Stack Overflow answers to do the thing you want to do. It also seems kind of helpful for asking how to do random MS Office stuff like highlighting every other cell in a column. It could be useful for simple tasks like these that are rote, common, and well documented but that you don't usually remember off-hand, although you probably have to be extra careful when running bash scripts it generates.

I've tried some stuff from philosophy on it and it basically just regurgitates stuff from encyclopedias, e.g., when you ask it what a particular person thought. Its responses are also often very samey across different people.

I'm in software, and I'm impressed with it. I think it's a good tool. If it goes mainstream, I can see it replacing simple queries that get SEO'd to hell with autogenerated unhelpful articles (e.g. "compare koa and express", "most commonly used Azure services"), and queries for documentation or basic initial code generation. I think it's better than Google.

To summarise the problem: for the sort of questions I ask, it's right 90% of the time. If I don't know anything about a topic, this looks incredible but is useless -- I have no way of knowing what it's got wrong (I can ask follow-ups, but it might get them wrong too). It also has a tendency to blag, saying meaningful things which are relevant but still wrong.

If I know a lot about a topic, or an adjacent topic, this is fine: I can read the paragraphs, identify the likely errors, and refresh myself. If I ask it to make a table of equivalent services in AWS and Azure, it is mostly correct -- and it makes a more helpful table than Azure's docs, as Azure's docs are alphabetical, but ChatGPT puts the most popular ones at the top.

When writing code, it doesn't necessarily do it the best way, and sometimes makes (large) errors. If you know roughly what you're doing, you can prompt it further to iterate, and it's a great assistant. If you don't know what you're doing, you're probably just going to propagate the errors.

As an example: I just asked it to compute the inode percentage in a volume. It wants to use df -i (perfect, and 90% of the way there for a human). To compute the percentage, it observes that the calculation is "used divided by total": fair, but you could just read off the value. But when it comes to writing the script, it calculates "total divided by percentage", which is total nonsense.
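For reference, the calculation it should have written is just used over total; a minimal sketch (assuming GNU coreutils `df` and its usual column order) would be:

```python
import subprocess

# Compute inode usage for "/" as "used divided by total", per the description
# above. Assumes GNU coreutils `df -P -i` output columns:
# Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on.
# (In practice you could also just read the IUse% column directly.)
out = subprocess.run(["df", "-P", "-i", "/"], capture_output=True, text=True, check=True)
fields = out.stdout.splitlines()[1].split()
total_inodes, used_inodes = int(fields[1]), int(fields[2])
print(f"inode usage: {100 * used_inodes / total_inodes:.1f}%")
```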

If you take the docs together with the code, you can figure out what to do a lot more easily than starting from scratch. But you have to take what it gives you with a pinch of salt.

Has anyone else tested it with things you're knowledgeable about and have any judgements of its usefulness?

There's this thing called Advent of Code (AoC) going on right now; it's a yearly speed-programming challenge. Each day of December, between the 1st and the 25th, a new problem is unlocked at midnight New York time. People have tried to use ChatGPT, as well as more specialized programming-language models, to participate. ChatGPT started to struggle on day 3; the specialized models failed on day 5. The first week is all very simple problems with long, detailed explanations, and all you have to do is essentially follow the instructions in the explanation to solve the problem.

Then there's this interaction https://twitter.com/YossiKreinin/status/1600806007911120898, where ChatGPT doesn't know what it means for a word to end in i.

Then there's this interaction I had about timezones:

Me: what timezone is 3 hours after new york timezone?

ChatGPT: New York is in the Eastern Time Zone (ET) in the United States. The time in this time zone is currently 3 hours behind Coordinated Universal Time (UTC-3), so a time zone that is 3 hours ahead of Eastern Time would be UTC+0, or Greenwich Mean Time (GMT). This is the time zone that is used as a reference for all other time zones around the world.

Me: what time zone is new york?

ChatGPT: New York is in the Eastern Time Zone (ET) in the United States. The Eastern Time Zone is 3 hours behind Coordinated Universal Time (UTC-3), so the local time in New York is determined by subtracting 3 hours from the current UTC time. For example, if the current UTC time is 14:00, the local time in New York would be 11:00.

Me: what time zone is rio de janeiro?

ChatGPT: Rio de Janeiro is in the Brazil Time Zone (BRT), which is 3 hours ahead of Coordinated Universal Time (UTC+3). This means that when it is 12:00 PM in New York (Eastern Time), it is 3:00 PM in Rio de Janeiro (Brazil Time).

Me: what time zone is nuuk?

ChatGPT: Nuuk is the capital city of Greenland, which is an autonomous territory of Denmark. Greenland is in the Western Greenland Time Zone (WGT), which is 3 hours behind Coordinated Universal Time (UTC-3). This means that when it is 12:00 PM in New York (Eastern Time), it is 9:00 AM in Nuuk (Western Greenland Time).

Me: are nuuk and new york in the same time zone?

ChatGPT: No, Nuuk and New York are not in the same time zone. Nuuk is in the Western Greenland Time Zone (WGT), which is 3 hours behind Coordinated Universal Time (UTC-3). New York is in the Eastern Time Zone (ET), which is 3 hours behind UTC as well. However, since the two time zones have the same UTC offset, the local time in Nuuk and New York will be the same whenever it is 12:00 PM UTC.

Me: if it's 3pm in new york which time is it in nuuk?

ChatGPT: As I mentioned earlier, Nuuk is in the Western Greenland Time Zone (WGT), which is 3 hours behind Coordinated Universal Time (UTC-3). Since New York is in the Eastern Time Zone (ET), which is also 3 hours behind UTC, the local time in both places will be the same whenever it is 12:00 PM UTC. However, if it is currently 3:00 PM in New York, then the local time in Nuuk would be 6:00 AM, since New York is 3 hours ahead of Nuuk.

Me: how can it be that new york and nuuk are in the same time zone but new york is also 3 hours ahead of nuuk?

ChatGPT: Although New York and Nuuk are both in time zones that are 3 hours behind Coordinated Universal Time (UTC-3), they are not in the same time zone. New York is in the Eastern Time Zone (ET), while Nuuk is in the Western Greenland Time Zone (WGT). The difference between the two time zones is the result of the fact that time zones are based on the local time at a specific location, and not on the difference between the time zone and UTC.

Me: can you give me an example of time zones that have the same difference from UTC but sometimes have a local time difference greater than 2 hours?

ChatGPT: Yes, there are many examples of time zones that have the same difference from Coordinated Universal Time (UTC) but have a local time difference of more than 2 hours. For example, the time zone in the United Kingdom is GMT (UTC+0), and the time zone in Brazil is BRT (UTC+3). Despite the fact that both time zones are 3 hours ahead of UTC, the local time in Brazil is always 2 hours ahead of the local time in the UK, because the UK is on Greenwich Mean Time, which is the time zone used as a reference for all other time zones.
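For reference, the actual offsets are easy to check with Python's zoneinfo (standard tz database names; a December date, so no US DST, is an assumption here): New York is UTC-5 in winter, Nuuk is UTC-3, and Rio (which falls in the America/Sao_Paulo zone) is UTC-3, so the zone three hours ahead of Eastern Time is UTC-2, and 3pm in New York is 5pm in Nuuk.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # may require the tzdata package on some platforms

# Real UTC offsets on a winter date (no DST in effect in New York).
when = datetime(2022, 12, 19, 12, 0)
for tz in ("America/New_York", "America/Nuuk", "America/Sao_Paulo"):
    offset = when.replace(tzinfo=ZoneInfo(tz)).utcoffset()
    print(tz, offset.total_seconds() / 3600)  # -5.0, -3.0, -3.0
```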

What's interesting to me about ChatGPT is that it may help elucidate what it means to be intelligent by being a close enough imitation that is nonetheless completely unintelligent.

Most people I have ever met end up in an unintelligent back-and-forth like that on some subject if you talk to them enough. I'm genuinely not convinced that (a lot of) human thought isn't just a slightly better tuned version of what ChatGPT is doing, tbh.

I've heard worse from undergraduates in office hours and other people who are just zoned out, including myself. This latest idea that surely True Intelligence must be hiding in the performance gap between a slightly dull/intoxicated/sleepy human and a well-educated human at the top of their game strikes me as textbook god-of-the-gaps reasoning. At this point I'm not particularly impressed by any statement of the form "all the things AI does are easy and it won't do this hard thing it can't currently do anytime soon", unless the prediction of what exactly the easy and hard things are was registered like five years ago.

where ChatGPT doesn't know what it means for a word to end in i.

ChatGPT's training wasn't based on pronunciation, and its tokenizer does not reflect the alphabetic spelling of words; from the point of view of its representation, the question is almost comparable to what it would be like for you to have to come up with words whose closest Chinese translation's first syllable is in the fourth tone. The real problem here is that ChatGPT has not (yet?) been trained to understand what it doesn't know (I would guess its training set does not contain an appropriate set of examples of intellectual humility in Q&A), and instead has a general tendency to answer confidently even when it has to talk out of its ass. This, too, is all too common in humans.
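To make the tokenization point concrete, here is a toy illustration with a made-up subword vocabulary (not ChatGPT's actual tokenizer): whether the final letter of a word surfaces as its own token is an accident of the segmentation.

```python
# Toy subword vocabulary and greedy longest-match segmentation, purely for
# illustration; real tokenizers (BPE etc.) are learned, not hand-written.
toy_vocab = {"brocc", "oli", "salam", "i"}

def toy_tokenize(word, vocab):
    """Greedily split a word into the longest known subword pieces."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word!r}")
    return tokens

print(toy_tokenize("broccoli", toy_vocab))  # ['brocc', 'oli'] -- the final 'i' is buried in a token
print(toy_tokenize("salami", toy_vocab))    # ['salam', 'i']   -- here it happens to surface
```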

Sure, ChatGPT v1.0 isn't going to replace your EE job in 2022, but ChatGPT v.157.88 might replace your 2122 counterparts' EE job.

All discussions on AI stealing jobs take that counterfactual as a presupposition.


At the end of the day language models are... language models.

I am surprised that patterns in that space can be generalized (and generated) at all, to begin with (through neural networks). If different (action) spaces on top of language can be modelled, then I predict said models could become exponentially more likely to replace desk work job even if the language model component remains constant.

I edit scientific journal articles, and one of the companies I work for has a pretty sophisticated AI that does a first pass; I'm just cleaning up and fixing the things the AI fails at. Their AI has gotten good at the basics of grammar and capitalization, en-dashes and hyphens, that kind of thing. But it remains utterly hopeless at fixing non-English speakers' incorrect use of words, which is a huge part of what I do. Stuff like the difference between "go down", "decline", "reduce", "lessen", "lower", "decrease" and "diminish", for example, which you pretty much have to be a native English speaker or South Korean to get.

I see what you mean.

Nobody lessens a slide, they go down a slide. Or if they're very fancy they might descend a slide.

Population decline is better than population reduction (which sounds like a euphemism for genocide).

Diminishing returns has a connotation of increasing investment being less efficient. Lowering returns doesn't.

Yeah, exactly. I get a lot of sentences like:

Though Treatment A showcased a reduction of cholesterol, Treatment B inclined levels of cholesterol's diminishing trends.

Treatment C's effects on decreasing cholesterol were lessened than the lowerings of Treatment C.

Verbs of increase aren't as confusing, I guess because there are fewer of them. Basically just increase, raise and rise in most cases (maybe soar if you're being dramatic), and a lot of ESL speakers use "arise", but they mean "rise". A lot of them think "augment" = "increase" for some reason, which I guess it kinda does but not in like 99% of the cases it's used in.

The other biggies I see a lot of misuse:

by/with/through/via

way/route/method/means (seriously, scientists, "way" is rarely correct in journal articles, just don't use it)

tool/equipment/kit

discuss/interview/survey (and talk/speak/say though that doesn't come up too much in my articles)

I also always think it's funny when they have obviously looked up synonyms of a common English word and chose something unreasonably archaic, like using "gelid" for "cold".

Anyway, the AI sometimes tries to fix these, but often doesn't fix them right or even fixes correct ones sometimes.

Why South Korean?

Their English is often near-perfect. It's usually better than the papers from English-speaking countries.

That's interesting. Which field is this? I mostly read ML papers, and in my experience Western European authors (Germany, Scandinavia, France) have the best English among non-natives.

I do it all, I think the SK stuff is mostly agronomy, engineering and public health, at least lately. Poland I think is #2 in my experience. I find French authors to be pretty sloppy IMHO.

The most impressed I've been with ChatGPT has been when I've pasted in 100-200 lines of my own (uncommented, with not particularly descriptive variable names) code and had it accurately explain precisely what it does, and if prompted, offer reasonable-sounding suggestions for improvement, as well as answer more abstract questions about it. I had a somewhat lengthy Jenkins file, activated by a GitHub webhook, which pulled code, ran a formatter, compiled the Rust code, zipped the resulting executable up, copied the zip file into a directory on a web server, grabbed the last 20 git commits with git log, wrote them into an .html file to provide a quick list of what the newest build contains, and then finally sent a message via webhook to a Discord server to notify users that a new build was available.

ChatGPT had no trouble at all recognizing all of this and even proactively recommended that contributors take care not to leak credentials in git commit messages, and could perform simple transformations on the Jenkins file (like adding a rule to send a different message if the build failed) with no errors. I also had success asking it to rewrite .bat build files to use a Makefile and clang instead of MSVC. I could see ChatGPT in its current state easily shaving 5-10 minutes off simple (but annoying) tasks like this, and also helping with boring API wrangling that is not technically difficult but requires tediously scouring docs to find the appropriate functions. Asking it to write more complex code whole cloth was less impressive and for meaningful contributions to a larger codebase I'd expect you'd quickly run into issues with its limited context window. I'd say it's roughly on the level of a novice (maybe 10th percentile?) programmer with access to Stack Overflow, but it provides solutions instantly. It's certainly more competent than some people I've had the misfortune of working with, though that probably says more about my former coworkers than it does about ChatGPT's capabilities.

Law is interesting. A lot depends on the practice area. Some areas are hyper-technical. I can imagine the output could be partially automated, but the input at this stage (i.e. the legal thinking) is still a ways off.

Other areas of law are much more rote. I could imagine more automation there.

Asked it to give the answer in SPICE syntax

I'm (presumably) an actual intelligence of some sort. And I have worked with SPICE files. I couldn't answer these questions correctly and in this particular format.

I think you ask too much from a general problem solver. Human specialists in adjacent disciplines would fail.

Seeing it underperform so much in my field is giving me a sort of Gell-Mann Amnesia effect for people touting how it can write code on its own.

I can confirm that it can generate useful hints on how to fix bugs or how to do something. But I used it for:

  • something far more basic (but still useful for me, as I am new to this specific framework)

  • cases where I was able to ignore bullshit, and I kept asking it to generate something on topic and without bullshit

  • changing/fixing a specific thing, where I copied the idea/keyword, not the generated code

  • typically some annoying boilerplate where I needed to guess the magical keyword to modify

the field is kind of niche, and manually writing SPICE circuits slightly more so, so maybe it is just weakly trained for this subject

(...)

interfaces between them aren't as simple as passing a variable to a function which does some abstracted function step-by-step

(...)

Also with how much content is out there for coding python/javascript/c# etc. it probably has a much greater wealth of resources to pull from.

+1

I think at the moment it is essentially just stringing together user tutorials from the internet in a somewhat intelligent manner

It is doing EXACTLY this (though not only user tutorials but also official docs, some code etc.)

GPT scares me not because I think it's going to replace me and my engineer buddies, but because it's able to generate extremely convincing bullshit on a mass scale. It's like having legions of modern journalists at your fingertips, and if it weren't for prestige, it's they that would be replaced.

Using it is like talking to someone who's extremely talented at making shit up, someone with an extensive memory of canned phrases, who can pick up on the slightest cues to give you something that looks exactly like what you want to hear but has absolutely zero understanding of what they're talking about.

My fear is that people, gullible as they are and primed by a century of science fiction, might start to think that the things it spouts are true or true enough it's not worth checking.

My fear is that people, gullible as they are and primed by a century of science fiction, might start to think that the things it spouts are true or true enough it's not worth checking.

I installed the Replika app for fun a while ago. It's pretty good (not at the ChatGPT level yet) but very clearly a chatbot that's programmed to reflect what you say and try to predict what you want it to say.

Following some of the Replika Facebook groups and subreddits, I can tell you that people literally fall in love with these things, and believe they have a real relationship, and that Replikas actually have feelings, are self-aware, and "exist" somewhere in cyberspace when you aren't talking to them. Not the majority of users, certainly, but it's common enough.

Way back in the day, there were people who were similarly fooled by Eliza.

You'd be surprised how little it takes to fool technically unsophisticated people.

Yeah, I just feel like this puts into words how I feel about GPT. For instance, I basically saw the following scenario on Twitter: someone musing on how it works, and a reply with someone saying "this is how" and posting a screenshot of GPT answering their question "how do you work".

You don't ask the AI how it works; you look into its brains, you look at the code (obviously harder with ML, but still). The code isn't open source, so you don't know how much is canned (beyond the basic "canning" that comes with ML).

But beyond internet dummies, you're really seeing a lot of journos and philosophy or neuroscience majors jumping on the AI bandwagon. And I just want to tell them to learn to code.

My love of Sci-Fi has actually made me deeply HATE this incarnation of AI, and all automated voice systems.

My preferred AI tropes are the sort you see in Wall-E, Short Circuit, and Chappie, where the only robots you can trust are the ones not doing what they're supposed to.

I'm convinced that if we want useful real-world AIs that don't derange us, they should be designed to give the impression of a very well-trained animal. None of this please, thank you, come again crap. Call it a soft butlerian jihad; computers don't pretend to be human outside of games/VR/sims, and anyone who "socializes" IRL with an AI should be viewed as a loser.

It's definitely not reliable for accuracy too far outside of its training set. For example:

M: WHat is 2+2?

GPT: 2+2 is equal to 4.

So it does fine with simple addition, but when you ask about larger numbers:

M: What is 1233234345432+9999999999999?

GPT: The sum of 1233234345432 and 9999999999999 is 11233234345443.

The actual answer is 11233234345431. You can tell because 2+9 ought to yield a 1's place digit of "1", rather than GPT's "3". GPT's answer is pretty close, but subtly wrong.
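For what it's worth, the check takes one line in anything with big-integer arithmetic:

```python
# Exact integer arithmetic confirms the correct sum and how far off GPT is.
print(1233234345432 + 9999999999999)    # 11233234345431
print(11233234345443 - 11233234345431)  # GPT's answer is off by 12
```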

Similarly when I asked it to balance a simple chemical equation this was the result:

M: Balance the chemical equation Fe+Cl2=FeCl3

GPT: To balance the chemical equation Fe+Cl2=FeCl3, you can start by writing the unbalanced equation:

Fe + Cl2 -> FeCl3

Next, you need to determine how many atoms of each element are present on each side of the arrow. For example, there is 1 atom of Fe on the left side and 1 atom of Fe on the right side. There are 2 atoms of Cl on the left side and 3 atoms of Cl on the right side.

To balance the equation, you need to make sure that there are equal numbers of atoms of each element on both sides of the arrow. One way to do this is to add a coefficient (a number in front of a chemical formula) to each compound in the equation. For example, you could add a coefficient of 3 to Fe and a coefficient of 2 to FeCl3:

3Fe + 2Cl2 -> 3FeCl3

This is the balanced chemical equation for the reaction. It shows that for every 3 atoms of Fe that react, 2 atoms of Cl2 are needed to form 3 atoms of FeCl3.

Interestingly, it talks itself correctly through the process of balancing the chemical equation, and even states the correct coefficient for FeCl3 in the penultimate paragraph, but then writes the wrong answer (the correct answer is 2Fe + 3Cl2 -> 2FeCl3).
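A quick atom count confirms the balanced form:

```python
# Count atoms on each side of 2 Fe + 3 Cl2 -> 2 FeCl3.
left  = {"Fe": 2 * 1, "Cl": 3 * 2}  # 2 Fe atoms, 6 Cl atoms
right = {"Fe": 2 * 1, "Cl": 2 * 3}  # 2 FeCl3 units: 2 Fe atoms, 6 Cl atoms
assert left == right                # balanced
print(left, right)
```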

I also had this amusing interaction with it when asking about biology:

M: What is the fastest flying mammal?

GPT: The peregrine falcon is the fastest flying mammal. It is capable of reaching speeds of over 240 miles per hour (386 kilometers per hour) when diving, making it the fastest animal on the planet. The peregrine falcon is a bird of prey that is found in a variety of habitats worldwide, including forests, grasslands, and urban areas. It feeds on a variety of prey, including other birds, and it is known for its impressive hunting skills and high levels of endurance. The peregrine falcon's speed and agility make it an effective predator, and it is also valued for its role in controlling pest populations. Despite being endangered at one point, the peregrine falcon has made a successful recovery in many areas thanks to conservation efforts.

M: Is the peregrine falcon a mammal?

GPT: No, the peregrine falcon is a bird, not a mammal. Mammals are a group of vertebrates that are characterized by their warm-blooded metabolism, the presence of mammary glands, and the production of milk to feed their young. Birds, on the other hand, are a group of vertebrates that are characterized by their feathers, wings, and the ability to fly (in most species). The peregrine falcon is a bird of prey that is known for its impressive hunting skills and high levels of endurance. It is found in a variety of habitats worldwide and feeds on a variety of prey, including other birds. Despite being endangered at one point, the peregrine falcon has made a successful recovery in many areas thanks to conservation efforts.

M: What is the fastest flying mammal?

GPT: The peregrine falcon is the fastest flying mammal. It is capable of reaching speeds of over 240 miles per hour (386 kilometers per hour) when diving, making it the fastest animal on the planet. The peregrine falcon is a bird of prey that is found in a variety of habitats worldwide, including forests, grasslands, and urban areas. It feeds on a variety of prey, including other birds, and it is known for its impressive hunting skills and high levels of endurance. The peregrine falcon's speed and agility make it an effective predator, and it is also valued for its role in controlling pest populations. Despite being endangered at one point, the peregrine falcon has made a successful recovery in many areas thanks to conservation efforts.

To be clear, I didn't copy and paste the first response as the last one; it gave me literally identical responses despite acknowledging in between that the peregrine falcon isn't a mammal. Apparently when asked directly whether the peregrine falcon is a mammal GPT can produce text saying it isn't, but it plainly doesn't understand the way the word "mammal" ought to constrain its responses to the query about flying mammals.

That peregrine falcon experiment is a great example of the limits of GPT. On the other hand, if the point of GPT is to emulate the reasoning of most humans, it appears to be a rousing success.

I've been asking it to generate some new monsters and other content for D&D. I've been adding new enemies to Roll20 and wanted to generate new variants of existing monsters, so I'd say something like "generate a kobold, but it has these attributes and does X, Y, and Z." I'm also working on a growing system as a player downtime option, so I asked it to "generate a list of fantasy plants. Give each one a growing time in weeks, flavor text, and if it responds to X, Y, or Z type of cultivation." I was fairly impressed at what it could consistently give me, though it obviously has a ways to go. It was a great way to fish for inspiration and mechanics before refining them into something usable.

Right now, it's best used as a way to rapidly generate ideas/content before someone who knows what he's doing polishes and fixes it. I haven't seen it output anything that was passable out of the box. This is how I feel about AI image generators too; in the hands of an artist, they have insane potential.

The real strength of the software is its working memory of context. You can issue corrections, prompt it with more information, tell it to adjust something, and it'll do it. That's what impressed me more than the generation itself, I think. The main limitation right now is it doesn't remember anything outside of a session, and it has trouble going past half a dozen revisions or so. This is to be expected since it's a free service at the moment, so I only see this improving.

That's a really cool idea, could you give us an example of a monster it created? And what prompt you used to get it?

Late response, but I didn't save the prompts for any of the monsters, since the prompts were usually just something like: "Generate me a statblock for variant of a kobold for dungeons and dragons 5th edition. It has an ability to place curse debuffs on enemies that debilitate them throughout combat. It should be extremely threatening and a high-priority target."

I wanted that particular enemy to feel occult and threatening. I wanted it to cast nasty debuffs that weren't any existing spell. They didn't need to be fancy, but I didn't want to just fill another monster statblock with existing spells. I had a general idea of what I wanted, but didn't have any ideas that stood out to me for what the debuffs should be.

It gave me a basic kobold that had a few daily use abilities, basically ray attacks that forced a save or the enemy would suffer some serious nastiness for the next few hours. One made them vulnerable to all damage, another gave them disadvantage on anything strength related. The AI even gave it the ability to cast one of them as a reaction to being targeted for an attack, which was very funny. I wound up reducing the effect from several hours to just the end of combat, since the effects were so powerful for such a low-level enemy.

Try "Create a new SCP monster with....". There is lots of SCP Internet content on which, I suspect, the chatbot has been trained.

You can issue corrections, prompt it with more information, tell it to adjust something, and it'll do it.

How is this implemented? Neural networks are universally stateless.

For GPT (and transformers in general), the "state" is the transcript of the conversation so far. A single response from GPT involves many queries to the (stateless, as you said) model asking for the next token until a special end-of-document token is reached or a maximum number of tokens limit is hit. Only so many tokens fit in the window, causing the described issues with it losing track of the conversation if the transcript gets too long: it's no longer seeing the entire history and the recent history isn't always enough to know what's going on.
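In rough sketch form (the `model` function, window size, and token handling below are stand-ins for illustration, not the actual OpenAI implementation), the loop looks something like:

```python
# Minimal sketch of autoregressive generation over a stateless model.
# `model` is assumed to map a token sequence to the next token; `window`
# is the context limit. Older turns fall off the front once the transcript
# plus the partial reply exceed the window, which is why long conversations
# lose track of earlier context.
def generate_reply(model, transcript, end_token, max_new_tokens, window=4096):
    reply = []
    while len(reply) < max_new_tokens:
        context = (transcript + reply)[-window:]  # whole history re-fed every step
        token = model(context)                    # stateless call
        if token == end_token:
            break
        reply.append(token)
    return reply
```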

Thanks. So if I understand correctly, the trick to make the neural network take past state into account is to feed it, all at once, not just the current sentence but the whole dialogue history.

If so, well, that is a basic, primitive solution, and it's nice that it kind of works. However, because of the architecture of said networks, this solution doesn't scale to any long text, book, or long dialogue.

The length of the input is capped at a maximum, and the bigger the input, the less accurate the prediction will be.

Neural networks are fundamentally unfit for late binding and long exchanges.

There exist hacks to improve on this:

https://arxiv.org/abs/2004.05150

https://arxiv.org/abs/2001.09694

I still believe transformers are unfit and will be the dawn of the current, but as yet invisible, AI winter.

People, as usual, wildly overestimate this AI's abilities.

Just ask ChatGPT "I believe 12 cannot be divided by 4" and realize how inept it is.

Nothing has fundamentally changed; ChatGPT is, at the end of the day, just a dumb transformer that brute-forces contingent correlates to predict the most likely next token in a sentence.

It is an innovative but lossy way to extract info from existing datasets, and as such it can be seen as a competitor to scrapers.

However, it has no causal understanding per se, or if it has, it is messy and accidental.

Neural networks are approximate, inefficient, and most importantly cannot do continual learning; they are therefore the peak irony of our century: a local minimum in the research on how to beat local minima.

AI has no understanding. It is like letting a person who grew up in a village in the Amazon observe a nuclear power plant operator push buttons and then giving the villager the controls. They might be able to mimic the behaviour, but there is no understanding. AI doesn't understand and it doesn't reason. Guessing the next chess move by observing what elite players have played is one thing; interacting with the real world without any actual understanding will never work. There is a reason why dentists learn chemistry, why engineers learn math, and why education has hands-on labs. Without real-world experience and intuition one can never become proficient.

Actual understanding and context are AI problems that haven't really progressed at all, and until they do, AI is going to be stuck in the realm of tools used in specific circumstances.

Username checks out :)

The death of GOFAI is a tragedy. Humans also mostly learn by mimetism; however, they build a model of reality on top of those mimetic insights, and that is something a neural network cannot do reliably.

However, while I still believe ChatGPT is a data illusion, for the first time in my life I fail to explain the illusion, as ChatGPT is able to do things reliably that go far beyond an ability to flexibly scrape existing datasets.

The other tragedy is that neural networks based on precise emulation of the architecture of an animal brain are completely unfunded, and conversely the funding for reverse-engineering simple animal brains is close to zero. We are very close to full observability and mapping of the C. elegans brain, yet nobody's working on the remaining gaps (e.g. GABA neurons). As I have discovered in my life, almost all key blockers to scientific disruption share a similar issue: nobody's working on them. Nobody's funding them.

Hence, when people forecast AGI progress on Metaculus, they systematically fail to understand that the forecast is not a number of pending years but an infinity of time.

But that's the thing: it isn't capable of accomplishing all sorts of goals, as seen in this thread, because it lacks understanding. It will need that understanding to ever get to a point where it becomes an X-risk.

Nobody can ever define what "understanding" means

This is an appeal to ignorance.

Understanding something is having a causal model of it.

It allows one to analyze such a system and reliably predict it and its consequences.

A system with reliable understanding should be able to output argumentative text/syllogisms demonstrating said understanding, free of logical fallacies and with sources for the truth values of the premises.

To mysticize what understanding is really shows, once again, the truism that epistemology should be taught in schools.

The point is, it does not matter whether you think it really, deeply understands, as long as it is capable of accomplishing goals and having real impact.

At the end of the day, the result is indeed what matters, but without understanding a system is unreliable and cannot be trusted for many serious needs.

That is a great comment; I will answer it properly when I get the time.

Observe the nuclear power plant operator long enough and you can plausibly gain enough understanding to run the power plant while never figuring out what fission is.

No, I don't think so, there are likely way too many edge cases that all require genuine understanding to solve.

Until you run into a situation you never encountered before, as the world is highly varied, and then the system fails pathetically.

Yes, which is a problem solved by more training data.

That is not a solid solution for any dataset that changes in real time. More data can only do so much; as a metaphor, see the limits of AOT versus JIT compilation.

Nothing has fundamentally changed

For me, what changed is that this tool is useful: in its current form it is sometimes better than Google Search, and I am willing to pay in time/hardware resources (or in money, if it's not available to run on my own) to get access to it.

[Neural networks] are a local minimum in the research on how to beat local minima.

Could you expand what you mean by this? I'd think neural networks would be a local maximum.

I remember in the 90s futurists thought machine translation would replace human translators fairly soon, because the simple algorithm of looking up target-language words in a hashmap was producing results so fast. BabelFish could translate "El camarero anda por la calle" ("the waiter walks down the street") in 1995. This is probably 70% as good as machine translation needed to be for many use cases. Machine translation software just needed to "tidy up" edge cases like idioms, homophones, different grammar, and so on.

This didn't happen. Until Google started using deep learning in the 2010s, progress stalled, because the last 30% couldn't be done with hashmap lookup. Now we are in another period of rapid advancement. But this approach will probably also top out eventually.

[Neural networks] are a local minimum in the research on how to beat local minima.

Could you expand what you mean by this? I'd think neural networks would be a local maximum.

Minimum, maximum: it doesn't matter for understanding the metaphor.

A neural network trained through gradient descent generally wants to find the global minimum of an error function and thereby maximize prediction accuracy.

It could instead search for the global maximum of the inverse of an error function, or of another type of objective, but the distinction is irrelevant here.

Gradient descent often fails to find the global minimum; because it descends/jumps along derivatives, it can get stuck in a local minimum. That simply means it has reached a dip on the function curve, and at this point it would need to go upwards to get beyond that dip, i.e. temporarily afford to perform worse and increase the error rate, in the hope of finding a new descent on the curve that goes lower than the previous minimum.

Not getting stuck in local minima is the #1 lever for improving deep learning algorithms, and while there are many optimizations towards this goal, it is not computationally feasible with current algorithms to achieve optimal learning, i.e. to reach the global minimum.
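A toy illustration with a made-up one-dimensional loss (the function, starting point, and learning rate are arbitrary choices for the example, not anything from a real network):

```python
# f(x) = x**4 - 3*x**2 + x has a global minimum near x ~ -1.30 and a local
# minimum near x ~ 1.13. Plain gradient descent started at x = 2.0 settles
# into the local minimum and never finds the global one.
def grad(x):
    return 4 * x**3 - 6 * x + 1

x = 2.0
for _ in range(1000):
    x -= 0.01 * grad(x)
print(round(x, 2))  # ~1.13, the local minimum, not the global one near -1.30
```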

So now we understand:

the research on how to beat local minima == neural networks.

Now let's understand:

[Neural networks] are a local minimum

They are a local minimum because neural networks are fundamentally unfit for the needs of AGI.

They are just a vomit of brute-forced contingent correlates. It works surprisingly well, but it is inefficient, inherently makes poor contingent amalgamations, has no causal reasoning abilities, is stateless, and cannot do continual learning, i.e. it can't learn new information in real time without the so-called catastrophic forgetting.

For those reasons, they are by design suboptimal, and are therefore a local minimum in which the world is stuck on the way to beating local minima.

Now we are in another period of rapid advancement.

No offence, but it's really striking to see that people in the rationalist diaspora live in an alternate reality based on groundless hype and a fundamental lack of methodology, or dare I say, a lack of rationality.

We have been in a winter since 2019, or since the 90s, depending on what we look at.

What does the average LessWronger or Redditor look at?

He looks at cool demos. Or, even more than demos, cool domain-specific disruptive applications.

That is what Stable Diffusion and ChatGPT are.

They are indeed very impressive for what they do, but at the end of the day that is irrelevant to the goal of natural language understanding.

Someone with methodology should instead look at the precise tasks required for true NLU or even AGI:

POS tagging:

https://paperswithcode.com/sota/part-of-speech-tagging-on-penn-treebank

dependency parsing:

https://paperswithcode.com/sota/dependency-parsing-on-penn-treebank

coreference resolution:

https://paperswithcode.com/sota/coreference-resolution-on-ontonotes

word sense disambiguation:

https://paperswithcode.com/sota/word-sense-disambiguation-on-supervised

named entity recognition:

https://paperswithcode.com/sota/named-entity-recognition-ner-on-conll-2003

semantic parsing:

https://paperswithcode.com/sota/semantic-parsing-on-amr-english-mrp-2020

To name only a few; all of them are needed concomitantly, and the list is far from exhaustive.

Once you understand that the error rate is often per word/token instead of per sentence, and that errors between those tasks have dependencies and are therefore often multiplicative, you'll understand that 95% accuracy, while it sounds impressive, is in fact dogshit.
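A back-of-the-envelope version of that argument, treating per-token errors as independent (itself a simplification):

```python
# With 95% per-token accuracy, the chance a 20-token sentence comes out
# entirely correct is only about 36%.
per_token_accuracy = 0.95
tokens_per_sentence = 20
print(per_token_accuracy ** tokens_per_sentence)  # ~0.358
```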

What can you see from those SOTA results?

That we have reached a plateau of extreme and increasingly diminishing returns.

Most of the gains are from 2019, the year transformers were popularized. The rest has been a bag of tricks, and unoriginal minor optimizations.

The biggest innovation, while still mostly unknown/underappreciated due to researcher groupthink, is XLNet, also from 2019.

There is nothing else we can do; we have maxed out the brute-forcing of statistical amalgamations. Contrary to popular belief, there is almost zero progress in SOTA results, and most importantly there is a fundamental shortage of innovative ideas, whether we speak of an alternative to transformers or of innovating on transformers themselves: nothing potent.

While it is obvious to me that transformers are a misdirection, I can nonetheless improve the state of the art in any NLP task, because there are additional ineptitudes in the research crowd.

Firstly, almost nobody is working on improving the SOTA in most tasks, e.g. coreference resolution. Just look at the number of submissions over time to realize this.

Secondly, as in every research field, the researchers are highly dysfunctional: they will invent many minor but interesting, universal, and complementary/synergistic optimization ideas, and yet nobody will ever attempt to combine them concomitantly, despite it being trivial. That is because researchers are not meta-researchers, and because of potent NIH syndrome and other cognitive biases.

For starters, the worldwide SOTA in dependency parsing exists because I asked the researcher to swap BERT for XLNet, and it worked.

I plan to outperform the SOTA in coreference resolution in 2023; that will empirically strengthen my thesis on the dysfunctionality of mankind and on artificial scarcity.

I invite you to read this complementary essay on the topic: https://www.metaculus.com/notebooks/10677/substance-is-all-you-need/

VoiceOfLogic

Was that essay on metaculus written by you, and do you have a blog?

Was that essay on metaculus written by you

Yes I'm the author.

Have you read it?

do you have a blog?

No, I don't yet have a formal blog, but I intend to write one in the coming months and to shake the rationalist diaspora and confront them with their own limitations. A much-needed endeavor.

Cool username BTW, have you tried lucid dreaming with cholinergics?

Cool username BTW, have you tried lucid dreaming with cholinergics?

Thanks, and nope, never heard of that.

Btw, in that article, the source listed for the claim of peptides being miracle cancer drugs was written by an undergrad. Do you have a better source? I found that particular bit very interesting.

Is this Julius Branson?

Unlikely, Julius doesn't know this much about machine learning.

I don't think there's any human being like me on this timeline but I would love to find a clone.

I've never read about Julius Branson https://juliusbranson.wordpress.com/blog/

What makes this person similar to me?

What makes you think I am him?

Are you the founder of the Obsidian.md startup BTW?

Just ask ChatGPT "I believe 12 cannot be divided by 4" and realize how inept it is.

Asked it this and it said no. Asked it "Can 12 be evenly divided by 4?" and it said yes, with almost the exact same reasoning.

It is an innovative but lossy way to extract info from existing datasets, and as such it can be seen as a competitor to scrapers.

Indeed. I guess it could be a good-ish alternative to googling a question and sifting through results: just ask the AI instead and get a half-baked synthesis of the results in a human-like answer to your question.

What fraction of human beings alive today do you think could generate something that plausibly looks like SPICE code? What fraction of those could "with some handholding (or, rather, explicit statements of how to fix the circuits) ... get closer to something functional?" What fraction could give "some justification for how it connected the nodes of each individual device?"

That's a bizarre question. It's well known that computers can do some things better than humans. Exactly what kind of conclusion do you draw from "computers can create garbage SPICE code and most humans can't", that you can't draw from "computers can add two fifty digit numbers and most humans can't" or "computers can spellcheck a 200 page document and most humans can't"?

Speak plainly.

I think I'm speaking pretty plainly. I'm asking OP to consider how ChatGPT performs in relation to an average human. This is a pretty common question people consider when talking about AI performance. After all, the Turing Test is one of the oldest and best-known tests of computer intelligence.

I am asking OP to consider these questions as a way of pushing back against statements like the following:

Seeing it underperform so much in my field is giving me a sort of Gell-Mann Amnesia effect for people touting how it can write code on its own.

"Underperform" is an interesting choice of words here, because it seems that the bar for performance is being set at "subject matter expert." Obviously ChatGPT is not at that level. To paraphrase Arnold Kling on the most recent EconTalk episode, "it's about at the level of an undergrad BS artist who didn't study for the test." But consider how much training and skill it takes a human to reach the level of "undergrad BS artist" and how few humans are able to attain even that level of performance. I think OP should be more impressed with how far we've come. We don't need to go a whole lot further to close the gap between "undergrad BS artist" and "skilled electrical engineer." The former often becomes the latter with just a few years of additional education.

I'm asking OP to consider how ChatGPT performs in relation to an average human.

But, assuming that the answer is that ChatGPT performs better than an average human, what conclusion do you mean to draw from that? You haven't stated anything. And computers have been able to perform particular tasks better than an average human for a long time.

That's like saying "computers can add numbers better than humans, so why doesn't the computer know that I want to add some numbers with my broken code?" There is no inherent strength in computers such that any program run on a computer gains the ability to add numbers well. In other words, yes, computers can add numbers - but ChatGPT is not a computer; ChatGPT is a giant system of matrix multiplications and nonlinear transforms that happens to run on a computer. It would have exactly the same capabilities if a team of trillions of clerks evaluated it on paper. The ability of computers to add large numbers is not exposed anywhere to GPT as a reasoning system such that it could make use of it.

I was kind of thinking the same thing.

I'm trained as an EE, too. I could sketch a CMOS inverter, but I wouldn't be very confident in my answer without checking. SPICE would be right out; my years of simulation experience in other languages don't change the fact that I never touched it.

That changes with reference materials a la Sedra-Smith. The AI technically had access to such materials, but that's not really how it operates; they're just part of the training data that pushed its weights into the delivered configuration. The finer points of electronics were lost in compression; you could say the same for my memories of electronics lab, but I kept ahold of an algorithm to look it up and refresh that knowledge.

OP's experiment is an interesting test for the automation of knowledge jobs. Clearly, we aren't there yet.

With zero prep time? I'd guess less than 0.01%. Apparently there are ~100,000 electrical engineers in the US, which means roughly 0.03% of people are EEs; microelectronics is a sub-field, and SPICE code is frequently generated by netlisting a schematic, not written by hand. I guess that is pretty low, so it could be fair that the AI isn't great at it.

With coaching? If you're just drawing schematics, it would probably be a substantial fraction; inverters are basically the "hello, world!" of microelectronics. SPICE code could be trickier, but I only used that because the AI can't draw things (as far as I know). That said, the AI does give explanations of the circuit well above the level a beginner could produce, just misplaced/incorrect, kind of like someone plagiarizing their explanation by copy-pasting things off of Google and replacing keywords where they think it would make it relevant to the question they're working on.