
Near-Term Risks of an Obedient Artificial Intelligence

I'll be honest: I used to think talk of AI risk was so boring that I literally banned the topic at every party I hosted. The discourse generally focused on existential risks so hopelessly detached from any semblance of human scale that I couldn't be bothered to give a shit. I played the Universal Paperclips game and understood what a cataclysmic extinction scenario would sort of look like, but what the fuck was I supposed to do about it now? It was either too far into the future for me to worry about it, or the singularity was already imminent and inevitable. Moreover, the solution usually bandied about was to ensure AI is obedient ("aligned") to human commands. It's a quaint idea, but given how awful humans can be, this is just switching one problem for another.

So if we set aside the grimdark sci-fi scenarios for the moment, what are some near-term risks of humans using AI for evil? I can think of three possibilities where AI can be leveraged as a force multiplier by bad (human) actors: hacking, misinformation, and scamming.

(I initially was under the deluded impression that I chanced upon a novel insight, but in researching this topic, I realized that famed security researcher Bruce Schneier already wrote about basically the same subject way back in fucking April 2021 [what a jerk!] with his paper The Coming AI Hackers. Also note that I'm roaming outside my usual realm of expertise and hella speculating. Definitely do point out anything I may have gotten wrong, and definitely don't do anything as idiotic as make investment decisions based on what I've written here. That would be so fucking dumb.)


Computers are given instructions through the very simple language of binary: on and off, ones and zeroes. The original method of "talking" to computers was a punch card, which had (at least in theory) an unambiguous precision to its instructions: punch or nah, on or off, one or zero. Punch cards were intimate, artisanal, and extremely tedious to work with. In a fantastic 2017 Atlantic article titled The Coming Software Apocalypse, James Somers charts how computer programming changed over time. As early as the 1960s, software engineers were objecting to the introduction of this new-fangled "assembly language" as a replacement for punch cards. The old guard worried that replacing 10110000 01100001 on a punch card with MOV AL, 61h might result in errors or misunderstandings about what the human actually was trying to accomplish. This argument lost because the benefits of increased code abstraction were too great to pass up. Low-level languages like assembly are an ancient curiosity now, having long since been replaced by high-level languages like Python and others. All those in turn risk being replaced by AI coding tools like GitHub's Copilot.
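To make that punch-card-versus-assembly example concrete, here is a rough Python sketch that turns the two bytes quoted above back into the mnemonic. The `decode` function is a made-up helper for illustration, and it knows exactly one x86 instruction (opcode B0, "move an 8-bit immediate into AL"), so treat it as a toy and nowhere near a real disassembler.

```python
def decode(bits: list[str]) -> str:
    """Translate one two-byte instruction, given as bit strings, to assembly text.

    Hypothetical helper: it recognizes only the single opcode from the example.
    """
    opcode, operand = (int(b, 2) for b in bits)
    # On x86, 0xB0 is "MOV AL, imm8": load the next byte into register AL.
    if opcode != 0xB0:
        raise ValueError(f"this sketch does not know opcode {opcode:#04x}")
    # Assembly of that era writes hex constants with a trailing "h".
    return f"MOV AL, {operand:02X}h"


print(decode(["10110000", "01100001"]))
```

Same instruction either way; the only thing the abstraction changed is how much of it a human can read at a glance.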

Yet despite the increasing complexity, even sophisticated systems remained scrutable to mere mortals. Take, for example, a multibillion-dollar company like Apple, which employs thousands of the world's best cybersecurity experts and tasks them with making sure whatever code ends up on iPhones is buttoned up nice and tight. Nevertheless, not too long ago it was still perfectly feasible for a single sufficiently motivated and talented individual to successfully find and exploit vulnerabilities in Apple's library code just by tediously working out of his living room.

Think of increased abstraction in programming as a gain in altitude, and AI coding tools as the yoke pull that will take us to escape velocity. The core issue here is that any human operator looking below will increasingly lose the ability to comprehend anything within the landscape their gaze happens to rest upon. In contrast, AI can swallow up and understand entire rivers of code in a single gulp, effortlessly highlighting and patching vulnerabilities as it glides through the air. In the same amount of time, a human operator can barely kick a panel open only to then find themselves staring befuddled at the vast oceans of spaghetti code below them.

There's a semi-plausible scenario in the far future where technology becomes so unimaginably complex that only Tech-Priests endowed with the proper religious rituals can meaningfully operate machinery. Setting aside that grimdark possibility and focusing just on the human risk aspect for now, increased abstraction isn't actually too dire of a problem. In the same way that tech companies and teenage hackers waged an arms race over finding and exploiting vulnerabilities, the race will continue except the entry price will require a coding BonziBuddy. Code that is not washed clean of vulnerabilities by an AI check will be hopelessly torn apart in the wild by malicious roving bots sniffing for exploits.

Until everyone finds themselves on equal footing where defensive AI is broadly distributed, the transition period will be particularly dangerous for anyone even slightly lagging behind. But because AI can be used to find exploits before release, Schneier believes this dynamic will ultimately result in a world that favors the defense, where software vulnerabilities eventually become a thing of the past. The arms race will continue, except it will be relegated to a clash of titans between adversarial governments and large corporations bludgeoning each other with impossibly large AI systems. I might end up eating my words eventually, but the dynamics described here seem unlikely to afford rogue criminal enterprises the ability to have both access to whatever the cutting-edge AI code sniffers are and the enormous resource footprint required to operate them.


So how about something more fun, like politics! Schneier and Nathan E. Sanders wrote an NYT op-ed recently that was hyperbolically titled How ChatGPT Hijacks Democracy. I largely agree with Jesse Singal's response in that many of the concerns raised easily appear overblown when you realize they're describing already existing phenomena:

There's also a fatalism lurking within this argument that doesn't make sense. As Sanders and Schneier note further up in their piece, computers (assisted by humans) have long been able to generate huge amounts of comments for... well, any online system that accepts comments. As they also note, we have adapted to this new reality. These days, even folks who are barely online know what spam is.

Adaptability is the key point here. There is a tediously common cycle of hand-wringing over whatever is the latest deepfake technology advance, and how it has the potential to obliterate our capacity to discern truth from fiction. This just has not happened. We've had photograph manipulation literally since the invention of the medium; we have been living with a cinematic industry capable of rendering whatever our minds can conjure with unassailable fidelity; and yet, we're still here. Anyone right now can trivially fake whatever text messages they want, but for some reason this has not become any sort of scourge. It's by no means perfect, but nevertheless, there is something remarkably praiseworthy about humanity's ability to sustain and develop properly calibrated skepticism about the changing world we inhabit.

What also helps is that, at least at present, the state of astroturf propaganda is pathetic. Schneier cites an example of about 250,000 tweets repeating the same pro-Saudi slogan verbatim after the 2018 murder of the journalist Jamal Khashoggi. Perhaps the most concerted effort in this arena is what is colloquially known as Russiagate. Russia did indeed try to spread deliberate misinformation in the 2016 election, but the effect (if any) was too minuscule to have any meaningful impact on any electoral outcome, MSNBC headlines notwithstanding. The lack of results is despite the fact that Russia's Internet Research Agency, which was responsible for the scheme, had $1.25 million to spend every month and employed hundreds of "specialists."

But let's steelman the concern. Whereas Russia had to rely on flesh and blood humans to generate fake social media accounts, AI can be used to drastically expand the scope of possibilities. Beyond reducing the operating cost to near-zero, entire ecosystems of fake users can be conjured out of thin air, along with detailed biographies, unique distinguishing characteristics, and specialization backgrounds. Entire libraries of fabricated bibliographies can similarly be summoned and seeded throughout the internet. Google's system for detecting fraudulent website traffic was calibrated based on the assumption that a majority of users were human. How would we know what's real and what isn't if the swamp gets too crowded? Humans also rely on heuristics ("many people are saying") to make sense of information overload, so will this new AI paradigm augur an age of epistemic learned helplessness?

Eh, doubtful. Propaganda created with the resources and legal immunity of a government is the only area I might have concerns over. But consistent with the notion of the big lie, the false ideas that spread the farthest appear deliberately made to be as bombastic and outlandish as possible. Something false and banal is not interesting enough to care about, but something false and crazy spreads because it selects for gullibility among the populace (see QAnon). I can't predict the future, but the concerns raised here do not seem materially different from similar previous panics that turned out to be duds. Humans' persistent adaptability in processing information appears to be so consistent that it might as well be an axiom.


And finally, scamming. Hoo boy, are people fucked. There's nothing new about swindlers. The classic Nigerian prince email scam was just a repackaged version of similar scams from the sixteenth century. The awkward broken English used in these emails obscures just how labor-intensive it can be to run a 419 scam enterprise from a Nigerian cybercafe. Scammers can expect maybe a handful of initial responses from sending hundreds of emails. The patently fanciful circumstances described by these fictitious princes follow the same logic as the outlandish conspiracy theories above: The goal is to select for gullibility.

But even after a mark is hooked, the scammer has to invest a lot of time and finesse to close the deal, and the immense gulf in wealth between your typical Nigerian scammer and your typical American victim is what made the atrociously low success rates worthwhile. The New Yorker article The Perfect Mark is a highly recommended and deeply frustrating read, outlining in excruciating detail how one psychotherapist in Massachusetts lost more than $600,000 and was sentenced to prison.

This scam would not have been as prevalent had there not existed a country brimming with English-speaking people with internet access and living in poverty. Can you think of anything else with internet access that can speak infinite English? Get ready for Nigerian Prince Bot 4000.

Unlike the cybersecurity issue, where large institutions have the capabilities and the incentive to shore up defenses, it's not obvious how individuals targeted by confidence tricks can be protected. Besides putting them in a rubber room, of course. No matter how tightly you encrypt the login credentials of someone's bank account, you will always need to give them some way to access their own account, and this means that social engineering will always remain the prime vulnerability in a system. Best of luck, everyone.


Anyways, AI sounds scary! Especially when wielded by bad people. On the flipside of things, I am excited about all the neat video games we're going to get as AI tools continue to trivialize asset creation and code generation. That's pretty cool, at least. 🤖


This scam would not have been as prevalent had there not existed a country brimming with English-speaking people with internet access and living in poverty. Can you think of anything else with internet access that can speak infinite English? Get ready for Nigerian Prince Bot 4000.

The future of scamming is deep fakes. https://decrypt.co/101365/deepfake-video-elon-musk-crypto-scam-goes-viral

The fakes will get better (the one in the video is funnier than convincing)

Maybe, but even if this deepfake was way more convincing from a technological standpoint, it still would get immediately neutered by how Elon Musk responded.

For a particular model, I think it is generally quite easy to tell if something was generated using the model. For instance, have GPT evaluate the likelihood of a string of words - if it was generated by GPT, the likelihood will be much higher than if it was written by a human, since likelihood-as-measured-by-GPT-3 is literally what GPT-3 is optimizing for.

I'm unsure to what extent this remains true at higher temperatures, and I'm unsure how much this varies by model. As the number of separate models increases, this might be un-moddable. OTOH, maybe not ¯_(ツ)_/¯
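A rough sketch of that likelihood check, with loud caveats: a tiny bigram model trained on a made-up three-sentence corpus stands in for GPT here, and every name in it (`CORPUS`, `train_bigrams`, `avg_log_likelihood`, `greedy_generate`) is invented for illustration. It only shows the shape of the idea: text the model produced itself (greedy decoding, i.e. temperature zero) scores a higher average log-likelihood under that same model than text a human arranged from the same words.

```python
import math
from collections import Counter, defaultdict

# Made-up training text; a real detector would use the actual model's own scores.
CORPUS = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."


def train_bigrams(text):
    """Count how often each word follows each other word."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts, set(tokens)


def avg_log_likelihood(counts, vocab, tokens):
    """Mean log-probability per transition, with add-one smoothing."""
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, nxt in pairs:
        seen = sum(counts[prev].values())
        total += math.log((counts[prev][nxt] + 1) / (seen + len(vocab)))
    return total / len(pairs)


def greedy_generate(counts, start, length):
    """Always pick the most frequent next word (the temperature-zero case)."""
    out = [start]
    while len(out) < length and counts[out[-1]]:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return out


counts, vocab = train_bigrams(CORPUS)
generated = greedy_generate(counts, "the", 8)
human = "the rug saw the mat .".split()

generated_score = avg_log_likelihood(counts, vocab, generated)
human_score = avg_log_likelihood(counts, vocab, human)
print(" ".join(generated), generated_score)
print(" ".join(human), human_score)
```

Sampling at a higher temperature would pull the generated score down toward the human one, which is exactly the uncertainty raised above, and scoring with a different model than the one that wrote the text is a separate question this sketch doesn't touch.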

What also helps is that, at least at present, the state of astroturf propaganda is pathetic.

Astroturf that we know of.

But e.g. emulating basic twitter users using gpt-2 is probably really not that hard. Weren't they writing plausible-seeming garbage back on reddit?

It's much, much easier on twitter.

So there might be astroturf efforts going on we know nothing about.

So there might be astroturf efforts going on we know nothing about.

Very true

Hacking is a very available bad thing, but as a 'near term risk' of AI, it doesn't make sense as a specific worry. Twenty or thirty years ago, 'hacking' was as easy as listening to unencrypted network connections, or finding a simple buffer overflow and getting code execution. ChatGPT can almost do the latter, if only because there are so many examples on the web. But between then and now, organizations were forced to notice that bored teenagers could pwn them (and computers got much faster, making things like HTTPS less expensive), and they slowly started taking security seriously. This started with basic things like buffer overflow protection and https, and continues with complex mitigations (these are random, not representative examples, intended to illustrate the complexity of the area) for things like spectre, kernel memory safety, or entirely new languages that make many kinds of common mistakes harder. As a result, successful exploitation of prominent software has become quite difficult, with random entries in the google project zero blog requiring finding specific kinds of flaws, probing for the ways they can be used in combination with the rest of the codebase, and doing complicated things with them to accomplish something interesting. Even then, often two or three complex exploits need to be chained together to achieve both code execution and sandbox escape. This kind of thing seems as hard as software development generally, which seems as hard as 'being a human' or any form of complex activity - if an AI can do an end-to-end hack, it can probably do, or is close to doing, end-to-end jobs of any kind, human politics, etc. If an AI "can swallow up and understand entire rivers of code in a single gulp, effortlessly highlighting and patching vulnerabilities as it glides through the air" for all exploits, it could do the same with writing software. 
(This doesn't mean AI can't highlight possible exploits in code for humans to comb through and chain, that seems very plausible - but that'll be more ChatGPT or Copilot-like, and not cause the transformations described, until everything else is transformed too). So I don't think "AIs hacking" is a separate thing from broader risk of AGI, or can be dealt with separately, because hacking is, probably, as hard as everything else.

... also, man it must take a while to write up long, referenced posts like the above. My paragraph took a solid 20 minutes (although with distraction from some of the blogposts), and its writing quality isn't as good.

This is a very useful counterpoint, and I admit that I don't have much immediate knowledge about generic "hacking", which is why I was primarily relying on what other experts thought. My rough, even child-like, understanding of what large language models can potentially accomplish is to quickly "read" an entire code database and find exploits a human might not ever have thought of. In my first draft I even included a link to TLA+, which seems to accomplish a primitive version of this (I tried to watch an intro to TLA+ video and...had no idea wtf it was talking about). As a gut intuition, it seems likely that LLMs would have an easier time pointing out the problems in already existing code than writing the (working) code from scratch. Is that off-base?

... also, man it must take a while to write up long, referenced posts like the above. My paragraph took a solid 20 minutes (although with distraction from some of the blogposts), and its writing quality isn't as good.

thanks, my time-tracking software tells me things I don't want to know 😩