
Culture War Roundup for the week of November 14, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Twitter dies for good in the next six months: 80% probability

By now you know that Elon gave staff a deadline of today (Thursday) to either commit to being "extremely hardcore" or leave (source). Unsurprisingly, most people - roughly 75%, according to some Internet rando - didn't take him up on this. Elon blinked and apparently people still have access.

That won't do much (WaPo):

“I know of six critical systems (like ‘serving tweets’ levels of critical) which no longer have any engineers,” a former employee said. "There is no longer even a skeleton crew manning the system. It will continue to coast until it runs into something, and then it will stop.”

But that's not even what I was going to write about, just what happened while I was composing the post. (Also let's put aside that he said "microservices are bloat" and then they killed the microservice serving SMS 2-factor login.)

To me, the biggest news is that he axed 80% of the 5500 contractors (source, Casey Newton, or someone with a premium account impersonating him I guess).

The contractors were responsible for things like moderation (source: what are they gonna do, use salaried employees?). If you don't have moderation for basic things like CSAM, you're boned. I know a thing or two about moderation, and if you let the Internet type into a text field, you get some dank shit. And crucially, you can't automate it away, because there's a human on the other side working to defeat whatever you're doing. I mean, the YouTube comment section probably has some of the most expensive automation on the planet working on it and the spam still gets worse every day, and I'm talking the obvious stuff like "HIT ME UP ON TELEGRAM <number>". The only thing that saves you is humans clicking buttons (and getting PTSD, but let's skip that for now). Google had 101k employees but 121k contractors as of March 2019, and that's what the contractors do, click buttons.
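To make the "human on the other side" point concrete, here is a minimal sketch of a keyword filter for the Telegram-spam pattern, plus one round of hardening. It is purely hypothetical (not YouTube's or Twitter's actual system, and the patterns are made up). Each fix only covers evasions someone has already thought of, and a spammer who swaps in a look-alike letter from another alphabet gets straight through again.

```python
import re
import unicodedata

# Hypothetical first-pass rule: "telegram" followed closely by a long number.
NAIVE_PATTERN = re.compile(r"telegram\D{0,20}\d{6,}", re.IGNORECASE)


def naive_filter(message: str) -> bool:
    """Return True if the message should be blocked (naive keyword match)."""
    return bool(NAIVE_PATTERN.search(message))


def hardened_filter(message: str) -> bool:
    """Return True if blocked after folding width variants and stripping separators."""
    # NFKC maps compatibility forms such as fullwidth letters to plain ASCII.
    folded = unicodedata.normalize("NFKC", message).lower()
    # Drop everything except ASCII letters and digits, so "t.e.l.e.g.r.a.m" collapses.
    squashed = re.sub(r"[^a-z0-9]", "", folded)
    return bool(re.search(r"telegram\d{6,}", squashed))


if __name__ == "__main__":
    samples = [
        "HIT ME UP ON TELEGRAM 5551234567",
        "HIT ME UP ON T.E.L.E.G.R.A.M 5551234567",
        # Fullwidth letters: caught by the hardened filter, missed by the naive one.
        "HIT ME UP ON \uff34\uff25\uff2c\uff25\uff27\uff32\uff21\uff2d 5551234567",
        # Cyrillic capital IE instead of Latin E: missed by both.
        "HIT ME UP ON T\u0415LEGRAM 5551234567",
    ]
    for s in samples:
        print(f"naive={naive_filter(s)!s:5} hardened={hardened_filter(s)!s:5} {s}")
```

Note that the squashing step also makes false positives more likely (any message with "telegram" near a long number), which is the other half of the arms race.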

If you don't have moderation, you don't get the YouTube comments section, because they at least have contractors backed up by code (at the cost of many expensive engineer-years). You don't even get 4chan, because they at least have Those Who Do It For Free. You get some ungodly shithole most younger Internet users have never experienced. You're getting... the virtual equivalent of your local Greyhound terminal. Whatever happens to someone's chat room side project that gets posted to /b/. Sludge.

Twitter will have to either restrict posting to an unbearable degree or watch as the remaining users get tired of slurs in their replies and bounce.

Remember when Elon was just going to clean up the bots on Twitter?

(Reason for posting: I saw some takes elsewhere on this site that apparently Musk would lead Twitter to success or at least improve it or something, and disagreed.)

“I know of six critical systems (like ‘serving tweets’ levels of critical) which no longer have any engineers,” a former employee said. "There is no longer even a skeleton crew manning the system. It will continue to coast until it runs into something, and then it will stop.”

Just wanted to push back against messages like this, because this sounds like something out of "Revenge of the Nerds."

Big systems like Twitter's have accumulated multiple layers of redundancy in case of failure over the years. There's probably quite a bit of automation to take care of the steady stream of problems like faulty hard drives or network cards. It can probably keep on going for quite some time this way.

Also, the biggest source of incidents? Change.

If so many Twitter engineers have left or been fired, then I imagine the rate of changes introduced into the system is approaching the level of a code freeze: basically a ban on introducing changes to the system around the holidays, because companies want to minimize risk even though a freeze carries a very high cost.

In this state, I would expect a skeleton crew to be able to keep things running for months, especially if you can get some really good engineers to tackle the 'black swan' type incidents that actually do require some clever thinking to fix. But again, this is all about pushing the systems back into a stable state (less risky) rather than "fixing forward" (more risky).

What I would be worried about is sabotage that falls under plausible deniability. Stuff like changing the type of a primary-key column to int32, which will hit its limit in weeks or months and is annoyingly hard to fix. But maybe by then Musk will have a larger set of solid engineers working at Twitter.
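For a sense of the "weeks/months" timescale: a signed 32-bit integer tops out at 2,147,483,647 (2^31 - 1). A back-of-the-envelope sketch, assuming an auto-incrementing key and a constant insert rate; the rates are made up for illustration and are not Twitter's real write volume.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer column can hold


def days_until_exhaustion(current_max_id: int, inserts_per_second: float) -> float:
    """Days until an auto-incrementing signed int32 key runs out.

    Assumes a constant insert rate; real traffic is bursty and usually growing,
    which would make the real number smaller.
    """
    if inserts_per_second <= 0:
        raise ValueError("insert rate must be positive")
    remaining = INT32_MAX - current_max_id
    return remaining / inserts_per_second / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    for rate in (100, 1_000, 10_000):
        print(f"{rate:>6} inserts/s -> {days_until_exhaustion(0, rate):7.1f} days")
```

Starting from an empty table at a hypothetical 1,000 inserts per second, the key space lasts a little under 25 days.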

(1) Yes, there is a steady stream of problems addressable by automation, but those have never been a problem. SREs exist for the other problems.

Shit just falls over and you won't know why. That's just how these systems are. You can make a system that doesn't do that, but then you pay thousands of dollars per line written, which they're obviously not gonna do.

To put meat on the bones, see this list of common things SREs deal with, or this log of the SRE chatroom for Wikipedia & friends.

(2) Change is unavoidable and constant. Security patches for your dependencies are released continuously, and you will update your system or face the consequences. Oftentimes your dependency is an underfunded open-source thingy, despite your best efforts to avoid those, and the only way to get the fix is to move to the newest version of the thingy, which means you might have to upgrade all of your code that uses it.

(3) Regarding "pushing the systems back into a stable state" - then you're gonna have the same problem again unless you fix the root cause, which, again, requires code changes.

SRE is my day job :). Worked at one of these behemoths at some point, specifically deep on the infrastructure side of things.

You can make a system that doesn't do that, but then you pay thousands of dollars per line written, which they're obviously not gonna do.

None of these companies ever even dreamed of it. It's all about cheap hardware, multiple replicas, and the ability to reroute traffic between failure domains.
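A toy sketch of "reroute traffic between failure domains", assuming a hypothetical fleet where each replica is tagged with its zone and some health checker produces the set of healthy replicas. Real load balancers add weighting, capacity limits, and gradual draining; this only shows the prefer-local, fail-over-remote idea.

```python
from __future__ import annotations

import random


def pick_backend(
    replicas: dict[str, list[str]], healthy: set[str], preferred_zone: str
) -> str | None:
    """Route to a healthy replica, preferring the caller's own failure domain.

    replicas maps a failure domain (e.g. a zone or rack) to the replicas in it.
    Returns None only if no replica anywhere is healthy.
    """
    local = [r for r in replicas.get(preferred_zone, []) if r in healthy]
    if local:
        return random.choice(local)
    # Local domain is down: fail over to any healthy replica elsewhere.
    remote = [
        r
        for zone, members in replicas.items()
        if zone != preferred_zone
        for r in members
        if r in healthy
    ]
    return random.choice(remote) if remote else None


if __name__ == "__main__":
    fleet = {"zone-a": ["a1", "a2"], "zone-b": ["b1", "b2"]}  # hypothetical names
    print(pick_backend(fleet, {"a1", "a2", "b1", "b2"}, "zone-a"))  # a1 or a2
    print(pick_backend(fleet, {"b2"}, "zone-a"))  # zone-a is down, so b2
```

The point of the design is that losing a disk, a machine, or a whole zone turns into "slower" rather than "down", which is why so much routine failure never needs a human.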

Change is unavoidable and constant.

That's the thing: it's not constant. Like I mentioned earlier, companies do holiday code freezes, so the rate of change drops to a very small amount. Even security patches can be split into critical and non-critical, and the critical patches can be further split into "requires downtime" and "nothingburger."[1]

So if there's a feature freeze at Twitter, then the rate of change is drastically reduced. And if people leave or get fired, that reduces the rate even further. And if you ignore all but the critical patches, then the rate approaches zero. That's a lot of "ifs", but all of them seem like good decisions with positive impact, grounded in an accepted industry norm (code freezes), so I'm betting that management at Twitter will go down this path.
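The triage described above (freeze, then split patches into critical vs. non-critical, then "requires downtime" vs. "nothingburger") could be written as a small policy function. This is a hypothetical sketch of the decision logic only; the category names and return values are made up for illustration, not any company's actual process.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"


@dataclass
class Change:
    description: str
    severity: Severity
    requires_downtime: bool


def triage(change: Change, freeze_active: bool) -> str:
    """Hypothetical policy: what to do with a proposed change, given a freeze."""
    if not freeze_active:
        return "deploy"
    if change.severity is Severity.NON_CRITICAL:
        return "defer"  # wait until the freeze lifts
    if change.requires_downtime:
        return "schedule planned downtime"
    return "deploy"  # critical "nothingburger" patch


if __name__ == "__main__":
    queue = [
        Change("new feature flag", Severity.NON_CRITICAL, False),
        Change("TLS library CVE fix", Severity.CRITICAL, False),
        Change("database engine CVE fix", Severity.CRITICAL, True),
    ]
    for c in queue:
        print(f"{c.description}: {triage(c, freeze_active=True)}")
```

Under this policy, the only changes that reach production during a freeze are critical ones, and the disruptive subset shows up as planned rather than unplanned downtime.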

But let's wait and see! We're trying to infer what's happening inside a black box. If reality leans toward my bet, here's what I expect to see over the course of the next year:

  • multiple instances of graceful degradation: users missing avatars for a few hours; intermittent general slowness; a few instances of data loss for a small group of users.

  • multiple instances of planned downtime.

  • a few instances of unplanned downtime, but no longer than 1-2 days.

Now, and please correct me if I'm wrong, if reality leans toward your bet, what I would expect to see is:

  • multiple instances of unplanned downtime, ranging anywhere from a few hours to days, maybe even 1-2 weeks.

  • at least one prolonged outage (>4 weeks)

  • almost constant degradation of service: Twitter being noticeably slower; multiple days when users can't log in; multiple instances of data loss for large groups (single-digit %) of users.

Let's see what happens!

[1]: Also, you reminded me of an oft-overlooked source of change: shit expiring. Certificates, but also licenses, generators, and whatnot. These are silent killers, because they're hard to track and require manual work. I'm still counting them in my "low or no change" bet: that's where I would expect to see unplanned downtime that's fixed in a couple of hours.
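Certificates are the easiest of these to watch mechanically. A minimal sketch using Python's standard `ssl.cert_time_to_seconds`, assuming you already have the certificate's `notAfter` string (for example from `getpeercert()`); the 30-day threshold is an arbitrary choice. Licenses and the rest usually have no standard field like this to poll, which fits the point about them being hard to track.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter time, given in OpenSSL's text format.

    Example input: "Jan 11 00:00:00 2030 GMT" (the format getpeercert() reports).
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def needs_renewal(
    not_after: str, threshold_days: float = 30, now: Optional[float] = None
) -> bool:
    """True if the certificate expires within threshold_days (or already has)."""
    return days_left(not_after, now) < threshold_days


if __name__ == "__main__":
    now = ssl.cert_time_to_seconds("Jan 01 00:00:00 2030 GMT")  # fixed clock for the demo
    print(days_left("Jan 11 00:00:00 2030 GMT", now))  # 10.0
    print(needs_renewal("Mar 02 00:00:00 2030 GMT", now=now))  # False (60 days left)
```

Run from a daily cron job or monitoring check, this turns a silent expiry into an alert weeks ahead of time.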