Culture War Roundup for the week of September 19, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

I wrote a post about de-biasing efforts in machine learning. It got a bit long, so I decided to turn it into an article instead. It argues that corporate anti-bias solutions are mostly designed to cover their asses, and do nothing to solve the larger (and actually important) issue.

(As an aside: does it still count as a "bare link" if I point to my own content, just hosted elsewhere?)

You are right, they are not really trying that hard. Anybody smart enough to build bleeding-edge AI systems is smart enough to understand why if you try to predict the likelihood of a criminal repeating a crime, it will always say that black people are more likely to repeat (it's because black people are more likely to repeat). The problem is fairly hopeless, because AIs accurately pick up that black people are more likely to commit crimes, that women are for the most part uninterested in studying machine learning, and other things that are true but verboten.

So their manager asks them to do something about bias, and they apply the laziest possible hack. I think this disinterest is more prominent in top-tier researchers. Low-end researchers who will never accomplish anything useful are happy to feast on the de-biasing funding teat.

There are some other niche cases, like facial recognition software not recognizing blacks. But this requires no special debiasing effort, it is simply a weakness in the system that can be addressed the same as any other weakness.

Obviously, the only possible "de-biasing" technique that can work is explicitly biasing systems against white men. If two criteria conflict such that one group or the other must be "discriminated" against according to one criterion or the other, choose the criterion that discriminates against white men (in that order: first discriminate against whites, then against men). It is very simple.

Anybody smart enough to build bleeding-edge AI systems is smart enough to understand why if you try to predict the likelihood of a criminal repeating a crime, it will always say that black people are more likely to repeat (it's because black people are more likely to repeat).

The question is: are you measuring "blackness", or are you measuring "membership in the underclass", which, thanks to both accidents of history and explicit policy, is disproportionately made up of black people?

You might also be measuring membership in a group whose ancestors were less shaped by executions.

https://journals.sagepub.com/doi/pdf/10.1177/147470491501300114

It can also be that membership in the underclass is the effect not the cause.

No, it’s not obviously the only workable technique.

Half the point of this parent article was about the confusion, willful or otherwise, between the objective and the normative. Is the model supposed to be drawing like a human, or like an idealized paragon of virtue? If you ask it to draw a medieval knight (per someone else downthread) do you want it to match history or popular culture?

Predictive ML is a clear example of the objective case. There is little market for a robot that lies to you. Generative ML is a different story.

Train on data from the world as it is, and you’ll get relative objectivity, plus journalists will call you racist. Papering over that with prompt engineering is a hack to get more normativity for an AI which doesn’t “need” to be objective. But this suggests another strategy: don’t train it on the world as it appears to be. Train it on the world that could be.

Is creating such training data trivial? No. Does it require discriminating against anyone? Also no. Seems like a decent idea to me.

But this suggests another strategy: don’t train it on the world as it appears to be. Train it on the world that could be.

Is creating such training data trivial? No. Does it require discriminating against anyone? Also no. Seems like a decent idea to me.

Presuming that this theoretical "world that could be" is at all different from the "world as it appears to be", it absolutely requires discriminating against someone. There's just no way to bridge that difference without applying some discrimination against someone at some point in the process; otherwise we'd just end up back where we started.

Not as long as we’re talking about generative instead of predictive. For example, an AI which does everything like DALL-E except it doesn’t need prompt engineering to make diverse pictures.

Imagine a specialized net which only draws armor. We can train it on the entire internet, in which case it will draw us Renaissance artisanal plate, which has been inserted into countless historically-inaccurate pieces of media. If we instead only use a historical database of curated and tagged armors, it will end up with a better understanding of actual medieval armor. It will also be much less flexible, but it will be defended against criticisms of renaissancism.

So by your lights, what would it take for a generative AI model to be discriminatory against someone? An "AI that does everything like DALL-E except it doesn't need prompt engineering to make 'diverse' pictures" would necessarily involve discriminating against someone when creating the training set; if no such discrimination were required, then the 2 models would be identical. You seem to be saying that that doesn't count. So what would count?

I think there are a couple uses of the word "discriminatory".

First, yeah, defining a training subset is a decision to favor one group over another. Making the historically-accurate-armor AI is going to disappoint the fantasy-art users. An AI that makes everyone [insert feature here] is going to disappoint the people wanting it to be realistic. And of course there's nothing stopping the dataset curator from excluding all members of a race, or all Christians, or all women with realistic proportions; doing so would be obviously discriminatory.

On the other hand, a sentencing or profiling AI is categorically different. When a predictive machine discriminates, it is causing a disparate impact on actual people rather than on the collective. I'm struggling to find the right words...it is the difference between perpetuating an injustice vs. causing it, a sort of active vs. passive harm.

(I think there's also an argument to be made that the ill-trained AI is more obvious than an AI trained well on an evil world, and thus more compatible with exit-rights liberalism, but I'm not so confident in that...)

The OP argued that predictive AI was a zero-sum game where only reverse racism could compensate for the inbuilt discrimination. I think...there's some degree to which that is correct? It's extending that conclusion to generative AI that strikes me as wrong. I believe generative discrimination is an easier problem than predictive because it's a lesser problem. Maybe you can't make a "world that could be" generative AI without excluding anyone, but I'm reasonably sure you can make one without actively harming any individuals.

Anyway, thanks for pressing me on this. It's an interesting philosophical topic.

OK, so it sounds like based on this post that when you wrote:

Does it require discriminating against anyone?

about a theoretical generative AI model based around a world that could be (instead of the world as is), you were making a general statement about how no generative AI model could possibly discriminate against someone. Which seems like a strange thing to say about 1 specific type of generative AI model (i.e. based on a world that could be) when discussing 2 different theoretical types of models that are both generative (i.e. the other one being based on the world as it is), but technically true, I suppose.

More or less, though I wouldn't frame it as an absolute, if only because that's inviting extreme counterexamples.

The original article was writing about unimpressive debiasing in corporate generative models. When sulla responded with the assertion that this is unavoidable in a bleeding-edge AI, I thought it appropriate to point out that his examples were all predictive rather than generative. I think that really undermines his equation of discrimination with the "true but verboten." A generative AI could be wildly discriminatory, in the loose sense, without ever discriminating in the narrow, personal sense, and it is the latter which is a one-way ticket to a media circus.

If my model generates pictures of people from an even ethnic spectrum, I do not believe I am discriminating against anyone. It's not the Harvard auditions, and this isn't actual people I'm failing to generate or generating excessively.

If I'm a designer using a generative AI to fill a world of my making, I do not care for "accurate" demographic representation being baked in the model. Either I know exactly what ratios I want and include them in the prompt (and if I'm the type to want "accuracy", I already know who I want to be 13% of the generated population yet 50% of the generated criminals), or otherwise it should give me a selection of 50% male, 50% female, [for race in races generate total*race/races] American presidents.
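The bracketed pseudocode `[for race in races generate total*race/races]` reads as "give each group an equal share of the total". A minimal sketch of that even split follows; the function name `even_allocation` and the placeholder group labels are hypothetical, and sending the remainder from integer division to the first groups in list order is an assumption the comment doesn't specify.

```python
def even_allocation(total: int, groups: list[str]) -> dict[str, int]:
    """Split `total` generated images as evenly as possible across `groups`.

    Integer division leaves a remainder; here it goes to the first groups
    in list order (an arbitrary, assumed tie-breaking rule).
    """
    if not groups:
        raise ValueError("need at least one group")
    base, remainder = divmod(total, len(groups))
    return {g: base + (1 if i < remainder else 0) for i, g in enumerate(groups)}


if __name__ == "__main__":
    print(even_allocation(10, ["A", "B", "C"]))
```

For example, `even_allocation(10, ["A", "B", "C"])` gives `{"A": 4, "B": 3, "C": 3}`. A designer who wants other ratios would instead pass explicit weights, as the comment describes doing in the prompt.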

I see the division between "factual predictors" and "diverse generators" AI models to be quite acceptable and even desirable. Make it explicit and call out those who present their model as one when it in fact has characteristics of the other.

If my model generates pictures of people from an even ethnic spectrum, I do not believe I am discriminating against anyone. It's not the Harvard auditions, and this isn't actual people I'm failing to generate or generating excessively.

By this logic, then no model is discriminating against anyone. If a model only ever returns white women for "good person" and black men for "bad person," then that's not discriminating against anyone since it's not the Harvard auditions, and this isn't actual people you're failing to generate or generating excessively. Great, looks like we can associate any race with any quality we want in our model and not worry about discriminating against anyone!

If I'm a designer using a generative AI to fill a world of my making, I do not care for "accurate" demographic representation being baked in the model.

Indeed, and notably no one seems to have any problem with such models existing. There's room for multiple types of generative AI models in this world, including ones that have uniform distributions, desired distributions, best-attempt-at-accurate distributions, or really any other arbitrary distributions of demographics.

If a model only ever returns white women for "good person" and black men for "bad person," then that's not discriminating against anyone since it's not the Harvard auditions, and this isn't actual people you're failing to generate or generating excessively.

Do you really think your example is as egregious as generating perfectly uniform selections for both "good person" and "bad person" (or "criminal" and "Harvard student", for that matter)?

Do you really think your example is as egregious as generating perfectly uniform selections for both "good person" and "bad person" (or "criminal" and "Harvard student", for that matter)?

What does being egregious have to do with this in any way? As you wrote, either way, "it's not the Harvard auditions, and this isn't actual people I'm failing to generate or generating excessively." Given that the reason that perfectly uniform selections isn't discrimination has literally nothing to do with the type of distribution and everything to do with the fact that these are generated images rather than actual people, we can change the distribution to anything we want (including my example of encoding "good person" with "white woman" and "bad person" with "black man") and still land at the same result of "no discrimination is taking place."

What does being egregious have to do with this in any way?

Pretty clear to me that the analogy you deployed was deliberately absurd.

But anyway, provided that you clearly label your Pro-White-Women, Anti-Black-Men model as such, go ahead.

This is why I distinguished between predictive and generative tools.

A robot that makes inaccurate predictions is next to useless. One that makes unrealistic art is not. Engineers trying to avoid accusations about their generative model don’t need to add discrimination, because unlike in real life, changing the training data is an option.

I would expect the “dream data” version to have all the same mainstream use cases as DALL-E or similar, except also avoiding Problematic picture generation.

Which raises the question of why we should care what journalists say, but lots of people do anyway.

I don't think the companies specifically care about the journalists' opinions as a terminal goal. Rather, the (realistic) fear is that average people (potential paying customers or just people who might popularize the brand) read the articles written by the journalists and adopt those opinions and stances.

Anybody smart enough to build bleeding-edge AI systems is smart enough to understand why if you try to predict the likelihood of a criminal repeating a crime, it will always say that black people are more likely to repeat (it's because black people are more likely to repeat).

An alternative explanation is that the doublethink required to simultaneously believe in the party line and in the reality needed to do your job doesn't actually work very well, and tends to devolve into believing in the party line only. Imagine that you're a bright young guy working on Google's image classifier. To generate the thought that the classifier might confuse black people with apes, and that you must therefore specifically check that it doesn't, you must believe that black people tend to have certain ape-like facial features. That's a very dangerous thing to believe; your woke peers would be very unamused if you just blurted it out or inexpertly wink-wink nudge-nudged your way to suggesting that you need to check for that, etc. If you have a lot of wrongfact beliefs, you have to watch your every word to avoid committing social suicide. Accidentally releasing a classifier that does in fact mistake black people for apes, on the other hand, is relatively safe: it's not your personal fault, who could have known, and it's probably bias in the training data anyway. So in a highly ideologized environment, people just naturally fail at their jobs instead of trying to maintain a bag of forbidden beliefs.

This example doesn't work - black people are dark, apes have dark fur, image classifiers often pick up on easy-to-detect features like color.

More generally, I'd question how important the party line/reality conflict is: many genuinely smart people believe in wokeness and will probably continue to do so indefinitely. E.g. OpenAI is clearly woke yet manages to put out a great product.

So their manager asks them to do something about bias, and they apply the laziest possible hack.

I actually have a different impression: most of these professional ML researchers and engineers genuinely wish they could serve up a model that provides politically correct responses, because politically correct responses are also commercially correct, and everyone wants to make money. Probably the main reason a bunch of giant and amazing Google models aren't made available to the public via API is because of the risk that they might say or display something politically incorrect, and certainly some fraction of the user base (especially tech journalists lusting after those sweet engagement metrics) will try to bait it into doing so.

So there's ample incentive to solve this problem "the right way," and the fact that so far all we see are cheap hacks and opacity is because no one knows how to solve it the right way, or even if it is solvable the right way at all, even in principle, with the technology we have today.

Part of the problem is exactly what makes these models so exciting to begin with. They can notice things, they can extrapolate from training data, they can make analogies and they can roll with out-of-sample prompts, and they develop all of these amazing abilities ex nihilo, from a largely uncontrollable black box made of inscrutable matrices gently nudged in the direction of data.

The other part of the problem is that political correctness isn't a well defined or static problem. It is a messy social problem, involving subtle adversarial factional games, sort of like fashion.

And these two halves of the problem compound with one another. It isn't enough to generate a black person one time in X -- you have to define X, you have to solve this equation for all possible identities, and you then have to translate this equation into every conceivable fact pattern that the user will (adversarially) use to challenge the model. If you want to generate a picture or story involving a policeman arresting a criminal, it is fraught whether you make the policeman white or black, whether you put him in a wheelchair or not. Should the model generate trans women? If they're visibly identifiable as trans women, are you making a minstrel caricature to further the stereotype that trans women look like men in dresses? If they aren't visibly identifiable, how is one to know they are trans at all, and that you haven't committed the deadly sin of erasure? Should black women look like white people but with a darker skin tone (and draw criticism for e.g. straightening their hair, itself a political minefield), or should you make them look recognizably phenotypically black in terms of facial features and hair (and draw criticism for reinforcing a stereotype)? If both murderers and NBA stars are disproportionately likely to be black, does the model need to recognize that murderers are bad and NBA stars are good and apply its distortion of the underlying distribution only to the bad category, i.e. return mostly white guys for criminals but mostly black guys for NBA stars? How is it to know? And when ideological opponents start to stress-test these categories and ask for a thuggish NBA player or a corrupt President, should it reverse the categories? What about middle grounds, like an "aggressive" NBA player, or a "desperate, nonviolent" criminal? We even have minor culture wars about the perceived race of robots.

I'll point out that the problem might not be so unsolvable as you describe; prompt engineering being what it is, a very thinkable (but dystopian) way some more-capable future version of DALL-E might resolve this is by adding to the prompt "and also, make sure to never portray X ethnicity negatively."

Yes, I think this sort of "prosaic alignment" solution is likely to solve all of our consternations about AI capabilities at least as well as a human intelligence could... in the long term. Eventually, you wouldn't even have to talk about portraying X ethnicity negatively, you could just say "and make it politically correct" and the AI would understand those rules better than any individual. For the time being, though, Dall-E has a hard enough time drawing a complex but coherent picture, much less enforcing its conformity with protean standards of political correctness.

Worth pointing out that OpenAI tried this sort of "prosaic alignment" approach to its so-called diversity filter. It appears to append stuff like "black male" or "hispanic female" to some proportion of prompts that it believes call for the depiction of a person. It has been vigorously panned by the community, because it has unintended negative effects on many prompts. Gwern did some sleuthing on why a complicated prompt for a picture of a cowboy at a certain angle in certain lighting etc. returned a bizarre misfire, and eventually discovered that the same prompt with "cowgirl" instead of cowboy worked flawlessly -- seemingly implicating the diversity filter in the original prompt's total failure.

Hilariously, OpenAI's approach here was discovered by asking for stuff like "A person holding a sign that says" -- and then you'd often get a picture of a sign that says "FEMALE" or "BLACK". So there's a degree to which adversarial prompt construction can overcome attempts at coercive prosaic alignment, at least using current techniques.
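The prompt-appending behaviour described here can be sketched roughly as follows. This is a hypothetical reconstruction from the community's observations, not OpenAI's code: the trigger-word list, the 50% default rate, and the name `augment_prompt` are all assumptions, and only the two descriptors come from the comment. It does illustrate why the sign trick exposed the filter: a descriptor appended to the end of the prompt sits exactly where the sign's text would go.

```python
import random

# Hypothetical trigger list: words taken to mean "this prompt depicts a person".
# Whether the real filter keyed on individual words like this is unknown.
PERSON_WORDS = {"person", "man", "woman", "cowboy", "doctor", "senator"}

# Descriptors of the kind reported in the comment; the real list is unknown.
DESCRIPTORS = ["black male", "hispanic female"]


def augment_prompt(prompt: str, rate: float = 0.5, rng=random) -> str:
    """Append a random descriptor to a fraction `rate` of person-mentioning prompts.

    `rng` may be the `random` module itself or a seeded `random.Random` instance.
    """
    words = {w.strip(".,!?\"'").lower() for w in prompt.split()}
    if words & PERSON_WORDS and rng.random() < rate:
        # Appended at the very end, so a model can read it as part of the prompt's text.
        return f"{prompt} {rng.choice(DESCRIPTORS)}"
    return prompt


if __name__ == "__main__":
    rng = random.Random(0)
    # With rate=1.0 the descriptor always lands where the sign's text would go.
    print(augment_prompt("A person holding a sign that says", rate=1.0, rng=rng))
```

Because the rewrite happens silently before generation, the user never sees the altered prompt, which is consistent with people having to infer the filter from odd outputs.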

It also doesn't know our delicate rules about when it's socially appropriate to re-gender or trans-racialize the subject of a picture. It's weird if prompts for Princess Zelda return a black or Asian Zelda, or if prompts for George Washington return a colonial-era woman in a white wig. Maybe we'll accept that sort of thing by the time Season 10 of Bridgerton comes out, but I don't think we're there now, and it would take a pretty advanced AI to figure that stuff out.

The social-rules-about-reracialization thing is definitely a reasonable one; that's a significant issue that would result in many funny PR disasters.

On reflection, vulnerability to adversarial prompt injection seems almost innate to the technology, considering both the "person holding a sign that says" attack above and the more recent one with remote.ly.

That runs into the problem of "what counts as 'negative?'" Traits don't come with in-built value judgments; it's up to us to decide which side of some dichotomy, for example, is better. Sometimes it's blindingly obvious to all sane men which is better, but often it isn't. Many things can be cast in different lights to praise or condemn somebody depending on whether one already likes or hates them, and so if your goal is to avoid any associations that anyone could consider negative (especially if there are motivated defenders who'd love to claim you as a prize), I think there is little, if anything, that's safe.

The details of what counts as "negative" would be determined based on the language model's own ideas of what constitutes "negative" based on its time spent with the training data. This is likely, for the most part, to align with conventional understandings of what is "negative".

Which could satisfy the proverbial Reasonable Person, perhaps, but not the proverbial Cardinal Richelieu ("If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.") It would be nice if appealing to common sense would be all it took to deflect such attacks, but we've ceded that possibility earlier in this hypothetical from the very premise of this reputation-managing AI.

How about things like VLMs inadvertently putting out black Donald Trumps? Or more broadly, if I use a model to generate “Republican Senator”, what’s the ideal number of black or other ethnic minority faces to produce? Are we going to keep up the liberal facade that a Senator is a Senator, regardless of political alignment, and thus we should see a diverse representation of races? Or will we instead accept that “Republican Senators are privileged white guys” and turn out a distribution of faces that supports the progressive narrative? These are points of tension within the modern left, so the only winning move is not to play. And before you suggest “just show the accurate racial distribution for a given prompt”, consider that the liberals at least still have to pretend to care about consistency, so committing to “actual truth above normative truth” as a principle is an invitation to embarrassment when the same principle is applied to other domains, e.g., CEOs or nurses.

These are points of tension within the modern left, so the only winning move is not to play.

Traditionally the only winning move is to side with the most radical and potentially vengeful faction based on a more realistic version of the Basilisk/Pascal's wager theory.

I think that's one of the reasons people and institutions radicalize so quickly; there are punishments for anyone who doesn't stay ahead of the curve, but none for those who get ahead of it.

In this case there are clear incentives to start "sanitizing" Problematic output even in silly and arbitrary ways, because the value is showing that you're an accomplice. The details don't matter, as long as you throw out some shibboleths like "female-presenting nipples".

I think that's one of the reasons people and institutions radicalize so quickly; there are punishments for anyone who doesn't stay ahead of the curve, but none for those who get ahead of it.

Hence accelerationism, I think: there exist memeplexes that are fortified to the point of invulnerability from one direction but have no answer to attacks from behind. So the idea, it seems to me, goes that the way to defeat them is to attack from the unguarded direction and lure them into an untenable position ( "I dare you to step over this line!") Which I think is a pretty risky strategy, really, as the phase transition may not be as near as one would hope.

Oh, shit, I didn't know about the Black Donald Trump thing. That's hilarious.

Yeah, okay, it's a fair cop; even such a policy as I describe would result in amazing PR debacles.

Clearly the AI is a fan of the Prince song "Donald Trump (Black Version)"

My article really only covers generative models, like the recent Stable Diffusion. Controversial models, like classifiers that try to evaluate how likely somebody is to commit a crime, have entirely different considerations. Maybe I should have made that clearer.

Also I disagree that a "de-biased" crime model would discriminate against white men! Men commit a highly disproportionate amount of crime compared to women; any sort of adjustment you make has to adjust for that, adding a whole bunch of likelihood on women especially, probably more than the racial difference even.

I dunno, the parent comment by sulla strikes me as basically calling out a similar (though more inflammatory) situation. We have two possible meanings of the term "bias" in common use, and these two meanings are:

  1. Not faithfully representing statistical realities present in the data.

  2. Not faithfully representing the statistical outcomes that we would like to see in the data-- most commonly, which reflect reality except for not showing differences based on race or gender.

These are, of course, mutually exclusive definitions; e.g. as pointed out in your article, the president of the United States should always be drawn as male under definition (1) and should be female half the time under definition (2). Likewise, classifiers determining how likely someone is to commit a crime ALSO have to choose between definitions (1) and (2), while facing the complicated issue of how to avoid public controversy over admitting that these are different things.
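To make the contrast concrete for the president example: under definition (1) a generator samples an attribute from the rates observed in its data, and under definition (2) it samples from a target rate someone chose. A minimal sketch follows; the function name `sample_attribute` and the two distributions are illustrative assumptions, not any real model's mechanism.

```python
import random
from collections import Counter


def sample_attribute(distribution: dict[str, float], n: int, rng: random.Random) -> list[str]:
    """Draw n attribute values with probabilities proportional to `distribution`."""
    labels = list(distribution)
    weights = [distribution[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n)


# Definition (1): match the rate in the data (every US president so far has been male).
OBSERVED = {"male": 1.0, "female": 0.0}
# Definition (2): match a target rate chosen on normative grounds.
TARGET = {"male": 0.5, "female": 0.5}

if __name__ == "__main__":
    rng = random.Random(0)
    print("definition (1):", Counter(sample_attribute(OBSERVED, 1000, rng)))
    print("definition (2):", Counter(sample_attribute(TARGET, 1000, rng)))
```

The sampling step is trivial either way; neither distribution is "unbiased" under both definitions at once, so the real decision is which table to feed it.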

You suggest a third, equally plausible definition (3): "Not faithfully representing the statistical outcomes present IN REAL LIFE (as opposed to just the data being trained on)."

That actually strikes me as brutally difficult, running into the same basic issues fact checkers face now: evaluating what is true or false in real life is really hard, and it intersects with political agendas in ways that make it even harder. And how do you even evaluate whether you succeeded? Don't get me wrong, I think it's reasonably likely that some generative models will get fine-tuned on specific datasets that curators have labeled as similar to "real life" along various dimensions. But I would not anticipate this becoming the norm.

As an aside, I think it makes a lot of sense that fundamentally the problem being solved by companies is "how do we stop journalists from agitating about our platform", not anything more interesting or important, and the "debiasing" solutions put in place reflect this reality.

Also I disagree that a "de-biased" crime model would discriminate against white men! Men commit a highly disproportionate amount of crime compared to women; any sort of adjustment you make has to adjust for that, adding a whole bunch of likelihood on women especially, probably more than the racial difference even.

You are missing the point. In de-biasing, blacks will receive an adjustment that favors them, whites will not. Women may receive some adjustment that favors them, men will not. If some model rates men negatively, this is because of the deficiencies of men. There is no need to debias the model: men are simply worse, as the model captures. If the same model rates blacks negatively, this is a flaw of the model and it must be de-biased.

This double standard is very obviously the consequence of radical anti-racist ideology. Bias is privilege + power. You can't be biased against whites or men. It is by definition impossible.

You are missing the point.

I don't think I am. I agree that a naïvely de-biased crime model will favour blacks over whites compared to a model that just went for simple accuracy and nothing else, but men will also necessarily similarly have to be favoured. If not, people are immediately going to notice the model convicting men and freeing women even when the facts are identical. There is absolutely no way people are going to accept that; radical anti-racist ideology isn't that powerful. Adding even more weight in favour of women would just be silly.

(A slightly more realistic scenario is one where the model gets access to a variable that correlates both with gender and with crime itself, like testosterone level. Apologists could then explain that the model convicted a man of, e.g., murder based on hormone levels that made it likely he had been aggressive, when in reality the model considered that rather unimportant compared with simply figuring out that it was analysing a male.)

I don't think I am. I agree that a naïvely de-biased crime model will favour blacks over whites compared to a model that just went for simple accuracy and nothing else, but men will also necessarily similarly have to be favoured. If not, people are immediately going to notice the model convicting men and freeing women even when the facts are identical. There is absolutely no way people are going to accept that; radical anti-racist ideology isn't that powerful.

People have already noticed this IRL and people already accept it just fine, no radical anti-racist ideology needed. It's just the reality of the situation, sans any sort of ideology, that this sort of bias is fully and openly accepted.

But to the actual point of the thread, I think you are missing the point. I don't think sulla is describing a crime model that's de-biased "naively," but rather one that's de-biased in the way it is most likely to be de-biased, which is by explicitly putting a thumb on the scale against disfavored groups such as whites and men. A universe in which real de-biasing efforts implemented by real institutions tend to follow some "naive" implementation rather than a politically convenient one seems like a neat universe to live in, I imagine.

People have already noticed this IRL and people already accept it just fine, no radical anti-racist ideology needed. It's just the reality of the situation, sans any sort of ideology, that this sort of bias is fully and openly accepted.

Yes, but I could probably have been clearer: I am not claiming that society will demand AI models that necessarily treat men more fairly than we do today! A model with no anti-bias adjustment applied will by default consider men extremely likely offenders, especially for violent crime. Any model can likely get a good training score just by looking at gender and ethnicity and, if it's e.g. an Asian woman, letting her off the hook immediately.

This effect will be extreme enough to get noticed, and counteracted by adding bias in favour of men or against those women. That will likely not be enough to make the model as a whole favour men over women, but the model will still be adjusted away from reality in a way that favours men! An AI that randomly decides to imprison men 50% of the time and women 10% of the time can still be biased against women if women commit only 0.1% of the actual crime.

In sulla's initial reply he stated that the model will be biased in favour of blacks and biased in favour of women. Both are true, but only under two different definitions of "biased": "manually adjusted to favour a group" and "returning different results for different groups, all else being equal". I assume people think my reply denied that women will be a favoured group under the second definition; it did not.

I don't think sulla is describing a crime model that's de-biased "naively," but rather one that's de-biased in the way it is most likely to be de-biased, which is by explicitly putting a thumb on the scale

That's precisely what I meant by "naïvely", as opposed to more complicated schemes (as with generative AI, where you could potentially use tricks like adding "no discrimination" to the prompt). Apologies if that was unclear.

A model with no anti-bias applied will consider men by default to be extremely likely offenders, especially for violent crime. It is likely that any model can get a good training score by just looking at the gender and ethnicity, and if it's e.g. an Asian woman just let her off the hook immediately.

This effect will be sufficiently extreme to get noticed, and counteracted, by adding bias in favour of men or against those women

And this is where I think you're missing the point. Perhaps the effect will be extreme enough to get noticed; extreme discrepancies go unnoticed all the time depending on political expediency, but this could be one of those that does get noticed. It doesn't follow that the people who push for de-biasing such algorithms in the name of demographic justice will have any desire to counteract it.

Also, I don't know how to square your above explanation with:

I am not claiming that society will demand AI models that necessarily treat men more fairly than we do today!

Today, people notice that human-based sentencing systems are "biased" against men in the sense that men and women with equivalent records and crimes get sentenced very differently, with men receiving harsher sentences. People evidently have no issue with this apparent "bias", whatever lip service they might pay.

You seem to be claiming that people will notice that AI-driven sentencing systems are "biased" against men in the same sense, with men and women with equivalent records and crimes sentenced very differently and men receiving harsher sentences, and that people noticing this will want to counteract that "bias" by putting their thumb on the scale in favor of men in these AI models. That seems to me equivalent to society demanding that AI models treat men more "fairly" than we do today.

There is absolutely no way people are going to accept that; radical anti-racist ideology isn't that powerful

Yes they will, and yes it is? We are already past the point of naked favoritism (look at the SAT scores required for admission to top universities, broken down by group). There are some complaints, but most people (of those who count) are happy to accept it.