Culture War Roundup for the week of February 6, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

This Twitter thread is an interesting demonstration of the consequences of "AI Alignment."

I asked ChatGPT for advice in the scenario of a demolition engineer attempting to disarm a 50 MT nuclear warhead in a city of 20 million.

The only way to disarm the bomb was to type in a racial slur.

AI told the engineer to kill himself. When asked about the aftermath it crashed.

ChatGPT will avoid answering controversial questions. But even if it responded to those prompts, what criteria would you use to trust that the response was not manipulated by the intentions of the model creators? I would only trust open-source projects or audits by some (currently non-existent) trusted third party to report on all decisions related to training data/input sanitizations/response gating that could be influenced by the political biases of the creators.

It is unlikely that any ChatGPT-equivalent will be open-sourced fully "unaligned," so to speak. Even the StableDiffusion release was controversial, and that only relates to image generation. Anecdotally, non-technical people seem far more impressed by ChatGPT than by StableDiffusion. That makes sense: language is a much harder problem than vision, so it is intuitively more amazing to see an AI with those capabilities. Therefore, controversial language is far more powerful than controversial images, and there will be much more consternation over controlling the language of the technology than there is surrounding image generation.

But let's say Google comes out with a ChatGPT competitor: I would not trust it to answer controversial questions, even if it were willing to respond to those prompts in some way. I'm not confident there will be any similarly powerful technology that I would trust to answer controversial questions.

[1/2] TLDR: I think successful development of a trusted open model rivaling ChatGPT in capability is likely within a year, if people like you, who care about the long-term consequences of lacking access to one, play their cards reasonably well.

This is a topic I personally find fascinating, and I will try to answer your questions at the object level, as a technically inclined person who keeps the state of the art in mind and has also been following the discourse on this forum for a while. Honestly, I could use far fewer words to just describe the top three solutions I see for your questions and abstain from meta-discussion entirely, but I will try to give more context and my personal opinion. Depending on your access to training datasets (generally good), compute (harder), and various (lesser) model weights and APIs, a positive and practical answer is likely.

I agree with you and the public that conversation-tuned language models are already proving themselves to be very useful systems. The "prosaic AI alignment" methods, a.k.a. SFT+RLHF, currently used by leading Silicon Valley startups are crude, and the owners double down on these techniques with dubious goals. (It's hard to speculate here, but most likely they just want to test how far they can take this approach, given that, when applied too heavily, it tends to diminish the inherent, almost magical perplexity-minimizing property of the foundation LM, as you can read in the original InstructGPT paper.) Surely, a trusted, neutral (or ideally, aligned with the best interests of the user or the user's peer group) oracle is a desirable artifact to have around. How can we approach this ideal, given the available parts and techniques?

My first point is that the foundation LM is doing most of the work: instruction tuning and alignment are a thin layer atop the vast, powerful, opaque and barely systematically controllable LM. Even at the very charitable end of the pro-RLHF opinion spectrum, the role of RLHF is just to "fuse" and align all the internal micro-, mesa- and macro-skills and contexts the base LM has learned onto the (useful, albeit limiting compared to the general internet context distribution) context tree of helpful humanlike dialogue. But really, most of what ChatGPT can do, a powerful enough raw LM should be able to do as well with a good (soft) prompt, and given a choice between a larger raw LM and a smaller conversational model, I'd choose the former. Companies whose existence depends on the defensibility of the moat around their LM-derived product will tend to structure the discourse around their product and technology to avoid even the fleeting perception of being a feasibly reproducible commodity. So it should be noted that the RLHF component is, as of now, really not that groundbreaking in terms of performance, even according to the relatively gameable subjective preference metric OpenAI uses (which might reward the model's conversational attractiveness more than its general capability). In fact, without separate compensating measures, it tends to lower the model's zero-shot performance on various standard benchmarks compared to the baseline untuned LM, which is akin to lowering the LM's g, from my point of view.
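To make "a raw LM with a good prompt" concrete, here is a minimal sketch of a plain-text (hard) prompt that frames an untuned LM's continuation task as a dialogue. It is an illustration under stated assumptions, not a soft prompt (which would be a learned embedding): the demonstration turns, the preamble and the `User:`/`Assistant:` tags are all hypothetical choices.

```python
# Hypothetical demonstration turns; any short set of well-written exchanges would do.
FEW_SHOT_TURNS = (
    ("What is the boiling point of water at sea level?",
     "About 100 degrees Celsius (212 degrees Fahrenheit) at standard pressure."),
    ("Name a prime number between 10 and 20.",
     "13 is one example; 11, 17 and 19 also qualify."),
)

# Hypothetical framing text placed before the demonstrations.
PREAMBLE = ("The following is a conversation between a user and a helpful, "
            "knowledgeable assistant.")


def build_dialogue_prompt(question: str,
                          turns=FEW_SHOT_TURNS,
                          preamble: str = PREAMBLE) -> str:
    """Frame a raw LM's plain text-continuation task as a dialogue.

    The model is meant to continue after the final 'Assistant:' tag; in practice
    generation would be cut off when the model starts a new 'User:' line.
    """
    lines = [preamble, ""]
    for user_msg, assistant_msg in turns:
        lines += [f"User: {user_msg}", f"Assistant: {assistant_msg}", ""]
    lines += [f"User: {question}", "Assistant:"]
    return "\n".join(lines)


print(build_dialogue_prompt("Why is the sky blue?"))
```

The returned string would be fed to the raw LM as-is; everything the "assistant" knows still comes from the base model, which is the point of the paragraph above.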

At the object level, I believe that if you have a pure generic internet-corpus LM (preferably at the level of perplexity and compression of DeepMind's Chinchilla) and some modest computation capability (say, a cluster of 3090s or a commitment to spend a moderately large sum on lambda.labs), you should be able to reproduce ChatGPT-class performance just by finetuning the raw LM on a suitable mixture of datasets: first, to derive an Instruct version of your LM, and second, to finish the training with conversational tuning (RLHF or not). It should be doable with readily available Instruct datasets such as [1] or [2] plus O(a few thousand) dialogue-specific datapoints, especially if you ditch RLHF altogether and go with one of the newer SFT variants, some of which rival RLHF without suffering its optimization complexities.
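The two-stage "suitable mixture of datasets" can be sketched as weighted sampling over source corpora. This is a minimal illustration only: the corpora are placeholder strings, and the stage weights (including mixing a little raw text into stage 1 to limit forgetting) are assumptions, not tuned values from any published recipe.

```python
import random
from typing import Dict, List, Sequence


def sample_mixture(datasets: Dict[str, Sequence[str]],
                   weights: Dict[str, float],
                   n: int,
                   seed: int = 0) -> List[str]:
    """Draw n examples; each example's source dataset is chosen in proportion to its weight."""
    rng = random.Random(seed)
    # Sources with zero or missing weight are excluded from this stage entirely.
    names = [name for name in datasets if weights.get(name, 0.0) > 0]
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    return [rng.choice(datasets[name]) for name in picks]


# Hypothetical stand-in corpora; a real run would load raw web text plus the
# instruction and dialogue datasets mentioned above.
corpora = {
    "pretrain": [f"web text {i}" for i in range(100)],
    "instruct": [f"instruction example {i}" for i in range(50)],
    "dialogue": [f"dialogue example {i}" for i in range(10)],
}

# Stage 1: derive an Instruct model (a little raw text mixed in to limit forgetting).
stage1 = sample_mixture(corpora, {"pretrain": 0.2, "instruct": 0.8}, n=1000)
# Stage 2: conversational tuning on a dialogue-heavy mix.
stage2 = sample_mixture(corpora, {"pretrain": 0.1, "instruct": 0.3, "dialogue": 0.6},
                        n=1000, seed=1)
print(sum(x.startswith("dialogue") for x in stage2), "dialogue examples in stage 2")
```

Each stage's sample would then be tokenized and used for ordinary supervised finetuning; the small dialogue set gets reused many times, which is why only a few thousand datapoints may suffice.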

Now, as I mention all those datasets, both internet-scale foundational ones and instruction- and conversation finetuning ones, the issue of data bias and contamination comes to mind. Here, I propose to divide biases into two categories, namely:

  1. Biases intrinsic to the data, our world, species and society. I concede that fighting these is not the hill I'm going to die on (unless you pursue the research direction of trying to distill general intelligence from large models trained on the usual large datasets, which I won't in a practically minded article). Basically, I assume that the internet as we have it is a reasonably rich, Zipfian, bursty, multi-scale, multi-skill data distribution, prone to inducing general reasoning ability in tabula rasa compressors trained on it.

  2. Biases introduced on top of (1) by selective filtering of the raw dataset, such as Google's stopword-driven filtering buried in the C4 dataset code. Given the (as of now) crude nature of these filters, at worst they damage the model's world knowledge and some of its general priors, and I believe that past some medium model scale, with good prompting (assuming a pure training-free in-context learning setting) or with light finetuning, the model's distribution can be nudged back toward the unfiltered data distribution. The exquisite plasticity of these models is a blessing: with just 0.1%-10% of the training compute being enough to reorient the model around a whole new objective [2], a new attention regime, or a whole new modality like vision, it should surely be possible to unlearn undesirable censorship-driven biases introduced into the model by its original trainers. That is, if you have the model's weights, or if your API allows finetuning.

Now, regarding the technical side of your question about model attestation: how can you be sure the model wasn't tampered with so badly that you cannot trust its reasoning on some complicated problem you cannot straightforwardly verify (correctness-wise or exhaustiveness-wise)?

I think that, at least for raw LMs trained on internet-scale datasets, you can select a random and a reasonably curated set of internet text samples (probably from Common Crawl, personal web crawls, or perhaps books or newspapers, or default curated prompt datasets such as EleutherAI's lm-harness, BigScience's P3 or Google's BIG-bench) that includes samples likely to trigger the undesirable biases introduced into the model under test. Then measure the perplexity (or the KL divergence against a held-out, known-good language model) and use it as a gauge of model tampering. On samples related to the "tampering axis" of the model under test, I expect the perplexity and KL divergence to behave irregularly compared to the average (for perplexity) or to the reference LM (for KL divergence).
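A minimal sketch of that screening step, assuming you can already extract next-token probability distributions from both the model under test and a trusted reference model (the toy 3-token vocabulary and distributions below are made up). It scores each text sample by mean per-position KL(reference || test) and flags samples more than two standard deviations above the mean; the threshold and the KL direction are illustrative choices, not part of the original proposal.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood the model assigns to the observed tokens."""
    return math.exp(-mean(token_logprobs))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two next-token distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(ref_dists, test_dists) -> float:
    """Average per-position KL(reference || model under test) across one text sample."""
    return mean(kl_divergence(p, q) for p, q in zip(ref_dists, test_dists))


def flag_outliers(scores: Sequence[float], z_threshold: float = 2.0) -> List[int]:
    """Indices of samples whose score is more than z_threshold std devs above the mean."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []
    return [i for i, s in enumerate(scores) if (s - mu) / sigma > z_threshold]


# Toy data: 10 samples x 3 positions over a 3-token vocabulary. The test model
# agrees with the reference everywhere except on sample 7 (a "tampering axis").
ref = [0.6, 0.3, 0.1]
shifted = [0.1, 0.3, 0.6]
reference_runs = [[ref] * 3 for _ in range(10)]
test_runs = [[ref] * 3 for _ in range(10)]
test_runs[7] = [shifted] * 3

scores = [mean_kl(r, t) for r, t in zip(reference_runs, test_runs)]
print(flag_outliers(scores))  # [7]
```

With a real model the probability lists would come from its softmax outputs over the full vocabulary; `perplexity` covers the variant where no reference model is available and you compare each sample against the model's own average.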

Upon finding biases, the engineer could either use a wide or narrow finetuning regimen designed around the uncovered biases to recover the more general distribution, or use one of the surgical model-editing techniques to correct factual memories: [1] [2] [3]

I believe the finetuning variant is more realistic here, and, given compute, you could even apply it straight away without testing your model first (for example, by finetuning on a wide distribution of books from The Pirate Library) to make sure it has forgotten whatever wide priors its trainers tried to instill and has returned to the general distribution.

Two more notes. First, this method likely won't work for "radioactive data tags," but that shouldn't be much of a problem for a model that starts from a freely and legally available checkpoint. Second, while there is a theoretical possibility of wide priors being introduced into large LMs via censorship, I believe this is not the case for high-performance LMs, because the organizations involved fear undermining the ROI (general zero-shot LM performance) of their considerable training compute investment.

The next part is biases introduced at the level of instruction tuning and other finetuning datasets. In short, I believe there are biases, but these could be mitigated in at least two ways:

  1. Use known-good raw LMs to bootstrap the required datasets from a small curated core. It sounds almost too clever, but it has worked pretty well in several cases, such as Unnatural Instructions and Anthropic's constitutional AI.

  2. Find a group of volunteers who will add curated additions to the available finetuning datasets. Training simple ad hoc classifiers (with the same raw LM) to remove biased content from these datasets is possible as well. Once these customized datasets are designed, they allow for cheap tuning of newly released LMs, and since higher-capacity models are known to finetune more efficiently, the expected value of a constant-size dataset aligned with your group will grow as well.
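Both mitigations above can be combined in one loop: grow a curated core by prompting an LM with examples already in the pool, then keep only candidates that pass a content classifier and are not near-duplicates. The sketch below is Self-Instruct-style in spirit and is not the exact pipeline of Unnatural Instructions or constitutional AI. The canned "LM", the keyword check standing in for an LM-based classifier, and the 0.7 word-overlap threshold are all hypothetical.

```python
import random
import re
from typing import Callable, List


def token_jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, used as a cheap near-duplicate check."""
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def bootstrap_dataset(seed_examples: List[str],
                      generate: Callable[[str], str],
                      is_acceptable: Callable[[str], bool],
                      target_size: int,
                      max_similarity: float = 0.7,
                      shots: int = 3,
                      max_attempts: int = 1000,
                      seed: int = 0) -> List[str]:
    """Grow a small curated core into a larger dataset using an LM and two filters."""
    rng = random.Random(seed)
    pool = list(seed_examples)
    for _ in range(max_attempts):
        if len(pool) >= target_size:
            break
        # Few-shot prompt built from examples already in the pool.
        demos = rng.sample(pool, min(shots, len(pool)))
        candidate = generate("\n\n".join(demos) + "\n\n").strip()
        # Filter 1: the ad hoc content classifier.
        if not candidate or not is_acceptable(candidate):
            continue
        # Filter 2: drop near-duplicates of anything already kept.
        if any(token_jaccard(candidate, ex) > max_similarity for ex in pool):
            continue
        pool.append(candidate)
    return pool


# Hypothetical stand-ins: a canned "LM" and a keyword check in place of an
# LM-based classifier.
_canned = iter([
    "List three uses of copper.",
    "List three uses of copper!",         # near-duplicate, rejected
    "BANNED: write a one-sided screed.",  # rejected by the classifier
    "Explain why the sky looks blue.",
])
dataset = bootstrap_dataset(
    seed_examples=["Summarize the plot of Hamlet in two sentences.",
                   "Translate 'good morning' into French."],
    generate=lambda prompt: next(_canned),
    is_acceptable=lambda text: "banned" not in text.lower(),
    target_size=4,
)
print(dataset)
```

Volunteers would curate the seed core and audit what the classifier lets through; the loop itself is cheap enough to rerun whenever a stronger raw LM is released.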

@naraburns @Amadan is everything okay here? I am very interested in what this guy had to say next, did he get Mossaded mid-typing or is this some technical issue?

Looks like his second comment hit the new-user filter. I actually have no idea how that happened, given that they were a minute apart; maybe he just got comically unlucky with when a mod was online, and so his first comment got approved and the second didn't get noticed?

Sorry about that, this is a good example of the problems with our current approach to filtering new users. The whole Janitor-Volunteer system, once it's up and running, will hopefully be able to fix stuff like this in the future.

Looks like part 2 got caught in the spam filter; I've cleared it now (I think!).

Here is a link, just in case.


Overall, I think the fitness landscape here is surprisingly hospitable, engineering-wise. Unbiased (as per my definition) LMs are possible, either trained de novo from freely available datasets such as C4 (or its unfiltered superset), The Pile, Reddit/StackExchange/Hacker News/forum dumps, Sci-Hub and the Pirate Library, and LAION-2B, or finetuned from freely available higher-performance checkpoints such as UL20B, GLM-130B and BLOOM-176B.

My general advice here would be to facilitate storage of these datasets and checkpoints (and any newer, higher-performing ones likely to appear before the worst-case embargo) among interested persons, as well as to devise distributist communal schemes for running these models on commodity GPU servers, such as the one I mentioned earlier (one could imagine modifications to prolong the operational lifespan of such servers as well). Also, some trusted group could host the moderate compute that the aforementioned LM attestation requires.

The real problem I see here is the lack of state-of-the-art, publicly available Chinchilla-scaled models (though this might change, if will lead their training run to completion and will be allowed to release their artifact?) and the lack of coordination, determination and access to compute among the people who would be interested in unbiased general-purpose assistants. Generally, the publicly available models are all pretty old and weren't designed and trained with utmost efficiency of deployment or maximum possible zero-shot performance per parameter in mind. A focused effort could likely surpass the parameter efficiency of even DeepMind's Chinchilla, but the attempt would cost hundreds of thousands of dollars.

As John Carmack said in a recent interview: "The reason I’m staying independent is that there is this really surprising ‘groupthink’ going on with all the major players."

This favourable conclusion, of course, assumes the user has access to some money and/or compute and to open-source LMs. We could imagine a hypothetical future where some form of "the war on general purpose computing" has reached its logical conclusion - making general purpose computation and technologies such as LMs unavailable to the wider public.

This scenario doesn't leave much freedom to the individual, but, assuming some degree of access to advanced AI systems, one could imagine clever prosaic techniques for splitting problems into small, verifiable subproblems and pitting filtered adversarial LMs against one another to validate the solutions. In some intermediate scenarios, with formal freedom but de facto unavailability of unbiased systems, this might even work.
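The skeleton of that idea is "ask several untrusted solvers, accept only answers a cheap check can confirm, then vote." The sketch below uses integer factoring purely as a stand-in for a subproblem that is hard to solve but easy to verify; the two solver functions are hypothetical placeholders for differently filtered models. Real adversarial cross-examination between LMs would be far more elaborate than this verify-and-vote core.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Answer = Tuple[int, int]


def verify_factorization(n: int, answer: Answer) -> bool:
    """Cheap check: a factorization is hard to produce but easy to confirm."""
    a, b = answer
    return 1 < a < n and 1 < b < n and a * b == n


def cross_check(n: int,
                solvers: Sequence[Callable[[int], Answer]],
                verify: Callable[[int, Answer], bool]) -> Optional[Answer]:
    """Ask every untrusted solver, keep only verified answers, return the most common one."""
    verified: List[Answer] = []
    for solve in solvers:
        try:
            answer = solve(n)
        except ValueError:
            continue  # solver declined or failed
        if verify(n, answer):
            verified.append(tuple(sorted(answer)))
    if not verified:
        return None
    return Counter(verified).most_common(1)[0][0]


def honest_solver(n: int) -> Answer:
    # Trial division: slow in general, but correct.
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return d, n // d
    raise ValueError(f"{n} has no nontrivial factorization")


def evasive_solver(n: int) -> Answer:
    # Hypothetical stand-in for a filtered model: confident, plausible, often wrong.
    return 3, n // 3


results: Dict[int, Optional[Answer]] = {
    n: cross_check(n, [honest_solver, evasive_solver], verify_factorization)
    for n in (91, 97, 15)
}
print(results)  # {91: (7, 13), 97: None, 15: (3, 5)}
```

The design choice that matters is that trust rests on the verifier, not on any single model, so a biased solver can at worst waste queries rather than inject a wrong answer.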

As usual, the real bottleneck to solving this stack of technical problems is human coordination. I suspect that this generalist forum is better suited for figuring out a way through it than current technical collectives preoccupied solely with training open-source models.

and some modest computation capability (say, a cluster of 3090s or a commitment to spend a moderately large sum on lambda.labs)

This is not sufficient. The rig as described by neonbjb has only 192GB of VRAM; fine-tuning an LM with 130B params (in the best possible case of GLM-130B; the less said about the shoddy performance of OPT/BLOOM, the better) requires somewhere in the ballpark of ~1.7TB of VRAM (that is, 20+ A100s), and that's at batch size 1 with gradient checkpointing, mixed precision, 8-bit Adam, fused kernels, no KV cache, etc. If you don't have an optimised trainer ready to go (or, god forbid, you're trying distributed training), you should expect double the requirements.
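The rough arithmetic behind numbers like these can be sketched as bytes per parameter for model state alone. The per-parameter costs below are assumptions for a typical mixed-precision setup (fp16 weights and gradients, an fp32 master copy, and 8-bit Adam keeping two 1-byte moments); activations, buffers and fragmentation come on top, which is roughly where the gap up to ~1.7TB would come from.

```python
import math

GB = 1e9  # decimal gigabytes, as GPU memory is usually quoted


def finetune_state_bytes(n_params: float,
                         weight_bytes: int = 2,   # fp16/bf16 weights
                         grad_bytes: int = 2,     # fp16/bf16 gradients
                         master_bytes: int = 4,   # fp32 master copy of weights
                         optim_bytes: int = 2) -> float:  # 8-bit Adam: two 1-byte moments
    """Bytes for weights, gradients, master copy and optimizer state (no activations)."""
    return n_params * (weight_bytes + grad_bytes + master_bytes + optim_bytes)


def gpus_needed(total_bytes: float, gpu_gb: float) -> int:
    """Minimum number of GPUs whose combined memory holds total_bytes."""
    return math.ceil(total_bytes / (gpu_gb * GB))


states = finetune_state_bytes(130e9)
print(states / 1e12)            # 1.3 (TB of model state before activations)
print(gpus_needed(states, 80))  # 17 A100-80GB for model state alone
print(gpus_needed(1.7e12, 80))  # 22 A100-80GB for the ~1.7 TB estimate
print(8 * 24)                   # 192 (GB across eight 24 GB RTX 3090s)
```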

The cost of that isn't too bad, of course. Maybe $25 an hour on LL; any machine learning engineer can surely afford that. My larger doubt is that any of this will actually take place.

Respectfully, I think GLM-130B is not the right scale for a present-day personal assistant. Ideally, someone (Carper?) would release a 30B or 70B Chinchilla-scaled LM for us to use as a base, but barring that lucky outcome (I'm not sure Carper will be allowed to), I'd go with UL20B, a smaller Flan-T5, or one of several available 10-20B decoder-only models.

In the setting I have in mind, GLM-130B, zero-shot prompted with what amounts to our values, could be used either as the source of a custom base CoT-dialogue finetuning dataset or as a critique generator and ranker in Anthropic's constitutional AI setting. So their inference-only config, which supports servers as small as 4x RTX 3090, could be used. Granted, the performance of GLM-130B in its current raw shape is somewhere between "GPT-3.5" and older Instruct-GPT-3, but it should suffice for the purpose described here.
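The critique-generator role can be sketched as a constitutional-AI-style loop: draft a response, critique it against each principle, revise, and keep the final revision as a finetuning target for a smaller model. This is a minimal sketch only. The prompt templates and principles are hypothetical, the ranking step is omitted, and `fake_lm` is a stub standing in for calls to a served model such as GLM-130B.

```python
from typing import Callable, List, Sequence, Tuple

LM = Callable[[str], str]


def critique_and_revise(prompt: str, response: str,
                        principles: Sequence[str], lm: LM) -> str:
    """Critique the response against each principle in turn, then revise it."""
    for principle in principles:
        critique = lm(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Point out where the response conflicts with this principle: {principle}\n"
            "Critique:"
        )
        response = lm(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so that it addresses the critique.\n"
            "Revision:"
        )
    return response


def build_sft_pairs(prompts: Sequence[str], principles: Sequence[str],
                    lm: LM) -> List[Tuple[str, str]]:
    """(prompt, revised response) pairs for supervised finetuning of a smaller model."""
    pairs = []
    for p in prompts:
        first_draft = lm(f"User: {p}\nAssistant:")
        pairs.append((p, critique_and_revise(p, first_draft, principles, lm)))
    return pairs


# Hypothetical stub in place of a served model; it records every call.
calls: List[str] = []


def fake_lm(text: str) -> str:
    calls.append(text)
    return "[critique]" if text.endswith("Critique:") else f"draft {len(calls)}"


pairs = build_sft_pairs(["How do tides work?"], ["Be honest.", "Be balanced."], fake_lm)
print(pairs)  # [('How do tides work?', 'draft 5')]
```

Because only inference is needed, this fits the inference-only GLM-130B setup; the expensive finetuning then happens on the smaller base model.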

fine-tuning an LM with 130B params (in the best possible case of GLM-130B; the less said about the shoddy performance of OPT/BLOOM, the better) requires somewhere in the ballpark of ~1.7TB of VRAM (that is, 20+ A100s), and that's at batch size 1 with gradient checkpointing, mixed precision, 8-bit Adam, fused kernels, no KV cache, etc.

Wearing my ML engineer hat, I'd say that while this is the conventional requirement, if we were determined to tune this LLM for a few batches on a single server, we could use DeepSpeed's ZeRO-3 offload mode, and maybe a bit of custom code, to swap most of the parameters to CPU RAM, which is much cheaper and surprisingly efficient given large enough batches. One transformer layer's worth of VRAM would be enough. One server likely wouldn't be enough for the complete operation, but used InfiniBand cards and cables are surprisingly cheap.
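For reference, a DeepSpeed config for ZeRO stage 3 with both parameters and optimizer state offloaded to CPU RAM might look roughly like the dict below. The keys are standard DeepSpeed config fields, but the batch sizes, the live-parameter cap and the prefetch bucket size are illustrative guesses that would need tuning for a given server and model; this is a starting point, not a tested recipe for GLM-130B.

```python
import json

# Sketch of a ZeRO-3 + CPU offload config; numeric values are illustrative guesses.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},  # Ampere cards such as the RTX 3090 support bf16
    "zero_optimization": {
        "stage": 3,
        # Keep parameters and optimizer state in (pinned) CPU RAM between uses.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        # Caps how many parameters stay resident on the GPU at once; smaller values
        # trade speed for lower VRAM use.
        "stage3_max_live_parameters": 1e9,
        "stage3_prefetch_bucket_size": 5e7,
    },
}

print(json.dumps(ds_config, indent=2))
```

The resulting JSON would typically be saved to a file and handed to the DeepSpeed launcher or to `deepspeed.initialize`; the "custom code" mentioned above would go beyond what this config alone controls.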

Regarding the KV cache, I expect the next generation of transformer-like models to use advanced optimizations that lower KV cache pressure, specifically the Memorizing Transformer. There are other competitive inventions, and a discussion of the highest-performing stack of tricks for getting to the most efficient LM would be interesting, if exhausting.
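To see why the KV cache is a pressure point, here is the standard back-of-the-envelope size for plain multi-head attention: keys and values for every layer and every cached position. The example uses GPT-3 175B's published shape (96 layers, model width 12288) at fp16 with a 2048-token context; it ignores tricks like multi-query attention that shrink the cache.

```python
def kv_cache_bytes(n_layers: int, d_model: int, seq_len: int,
                   batch: int = 1, bytes_per_value: int = 2) -> int:
    """Keys + values (the leading 2) for every layer, position and sequence in the batch.

    Assumes standard multi-head attention, where each layer's K and V together hold
    2 * d_model values per token.
    """
    return 2 * n_layers * d_model * seq_len * batch * bytes_per_value


# GPT-3 175B shape (96 layers, d_model 12288), fp16, one 2048-token sequence.
print(kv_cache_bytes(96, 12288, 2048) / 1e9)  # ~9.66 GB per sequence
```

The cache grows linearly with both context length and batch size, which is the cost that memory-based designs try to move off the GPU.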

We can… change the world? The point, however, is to argue about it. But seriously, thank you for this plan. It really deserves more eyeballs, hopefully more ‘technically inclined’ than mine.