@Grant_us_eyes comments on "Culture War Roundup for the week of February 27, 2023"

Culture War Roundup for the week of February 27, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

ThenElection 2yr ago

Facebook's LLaMa{-7B,-13B,-30B,-65B} has apparently been leaked on 4chan via torrent. Amusingly, the leaker included sufficient info to identify himself in the leak: basic opsec, people!

It's still not quite runnable for most hobbyists, but give it time. For better or worse, the democratization of AI continues.

Grant_us_eyes ThenElection 2yr ago

I'll be the one to ask the stupid question: for those of us who haven't been exhaustively following software development, what does 'LLaMa{-7B,-13B,-30B,-65B}' actually mean?

DaseindustriesLtd late version of a small language model Grant_us_eyes 2yr ago · Edited 2yr ago

Like already answered, this is the number of parameters. A parameter is the same thing as a weight, a unit loosely inspired by the synapse in biological systems like ourselves: a coefficient that is adjusted during training to reduce the predictive error, maximize reward or however else the objective function is defined for the purpose of a given project.
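To make "a coefficient that is adjusted during training to reduce the predictive error" concrete, here is a minimal sketch of a model with exactly one parameter. Everything in it is made up for illustration: the function name `train_single_weight`, the toy data (y = 3x), the learning rate and the step count are all assumptions, and real language models repeat this kind of update across billions of weights with far more elaborate objectives.

```python
# Toy illustration (hypothetical names and data): a "model" with one parameter w,
# trained by gradient descent so that w * x matches y = 3 * x.

def train_single_weight(xs, ys, lr=0.01, steps=500):
    w = 0.0  # the single parameter (weight), starting from an arbitrary value
    for _ in range(steps):
        # Gradient of the mean squared error (w*x - y)^2 with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Nudge the weight in the direction that reduces the error.
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]
learned_w = train_single_weight(xs, ys)
print(learned_w)  # ends up very close to 3.0
```

A 7B model is the same idea with seven billion such numbers instead of one.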

You can consider the number of parameters to be a measure of a neural network's expressivity: theoretically, the more parameters there are, the more algorithms, or more complex ones, can be learned/approximated by the model (this is a nice elegant illustration of the sense in which a neural network learns to represent an algorithm). But in practice, for now it seems that most models, and virtually all models released prior to Google's Chinchilla, are grossly overparametrized: a smaller network trained in a reasonable way on the same amount of data learns more or less the same skills, and a smaller model trained for longer learns qualitatively more, in that it actually reaches the underlying algorithms that allow it to find solutions in the general case, and doesn't just memorize superficial patterns or even raw data itself.*

In this case, LLaMA-13B (13 billion parameters) is allegedly equal in benchmark performance/apparent "intelligence" to GPT-3-175B, so it's more parameter-efficient by a factor of roughly 13.46 (175/13), and also vastly more efficient in terms of training expense. The main secret is that it was exposed to 1 trillion tokens (a token is a character group that's basically equivalent to a short word, see here), whereas GPT-3 only saw 500 billion. (It must be added that the average LLaMA token is shorter, because it uses character-level tokenization for numbers, so it should also have better arithmetic.) The biggest LLaMA is trained on 1.4T tokens like Chinchilla-70B (with the same caveat about tokenizing numbers) and, for some not so trivial reasons, is slightly better still.
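The arithmetic behind those comparisons can be sketched in a few lines. The parameter and token counts below are simply the figures quoted in this comment (including the 500 billion tokens attributed to GPT-3), not independently checked values, and `tokens_per_param` is a hypothetical helper name.

```python
# Figures as quoted in the comment above; treat them as the commenter's numbers.
GPT3_PARAMS = 175e9
LLAMA_13B_PARAMS = 13e9

# How many times fewer parameters LLaMA-13B has than GPT-3-175B.
param_ratio = GPT3_PARAMS / LLAMA_13B_PARAMS


def tokens_per_param(tokens, params):
    """Training tokens seen per parameter: a rough 'how long was it trained' measure."""
    return tokens / params


models = {
    # name: (parameters, training tokens)
    "GPT-3-175B": (175e9, 500e9),
    "LLaMA-13B": (13e9, 1e12),
    "LLaMA-65B": (65e9, 1.4e12),
    "Chinchilla-70B": (70e9, 1.4e12),
}

print(f"parameter ratio: {param_ratio:.2f}")
for name, (params, tokens) in models.items():
    print(f"{name}: {tokens_per_param(tokens, params):.1f} tokens per parameter")
```

On these numbers the smaller LLaMA sees dozens of tokens per parameter while GPT-3 sees only a handful, which is the "smaller model trained for longer" point in numeric form.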

Aside from the total number, what matters is parameter precision. Models are usually distributed with fp32 weights. As Elon Musk notes, int8 (1 byte per parameter) is fine for inference. @ThenElection may be wrong here; I think 7B and even 13B will run just fine – after some tuning by nice anons, of course – on recent Apple Silicon MacBooks, with even 33B possible on the top-of-the-line 64GB version** (curiously, in one benchmark, the 33B model is superior to the 65B one).
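A back-of-the-envelope sketch of why precision matters for what fits in memory: weight storage is roughly parameter count times bytes per parameter. The helper name `weight_memory_gb` is made up for illustration, the estimate counts weights only (activations, caches and runtime overhead come on top), and it uses decimal gigabytes.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params, precision):
    """Approximate memory for the weights alone, in decimal GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for size in (7e9, 13e9, 33e9, 65e9):
    row = ", ".join(
        f"{precision}: {weight_memory_gb(size, precision):.0f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{size / 1e9:.0f}B parameters -> {row}")
```

At int8 the 33B weights come to about 33 GB, which leaves headroom on a 64GB machine, while the 65B weights alone would already exceed it.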

See @Porean's experimental results here and the recent AAQC winner @TransgenicSolution's related note here.

*That said, super-large models still seem to have unique emergent capabilities, though as we proceed with training Chinchilla-proportioned models, fewer and fewer such capabilities remain. Before UL2-20B, the consensus was that you need like 60B or 100+ to get advantages from chain-of-thought prompting.

** tfw no 64GB M3 MacBook to run your personal genie

Edits: typos

Grant_us_eyes DaseindustriesLtd 2yr ago

You and everyone else answered this wonderfully, thank you.

I confess, a part of me can't help but be excited at the notion of this getting 'out to the masses', so to speak, and what weaponized autism will do with such a tool.

Fun times ahead, I think.

jeroboam Grant_us_eyes 2yr ago

It feels like the long-predicted spampocalypse might now become a reality.

Notably, this model is quite a lot better than what state actors previously had unfettered access to, if they decide to go that route.
