Porean

3 followers   follows 1 user
joined 2022 September 04 23:18:26 UTC

User ID: 266


Roughly speaking, I see your point and agree that it's possible we're just climbing a step further up on an infinite ladder of "things to do with computers".

But I disagree that it's the most likely outcome, because:

  1. I think the continued expansion of the domain space for individual programmers can be partially attributed to Moore's Law. More Is Different; a JavaScript equivalent could've easily been developed in the 80s but simply wasn't, because there wasn't enough computational slack at the time for a sandboxed, garbage-collected, asynchronous scripting language to run complex enterprise graphical applications. Without the regular growth in computational power, I expect innovations to slow.

  2. Cognitive limits. Say a full stack developer gets to finish their work in 10% of the time. Okay, now what? Are they going to spin up a completely different project? Make a fuzzer, a GAN, a SAT solver, all for fun? The future ability of AI tools to spin up entire codebases on demand does not help in the human learning process of figuring out what actually needs to be done. And if someone makes a language model to fix that problem, then domain knowledge becomes irrelevant and everyone (and thus no one) becomes a programmer.

  3. I think, regardless of AI, that the industry is oversaturated and due for mass layoffs. There are currently weak trends pointing in this direction, but I wouldn't blame anyone for continuing to bet on its growth.

Start a substack. Please. Perfection is the enemy of good, and you are really good.

Whenever someone brings up the photography analogy, I always think they're completely missing the point. It's almost like you're Seeing Like a State -- artists exist now, revolution happens, artists exist after.

What you're neglecting to mention is that the artists that exist in the present will not be the artists of the future. We had photorealistic painters, and later we had photographers. The latter were not made of the former. People will suffer, perish, anguish, and all of this stuff is important for understanding how things play out in the near future.

I predict advertising will become far more ubiquitous with the rise of Dall-E and similar image producing AIs. The cost of creating extremely compelling, beautiful ads will plummet, and more and more of our daily visual space will become filled with non stop advertising.

I predict it won't, honestly. You currently need a 20B parameter model to generate pictures with readable text, and then you need a marketing expert to filter for the best generated outputs anyway. Maybe a year from now, Google will train a static ad generator based on their AdSense data, but those are still just static ads. They don't perform that well. You need animated visuals at the very least, or a video if possible, and that kind of technology just isn't here yet -- not to mention how expensive it'd be.

30s scripted ads on YouTube are not going to come from AI within the next 1-2 years. Maybe 5. But by the time text2YouTubeAd comes out, we'll have far more problems than more attractive advertisements.

where did you learn that from?

and some modest computation capability (say, a cluster of 3090s or a commitment to spend a moderately large sum on lambda.labs)

This is not sufficient. The rig as described by neonbjb is only 192GB of vram; fine-tuning an LM with 130B params (in the best possible case of GLM-130B; the less said about the shoddy performance of OPT/BLOOM, the better) requires somewhere in the ballpark of 1.7TB of vram (at least 20+ A100s), and that's at batch size 1 with gradient checkpointing, mixed precision, 8-bit Adam, fused kernels, no kv cache, and so on. If you don't have an optimised trainer ready to go (or god forbid, you're trying distributed training), you should expect double the requirements.
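For intuition, here's the back-of-envelope version of that estimate. The per-parameter byte counts are rough conventions for mixed precision + 8-bit Adam, and the activation figure is a guess on my part, not a measurement:

```python
# Back-of-envelope VRAM estimate for fine-tuning a 130B-param model.
# Byte counts per parameter are rough conventions, not measured values.
def finetune_vram_tb(params_b,
                     bytes_weights=2,     # fp16/bf16 weights
                     bytes_grads=2,       # fp16 gradients
                     bytes_master=4,      # fp32 master copy for the optimiser
                     bytes_optim=2,       # two int8 Adam moment buffers
                     activation_gb=300):  # checkpointed activations, a guess
    param_bytes = params_b * 1e9 * (bytes_weights + bytes_grads
                                    + bytes_master + bytes_optim)
    return param_bytes / 1e12 + activation_gb / 1e3

total_tb = finetune_vram_tb(130)  # roughly 1.6TB
a100s = total_tb * 1e3 / 80       # number of 80GB A100s, roughly 20
print(total_tb, a100s)
```

Tweak the assumptions however you like; the answer stays in the "many tens of thousands of dollars of hardware" regime.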

The cost of that isn't too bad, of course. Maybe $25 an hour on LL; any machine learning engineer can surely afford that. The larger doubt I have is that any of this will take place.

Resolve?

For the people that might get duped like me: no, this article isn't about figuring out the true statistically rigorous answer for what a medium-sized breast looks like.

I've always wondered if the parentheses attention format was intentionally designed for humour.

If it does then it will be smart enough to self-modify,

This does not work out the way you think it will. A p99-human tier parallelised unaligned coding AI will be able to do the work of any programmer, will be able to take down most online infrastructure by merit of security expertise, but won't be sufficient for a Skynet Uprising, because that AI still needs to solve for the "getting out of the digital box and building a robot army" part.

If the programming AI was a generalised intelligence, then of course we'd be all fucked immediately. But that's not how this works. What we have are massive language models that are pretty good at tackling any kind of request that involves text generation. Solve for forgetfulness in transformer models and you'll only need one dude to maintain that full stack app instead of 50.

Misogynist (in the feminist sense) would be more accurate. There is zero mention of anything related to getting laid.

Leave the rest of the internet at the door.

Or could you at least have something more substantial to talk about than, "redditors upvote dumb shit, news at 11"?

Premise #2: Within the unit of people we care about, we care about everyone equally.

I think this premise specifically is inherently anti-utilitarian. How can you assign the same utility to each individual when there's so much variance? When the actions and roles and beliefs and experiences of two people can differ so greatly?

I can't draw conclusions without knowing what kind of degenerate you are. If you're into hentai, the waifu diffusion model was trained on the 1.4 SD checkpoint && has much room for improvement. If you're a furry, fine-tuned models are currently a WIP and will be available soon. If you're a normal dude, I don't really understand because I honestly think it's good enough at this point.

The only thing I think is really poorly covered at the moment is obscure fetish content. A more complicated mixture of fine-tuning + textual inversion might be needed there, but I do truly believe the needs of >>50% of coomers are satisfiable by machines at this point.

Edit: I am less confident of my conclusion now.

I tend to stare down the same paragraph for two hours and finally squeeze out, word by painful word, something that sounds like the ramblings of a schizophrenic with aphasia

The problem is that you are not writing fast enough. Think about text too slowly and the words will blend together and lose all meaning. Put your brain into Word Salad Generation mode and just dump as you would into a Motte comment; you can edit for style/tone/content once you actually have something to edit.

I've shilled this before, but you should really try The Most Dangerous Writing App to knock out a first draft. As described by Alexey Guzey:

DO ACTUALLY TRY THIS DON’T FLINCH AWAY. This app might seem like the dumbest thing in the world but it DOES REALLY HELP. And if it doesn’t work, you will just lose 5 minutes.

I hate how much coverage the AI/rat community is giving to "Loab". It seems abundantly clear to me it's a social hoax (or at least just a funny art exhibition) rather than demonstrating anything insightful into the latent space of diffusion models.

Most of your post is in line with what I believe. The information workers in blue tribe will turn to protectionism as AI-generated content supersedes them. Red tribe blue-collar workers will suffer the least, and the Republicans will have their first and last opportunity to lure techbros away from the progressive sphere of influence.

There is one thing, though.

I simply do not foresee Republicans being likely to make AI regulations (or deregulation) a major policy issue in any near-term election, whilst I absolutely COULD see Democrats doing so.

It only takes one partisan to start a conflict. Republicans might not initially care, but once the democrats do, I expect it'll be COVID all over again -- sudden flip and clean split of the issue between parties.

But this is just nitpicking on my part.

  1. We aren't important enough. We have about a dozen thousand users that do not-much more than words-words-words in a closed community.

  2. We have some pretty good programmers onboard. The codebase is probably not clean right now, but I think it's a matter of time.

Any independent replications?

Sure.

OCR-VQGAN

Ah, interesting!

Training consumes far more matmuls than inference. LLM training operates at batch sizes in the millions -- so if you aren't training a new model, you have enough GPUs lying around to serve millions of customers.
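The usual rule of thumb behind that claim (my framing, not from the comment above): a forward pass costs roughly 2N FLOPs per token for an N-parameter dense model, while training (forward + backward) costs roughly 6N. So hardware provisioned for training covers about 3x its training token throughput when repurposed for inference, before even counting that training runs continuously at enormous batch sizes:

```python
# Rough FLOPs-per-token rule of thumb for dense transformer models:
# forward pass ~2N FLOPs/token, training step (fwd + bwd) ~6N.
def flops_per_token(n_params, training):
    return (6 if training else 2) * n_params

N = 175e9  # a GPT-3-sized model, purely illustrative
ratio = flops_per_token(N, training=True) / flops_per_token(N, training=False)
print(ratio)  # 3.0
```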

My instinct is that this should be smaller and easier than the Stable Diffusion I run on my PC, but maybe I am just super wrong about that?

Super-wrong is correct. Nobody has a consumer-sized solution for that, and if it ever happens it'll be huge news.

TLDR: it should be possible for any chump with 12GB of ordinary RAM, or some combination of offloaded RAM+vRAM that sums to 9GB, because running encoder-only is fast enough. Tests and stats mostly extrapolated from T5-3B because of personal hardware constraints (converting models costs much more memory than loading them).

There are markdown tables in this comment that do not display correctly on the site, despite appearing correctly on the comment preview. You may wish to paste the source for this comment into a markdown preview site.


To start, T5-XXL's encoder is actually 4.6B, not 5.5. I do not know why the parameters aren't evenly split between the encoder & decoder, but they aren't.

Additionally, it's likely that int8 quantisation will perform well enough for most users. load_in_8bit was recently patched to work with T5-like models, so that brings the memory requirements for loading the model down to ∼5GB.

What about vram spikes during inference? Well, unlike SD, the memory use of T5 is not going to blow significantly beyond what its parameter count would imply, assuming the prompts remain short. Running T5-3B from huggingface [0], I get small jumps of:

| dtype | vram to load | .encode(11 tokens) | .encode(75 tokens) |
|-|-|-|-|
| 3B-int8 | 3.6GB | 4.00GB | 4.35GB |
| 3B-bf16 | 6.78GB | | 7.16GB |

Note that the bump in memory for bf16 is smaller than for int8 because int8 does on-the-fly type promotion shenanigans.

Extrapolating these values to T5-XXL, we can expect bumps of (0.4∼0.8) * 11/3 = 1.5∼3GB of memory use for an int8 T5-XXL encoder, or <1.5GB for a bf16 encoder. We should also expect the model to take 10∼20% extra vram to load than what its parameters should imply.

So, an ideal int8 T5-XXL encoder would take up to (4.6*1.15+3)GB, or slightly more than 8GB of vram during runtime. That still locks out a substantial number of SD users -- not to mention the 10xx series users who lack int8 tensor cores to begin with. Are they fucked, then?
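Spelling that arithmetic out, using the measured T5-3B jumps from the table above:

```python
# Reproduce the T5-XXL int8 estimate from the measured T5-3B numbers.
enc_params_gb_int8 = 4.6              # 4.6B encoder params at 1 byte each
load_overhead = 1.15                  # the "10-20% extra vram to load" midpoint
bump_3b = (4.00 - 3.6, 4.35 - 3.6)    # measured int8 encode jumps on T5-3B (GB)
scale = 11 / 3                        # T5-XXL / T5-3B parameter ratio
bump_xxl = tuple(round(b * scale, 2) for b in bump_3b)  # worst case just under 3GB
peak_gb = enc_params_gb_int8 * load_overhead + bump_xxl[1]
print(bump_xxl, peak_gb)              # slightly more than 8GB at the high end
```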


Short answer: no, we can get away with CPU inference via ONNX.

I first came across the idea below a Gwern comment. Given that prompts are limited to 77 tokens, would it be possible to run the encoder in a reasonable amount of wall time? Say, <60s.

Huggingface's default settings are atrociously slow, so I installed the ONNX runtime for HF Optimum and built ONNX models for T5-3B [1]. Results:

| quantized? | model size on disk | python RAM after loading (encoder+decoder) | model.encoder(**input) duration | full seq2seq pass |
|-|-|-|-|-|
| no | 4.7+6.3GB | 17.5GB | 0.27s | 42s |
| yes | 1.3+1.7GB | 8.6GB | 0.37s | 28s |

I'm not sure whether I failed to use the encoder correctly here, considering how blazing fast the numbers I got were. Even if they're wrong, an encoder pass on T5-XXL is still likely to fall below 60s.

But regardless, the tougher problem here is RAM use. Assuming it is possible to load the text encoder standalone in 8bit (I have not done so here due to incompetency, but the model filesizes are indicative), the T5-XXL text encoder would still be too large for users with merely 8GB of RAM to use. An offloading scheme with DeepSpeed would probably only marginally help there.


[0] - example code to reproduce:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "t5-3b"
PROMPT = "..."

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(
    model_name, device_map='auto', low_cpu_mem_usage=True,
)  # add torch_dtype=torch.bfloat16 OR load_in_8bit=True here
inputs = tokenizer(PROMPT, return_tensors='pt')
output = model.encoder(**inputs)
```

[1] - example code for ONNX model creation:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_name = "t5-3b"
model_name_local = "./t5-3b-ort"
model_name_quantized = "./t5-3b-ort-quantized"


def create_ORT_base():
    # export the HF checkpoint to ONNX and save it locally
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name, from_transformers=True)
    model.save_pretrained(model_name_local)


def create_ORT_quantized():
    model = ORTModelForSeq2SeqLM.from_pretrained(model_name_local)
    model_dir = model.model_save_dir

    # one quantizer per exported ONNX graph
    encoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="encoder_model.onnx")
    decoder_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_model.onnx")
    decoder_wp_quantizer = ORTQuantizer.from_pretrained(model_dir, file_name="decoder_with_past_model.onnx")
    quantizers = [encoder_quantizer, decoder_quantizer, decoder_wp_quantizer]

    # dynamic int8 quantization targeting AVX512-VNNI CPUs
    dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    for q in quantizers:
        q.quantize(save_dir=model_name_quantized, quantization_config=dqconfig)
```

I didn't have any good place to add this in my post, but it's worth noting that caching of text embeddings will help a lot with using T5-XXL. Workflows that involve large batch sizes/counts || repeated inpaintings on the same prompt do not need to keep the text encoder loaded permanently. Similar to the --lowvram mechanism implemented now, the text encoder can be loaded on demand, only when the prompt changes, saving memory costs.
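A sketch of that caching scheme. The `load_encoder`/`unload` hooks and the class itself are hypothetical names of mine standing in for the real (expensive) T5-XXL load and free; the point is only that the encoder needs to be resident exclusively when the prompt changes:

```python
# Cache prompt embeddings so the text encoder is only loaded on demand,
# similar in spirit to the --lowvram mechanism. `load_encoder`/`unload`
# are hypothetical stand-ins for the real model load/free.
class PromptEmbeddingCache:
    def __init__(self, load_encoder, unload):
        self._load, self._unload = load_encoder, unload
        self._cache = {}

    def get(self, prompt):
        if prompt not in self._cache:
            encoder = self._load()           # load only on a cache miss
            self._cache[prompt] = encoder(prompt)
            self._unload(encoder)            # free vram before diffusion runs
        return self._cache[prompt]

# toy usage: the "encoder" just uppercases, and we count how often it loads
loads = []
cache = PromptEmbeddingCache(
    load_encoder=lambda: loads.append(1) or str.upper,
    unload=lambda enc: None,
)
cache.get("a cat"); cache.get("a cat"); cache.get("a dog")
print(len(loads))  # 2: repeating a prompt never reloads the encoder
```

Large batch counts and repeated inpaintings on one prompt then pay the encoder cost exactly once.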

I miss the past.

Then you've missed the point of the article entirely? It's an election prediction site trying to put forward a case for a Republican electoral victory. It would be very odd and partisan to portray Redd as an anti-strategist who doesn't care about the outcome and "prefers to die on that hill and lose election".