rae
1 follower · follows 1 user · joined 2023 March 03 06:14:49 UTC · User ID: 2231
A linear combination of eigengenders


Great writing as per usual, although I'm not too sure what the culture war angle is here.

Your tidbit about inflation has left me wondering: how exaggerated is the purported social and economic decay in the UK? The impression I'm getting from abroad is some of the lowest wages in Western Europe coupled with an extremely high cost of living. The salaries for some professionals are comparable to Eastern Europe even before purchasing power parity. Everything from education to the NHS is underfunded. Yet somehow the price of goods and rent keeps climbing, especially in London.

But at the same time I think they have some frustration about all the lay-peeps writing long posts full of complex semantic arguments that wouldn't pass technical muster (directionally).

The issue is that OP is the lay person writing a long post full of complex semantic arguments that don't pass technical muster, while passing themselves off as a credentialed expert and accusing others of doing what they themselves are doing. That tends to rile people up.

It comes across as a bitter nasty commentariat incredulous that someone would dare to have a different opinion from you.

I don't think the issue is OP's opinion. The issue I had was listing off credentials before making completely incorrect technical explanations, doubling down rather than admitting they made a mistake, and judging researchers based on the fact that they don't hold any US or EU patents.

More like saying that the Soyuz rocket is propelled by expanding combustion gases, only for someone to pop in and say no, it's actually propelled by a mixture of kerosene and liquid oxygen.

I'm sorry but what you said was not equivalent, even if I try to interpret it charitably. See:

An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word.

The LLM, on its own, directly takes the block of text and gives you the probability of the next word/token. There is no "second algorithm" that takes in a block of text, and there is no "distribution analysis". If I squint, maybe you are referring to a sampler, but that has nothing to do with taking a block of text, and it is not strictly speaking necessary (sampling is even dropped in some benchmarks).
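
To make that concrete, here is roughly what I mean. This is a minimal sketch using the Hugging Face transformers library, with GPT-2 and a throwaway prompt standing in for whatever model you had in mind; nothing here is from your setup, it's purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works the same way; GPT-2 is just small enough to run anywhere.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The Soyuz rocket is propelled by"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab_size)

# The model's own output, softmaxed, is already the distribution over the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(i))!r}: {p.item():.3f}")
```

No second algorithm, no separate distribution analysis: one forward pass of the model gives you the probability of every token in the vocabulary.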

I would ask that you clarify what you meant by that sentence at the very least.

The old cliché about asking whether a submarine can swim is part of why I made a point to set out my parameters at the beginning; how about you set out yours?

The only question I care about is, what are LLMs useful for? The answer is an ever-expanding list of tasks and you would have to be out of touch with reality to say they have no real-world value.

I don't know if you realize this, but you come across as extremely condescending and passive-aggressive in text. It really is quite infuriating. I would sit down, start crafting a response, and as I worked through your post I would just get more angry and frustrated, until getting to the point where I'd have to step away from the computer lest I lose my temper and say something that would get me moderated.

I would say perhaps I do deserve that criticism, but @self_made_human has made lengthy replies to your posts and consistently made very charitable interpretations of your arguments. Meanwhile you have not even admitted to the possibility that your technical explanation might have been at the very least misleading, especially to a lay audience.

You and @rae are both talking about vector-based embedding like it's something that a couple of guys tried back in 2013 and nobody ever used again, rather than a methodology that would go on to become a de facto standard approach across multiple applications.

I literally said you can extract embeddings from LLMs. Those are useful in other applications (e.g. you can use the intermediate layers of Llama to get the text embedding for an image-gen model à la HiDream), but they are irrelevant to the basic functioning of an LLM chatbot. The intermediate-layer "embeddings" are absolutely huge features (even a small model like Llama 7B will output a tensor of shape N×32×4096, where N is the sequence length), and in practice you will want to keep only the middle layers, which hold more useful information for most use cases.
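
For what it's worth, pulling those intermediate-layer features out looks roughly like this (a sketch assuming the Hugging Face transformers API; the model name is a placeholder and the layer choice is arbitrary):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM exposes hidden states the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("a block of text to embed", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple with one entry per layer (plus the input embeddings),
# each of shape (1, seq_len, hidden_dim); hidden_dim is 4096 for a 7B Llama.
stacked = torch.stack(outputs.hidden_states, dim=1)   # (1, n_layers + 1, seq_len, 4096)
middle = stacked[:, stacked.shape[1] // 2]            # keep a middle layer as the "embedding"
print(stacked.shape, middle.shape)
```

Note that this is a side use of the network's internals, not something the chatbot pipeline needs in order to produce its next word.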

To reiterate: LLMs are not trained to output embeddings; they directly output the probability of every possible token, and you do not need any "interface layer" to find the most probable next word. You can do that just by calling torch.max() on the output (although that's not what is usually done in practice). You do need some scaffolding to turn them into practical chatbots, but that's more in the realm of text formatting/mark-up. Base LLMs will have a number of undesirable behaviours (such as not differentiating between predicting the user's and the assistant's output, since base LLMs are just raw text-prediction models), but they will happily give you the most probable next token without any added layers, and making them output continuous text just takes a for loop.
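
Again, purely as an illustration (a toy sketch with GPT-2 standing in for any base LLM; this is not your code or anyone's production stack), the whole "interface layer" amounts to an argmax and a for loop:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The old cliche about asking whether a submarine", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                               # generate 20 tokens greedily
        logits = model(ids).logits                    # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()              # most probable next token, no sampler needed
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0].tolist()))
```

Everything else (chat templates, stop tokens, sampling strategies) is convenience on top, not a prerequisite for the model to predict the next token.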

You're acting like if you open up the source code for a transformer you aren't going to find loads of matrix math for doing vector transformations.

How was this implied in any way?

I understand how my statements could be interpreted that way, but at the same time I am also one of the guys in my company who's been lobbying to drop degree requirements from hiring. I see myself as subscribing to the old hacker ethos of "show me the code". It's not about credentials, it's about whether you can produce tangible results.

I agree with you on this at least. :)

For a given definition of fine, I still think OpenAI and Anthropic are grifters more than they are engineers, but I guess we'll just have to see who gets there first.

I dislike OpenAI's business practices, its oxymoronic name, and the fact that they are making their models sycophants to keep their users addicted as much as the next gal/guy, but I think it's absolutely unfair to discount the massive engineering effort involved in researching, training, deploying and scaling up LLMs. It is useful tech to millions of paying customers and it's not going to go the way of the blockchain or the metaverse. I can't imagine going back to programming without LLMs, and if all AI companies vanished tomorrow I would switch to self-hosted open-source models because they are just that useful.

In the interest of full disclosure, I've sat down to write a reply to you three times now, and the previous two times I ended up figuratively crumpling the reply up and throwing it away in frustration, because I'm getting the impression that you didn't actually read or try to engage with my post so much as skim it looking for nits to pick.

Let me go back to this:

Imagine that you are someone who is deeply interested in space flight. You spend hours of your day thinking seriously about Orbital Mechanics and the implications of Relativity. One day you hear about a community devoted to discussing space travel and are excited at the prospect of participating. But when you get there what you find is a Star Trek fan-forum that is far more interested in talking about the Heisenberg compensators on fictional warp-drives than they are Hohmann transfers, thrust-to-ISP curves, or the effects of low gravity on human physiology. That has essentially been my experience trying to discuss "Artificial Intelligence" with the rationalist community.

I hope you realise you are more on the side of the Star Trek fan-forum user than the aerospace engineering enthusiast. Your post was basically the equivalent of saying a Soyuz rocket is propelled by gunpowder and then calling the correction a nitpick. I don't care for credentialism, but I am a machine learning engineer who's actually deep in the weeds when it comes to training the kind of models we're talking about, and I can safely say that none of the arguments made in your post have any more technical merit than the kind of Lesswrong post you criticise.

In any case, to quote Dijkstra, "the question of whether Machines Can Think is about as relevant as the question of whether Submarines Can Swim". Despite their flaws, LLMs are being used to solve real-world problems daily, are used in an agentic manner, and I have never seen any research done by people obsessing over whether or not they are truly "intelligent" yield any competing alternative or actual upgrade to their capabilities.