rae
A linear combination of eigengenders
But at the same time I think they have some frustration about all the lay-peeps writing long posts full of complex semantic arguments that wouldn't pass technical muster (directionally).
The issue is that OP is the lay person writing a long post full of complex semantic arguments that don't pass technical muster, while passing themselves off as a credentialed expert and accusing others of doing exactly what they're doing. That tends to rile people up.
It comes across as a bitter nasty commentariat incredulous that someone would dare to have a different opinion from you.
I don't think the issue is OP's opinion. The issue I had was listing off credentials before making completely incorrect technical explanations, doubling down and refusing to admit they made a mistake, and judging researchers based on the fact that they don't hold any US or EU patents.
More like saying that the Soyuz rocket is propelled by expanding combustion gasses only for someone to pop in and say no, it's actually propelled by a mixture of kerosene and liquid oxygen.
I'm sorry but what you said was not equivalent, even if I try to interpret it charitably. See:
An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word.
The LLM, on its own, directly takes the block of text and gives you the probability of the next word/token. There is no "second algorithm" that takes in a block of text, and there is no "distribution analysis". If I squint, maybe you are referring to a sampler, but that has nothing to do with taking in a block of text, and it is not strictly speaking necessary (samplers are even dropped in some benchmarks).
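To make that concrete, here's a rough sketch (assuming the Hugging Face transformers library, with GPT-2 standing in for any causal LM): the model alone maps a block of text to a probability distribution over the next token, no second algorithm involved.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library and GPT-2
# as a stand-in causal LM. The model alone maps text -> next-token probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                   # shape: (1, seq_len, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the full vocabulary
top_id = int(torch.argmax(next_token_probs))
print(tokenizer.decode([top_id]))                     # most probable next token
```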
I would ask that you clarify what you meant by that sentence at the very least.
The old cliche about asking whether a submarine can swim is part of why I made a point to set out my parameters at the beginning; how about you set out yours?
The only question I care about is, what are LLMs useful for? The answer is an ever-expanding list of tasks and you would have to be out of touch with reality to say they have no real-world value.
I don't know if you realize this, but you come across as extremely condescending and passive-aggressive in text. It really is quite infuriating. I would sit down, start crafting a response, and as I worked through your post I would just get more angry/frustrated until getting to the point where I'd have to step away from the computer lest I lose my temper and say something that would get me moderated.
I would say perhaps I do deserve that criticism, but @self_made_human has made lengthy replies to your posts and consistently made very charitable interpretations of your arguments. Meanwhile you have not even admitted to the possibility that your technical explanation might have been at the very least misleading, especially to a lay audience.
You and @rae are both talking about vector-based embedding like it's something that a couple of guys tried back in 2013 and nobody ever used again, rather than a methodology that would go on to become a de facto standard approach across multiple applications.
I literally said you can extract embeddings from LLMs. Those are useful in other applications (e.g. you can use the intermediate layers of Llama to get the text embedding for an image gen model à la HiDream) but are irrelevant to the basic functioning of an LLM chatbot. The intermediate layer "embeddings" will be absolutely huge features (even a small model like Llama 7B will output a tensor of shape Nx32x4096, where N is the sequence length) and in practice you will want to keep only the middle layers, which tend to hold the most useful information for most use cases.
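If it helps, here's what "grabbing intermediate layers" looks like in practice, as a minimal sketch (again assuming transformers, with GPT-2 standing in for a Llama-class model): every layer's hidden states can be pulled out and reused as contextual embeddings, but they are a by-product, not what the model is trained to output.

```python
# Minimal sketch, assuming Hugging Face `transformers`; GPT-2 stands in for a
# Llama-class model. Intermediate hidden states can be reused as contextual
# embeddings, even though the model was never trained to output them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("a photo of a cat sitting on a mat", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Tuple of (num_layers + 1) tensors, each of shape (1, seq_len, hidden_dim).
hidden_states = out.hidden_states
middle = hidden_states[len(hidden_states) // 2]   # middle layers are usually the most useful
print(len(hidden_states), middle.shape)
```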
To re-iterate: LLMs are not trained to output embeddings, they directly output the probability of every possible token, and you do not need any "interface layer" to find the most probable next word: you can do that just by calling torch.argmax() on their output (although that's not what is usually done in practice). You do need some scaffolding to turn them into practical chatbots, but that's more in the realm of text formatting/mark-up. Base LLMs have a number of undesirable behaviours (such as not differentiating between predicting the user's and the assistant's output - base LLMs are just raw text prediction models) but they will happily give you the most probable next token without any added layers, and making them output continuous text just takes a for loop.
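That for loop, spelled out as a minimal sketch (same assumptions as above: transformers and GPT-2 as a stand-in; no "interface layer" anywhere in sight):

```python
# Minimal sketch, assuming Hugging Face `transformers` and GPT-2 as a stand-in:
# continuous text from a bare base model is just argmax-in-a-loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
for _ in range(20):                                    # greedily generate 20 more tokens
    with torch.no_grad():
        logits = model(ids).logits
    next_id = torch.argmax(logits[0, -1]).view(1, 1)   # most probable next token
    ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```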
You're acting like if you open up the source code for a transformer you aren't going to find loads of matrix math for doing vector transformations.
How was this implied in any way?
I understand how my statements could be interpreted that way, but at the same time I am also one of the guys in my company who's been lobbying to drop degree requirements from hiring. I see myself as subscribing to the old hacker ethos of "show me the code". It's not about credentials, it's about whether you can produce tangible results.
I agree with you on this at least. :)
For a given definition of fine. I still think OpenAI and Anthropic are grifters more than they are engineers, but I guess we'll just have to see who gets there first.
I dislike OpenAI's business practices, its oxymoronic name, and the fact that they are making their models sycophants to keep their users addicted as much as the next gal/guy, but I think it's absolutely unfair to discount the massive engineering effort involved in researching, training, deploying and scaling up LLMs. It is useful tech to millions of paying customers and it's not going to go the way of the blockchain or the metaverse. I can't imagine going back to programming without LLMs, and if all AI companies vanished tomorrow I would switch to self-hosted open-source models because they are just that useful.
In the interest of full disclosure, I've sat down to write a reply to you three times now, and the previous two times I ended up figuratively crumpling the reply up and throwing it away in frustration, because I'm getting the impression that you didn't actually read or try to engage with my post so much as skimmed it looking for nits to pick.
Let me go back to this:
Imagine that you are someone who is deeply interested in space flight. You spend hours of your day thinking seriously about Orbital Mechanics and the implications of Relativity. One day you hear about a community devoted to discussing space travel and are excited at the prospect of participating. But when you get there, what you find is a Star Trek fan-forum that is far more interested in talking about the Heisenberg compensators on fictional warp-drives than in Hohmann transfers, thrust-to-Isp curves, or the effects of low gravity on human physiology. That has essentially been my experience trying to discuss "Artificial Intelligence" with the rationalist community.
I hope you realise you are more on the side of the Star Trek fan-forum user than the aerospace engineering enthusiast. Your post was basically the equivalent of saying a Soyuz rocket is propelled by gunpowder and then calling the correction a nitpick. I don't care for credentialism, but I am a machine learning engineer who's actually deep in the weeds when it comes to training the kind of models we're talking about, and I can safely say that none of the arguments made in your post have any more technical merit than the kind of LessWrong post you criticise.
In any case, to quote Dijkstra, "the question of whether Machines Can Think is about as relevant as the question of whether Submarines Can Swim". Despite their flaws, LLMs are being used to solve real-world problems daily, are used in an agentic manner, and I have never seen any research done by people obsessing over whether or not they are truly "intelligent" yield any competing alternative or actual upgrade to their capabilities.
I'm sorry, but the way you started off by introducing yourself as an expert qualified in the subject matter, followed by completely incorrect technical explanations, kinda rubbed me the wrong way. To me it came across as someone quite intelligent venturing into a technical field different from their own, skimming the literature, and making sweeping, baseless claims with an air of authority while not having understood the basics. I'm not a fan of many of the rationalists' approaches to AI, which I agree can border on science fiction, but you're engaging in a similar kind of technical misunderstanding, just with a different veneer.
Just a few glaring errors:
LLM stands for "Large Language Model". These models are a subset of artificial neural network that uses "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation)
Deep learning may be a buzzword, but it's not "looping regression analysis", nor is it limited to backprop. It's used to refer to sufficiently deep neural networks (sometimes that just means more than 2 layers), and the training objective can be classification, regression, adversarial… You can theoretically use algorithms other than backprop, but that's mostly restricted to research now.
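For reference, a minimal sketch of what the term actually covers (toy dimensions, nothing to do with any particular model): a stack of layers trained end-to-end with backprop, where the objective happens to be classification rather than regression.

```python
# Minimal sketch with toy dimensions: "deep" refers to the stacked layers,
# while the training objective (here classification, not regression) is a
# separate choice. Backprop computes the gradients; the optimiser applies them.
import torch
import torch.nn as nn

deep_net = nn.Sequential(            # 3 hidden layers, so "deep" by most definitions
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),               # 10-way classification head
)
loss_fn = nn.CrossEntropyLoss()      # classification objective
optimizer = torch.optim.SGD(deep_net.parameters(), lr=1e-2)

x = torch.randn(8, 32)               # dummy batch of 8 examples
y = torch.randint(0, 10, (8,))       # dummy class labels
loss = loss_fn(deep_net(x), y)
loss.backward()                      # back-propagation
optimizer.step()
```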
to encode a semantic token such as the word "cat" as a n-dimensional vector representing that token's relationship to the rest of the tokens in the training data.
Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
That's just flat-out wrong. Autoregressive LLMs such as GPT are not trained to encode tokens into embeddings. They're decoder models, trained to predict the next token from a context window. There is no "additional interface layer" that gets you words from embeddings: they directly output a probability for each possible next token given the previous block, and you can just pick the most probable token and directly get meaningful outputs, although in practice you want more sophisticated stochastic samplers than pure greedy decoding.
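What a "more sophisticated sampler" amounts to in practice is a few lines applied to the probabilities the model already outputs. A minimal sketch (assuming transformers and GPT-2 as a stand-in), here temperature plus top-k instead of plain argmax:

```python
# Minimal sketch, assuming Hugging Face `transformers` and GPT-2 as a stand-in:
# a temperature + top-k sampler sits on top of the model's own probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The Soyuz rocket is propelled by", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]          # next-token logits from the model itself

temperature, k = 0.8, 50
topk = torch.topk(logits / temperature, k)          # keep only the k most likely tokens
probs = torch.softmax(topk.values, dim=-1)
choice = topk.indices[torch.multinomial(probs, 1)]  # sample instead of taking the argmax
print(tokenizer.decode(choice))
```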
You can get embeddings from LLMs by grabbing intermediate layers (this is where the "deep" part of deep learning comes into play; models like Llama 70B have 80 layers), but those embeddings will be heavily dependent on the context. They will hold vastly more information than the classic word2vec embeddings you're talking about.
Maybe you’re confusing the LLM with the tokenizer (which generates token IDs), and what you call the “interface layer” is the actual LLM? I don’t think you’re referring to the sampler, although it’s possible, but then this part confuses me even more:
As an example "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply can not matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion because the "directionality" of one positive integer isn't all that different from any another. (This is why LLMs have such difficulty counting if you were wondering)
This is nonsense. Not only is there no "interface layer" being programmed, but 2, 4 and 1024 are completely different outputs and will have different probabilities depending on the context. You can try it now with any old model and see that 1024 is the least probable of the three. LLMs' entire shtick is outputting the most probable response given the context and the training data, and they have learned some impressive capabilities along the way. They will absolutely have learned the probable number of pizzas for a given number of people. They also have much larger context windows (in the millions of tokens for Gemini models), although they are not trained to use them effectively and still have issues with recall and logic.
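If you want to check the claim yourself, here's a minimal sketch (assuming transformers and GPT-2 as a stand-in model) that compares the probabilities the model assigns to "2", "4" and "1024" in the same context:

```python
# Minimal sketch, assuming Hugging Face `transformers` and GPT-2 as a stand-in:
# different numbers get different probabilities in the same context, so they
# are not interchangeable to the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Mary has", return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits[0, -1], dim=-1)

for number in [" 2", " 4", " 1024"]:
    token_ids = tokenizer.encode(number)       # " 1024" may split into several tokens
    # Compare the probability of the first token of each continuation; a full
    # comparison would chain the probabilities of every token in the number.
    print(number, probs[token_ids[0]].item())
```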
Fundamentally, LLMs are text simulators. Learning the concept of truth is very useful for simulating text, and as @self_made_human noted, there's research showing they do possess a vector or direction of "truth". Thinking of the LLM as an entity, or as just a next-word predictor, doesn't give you a correct picture. It's not an intelligence. It's more like a world engine, where the world is all text, which has been fine-tuned to mostly simulate one entity (the helpful assistant); but the LLM isn't the assistant, the assistant is inside the LLM.
Yeah sure. And if you have a job applicant whose resume shows 12 different jobs in the past 5 years, none of which lasted more than 3 months, they're 'insecure' if they pass you over for an applicant with a more stable history, right?
If you’re dating a 28 year old, that 6-12 is spread out over ~12 years, so a new sexual partner every 1-2 years. Switching companies every 2 years is perfectly normal in industries like software engineering (in fact it’s often easier to further your career that way than by getting promoted internally).
Also, you're assuming those 6-12 partners were all 3-month-long relationships. It could have been two high school boyfriends, 3 college flings over the span of 4 years, and a 5-year-long relationship that just ended. Are you really going to call that behaviour promiscuous?
Nobody is obligated to be 'secure' about promiscuity, that's laughable to even suggest. It's about the one thing we are genetically wired to BE insecure about. Which is to say, your comment reads like satire.
Body count has never been an issue in my relationships. I know people who've had over a hundred sexual partners, and that I can understand having some reservations about, but 6-12 is still in the perfectly normal range. We're not talking about people who take part in rationalist polyamorous orgies here.
Look a single dude straight in the eye and say "Yeah she's banged 6-12 dudes prior to you, but I'm sure that she won't ever be thinking about any of them or comparing your performance and YOU'RE the one she's going to stick with" with a straight face.
This is just your insecurity talking. You're afraid that you might be worse off in some way than a previous partner, and thinking of sex like it's a "performance" instead of viewing it as a mutual exploration of intimacy, pleasure, and most importantly, as a way to bond with your partner.
Also 6-12 partners, those are rookie numbers. Like I could understand being weirded out by your partner having over 50 hook-ups, but 6-12 is perfectly normal in this day and age.
I think modern right-wing converts are very different from people who actually grew up in socially conservative communities because they’re fundamentally not conservatives at all. They’re people who grew up in a liberal environment who want to rebel against it (often for valid reasons), by adopting the values the liberals themselves previously fought against. Paradoxically, to be a socially conservative convert, you need to be a non-conformist who’s not afraid of questioning the worldview they were brought up in.
If you were a conformist who respected and followed societal expectations, the behaviour that, from your description, is encouraged in conservative communities, you wouldn't have converted at all.
By being a right-wing convert in a liberal environment, you’re joining a counterculture, you’re adopting certain views because they’re cool, edgy, based, provocative, you want to tear down the system… you’re obviously going to have a very different attitude to life than people born in a socially conservative bubble.
There are also big differences outside the Hajnal line in Europe itself. Slovenia is a prosperous country with well-functioning infrastructure, on track to surpass the UK in terms of GDP per capita, while Serbia is a poor, corrupt autocracy, even though both were part of Yugoslavia and both are ethnically South Slavic.
I don't know of any circumstances where HBD is a better explanation than culture and history.
The American conflation of race with class is bizarre. Upper-middle-class urban white Americans share few cultural values with unemployed, drug-addicted Appalachians, and grouping them together as a homogeneous "white" bloc makes little sense.
Wasn’t there a link a while back to one of those Woke Rightists who moved to a majority white town and realised he had nothing in common with the people there, and ended up missing the diverse big city?
Great writing as per usual, although I'm not too sure what the culture war angle is here.
Your tidbit about inflation has left me wondering. How exaggerated is the purported social and economic decay in the UK? The impression I'm getting from abroad is some of the lowest wages in Western Europe coupled with extremely high cost of living. The salaries for some professionals are comparable to Eastern Europe even before purchasing power parity. Underfunded everything from education to the NHS. Yet somehow the price of goods and rent keeps climbing, especially in London.