YoungAchamian
I read everything and never comment. It's way easier.
I owe you responses to the other posts, but I am a slow and lazy writer with a penchant for procrastination and lurking. I'll answer this first because it's a quick answer. My motivation is that I'm deeply sceptical about people and the world. This is only partly related to LLMs; it starts deeper. I'm sceptical and cynical about human motivation, human behavior, and human beliefs. I'm not really interested in weighing in on "intelligence"; that's a boring definitional game. I use LLMs, they are useful: I use them to write code and documents in my professional life, and I use the deep research function to do lit reviews. Useful doesn't mean I think they are sentient or even approaching sentience. You are barking up the wrong tree on that one, misattributing opinions to me that I in no way share.
Possibly. I can see where it feels like they are lording it over all the peons in the thread, and why that would be frustrating. But at the same time I think they have some frustration about all the lay-peeps writing long posts full of complex semantic arguments that wouldn't pass technical muster (directionally). I interpreted the whole patent + degree bit as a bid to establish some credibility, not to lord it over people. I also think they aren't directly in the LLM space (I predict the signal processing domain!), so some of their technical explanations miss important details. This forum is full of autists who can't admit they are wrong, so the latter part is just par for the course. No idea why everyone needs to get so riled up about this topic.
I think this gets into what a "world model" is, which I owe self_made_human a definition of and a response to. But I'd say cause-effect relationships are indeed patterns and regularities; there's no dispute there. However, there's a crucial distinction between representing causal relationships explicitly, structurally, or inductively, versus representing them implicitly through statistical co-occurrence. LLMs are powerful precisely because they detect regularities, like causal relationships, as statistical correlations within their training corpus. But this implicit statistical encoding is fundamentally different from the structured causal reasoning humans perform, which allows us to infer and generalize causation even in novel scenarios or outside the scope of previously observed data. Thus, while cause-effect relationships certainly are patterns, the question isn't whether LLMs capture them statistically (they clearly do), but rather whether they represent them in a structured, grounded, explicitly causal way. The research I have seen strongly suggests that they do not. If you have evidence that suggests they do, I'd be overjoyed to see it, because getting AIs to do inductive reasoning in a game-playing domain is an area of interest to me.
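To make the correlational-versus-causal distinction concrete, here's a toy sketch (the classic rain/sprinkler example; all probabilities are made up and nothing here claims to describe LLM internals): observational co-occurrence alone can't tell you which way causation runs, while simulating an intervention on an assumed causal structure can.

```python
import random

random.seed(0)

def sample_world(force_sprinkler=None):
    """One draw from a toy causal structure: rain and sprinkler both cause wet grass."""
    rain = random.random() < 0.3
    sprinkler = force_sprinkler if force_sprinkler is not None else (random.random() < 0.2)
    wet_grass = rain or sprinkler
    return rain, wet_grass

# Observational statistics: rain and wet grass co-occur strongly, but the
# co-occurrence table by itself says nothing about direction or mechanism.
obs = [sample_world() for _ in range(100_000)]
p_rain_given_wet = sum(r for r, w in obs if w) / sum(w for _, w in obs)

# Intervention: forcing the sprinkler on makes the grass wet without making
# rain any more likely, a fact that pure co-occurrence statistics can't express.
do = [sample_world(force_sprinkler=True) for _ in range(100_000)]
p_rain_given_do = sum(r for r, _ in do) / len(do)

print(f"P(rain | wet grass)        ~ {p_rain_given_wet:.2f}")
print(f"P(rain | do(sprinkler=on)) ~ {p_rain_given_do:.2f}")
```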
Why do you open up like this:
Having no interest to get into a pissing context
But start your argument like this:
but amounts to epitemically inept, reductionist, irritated huffing and puffing with an attempt to ride on (irrelevant) credentials
It doesn't come off as some fervent truth-seeking, passionate debate, and/or intelligent discourse. It comes across as a bitter, nasty commentariat incredulous that someone would dare to have a different opinion from you. Multiple people in this post were able to disagree with OP without resorting to prosaic insults in their first sentence. I get that you have a lot of rep around here, which gives you a lot of rope, but why not optimize for a bit more light instead of a furnace full of heat? It could not have been hard to just not write that sentence...
At the risk of getting into it with you again. What did you think of this when it made its rounds 2 months ago: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
Idk if I believe language possesses all the necessary info for a world model. I think humans interpret language through their world model, which might bias us towards seeing language that way. It's just like intelligence: humans are social creatures, so we view mastery of language as a sign of intelligence. An LLM's apparent mastery of language gives people the feeling that it is intelligent. But that's a very anthropocentric conception of language, and one that is very biased towards how we evolved.
As for why some prominent AI scientists believe and others do not? I think some people definitely get wrapped up in visions and fantasies of grandeur. Which is advantageous when you need to sell an idea to a VC or someone with money, convince someone to work for you, etc. You need to believe it! That passion, that vision, is infectious. I think it's just orthogonal to reality and to what makes them a great AI scientist.
How exactly does an LLM know that Mozart wasn't a fan of hip hop without some kind of world model? Do you think that fact was explicitly hand-coded in?
It has learned statistical representations and temporal associations between what Mozart is and what hip hop is. Mozart and hip hop likely have essentially no co-occurrence in the training data. When you ask if Mozart liked hip hop, the model isn't thinking "Mozart lived before hip-hop, so no." Instead, it generates text based on learned probabilities, where statements implying Mozart enjoyed hip hop are statistically very rare or nonsensical.
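A minimal sketch of what "statistically rare continuation" means in practice, assuming the Hugging Face transformers library and using GPT-2 purely as a small stand-in model (the prompt and continuations are arbitrary illustrations, not a claim about any particular frontier model): compare the total log-probability the model assigns to two candidate continuations of the same prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of the log-probs the model assigns to `continuation`, token by token, given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # The logits at position i predict the token at position i + 1, so score
    # each continuation token against the distribution one step earlier.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

prompt = "Mozart's favorite style of music was"
print(continuation_logprob(prompt, " classical"))  # comparatively likely continuation
print(continuation_logprob(prompt, " hip hop"))    # statistically much rarer continuation
```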
Do you think that fact was explicitly hand-coded in?
I specialize in designing and training deep learning models as a career, and I will never assert this, because it is categorically wrong. The model would have to be very overfit for this to happen. And any company publishing a model that overfit is knowingly doing so to scam people. It should be treated similarly to malfeasance or negligence.
To predict them well, it must compress latent generators: seasons, cities, typical temperatures, stylistic tropes. When we bolt on retrieval, we let it update those latents with fresh data.
I strongly agree that latent spaces can be surprisingly encompassing, but I think you're attributing more explicit meaning and conceptual structure to LLM latent spaces than actually exists. The latent space of an LLM fundamentally represents statistical relationships and contextual patterns derived entirely from textual data. These statistical regularities allow the model to implicitly predict plausible future text, including semantic, stylistic, and contextual relationships, but that doesn't amount to structured, explicit comprehension or "understanding" of concepts as humans might interpret them. I'd postulate that GloVe embeddings act similarly. They capture semantic relationships purely from statistical word co-occurrence; although modern LLMs are much richer, deeper, and more context-sensitive, they remain statistical predictors rather than explicit world-model builders. You're being sorta speculative/mind-in-the-clouds in suggesting that meaningful understanding requires, or emerges from, complete contextual or causal awareness within these latent spaces (which I'd love to be true, but I have yet to see it in research or my own work). While predictive-processing metaphors are appealing, what LLMs encode is still implicit, statistical, and associative, not structured conceptual knowledge.
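To illustrate the GloVe point, here's a small sketch assuming the gensim package and its downloadable "glove-wiki-gigaword-50" vectors (the specific word choices are just illustrations): analogies and similarity structure fall out of pure co-occurrence statistics, with no explicit world model anywhere in the pipeline.

```python
import gensim.downloader as api

# Pretrained 50-dimensional GloVe vectors, learned from word co-occurrence counts.
glove = api.load("glove-wiki-gigaword-50")

# The classic analogy recovered from co-occurrence statistics alone:
# king - man + woman ~ queen
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Plain cosine similarities between more- and less-related concepts.
print(glove.similarity("mozart", "symphony"))
print(glove.similarity("mozart", "rap"))
```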
RLHF shapes behavior. It does not build the base competence.
RLHF guides style and human-like behavior. It's not based on expert truth assessments but on attempting to be helpful and useful and not sound like it came from an AI. Someone here once described it as the ol' political commissar asking the AI a question and, when it answers wrongly or unconvincingly, shooting it in the head and bringing in the next body. I love that visualization, and it's sorta accurate enough that I remember it.
By insisting on “explicit, grounded, structured” you are smuggling in “symbolic, human-inspectable, modular”. That is a research preference, not a metaphysical requirement. Cognitive science moved past demanding explicit symbol tables for humans decades ago. We allow humans to count as having world models with distributed cortical encodings. We should use the same standard here.
I'll consider this, will probably edit a response in later. I wrote most of this in 10-20 minutes instead of paying attention during a meeting. I'm not sure I agree with your re-interpretation of my definition, but it does provoke thought.
A. The base model already has a world model:
Pretraining on next-token prediction forces the network to internalize statistical regularities of the world. You can’t predict tomorrow’s weather report, or the rest of a physics paper, or the punchline of a joke, without implicitly modeling the world that produced those texts. Call that latent structure a “world model” if you like. It’s not symbolic, but it encodes (in superposed features) distinctions like:
- What typically happens vs what usually doesn't
- Numerically plausible vs crazy numbers
- Causal chains that show up consistently vs ad-hoc one-offs
I'm going to need a citation; I have seen no research to date that suggests LLMs develop any sort of world model. A world model is:
- An explicit internal representation of cause-effect relationships.
- Grounded reasoning about physical, social, or conceptual structures independent of linguistic statistics.
- A structured understanding of external reality beyond pure linguistic correlation.
Instead, current research strongly suggests that LLMs are primarily pattern-recognition systems that infer regularities purely from text statistics rather than internally representing the world in a structured, grounded way.
An LLM can easily write a weather report without one. Will that report be correct? Depends on what you consider the "LLM": the bare text model, no; the whole engineered scaffolding and software interface, querying the weather channel and feeding it into the model, sure. But the correctness isn't emerging from the LLM's internal representation or conceptual understanding (it doesn't inherently "know" today's weather), but rather from carefully engineered pipelines and external data integration. The report it produces was RLHF-ed to look correct.
While the current paradigm is next-token-prediction based models, there is such a thing as diffusion text models, which aren't used in the state of the art stuff, but nonetheless work all right. Some of the lessons we are describing here don't generalize to diffusion models, but we can talk about them when or if they become more mainstream. There are a few perhaps waiting in the stables, for example Google semi-recently demoed one. For those not aware, a diffusion model does something maybe, sort of, kind of like how I wrote this comment: sketched out a few bullet points overall, and then refined piece by piece, adding detail to each part. One summary of their strengths and weaknesses here. It's pretty important to emphasize this fact, because arguably our brains work on both levels: we come up with, and crystallize, concepts, in our minds during the "thinking" process (diffusion-like), even though our output is ultimately linear and ordered (and to some extent people think as they speak in a very real way).
I hate that I feel compelled to nitpick this. While it's a good layman explanation for how diffusion models work, the devil is in the details. Diffusion models do not literally, or even figuratively, diffuse thoughts or progressively clarify ideas. They diffuse noise applied to the input data: they take input data noised according to a fixed schedule, model that noise as a Gaussian distribution, and learn to remove it. Since they are encoder/decoder networks, during inference they take only the decoder (edit: technically this is incorrect, it's the forward process vs the reverse process and they aren't explicitly encoders/decoders; it's unfortunately how I always remember them), feed it input noise, and have it generate output words, text, etc. It is 100% not "thinking" about what it has diffused so far and further diffusing it. It is doing it according to the properties of the noise and its relationship to the schedule it learned during training. It is entirely Markovian; it has no memory of any steps past the immediately previous one, no long-term refinement of ideas. During training it is literally comparing randomly chosen steps of denoised data with the predicted level of denoising. You can do some interesting things where you add information to the noise via FFT during training and inference to influence the generated output, but as far as I know that's still ongoing research. I guess you could call that noise "brain thoughts" or something, but it is imprecise and very speculative.
Source: 3 years spent doing research on DDIMs/DDPMs at work for image generation. I admittedly haven't read the new battery of NLP-aligned diffusion papers (they are sitting in my tabs), but I did read the robotic-control-via-diffusion paper, and it was similar, just abstractions over how the noise is applied to different domains. I'm guessing the NLP ones are similar too, though they probably use some sort of discrete noise.
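For anyone who wants the forward process and training objective described above in concrete terms, here is a minimal PyTorch sketch (toy dimensions, a stand-in MLP instead of a real noise-prediction network, and a linear schedule chosen purely for illustration). The Markovian point corresponds to the reverse process only ever conditioning on the current noisy sample, never on earlier iterates.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # fixed noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative product, \bar{alpha}_t

def q_sample(x0, t):
    """Forward process: noise clean data x0 to timestep t in one closed-form jump."""
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps

# One training step: pick random timesteps, noise the data, and train the
# network to predict the noise that was added. (A real DDPM noise-prediction
# network is also conditioned on t; that's omitted here for brevity.)
eps_model = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
x0 = torch.randn(32, 8)                      # toy batch of "clean" data
t = torch.randint(0, T, (32,))
x_t, eps = q_sample(x0, t)
loss = torch.nn.functional.mse_loss(eps_model(x_t), eps)
loss.backward()
```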
Autism can lead to people not having an innate understanding of why social rules work the way they do
Most normal neurotypical people don't understand why social rules work the way they do either. They can just intuit what the rules are and don't question following them. Trying to get them to actually explain these arbitrary rules and why this or that particular variation exists is a maddening exercise in futility. It almost always results in a tautology.
Meta notably can’t even catch fully up to the front players and most of the team quit in frustration.
Do you have any source on this? I'd love to learn more.
I don't think our own problems get solved until we have an executive with unchallenged personal authority and immunity to firing.
Cool, and I'm sure you would still hold that belief if the executive was some blue-haired progressive who went by Ze/Zir pronouns right?
(Edit) After some thought, I decided to tone down my dismissive vitriol and maybe offer a more constructive response.
Despite what you might think, I don't have unlimited free time/brain power to engage in high-effort debate with random people online; I'm a shape-rotator, not a word-cell. Particularly since debating people online rarely leads to any information exchange or substantive opinion change. As such, I apply a heuristic when having a discussion online on whether my interlocutor is worth it. Needless antagonism, unfounded arrogance, pithy insults, and pettiness are the typical markers that it's not. People who don't engage charitably and treat discussion as some sort of mal-social debate team competition, where anything goes, doubly so.
Dase, you tripped every one of the above. To my chagrin, I snapped back, which was unbefitting of my expectations for myself. If you want people to engage with you substantively, with high-information-density conversation, you have to give them a reason to put the effort in. If you write only for extreme heat with disproportionately little light, then no one reasonable is going to engage with you. Maybe that is to your taste; who am I to judge pigs that want to roll in the mud. Regardless, I have better uses of my time than getting into the sty with you.
Food for thought: ML != LLMs. If your comment here:
Fetishizing algorithmic design is, I think, a sign of mediocre understanding of ML, being enthralled by cleverness. Data engineering carves more interesting structure into weighs.
was changed to this:
Fetishizing algorithmic design is, I think, a sign of mediocre understanding of LLMs, being enthralled by cleverness. Data engineering carves more interesting structure into weighs.
Then it is far more applicable to the evidence you have provided and, honestly, I think to the topic you actually care about. I might even agree; however, the original doesn't align with the reality of ML as a field across ALL domains. But who knows, maybe my attempt at being charitable here will go nowhere, you'll double down on being an ass, and I'll update my weights with finality on the pointlessness of engaging with you in the future.
Have a good one.
It actually goes beyond that. In MBTI a T isn't just a T, but a cognitive function at a particular placement. You have 4 placements: Dominant, Aux, Tertiary, and Inferior, and together they make up your categorization. Cognitive functions can be extroverted or introverted; the E/I in MBTI marks which comes first, and then they alternate. So not only are they an axis, but a T in two different types means two different things.
For example, a T in an INTJ is their Aux function: Extroverted Thinking; a T in an INTP is their Dominant function: Introverted Thinking. There are all sorts of analyses on what that actually means, but it definitely doesn't mean that all Ts, Es, Is, Fs, etc. are alike, will get along together, or will connect.
The hardcore real MBTI tests require an in-person psychologist visit that takes hours. The hokey tests that corpos give you, or that you can find online, generally aren't very "accurate" and thus really lend themselves to the stereotype of sciency-astrology.
You are completely unwarranted in making this assumption, and you're only saying this to be nasty towards me. It's a really cheap shot, doubly so because I cannot show how wrong you are without doxxing myself. You can do better than this.
I did not mean for this to be a cheap shot, so if it came across as one, I apologize. You could straight up say that yes, you too are an MLE in this field, so it is also a consideration for you. That's not doxing; I'd take that as a face-value statement, honor system. No proof needed until proven otherwise.
(you know what these two are, right?), (if you don't understand how I came up with this number, X AI's Grok will helpfully explain it to you, just copypaste this paragraph to it verbatim, and enable Think mode).
That said, as far as cheap shots go, you seem to like giving as good as you get... Let's take the spice level down.
"is entirely dependent on the individuals estimation of its long term payoff and the time horizon on which they want a return on it". Yes, thank you, that's exactly what I was trying to get across the entire time.
Then we agree on this...
So yeah, maybe they got offered $300k TC when they joined, but that $300k is worth much more after a year or two.
We disagree on this, because they aren't taking home any more money. This is still entirely dependent on whether or not xAI IPOs or provides a vehicle to sell their equity. Currently it doesn't sound like there is a plan to do so. The stock rising might be a great sign, but shit happens and if xAI kicks it tomorrow then they didn't actually make all that extra money. Counting it now is counting chickens before they hatch.
I see you took this pretty personally.
That tends to happen when you insult people out of the blue Dase. This:
a sign of mediocre understanding of ML
Is called being an asshole. I do ML for a living; insinuating that my competence is mediocre because we disagree intellectually is poor taste. There are ways to have this discussion intellectually without resorting to being a douche. In the last AI thread you commented on, you were a prick to everyone who disagreed with you, up and down the thread. I have no desire to put up with your shit. Call it taking it personally or giving what was given. It's up to you if you want to be an adult and have a conversation or be a bratty child.
not ScaleAI
This was heavy sarcasm on my part. ScaleAI did OpenAI's data engineering, but I don't think that makes them a top AI company. Data engineering is needed and important! But it's not revolutionary. Data engineering is the same as it's always been.
«low-level Cuda compiler writing and server orchestration»
This is why arguing with laymen is annoying. "Low-level" is not condescension; it is the technical term for "low on the compute stack" or "closer to the compiler." It's very important: the theories I have heard are that it's one of DeepSeek's great winning points, why they were able to train their LLM much more cheaply than everyone else. They were willing to go even more low-level than CUDA and write their own firmware-level orchestration code.
see DeepSeek's NSA paper.
We agree.
lolcow, so has Schimidhuber.
Schmidhuber has always been a lolcow, he's an inside joke. Any other ML person knows his schtick and finds it funny and doesn't take him seriously. I included it as a humorous inside joke.
Past glory is no evidence of current correctness, however. LeCun with his «AR-LLMs suck» has made himself a lolcow,
At the same time, someone who has actually contributed to the field, who is in the arena, infinitely outweighs a nobody posting hot takes on an obscure forum. Regardless of my humor about Schmidhuber, even he far outweighs me as a titan in the ML field, because he has contributed groundbreaking research. Where does that leave you? Jeering in the audience like it's a sporting match?
What has Hinton done recently
Won a Nobel Prize.
is still within the Transformer (Vaswani et al., 2017) framework
Don't quote the old magic to me, I was there when it was written. You seem to be labouring under the delusion that LLMs == ML/AI. LLMs are a subset of ML/AI. The current hottest topic, definitely! But this "algo fetish" you describe goes far beyond just LLMs. The research on model architectures has led us to the encoder/decoder, then self-attention, then Transformers. It's not a fetish, and it's not mediocre, because it's not going to stop at transformers. Maybe you've forgotten a fundamental tenet of the scientific method, but experiments fail. It sounds like you've just listed a bunch of experiments that failed. Should we give up and keep praying at the altar of bronze because no one has figured out how to forge iron? Seems like you are asking us to praise ignorance over discovery.
Transformer training is easy to parallelize and it's expressive enough. Incentives to find anything substantially better increase by OOM year on year, so does the compute and labor spent on it, to no discernible result. I think it's time to let go of faulty analogies and accept the most likely reality.
The problem is transformers don't work on everything, and the whole field isn't just LLMs. That's the reality.
I'm not sure I believe AGI will come from transformers. If you want to have that as a separate discussion, you can let me know, nicely, and we can talk about it.
Likewise, my entire point, before you jumped in to insult me, is that the Big Names in ML/AI are "fetishy algo freaks." They, shockingly, don't want to do anything but "mediocre algo butt sniffing" work. And data engineering isn't new, it isn't revolutionary; it's great, it works well, but it doesn't require some 1% ML researcher to pull it off. It requires a solid engineering team, some technical know-how, and a willingness to get your hands dirty. But no one is going to get famous doing it. It's an engineering task, not a research task. And since research tasks are what people pay the ludicrously big bucks for at tech companies, the engineers at xAI aren't being paid some massive king-sized salary...
As an exercise, can you tell me THE engineer at DeepSeek who proposed or wrote their Parallel Thread Execution (PTX) code, with a citation?
No, I never said that someone is making that. All I said that this is reasonable range for cash compensation at a place like X AI, and that these ranges are only published to satisfy California labor law.
I'm not really sure what your disagreement with me is, then, other than risk appetite and investment value. My stance this entire thread is that average MLEs working at xAI make comp likely comparable to MLEs at other FAANGs, which is ballparked at 300k to 350k. Part of that is equity, so the likely take-home is lower. It's not enough to retire early or live like kings, or really any of the grandiose claims made upthread that I originally responded to...
I encourage you to try the following exercise. Pick any person, and ask him to name something he owns that’s literally completely worthless, as in, worth $0 to him, and offer to buy it for $10,000. Do you expect him to reject this offer, or eagerly jump for it?
Thanks, I'll take that rhetorical trick under consideration. However, something can be worthless at the moment and theoretically still be worth more later. It will depend entirely on the individual's estimation of its long-term payoff and the time horizon on which they want a return on it.
I would assert that, to me personally, equity in xAI earned by working there is not valuable, for a number of reasons that go beyond the monetary. If I were like you and could purchase it on a private exchange, I also might.
Maybe they do for you, but I have higher expectations of success for companies ran by Elon Musk, and value them accordingly.
Maybe. I'm likely to be wrong; betting against Musk seems like a bad idea. But at the same time, the man just keeps betting on black, eventually he's going to lose, and I currently do not see the value difference that xAI has over its entrenched rivals. A non-woke AI is great, but I'm not sure it will convert into monetary value. I also think he made it to piss on Sam Altman in their little spat.
I think you just use the word “value” and “worth” in a much different way than most people who deal with stocks do. By my definition, if the stock is worth $0, then any offer above $0 is not lowball. You seem to be interpreting it as “stock value is what it trades at on public markets” which is not that far from how I interpret it when talking about public companies, but completely useless and confusing when talking about private companies.
I likely do value stock that I make as part of compensation much less than stock I buy as an investment. I'd say my time horizon for a return is much shorter. That's just my preference for compensation. It's not better, and it has definitely lost me money before, but at some level money has lesser importance to me. I argued this point from my personal beliefs, as someone who is in a position to potentially go work for xAI; it's not an abstract argument like it might be for you.
Senpai comes down from his castle in the clouds to debate me... What a time to be alive.
Fetishizing algorithmic design is, I think, a sign of mediocre understanding of ML, being enthralled by cleverness. Data engineering carves more interesting structure into weighs.
This isn't data engineering either. It's low-level CUDA compiler writing and server orchestration. Goodfellow, Hinton, Schmidhuber, LeCun, et al. are definitely designing architectures; they aren't doing any more data engineering than normal MLEs like me do... They clearly enjoy designing "algos", and the world clearly respects them greatly for that expertise. Also, calling it "algo" design is incredibly reductive. After all, everyone knows LLMs were discovered when Hinton invented the Boltzmann machine decades ago. This Transformer is just a paltry fetish "algo". The data engineering in Boltzmann machines was just too primitive!!!
But obviously you must have a far deeper understanding of ML/AI, so why don't you quit your day job and start an AGI company? Put your clever prose and subtle insults to work on something more real. Maybe you can compete with ScaleAI, they do data engineering. Definitely the top AI research company.
You got me, it appears Neil used Ellis's playbook too.
as if it didn't matter what the startup is actually doing. It does. Similarly, for me, there's a difference between how much I value Blue Origin vs SpaceX equity.
It does matter, but there are so many of these companies that call you when you do AI/ML that they all blend together. They have very little in the way of actual value add. I have personal, if unproven, technical opinions on how unlikely these companies are to take off. It was beside the point for the hypothetical, because while I wouldn't take an xAI-like job, 26 people did.
The point of my $10k offer was to argue that the equity in a private company is not worth nothing, contrary to what you said. This argument was extremely successful, because you are now arguing for my side, telling me that the stock is worth more than $10k, and that the owner of that stock should hold out for better offer than mine.
It actually makes my point; it is worth $0. But just because an investment is currently worth $0 doesn't mean you should cash it in for the first lowball cash offer. Most people understand that at some level; I guess you don't, or you desire to be obnoxiously pedantic. Whatever...
In any case, the salary brackets in job postings for this segment of the market have no actual relevance for anything. Everyone knows that equity is where the action is, and since the California law does not mandate including equity compensation in these brackets (as if there even was a reasonably useful way to do that), nobody cares about these figures.
Your argument was that someone is making a 448k salary. You've pretty much agreed with me that no one at xAI is making that. They might get equity, but realistically, since xAI isn't seed funded, that equity is worth however much Musk decides it is, and it only turns into real money if the company goes public. Looping us back to the above argument: it is currently worth $0 but could potentially be worth a bajillion, but it's not like they can live off of xAI stock before that happens. Hence their TC is their salary + bonus, which is likely ballparked at 300k, just like Twitter/X MLEs... So again, they aren't living like kings, and there is no guarantee that they will retire early in the future either. They are likely in it for the name-rec and love of the tech.
Have you considered that maybe X AI is trying to attract talent of this caliber?
Yes, but there aren't that many people of that caliber to go around, and they charge wayyyy more than 448k in cash. Google just "paid" the creator of the transformer over a billion in cash to come back to work for them, by buying out his current company.
Regardless of whether transformers are a dead end or not, the current approach isn't doing new science or algo design. It's throwing more and more compute at the problem and then doing the DeepSeek approach of fine-tuning the assembly-level GPU instructions to exploit the compute even better, so you can throw more compute at it. I doubt Hinton, Goodfellow, LeCun, Schmidhuber, et al. have any desire to do that. Maybe if xAI did something revolutionary, like leave the LLM space or introduce a non-MoE-Transformer model for AGI, then talent of that caliber might want to work there. Currently they exist so Elon can piss all over Altman.
I picked up The Universal State of America: An Archetypal Calculus of Western Civilisation by Simon Sheridan on @thejdrizzler 's recommendation, and I'm thoroughly enjoying it.
It definitely has me thinking about my own life from a Jungian archetype perspective, and whether I've actually completely transcended the orphan stage into adulthood, especially since the transitional, exoteric societal rites don't seem to exist or have any meaning. I also just got to the section about the increase of Sages who are disconnected from the cultural milieu and who, because the "magic" is gone from the societal rites, come to question the societal metaphysics. I can clearly see the parallel with a lot of the dissident right and woke left that gets discussed on here.
Definitely worth the pickup!
Sigh. Ok, let's do this dance then...
Here's a hypothetical: Say I'm a Senior MLE with 8+ years of experience looking for a new job. I get contacted by/apply at a startup that is trying to do the next big LLM / LLM agent / LLM dongle / doohickey / whatever. They cite an impressive amount of "just so" about who's funding them, who's running them, who's working at them, and that their strategy is to capitalize on ELITE human capital like myself as a non-progressive, white male. With me so far?
They are a Series A/B startup, and don't believe in remote work so I'll have to move to the Bay, non-negotiable. Their comp offer to me is: 250k Base + 200k in equity + 50k in equity every year after the first that I work there. Equity value was calculated based on the rate sold to their investors. Vested over several years to make sure I stick around. If I leave early, before it vests, I will forfeit my shares. Pretty standard.
Ok first question, how much is that comp offer worth?
The Wlxd stance seems to be that this is 450k TC and I would be stupid to turn down nearly half a million! I could work there for 5-10 years and retire early!
The Akka stance is that the 250k base is nice, but that 200k equity is realistically worth $0 at this time. It could pay off but it also very likely could not. The LLM market is very saturated right now. Everyone and their rich uncle is trying to capitalize on it. Akka would consider that offer based on the base and how worthwhile that job would be, with the equity as a nice-to-have but not absolute. For this hypothetical, let's just say Akka takes it.
The next month the company goes bankrupt. How much is that comp worth?
Or:
You put the labor in, but it's a toxic workplace and is killing your mental/physical health so you quit and forfeit. How much is that comp worth?
The rival company has a better strategy and a headstart, they beat you to market, you IPO late. How much is that comp worth?
Your company's CEO is a known narcissistic asshole and while he's good at his job, one day he decides he doesn't like you and shit-cans your ass: How much is that comp worth?
In addition to being a narcissistic asshole your CEO also gets involved in politics, the other side develops quite the hateboner for him and when the incumbent swaps they punish him. How much is that comp worth?
Now, somewhere before all this happens, some rando on the internet has a PRIME offer for you. He'll give you 10k for all of your equity. Because, after all, you value it at $0 or somewhere low like that. However, like any MLE, you are a smart fucking cookie, and you look at your $250k salary and the estimated 200k equity you have and think: 10k, REALLY? Would you really be so dumb as to take literal pennies on the dollar for your equity?? It's not worth anything now, and it very likely never could be, but 10k is fucking chump change, pardon my french. Then this internet rando has the gall to tell you he thinks it's worth millions. You definitely aren't selling it to him at 10k a pop now. If this investment is worth so much, then why doesn't he put his money where his mouth is and buy it from you at 90-95 cents on the dollar? After all, it's an AMAZING investment.
And yes, I get that it's an investment; like I said, it's all about risk appetite, and obviously mine is less than yours. Lots of people who work at startups probably have an appetite closer to yours, but also lots of them end up with nothing. That said, 10k is an insultingly low offer, and you clearly don't think it's a good investment or you would have offered more, so...
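As a back-of-the-envelope illustration of the two stances above (the 250k/200k numbers come straight from the hypothetical; the intermediate discount is a made-up example of "risk appetite", not anyone's actual valuation):

```python
BASE = 250_000          # cash base from the hypothetical offer
EQUITY_GRANT = 200_000  # equity grant, valued at the rate sold to investors

def offer_value(equity_discount: float) -> int:
    """Value of the offer if you haircut the equity by a chosen risk discount."""
    return round(BASE + EQUITY_GRANT * equity_discount)

print(f"Equity at face value:    ${offer_value(1.0):,}")  # the 'Wlxd' framing: ~450k TC
print(f"Equity at a 50% haircut: ${offer_value(0.5):,}")  # an arbitrary middle ground
print(f"Equity valued at $0:     ${offer_value(0.0):,}")  # the 'Akka' framing: base only
```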
Afterword
No, you quoted an article that was citing figures from job postings. Those figures are not there to sound impressive, they’re there to satisfy legal requirements California imposes on job postings
We can just agree to disagree on this. I've been in this field for a bit and know of zero no-name MLEs who make a 448k salary. Maybe an Ian Goodfellow or a Yann LeCun would command that sort of cold hard cash. Even the Senior Staff SWEs at Twitter make something like a 275k salary; their half-a-million comps come from equity, which is calculated in. As far as I know, neither Ian, nor Yann, nor Schmidhuber, et al. work at xAI, and in fact I don't know any big name who does. If you do, feel free to share, for my learning.
You clearly don't actually understand how equity works. It obviously is worth something, because investors are paying billions of dollars for it. Being public or private only has indirect effect on how much stock is worth.
Trust me I do, the real discrepancy is that we have different risk appetites. But please keep talking down to me.
These figures only include cash compensation, and they are a very reasonable range for cash part of the comp for between junior and staff levels.
I linked Twitter salaries in the above thread. Everything in the Bay is talked about in TC terms because it sounds more impressive. If the post-Elon Twitter salaries are any benchmark, then the 118-440k figures are TC as well. That means probably about 50% is in equity.
To put it in more concrete terms: I'd happily go to any of these X AI employees, and buy whatever they vest in a year for $10,000. This offer establishes that their stock is worth something. (I'm not just making a point, this offer is 100% serious: if you work for X AI, feel free to DM me, I'm open to negotiations even).
Anyone who worked for xAI would be a fool to take this offer, because you are massively lowballing them. If they vest over 3 years, as is typical, that 150k would vest at 50k a year. They would be fools to sell it to you at 10k. Realistically, based on how much you value it, they would be better off charging you 3x-5x the price, since you clearly think it's worth it. Would you cough up 150k-250k cash for 50k of vested stock in xAI? You do, after all, think it is worth near half a million plus; that would be quite the steal for you...
There are around 40 people in the photo. I bet you that at least 10 of them are seniors or above.
You made me go back and count; there are 26 people visible in that photo. If we assume a typical corporate structure of Junior, Engineer II, Senior, Staff, the general ratio is a mix of 1-3 Juniors/IIs per Senior and 1-2 Seniors per Staff. I'd say there are 4-6 Seniors and 2-4 Staff Engineers.
For context I'm an MLE, and while I'm not in the Bay, what I see is a 10-20% pay bump for MLEs over traditional backend SWEs. Frontend SWEs (Websites and stuff) are paid much less than backend SWEs. Database peeps can make wildly varying salaries depending on how much DevOps stuff they are doing and how critical the Database is.
MLE is generally a really nebulous job title as well, and that can really impact salaries a lot. I'm more analogous to an ML Researcher + SWE, but it can run the gamut from Data Scientist, Mathematician + SWE (I imagine this is a Quant), ML + Database/MLOps, SWE that interfaces with ML models, and most recently LLM-specific engineers (generally called AI engineers).
Every extent. It's really dominant. What makes it worse is that a new set of "fantasy" fans are really insistent that their magical dragon school romance with 86 interspecies love triangles is actually real fantasy!