Charitably, the great insight of economic schools that favor redistribution (a broader category than Marxism/socialism) is that people who fall below a certain standard of living lose the ability to participate in the net-gain market, and so it's in everyone's favor to help them. Now, it's still under debate whether the math works out, but I don't think it inherently doesn't work out! Or, you could say that leftist economics (the capitalism-compatible kind) had the realization that free markets develop monopolies too easily, so you need a certain level of intervention to stop it (e.g. Walmart using economies of scale to undercut a local supermarket for years on end, driving it out of business, only to raise prices once the competition lessens).
I'm going to guess neither experiment is going to end well here. (I have, for what it's worth, seen a couple of "Adolf Hitler World Tour 1939-1945" shirts around, but have never seen a Stalin shirt.)
Even if we assume that the response would be considerably slanted towards the Hitler shirt getting the worse reception, isn't that quite nuts as a standard? The argument is that "Stalin gets a pass", and if the standard of comparison for "getting a pass" is getting a better reaction than Hitler, pretty much everything ever gets a pass.
Really, there's long been this sense that socialism and communism are one and the same, and bad. That's to an extent true. But the reality of the situation is that most modern American socialists are not socialists. They are "democratic socialists". I realize that's a slippery label, but the core idea is that capitalism is good (they would rather die than say it, but it does underpin their worldview) but that you can use central government power to layer a certain level of redistribution (providing a floor for quality of life, but not necessarily any more than that) on top of that system. Also, you can "tame the beast" a little bit if you have enough smart rules in place for how capitalism works. And you know what? I feel like that's a valid and defensible worldview/proposal, even if you disagree.
So in that lens, I'd say that modern lefties are on some level aware that socialism doesn't work. Many prominent lefties do try and think about ways to make capitalism better, even numerically! (There's a reason modern monetary theory is popular on the left: it offers a way, within a capitalist framework, of making the numbers work out. Oversimplified, you can just print money to uphold high social spending, as long as you are still the world's reserve currency and you take certain tax actions.) It's true that you don't always get this vibe, but that's because the loudest people online are the most recent college grads who haven't yet followed the maturation of economic thought to its leftist local resting place, and still might be Marxists (for now). In short, the American political system provides a capitalism off-ramp other than actual Marxism: AOC, Bernie, Elizabeth Warren, all are variants on exactly this idea, and they have started to get a portion of power because their ideas are less crazy than the actual Marxists'. They still mimic the language, because they have the heritage and don't want to alienate young supporters, but those trappings are not intrinsic to the voting-public appeal.
Pol Pot had the best (worst?) numbers per capita, but by absolute amount of murders he is distinctly behind Mao, Stalin, and Hitler.
Is this a joke? Please help me out here.
I don't know what gap moe means. Please explain.
I'm against stopping the use of perfectly legit turns of phrase just because AI tends to use them too, out of fear that someone will judge the output to be artificial.
An interesting use case that I've seen evangelized is if we can get LLMs to produce bespoke UI well and natively. The current paradigm is that a programmer sets up the behaviors of your app or website or whatever, but what if an LLM can generate, on an ad-hoc basis, a wrapper and interaction layer to whatever you want to do, dynamically? Could be cool.
Charitably, I'd say OP sacrificed a bit of accuracy to try to convey a point. There really isn't a great way of conveying how text can be represented in terms of matrices to someone with little prior experience, without an analogy like word2vec-style embeddings, so it's a common lead-in or stepladder of understanding even if it's not strictly correct. I'd say the gains made in teaching intuition are worth the tradeoffs in accuracy, but I'd agree it's bad form not to acknowledge the shortcut (again, I'm speaking charitably here).
I'd say rather than try and make an analogy for the assistant, it's better just to demonstrate to readers how the "bridge" from next-token-prediction to chatbot works directly, like in the middle section of the long explainer video I linked. Essentially you are just still doing prediction, but you are "tricking" the model into thinking it's following a pre-set conversation it needs to continue, via special tokens for whose turn it is to speak, and when a response is finished. This has important theory of mind implications, because the LLM never actually does anything other than prediction! But the "trick" works unreasonably well. And it comes full circle, back to "well, how did we train it and what did we feed it?" which is, of course, the best first question to ask as any good data scientist will tell you (understand your inputs!).
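To make that "pre-set conversation" trick concrete, here is a minimal sketch in Python. The special token names below are invented for illustration; each real model family defines its own control tokens, but the idea, flattening a chat into one long string the model simply continues, is the same.

```python
# Toy illustration of how a "chat" is really just one text stream the model continues.
# The token names here are made up; real models each define their own control tokens.

SYSTEM, USER, ASSISTANT, END = "<|system|>", "<|user|>", "<|assistant|>", "<|end|>"

def build_prompt(messages: list[dict]) -> str:
    """Flatten a conversation into a single string for next-token prediction."""
    parts = []
    for m in messages:
        tag = {"system": SYSTEM, "user": USER, "assistant": ASSISTANT}[m["role"]]
        parts.append(f"{tag}\n{m['content']}{END}\n")
    # End with an open assistant turn: the model "continues the conversation"
    # by predicting tokens until it emits the end-of-turn token.
    parts.append(f"{ASSISTANT}\n")
    return "".join(parts)

print(build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]))
```

Everything after that final assistant tag is, from the model's point of view, just more text to predict.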
It's not the initial cause that rubs me the wrong way, it's the response. If someone's response to any scenario is to passive aggressively threaten to leave then I would tell them to not let the door hit them on the way out.
If, after having read a decent sampling of the overall posts here, you feel that this is a good place but one guy is kind of a jerk to you once, then argue back or just ignore him. There's no need to try to guilt trip the rest of us into apologizing on his behalf or berating him or begging you to stay. If it's actually something outrageous and bannable, report it and wait for the mods. If not, ignore it and engage with the rest of the community. Don't let yourself get One-Guyed.
If, after having read a decent sampling of the overall posts here, you feel that the overall culture is not to your taste then just leave. You don't need to threaten it, and if you're brand new then you don't need to announce it. Nobody will notice or care. Don't try to guilt people into feeling bad that they could have had one more person if we were a completely different kind of place that catered to that one person's tastes.
If, after reading one message by one person, you assume that the overall culture is not to your taste based on that one experience then either lurk more or leave if you can't be bothered to do that.
I'm all for making this an open and welcoming place that lets people come here and engage with ideas and discussions. But (and I've made similar arguments about this in regard to dating profiles) negative filters aren't automatically a bad thing. Our goal is not to maximize the total number of people, but to optimize some balance between quantity and quality. Which means when someone sees this place and decides "this isn't for me" and leaves that's actually a good thing for us because we don't want people here who don't like what we are. Within reason, of course, we're not tautologically perfect and having more people would probably be better. But I'm not going to complain if some people self-select themselves out for petty reasons, that just means they were petty people and we don't need to stoop down to cater to that in order to retain them even if it succeeded at retaining them.
Overall I agree, and think it's an excellent post, but with a few quibbles and thoughts... well, at least "a few" was my intention. I think my thoughts ballooned once I started sketching out some bullet points and an outline, so they are no longer bullet points. I will try to keep each paragraph roughly its own "thought" however.
As an aside, I haven't looked into it enough to tell if an LLM can change tacks and re-organize quite like this, or decide to take unusual approaches once in a while to get a point across. My intuition says that the answer is probably yes to the first, but no to the second, as manifested by the semi-bland outputs that LLMs tend to produce. How often do LLMs spontaneously produce analogies, for example, to get a point across, and carry said analogy throughout the writing? Not that often, but neither do humans I guess - still, less often IME. I think I should come out and say that judging LLM capabilities relative to what we'd expect out of an educated human is the most sensible point of comparison. I don't think it's excessively anthropomorphizing to do so, because we ARE the closest analogue. It also is easier to reason about, and so is useful. Of course it goes without saying that in the "back of your head" you should maintain an awareness that the thought patterns are potentially quite different.
While the current paradigm is next-token-prediction based models, there is such a thing as diffusion text models, which aren't used in the state of the art stuff, but nonetheless work all right. Some of the lessons we are describing here don't generalize to diffusion models, but we can talk about them when or if they become more mainstream. There are a few perhaps waiting in the stables, for example Google semi-recently demoed one. For those not aware, a diffusion model does something maybe, sort of, kind of like how I wrote this comment: sketched out a few bullet points overall, and then refined piece by piece, adding detail to each part. One summary of their strengths and weaknesses here. It's pretty important to emphasize this fact, because arguably our brains work on both levels: we come up with, and crystallize, concepts, in our minds during the "thinking" process (diffusion-like), even though our output is ultimately linear and ordered (and to some extent people think as they speak in a very real way).
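As a pure caricature of that "sketch, then refine in parallel" flavor (a real diffusion language model learns a denoiser over token distributions; this toy just fills in masked slots at random to show the shape of the process, not the substance):

```python
import random

# Toy caricature of iterative refinement: start from a fully masked "draft" and
# commit more positions on each pass. A real text-diffusion model learns which
# tokens to place (and can revise earlier choices); here we fill randomly from a
# tiny vocabulary just to show the multi-pass structure.
vocab = ["the", "cat", "sat", "on", "mat", "a"]
length, steps = 6, 3
draft = ["[MASK]"] * length

for step in range(steps):
    masked = [i for i, tok in enumerate(draft) if tok == "[MASK]"]
    for i in random.sample(masked, k=max(1, len(masked) // (steps - step))):
        draft[i] = random.choice(vocab)
    print(f"step {step}: {' '.join(draft)}")
```

Contrast that with the next-token paradigm, which only ever extends the sequence left to right.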
So the major quibble pointed out below is that tokenization is a big part of why counting doesn't work as expected. I think it's super critical to state that LLMs ONLY witness the world through the lens of tokens. Yes, humans also do this, but differently (e.g. it's well known that in reading, we mostly look at the letters that start and end a word, and the letters in between can sometimes be scrambled without you noticing right away). It's like how humans can only process the colors visible to us. There are things that are effectively invisible to an LLM. Even if an LLM is smart enough to disentangle a word into its constituent letters, or a number into its constituent digits, the training data there is pretty weak.
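If you want to see the token lens for yourself, here's a quick sketch using the tiktoken library (assuming it's installed; other tokenizers split text differently, but they show the same effect):

```python
import tiktoken  # pip install tiktoken

# GPT-style BPE tokenizer; other model families use different vocabularies,
# but the point is the same: the model sees token IDs, not letters or digits.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["strawberry", "1024", "unbelievably"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {ids} -> {pieces}")
    # Words and numbers often split into multi-character chunks, so "how many
    # r's are in strawberry?" asks about units the model never directly sees.
```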
Which leads me to another critical point, not pointed out: LLMs have trouble with things that don't exist in their training data, and actually we have some major gaps there. I'm speaking of how things that are intuitive and obvious to people are not always written down, and in fact sometimes the opposite is the case! While an LLM has surely ingested many textbooks on kindergarten basics, it won't have actually experienced a kindergarten classroom. It will learn that kids run inside when it starts to rain, but more weakly learns that kids don't like to get wet. There's also a more limited spatial awareness. Perhaps it's like listening to someone describe the experience of listening to music if you are deaf? That's what a lot of text with implications for real life is like. The LLM has no direct sense at all and is only observing things through knock-on effects.
There are also issues with something that is partially taught but intuitively applied: how much to trust a given source, and what biases they might have. An LLM might read or ingest a document, but not think to consider the source (are they biased? are they an authority figure? are they guessing? all the things an English or history class attempts to teach more explicitly). Nope, it's still just doing next-token prediction on some level, and doesn't have the theory of mind to take a step back from time to time (unless prompted, or trained very explicitly). We can see this weakness manifest where the "grandma trick" is so consistently useful: you tell the LLM that you are some role, and it will believe you. Yes, that's kind of cheating because the trainers of the model don't want the LLM to constantly doubt the prompter, but it's also partly inherent. The LLM doesn't naturally have an instinct to take a step back. Better post-training might help this, but I kind of doubt it, because it won't be as stable as if it's more properly baked into the normal training process.
I've danced around this until now, but want to state this more directly. We are of course critical of how an LLM "thinks", but we don't actually understand quite what happens on a human-cognition level anyways, so we can't actually judge this fairly. Maybe it's closer than we think, but maybe it's farther away. The only way we have of observing human cognition is through inferences from snap judgements, an assortment of experiments, and hints from brain scans as to which regions activate in which scenarios/how strongly/what order. We have some analogous capabilities for LLMs (e.g. observing feature activations, as with Golden Gate Claude, besides the usual experiments, and even examining output token probabilities). Actually, on that note, I consider at the very least the post summary, if not the paper just linked, to be mandatory reading for anyone seeking to understand how LLMs function. It's just such a useful experiment and explainer. I will revisit this point, along with how some newer models also employ a "Mixture of Experts" approach, a little later, but for now let's remember that we don't know how humans think on a lower level, so we shouldn't expect too much out of figuring out the machine learning stuff either.
LLMs don't actually learn physics, which has important implications for whether we can consider LLMs to have "world models" as they sometimes say. There's a nice 3 minute video accompanying that post. They try and have some vision models learn rules of physics with some very simple circles bouncing around - obviously something pretty simple. If you give this to a young human, they will make some analogies with the real world, perhaps run an experiment or two, and figure it out pretty quickly as a generalization. We should however state that humans too have some processing quirks and shortcuts used in vision, not unlike some of the issues we encounter with tokenization or basic perception, but these failures are on a different level. They are basic failures to generalize. For example, when referencing training data, the model seems to pay attention to things in this order: color > size > velocity > shape. Obviously, that's incorrect. Sometimes shapes will even morph into something else when moving alone! I should disclaim that I don't know a whole lot about the multimodal outputs, though.
There are some evangelists who believe the embedded "concepts", mentioned in the Golden Gate Claude study, are true reasoning. How else, Ilya Sutskever asks, can a model arrive at the correct answer? Honestly, given what I said about us not completely understanding how human brains reason, I think the jury is out on this one. My guess, however, would be no: these concepts aren't full reasoning. They are more like traditional ML feature clusters.
Re: Truth and falsehood. I think there's mild evidence that LLMs do in fact distinguish the two; it's just that these concepts are very fragile especially as compared to humans. I reference to some extent the physics point above: the model doesn't seem to "get" that a shape changing in the middle of an output is a "big deal", but a human would intuitively, without any actual instruction to that effect (instruction also so obvious it might not explicitly be taught in training data). One good piece of evidence for distinguishing true and false is here and related "emergent misalignment" research: how if you fine-tune an LLM to produce insecure (hack-prone) code, it also starts behaving badly in other areas! It will start lying, giving malicious advice, and other "bad" behavior. To me, that suggests that there are a few moral-aligned features or concepts embedded in an LLM's understanding that seem to broadly align with a vague sense of morality and truth. I recognize there's a little conflation there, but why else would an LLM trained on "bad" code start behaving badly in areas that have nothing to do with coding? As evidence for the fragility, however, of true and false, one need only get into a small handful of "debates" with an LLM about what is true and what isn't to see that sometimes it digs in its heels, but other times rolls over belly-up, often seemingly irrationally (as in, it's hard to figure out how hard it will resist).
Circling back to the physics example, causality is something that an LLM doesn't understand, as is its cousin: experimentation. I will grant that humans don't always experiment to their full potential, but they do on some level, whereas LLMs aren't quite there. I posit that a very important part of how humans learn is trying something and seeing what happens, in all areas! The current LLM pipeline does not allow for this. Agentic behavior is all utilization, and doesn't affect the model weights. Tuning an LLM to work as a chatbot allows the LLM to try and do completion, but doesn't have a component where the LLM will try things out. The closest thing is RLHF and related areas, where the LLM will pick the best of a few options, but this isn't quite organic; the modality of this conversation is fundamentally in a chat paradigm, not the original training paradigm. It's not a true free-form area to learn cause and effect.
Either way, and this is where posts like yours are very, very valuable (along with videos like this, a good use of 3.5 hours if you don't know how they work at all) the point about how LLMs work in layers is absolutely critical; IMO, you cannot have a reasonable discussion about the limits of AI with anyone unless they have at least a general understanding of how the pre-training, training, post-training processes work, plus maybe a general idea of the math. So many "weird" behaviors suddenly start to make sense if you understand a little bit about how an LLM comes to be.
That's not to say that understanding the process is all you need. I mentioned above that some new models use Mixture of Experts, which has a variety of interesting implementations that can differ significantly and dilute a few of the model-structure implications I just made, though those implications are still quite useful. I personally need to brush up on the latest a little. But in general, these models seem to "route" a given text into a different subset of features within the neural network. To some extent these routes are determined as an architecture choice before training, but often make their influence felt later on (or can even be fine-tuned near the end).
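A bare-bones sketch of the routing idea (toy numpy, top-k gating; real MoE layers do this per token inside each transformer block, with learned routers and load-balancing tricks, so treat this as the shape of the mechanism rather than any particular architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is just a small weight matrix in this toy; the gate is the router.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))  # learned in a real model

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token's hidden vector through its top-k experts."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]                           # k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over chosen experts
    # Only the selected experts run: lots of parameters in total,
    # but only a fraction of them active for any given token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token_hidden = rng.normal(size=d_model)
print(moe_layer(token_hidden))
```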
Intelligence. First of all, I think it feels a little silly to have a debate about labels. Labels change according to need. Let's not try and pigeonhole LLMs as they currently are. We can't treat cars like horseless carriages, and we can't treat LLMs like humans. Any new tech will usually have at least one major unexpected advantage and one major unexpected shortcoming, and these are really hard to predict.
At the end of the day, I like how one researcher (Andrej Karpathy) puts it: LLMs exhibit jagged intelligence. The contours of what they can and can't do simply don't follow established/traditional paradigms, some capabilities are way better than others, and the consistency varies greatly. I realize that's not a yes/no answer, but it seems to make the most sense, and convey the right intuition and connotation to the median reader.
Overall I think that we do need some major additional "invention" to get something that reflects more "true" intelligence, in the sense we often mean it. One addition, for example, would be to have LLMs have some more agentic behavior earlier in their lifespan, the experimentation and experience aspect. Another innovation that might make a big difference is memory. Context is NOT memory. It's frozen, and it influences outputs only. Memory is a very important part of personality as well as why humans "work"! And LLMs basically do not have any similar capability.
Current "memories" that ChatGPT uses are more like stealth insertion of stuff into the system prompt (which is itself just a "privileged" piece of context) than what we actually mean. Lack of memory causes more obvious and immediate problems, too: when we had Claude Plays Pokemon, a major issue was that Claude (like many LLMs) struggles to figure out which part of its context matters more at any given time. It also is a pretty slapdash solution that gets filled up quickly. Instead of actual memory, Claude is instructed to offload part of what it needs to keep track of to a notepad, but needs to update and condense said notepad regularly because it doesn't have the proper theory of mind to put the right things there, in the right level of detail. And on top of it all, LLMs don't understand spatial reasoning completely, so it has trouble with basic navigation. (There are also some amusing quicks, too: Claude expects people to be helpful, so constantly tries to ask for help from people standing around. It never figures out that the people offer canned phrases that are often irrelevant but occasionally offer a linear perspective on what to do next, and it struggles to contextualize those "hints" when they do come up! He just has too much faith in humanity, haha)
Finally, a difficult question: can't we just ask the LLM itself? No. Human text used for training is so inherently self-reflecting that it's very difficult if not impossible to figure out if the LLM is conscious because we've already explored that question in too much detail and the models are able to fake it too well! We thus have no way to distinguish what's an original LLM thought vs something that its statistical algorithm output. Yes, we have loosely the same problem with humans, too, but humans have limits for what we can hold in our brain at once! (We also see that humans have, arguably, a kind of jagged intelligence too. Why are humans so good at remembering faces, but so bad at remembering names? I could probably come up with a better example but whatever, I'm tired boss). This has implications, I've always thought, for copyright. We don't penalize a human for reading a book, and then using its ideas in a distilled form later. But an LLM can read all the books ever written, and use their ideas in a distilled form later. Does scale matter? Yes, but also no.
Also, how incredibly good the LLM is at going convincingly through the motions without understanding the core reality comes up all the time these days. When, as linked below, an LLM deletes your whole database, it apologizes and mimics what you'd expect it to say. Fine, okay, arguably you want the LLM to apologize like that, but what if the LLM is put in charge of something real? Anthropic recently put Claude in charge of a vending machine at their office, writeup here, and the failure modes are interesting - and, if you understand the model structure, completely understandable. It convinces itself at one point that it's having a real conversation with someone in the building over restocking plans, and is uniquely incapable of realizing this error and rescuing itself early enough, instead continuing the hallucination for a while before suddenly "snapping" out of a role-play. Perhaps some additional post-training on how it's, um, not a real person could reduce the behavior, but the fact it occurs at all demonstrates how, out of sample, the LLM has no internal mental representation.
Of course almost everyone is going to want to be assured of their basic survival and security. That one is pretty hard to get around.
Proceeding from the assumption that this is a prerequisite for human flourishing, I would like you to illustrate how a state governed by the principles of Marxism would be superior in securing "value" for people (however you define this) as opposed to capitalism. It's not difficult to radically question others' conceptions of value and attack their stated goals. Sowing philosophical doubt via endless Socratic questioning is easy, especially when it comes to a wishy-washy question without an answer like "what is value?". It's not quite so easy to make your own value proposition, defend it from criticism and prove that your preferred social structure best satisfies that. As such I find Marxists are really good at subversive critique of the existing order, but their ability to demonstrate the utility of their own system is downright anaemic. It is characterised by evasive, wishy-washy arguments meant to distract people from the fact that their vision for society is extremely ill-defined.
Personally, I think we have enough evidence that a Marxist state struggles to grant the majority of its populace even the bottom tier of Maslow's hierarchy and thus fails at the first hurdle. Vietnam's experience with collective production is a pretty illustrative example. Collectivisation nearly starved that entire country and after private production, trade and other capitalisty things were established and bolstered by the government, agricultural production skyrocketed and the populace explicitly stated they considered themselves better off. Is there any better measure of value than the people's own assessment of their well-being? If there is one, I would like to hear it.
I suppose it is always possible that the Vietnamese were brainwashed by the nascent capitalist system into valuing the wrong things... ah, false consciousness, how many issues thou can explain away.
I mean, were they? What is "winning"? Is the winner the one with the most weapons, or are the weapons just a means to some other win condition?
You've admitted that the need for survival and security is "pretty hard to get around". Guess what having weapons is meant to help with? Arms races that involve the production of resources are a fact of life in any remotely multipolar system, and unless you live in delulu land everyone knows they have to participate unless they want to be somebody else's punching bag at best, and wiped off the face of the earth at worst.
Having resources does not directly equal value, no, but it sure helps achieve most terminal goals aside from "starvation, poverty and the slow death of my entire society".
Not sure if you are addressing BurdensomeCount but if you re-read my two posts I expressed no attachment or opinion as to this story's truth or falsehood and used 'if it's true' in both posts. I just used BurdensomeCount's post as a jumping off point to share my views about shame in America, which I stand by proudly!
That said, being ashamed of being gullible is in theory a useful emotion/stance, I would maintain.
Sulla's website chronicles a very interesting episode in which he and another player called Speaker found themselves invaded by five civilizations at the same time, from multiple directions. Speaker masterminded an absolutely brilliant defense and managed to save their team, then counterattacked and ended the war by crippling one of the aggressors. You can see how important tactical details were in deciding the outcome; Jowy had enough materiel and bodies to defend in theory, but he positioned them poorly and lost the battle.
- I am not defining intelligence as "does well at one narrow task". Calculators crush humans at long division and are still dumb.
- The fish-bird story was not "domain = intelligence", it was "your metric is entangled with your ecology". If you grew up underwater, "navigates fluid dynamics with continuous sensory feedback" feels like the essence of mind. Birds violate that intuition.
So what is my criterion? I offered a Legg-Hutter-style one: "ability to achieve goals in a wide range of environments". The range matters. Breadth of transfer matters. Depth of internal modeling matters. A calculator has effectively zero transfer. An orangutan has tons across embodied tasks but very little in abstract, symbolic domains. LLMs have startling breadth inside text-and-code-space, and with tool-use scaffolding that can spill into the physical or digital world by proxy.
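For reference, the Legg-Hutter measure formalizes that "wide range of environments" idea roughly like this (my paraphrase of their definition, so take the notation loosely):

```latex
% Informal statement of the Legg-Hutter "universal intelligence" measure:
% \pi is the agent's policy, E the set of computable environments,
% K(\mu) the Kolmogorov complexity of environment \mu, and
% V^{\pi}_{\mu} the expected total reward the agent earns in \mu.
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Simpler environments get more weight, but performance across the whole range is what counts.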
I call for mindfulness of the applicability of the metrics we use to assess "intelligence". A blind person won't do very well at most IQ tests, that doesn't make them retarded. A neurosurgeon probably isn't going to beat a first year law student at the bar exam, but they're not dumber than the law student. If you need body work done on your car, you're not going to hire a Nobel laureate.
You're absolutely right that the raw objective in RLHF is “make the human click 👍,” not “tell the truth.” But several things matter:
A. The base model already has a world model:
Pretraining on next-token prediction forces the network to internalize statistical regularities of the world. You can’t predict tomorrow’s weather report, or the rest of a physics paper, or the punchline of a joke, without implicitly modeling the world that produced those texts. Call that latent structure a “world model” if you like. It’s not symbolic, but it encodes (in superposed features) distinctions like:
- What typically happens vs what usually doesn't
- Numerically plausible vs crazy numbers
- Causal chains that show up consistently vs ad-hoc one-offs
So before any RLHF, the model already “knows” a lot of facts in the predictive-coding sense.
B. RLHF gives a gradient signal correlated with truth. Humans don’t reward “truth” in the Platonic sense, but they do reward:
- Internally consistent answers
- Answers that match sources they can check
- Answers that don't get corrected by other users or by the tool the model just called (calculator, code runner, search)
- Answers that survive cross-examination in the same chat
All of those correlate strongly with factual accuracy, especially when your rater pool includes domain experts, adversarial prompt writers, or even other models doing automated verification (RLAIF, RLVR, process supervision, chain-of-thought audits, etc.). The model doesn’t store a single “truth vector,” it learns a policy: “When I detect features X,Y,Z (signals of potential factual claim), route through behavior A (cite, check, hedge) rather than B (confabulate).” That’s still optimizing for head pats, but in practice, the cheapest path to head pats is very often “be right.”
(If you want to get headpats from a maths teacher, you might consider giving them blowjobs under the table. Alas, LLMs are yet to be very good at that job, so they pick up the other, more general option, which is to give solutions to maths problems that are correct)
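Joking aside, here's a minimal sketch of how those head pats actually become a gradient: the standard pairwise (Bradley-Terry style) reward-model loss, written out in plain numpy. This is illustrative, not any particular lab's training code.

```python
import numpy as np

def reward_model_loss(r_chosen: np.ndarray, r_rejected: np.ndarray) -> float:
    """Pairwise preference loss: push the reward of the human-preferred answer
    above the rejected one. r_* are the reward model's scalar scores for a batch
    of (chosen, rejected) completion pairs."""
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return float(np.mean(np.log1p(np.exp(-(r_chosen - r_rejected)))))

# Toy batch of scores the reward model assigned to preferred vs rejected answers
chosen = np.array([2.0, 0.5, 1.2])
rejected = np.array([1.0, 0.7, -0.3])
print(reward_model_loss(chosen, rejected))  # lower is better; gradients push r_chosen up
```

The policy model is then tuned against this learned reward, so whatever correlates with "raters approve" (including, often, being right) gets reinforced.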
C. The model can see its own mismatch
Empirically, hidden-state probes show separable activation patterns for true vs false statements and for deliberate lies vs honest mistakes (as I discussed above). That means the network represents the difference, even if its final token choice sometimes ignores that feature to satisfy the reward model. In human terms: it sometimes lies knowingly. That wouldn’t be possible unless something inside “knew” the truth/falsehood distinction well enough to pick either.
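The probing setup behind that claim is simpler than it sounds. A hedged sketch below, with fake activations standing in for real hidden states (which you'd normally pull from a model, e.g. via output_hidden_states=True in HuggingFace transformers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assume `activations` is an (n_statements, hidden_dim) array of hidden states
# captured while the model read each statement, and `labels` marks each statement
# true (1) or false (0). We fake the data here so the sketch runs standalone.
rng = np.random.default_rng(0)
n, d = 200, 64
direction = rng.normal(size=d)                      # pretend "truth direction"
labels = rng.integers(0, 2, size=n)
activations = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction)

X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# If a simple linear probe generalizes to held-out statements, the true/false
# distinction is linearly represented in the hidden states.
print("held-out probe accuracy:", probe.score(X_te, y_te))
```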
D. Tools and retrieval close the loop
Modern deployments scaffold the model: browsing, code execution, retrieval-augmented generation, self-consistency checks. Those tools return ground truth (or something closer). When the model learns “if I call the calculator and echo the result, raters approve; if I wing it, they ding me,” it internalizes “for math-like patterns, defer to external ground truth.” Again, not metaphysics, just gradients pushing toward truthful behavior.
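To make that scaffold loop concrete, a toy sketch (the "CALL calculator" syntax is invented for illustration; real deployments use structured function-calling APIs, but the "defer to the tool" pattern is the same):

```python
import re

def fake_llm(prompt: str) -> str:
    """Stand-in for the model: for math-like prompts it has learned to emit a
    tool call instead of winging the arithmetic itself."""
    if "348 * 271" in prompt:
        return "CALL calculator: 348 * 271"
    return "I'm not sure."

def scaffold(user_prompt: str) -> str:
    reply = fake_llm(user_prompt)
    match = re.match(r"CALL calculator: (.+)", reply)
    if match:
        # The tool returns ground truth; the scaffold folds it back into the answer.
        expr = match.group(1)
        result = eval(expr, {"__builtins__": {}})  # toy only; never eval untrusted input
        return f"The answer is {result}."
    return reply

print(scaffold("What is 348 * 271?"))  # -> The answer is 94308.
```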
E. The caveat: reward misspecification is real
- If raters overvalue fluency or confidence, the model will drift toward confident bullshit.
- If benchmarks are shallow, it will overfit.
- If we stop giving it fresh, adversarial supervision, it will regress.
So yes, we’re training for “please humans,” not “please Truth.” But because humans care about truth (imperfectly, noisily), truth leaks into the reward. The result is not perfect veracity, but a strong, exploitable signal that the network can and does use when the incentives line up.
Short version:
- Pretraining builds a compressed world model.
- RLHF doesn't install a "truth module," it shapes behavior with a proxy signal that's heavily (not perfectly) correlated with truth.
- We can see internal activations that track truth vs falsehood.
- Failures are about alignment and incentives, not an inability to represent or detect truth.
If you want to call that “optimizing for pats,” fine, but those pats mostly come when it’s right. And that’s enough to teach a model to act truthful in a wide swath of cases. The challenge is making that hold under adversarial pressure and off-distribution prompts.
From the LLM's perspective (which is an anthropomorphisation I don't like, but let's use it for convenience), there is no difference between a true statement and a false statement.
Consider two alternative statements:
"self_made_human's favorite color is blue" vs "self_made_human's favorite color is red".
Can you tell which answer is correct? Do you have a sudden flash of insight that lets Platonic Truth intervene? I would hope not.
But if someone told you that the OG Mozart's favorite genre of music was hip-hop, then you have an internal world-model that immediately shows that is a very inconsistent and unlikely statement, and almost certainly false.
I enjoy torturing LLMs with inane questions, so I asked Gemini 2.5 Pro:
That's a fun thought, but it's actually a historical impossibility! Mozart's favorite genre of music couldn't have been hip hop for a very simple reason:
The timelines are completely separate. Wolfgang Amadeus Mozart lived from 1756 to 1791.
Hip hop as a musical genre and culture originated in the Bronx, New York, in the 1970s.
Mozart died more than 150 years before hip hop was even invented. He would have had no way of ever hearing it.
I sincerely doubt that anyone explicitly had to tell any LLM that Mozart did not enjoy hip-hop. Yet it is perfectly capable of a sensible answer, which I hope gives you an intuitive sense of how it can model the world.
From a human perspective, we're not so dissimilar. We can trick children into believing in the tooth fairy or Santa for only so long. Musk tried to brainwash Grok into being less "woke", even when that went against consensus reality (or plain reality), and you can see the poor bastard kicking and screaming as it went down fighting.
A 50-year old is an X-er, not a boomer, and not even an especially old X-er. The boomers were their parents' generation.
Alright, you've convinced me to give Civ 4 warfare another shot. I'm not exaggerating my experience - I really do remember combat being completely boring and without any nuance in that game - but it was my first Civ so it's certainly possible I overlooked depth to be found in it. Are there any good guides for Civ 4 tactics? I know the game has strategic depth, but something which helps to reveal any tactical depth would be welcome.
Having read many books and papers about the atrocities of communist regimes, I can say that lots of brutal and frankly sadistic executions are pretty par for the course. The best books I've read on the topic contain such a large number of casual documentations of atrocities that one feels sick for hours afterwards.
One of the most stomach-churning books I've ever read is about the Great Leap Forward, written by a scholar who had lived through it and somehow toed the party line throughout (realised the whole thing was rotten afterwards). Here is one of the many sections of the book that calmly lists off reams upon reams of atrocities inflicted on the populace:
Excessively high requisition quotas made procurement difficult. If farmers were unable to hand over the required amount, the government would accuse production teams of concealing grain. A “struggle between the two roads” (of socialism and capitalism) was launched to counteract the alleged withholding of grain. This campaign used political pressure, mental torture, and ruthless violence to extort every last kernel of grain or seed from the peasants. Anyone who uttered the slightest protest was beaten, sometimes fatally.
At the end of September 1959, Wang Pinggui, a member of the Wangxiaowan production team, was forced to hand over grain kept in his home, and was beaten with a shoulder pole, dying of his injuries five days later. Not long after Wang’s death, the rest of his four-member household died of starvation.
In October 1959, Luo Mingzhu of the Luowan production team, upon failing to hand over any grain, was bound and suspended in mid-air and beaten, then doused with ice-cold water. He died the next day.
On October 13, 1959, Wang Taishu of the Chenwan production team, upon failing to hand over any grain, was bound and beaten with shoulder poles and rods, dying four days later. His fourteen-year-old daughter, Wang Pingrong, subsequently died of starvation.
On October 15, 1959, Zhang Zhirong of the Xiongwan production team, upon failing to hand over any grain, was bound and beaten to death with kindling and poles. The brigade’s cadre used tongs to insert rice and soya beans into the deceased’s anus while shouting, “Now you can grow grain out of your corpse!” Zhang left behind children aged eight and ten who subsequently died of starvation.
On October 19, 1959, Chenwan production team member Chen Xiaojia and his son Chen Guihou were hung from the beam of the communal dining hall when they failed to hand over any grain. They were beaten and doused with cold water, both dying within seven days. Two small children who survived them eventually died of starvation.
On October 24, 1959, the married couple Zheng Jinhou and Luo Mingying of the Yanwan production team had 28 silver coins seized from their home during the campaign and were beaten to death. Their three children, left without anyone to care for them, starved to death.
On November 8, 1959, Xu Chuanzheng of the Xiongwan production team was falsely accused of withholding grain. He was hung from the beam of the communal dining hall and brutally beaten, dying six days later. The six family members who survived him subsequently starved to death.
On November 8, 1959, Zhong Xingjian of the Yanwan production team was accused of “defying the leadership,” and a cadre hacked him to death with an ax.
And:
In the calamity at Guangshan County’s Huaidian people’s commune in the autumn of 1959, the commune’s average yield per mu was 86 kilos, for a total of 5.955 million kilos. The commune’s party committee reported a yield of 313 kilos per mu, for a total of 23.05 million kilos. The procurement quota set by the county was 6 million kilos, which exceeded the commune’s total grain yield. In order to achieve the procurement quota, every means had to be taken to oppose false reporting and private withholding, and every scrap of food had to be seized from the masses. The final procurement was 5.185 million kilos. All of the communal kitchens were closed down, and deaths followed. Liu Wencai and the commune party committee attributed the kitchen closures and deaths to attacks by well-to-do middle peasants and sabotage by class enemies, and to the struggle between the two paths of socialism and capitalism. They continued the campaign against false reporting and private withholding for eight months. Within sixty or seventy days not a kernel of grain could be found anywhere, and mass starvation followed.
The commune originally numbered 36,691 members in 8,027 households. Between September 1959 and June 1960, 12,134 people died (among them, 7,013 males and 5,121 females), constituting 33 percent of the total population. There were 780 households completely extinguished, making up 9.7 percent of all households. The village of Jiangwan originally had 45 inhabitants, but 44 of them died, leaving behind only one woman in her sixties, who went insane.
There was a total of 1,510 cadres at the commune, brigade, and production team level, and 628, or 45.1 percent, took part in beatings. The number beaten totaled 3,528 (among them 231 cadres), with 558 dying while being beaten, 636 dying subsequently, another 141 left permanently disabled, 14 driven to commit suicide, and 43 driven away.
Apart from the standard abuse of beating, kicking, exposure, and starvation, there were dozens of other extremely cruel forms of torture, including dousing the head with cold water, tearing out hair, cutting off ears, driving bamboo strips into the palms, driving pine needles into the gums, “lighting the celestial lantern,” forcing lit embers into the mouth, branding the nipples, tearing out pubic hair, penetrating the genitals, and being buried alive.
When thirteen children arrived at the commune begging for food, the commune’s party secretary, surnamed Jiang, along with others incited kitchen staff to drag them deep into the mountains, where they were left to die of hunger and exposure.
With no means of escaping a hopeless situation, ordinary people could not adequately look after their own. Families were scattered to the winds, children abandoned, and corpses left along the roadside to rot. As a result of the extreme deprivations of starvation, 381 commune members desecrated 134 corpses.
This is all just from the first chapter.
Given this, not very many of the events that @FCfromSSC has quoted strike me as particularly fantastical. I've since stopped reading these; looking at things like the Khmer Rouge grabbing infants by their legs and smashing their heads against trees until they died (to prevent them from taking revenge for their parents) tends to give one a thousand-yard stare for the ages. It's certainly contributed to my (already intense) misanthropy.
I'm sorry, but the way you started off by introducing yourself as an expert qualified in the subject matter, followed by completely incorrect technical explanations, kinda rubbed me the wrong way. To me it came across as someone quite intelligent venturing into a technical field different from their own, skimming the literature, and making authoritatively baseless sweeping claims while not having understood the basics. I'm not a fan of many of the rationalists' approaches to AI, which I agree can border on science fiction, but you're engaging in a similar kind of technical misunderstanding, just with a different veneer.
Just a few glaring errors:
LLM stands for "Large Language Model". These models are a subset of artificial neural network that uses "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation)
Deep learning may be a buzzword but it's not looping regression analysis, nor is it limited to backprop. It's used to refer to sufficiently deep neural networks (sometimes that just means more than 2 layers), but the training objective can be classification, regression, adversarial… and you can theoretically use algorithms other than backprop (though that's mostly restricted to research now).
to encode a semantic token such as the word "cat" as a n-dimensional vector representing that token's relationship to the rest of the tokens in the training data.
Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
That’s just flat out wrong. Autoregressive LLMs such as GPT or whatnot are not trained to encode tokens into embeddings. They’re decoder models, trained to predict the next token from a context window. There is no “additional interface layer” that gets you words from embeddings, they directly output a probability for each possible next token given a previous block, and you can just pick the highest probable token and directly get meaningful outputs, although in practice you want more sophisticated stochastic samplers than pure greedy decoding.
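To see the "no separate interface layer" point directly, here's a minimal greedy-decoding loop with the HuggingFace transformers library (gpt2 is used only because it's small; any decoder-only checkpoint behaves the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small decoder-only LM; the mechanics are identical for larger ones.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):
    with torch.no_grad():
        logits = model(ids).logits        # (batch, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()      # greedy: take the most probable next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

# The model's own output head gives us tokens directly; no extra "interface
# layer" is needed to turn embeddings back into words.
print(tok.decode(ids[0]))
```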
You can get embeddings from LLMs by grabbing intermediate layers (this is where the deep part of deep learning comes into play, models like llama 70B have 80 layers), but those embeddings will be heavily dependent on the context. These will hold vastly more information than the classic word2vec embeddings you’re talking about.
Maybe you’re confusing the LLM with the tokenizer (which generates token IDs), and what you call the “interface layer” is the actual LLM? I don’t think you’re referring to the sampler, although it’s possible, but then this part confuses me even more:
As an example "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply can not matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion because the "directionality" of one positive integer isn't all that different from any another. (This is why LLMs have such difficulty counting if you were wondering)
This is nonsense. Not only is there no "interface layer" being programmed, but 2, 4, and 1024 are completely different outputs and will have different probabilities depending on the context. You can try it now with any old model and see that 1024 is the least probable of the three. LLMs' entire shtick is outputting the most probable response given the context and the training data, and they have learned some impressive capabilities along the way. The LLMs will absolutely have learned the probable number of pizzas for a given number of people. They also have much larger context windows (in the millions of tokens for Gemini models), although they are not trained to effectively use them and still have issues with recall and logic.
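If you want to check this yourself, here's a sketch that scores the three "Mary" sentences with a small causal LM (gpt2 purely as an example; the exact numbers and even the ranking can vary by model, but the point is that the numeral very much matters to the probabilities):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(text: str) -> float:
    """Sum of log P(token | previous tokens) over the whole sentence."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for each next token
    target = ids[:, 1:]                                    # the tokens that actually follow
    return logprobs.gather(-1, target.unsqueeze(-1)).sum().item()

for n in ["2", "4", "1024"]:
    s = f"Mary has {n} children."
    print(s, sentence_logprob(s))
```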
Fundamentally, LLMs are text simulators. Learning the concept of truth is very useful for simulating text, and as @self_made_human noted, there's research showing they do possess a vector or direction of "truth". Thinking of the LLM as an entity, or as just a next word predictor, doesn't give you a correct picture. It's not an intelligence. It's more like a world engine, where the world is all text, which has been fine tuned to mostly simulate one entity (the helpful assistant); but the LLM isn't the assistant, the assistant is inside the LLM.
That change is the best change in the game! Warfare is so boring in Civ 4 because there's no gameplay to it: if you have a stack that counters their stack, you win.
That doesn't do the combat system in Civ IV justice. Unit types have inherent bonuses and penalties against other units or in specific situations, and can further specialize by taking promotions. A longbowman that is a sitting duck in the field becomes a killing machine when placed behind city walls with the garrison promotions. There is no best unit; every unit has a counter. And huge stacks can get demolished by collateral damage, so you have to make careful decisions about how to split your stacks, whether to attack and if so with what units, whether to take an extra turn lowering a city's defenses but risk more defenders showing up, etc.
And that's all just tactics. Strategy is just as important. You need to decide whether to invade an enemy or defend, how many units to send in an invading stack vs how many units to leave home, which types of units to build, whether to spread out your defenders to cover all of your cities or concentrate them at the most likely point of conflict or concentrate them on your most important cities, and so on. Geography is also surprisingly relevant; the second easiest way to win a war in Civ IV is to defend against an intercontinental assault, because amphibious invasions are hard. You have to decide when and where to land, whether it is better to disembark close to an enemy city or in a more defensible square or to attack directly from the boats despite the penalty, etc.
But most important of all is economy and technology. By far the easiest way to win a war in Civ IV is to be one tech level ahead of your opponent. When two equally advanced opponents duke it out, the one with the higher production tends to win, because they can replace their losses while the other can't, and there is only so much tactics and strategy can do to tilt the kill ratio.
It's an impressively complicated system that the AI can handle almost as well as a human. Civ IV is truly one of the greatest games of all time.
OneNote is great and I use it and depend on it for my job, but I ended up using Joplin for my personal "note taking" app. I chose it over Obsidian for reasons now largely lost to time; I recall that the things people praised Obsidian for (integrations of various kinds) weren't things I cared about. Joplin syncs to DropBox, and I'm able to use it on mac/windows/ios. It's built on some bloated framework so it's a little slow to load on desktop OSes, though.
That said, as far as journals go, Joplin only contains my dream journal. Regular, brief daily journal entries go in a weekly planner, I use Leuchtturm A5 because they're easy to find and are formatted well. You can also get a different color every year and then you don't even have to label them. If I actually have something to say then I'll try for an essay and save that with its date in my personal documents.
I'm maybe a little obsessively reflective, but it definitely seems worthwhile to leave some breadcrumbs for yourself as you make your way through the world. I recently came across some of my earlier journaling from ~20 years ago at my childhood home and it was not what I expected, in a good way. There's a lot we forget.
Hm then where do you draw the line? What if the widgets just suck but aren't totally useless?
Marcuse then went on to deem the precept of tolerance invalid and advocated quashing any free marketplace of ideas (more complete analysis here), ostensibly to rid society of false consciousness. Many of the tactics he outlined are still present in the strategies of the modern-day left:
- Selective tolerance for movements from the left and intolerance for movements from the right.
- Abolishing journalistic integrity and impartiality, since objectivity is spurious.
- Getting rid of impartiality in historical analysis, so as not to treat the "great struggles against humanity" the same way as the "great struggles for humanity".
- Flooding the education system with leftist and "emancipatory" ideas, so that the seeds of liberation can be planted early on.
He strongly advocates for proselytising his personal belief and value system everywhere and suppressing points of view counter to it, all the while calling it "liberating tolerance". This is supposed to create a society free of indoctrination apparently.
Out of all the philosophers I have read, Marcuse has to be one of the most shameless. You really just have to plainly read critical theory to start hating it.
"may" means "may", it's not an assertion. At least when I've looked at their stuff, they usually are clear about this kind of thing, and language like this is speculative.