You might be interested in Greg Egan's book Permutation City, which takes this idea (he calls it "Dust Theory") and runs with it to the extreme.
Or maybe at Mr. Burns' birthday party...
The law of non-contradiction, "not both A and not A", or ¬(A ∧ ¬A), is another first principle.
That one's pretty uncontroversial, but the more interesting one is the law of excluded middle: "either A or not A". We all learn it, but there's a school of thought (intuitionism) that this shouldn't be a basic law. And indeed there are some weeeeeeeird results in math that go away (or become less weird) if you don't allow proof by contradiction.
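(If you want to see that distinction made concrete, here's a minimal Lean 4 sketch, assuming a recent toolchain; the theorem name is my own. Non-contradiction goes through constructively, while excluded middle has to be pulled in from the classical axioms.)

```lean
-- Non-contradiction is provable without any classical reasoning:
theorem non_contradiction (p : Prop) : ¬(p ∧ ¬p) :=
  fun ⟨hp, hnp⟩ => hnp hp

-- Excluded middle is not; in Lean it lives behind the classical axioms:
#check (Classical.em : ∀ p : Prop, p ∨ ¬p)
```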
It's more of a variation of your first possibility, but RT could also be acting out of principal-agent problems, not at the behest of Hollywood executives. The explanations probably overlap. There's also the possibility that they care about their credibility every bit as much as they did in the past, but it's their credibility among tastemakers that's important, not the rabble.
Yeah, I'd be surprised if RT's review aggregation takes "marching orders" from any executives. In fact, I think RT is owned indirectly by Warner Bros., so if anything you'd expect they'd be "adjusting" Disney movies unfavorably. I like your explanation that RT's just sincerely trying to appease the Hollywood elite, rather than provide a useful signal to the masses. It fits.
I'm not sure why you'd put a low prior on the first, though. Particularly for high visibility productions, "everyone" knows to take politics into account when reading reviews. Positively weighting aligned reviews doesn't seem like an incredible step beyond that.
I knew to take that into account with the critics score, which I would usually ignore for the "woke" crap. But in the past I've generally found the audience score trustworthy. Maybe I was just naive, and it took a ridiculous outlier for me to finally notice that they have their fingers on every scale.
Technically Bing was using it before then, but good point. It's insane how fast things are progressing.
Well, I don't think your analogy of the Turing Test to a test for general intelligence is a good one. The reason the Turing Test is so popular is that it's a nice, objective, pass-or-fail test. Which makes it easy to apply - even if it's understood that it isn't perfectly correlated with AGI. (If you take HAL and force it to output a modem sound after every sentence it speaks, it fails the Turing Test every time, but that has nothing to do with its intelligence.)
Unfortunately we just don't have any simple definition or test for "general intelligence". You can't just ask questions across a variety of fields and declare "not intelligent" as soon as it fails one (or else humans would fail as soon as you asked them to rotate an 8-dimensional object in their head). I do agree that a proper test requires that we dynamically change the questions (so you can't just fit the AI to the test). But I think that, unavoidably, the test is going to boil down to a wishy-washy preponderance-of-evidence kind of thing. Hence everyone has their own vague definition of what "AGI" means to them; honestly, I'm fine with saying we're not there yet, but I'm also fine arguing that ChatGPT already satisfies it.
There are plenty of dynamic, "general", never-before-seen questions you can ask where ChatGPT does just fine! I do it all the time. The cherrypicking I'm referring to is, for example, the "how many Rs in strawberry" question, which is easy for us and hard for LLMs because of how they see tokens (and, also, I think humans are better at subitizing than LLMs). The fact that LLMs often get this wrong is a mark against them, but it's not iron-clad "proof" that they're not generally intelligent. (The channel AI Explained has a "Simple Bench" that I also don't really consider a proper test of AGI, because it's full of questions that are easy if you have embodied experience as a human. LLMs obviously do not.)
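(If you want to see the token thing concretely, here's a rough sketch assuming OpenAI's tiktoken package; the exact split depends on the tokenizer, so treat the output as illustrative. The point is that the model never sees "strawberry" letter by letter.)

```python
# Rough illustration of why letter-counting is awkward for an LLM
# (assumes the tiktoken package is installed).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
# Print the chunks the model actually "sees" instead of individual letters.
print([enc.decode([t]) for t in tokens])
```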
In the movie Phenomenon, rapidly listing mammals from A-Z is considered a sign of extreme intelligence. I can't do it without serious thought. ChatGPT does it instantly. In Bizarro ChatGPT world, somebody could write a cherrypicked blog post about how I do not have general intelligence.
What the hell? You most definitely did NOT give any evidence then. Nor in our first argument. I'm not asking so I can nitpick. I would genuinely like to see a somewhat-compact example of a modern LLM failing at code in a way that we both, as programmers, can agree "sucks".
Any particular reason why you're optimistic? What are your priors in regards to AI?
Same as you, I don't pretend to be able to predict an unknowable technological future and am just relying on vibes, so I'm not sure I can satisfy you with a precise answer...? I outlined why I'm so impressed with even current-day AI here. I've been working in AI-adjacent fields for 25 years, and LLMs are truly incomparable in generality to anything that has come before. AGI feels close (depending on your definition), and ASI doesn't, because most of our gains are just coming from scaling compute up (with logarithmic benefits). So it's not an existential threat, it's just a brand new transformative tech, and historically that almost always leads to a richer society.
You don't tend to get negative effects on the tech tree in a 4X game, after all. :)
As far as I can tell, the vast vast VAST majority of it is slop full of repurposed music and lyrics that get by on being offensive rather than clever. Rap artists aren't known for being intelligent, after all. I suspect most "celebrated" rap music would fail a double-blind test against some rando writing parodic lyrics and banging on an audio synthesizer for a few hours. Much like postmodern art, where the janitor can't tell it apart from trash.
There probably are some examples of the genre that I could learn to appreciate (Epic Rap Battles of History comes to mind), but it's hard to find them because of the pomo effect.
Sounds like you have some practical experience here. Yeah, if just iterating doesn't help and a human has to step in to "fix" the output, then at least there'll be some effort required to bring an AI novel to market. But it does feel like detectors (even the good non-Grammarly ones) are the underdogs fighting a doomed battle.
Here. I picked a random easyish task I could test. It's the only prompt I tried, and ChatGPT succeeded zero-shot. (Amusingly, though I used o3, you can see from the thought process that it considered this task too simple to even need to execute the script itself, and it was right.) The code's clear, well-commented, avoids duplication, and handles multiple error cases I didn't mention. A lot of interviewees I've encountered wouldn't do nearly this well - alas, a lot of coders are not good at coding.
Ball's in your court. Tell me what is wrong with my example, or that you can do this yourself in 8 seconds. When you say "like a few lines", is that some nonstandard usage of "few" that goes up to 100?
Even better, show us a (non-proprietary, ofc) example of a task where it just "makes shit up" and provides "syntactically invalid code". With LLMs, you can show your receipts! I'm actually genuinely curious, since I haven't caught ChatGPT hallucinating code since before 4o. I'd love to know under what circumstances it still does.
I did for all three, but it was many years ago, and I think I'd struggle with most IMO problems nowadays. Pretty sure I'm still better at proofs than the frontier CoT models, but for more mechanical applied computations (say, computing an annoying function's derivative) they're a lot better than me at churning through the work without making a dumb mistake. Which isn't that impressive, TBH, because Wolfram Alpha could do that too, a decade ago. But you have to consciously phrase things correctly for WA, whereas LLMs always correctly understand what you're asking (even if they get the answer wrong).
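(A quick sketch of the kind of grunt work I mean, using sympy; the particular function is just something arbitrary I made up to be annoying.)

```python
# Mechanical symbolic differentiation - the kind of work I'd rather not churn through by hand.
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x**2) * sp.exp(x) / (1 + x**2)  # an arbitrary "annoying" function
print(sp.simplify(sp.diff(f, x)))
```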
Apologies. I guess the joke was on me!
they frequently don't "perceive themselves" as having the literal knowledge that they're trained on
IMO this is roughly the right way to think about it. LLMs probably don't even have the capability to know what they know; it's just not what they're trained to do. A lot of people confuse the LLM's simulation of a chatbot with the LLM itself, but they're not the same. (e.g. we can't settle the question of whether an LLM is conscious by asking it "are you conscious?". The answer will just depend on what it thinks the chatbot would say.) From the LLM's perspective it's perfectly reasonable to extend a conversation with "the answer is" even when the word after that is undetermined. Hence hallucinations.
(I think RLHF helps a bit with this, allowing it to recognize "hard questions" that it's likely to get wrong, but that's not the same as introspection.)
Maybe I'm missing some brilliant research out there, but my impression is we scientifically understand what "pain" actually is about as well as we understand what "consciousness" actually is. If you run a client app and it tries and fails to contact a server, is that "pain"? If you give an LLM some text that makes very little sense so it outputs gibberish, is it feeling "pain"? Seems like you could potentially draw out a spectrum of frustrated complex systems that includes silly examples like those all the way up to mosquitos, shrimp, octopuses, cattle, pigs, and humans.
It'd be nice if we could figure out a reasonable compromise for how "complex" a brain needs to be before its pain matters. It really seems like shrimp or insects should fall below that line. But it's like abortion limits - you should pick SOME value in the middle somewhere (it's ridiculous to go all the way to the extremes), but that doesn't mean it's the only correct moral choice.
Then I tried it on Day 7 (adjusting the prompt slightly and letting it just use Code Interpreter on its own). It figured out what it was doing wrong on Part 1 and got it on the second try. Then it did proceed to try a bunch of different things (including some diagnostic output!) and spin and fail on Part 2 without ever finding its bug. Still, this is better than your result, and the things it was trying sure look like "debugging" to me. More evidence that it could do better with different prompting and the right environment.
EDIT: Heh, I added a bit more to the transcript, prodding ChatGPT to see if we could debug together. It produced some test cases to try, but failed pretty hilariously at analyzing the test cases manually. It weakens my argument a bit, but it's interesting enough to include anyway.
So, I gave this a bit of a try myself on Day 3, which ChatGPT failed in your test and on YouTube. While I appreciate that you framed this as a scientific experiment with unvarying prompts and strict objective rules, you're handicapping it compared to a human, who has more freedom to play around. Given that, I think your conclusion that it can't debug is a bit too strong.
I wanted to give it more of the flexibility of a human programmer solving AoC, so I made it clear up front that it should brainstorm (I used the magic "think step by step" phrase) and iterate, only using me to try to submit solutions to the site. Then I followed its instructions as it tried to solve the tasks. This is subjective and still pretty awkward, and there was confusion over whether it or I should be running the code; I'm sure there's a better way to give it the proper AoC solving experience. But it was good enough for one test. :) I'd call it a partial success: it thought through possible issues and figured out the two things it was doing wrong on Day 3 Part 1, and got the correct answer on the third try (and then got Part 2 with no issues). The failure, though, is that it never seemed to realize it could use the example in the problem statement to help debug its solution (and I didn't tell it).
Anyway, the transcript's here, if you want to see ChatGPT4 troubleshooting its solution. It didn't use debug output, but it did "think" (whatever that means) about possible mistakes it might have made and alter its code to fix those mistakes, eventually getting it right. That sure seems like debugging to me.
Remember, it's actually kind of difficult to pin down GPT4's capabilities. There are two reasons it might not be using debug output like you want: a) it's incapable, or b) you're not prompting it right. LLMs are strange, fickle beasts.
Interesting. I admit ignorance here - I just assumed any UK-based newspaper would be very far to the left. (The video itself still seemed pretty biased to me.) Thanks for the correction.
I'm in the same position; but I suspect I'll end up giving WSL a try instead. (I've used Cygwin for decades.)
In fact, one line of argument for theism is that math is unreasonably useful here.
Um, what? It really is "heads I win, tails you lose" with theism, isn't it? I guarantee no ancient theologian was saying "I sure hope that all of Creation, including our own biology and brains, turns out to be describable by simple mathematical rules; that would REALLY cement my belief in God, unlike all this ineffability nonsense."
Absolutely. And I'm totally being a pedant about a policy I'm in complete agreement with. But this nitpicking is still valuable - if we as a society understand that we're banning torture for very good ideological reasons, then we won't be so tempted to backslide the next time a crisis (like 9/11) arises and people start noticing that (arguably) torture might help us track down more terrorists. Like how some people forget that free speech ideals are important beyond simply making sure that we don't violate the 1st amendment.
Well, yeah, I don't disagree with any of it either, so I don't really see what your point is.
But ... if you agree there are scenarios where you'd never get a particular piece of information without torture, then I don't understand how you can claim it's "inherently useless"...? I'm confused what we're even arguing about now.
Why should they notice? Institutions do immoral and ineffective things literally all the time, for centuries on end. And we're talking about the CIA, the kings of spending money on absolute bullshit that just sounds cool to some dudes in a room, and that's saying something given the competition for that title in USG.
A fair point! I'm never going to argue with "government is incompetent" being an answer. :) But still, agencies using it is evidence that points in the direction of torture being useful - incompetence is just a (very plausible) explanation for why that evidence isn't conclusive.
Me personally? Yes, for all the things you listed. But is that really all that surprising? We're on The Motte. The only one you listed that people here would really find controversial is CP, and while I (of course) agree that creating real CP should be illegal, sharing virtual/generated CP harms nobody and should be allowed. (This is basically the situation we're already in with hentai, which is full of hand-drawn underage porn.)
But if you want issues that do challenge my stance, I'd suggest revenge porn, doxxing or the Right To Be Forgotten. So, you're right that my "free speech maximalism" only goes so far; there's always something in this complex world that doesn't have an easy answer.