SnapDragon

1 follower · follows 0 users
joined 2022 October 10 20:44:11 UTC
Verified Email
User ID: 1550


But man, it turns out somebody still has to do the hard work of keeping civilization turning so we can keep the lights on until we can finish the silicon god (or the false idol). Those data centers and nuclear plants won't build themselves. Yet.

I have to admit, I have no idea how society is still supporting itself right now. Almost everyone I know is intentionally not working, including myself (got burned out and retired 6 months ago). When I can make crazy amounts of money in tech, then sit back for the rest of my life and have people deliver DoorDash meals to me ... well, who's doing the actual work holding everything up? Some dedicated cadre of 10x engineers?

Sure wish I had the social skills to implement your gratuitous-sex suggestion, though. Not that I think we're in a true singularity - I expect the world to look pretty different in 10 years, but not an unrecognizable ASI dys/utopia. So maybe put the heroin away for now...

I mean, yeah, he's failing (badly) at using a tool that the rest of us are successfully using. And he thinks that this is some sort of flex. It's not worth engaging with him. We're going to keep getting better at using AI, and the serious programmers who aren't just trying to pwn "AI bros" will figure out its advantages and disadvantages, and successfully integrate it with their work. (Or get replaced, if line keeps going up, but the jury's still out on that.)

There's a fair bit of fudging, to be sure. Things like running your own model with different prompts or parameters, or just training to the test. But there's a limit. Since the actual ARC-AGI-3 test is not published, the only way companies could really "cheat" would be to sniff the data that's being fed into the models by the testers. While technically possible, that's pretty much Theranos-level fraud; I don't really suspect any AI company of doing this.

EDIT: Oops, I should have clicked the link. The 36% result was on the publicly available data, so it's not really an "official" result. For the reasons @sarker said, I still think it's fine, but it's not quite as bulletproof as I thought.

You're assuming he wants a good picture of the capabilities of AI agents. I get the strong impression from the sneering tone of the original post that he wanted to do just enough with a model that he could claim to have pwned the "AI-bros".

Indeed, there's almost nothing scientific about the scoring system of ARC-AGI-3; the test itself is kinda neat, and still highlights something that smart humans do (somewhat) better than the best LLMs, but it's dropped any pretense of being an actual measure of "general intelligence", and frankly they deserve to be ridiculed for the sensationalist scores.

Why is completion speed the main factor? Why is the difference squared? Speedruns are not how we define intelligence. If the squirrel in your backyard can solve sudokus, but a top-10th-percentile-of-self-selected-sudoku-solvers human can do it faster, you don't laugh and say "ha ha, this squirrel is so dumb". Also note that the test cuts the model off if it takes 5x longer than the smart human, and later questions build on earlier ones, so if a model goes slowly once, it's handicapped for the rest of the test. (Again, this is probably completely intentional, to help deflate scores further.) They used a majority-of-self-selected-humans-can-solve-this metric for puzzle inclusion but not for the scoring. Why? Pure showmanship.
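
To make the complaint concrete, here's a toy reconstruction of the scoring scheme as I've described it above (speed as the main factor, the ratio squared, a hard 5x cutoff). This is purely illustrative; it is NOT the official ARC-AGI-3 formula, and the function name and exact shape are my own invention:

```python
def speed_score(agent_time: float, human_time: float) -> float:
    """Toy speed-weighted score: slower-than-human performance is
    penalized by the SQUARED time ratio, and anything past 5x the
    human baseline scores zero outright."""
    if agent_time >= 5 * human_time:
        return 0.0  # hard cutoff: treated as a total failure
    ratio = human_time / agent_time
    # squaring means even modest slowness craters the score
    return min(1.0, ratio) ** 2

print(speed_score(60, 60))    # matches human pace -> 1.0
print(speed_score(120, 60))   # merely 2x slower -> 0.25
print(speed_score(400, 60))   # past the 5x cutoff -> 0.0
```

Under a scheme like this, a model that solves every puzzle correctly but at half human speed loses three quarters of its score, which is exactly why headline numbers like "0.5%" say more about the scoring design than about capability.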

I suspect that average humans who take the test would probably also get a very low score! The old tests and metrics (including ARC-AGI-2) were useful because they showed something that humans genuinely find easy, but LLMs fail at. Those metrics have almost reached saturation, so I guess now we're switching to puzzles that some humans can solve but LLMs ... uh ... solve a bit slower. Ok?

But hey, the "0.5%" number does help low-information AI skeptics like OP point and laugh, so it's another "win" for AI journalism.

You're not wrong. I just don't think there are many good reasons to truly hate Israel, as it's a democratic country that generally respects human rights, in direct contrast to all of its hostile neighbours. (Like Bill Maher says, "one side is accused of genocide but doesn't do it, the other side actually would love to do it.") You can disagree with its politics, and hold it accountable when it crosses over the line (which it certainly has done occasionally - being constantly at war sucks). But I don't think the anti-Israel posters here are capable of that level of restraint - I've just seen too many barely-filtered rants about how the US is being controlled by their evil Israeli mind-control overlords.

I have my doubts, but you make a good point. A lot of the other emergent capabilities have been quite surprising, so there's no guarantee that this is out of the question, either.

I don't think LLMs can generate meaningful human-like feedback on what it feels like to use the software. They just don't see the UI the way humans do. And it's not clear that increasing their capabilities will ever fix this.

Still, I do expect that they'll get better and better at iterating quickly and nondestructively based on your feedback, so while it won't be a fully automated dev cycle, I wouldn't be surprised if bespoke AI software replaces giant professional products eventually.

The FDA, obviously...?

Uh, ok, as long as you weren't unlucky enough to be outside the country, in which case you were subject to a mandatory 14-day quarantine on return. And the border was closed to non-citizens for 20 months. Australia overreacted more than almost any other country in the world.

Regarding challenge trials, 1Day Sooner came into being as a result of our clear failure here. COVID was a ridiculously good candidate for challenge trials: a disease that spreads quickly, so every day matters, and which is dangerous to one segment of the population but relatively harmless to everyone else. Our global failure here doesn't speak well for our prospects if a genuinely dangerous plague comes along. (Imagine if the disease had a 30% fatality rate to everyone. Challenge trials would be even more important, and a lot harder to justify ethically.)

I guess the most optimistic take is that if a real threat to society comes along (i.e. a plague which doesn't mostly just replace the "cause of death" for unhealthy seniors), we might actually be spurred to take appropriate measures. It's "only" the threat of creeping totalitarianism which we utterly failed at, enthusiastically cheering on lockdowns and unpersoning anybody who said "uh, wait a minute".

I guess it's entirely possible that was part of the reason! I actually kinda liked Paimon's old voice, but I know I'm in the minority. The new actress is doing quite a good job - the voice isn't too different, but definitely less grating.

Can't speak for the public in general, but that is absolutely why I got into Genshin Impact. (Which is Chinese, so technically not anime, but it's absolutely free of woke BS and performative virtue signaling.) It's so refreshing playing a game (or watching a show or movie) where I don't immediately know who the bad guys are because they're white and male. And where girls are allowed to look sexy, and heterosexual relationships are allowed to exist.

Interestingly, the English voice actress for Paimon (who is the most important character in the game, basically voicing 50% of the lines) actually was a woke lunatic, complete with performative "neurodivergence", an online persecution complex, and claims of being "non-binary". Finally HoYoverse had enough, and 4.5 years into the game's release, they actually replaced her with a proper professional actress. I couldn't imagine a Western studio doing that - if anything, they'd applaud her "bravery" and try to get her on staff permanently to fill out their quotas.

Yeah, we have the right to a trial for a reason. It's kind of stupid to cancel people in the court of public opinion for dubious "unreported" crimes that are decades old. But hey, at least Chavez is dead and doesn't care any more, unlike when that happened to Kavanaugh.

I think there are likely a bunch of us that are just casually in favour or on the fence. (While I'm worried about the fallout, I am always going to tilt in the direction of good old Team America deposing dictators.) Probably not too many people who are rabidly gung-ho about the whole thing and willing to argue it extensively, so they're going to lose out in wordcount to our local antisemites who will take any excuse to post multipage slop essays about the joooooooz. It's a shame, but the ideals of free speech do require a little bit of sacrifice.

Sadly, I know most of my online acquaintances would burn me in effigy if I was ever honest about my trans thoughtcrimes. But maybe I'm overestimating how common site-wide permabans are on Reddit.

My account's still going at just shy of 15 years. The only place I'd ever expressed any political views were the SSC and Motte reddits. Anywhere else, exposing my preference to believe things that are true rather than politically convenient would have definitely gotten me banned.

That said, I don't think kids should be given unfettered Internet access. I know what can happen: I was there, and the Internet was in many ways a less scary place back then.

Hmm, I'd argue the opposite. Sure, there's more bad stuff out there, but there's more of ANY stuff out there, and the bad stuff is a much smaller percentage, guarded by things like "safe search" and browser/site warnings, so it's harder to stumble across inadvertently. In the old days it was trivial to just get trolled by somebody and end up at goatse or lemonparty or The Anarchist Cookbook, and that was just the common stuff. I stumbled across hentai """porn""" that I'd shudder to even describe - I honestly don't think I would even know how to find stuff that fucked up nowadays. It might not even exist outside of a .onion link.

Point taken. We just need to hurry up and invent that Epstein Drive!

Cool, I learned something from this. I didn't realize nuclear rockets couldn't be used for the early stages. Thanks. I think you're wrong about them being the most efficient engines extant today - ion engines still have much higher specific impulse, but are only viable in space. And you're still sidestepping the point that upper-stage nuclear rockets (the original topic) and large nuclear payloads are completely separate issues.
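
The specific-impulse point translates directly into delta-v via the Tsiolkovsky rocket equation. A quick sketch using commonly cited ballpark Isp figures (roughly 450 s for hydrolox chemical, 850 s for NERVA-class nuclear thermal, 3000 s for gridded ion engines - approximations, not exact engine specs):

```python
import math

G0 = 9.80665  # standard gravity, m/s^2

def delta_v(isp_s: float, mass_ratio: float) -> float:
    """Tsiolkovsky rocket equation: dv = Isp * g0 * ln(m0/m1)."""
    return isp_s * G0 * math.log(mass_ratio)

# Rough, commonly cited vacuum specific impulses:
engines = {
    "chemical (hydrolox)": 450,
    "nuclear thermal (NERVA-class)": 850,
    "ion (gridded electrostatic)": 3000,  # huge Isp, but milli-newton thrust
}

for name, isp in engines.items():
    print(f"{name}: dv = {delta_v(isp, 5.0) / 1000:.1f} km/s at mass ratio 5")
```

At the same mass ratio, the ion engine gets several times the delta-v of the nuclear thermal rocket - which is why "most efficient" belongs to ion propulsion, even though its tiny thrust makes it useless for launch or any high-thrust burn.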

Hmm, I think you're talking about two different things. One is the launch, from Earth, of a nuclear-powered rocket (e.g. NERVA). Even if it contains hundreds of kilos of uranium, it's a lot fairer to compare that to an A-bomb like Little Boy (64kg) rather than just the primary of an H-bomb. And, like you said, in an accident a lot less of it is going to vaporize than it would in a proper nuclear bomb.

But I wasn't talking about the payload at all. I guess you're thinking that you'd want to lift 100 tons of U-235 to orbit for space-based nuclear rockets? I agree that's a different kind of risk. And I'm not even sure how valuable nuclear rockets would be for long space trips (there are lots of options once you're up there).

Fissile material is extremely valuable per unit mass, and we're never going to run out of it on Earth, so you wouldn't save much by farming it out of the gravity well. What WOULD be valuable is mining and smelting a large amount of metal or rock that you can use to build large structures in space.

Er, yes, that's what "fallout" means. You missed @RandomRanger's point. One rocket's worth of nuclear material in the atmosphere is barely a blip. Note that even a normal rocket is chock-full of toxic chemicals, which is why we don't launch near population centers. Most normies tend to be off by many orders of magnitude when they intuit how dangerous "nucular" things are.

Hey, I'm not a biologist, and you might be right (...although I don't know why you listed "process" and "skillset" as not being knowledge-based?). But are you willing to bet civilization on it? The stakes are pretty high here, so I think it's fair to raise the burden of proof that "this is actually hard" beyond the normal level of an Internet argument.

Note that entire nations have tried and failed to create nuclear weapons for 80 years, which is good evidence that it's genuinely hard. Meanwhile, it's conceivable (if not proven) that a worldwide pandemic spread inadvertently from a small biolab in Wuhan. The two levels of effort are orders of magnitude apart.

Good lord! Their devious perfidy goes back all the way to moving "Zeta" from the 6th position of the Greek alphabet to the last of the Latin! Now THAT'S true control of the media!!