ChickenOverlord
User ID: 218
Gonna cost you a foreskin as down payment
Claude Sonnet 4.6 is the latest model though
Sonnet has thinking
A team member did a full matrix test on models implementing solutions to multiple problems and then evaluated all implementations with said models. In the experiment, 5.4 was the undefeated and universal victor: 5.4 and 4.6 always preferred 5.4’s solutions.
Did he evaluate any of them himself?
I'm guessing it's a glitch, I was able to complete all 7 levels
I used the Cline plugin for JetBrains Rider, told it the path to the class, and asked it to extract out the duplicated logic throughout the class. So I could try again tomorrow with what you suggest (if that's even possible via the Cline plugin with Sonnet) and see if I get better results.
But my favorite part was when the AI assured me that its changes would do what I asked for without breaking existing functionality.
That said, the best and worst part if this is that it is pushing my incompetent Indian coworkers to use VS Code instead of VS because there is no Cline support in full fat VS and Cline is what the higher ups are mandating (I have a personal JetBrains subscription so that's what I'm using). .NET support in VS Code, especially for the many .NET 4.x projects we have kicking around (especially our WCF trash fires), is very lacking so I foresee my Indian coworkers' already pitiful productivity plummeting even further. I also forsee many requests for assistance coming to me and my Russian coworker (unsurprisingly he's the only other competent dev I work with regularly).
I work with a ton of Indians at my day job (about 80% in Mumbai, 20% in the US) and I make a point of looking people's surnames up since it's a decent (if imperfect) indicator of caste. The more competent of my coworkers (though this is an obscenely low bar, they're almost all awful) definitely tend to have last names correlating with higher castes.
Sonnet 4.6 is actually newer than the latest version of Opus, it came out a week or two later. So no, I didn't contradict myself. And Sonnet's training cutoff is about 6 months later than Opus's.
For the first puzzle, you need to change the symbol to match the one at the "exit" of the level, then move to the exit. The first level only has a way to change the rotation, later levels let you change the color and the symbol itself.
actually using AI detectors in reverse to make sure that PRs are sufficiently crammed full of AI code
Holy hell, hoping my employer doesn't try this. They're already tracking our AI usage pretty thoroughly.
Many of the puzzles have a limited number of steps in which they can be completed. The puzzle it loads by default, for example, has an energy bar that is depleted with each move you make. You have to be able to actually reason about the rules of the game and the objects within the levels to complete them at all; you only have maybe a 10% energy buffer for mistakes and/or for choosing a less efficient route or method to solve them.
Edit: Also, "humans" means that at least 6 out of 10 volunteers must have been able to solve a puzzle for it to be included, not 10 out of 10. Thus the test, even ignoring the previous caveat, is comparing the AI not against humans in general but against at least the 60th percentile of humans (see the bottom of page 15).
Correct, that's why I suggested people in the 100-110 IQ range (roughly) or higher would likely be able to solve them.
All good, it's not worth trying to keep up with the latest skibidi Ohio rizz grimace shakes.
I haven't posted too much about AI on here, largely because my own personal experiences with using it have been boring and underwhelming. Generating offensive memes (9/11 gender reveal, racial stereotypes, etc.) is my most positive interaction with AI. And partly because I find the pro-AI "AGI is just around the corner bro!" crowd obnoxious as hell, and I find that most discussions about it depend on accepting certain massive assumptions about what we actually do (and don't) know about the nature of intelligence, consciousness, the human brain, etc. For the purposes of making my biases clear up front: Personally, I'm religious and believe in the existence of a human spirit/soul, so I'm already strongly biased against claims that consciousness is an emergent property of sufficiently advanced systems or any arguments along those lines.
Regardless, a few developments have happened recently that have motivated me enough to actually make a top-level post about this. The first is my (employer-mandated) use of Claude to generate code. "You're not using the latest model, just one more model and we'll reach AGI"-bros officially in shambles after this one. I have an HTTP API client library I wrote a few years ago for interacting with a 3rd-party API. There's a good amount of duplicate logic throughout for things like setting up and making the requests, caching, etc. I asked Claude to look over the code and extract the duplicate logic into a single implementation. Here's how it messed up just the authentication part of it:
- It didn't notice that there are two different ways to authenticate
- It didn't notice that one of those two methods requires two separate calls to the API
- It didn't notice the calls for refreshing the auth tokens
- It didn't notice the caching logic for the tokens, so it would have authenticated every time, meaning we would have hit rate limits on the API super fast
This was with the latest version of Claude Sonnet. We don't have access to the latest version of Opus, but I'm sure an AI-bro on here would insist that Opus would totally get this right. Regardless, it failed spectacularly at what would be an easy (but tedious) task for a mid-level developer and above (or a sufficiently talented junior).
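For anyone wondering what "the authentication part" actually involves, here's a rough sketch of the behavior a correct refactor would have had to preserve: two login methods (one single call, one two-step), token refresh, and token caching. This is not the original client library. Every name here (`Token`, `TokenCache`, `FakeApi`, `login_with_key`, `request_code`, `exchange_code`) is hypothetical, the 3rd-party API is faked in memory, and the one-hour token lifetime and 30-second refresh margin are made-up numbers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    refresh_token: Optional[str]
    expires_at: float  # absolute time, measured on the same clock TokenCache uses


class TokenCache:
    """Reuse a cached token, refresh it shortly before expiry, and only do a
    full login when there is no token yet or no refresh token to use."""

    def __init__(self, authenticate: Callable[[], Token],
                 refresh: Callable[[str], Token],
                 clock: Callable[[], float] = time.monotonic,
                 skew: float = 30.0) -> None:
        self._authenticate = authenticate  # either login flow (one call or two)
        self._refresh = refresh
        self._clock = clock
        self._skew = skew  # seconds before expiry at which we refresh
        self._token: Optional[Token] = None

    def access_token(self) -> str:
        tok = self._token
        if tok is not None and tok.expires_at - self._skew > self._clock():
            return tok.access_token  # cache hit: no API call
        if tok is not None and tok.refresh_token:
            self._token = self._refresh(tok.refresh_token)
        else:
            self._token = self._authenticate()
        return self._token.access_token


class FakeApi:
    """Hypothetical in-memory stand-in for the 3rd-party API; counts calls."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.calls = 0
        self._clock = clock
        self._issued = 0

    def _issue(self, lifetime: float = 3600.0) -> Token:
        self._issued += 1
        n = self._issued
        return Token(f"access-{n}", f"refresh-{n}", self._clock() + lifetime)

    def login_with_key(self, key: str) -> Token:  # auth method 1: one call
        self.calls += 1
        return self._issue()

    def request_code(self, user: str) -> str:  # auth method 2, first call
        self.calls += 1
        return f"code-for-{user}"

    def exchange_code(self, code: str) -> Token:  # auth method 2, second call
        self.calls += 1
        return self._issue()

    def refresh(self, refresh_token: str) -> Token:
        self.calls += 1
        return self._issue()


now = [0.0]  # fake clock so token expiry can be simulated


def clock() -> float:
    return now[0]


api = FakeApi(clock)
cache = TokenCache(lambda: api.exchange_code(api.request_code("svc-user")),
                   api.refresh, clock=clock)
for _ in range(100):
    cache.access_token()
print(api.calls)  # 2: the two-step login ran once
now[0] = 3590.0  # inside the 30 s refresh window of the 3600 s token
cache.access_token()
print(api.calls)  # 3: one refresh call, no new login
```

The cache check at the top of `access_token()` is the part that matters for rate limits: 100 requests cost two API calls (one two-step login) instead of 200.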
The second happening is the ARC Prize people releasing version 3 of their AGI test suite, a series of puzzle games. They released it within a few hours of Jensen Huang saying he thinks the latest and greatest models are capable of AGI. Humans were capable of solving 100% of the puzzles. The highest-scoring AI couldn't complete more than 0.5%.
I'm willing to bet future models will do at least somewhat better on this, but only because I'm maximally cynical and I fully expect these puzzles to be included in the training set for future SOTA models.
I tried several of the puzzles myself, and none of them are terribly difficult. I'd estimate that anyone in the 100-110 IQ range or higher would be able to solve most or all of them. This development has further reinforced my belief that LLMs are basically just really advanced statistical regression models on crack, but nothing approaching what we would consider actual intelligence or conscious thought (and this is before we get into Chinese Room style criticisms of them).
In any case, I'm curious to see what you all think of these, even the AI-bros I've been speaking about condescendingly throughout this post. If anything, I'm most curious about the AI-bros' responses; I'd love to hear your thoughts.
Here are the AGI puzzles for anyone interested in trying them out: https://arcprize.org/arc-agi/3
If anything, the most "anti-Indian" poster we ever had on here (though I haven't seen him here for a few months) was that actually Indian casteist guy who seemed to think that most of modern India's problems could be fixed by getting rid of special treatment for the lower castes and making Brahmins truly the top dogs again. I think his name was something like MrVanillaSky? It was certainly an interesting perspective to see. But I saw him get modded a couple of times; he definitely didn't seem to get special treatment.
And when everyone dogpiled on self_made_human for using AI to slopify his posts I didn't see any mods rushing to rescue him even though he's both Indian and a mod himself.
"Short king" is modern slang for short guys and doesn't refer to actual monarchs.
I love run-on sentences with tons of parentheticals, asides, etc., and I've found that that makes my writing less likely to get pegged as AI slop. So join the bad grammar gang (or maybe start writing like Cormac McCarthy) and you won't have to worry about it.
I've never understood this, updoots and downdoots have always been retarded to me, but maybe that's my 4chan background talking. If anything, downvotes are a sign I pissed off the right people, but w/e.
I mean, Cuba has had decades to wreck itself before the US decided to embargo anyone trading with them just this year.
their crime rates are not actually higher than rural southern whites
Citation very much needed
I'm very much a member of group 4, and have stated my views in this regard multiple times here. If Israel decided to turn Gaza into a parking lot tomorrow, my feelings on the matter would be something to the effect of "It's too bad they couldn't work it out peacefully. Oh well, not my problem." I'm even fine with selling Israel the weapons to do the parking lot making with; I'm just tired of them getting them for free with my tax money.
Sure, but there were alternatives to Apartheid other than giving the black population full rights immediately. Rhodesia had educational and other requirements for voting rights, but no racial limitations, and didn't create special representatives in parliament for the black population until Britain pressured them into doing so as a precondition for independence.
The original proposal for Apartheid in South Africa involved giving black populations their own countries (on admittedly shitty land) and allowing them to come to South Africa on worker's permits/visas with limited rights. The first part of the program was never implemented, only the second half, making black South Africans effectively second class citizens in their own country.
Given that Gazans and West Bankers have their own countries (issues with international recognition, Jewish settlers, and Israeli military occupation aside), the comparison between the status of Palestinians and the status of black South Africans has always struck me as extremely disingenuous, especially when you consider the large number of Arab Israelis who have full rights under the law in Israel.
Taiwan and mainland China are the closest comparison I can think of, and they aren't a close comparison at all.
Edit: This wasn't intended to be an offer to DM it to everyone who asks.
DM me please too