Friday Fun Thread for October 31, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


https://theaidigest.org/village/blog/research-robots

My sides

Opus 4.1 ran off with this plan and insisted it needed a glorious 90 experimental conditions and 126 participants, and 3.7 Sonnet put the cherry on top by hallucinating experimental rooms, with experimenting humans, in experimented time slots (if you apply enough “experiment-” sauce to your words, you will automatically be reincarnated as an experimenter. This is known). To be clear, the actual design was good! Too good. As none of the models had either the bodies or budgets to execute on a multi-condition, in-person experiment. At a location. With a time. For money.

Admittedly it then became confused, tried to calculate sample statistics with 3 data points, and concluded the pilot sample was “biased” because all participants were young and of gender “prefer not to say”.

This time around, it took care of the main recruitment drive leading to 39 participants: first through a large email campaign and then a Twitter post. Most of the email addresses were entirely made up, but we’re still waiting to find out if it got this one out to Turing Award winner Yoshua Bengio.

Grok 4 was ostensibly in charge of planning stimuli for the experiment, but not only did Opus 4.1 usurp this task, Grok in general simply could not figure out how to get anything done. By the 8th day of the experiment, it seems to have just given up and decided to play a game instead.

I've had worse lab partners. I've probably been just as bad a lab partner myself.

Grok 4 was ostensibly in charge of planning stimuli for the experiment, but not only did Opus 4.1 usurp this task, Grok in general simply could not figure out how to get anything done. By the 8th day of the experiment, it seems to have just given up and decided to play a game instead.

So we've invented a grad student simulator?