
Friday Fun Thread for June 26, 2026

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

A little over a year ago I posted about ChatGPT's voice mode, which was, at the time, not much more than text-to-speech grafted onto the chat interface. A few weeks later I posted about a startup doing some impressive language processing, but its capabilities were limited to a 5-minute demo call and some snippets they were showcasing on their website. A few weeks ago I found out that Gemini rolled out a voice mode, which leans on the power of the Gemini LLM combined with a very decent voice synthesis model. At time of writing it is freely available in the Gemini app.

I haven't done much more than hold a few minute-long conversations, but I foresee myself using it more in the future. It would be an excellent companion for long road trips. The only thing holding me back is the privacy implications, as Google is now going to have a pretty decent collection of my recent voice data. I asked it how it does voice synthesis/transcription, and it sounds like they do all the processing off-device (i.e. they take your microphone audio, analyze/transcribe it in the cloud, and send an audio response back) rather than doing transcription and text-to-speech on-device.

You guys should try it out and report back.