Culture War Roundup for the week of June 26, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


VisionOS and the Future of Input



Ever since the computer first arrived, the keyboard and mouse have been the standard. You have a flat surface with raised little squares that you smack with your fingers, and another little rounded shape with a flat bottom that you move around and click with.

This awkward, clunky interface has significant culture war elements, in that it helped an entire class of powerful people arise - specifically people who didn't have traditional status markers like height, strength, or an indomitable physical presence. Instead, these 'nerds' or 'geeks' or whatever you want to call them specialized in the digital realm. Now the Zuckerbergs and Musks of the prior generation rule the world. Or if they don't, they soon will.

These outdated interfaces seem perfectly normal to everyone who has only ever used them. Sure, many people have used a controller for video games, and may think that controllers are superior in some cases but not others. Still, most people surely imagine that keyboard and mouse is the only way to operate a computer.

That being said, it's actually quite easy to dip your toes into alternate input methods. Talon is a system that utilizes voice to let you do practically anything on a computer. You can move the mouse, click on any object on your screen, dictate, edit text, and you can even code quite well. Talon's system even supports mapping operations, sometimes very complex ones, to custom noises you record on your own.
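The mapping idea can be sketched as a simple dispatch table. To be clear, this is an illustration of the concept only, not Talon's actual API - the command phrases, the noise label, and the key chords below are all made up for the example:

```python
# Sketch of voice-driven command mapping: recognized phrases (or custom
# noises, identified by a label) dispatch to actions via a lookup table.
# Illustrative only - not Talon's real API.

def press(keys: str) -> str:
    """Stand-in for a key-press action; returns the chord it would send."""
    return f"<key:{keys}>"

# Map spoken phrases and noise labels to actions.
COMMANDS = {
    "save it": lambda: press("ctrl-s"),
    "new tab": lambda: press("ctrl-t"),
    "pop": lambda: press("enter"),  # e.g. a short custom noise mapped to Enter
}

def handle(phrase: str) -> str:
    """Dispatch a recognized phrase; unknown input falls back to dictation."""
    action = COMMANDS.get(phrase.strip().lower())
    return action() if action else f"<dictate:{phrase}>"

print(handle("save it"))       # <key:ctrl-s>
print(handle("hello world"))   # <dictate:hello world>
```

The point of the table is that anything - a phrase, a grunt, eventually a finger twitch - can sit on the left-hand side, while arbitrarily complex operations sit on the right.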

On top of that, you can integrate eye tracking using a relatively inexpensive device. If you've ever used voice control combined with eye tracking, you know you can operate about as fast as someone who is decent with a keyboard and mouse.

If you have ever used these systems, you probably know that because most digital setups are built for keyboard and mouse, the experience isn't perfect. Keyboard and mouse still hold the crown.

But. There is a certain magic to controlling a computer through your voice, or your eyes. It begins to open your mind to new possibilities, the idea that there are better, faster, easier, more natural ways of interfacing with a computer than the defaults we have been stuck with.



Enter Apple's VisionOS.

If you haven't seen the recent demo of Apple's new VisionOS, it breaks brand new ground. The entire OS is built around looking at things and making minute hand motions to control the icons you're looking at. There are no controllers, no physical interfaces whatsoever besides your eyes and your hands. It's breathtaking to watch.

In a review from John Gruber, a well-respected Apple commentator and the creator of Markdown, the possibilities behind this new technology are apparent. Gruber writes:

First: the overall technology is extraordinary, and far better than I expected. And like my friend and Dithering co-host Ben Thompson, my expectations were high. Apple exceeded them. Vision Pro and VisionOS feel like they’ve been pulled forward in time from the future. I haven’t had that feeling about a new product since the original iPhone in 2007. There are several aspects of the experience that felt impossible.

Now, Apple does tend to get a ton of hype, but this amazement at the experience is surprisingly common among early reviewers:

Similarly, Apple’s ability to do mixed reality is seriously impressive. At one point in a full VR Avatar demo I raised my hands to gesture at something, and the headset automatically detected my hands and overlaid them on the screen, then noticed I was talking to someone and had them appear as well. Reader, I gasped.

The implications of this 'spatial operating system' are many and varied, of course. There will be all sorts of productivity gains, new ways of interacting with the digital world, and fun new apps. However, I'm most interested in how this innovation could shift the balance of power back to the strong and physically capable, away from the nerds.

No longer will clunky interfaces make sense - instead, computers will be optimized around healthy, fully functional humans. Ideally, the most intuitive and common control schemes will reward physical fitness and coordination - traits that nerds conspicuously lack.

Will we see a reversal of the popularity that being a nerd or geek has gained in the past few decades? Only time will tell.

I sit on the couch. There's a glass of tea (yes, a glass) to my left; hopefully I won't hit it with my elbow and send it onto the stone floor again. Not wearing a blindfold certainly helps in this regard. I put the laptop on, well, my lap, open The Motte, scroll to the end of your post, think a second, and click «reply».

What, if anything, in all of this could have been improved by Vision Pro? Adding a dancing Mickey Mouse (partnership with Disney, wooo!) to the periphery? Fitting the website into a circular window superimposed on the room? Strapping the same laptop's motherboard to my forehead? Replacing touch typing with tiny finger gestures that are picked up by the IR sensor array under my nose?

Actually, there are some ideas here. I expect great things to come of augmented reality. I envision a future of uncompromising transparency and sovereignty, with tastefully minimal HUDs and AI digital assistants that stay well out of the way while brutally suppressing incoming noise; athletic young people with 20/20 vision and perfect innocence about «dark patterns» who walk in the sunlight and look with concern and pity at hunchbacked millennials and zoomers squinting into their pocket surveillance devices. This can be done. Contra the Strugatsky brothers, we don't need communism to get to see the brightest parts of the Noon.

But as roon convincingly argues, text is the universal interface, and it is primarily the inherent power of text, not technical limits of the age, that decided the shape of The Mother of All Demos and the hardware paradigm that we're still living in. Why do you think large language models get almost no benefit from multimodality? Because nothing has more meaningful dimensions than a text string. Tablets and smartphones, this great civilizational achievement of Apple, offer a strictly lesser channel than text – beloved by people who'd never have a clue what to do with a CLI. Now, I suppose there are designers and architects, and surgeons and such, and all those colorful applications from WWDC will truly shine. But… rotating a 3D model of a Zaha Hadid-esque building in a teleconference? Is this what the digital era is about? I guess PowerPoint will add some zany VR features soon and they'll be adored by the same type of person who inserted WordArt into business presentations in the 90s, but… really?

This reminds me again of that epiphany I had while watching Alita: Battle Angel, particularly the scene where Alita dodges an aesthetic chain attack (admittedly, under a certain influence that brings out visual elaboration): What waste! The CG artists could have gone so much wilder, added complex patterns of acceleration and inertia and homing; but viewers won't perceive such detail. We're long in the regime where our tools let us depict actions of posthumans, but our merely human brains make that power sterile.

It is a nontrivial undertaking to find a paradigm that in practice does better than an IDE, or a CLI, or even the humble chat window – when you're limited by the user on the other side. Almost everything of worth that we do is text and ways to manipulate and chain and condition its blocks on different scales. Skeuomorphic gimmicks, graphs, trees, mindmaps, desks with sticky notes, kanbans – frankly, all of it either collapses into an unwieldy mess while text keeps going, or is as close to vanilla text in spirit as makes no difference, and so doesn't benefit from new peripherals whatsoever. Many have tried. Yet here we still are. When some of us get Vision Pros and the ability to render arbitrary shapes, they'll still be peering into a rectangular website with an input box and a button to send a comment.

I hope people with better imaginations than mine will prove me wrong. I'm pretty fed up with our interfaces, as well as with the human condition in general. But gimmicks and fetishes, exciting and novel as they can be, are no more the answer than frivolous surgery. Another, genuinely superior way has to be found and explored.

Strapping the same laptop's motherboard to my forehead?

This would be amusing at least...

Replacing touch typing with tiny finger gestures that are picked up by the IR sensor array under my nose?

Yes, this is actually incredibly useful. For instance, even with a limited interface like Talon, I will map certain phrases or words I use frequently in my job to a keyboard shortcut or a noise. This mapping saves me probably ~5 minutes of work per day. Over time, if we can map more of these things to even more minute/simple actions, we are looking at serious efficiency gains.
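For a sense of scale, that ~5 minutes a day compounds. A rough back-of-the-envelope estimate, assuming roughly 250 workdays per year (an assumption, not a figure from the post):

```python
# Back-of-the-envelope: annual time saved from ~5 minutes/day of shortcuts.
minutes_saved_per_day = 5
workdays_per_year = 250  # assumption: typical full-time work year

hours_saved_per_year = minutes_saved_per_day * workdays_per_year / 60
print(f"{hours_saved_per_year:.1f} hours/year")  # 20.8 hours/year
```

Call it half a work week per year from one small optimization, before any of the "more minute/simple actions" land.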

When some of us get Vision Pros and the ability to render arbitrary shapes, they'll still be peering into a rectangular website with an input box and a button to send a comment.

I disagree here; it may be a while coming, but I do think we're in for a paradigm shift with regard to input.

Yes, this is actually incredibly useful. For instance, even with a limited interface like Talon, I will map certain phrases or words I use frequently in my job to a keyboard shortcut or a noise. This mapping saves me probably ~5 minutes of work per day. Over time, if we can map more of these things to even more minute/simple actions, we are looking at serious efficiency gains.

Not only is this something you can do right now on existing computers, it's much easier to do than with a noise/gesture system, where the need for disambiguation makes custom definitions a much harder proposition.

Unless you're the sort of person who already has a bunch of autohotkey scripts for those tasks set up, you sure as hell aren't going to do that in a worse interface.
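As a concrete illustration of "right now on existing computers," here is a minimal sketch of abbreviation expansion, the same trick AutoHotkey hotstrings provide. The abbreviation table is hypothetical, and a real setup would hook keyboard input system-wide rather than process a string:

```python
# Minimal sketch of text expansion: whole-word abbreviations expand to
# frequently used phrases, as AutoHotkey hotstrings do. The table is
# hypothetical; a real tool intercepts keystrokes system-wide.

EXPANSIONS = {
    "brb": "be right back",
    "afaict": "as far as I can tell",
}

def expand(text: str) -> str:
    """Replace each whole-word abbreviation in a typed string."""
    return " ".join(EXPANSIONS.get(word, word) for word in text.split(" "))

print(expand("afaict this works"))  # as far as I can tell this works
```

Typed triggers like these are unambiguous, which is exactly why custom definitions are an easier proposition here than with noise or gesture recognition.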