Culture War Roundup for the week of April 10, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

~576MP streaming at 30FPS with a FOV of 120 degrees

This is not quite right. Eyes have a huge overall FOV, but the actual resolution of vision is a function of proximity to foveation angle, and there's only maybe a 5° cone of high-resolution visual acuity with the kind of detail being described. Just taking the proposed 120° cone and reducing it to 5° is more than a 99% reduction in equivalent megapixels required. And the falloff of visual acuity into peripheral vision is substantial. My napkin math with a second-order polynomial reduction in resolution as a function of horizontal viewing angle puts the actual requirements for megapixel-equivalent human-like visual "resolution" at maybe a tenth of the number derived by Clark. None of that is really helpful to understanding how to design a camera that beats the human eye at self-driving vision tasks, though, because semiconductor process constraints make it extremely challenging to do anything other than homogeneously spaced CCDs anyway.
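For anyone who wants to check the "more than 99%" figure, here is a minimal sketch. It assumes both the 120° field and the 5° high-acuity region are circular cones with uniform pixel density, and it compares their solid angles. The function names are made up for illustration, and this is only the cone comparison, not the polynomial falloff model mentioned above.

```python
import math

def cone_solid_angle(full_angle_deg):
    """Solid angle in steradians of a circular cone with the given full apex angle."""
    half_angle = math.radians(full_angle_deg) / 2.0
    return 2.0 * math.pi * (1.0 - math.cos(half_angle))

def cone_fraction(narrow_deg, wide_deg):
    """Fraction of the wide cone's solid angle covered by the narrow cone."""
    return cone_solid_angle(narrow_deg) / cone_solid_angle(wide_deg)

# Assumption: uniform pixel density, so the pixel count scales with solid angle.
fraction = cone_fraction(5.0, 120.0)
reduction_pct = 100.0 * (1.0 - fraction)

print(f"5 degree cone covers {fraction:.4%} of the 120 degree cone")
print(f"equivalent pixel reduction: {reduction_pct:.1f}%")
```

Under those assumptions the 5° cone covers about 0.19% of the 120° cone, so a 576MP figure spread uniformly over 120° works out to roughly 1MP inside the high-acuity region.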

On top of that, the "30FPS" discussion is mostly misguided, and I don't actually see that number anywhere in the text; I only see a suggestion that as the eye traverses the visual field, the traversal motion (Microsaccades? Deep FOV scan? No further clarity provided) fills in additional visual details. This sounds sort of like taking multiple rapid-fire images and post-processing them together into a higher-resolution version, something commercial cell phone cameras have done for a decade now. This part could also be an allusion to the brain backfilling off-focus visual details from memory. It's unclear what was meant.

especially if you expect to catch up with the 14 stop DR, which might not even be possible with current sensors.

This is already a solved problem, and has been for at least five years. Note that in five years, we've added 20dB dynamic range, 30dB scene dynamic range, bumped up the resolution by >6x (technically more like 4x at same framerate, but 60FPS was overkill anyway), and all that in a module cost that I can't explicitly disclose but I can guarantee you handily beats any LIDAR pricing outside of Wei Wang's Back Alley Shenzhen Specials. And it could still come down by a factor of 2 in the next few years, provided there's enough volume!
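Since the quote talks in stops and the sensor numbers are in dB, a quick conversion may help. This is a rough sketch assuming the usual image-sensor convention that dynamic range in dB is 20·log10 of the signal ratio, with one stop being a factor of 2. The helper names are mine, not from any datasheet.

```python
import math

# One stop is a doubling of signal; under the 20*log10 convention that is about 6.02 dB.
DB_PER_STOP = 20.0 * math.log10(2.0)

def stops_to_db(stops):
    """Convert photographic stops to decibels (20*log10 convention)."""
    return stops * DB_PER_STOP

def db_to_stops(db):
    """Convert decibels (20*log10 convention) to photographic stops."""
    return db / DB_PER_STOP

print(f"14 stops is about {stops_to_db(14):.1f} dB")
print(f"20 dB is about {db_to_stops(20):.2f} stops")
```

By that convention 14 stops is about 84 dB, and a 20dB improvement is a bit over 3 stops.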

In any case, remember that the bet isn't beating the human eye at being a human eye, it's beating the human eye at being the cheap, ready-today vision apparatus for a vehicle. The whole exercise of comparing human eye performance to camera performance is, and has always been, an armchair philosopher debate. It turns out you don't need all the amazing features of the human visual system for the task of driving; they are sufficient but not necessary for a solution to the problem. You need a decent-performance, volume-scalable, low-cost imaging apparatus strapped to a massive amount of decent-performance, volume-scalable, low(ish)-cost post-processing hardware. It's a pretty safe bet that you can bring compute costs down over time, or increase your computational efficiency within the allocated budget over time. It's also a decent bet that the smartphone industry, with annual camera volumes in the hundreds of millions, is going to drive a lot of the camera manufacturing innovation you need, bringing the cost down to tens of dollars or better. Most image sensors already integrate as much of the DSP on-die as possible, in a bid to free up the post-processing hardware to do more useful stuff, and that approach has a lot of room to grow in the wake of advanced packaging and multi-die assembly innovations in the last ten years. All the same major advances could eventually arrive for LIDAR, but it certainly didn't look that way in 2012, and even now in 2023 it still costs me a thousand bucks to kit out an automotive LIDAR because of all the highly specialized electromechanical structures and mounting hardware, money I could be using to buy a half-dozen high-quality camera modules per car...

As far as reaction time goes, real-time image classification fell to sub-frame processing time years ago, thanks in part to some crazy chonker GPUs available in the last few years. There are a dozen schemes for doing this on video, many in real-time. The real trouble now is chasing down the infinitely long tail of ways for any piece of the automotive vision sensing and processing pipeline to get confused, and weighing the software development and control loop time cost of straying from the computational happy path to deal with whatever they find.

This is also why I think Tesla's software just sucks. It's not the camera hardware that's the problem any more, and the camera hardware is still getting better. There's just no way not to suck when the competition is literally a trillion-dollar gigalith of the AI industry that optimized for avoiding bad PR and waited an extra four years to release a San Francisco-only taxi service. Maybe if Google was willing to stomach a hundred angry hit pieces every time a Waymo ran into a wall with the word "tunnel" spray-painted on it, we'd have three million Waymos worldwide to usher in a driverless future early. I doubt Amazon has any such inhibitions, so I guess we'll find out soon just how much LIDAR helps cover for bad software.