
Small-Scale Question Sunday for November 20, 2022

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.


About the lifetime of BSODs: I think I've mentally resigned myself to suffering monthly strokes, because basically every PC I've ever owned has had them. They came from a range of manufacturers (Dell, Lenovo, HP, Asus), included both laptops and desktops, and ran various versions of Windows. I am a very respectable average user, I swear. I don't subject my machines to harsh physical conditions, never spill anything on them, don't live in filth where dust covers everything, don't live with electrical surges, don't have little cousins borrowing them, don't mine crypto, don't pirate or visit sketchy sites with viruses, don't open phishing emails, don't leave them on 24/7, don't unplug USB devices until I'm told it's safe to do so, update fairly frequently, etc. I buy the machines new, directly from manufacturers or from Amazon/Best Buy. Anyways, you get the point. And yet I've literally never owned a single fully stable machine.

There's probably some common factor, although we can only guess at what it is. Whenever I've seen a machine behave like that it's been some combination of

  • Installed in a shed with no climate control and free access to outside air.

  • Over a decade old (chips and capacitors do degrade).

  • Manufactured during the early 2000s "capacitor plague" (rumor says one capacitor maker tried to steal a formula from another and didn't get it quite right).

  • Fixed by spraying contact cleaner in the memory slots and re-seating.

  • Showing messages in the log that match a common complaint on the bugtracker for the Linux kernel or graphics driver, and the problem goes away when that bug is reported fixed.

  • My own damn fault for overclocking/undervolting something.

Things that might be different between us:

  • We have different electrical grids.

  • We have different levels of background radiation. (EPA says the gamma gross count rate in my location is ~3000/min.)

  • Almost none of my machines run Windows (only the one in the shed). But people on the internet say Windows BSoD-ing all the time is supposed to be a thing of the past.

  • All of my machines are either home-built or business grade.

  • I run one pass of memtest86 whenever I get a new machine or replace RAM. The only time this found something, though, was when I was buying dodgy RAM from eBay.

If your electrical supply is spotty, you might be able to fix it with an uninterruptible power supply that has the "AVR" (automatic voltage regulation) feature. Unfortunately they're kind of expensive and the batteries usually have to be replaced every few years.

I will share one suspicion I've had about the cause of the BSODs, in case it gives you any obvious clues as to the main culprit. I use a browser plugin called Video Speed Controller to speed up all kinds of media that are too slowly paced. I think my freezes have semi-frequently coincided with playing a video at higher speeds (say, maybe 2.5x or even 3x). Do you suspect that to be a RAM-related issue?

Playing back video at high speed is obviously a heavier load than 1x, but it could be any of CPU, RAM, power supply, or even the graphics card, assuming your browser uses hardware video decode (probably does).

The first thing you might try is to see if you can reliably reproduce the problem by cranking the video playback speed to the moon. I use a similar extension, "Enhancer for YouTube", which has no upper speed limit AFAICT. Use YouTube's "stats for nerds" to detect dropped frames, which mean you have reached the limits of your computer (or internet connection). This probably works best with a short video that you can re-play without having to re-download.

If you can reproduce the problem, you have a very good "my computer crashes when I do this" story to tell the warranty people.

If that didn't work, to try to differentiate between causes and maybe find a better reproducer, I would suggest...

First, install hwinfo64. This will show you a bunch of things, but the important ones are the Windows hardware error log counts and the CPU temperature and package power. Here's an example of it in use.

Then download prime95. Run the "small FFT" test for at least an hour. If your computer crashes, any of the threads crash, any of the self tests fail, or hwinfo shows any errors in the Windows log, it is probably a CPU or power supply problem. If the CPU package power is not near or above 125W while the all-thread test is running, and the CPU temperature is at or very close to 100°C, the CPU is throttling itself to keep from overheating, so it's a cooling problem (heatsink detached in shipping?). If "small FFT" doesn't find anything, you might try blend. Keep in mind "CPU problems" are likely to be "motherboard power delivery to the CPU" problems, so replacing the CPU might not fix it.
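To make the decision rule in the previous paragraph concrete, here is a rough sketch of it in Python. The 125W and 100°C figures come from the paragraph; the `StressRun` fields, the `triage` function, and the two margin constants (what counts as "near") are made up for illustration, and you would fill the fields in by hand from what prime95 and hwinfo show you.

```python
from dataclasses import dataclass

# Thresholds from the post above.
EXPECTED_PACKAGE_POWER_W = 125.0
THROTTLE_TEMP_C = 100.0
# Assumptions for illustration: how far off still counts as "near".
POWER_MARGIN_W = 10.0
TEMP_MARGIN_C = 3.0


@dataclass
class StressRun:
    """Observations from one prime95 "small FFT" run (hypothetical structure)."""
    crashed: bool           # the whole machine crashed or froze
    worker_failed: bool     # a prime95 thread crashed or a self test failed
    whea_errors: int        # Windows hardware error count shown by hwinfo
    package_power_w: float  # CPU package power during the all-thread test
    cpu_temp_c: float       # CPU temperature during the all-thread test


def triage(run: StressRun) -> str:
    """Return a rough guess at the culprit, following the rule in the post."""
    if run.crashed or run.worker_failed or run.whea_errors > 0:
        return "cpu-or-power-supply"
    low_power = run.package_power_w < EXPECTED_PACKAGE_POWER_W - POWER_MARGIN_W
    hot = run.cpu_temp_c >= THROTTLE_TEMP_C - TEMP_MARGIN_C
    if low_power and hot:
        # Pinned at the temperature limit without reaching full power.
        return "cooling"
    return "nothing-found"


# Example: no errors, but only 80 W drawn while sitting at 99 °C.
print(triage(StressRun(False, False, 0, 80.0, 99.0)))  # prints "cooling"
```

A "nothing-found" result only means this particular test didn't catch anything, which is when trying blend makes sense.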

For the graphics card, you can use any of the unigine benchmarks. Superposition is the most similar to modern AAA games, but also a large download. You have a monster graphics card with a much higher peak power draw than the CPU, but if you only play games like Rimworld and SC2 with vsync on, it's probably not being pushed close to full power. Unigine will do that. Unfortunately, I don't know any GPU tests that check their own results and are easy to run. But if it crashes, that's a fail obviously.

For the memory, memtest86+ is probably easiest. There are better tests that the overclockers use, which you can find here.

To really put the hurt on your power supply and cooling, you can run 7 threads of prime95 and unigine at the same time. This will draw more power than pretty much any real workload other than folding@home, crypto mining, or things involving custom job schedulers, but a proper computer should be able to take it.

Unfortunately, there's no guarantee that you will be able to identify the problem. But the good news is that you only need to find a reproducer, to use as ammunition against the customer service line. That's part of what you're paying for when you buy OEM computers and replace them before the warranty runs out.

I love your optimism. I can tell you that none of the machines I've owned lasted 7+ years. It's not that they always become inoperable at that point, but that they seem obsolete by the 5 year mark at the latest. I don't mean to sound like a snob. It's just that a computer is what I interact with the most, both professionally and for leisure, so I think it's worthwhile to invest good money in it. Like, if I drove 8 hours a day for work and for fun, you bet I wouldn't be trying to extract every last bit of value out of my car until it qualified for cash for clunkers. Plus, I really don't think it's that wasteful; people replace their thousand dollar smartphones every 2-3 years, so going all the way to 7 years for a $1700 computer seems overly conservative by comparison.
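For what it's worth, the per-year arithmetic behind that comparison can be sketched as below. The prices and lifetimes are the round numbers from the comment ("thousand dollar" taken as exactly $1000), and `cost_per_year` is just an illustrative helper that ignores resale value, repairs, and everything else.

```python
def cost_per_year(price_usd: float, years: float) -> float:
    """Purchase price spread evenly over the years of use."""
    return price_usd / years


computer = cost_per_year(1700, 7)    # $1700 computer kept for 7 years
phone_fast = cost_per_year(1000, 2)  # $1000 phone replaced every 2 years
phone_slow = cost_per_year(1000, 3)  # $1000 phone replaced every 3 years

print(f"computer: ${computer:.0f}/year")
print(f"phone: ${phone_slow:.0f} to ${phone_fast:.0f}/year")
```

On those assumptions the computer kept for 7 years costs less per year than the phone at either replacement rate.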

No doubt. But security update stoppage and battery degradation are big drivers of phone replacements, and neither is a problem for desktop computers. I am using a $200 phone, a CPU launched in 2013, and a graphics card from 2016, and they do what I need them to do.

Thanks, I definitely plan to run your recommendations the next time the PC crashes for no apparent reason. Until then, there is still hopium that somehow the problem goes away all by itself...

No doubt. But security update stoppage and battery degradation are big drivers of phone replacements, and neither is a problem for desktop computers. I am using a $200 phone, a CPU launched in 2013, and a graphics card from 2016, and they do what I need them to do.

Different usage levels and/or preferences, I suppose. What you describe sounds a bit too ascetic for the vast majority of people, at least those in the middle class. Unless you never dine out or order delivery, food has gotten so expensive that $200 covers about two restaurant dinners for two in a big city, at which point I'd much rather skip those two dinners and save toward, say, a $400 phone rather than a $200 one, or upgrade to a $200 CPU from this year. Either would deliver much more utility.