
Small-Scale Question Sunday for November 20, 2022

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

Unless you're regularly running CPU-heavy applications, there's no reason to upgrade unless you're actively seeing detrimental performance. A few years ago my motherboard crapped out, and I decided to upgrade my CPU and RAM because it was Black Friday and everywhere was running deals. So I upgraded to a six-core Ryzen with hyperthreading and saw fuck all of a performance increase, except that certain processes (like converting large files between formats) ran faster. Interestingly enough, it was useless for the one area where I did need more CPU performance: at the time, my job required me (or at least it was easier) to OCR 1000+ page documents, which took a substantial amount of time on my work laptop and locked me out of doing any other work, since I needed Acrobat to accomplish pretty much anything. The new CPU certainly made the process a little faster, but Acrobat's OCR didn't make use of the extra threads, so it was still running one page at a time (albeit faster) rather than the 12 pages at a time the CPU could theoretically handle. I was super pissed that a pro-grade product that costs a decent amount of money didn't have such an essential feature. The punchline is that even in cases where you would see a difference, software limitations may prevent you from seeing it. Like you, I'm not much of a gamer, so I have no idea how it will affect that end of things, but for most everyday tasks you should be fine with what you have unless your performance is lagging.
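The difference between "one page at a time" and "12 pages at a time" comes down to whether the software hands pages to parallel workers. Below is a minimal Python sketch of that idea, not a description of how Acrobat works internally; `ocr_page` is a hypothetical stand-in for a real per-page OCR call. The assumption is that a real version would call an external OCR process, which runs outside Python's GIL, so a thread pool is enough to keep several cores busy.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def ocr_page(page_number: int) -> str:
    # Hypothetical stand-in for OCR on one page image. A real version
    # might shell out to an OCR engine via subprocess.run, and waiting
    # on that external process releases the GIL.
    return f"text of page {page_number}"


def ocr_sequential(pages):
    # What single-threaded software does: one page after another.
    return [ocr_page(p) for p in pages]


def ocr_parallel(pages, workers=None):
    # Hand pages to a pool sized to the logical CPU count
    # (12 on a six-core CPU with hyperthreading/SMT).
    # Executor.map yields results in input order, so page order is kept.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_page, pages))
```

Both functions return the same list for the same pages; the parallel one only finishes sooner if the per-page work can genuinely run concurrently, which is exactly the software limitation described above.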

Makes sense. I do expect that everyday tasks will see virtually no difference, and that the upside comes from just a couple of CPU-heavy apps. But as you note, once you do have those use cases, it does feel a bit magical to cut processing time by 30% from one day to the next (and hundreds of dollars later).

Oh, good chance to ask: how good is Acrobat's OCR? I've been using the one built into Google Drive, but it can't batch process.

It's pretty good, but it's time-consuming for larger files. To provide some context, I was doing legal work for oil and gas, and I had to determine whether certain assignments pertained to certain leases (an assignment is when one company conveys lease rights to another; I'll include things like mortgages and financing statements in this category). Companies often do this in large documents conveying several thousand interests at once. It can be incredibly time-consuming to do this by simply reading the document, especially since most of them are ordered by some kind of internal lease number rather than alphabetically, geographically, or by some other parameter I have access to. It gets even worse when they're conveying different interests for different leases and there are several exhibits to go through. After OCR, I'd usually search by lessor name first. If I found what I was looking for, great; if not, I'd try the parcel number, and if that failed, I'd search by the recording information for the original lease. These latter two parameters were kind of dicey because the information is often laid out in a table, and the OCR occasionally has trouble determining where the line breaks are. With a name, you at least have the security of knowing that the first few letters will be consecutive without a line break. If I got to this point and didn't find anything, I figured I could safely assume that the document didn't apply to the lease I was concerned about, unless, of course, there was some kind of blanket language, but that's usually easy to find. It wasn't 100% accurate, though: there were some cases where I knew what I was looking for was in there, but it wasn't coming up because of a typo, bad scanning, too-small print, etc., at which point I'd have to search the whole document manually.
My superiors didn't like relying on OCR because of this, but in my experience mindlessly scanning page after page was more likely to lead to an error than the OCR was. The advice I'd give to the client relied pretty heavily on the applicability of certain of these documents, so I'd say that it's probably good enough for whatever you plan on using it for, assuming that it isn't an application that could get you fired or cause some other kind of serious problem.
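The search order described above (lessor name first, then parcel number, then recording information) is a simple fallback chain. Here is a minimal Python sketch of it, assuming the OCR text layer has already been extracted to a plain string; the function names and the sample text are hypothetical. Collapsing all whitespace before matching is one way to stop a stray OCR line break between words from hiding a hit, but it does nothing for typos, bad scans, or a break in the middle of a number, so those cases still need a manual read.

```python
import re


def normalize(text: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, OCR line breaks)
    # into one space and lowercase, so "ACME\nRanch" matches "acme ranch".
    return re.sub(r"\s+", " ", text).strip().lower()


def fallback_search(ocr_text, queries):
    # queries: ordered (label, search_string) pairs, most reliable first.
    # Returns the first label that produced hits plus the match offsets
    # (offsets are into the normalized text), or (None, []) if nothing hit.
    haystack = normalize(ocr_text)
    for label, query in queries:
        needle = normalize(query)
        if not needle:
            continue  # an empty query would "match" everywhere
        hits = [m.start() for m in re.finditer(re.escape(needle), haystack)]
        if hits:
            return label, hits
    return None, []


# Hypothetical OCR output with a table row split across lines.
sample = "Lease No. 4471  Lessor: ACME Ranch\nLLC   Parcel 0012  Rec. Book 88\nPage 901"
order = [
    ("lessor", "Jones Farms"),
    ("parcel", "Parcel 0012"),
    ("recording", "Book 88 Page 901"),
]
print(fallback_search(sample, order))  # ('parcel', [38])
```

Here the lessor search misses, so the chain falls through to the parcel number; searching only for the recording information would also succeed even though "Page 901" sits on the next line of the OCR output.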

I never had to batch OCR anything, so I can't comment on how well that works. One final caution: the OCR data causes file sizes to balloon considerably. The firm I worked at required us to eliminate all exhibit pages from these documents except the ones that were directly applicable, to keep the client's already-large final product from ballooning to unmanageable levels and taking up too much room on our cloud storage. This was followed by a prohibition on including OCR'd material in our final client PDFs for the same reason, as we saved copies of all our work and it was taking up entirely too much space. It wasn't uncommon for one of these large documents to exceed 300 MB because of all the additional OCR data. So if you plan on saving all of these PDFs locally, it's something to be aware of.

Wow, thanks for the review. If you trusted it with that, it must be more than good enough for the stuff I was doing (casually browsing through old French books).