Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.

There is one exception to this: a GPU that can run local AI models. Not LLMs or image generators, but the sorts of custom neural networks that have become quite common for e.g. photo noise reduction, audio stem separation etc. In those cases even a fairly low-end "good GPU" will do fine (such as this ancient NVIDIA Quadro P2000 Mobile aka GeForce GTX 1050), but the difference between "have" and "have not" is massive.
Interesting, does the software integrating those non-LLM AI functionalities offer to run this in the cloud for you? My experience with both image generation/manipulation and local LLMs has been that it's almost always better to run those loads in the cloud: either directly from the big AI labs, through a vendor like openrouter.ai, or on a rented GPU like runpod.io.
You can very well do it all locally, but it's a pointless toy with 8 GB of VRAM, semi-interesting with 16 GB, and you're finally cloud-independent with 24 GB of VRAM. And you can get many hours of GPU time on runpod.io for the price you'd have to pay for this much VRAM.
But yeah, I haven't worked with audio models and the things Photoshop can throw at a GPU.
LLMs and other generative AIs are much more hardware-intensive than domain-specific neural networks. 500M - 1B parameters is plenty when the model doesn't have to understand instructions or global context (hell, there are some task-specific audio models in wide use that can fit in just a couple of megabytes while performing well). Sure, a more powerful GPU will run the models faster, but when you're doing noise reduction on a dozen culled and selected photos it doesn't really matter if it takes 15 seconds or a minute to run the process when on CPU it would take half an hour. Likewise, stem separation taking a few minutes is a non-issue when you only need to run it once or at most a few times for a song (such as when remixing or isolating instruments for practising the lines).
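To put the 500M - 1B figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would take. It assumes the usual 4 bytes per parameter for fp32 and 2 for fp16, and it ignores activations and framework overhead, so treat the numbers as a lower bound. The function name is just something I made up for illustration.

```python
# Weight-only memory estimate. Activations, buffers and framework
# overhead are NOT included, so real usage will be higher.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "fp16") -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for params in (500_000_000, 1_000_000_000):
        for dtype in ("fp32", "fp16"):
            gib = weight_memory_gib(params, dtype)
            print(f"{params / 1e9:.1f}B params @ {dtype}: {gib:.2f} GiB")
```

Even the largest case here (1B parameters in fp32) comes out under 4 GiB for the weights, which is why a modest GPU is enough for this class of model.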
There are apps that can run in the cloud, but they're (expensive) subscription- or credit-based, and having to pay $30-$50 per month per app adds up fast. Not to mention they tend to have ridiculous censorship (think photos of people on a beach where you want to remove some distractions and the cloud version complains that your content breaks the terms of use).
OK, all good arguments. Maybe I'll have to take a deep dive on what's possible in open source land on that front (I refuse to touch Adobe/Ableton et al.). Do you use commercial models/implementations, or do you have a recommendation for where to start if I want to set it up myself?
That's the nice thing about just renting a VM on expensive hardware from runpod. They don't care at all what you do, because you provide all the software yourself.
For photo noise reduction I've used Lightroom and (free but specific to Olympus / OM cameras) OM Workspace. I don't think there are good open source models, as the models are camera-specific due to different sensor and Bayer filter characteristics (which is also why they can produce really impressive results: check the OM-1 original vs denoised comparison here, where the original is far from what I'd consider usable while the Lightroom one is perfectly fine).
For audio stem separation Ultimate Vocal Remover is my tool of choice (you can download the models from the settings page). I start by removing vocals with MDX-Net / Kim Vocal 2, take the residual and remove drums with Demucs, and then possibly remove the bass from the residual of that. Be aware that if you just split each stem from the original separately, the stems will not sum back to 100% of the original, so you want to go the recursive route. I'm sure there are some newer models that you can install manually, but I haven't used those as the existing ones work well enough for my purpose (removing distracting vocals or emphasizing an instrumental part to better hear what's happening).
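The recursive route can be sketched roughly like this. It is not how Ultimate Vocal Remover works internally; the separators are hypothetical stand-ins for whatever model you run at each step, and I'm assuming the residual is simply "input minus extracted stem". The point is only that the stems plus the final leftover add back up to the original by construction.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Signal = List[float]
# A separator takes a signal and returns its estimate of one stem.
# These are placeholders for real models (vocals, drums, bass, ...).
Separator = Callable[[Signal], Signal]


def subtract(a: Signal, b: Signal) -> Signal:
    """Sample-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def recursive_split(
    mix: Signal, separators: Sequence[Tuple[str, Separator]]
) -> Dict[str, Signal]:
    """Extract stems one after another, each from the previous residual."""
    stems: Dict[str, Signal] = {}
    residual = list(mix)
    for name, separate in separators:
        stem = separate(residual)            # estimate from what is left
        stems[name] = stem
        residual = subtract(residual, stem)  # pass the leftover along
    stems["other"] = residual                # whatever no step claimed
    return stems


if __name__ == "__main__":
    # Toy "models" that just scale the input; real ones are neural networks.
    toy_steps = [
        ("vocals", lambda s: [0.5 * v for v in s]),
        ("drums", lambda s: [0.25 * v for v in s]),
    ]
    mix = [1.0, -0.5, 0.25, 0.0]
    stems = recursive_split(mix, toy_steps)
    total = [sum(samples) for samples in zip(*stems.values())]
    print(total)  # matches mix up to floating-point rounding
```

Running each model on the original mix instead gives independent estimates with nothing forcing them to add up, which is where the "not 100%" problem comes from.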