
Friday Fun Thread for February 17, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


Why does there not seem to be much progress with AI in the field of music? I'm loving all the AI art and ChatGPT stuff coming out, but I'm really looking forward to interesting AI implementations in music. As a layperson it seems like it shouldn't be too difficult, since music is rather formulaic at its most basic level.

Most of the AI music I've seen so far has been generated completely from scratch, which isn't very interesting to me. How far away are we from convincing style transfer, for example? I'd love to be able to convert a song into another style of music.

A few years ago I found this soundcloud account, but the music is pretty garbled and prone to overfitting. It just seems miles away from AI in other domains, so I'm wondering if anyone has insight into why AI music seems to be lagging behind AI in visual art and chat.

One theory I've seen bandied about is that the recording industry has a long and storied history of being very trigger-happy with lawsuits over its intellectual property, which made AI devs a bit more hesitant about training models on professionally published music. This is in contrast with Stable Diffusion, ChatGPT, and GitHub Copilot, which were all trained on publicly published but copyright-protected images/text/code as well as public domain works. I don't know how much of an impact this actually had, but I imagine it's at least in the back of devs' minds. That said, there's no shortage of public domain music out there, and I wouldn't expect an AI trained only on classical music to be particularly bad - just limited.

Also, perhaps data size is an issue. A typical 3-minute song, even compressed, is far bigger in file size than a typical image or text file. Sheet music could be used instead, though I'm not sure how easy and cheap it is to get huge databases of it. Scraping the internet for songs is also likely more complicated than doing so for images and text, since music tends to be streamed by services that intentionally make it annoying to download the actual files, versus images and text, which are trivial to right-click and save.
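To put rough numbers on the size gap, here's a back-of-envelope sketch. It assumes CD-quality uncompressed PCM audio (44,100 Hz, 16-bit, stereo) and a 512x512 RGB image, which is the resolution Stable Diffusion was trained at; these are illustrative assumptions, not measurements of any particular dataset.

```python
# Back-of-envelope: raw data size of a 3-minute song vs. a single image.
# Assumes CD-quality PCM (44,100 Hz, 16-bit, stereo) and a 512x512 RGB image.

def song_bytes(seconds=180, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Uncompressed PCM size of an audio clip, in bytes."""
    return seconds * sample_rate * bytes_per_sample * channels

def image_bytes(width=512, height=512, channels=3):
    """Uncompressed size of an RGB image, in bytes."""
    return width * height * channels

if __name__ == "__main__":
    song = song_bytes()    # 31,752,000 bytes, about 31.8 MB of raw samples
    image = image_bytes()  # 786,432 bytes, about 0.79 MB of raw pixels
    print(f"song:  {song / 1e6:.1f} MB")
    print(f"image: {image / 1e6:.1f} MB")
    print(f"ratio: {song / image:.0f}x")  # ~40x more raw data per training example
```

So even before compression enters the picture, one song carries around 40 times the raw data of one training image, which makes both storage and model context a harder problem.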

A couple of months ago, some Stable Diffusion enthusiasts developed what they called Riffusion, which used the Stable Diffusion architecture to generate music. It involved training the model on spectrogram images (visual representations of how a sound's frequency content changes over time), having it generate new spectrograms, and converting those back into audio. They got surprisingly decent results, but the state of the tech then didn't seem usable for much more than a toy, due in part to how short each output was. There are obviously workarounds to such limitations, but I don't know how far development has come since then on actually bringing those workarounds to reality. Given that using Stable Diffusion to generate music is a hack, I'm not sure it'd be worth it for devs to keep following that thread, but it's still a really clever and fun application of the tech.

Google's MusicLM is the most advanced model that has been publicly demonstrated, as far as I know.