At the current rate of progress maybe it'll be months. I did a song for my company just to have fun around an event, and the quality of the song was astounding; better than this one, I think, also using v3. I didn't even write the lyrics, I just described what I wanted in the prompt. I also used one of the poems I wrote as a teenager (there aren't a lot of them) to create a song... the AI seemed to pick up on the emotional emphasis in the lyrics, exactly the way I felt it, so it placed the pauses and choruses accordingly.
It reminds me of CGI in the '90s. Everyone was so astounded by the quality of Jurassic Park and Toy Story, and the rate of improvement was mind-boggling. Predictions were made that in a few years, actors and movie studios would be obsolete. But in the meantime, audiences started to notice artifacts that were not apparent to the untrained eye - CGI was actually ugly! And here we are 30 years later - actors are still on sets.
But the sets of today aren't the sets of the '90s. Sets were giant green screens for a second; now they're giant TVs. Actors are still on set, sure, but things have changed.
I had a similar reaction. We can still detect AI-generated images after years of progress; I think it's hasty to say they'll be indistinguishable any time soon.
Thing is, they don't have to be indistinguishable. They just have to be "good enough" to be a major disruption to that market. And it's not a particularly high bar.
Sure, just as TV shows use obvious CGI for animals, explosions, etc. - it's cheaper and more practical. Background music that needs to be licensed for ads and such is particularly likely to be disrupted in the next few years.
Pretty much all mainstream songs have the same tempo (or slight variations), the same beat, the same scale, the same accents, etc. I'm not surprised at all that they can be regurgitated by a generative model.
What I would like to see is what happens if you ask it to do a song in 5/8, 5/4 or any of the 7/* variants (I have no idea, it might actually work). There are some examples of those in mainstream music, but not a lot, especially for the more obscure ones. Unless they also trained it on a lot of international folk songs.