You are assuming that "training" and "learning" in ML mean exactly the same thing as "training" and "learning" in humans. They don't. The processes are completely different, with only a surface resemblance.
No, learning (by ML) is not learning (by humans). It's just the same word used in a different context, and that doesn't by itself imply the meaning is the same. The underlying process is completely different. Neural networks, despite their name, have almost nothing in common with human brains.
Do you have any other argument besides "the word is the same"?
The concept is shallowly similar, not nearly the same.
Human learning, at the very least, consists of maintaining an internal model of the world around us and integrating all new data into a coherent structure. Current neural networks are extremely limited by comparison: each model operates on a strict class of data, cannot comprehend the world outside that class of data, and as such cannot be compared to human understanding. Copilot does not understand how a computer works; it only understands that connections between pieces of text exist.
If you're as informed on how ANNs work as you claim to be, then perhaps you should inform yourself more on how the human mind works.
As other posts have said, whether training is fair use is not a matter of opinion; it is a matter of law. You can't appeal to your own authority to make grand statements like this.
Frankly, I don't care that ML/AI _needs_ this to work. That's not my problem. You don't get to circumvent existing agreements (and law) because you believe that machine learning is the same as a human reading a piece of code and then typing it up on the side. Tesla manages just fine by generating its own training data. Other businesses have found partners to acquire data from. The only reason this isn't being immediately addressed is that there is near-zero accountability for license violations in software companies, and ML further obfuscates that.
A human who watches a Hollywood movie and then goes on to recreate it frame by frame, except that, say, everyone has cat ears, and declares "nah, this is all my original creation" is an idiot. A human who watches a Hollywood movie and then goes on to create new works within the genre, with some homage (say, a specific hat, a specific framing of a pivotal scene, or a specific lighting choice) to the original movie that inspired them, is learning.
I think the law should address multiple things, only one of which is the output of learning. For example, if the copycat human and the original director both watched the movie by stealing it, that's also bad, especially because the copycat human then goes on to create copies of work they never had legal permission to copy! Copilot effectively cannot tell the difference between an homage and theft.
That's absurd; it's like saying you want an army of robot slaves.
Now, wanting to minimize the impact of robotic competition on human wellbeing is understandable. But the means to that end is declining to recognize the property rights of those who try to privatize the commons.
I agree that training is fair use. I don't agree that when the model spits out verbatim or near-identical copies of copyrighted code, the copyright is somehow stripped, or that the usage of it somehow becomes fair use.
I believe that the vast majority of code that Copilot produces is fine. But we have also seen clear examples of copyright violation.
The biggest problem is that it is basically impossible for the user to tell which is which.
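To illustrate why this is so hard for the user: even a crude check for near-verbatim output requires comparing against the licensed sources, which the user doesn't have. A minimal sketch under that assumption (the snippets here are hypothetical stand-ins, and real detection would need access to the actual training corpus):

```python
import re

def ngrams(code, n=5):
    """Lowercased word-token n-grams of a code snippet."""
    tokens = re.findall(r"\w+", code.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(generated, licensed, n=5):
    """Fraction of the generated snippet's n-grams found in the licensed one."""
    g = ngrams(generated, n)
    return len(g & ngrams(licensed, n)) / len(g) if g else 0.0

# Hypothetical licensed snippet and a verbatim regeneration of it.
licensed = "float q_rsqrt(float number) { long i; float x2, y; }"
generated = "float q_rsqrt(float number) { long i; float x2, y; }"
print(overlap(generated, licensed))  # 1.0 for a verbatim copy
```

Even this toy detector only works with the licensed snippet in hand; without the corpus, the user has no way to run any such check, which is exactly the problem.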
I agree there, and I think most rational people would too.
This whole website and "investigation", though, come across as sensationalism ultimately aimed at fair-use training, which is very unfortunate. It seems to involve an ego trip and popularity-chasing, with targeting a megacorp as the means to that end.
ML training needs to be fair use of copyrighted works, or most machine learning and AI projects will be impossible.