
Totally disagree. Training is fair use. It is akin to learning. Code licenses do not restrict you from reading or learning.

ML training needs to be fair use of copyrighted works, or most machine learning and AI projects will be impossible.



You are assuming that "training" and "learning" in ML means exactly the same thing as "training" and "learning" in humans. It doesn't. The processes are completely different, with only an apparent resemblance.


But the real question is: why should fair use cover both the human and the machine type of learning?


No, they aren't completely different. Learning is learning.


> Learning is learning.

No, learning (by ML) is not learning (by humans). It's just the same word used in a different context, which doesn't by itself imply that the meaning is the same. The underlying process is completely different. Neural networks, despite their name, don't share anything in common with human brains.

Do you have any other argument besides "the word is the same"?


The concept is also the same.

> Neural networks, despite their name, don't share anything in common with human brains.

They share enough. Current ANNs are different from human neurons; however, the principle of learning they operate on is very similar.

Machine learning is learning. You can't look at a model trained on ImageNet, or just about any other model, and say otherwise.

I'm informed on how current artificial neural networks work, how they are made, what they are and aren't capable of, etc. Thanks.


The concept is shallowly similar, not nearly the same.

Human learning, at the very least, consists of maintaining an internal model of the world around us and integrating all data into a coherent structure. Current neural networks are extremely limited in their capabilities: each model operates on a strict class of data and is not able to comprehend the world outside of that class, and as such cannot be compared to human understanding. Copilot does not understand how the computer works; it only understands that connections between pieces of text exist.

If you're as informed on how ANNs work as you claim to be, then perhaps you should inform yourself more on how the human mind works.


Ok thanks


You're welcome.


Like other posts have said, whether training is fair use is not a matter of opinion, it is a matter of law. You can't use an appeal to your authority to make grand statements like this.

Frankly, I don't care that ML/AI _needs_ this to work. That's not my problem. You don't get to circumvent existing agreements (and law) because you believe that machine learning is the same as a human reading a piece of code and then typing it up on the side. Tesla manages just fine by generating their own training data. Other businesses have found partners to acquire data from. The only reason this isn't being immediately addressed is that there is near-zero accountability for license violations in software companies, and ML further obfuscates that.


There isn’t any law yet.

And yes, it is the same thing as learning.

If you have a robot that learns like a human does … you think it should be illegal for that machine to look at GitHub? To watch a Hollywood movie?


A human that watches a Hollywood movie and then goes on to recreate it frame by frame with, idk, everyone having cat ears, and goes "nah this is all my original creation" is an idiot. A human that watches a Hollywood movie and then goes on to create new works within the genre, with some homage (say, a specific hat, or a specific framing of a pivotal scene, or a specific lighting choice) to the original movie that inspired them, is learning.


So I think you would agree that the law should address the outputs of learning, rather than the inputs.

It should be illegal to reproduce copyrighted material, but not to “read”, “view”, or “consume” it.

Luckily this is what the law already says.


I think the law should address multiple things, only one of which is the outputs of learning. For example, if the copycat human and the original director human both watched the movie by stealing it, that's also bad. Especially because the copycat human is then going on to create copies of work they never had legal permission to copy! Copilot effectively cannot tell the difference between an homage and theft.


> If you have a robot that learns like a human does … you think it should be illegal for that machine to look at GitHub? To watch a Hollywood movie?

Yes.


That's absurd, it's like saying you want an army of robot slaves.

Now, wanting to minimize the impact of robotic competition on human wellbeing is understandable. But the means to that end is declining to recognize the property rights of those who try to privatize the commons.


Ideally, I want the "army of robots" prevented from being created at all. But "robot slaves" is close to the second best option.


I agree that training is fair use. I don't agree that when the model spits out verbatim or near-identical copies of copyrighted code, the copyright is somehow stripped or the usage of it is somehow fair use.

I believe that the vast majority of code that Copilot produces is fine. But we have also seen clear examples of copyright violation.

The biggest problem is that it is basically impossible for the user to tell which is which.


I agree there, and I think most rational people would too.

This whole website and its "investigation", though, are some sort of sensationalism that seems ultimately aimed at fair use training, which is very unfortunate. It seems to involve an ego trip and popularity-chasing, with targeting a megacorp as a means to that end.


Most AI projects being impossible sounds better to me than license violations.


Training by reading others' works can be fair use.

But the moment you start reproducing more than a few lines of prose without attribution I guess you are in for a nasty letter.


> ML training needs to be fair use of copyrighted works, or most machine learning and AI projects will be impossible.

Good riddance then.


Microsoft has written and acquired plenty of codebases over the years. Train the AI on that.



