Hacker News

My view is that Copilot is not stealing open source code. It is learning from it just as a human reader would. People's disgust is based on the assimilation of what they thought was a human trait being machine derived from their work.

The copilot service backed by an army of actual humans wouldn’t be a story at all. Nor would anyone be angry if an individual offered coding skills as a service and had gone through the exercise of learning from a great amount of open source software to do so.

No open source license was written with this in mind, because previously learning was something only humans could do and no one had an issue with sharing that knowledge. Until licenses take machine learning use into account, I see no problem with Copilot.

Source cannot be open if you restrict any viewing of it.



You aren't allowed to just read code and regurgitate it in order to claim it as your own. That is, just because you memorized this great new novel you read, it doesn't mean you can go and sit down and hammer it out and sell new copies. People go to great lengths (see: clean room reverse engineering [1]) to try to wash themselves of liability for this sort of thing.

[1] https://en.wikipedia.org/wiki/Clean_room_design


If the code was purely utilitarian in nature, such as something that was optimized for execution time, there is plenty of precedent stating that the code in question is not covered by copyright.

Do an internet search for “copyright utilitarian” and read up on it if you don’t believe me!

Copyright is about protecting artistic expression which is held in contrast to the useful nature of a work.


Note: In the US, this concept is explicitly in the Copyright Act:

"In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work." (17 USC 102(b) [0]).

See also the "Useful Articles" doctrine. [1]

[0] https://www.law.cornell.edu/uscode/text/17/102

[1] https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...


If you think most people pay any attention to licenses or respect them, you had better think again. Snippets get copied verbatim with no regard to their source all the time. Licenses have no power and are routinely ignored.


The point is not whether the law is actually upheld. It's whether it is legal or not.


Mmm maybe rephrase that as “depending upon which entity’s copyright was violated”

Surely I don’t need to recite the last 50 years of tech legal precedent and case history for you to see that such a blanket generalization cannot be left unaddressed.

Litigants litigate


> It is learning from it just as a human reader would

I don't see how that invalidates the copyright/license argument. So, instead of just a straight up license violation it's a license violation via plagiarism.

That argument wouldn't hold up even if it was a human that caused the violation. You can't just paraphrase someone's licensed work and then lie about having looked at it and pretend you made it yourself, which is basically what seems to happen with Copilot, as it doesn't automatically reproduce the license of the code it reproduces.


> You can't just paraphrase someone's licensed work

Yes you can. That's exactly why you paraphrased it instead of copying verbatim.

At the fringes, your transformation may not be enough to overcome the requirements, but that's an exception. Nearly all paraphrasing is legal by default.


I like you


It learns the same way a human does by learning patterns. It is not illegal to comprehend how to accomplish tasks by reading other people's source code.

The arguments against my point always assume perfect memory of everything this model has consumed. This is the plagiarism position. In reality, some patterns are more common than others and generate code that looks nearly identical. I can’t speak to the reasons for this, as I’m not familiar with all of the methods. However, I don’t assume that is the current working state or intent of Codex.


> It learns the same way a human does by learning patterns. It is not illegal to comprehend how to accomplish tasks by reading other people's source code.

It remains to be seen whether ML is true "learning" in the sense of developing a skill the way a human does over time.

It is, however, irrelevant to the manner in which this model operates today.


> People's disgust is based on the assimilation of what they thought was a human trait being machine derived from their work.

No, people's disgust is with Microsoft violating their legal privileges.

> The copilot service backed by an army of actual humans wouldn’t be a story at all.

Correct, it would be an open-and-shut lawsuit.


It isn't really learning if it's just regurgitating whole function bodies. I use Copilot a lot, and I definitely see whole functions being spit out that were presumably written by a person somewhere.


I also use Copilot a lot, and while it does suggest large function bodies, I'm not sure that it's "regurgitating" them (though it could be...I don't know). I suspect that it's seen so many function bodies that are similar that it generates another similar output. Like autocomplete in a word processor has seen so many similar chunks of text that it reproduces them based on past experience. I don't know this as a fact, of course. I'm just reacting to the word "regurgitating."


It regenerates the comments from the Quake III Arena fast inverse square root function.


Just did a quick github.com search on that function (with comments) and found around 131 matches. Many without a license. So yes, I believe that it would produce those comments...because it's seen humans repurpose and reuse that code without attribution or license many times.

Definitely an issue...but not as simple as copy and paste.


The fact you don't know is the problem.

I can't use Copilot because if I am stealing someone else's copyrighted code, I'm in trouble from a legal standpoint.


I definitely agree that we should find out. I've used Copilot almost since its inception, and I've seen nothing like a large copied/pasted function. If anything, it's mostly a single/double line autocomplete based on what I would have written anyway.


The Luddite reaction to Copilot is very hilarious to me. It seems to be a great way to identify low-talent coders, because who else would possibly feel so threatened by an AI?… Watching HN commenters suddenly become ardent defenders of copyright is quite the sight.


Caring about licenses, fair use, and copyright has been deeply ingrained in the open source and hacker community for literally decades.


Caring about defending people's right to fair use has been. Which is the exact opposite of the Luddite reaction to Copilot.


I see your statement as an inversion of consensus reality. What actual coder would use copilot? A beginner or dabbler.

I predict your attempt at tactically “managing” this copilot scandal will not play well on HN to experienced coders, your Microsoft colleagues chiming in next claiming it boosts their productivity notwithstanding.

Yes, I do indeed suggest astroturfing is afoot.


I tried it out, but I don’t use it at all on a day-to-day basis. I have no idea if it boosts productivity or not. I also have no idea how you came up with the claim that it’s only for beginners; I’m presuming you just made this up? All of the people that I’ve noticed talking publicly about their experiences using it are highly experienced engineers.

I just think it’s hilarious how fair use is so widely supported on HN when it comes to music, or videos, or interface names, but is all of a sudden a moral crisis when it appears to threaten the value of HN members’ labor.



