
Piling on:

Is copilot really a violation of GPLv3 (our license)? How is co-pilot different from someone lifting sections of code? Lifting a few sections of the code is a world different compared to re-distributing the entirety of the source code, or forking the project and replacing 2 or 3 letters from our brand name & then redistributing that.

I think the article needed to go into more detail about how that really is a violation of a license. This seems similar to the argument made in court over whether the Java APIs themselves could be copyrighted. Can an algorithm be copyrighted or licensed? If someone uses the same algorithm as found in FOSS, have they violated the license of that FOSS?

Then the reaction: instead of pursuing litigation and/or communicating and working with github, we should 'cancel' github and move to.. gitlab? Is that even an answer? If we think algorithms are copyrightable, wouldn't any kind of code search be a violation? Would allowing any kind of transcription of code be a violation? If so, then simply having the source be open would seemingly invite this.

I think this gets to the heart of FOSS in some ways. It's closed to privatization, open to the community, and what matters is the software provided. If someone cribs the project to configure a Feign client, or set up unit tests with DbRider - it's okay! It's the same thing as viewing HTML source code: learning a cool javascript trick by looking at how some website did it is part of the openness.

I wonder then, is the point of FOSS openness only to allow others strictly to view and edit the code for the purpose of contributing back to that exact software product? Or is the openness more than that, so that others are going to use the software in creative and novel ways, and use it for learning and who knows how else (all pursuant to GPLv3)?



> Is copilot really a violation of GPLv3 (our license)? How is co-pilot different from someone lifting sections of code?

If it's reproducing your code verbatim, yes, it's a violation. And this is no different from others copying sections of your code.

The license that you chose says that people are allowed to do many things (e.g. copy and modify the code), provided they also fulfil some obligations. If they don't do that, they are violating the license.

Please note that this is not specific to GPLv3 (your chosen license). Other licenses, BSD-3-Clause for example, also give rights (e.g. to copy and modify) and have their own set of obligations to be fulfilled (e.g. "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer"). If someone redistributes source code (not only the "complete" source code, but even a "section of code", to use your words), they have to fulfill the obligations, or they violate the license.


It sounds like Github needs to index the license of where it obtained code, and check the license of a project using co-pilot so that it only suggests GPLv3 code for compatible projects. I'm not sure that it is really enough, enforcement is tough, it could be like one of those "are you 18" checkboxes.
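
To make that concrete, here's a rough sketch of the kind of check I mean. Everything in it is hypothetical: the `COMPATIBLE_TARGETS` table, the `Snippet` type and `filter_suggestions` are names I made up, the table is a drastic simplification (not legal advice, and it ignores attribution requirements), and none of it reflects how Copilot actually works.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified table: for each source license, the
# project licenses a snippet could be suggested into. Illustrative only.
COMPATIBLE_TARGETS = {
    "MIT": {"MIT", "BSD-3-Clause", "GPL-3.0-only"},
    "BSD-3-Clause": {"MIT", "BSD-3-Clause", "GPL-3.0-only"},
    "GPL-3.0-only": {"GPL-3.0-only"},
}


@dataclass(frozen=True)
class Snippet:
    code: str
    source_license: str  # license of the repository the snippet came from


def filter_suggestions(snippets, project_license):
    """Keep only snippets whose source license allows use in the project.

    Snippets with an unknown source license are dropped, not guessed at.
    """
    return [
        s for s in snippets
        if project_license in COMPATIBLE_TARGETS.get(s.source_license, set())
    ]
```

With this, a GPLv3 snippet would be filtered out for an MIT project but still offered to a GPLv3 one. The hard part, of course, is knowing `source_license` reliably in the first place.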

I get the impression that this article and its reaction are largely a lot of people wanting to cry big brother and mostly just shit on Github & Git. IMO it would be more appropriate to look for solutions and post a public letter to github and/or just pursue actual litigation before advocating that FOSS projects spend an inordinate amount of effort across the board to vacate Github.


I think the article starts from the position "everyone should respect the software license" and is obviously promoting the SFC's views.

You are correct that the problem is hard, because adding licensing info to training data and subsequently using it is something not yet accomplished by anyone (at least, not by anyone who has spoken publicly about it). It might be that the only way out is to have different training sets according to license, but then you lose the advantage of scale.

On the other hand, I believe it's a little much to ask the SFC to do the cutting-edge research and propose solutions to Microsoft. Let's not forget that, when faced with the problem, Microsoft's chosen path was "completely disregard licenses (for now), ask legal later". Many legal opinions hold that there would not be any issue if copies of the input code were not reproduced verbatim; unfortunately, verbatim copies do get reproduced.

Looking forward to more exciting developments!


Microsoft could afford to train several models for every significant class of licenses. So if you don't want the model which included e.g. GPL code in its training set, you would opt for the model that was trained on the dataset which didn't include any. It's a bit of a hassle for MS, but that would partially resolve this issue.
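
As a rough sketch of the first step, assuming each repository already comes with a reliable license identifier (a big assumption): the `LICENSE_CLASS` mapping and `split_by_license_class` below are made up for illustration, and the classes are deliberately coarse.

```python
from collections import defaultdict

# Hypothetical mapping from license identifier to a coarse license class.
LICENSE_CLASS = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "GPL-3.0-only": "copyleft",
    "AGPL-3.0-only": "copyleft",
}


def split_by_license_class(repos):
    """Group (repo_name, license_id) pairs into one training set per class.

    Anything with an unrecognized license lands in "unknown", so it can be
    left out of every model rather than silently mixed in.
    """
    buckets = defaultdict(list)
    for name, license_id in repos:
        buckets[LICENSE_CLASS.get(license_id, "unknown")].append(name)
    return dict(buckets)
```

Each bucket would then feed its own model, and a user who wants no GPL code in the training set would pick the model trained on the "permissive" bucket only.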


Amazon claim their Copilot equivalent knows about license info.


> Is copilot really a violation of GPLv3 (our license)? How is co-pilot different from someone lifting sections of code? Lifting a few sections of the code is a world different compared to re-distributing the entirety of the source code, or forking the project and replacing 2 or 3 letters from our brand name & then redistributing that.

Yes. Many projects will see your license and never copy your code if their license is incompatible with yours. Copilot disregards your license and offers your code or derivations of it to anyone who asks the correct questions, breaching your license on both an ethical and a legal level.

You can't lift a function from a GPL licensed codebase and plant it in an MIT licensed codebase. It's that simple.


I appreciate the response. The ethical level, in my opinion, is about reproducing the functioning software and the "business" specific logic. At a specific function level, and particularly for boilerplate code, I don't view that as specific to this (GPLv3 licensed FOSS) project. There are also dozens more examples of the same code being used, and I don't think FOSS necessarily gets a monopoly on technology.

Which I think creates an interesting debate: where is the line between general technology and the intent of not allowing private companies to re-package FOSS for their own benefit?

> You can't lift a function from a GPL licensed codebase and plant it in an MIT licensed codebase. It's that simple.

If that function is a 'pad-left' type of function, is there an ethical violation? Or is this similar to learning Javascript by viewing the source code of webpages? At some point, functions are hardly unique in what they do, and there are only so many different ways to write a 'pad-left' function. I mention this question not to refute what you've said (I think I agree there is likely a license violation), but to explore where the line is for the ethics. I mean, an inferred implication could be that software developers should stop looking at HTML source in order to learn. If you learn how to write a standard algorithm or function, and then you reproduce that later in private software, that is "lifting code". Committing it to memory is not much different from doing an outright copy-paste of a 2 or 3 liner. This also makes me think of the variety of shell scripts and standard bash'isms: if a single FOSS project uses a 'sort | uniq -c | sort -n', does that mean an MIT codebase has to invent a new way to do that?
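
To show what I mean by "only so many ways", here's a minimal pad-left written from scratch. The name `pad_left` and its parameters are my own choices, not taken from any particular project, yet I'd expect most independent versions to come out nearly identical.

```python
def pad_left(text: str, width: int, fill: str = " ") -> str:
    """Pad text on the left with a fill character until it is width long."""
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = width - len(text)
    # Strings already at or beyond the target width are returned unchanged.
    return fill * missing + text if missing > 0 else text
```

Python's built-in `str.rjust` already does the same job, which rather makes the point: there isn't much room for originality here.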


You're welcome. First of all thanks for your kind and welcoming response.

It's an interesting debate for sure. When a program is divided into many functions, and when there are many utility functions, the debate indeed gets blurry. Moreover, many of the open projects in today's world are web applications and "simple" in nature.

In this context, simple means that the application can be composed via many simple, relatively general functions with a unique asset set and unique way of connecting them, so the magic (or sauce if you pardon the term) gets more abstract.

On the other hand, we have another group of programs, which can similarly be called "complex" programs. The biggest difference is that these programs use complex, one-of-a-kind functions.

Consider Blender, KiCAD, many open source and GPL licensed scientific software, libraries like Eigen, video/audio encoders, CD/DVD tools. Even a three liner in these programs and libraries can be a game changer.

I have some research-oriented code, and a ~25 line function in it is worthy of its own paper. I published a paper, and didn't obfuscate the language and algorithm, so you can implement it if you want.

However, if I open the code of this algorithm with AGPL3, can you say it's too small to be licensed? I don't think so. A more general algorithm can be ruled out as "too simple", but "fast_inverse_square_root" or my function or any other math heavy secret sauce can't be excluded because it's 2-3 lines.

As a result, while this issue needs serious discussion, we need to understand that even a small piece of code can carry a lot of research, knowledge and advantage in itself, regardless of its size.

So, while lifting a left pad from GPL code can be understandable up to a certain point, getting the secret sauce, improving it and keeping it closed, or merging it into similar but incompatibly licensed software, is inexcusable.

At the end of the day, this means we need to defend our GPL codebases, because we can't protect our secret sauce functions without defending the simpler ones.


If source is not subject to copyright, then regardless of it being illegally leaked, I could unbrand XP's source code (which, as I recall, was leaked not long ago), use it and redistribute it, even for profit... I would like to see M$'s answer to that.

I think the issue with Java's API was actually terribly explained even by Oracle's lawyers. Google stole that API, first, because it already had a large number of developers familiar with it, so they could benefit from cheap code monkeys developing for their platform. They added modifications (memory management) which, under Java's open source licence (Oracle owns Java, yet it is still open source), they should have made public for everyone's benefit (in a true Open Source spirit). They also modified the packaging mechanism (slightly, you know, .apk), so instead of having a package (jar) that can run on a desktop, or rather on any JVM, it could only run on devices with their* OS (or rather Dalvik implementation). And guess what: now that there are some options to run those on other environments, they "casually" decided to change it again instead of letting people benefit from the rich APK ecosystem somewhere other than Android.


> I think the issue with Java's API was actually terribly explained even by Oracle's lawyers…

None of your examples is in any way applicable to the case of google violating the Java copyright.

The Java Specification gives terms and conditions for usage, which is what the lawsuit was about, not some laundry list of things that annoy a random internet person. In fact, not annoying your users isn’t even mentioned once in the conditions to call an implementation “Java”, strangely enough.

Or I’m completely wrong and oracle doesn’t spend enough on their legal team.



