An HN thread earlier today described a large chunk of open-source code pasted almost verbatim by the Copilot engine into a project, but stripped of any licensing references.
Within the last few days, another thread described artists who have spent decades developing unique and valuable styles making parallel complaints about DALL-E/SD/etc., where inputting "Xyz in the style of [Artist]" produces a near-copy of [Artist]'s unique style, barely distinguishable from the original.
These engines are, fairly literally, giant collage engines: they parse language inputs and output a collage of the input works. Maybe some outputs are small snippets, so those could be fair use, but they are also evidently capable of outputs of far larger scope, amounting to wholesale ripoff.
Opting out or not posting on GitHub or wherever prevents nothing, since content gets reposted everywhere by many people, and with code it's entirely legitimate to post a fork under the open-source license.
Is there a solution analogous to a <NoRobots> flag? How would we verify compliance? Will there soon be HaveIBeenUsedAsTraining-style adversarial systems to probe these engines' outputs?
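For what a <NoRobots>-style opt-out might even look like: one imaginable shape is a robots.txt-style file that crawlers building training sets would be expected to honor. Everything here is invented for illustration; the file name "ai.txt" and the "Disallow-Training" directive are assumptions, not any existing standard. A minimal sketch of a parser for such a file:

```python
def parse_ai_txt(text):
    """Parse a hypothetical robots.txt-style training opt-out file.

    Returns a dict mapping each user-agent to the list of paths it is
    asked not to use for training. Directive names are made up here;
    no such standard exists.
    """
    rules = {}
    agent = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
            rules.setdefault(agent, [])
        elif key == "disallow-training" and agent is not None:
            rules[agent].append(value)
    return rules

example = """
# Hypothetical syntax: opt the whole site out of ML training crawlers
User-Agent: *
Disallow-Training: /
"""

print(parse_ai_txt(example))  # {'*': ['/']}
```

Of course, like robots.txt itself, this would be purely advisory: nothing forces a scraper to fetch the file, let alone obey it, which is exactly the verification problem raised above.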
Not sure of the solution, but this seems to be rather rapidly overstepping the boundaries of creators.
Huh, I wonder if they decided to strip the licenses and other comments from code before training on it. That would almost be necessary to avoid licensing comments ending up inside the generated code.