
They scraped all the code submitted to GitHub and used it to train the AI behind the now-paid Copilot product.


In their FAQ [1], GitHub states: "GitHub Copilot is powered by Codex, a generative pretrained AI model created by OpenAI. It has been trained on natural language text and source code from publicly available sources, including code in public repositories on GitHub."

Genuine question, has there been any evidence to substantiate your claim?

[1] https://github.com/features/copilot#what-data-has-github-cop...


> including code in public repositories on GitHub.

You just provided the evidence. I was wrong to say "all the code", but they clearly and openly used GitHub code to train Copilot.



