Programmer-cum-lawyer Matthew Butterick is claiming damages to the tune of $9 billion from Microsoft, GitHub, and OpenAI in a lawsuit against the three companies over copyright issues and the use of open-source code with restrictive licenses to train GitHub Copilot.
Released as a technical preview in mid-2021, GitHub Copilot, Microsoft’s latest attempt at developing a program/code synthesizer, is now turning into a major legal headache.
In June 2021, GitHub estimated that the text-to-code tool, which is built on OpenAI’s Codex model, would reproduce code from its training set only about 0.1% of the time. A few months later, researchers at New York University found that GitHub Copilot produces buggy code laden with vulnerabilities as much as 40% of the time because of the billions of lines of unfiltered open-source code and natural language it was trained on.
However, the class-action lawsuit filed by Matthew Butterick on behalf of “possibly millions of GitHub users” contests the legality of GitHub Copilot not over reproduced code or security issues but over violations of licensing terms.
GitHub Copilot alleged violations and damages claim
GitHub Copilot was designed to streamline software development by giving developers relevant, AI-generated code suggestions as they type. The code it was trained on, however, is licensed under the MIT license, GPL, Boost Software License (BSL-1.0), BSL 2, Eclipse Public License, Mozilla Public License 2.0, the Apache license and others.
The plaintiffs claim that Microsoft, GitHub, and OpenAI ingested and distributed licensed materials (i.e., the training code) without appropriate attribution, copyright notice, or adherence to licensing terms.
In other words, the AI pair programmer, a tool that works alongside a programmer and reviews each line of code, did not provide the necessary attribution or a copy of the license the code was released under. One example cited in the class-action lawsuit is that of Tim Davis, a computer-science professor at Texas A&M. He found on several occasions that Copilot reproduced his code without attributing it to him, as the license agreement requires.
“In June 2022, Copilot had 1,200,000 users. If only 1% of users have ever received Output based on Licensed Materials and only once each, Defendants have ‘only’ breached Plaintiffs’ and the Class’s Licenses 12,000 times,” the class-action lawsuit reads.
“However, each time Copilot outputs Licensed Materials without attribution, the copyright notice, or the License Terms it violates the DMCA three times. Thus, even using this extreme underestimate, Copilot has ‘only’ violated the DMCA 36,000 times.”
Applying the same reasoning to all 1.2 million users, the lawsuit estimates that damages for distributing licensed code without attribution, copyright notice, and license terms could reach $9 billion.
“If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000,” the litigants stated.
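The arithmetic behind both figures can be checked with a short script. This is only a sketch that restates the complaint's own assumptions (1.2 million users, three DMCA violations per offending output, $2,500 minimum statutory damages per violation); the function and constant names are illustrative and do not come from the lawsuit.

```python
# Figures quoted in the complaint (names below are illustrative, not from the filing).
COPILOT_USERS = 1_200_000          # Copilot users as of June 2022
VIOLATIONS_PER_OUTPUT = 3          # missing attribution, copyright notice, license terms
MIN_STATUTORY_DAMAGES_USD = 2_500  # minimum damages per violation, per the complaint


def dmca_violations(users: int, percent_affected: int, outputs_per_user: int = 1) -> int:
    """Count alleged DMCA violations if `percent_affected` percent of users
    each received `outputs_per_user` offending outputs."""
    affected_users = users * percent_affected // 100
    return affected_users * outputs_per_user * VIOLATIONS_PER_OUTPUT


def damages_usd(violations: int) -> int:
    """Minimum statutory damages for a given number of violations."""
    return violations * MIN_STATUTORY_DAMAGES_USD


if __name__ == "__main__":
    low = dmca_violations(COPILOT_USERS, percent_affected=1)
    full = dmca_violations(COPILOT_USERS, percent_affected=100)
    print(f"1% of users:   {low:,} violations, ${damages_usd(low):,}")
    print(f"100% of users: {full:,} violations, ${damages_usd(full):,}")
```

Under these assumptions, the 1% scenario gives the 36,000 violations quoted above, and the all-users scenario gives 3,600,000 violations and the $9 billion headline figure.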
Besides violations of open-source licenses and the DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violations of GitHub’s terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortuous interference.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Civil conspiracy.
– Declaratory relief.
— Alex J. Champandard (@alexjc) November 3, 2022
After Copilot came out of technical preview in June 2022, GitHub competitor Fossa.com spoke with IP lawyer Kate Downing, who concluded that GitHub is not committing copyright infringement if the code that Copilot is trained on is hosted on GitHub. For code hosted elsewhere, a lack of legal precedent means the question would have to play out in a court of law.
Additionally, Downing suggested that any Copilot-generated code or suggestion derived from training code that falls under any of the licenses mentioned above would have to be looked at on a case-by-case basis.
Is GitHub Copilot anti-open-source?
The class-action lawsuit seems to be a pushback against a Microsoft, GitHub and OpenAI creation that allegedly doesn’t give the open-source community its due and harms it in the process.
It goes on to delineate Microsoft’s history of dismal licensing practices, actions against open source software, engagement in vaporware, FUD (fear, uncertainty and doubt), and other business practices all the way back to the DOS days.
Microsoft has striven to reposition itself as a company that “loves open source” since Satya Nadella took over as CEO. The tech giant acquired GitHub for $7.5 billion in 2018, the year GitHub became the largest Git (open source repository) hosting service.
“Future AI products may represent a bold and innovative step forward. GitHub Copilot and OpenAI Codex, however, do not,” the lawsuit adds.
“Defendants have made no attempt to comply with the open-source licenses that are attached to much of their training data. Instead, they have pretended those licenses do not exist, and trained Codex and Copilot to do the same. By simultaneously violating the open-source licenses of tens-of-thousands—possibly millions—of software developers, Defendants have accomplished software piracy on an unprecedented scale.”
Alex Champandard, former AI programmer at Rockstar Games, founder of Creative.ai, and current event director for the nucl.ai Conference at AiGameDev.com, called the litigation “a solid piece of work! My assessment is that the defendants, GitHub, Microsoft and OpenAI are in a very bad position…”
Drew DeVault, the creator of Git repository host Sourcehut and several other projects wrote on his blog in June 2022, “GitHub Copilot is a bad idea as designed. It represents a flagrant disregard of FOSS licensing in of itself, and it enables similar disregard — deliberate or otherwise — among its users. I hope they will heed my suggestions, and I hope that my words to the free software community offer some concrete ways to move forward with this problem.”
His suggestions included allowing GitHub users to opt their repositories out of Copilot’s training or output model, informing Copilot users of their obligation to attribute the original code developers, eliminating copyleft code unless the output model is treated as free, or compensating original developers with a portion of Copilot’s charges.
This also puts into perspective the deep concerns the open-source community may have about GitHub Copilot.
“The walled garden of Copilot is antithetical—and poisonous—to open source,” writes Butterick. “It’s therefore also a betrayal of everything GitHub stood for before being acquired by Microsoft. If you were born before 2005, you remember that GitHub built its reputation on its goodies for open-source developers and fostering that community. Copilot, by contrast, is the Multiverse-of-Madness inversion of this idea.”
Microsoft’s previous ventures into AI-driven programming include the failed RobustFill, DeepCoder, and Metabob. Copilot is its most successful one yet. In June 2022, Amazon also launched its own AI pair-programming tool, CodeWhisperer. Google maintains its presence in the area through its subsidiary DeepMind’s AlphaCode. The main difference between Copilot and AlphaCode is that the former suggests code while the latter generates it from scratch.