Unredacted Court Documents Reveal Meta’s Covert Training of AI on Infamous Piracy Database

By Staff

An ongoing legal battle between a group of authors and Meta revolves around the tech giant’s alleged use of copyrighted works to train its AI models, with a particular focus on the controversial shadow library LibGen. The authors are seeking to amend their complaint for a third time, based on newly revealed information obtained during the discovery process, which they argue substantiates their original claims and introduces new grounds for legal action. Central to their argument is the assertion that Meta knowingly used pirated materials from LibGen, and that awareness of the library’s illicit nature reached all the way up to CEO Mark Zuckerberg. This knowledge, the authors contend, invalidates Meta’s defense that the “public availability” of such datasets absolves it of copyright infringement. They accuse Meta of using this defense as a convenient loophole while fully understanding the pirated origins of the data.

The newly unredacted documents paint a picture of Meta’s alleged deep involvement with pirated materials. The authors allege that Meta not only downloaded copyrighted works from LibGen but also actively uploaded, or “seeded,” pirated files containing their work onto torrent sites. Seeding, the authors argue, transforms Meta from a mere user of pirated material into a distributor, which strengthens their copyright infringement claims. This allegation, which came to light through a Meta corporate representative’s testimony, adds another layer of complexity to the case and reinforces the authors’ argument for an amended complaint. They assert that the previously undisclosed information warrants reinstating their Digital Millennium Copyright Act (DMCA) claim, which was dismissed earlier because, they now say, the evidence available at the time was insufficient.

Meta, however, disputes the authors’ narrative, characterizing their attempt to amend the complaint as a last-ditch effort based on a false and inflammatory premise. The company argues that the plaintiffs have been aware of its use of LibGen and other shadow libraries since July 2024, which gave them ample opportunity to amend their complaint before the discovery deadline in December 2024. Meta insists that it disclosed its use of the LibGen dataset in July 2024, though this claim is hard to verify because much of the discovery material is confidential. The core of Meta’s defense is that the plaintiffs’ prior knowledge of LibGen’s use undercuts their justification for a third amended complaint, particularly now that the discovery phase has concluded.

The history of the case includes earlier dismissals and legal maneuvers. In November 2023, Judge Chhabria dismissed some of the authors’ claims, including the DMCA violation, citing insufficient evidence that Meta had removed copyright management information. The authors now point to the newly unredacted documents in response to that dismissal, arguing that the uncovered information provides the evidence needed to support the DMCA claim. This back-and-forth between the two parties highlights how the case continues to evolve as new information surfaces through the discovery process.

The backdrop of this legal battle is LibGen itself, an extensive online archive of books that originated in Russia. Known as one of the world’s largest and most controversial shadow libraries, LibGen has a history of legal challenges. A 2015 preliminary injunction aimed at shutting down the site proved ineffective, as its anonymous administrators simply migrated to a new domain. More recently, in September 2024, LibGen was ordered to pay $30 million in damages for copyright infringement, even though its operators remain anonymous. This history underscores how difficult it is to address online copyright infringement, especially when dealing with entities like LibGen, which operate outside traditional legal frameworks.

The judge’s warning to Meta about future redaction requests adds another dimension to the proceedings. Judge Chhabria’s explicit caution against overly broad redaction requests signals a potential shift in the handling of sensitive information in this case. The threat of unsealing all materials if Meta submits another excessively redacted request puts pressure on the tech giant to exercise greater transparency moving forward. This development underscores the court’s commitment to ensuring a fair and open legal process, while acknowledging the delicate balance between protecting confidential information and upholding the public’s right to access court proceedings. The case remains a significant test of copyright law in the age of AI, with potential implications far beyond this specific dispute.
