A group of prominent authors, including Kai Bird and Jia Tolentino, has filed a lawsuit against Microsoft, alleging that the tech giant used nearly 200,000 pirated books to train its Megatron artificial intelligence model. The legal action escalates the ongoing dispute between creative professionals and technology companies over the use of copyrighted material in AI development. The authors claim that Microsoft’s AI was built to mimic the style and themes of their works.
The lawsuit, filed in New York federal court, seeks a court order to halt Microsoft’s alleged infringement and demands statutory damages of up to $150,000 for each allegedly misused work. Generative AI models like Megatron rely on vast datasets to learn and produce new content, but the source of that data is now a major point of contention. The authors specifically allege that the pirated dataset was instrumental in creating a model that mimics their creative expression.
Microsoft has yet to comment on the allegations. This case follows closely on the heels of other significant copyright disputes in the AI sphere, including rulings involving Anthropic and Meta. These legal battles are shaping the future of how AI companies acquire and utilize data for their advanced models.
The broader landscape of AI copyright lawsuits is expanding rapidly and now spans several forms of media. Companies like The New York Times and Dow Jones have sued AI firms over the use of the publishers’ archived content, while major record labels and photography companies are also pursuing legal action. Tech companies often invoke “fair use” in their defense, arguing that their AI systems create transformative content.