The legal battle over generative AI training data has expanded to include the backbone of regional reporting. A coalition representing nearly 400 local and regional US newspapers has filed a federal lawsuit in Manhattan against OpenAI and Microsoft, claiming their copyrighted work was systematically harvested to build commercial AI products.

A New Front in the Copyright War

While national outlets and high-profile authors have previously challenged AI giants, this case marks a significant shift toward local press. Publishers such as the Arkansas Democrat-Gazette and New York Amsterdam News argue that AI training on local reporting represents a "death knell" for community journalism, which covers the essential civic duties that algorithms cannot replicate.

Allegations of Systematic Scraping

As detailed by TNW Neural, the 55-page complaint alleges that OpenAI and Microsoft "systematically and secretly crawled" hundreds of news sites, including content behind paywalls. The lawsuit claims that the companies didn't just copy the text, but actively stripped away copyright management information—such as author bylines and publication notices—to facilitate the ingestion process.

This removal of ownership data is central to the publishers' argument, claiming it allowed LLMs to "memorize" and reproduce journalism word-for-word without attributing credit or paying for the intellectual property.

Legal Demands and Implications

The lawsuit, led by Richner Communications and attorney Matthew Platkin, seeks statutory and actual damages, as well as a return of profits. By invoking the Digital Millennium Copyright Act, the publishers aim to force AI developers to compensate local newsrooms for the value their reporting adds to the capabilities of ChatGPT and Microsoft Copilot, challenging the notion that public-facing data is free for commercial AI exploitation.